From bb71ad880204b79d60331d3384103976e086cb9f Mon Sep 17 00:00:00 2001 From: Gary Hade Date: Mon, 12 May 2008 13:57:46 -0700 Subject: PCI: boot parameter to avoid expansion ROM memory allocation Contention for scarce PCI memory resources has been growing due to an increasing number of PCI slots in large multi-node systems. The kernel currently attempts by default to allocate memory for all PCI expansion ROMs so there has also been an increasing number of PCI memory allocation failures seen on these systems. This occurs because the BIOS either (1) provides insufficient PCI memory resource for all the expansion ROMs or (2) provides adequate PCI memory resource for expansion ROMs but provides the space in kernel unexpected BIOS assigned P2P non-prefetch windows. The resulting PCI memory allocation failures may be benign when related to memory requests for expansion ROMs themselves but in some cases they can occur when attempting to allocate space for more critical BARs. This can happen when a successful expansion ROM allocation request consumes memory resource that was intended for a non-ROM BAR. We have seen this happen during PCI hotplug of an adapter that contains a P2P bridge where successful memory allocation for an expansion ROM BAR on device behind the bridge consumed memory that was intended for a non-ROM BAR on the P2P bridge. In all cases the allocation failure messages can be very confusing for users. This patch provides a new 'pci=norom' kernel boot parameter that can be used to disable the default PCI expansion ROM memory resource allocation. This provides a way to avoid the above described issues on systems that do not contain PCI devices for which drivers or user-level applications depend on the default PCI expansion ROM memory resource allocation behavior. Signed-off-by: Gary Hade Signed-off-by: Jesse Barnes --- Documentation/kernel-parameters.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index e07c432c731f..9cf7b34f2db0 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1496,6 +1496,9 @@ and is between 256 and 4096 characters. It is defined in the file Use with caution as certain devices share address decoders between ROMs and other resources. + norom [X86-32,X86_64] Do not assign address space to + expansion ROMs that do not already have + BIOS assigned address ranges. irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be assigned automatically to PCI devices. You can make the kernel exclude IRQs of your ISA cards -- cgit v1.2.3 From d8f3de0d2412bb91639cfefc5b3c79dbf3812212 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 12 Jun 2008 23:24:06 +0200 Subject: Suspend-related patches for 2.6.27 ACPI PM: Add possibility to change suspend sequence There are some systems out there that don't work correctly with our current suspend/hibernation code ordering. Provide a workaround for these systems allowing them to pass 'acpi_sleep=old_ordering' in the kernel command line so that it will use the pre-ACPI 2.0 ("old") suspend code ordering. Unfortunately, this requires us to add a platform hook to the resuming of devices for recovering the platform in case one of the device drivers' .suspend() routines returns error code. Namely, ACPI 1.0 specifies that _PTS should be called before suspending devices, but _WAK still should be called before resuming them in order to undo the changes made by _PTS. However, if there is an error during suspending devices, they are automatically resumed without returning control to the PM core, so the _WAK has to be called from within device_resume() in that cases. The patch also reorders and refactors the ACPI suspend/hibernation code to avoid duplication as far as reasonably possible. Signed-off-by: Rafael J. Wysocki Acked-by: Pavel Machek Signed-off-by: Jesse Barnes --- Documentation/kernel-parameters.txt | 6 +- arch/x86/kernel/acpi/sleep.c | 2 + drivers/acpi/sleep/main.c | 276 +++++++++++++++++++++--------------- drivers/base/power/main.c | 2 - include/linux/acpi.h | 3 + include/linux/suspend.h | 14 +- kernel/power/disk.c | 28 +++- kernel/power/main.c | 10 +- 8 files changed, 215 insertions(+), 126 deletions(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 9cf7b34f2db0..18d793ea0dd3 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -147,10 +147,14 @@ and is between 256 and 4096 characters. It is defined in the file default: 0 acpi_sleep= [HW,ACPI] Sleep options - Format: { s3_bios, s3_mode, s3_beep } + Format: { s3_bios, s3_mode, s3_beep, old_ordering } See Documentation/power/video.txt for s3_bios and s3_mode. s3_beep is for debugging; it makes the PC's speaker beep as soon as the kernel's real-mode entry point is called. + old_ordering causes the ACPI 1.0 ordering of the _PTS + control method, wrt putting devices into low power + states, to be enforced (the ACPI 2.0 ordering of _PTS is + used by default). acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode Format: { level | edge | high | low } diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c index afc25ee9964b..882e970032d5 100644 --- a/arch/x86/kernel/acpi/sleep.c +++ b/arch/x86/kernel/acpi/sleep.c @@ -124,6 +124,8 @@ static int __init acpi_sleep_setup(char *str) acpi_realmode_flags |= 2; if (strncmp(str, "s3_beep", 7) == 0) acpi_realmode_flags |= 4; + if (strncmp(str, "old_ordering", 12) == 0) + acpi_old_suspend_ordering(); str = strchr(str, ','); if (str != NULL) str += strspn(str, ", \t"); diff --git a/drivers/acpi/sleep/main.c b/drivers/acpi/sleep/main.c index 0f2caea2fc83..4addf8ad50ae 100644 --- a/drivers/acpi/sleep/main.c +++ b/drivers/acpi/sleep/main.c @@ -24,10 +24,6 @@ u8 sleep_states[ACPI_S_STATE_COUNT]; -#ifdef CONFIG_PM_SLEEP -static u32 acpi_target_sleep_state = ACPI_STATE_S0; -#endif - static int acpi_sleep_prepare(u32 acpi_state) { #ifdef CONFIG_ACPI_SLEEP @@ -50,9 +46,96 @@ static int acpi_sleep_prepare(u32 acpi_state) return 0; } -#ifdef CONFIG_SUSPEND -static struct platform_suspend_ops acpi_suspend_ops; +#ifdef CONFIG_PM_SLEEP +static u32 acpi_target_sleep_state = ACPI_STATE_S0; + +/* + * ACPI 1.0 wants us to execute _PTS before suspending devices, so we allow the + * user to request that behavior by using the 'acpi_old_suspend_ordering' + * kernel command line option that causes the following variable to be set. + */ +static bool old_suspend_ordering; + +void __init acpi_old_suspend_ordering(void) +{ + old_suspend_ordering = true; +} + +/** + * acpi_pm_disable_gpes - Disable the GPEs. + */ +static int acpi_pm_disable_gpes(void) +{ + acpi_hw_disable_all_gpes(); + return 0; +} + +/** + * __acpi_pm_prepare - Prepare the platform to enter the target state. + * + * If necessary, set the firmware waking vector and do arch-specific + * nastiness to get the wakeup code to the waking vector. + */ +static int __acpi_pm_prepare(void) +{ + int error = acpi_sleep_prepare(acpi_target_sleep_state); + + if (error) + acpi_target_sleep_state = ACPI_STATE_S0; + return error; +} + +/** + * acpi_pm_prepare - Prepare the platform to enter the target sleep + * state and disable the GPEs. + */ +static int acpi_pm_prepare(void) +{ + int error = __acpi_pm_prepare(); + + if (!error) + acpi_hw_disable_all_gpes(); + return error; +} + +/** + * acpi_pm_finish - Instruct the platform to leave a sleep state. + * + * This is called after we wake back up (or if entering the sleep state + * failed). + */ +static void acpi_pm_finish(void) +{ + u32 acpi_state = acpi_target_sleep_state; + + if (acpi_state == ACPI_STATE_S0) + return; + printk(KERN_INFO PREFIX "Waking up from system sleep state S%d\n", + acpi_state); + acpi_disable_wakeup_device(acpi_state); + acpi_leave_sleep_state(acpi_state); + + /* reset firmware waking vector */ + acpi_set_firmware_waking_vector((acpi_physical_address) 0); + + acpi_target_sleep_state = ACPI_STATE_S0; +} + +/** + * acpi_pm_end - Finish up suspend sequence. + */ +static void acpi_pm_end(void) +{ + /* + * This is necessary in case acpi_pm_finish() is not called during a + * failing transition to a sleep state. + */ + acpi_target_sleep_state = ACPI_STATE_S0; +} +#endif /* CONFIG_PM_SLEEP */ + +#ifdef CONFIG_SUSPEND extern void do_suspend_lowlevel(void); static u32 acpi_suspend_states[] = { @@ -66,7 +149,6 @@ static u32 acpi_suspend_states[] = { * acpi_suspend_begin - Set the target system sleep state to the state * associated with given @pm_state, if supported. */ - static int acpi_suspend_begin(suspend_state_t pm_state) { u32 acpi_state = acpi_suspend_states[pm_state]; @@ -82,25 +164,6 @@ static int acpi_suspend_begin(suspend_state_t pm_state) return error; } -/** - * acpi_suspend_prepare - Do preliminary suspend work. - * - * If necessary, set the firmware waking vector and do arch-specific - * nastiness to get the wakeup code to the waking vector. - */ - -static int acpi_suspend_prepare(void) -{ - int error = acpi_sleep_prepare(acpi_target_sleep_state); - - if (error) { - acpi_target_sleep_state = ACPI_STATE_S0; - return error; - } - - return ACPI_SUCCESS(acpi_hw_disable_all_gpes()) ? 0 : -EFAULT; -} - /** * acpi_suspend_enter - Actually enter a sleep state. * @pm_state: ignored @@ -109,7 +172,6 @@ static int acpi_suspend_prepare(void) * assembly, which in turn call acpi_enter_sleep_state(). * It's unfortunate, but it works. Please fix if you're feeling frisky. */ - static int acpi_suspend_enter(suspend_state_t pm_state) { acpi_status status = AE_OK; @@ -166,39 +228,6 @@ static int acpi_suspend_enter(suspend_state_t pm_state) return ACPI_SUCCESS(status) ? 0 : -EFAULT; } -/** - * acpi_suspend_finish - Instruct the platform to leave a sleep state. - * - * This is called after we wake back up (or if entering the sleep state - * failed). - */ - -static void acpi_suspend_finish(void) -{ - u32 acpi_state = acpi_target_sleep_state; - - acpi_disable_wakeup_device(acpi_state); - acpi_leave_sleep_state(acpi_state); - - /* reset firmware waking vector */ - acpi_set_firmware_waking_vector((acpi_physical_address) 0); - - acpi_target_sleep_state = ACPI_STATE_S0; -} - -/** - * acpi_suspend_end - Finish up suspend sequence. - */ - -static void acpi_suspend_end(void) -{ - /* - * This is necessary in case acpi_suspend_finish() is not called during a - * failing transition to a sleep state. - */ - acpi_target_sleep_state = ACPI_STATE_S0; -} - static int acpi_suspend_state_valid(suspend_state_t pm_state) { u32 acpi_state; @@ -218,10 +247,39 @@ static int acpi_suspend_state_valid(suspend_state_t pm_state) static struct platform_suspend_ops acpi_suspend_ops = { .valid = acpi_suspend_state_valid, .begin = acpi_suspend_begin, - .prepare = acpi_suspend_prepare, + .prepare = acpi_pm_prepare, + .enter = acpi_suspend_enter, + .finish = acpi_pm_finish, + .end = acpi_pm_end, +}; + +/** + * acpi_suspend_begin_old - Set the target system sleep state to the + * state associated with given @pm_state, if supported, and + * execute the _PTS control method. This function is used if the + * pre-ACPI 2.0 suspend ordering has been requested. + */ +static int acpi_suspend_begin_old(suspend_state_t pm_state) +{ + int error = acpi_suspend_begin(pm_state); + + if (!error) + error = __acpi_pm_prepare(); + return error; +} + +/* + * The following callbacks are used if the pre-ACPI 2.0 suspend ordering has + * been requested. + */ +static struct platform_suspend_ops acpi_suspend_ops_old = { + .valid = acpi_suspend_state_valid, + .begin = acpi_suspend_begin_old, + .prepare = acpi_pm_disable_gpes, .enter = acpi_suspend_enter, - .finish = acpi_suspend_finish, - .end = acpi_suspend_end, + .finish = acpi_pm_finish, + .end = acpi_pm_end, + .recover = acpi_pm_finish, }; #endif /* CONFIG_SUSPEND */ @@ -229,22 +287,9 @@ static struct platform_suspend_ops acpi_suspend_ops = { static int acpi_hibernation_begin(void) { acpi_target_sleep_state = ACPI_STATE_S4; - return 0; } -static int acpi_hibernation_prepare(void) -{ - int error = acpi_sleep_prepare(ACPI_STATE_S4); - - if (error) { - acpi_target_sleep_state = ACPI_STATE_S0; - return error; - } - - return ACPI_SUCCESS(acpi_hw_disable_all_gpes()) ? 0 : -EFAULT; -} - static int acpi_hibernation_enter(void) { acpi_status status = AE_OK; @@ -274,52 +319,55 @@ static void acpi_hibernation_leave(void) acpi_leave_sleep_state_prep(ACPI_STATE_S4); } -static void acpi_hibernation_finish(void) +static void acpi_pm_enable_gpes(void) { - acpi_disable_wakeup_device(ACPI_STATE_S4); - acpi_leave_sleep_state(ACPI_STATE_S4); - - /* reset firmware waking vector */ - acpi_set_firmware_waking_vector((acpi_physical_address) 0); - - acpi_target_sleep_state = ACPI_STATE_S0; + acpi_hw_enable_all_runtime_gpes(); } -static void acpi_hibernation_end(void) -{ - /* - * This is necessary in case acpi_hibernation_finish() is not called - * during a failing transition to the sleep state. - */ - acpi_target_sleep_state = ACPI_STATE_S0; -} +static struct platform_hibernation_ops acpi_hibernation_ops = { + .begin = acpi_hibernation_begin, + .end = acpi_pm_end, + .pre_snapshot = acpi_pm_prepare, + .finish = acpi_pm_finish, + .prepare = acpi_pm_prepare, + .enter = acpi_hibernation_enter, + .leave = acpi_hibernation_leave, + .pre_restore = acpi_pm_disable_gpes, + .restore_cleanup = acpi_pm_enable_gpes, +}; -static int acpi_hibernation_pre_restore(void) +/** + * acpi_hibernation_begin_old - Set the target system sleep state to + * ACPI_STATE_S4 and execute the _PTS control method. This + * function is used if the pre-ACPI 2.0 suspend ordering has been + * requested. + */ +static int acpi_hibernation_begin_old(void) { - acpi_status status; - - status = acpi_hw_disable_all_gpes(); - - return ACPI_SUCCESS(status) ? 0 : -EFAULT; -} + int error = acpi_sleep_prepare(ACPI_STATE_S4); -static void acpi_hibernation_restore_cleanup(void) -{ - acpi_hw_enable_all_runtime_gpes(); + if (!error) + acpi_target_sleep_state = ACPI_STATE_S4; + return error; } -static struct platform_hibernation_ops acpi_hibernation_ops = { - .begin = acpi_hibernation_begin, - .end = acpi_hibernation_end, - .pre_snapshot = acpi_hibernation_prepare, - .finish = acpi_hibernation_finish, - .prepare = acpi_hibernation_prepare, +/* + * The following callbacks are used if the pre-ACPI 2.0 suspend ordering has + * been requested. + */ +static struct platform_hibernation_ops acpi_hibernation_ops_old = { + .begin = acpi_hibernation_begin_old, + .end = acpi_pm_end, + .pre_snapshot = acpi_pm_disable_gpes, + .finish = acpi_pm_finish, + .prepare = acpi_pm_disable_gpes, .enter = acpi_hibernation_enter, .leave = acpi_hibernation_leave, - .pre_restore = acpi_hibernation_pre_restore, - .restore_cleanup = acpi_hibernation_restore_cleanup, + .pre_restore = acpi_pm_disable_gpes, + .restore_cleanup = acpi_pm_enable_gpes, + .recover = acpi_pm_finish, }; -#endif /* CONFIG_HIBERNATION */ +#endif /* CONFIG_HIBERNATION */ int acpi_suspend(u32 acpi_state) { @@ -461,13 +509,15 @@ int __init acpi_sleep_init(void) } } - suspend_set_ops(&acpi_suspend_ops); + suspend_set_ops(old_suspend_ordering ? + &acpi_suspend_ops_old : &acpi_suspend_ops); #endif #ifdef CONFIG_HIBERNATION status = acpi_get_sleep_type_data(ACPI_STATE_S4, &type_a, &type_b); if (ACPI_SUCCESS(status)) { - hibernation_set_ops(&acpi_hibernation_ops); + hibernation_set_ops(old_suspend_ordering ? + &acpi_hibernation_ops_old : &acpi_hibernation_ops); sleep_states[ACPI_STATE_S4] = 1; printk(" S4"); } diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index d571204aaff7..3250c5257b74 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -781,8 +781,6 @@ int device_suspend(pm_message_t state) error = dpm_prepare(state); if (!error) error = dpm_suspend(state); - if (error) - device_resume(resume_event(state)); return error; } EXPORT_SYMBOL_GPL(device_suspend); diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 41f7ce7edd7a..33adcf91ef41 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -234,6 +234,9 @@ int acpi_check_region(resource_size_t start, resource_size_t n, int acpi_check_mem_region(resource_size_t start, resource_size_t n, const char *name); +#ifdef CONFIG_PM_SLEEP +void __init acpi_old_suspend_ordering(void); +#endif /* CONFIG_PM_SLEEP */ #else /* CONFIG_ACPI */ static inline int early_acpi_boot_init(void) diff --git a/include/linux/suspend.h b/include/linux/suspend.h index a6977423baf7..e8e69159af71 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -86,6 +86,11 @@ typedef int __bitwise suspend_state_t; * that implement @begin(), but platforms implementing @begin() should * also provide a @end() which cleans up transitions aborted before * @enter(). + * + * @recover: Recover the platform from a suspend failure. + * Called by the PM core if the suspending of devices fails. + * This callback is optional and should only be implemented by platforms + * which require special recovery actions in that situation. */ struct platform_suspend_ops { int (*valid)(suspend_state_t state); @@ -94,6 +99,7 @@ struct platform_suspend_ops { int (*enter)(suspend_state_t state); void (*finish)(void); void (*end)(void); + void (*recover)(void); }; #ifdef CONFIG_SUSPEND @@ -149,7 +155,7 @@ extern void mark_free_pages(struct zone *zone); * The methods in this structure allow a platform to carry out special * operations required by it during a hibernation transition. * - * All the methods below must be implemented. + * All the methods below, except for @recover(), must be implemented. * * @begin: Tell the platform driver that we're starting hibernation. * Called right after shrinking memory and before freezing devices. @@ -189,6 +195,11 @@ extern void mark_free_pages(struct zone *zone); * @restore_cleanup: Clean up after a failing image restoration. * Called right after the nonboot CPUs have been enabled and before * thawing devices (runs with IRQs on). + * + * @recover: Recover the platform from a failure to suspend devices. + * Called by the PM core if the suspending of devices during hibernation + * fails. This callback is optional and should only be implemented by + * platforms which require special recovery actions in that situation. */ struct platform_hibernation_ops { int (*begin)(void); @@ -200,6 +211,7 @@ struct platform_hibernation_ops { void (*leave)(void); int (*pre_restore)(void); void (*restore_cleanup)(void); + void (*recover)(void); }; #ifdef CONFIG_HIBERNATION diff --git a/kernel/power/disk.c b/kernel/power/disk.c index d416be0efa8a..f011e0870b52 100644 --- a/kernel/power/disk.c +++ b/kernel/power/disk.c @@ -179,6 +179,17 @@ static void platform_restore_cleanup(int platform_mode) hibernation_ops->restore_cleanup(); } +/** + * platform_recover - recover the platform from a failure to suspend + * devices. + */ + +static void platform_recover(int platform_mode) +{ + if (platform_mode && hibernation_ops && hibernation_ops->recover) + hibernation_ops->recover(); +} + /** * create_image - freeze devices that need to be frozen with interrupts * off, create the hibernation image and thaw those devices. Control @@ -258,10 +269,10 @@ int hibernation_snapshot(int platform_mode) suspend_console(); error = device_suspend(PMSG_FREEZE); if (error) - goto Resume_console; + goto Recover_platform; if (hibernation_test(TEST_DEVICES)) - goto Resume_devices; + goto Recover_platform; error = platform_pre_snapshot(platform_mode); if (error || hibernation_test(TEST_PLATFORM)) @@ -285,11 +296,14 @@ int hibernation_snapshot(int platform_mode) Resume_devices: device_resume(in_suspend ? (error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE); - Resume_console: resume_console(); Close: platform_end(platform_mode); return error; + + Recover_platform: + platform_recover(platform_mode); + goto Resume_devices; } /** @@ -398,8 +412,11 @@ int hibernation_platform_enter(void) suspend_console(); error = device_suspend(PMSG_HIBERNATE); - if (error) - goto Resume_console; + if (error) { + if (hibernation_ops->recover) + hibernation_ops->recover(); + goto Resume_devices; + } error = hibernation_ops->prepare(); if (error) @@ -428,7 +445,6 @@ int hibernation_platform_enter(void) hibernation_ops->finish(); Resume_devices: device_resume(PMSG_RESTORE); - Resume_console: resume_console(); Close: hibernation_ops->end(); diff --git a/kernel/power/main.c b/kernel/power/main.c index d023b6b584e5..3398f4651aa1 100644 --- a/kernel/power/main.c +++ b/kernel/power/main.c @@ -269,11 +269,11 @@ int suspend_devices_and_enter(suspend_state_t state) error = device_suspend(PMSG_SUSPEND); if (error) { printk(KERN_ERR "PM: Some devices failed to suspend\n"); - goto Resume_console; + goto Recover_platform; } if (suspend_test(TEST_DEVICES)) - goto Resume_devices; + goto Recover_platform; if (suspend_ops->prepare) { error = suspend_ops->prepare(); @@ -294,12 +294,16 @@ int suspend_devices_and_enter(suspend_state_t state) suspend_ops->finish(); Resume_devices: device_resume(PMSG_RESUME); - Resume_console: resume_console(); Close: if (suspend_ops->end) suspend_ops->end(); return error; + + Recover_platform: + if (suspend_ops->recover) + suspend_ops->recover(); + goto Resume_devices; } /** -- cgit v1.2.3 From c1e3b377ad48febba6f91b8ae42c44ee4d4ab45e Mon Sep 17 00:00:00 2001 From: Zhao Yakui Date: Tue, 24 Jun 2008 17:58:53 +0800 Subject: ACPI: Create "idle=halt" bootparam "idle=halt" limits the idle loop to using the halt instruction. No MWAIT, no IO accesses, no C-states deeper than C1. If something is broken in the idle code, "idle=halt" is a less severe workaround than "idle=poll" which disables all power savings. Signed-off-by: Zhao Yakui Signed-off-by: Len Brown Signed-off-by: Andi Kleen --- Documentation/kernel-parameters.txt | 4 +++- arch/ia64/kernel/process.c | 2 ++ arch/x86/kernel/process.c | 17 ++++++++++++++++- drivers/acpi/processor_idle.c | 22 ++++++++++++++++++++++ include/asm-ia64/processor.h | 1 + include/asm-x86/processor.h | 1 + 6 files changed, 45 insertions(+), 2 deletions(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 312fe77764a4..65db7f4711aa 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -818,7 +818,7 @@ and is between 256 and 4096 characters. It is defined in the file See Documentation/ide/ide.txt. idle= [X86] - Format: idle=poll or idle=mwait + Format: idle=poll or idle=mwait, idle=halt Poll forces a polling idle loop that can slightly improves the performance of waking up a idle CPU, but will use a lot of power and make the system run hot. Not recommended. @@ -826,6 +826,8 @@ and is between 256 and 4096 characters. It is defined in the file to not use it because it doesn't save as much power as a normal idle loop use the MONITOR/MWAIT idle loop anyways. Performance should be the same as idle=poll. + idle=halt. Halt is forced to be used for CPU idle. + In such case C2/C3 won't be used again. ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem Claim all unknown PCI IDE storage controllers. diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c index fabaf08d9a69..612b3c4a0603 100644 --- a/arch/ia64/kernel/process.c +++ b/arch/ia64/kernel/process.c @@ -55,6 +55,8 @@ void (*ia64_mark_idle)(int); unsigned long boot_option_idle_override = 0; EXPORT_SYMBOL(boot_option_idle_override); +unsigned long idle_halt; +EXPORT_SYMBOL(idle_halt); void ia64_do_show_stack (struct unw_frame_info *info, void *arg) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 7dceea947232..7fc729498760 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -7,6 +7,10 @@ #include #include #include +#include + +unsigned long idle_halt; +EXPORT_SYMBOL(idle_halt); struct kmem_cache *task_xstate_cachep; @@ -325,7 +329,18 @@ static int __init idle_setup(char *str) pm_idle = poll_idle; } else if (!strcmp(str, "mwait")) force_mwait = 1; - else + else if (!strcmp(str, "halt")) { + /* + * When the boot option of idle=halt is added, halt is + * forced to be used for CPU idle. In such case CPU C2/C3 + * won't be used again. + * To continue to load the CPU idle driver, don't touch + * the boot_option_idle_override. + */ + pm_idle = default_idle; + idle_halt = 1; + return 0; + } else return -1; boot_option_idle_override = 1; diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index 0fc310e7dfd6..c75c7ace8c13 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -41,6 +41,7 @@ #include #include #include +#include /* * Include the apic definitions for x86 to have the APIC timer related defines @@ -57,6 +58,7 @@ #include #include +#include #define ACPI_PROCESSOR_COMPONENT 0x01000000 #define ACPI_PROCESSOR_CLASS "processor" @@ -955,6 +957,17 @@ static int acpi_processor_get_power_info_cst(struct acpi_processor *pr) } else { continue; } + if (cx.type == ACPI_STATE_C1 && idle_halt) { + /* + * In most cases the C1 space_id obtained from + * _CST object is FIXED_HARDWARE access mode. + * But when the option of idle=halt is added, + * the entry_method type should be changed from + * CSTATE_FFH to CSTATE_HALT. + */ + cx.entry_method = ACPI_CSTATE_HALT; + snprintf(cx.desc, ACPI_CX_DESC_LEN, "ACPI HLT"); + } } else { snprintf(cx.desc, ACPI_CX_DESC_LEN, "ACPI IOPORT 0x%x", cx.address); @@ -1780,6 +1793,15 @@ int __cpuinit acpi_processor_power_init(struct acpi_processor *pr, return 0; if (!first_run) { + if (idle_halt) { + /* + * When the boot option of "idle=halt" is added, halt + * is used for CPU IDLE. + * In such case C2/C3 is meaningless. So the max_cstate + * is set to one. + */ + max_cstate = 1; + } dmi_check_system(processor_power_dmi_table); max_cstate = acpi_processor_cstate_check(max_cstate); if (max_cstate < ACPI_C_STATES_MAX) diff --git a/include/asm-ia64/processor.h b/include/asm-ia64/processor.h index 6aff126fc07e..f36e28a5f61e 100644 --- a/include/asm-ia64/processor.h +++ b/include/asm-ia64/processor.h @@ -763,6 +763,7 @@ prefetchw (const void *x) #define spin_lock_prefetch(x) prefetchw(x) extern unsigned long boot_option_idle_override; +extern unsigned long idle_halt; #endif /* !__ASSEMBLY__ */ diff --git a/include/asm-x86/processor.h b/include/asm-x86/processor.h index 7f7382704592..bc221623248e 100644 --- a/include/asm-x86/processor.h +++ b/include/asm-x86/processor.h @@ -727,6 +727,7 @@ extern int force_mwait; extern void select_idle_routine(const struct cpuinfo_x86 *c); extern unsigned long boot_option_idle_override; +extern unsigned long idle_halt; extern void enable_sep_cpu(void); extern int sysenter_setup(void); -- cgit v1.2.3 From da5e09a1b3e5a9fc0b15a3feb64e921ccc55ba74 Mon Sep 17 00:00:00 2001 From: Zhao Yakui Date: Tue, 24 Jun 2008 18:01:09 +0800 Subject: ACPI : Create "idle=nomwait" bootparam "idle=nomwait" disables the use of the MWAIT instruction from both C1 (C1_FFH) and deeper (C2C3_FFH) C-states. When MWAIT is unavailable, the BIOS and OS generally negotiate to use the HALT instruction for C1, and use IO accesses for deeper C-states. This option is useful for power and performance comparisons, and also to work around BIOS bugs where broken MWAIT support is advertised. http://bugzilla.kernel.org/show_bug.cgi?id=10807 http://bugzilla.kernel.org/show_bug.cgi?id=10914 Signed-off-by: Zhao Yakui Signed-off-by: Li Shaohua Signed-off-by: Len Brown Signed-off-by: Andi Kleen --- Documentation/kernel-parameters.txt | 3 ++- arch/ia64/kernel/process.c | 2 ++ arch/x86/kernel/process.c | 11 +++++++++++ drivers/acpi/processor_core.c | 13 +++++++++++++ drivers/acpi/processor_idle.c | 6 +++++- include/asm-ia64/processor.h | 1 + include/asm-x86/processor.h | 1 + 7 files changed, 35 insertions(+), 2 deletions(-) (limited to 'Documentation/kernel-parameters.txt') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 65db7f4711aa..5e497d16fb51 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -818,7 +818,7 @@ and is between 256 and 4096 characters. It is defined in the file See Documentation/ide/ide.txt. idle= [X86] - Format: idle=poll or idle=mwait, idle=halt + Format: idle=poll or idle=mwait, idle=halt, idle=nomwait Poll forces a polling idle loop that can slightly improves the performance of waking up a idle CPU, but will use a lot of power and make the system run hot. Not recommended. @@ -828,6 +828,7 @@ and is between 256 and 4096 characters. It is defined in the file as idle=poll. idle=halt. Halt is forced to be used for CPU idle. In such case C2/C3 won't be used again. + idle=nomwait. Disable mwait for CPU C-states ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem Claim all unknown PCI IDE storage controllers. diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c index 612b3c4a0603..3ab8373103ec 100644 --- a/arch/ia64/kernel/process.c +++ b/arch/ia64/kernel/process.c @@ -57,6 +57,8 @@ unsigned long boot_option_idle_override = 0; EXPORT_SYMBOL(boot_option_idle_override); unsigned long idle_halt; EXPORT_SYMBOL(idle_halt); +unsigned long idle_nomwait; +EXPORT_SYMBOL(idle_nomwait); void ia64_do_show_stack (struct unw_frame_info *info, void *arg) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 7fc729498760..4d629c62f4f8 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -11,6 +11,8 @@ unsigned long idle_halt; EXPORT_SYMBOL(idle_halt); +unsigned long idle_nomwait; +EXPORT_SYMBOL(idle_nomwait); struct kmem_cache *task_xstate_cachep; @@ -340,6 +342,15 @@ static int __init idle_setup(char *str) pm_idle = default_idle; idle_halt = 1; return 0; + } else if (!strcmp(str, "nomwait")) { + /* + * If the boot option of "idle=nomwait" is added, + * it means that mwait will be disabled for CPU C2/C3 + * states. In such case it won't touch the variable + * of boot_option_idle_override. + */ + idle_nomwait = 1; + return 0; } else return -1; diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c index 9a803f85ccfe..4e1bb89fd6c3 100644 --- a/drivers/acpi/processor_core.c +++ b/drivers/acpi/processor_core.c @@ -265,7 +265,20 @@ static int acpi_processor_set_pdc(struct acpi_processor *pr) if (!pdc_in) return status; + if (idle_nomwait) { + /* + * If mwait is disabled for CPU C-states, the C2C3_FFH access + * mode will be disabled in the parameter of _PDC object. + * Of course C1_FFH access mode will also be disabled. + */ + union acpi_object *obj; + u32 *buffer = NULL; + obj = pdc_in->pointer; + buffer = (u32 *)(obj->buffer.pointer); + buffer[2] &= ~(ACPI_PDC_C_C2C3_FFH | ACPI_PDC_C_C1_FFH); + + } status = acpi_evaluate_object(pr->handle, "_PDC", pdc_in, NULL); if (ACPI_FAILURE(status)) diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index c75c7ace8c13..d592dbb1d12a 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -957,13 +957,17 @@ static int acpi_processor_get_power_info_cst(struct acpi_processor *pr) } else { continue; } - if (cx.type == ACPI_STATE_C1 && idle_halt) { + if (cx.type == ACPI_STATE_C1 && + (idle_halt || idle_nomwait)) { /* * In most cases the C1 space_id obtained from * _CST object is FIXED_HARDWARE access mode. * But when the option of idle=halt is added, * the entry_method type should be changed from * CSTATE_FFH to CSTATE_HALT. + * When the option of idle=nomwait is added, + * the C1 entry_method type should be + * CSTATE_HALT. */ cx.entry_method = ACPI_CSTATE_HALT; snprintf(cx.desc, ACPI_CX_DESC_LEN, "ACPI HLT"); diff --git a/include/asm-ia64/processor.h b/include/asm-ia64/processor.h index f36e28a5f61e..f88fa054d01d 100644 --- a/include/asm-ia64/processor.h +++ b/include/asm-ia64/processor.h @@ -764,6 +764,7 @@ prefetchw (const void *x) extern unsigned long boot_option_idle_override; extern unsigned long idle_halt; +extern unsigned long idle_nomwait; #endif /* !__ASSEMBLY__ */ diff --git a/include/asm-x86/processor.h b/include/asm-x86/processor.h index bc221623248e..55402d2ab938 100644 --- a/include/asm-x86/processor.h +++ b/include/asm-x86/processor.h @@ -728,6 +728,7 @@ extern void select_idle_routine(const struct cpuinfo_x86 *c); extern unsigned long boot_option_idle_override; extern unsigned long idle_halt; +extern unsigned long idle_nomwait; extern void enable_sep_cpu(void); extern int sysenter_setup(void); -- cgit v1.2.3