From 1c33be57496d927ce05b2513ff0c108f69db4345 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Thu, 12 Apr 2012 02:56:10 -0400 Subject: ARM: b.L: core switcher code This is the core code implementing big.LITTLE switcher functionality. Rationale for this code is available here: http://lwn.net/Articles/481055/ The main entry point for a switch request is: void bL_switch_request(unsigned int cpu, unsigned int new_cluster_id) If the calling CPU is not the wanted one, this wrapper takes care of sending the request to the appropriate CPU with schedule_work_on(). At the moment the core switch operation is handled by bL_switch_to() which must be called on the CPU for which a switch is requested. What this code does: * Return early if the current cluster is the wanted one. * Close the gate in the kernel entry vector for both the inbound and outbound CPUs. * Wake up the inbound CPU so it can perform its reset sequence in parallel up to the kernel entry vector gate. * Migrate all interrupts in the GIC targeting the outbound CPU interface to the inbound CPU interface, including SGIs. This is performed by gic_migrate_target() in drivers/irqchip/irq-gic.c. * Call cpu_pm_enter() which takes care of flushing the VFP state to RAM and save the CPU interface config from the GIC to RAM. * Modify the cpu_logical_map to refer to the inbound physical CPU. * Call cpu_suspend() which saves the CPU state (general purpose registers, page table address) onto the stack and store the resulting stack pointer in an array indexed by the updated cpu_logical_map, then call the provided shutdown function. This happens in arch/arm/kernel/sleep.S. At this point, the provided shutdown function executed by the outbound CPU ungates the inbound CPU. Therefore the inbound CPU: * Picks up the saved stack pointer in the array indexed by its MPIDR in arch/arm/kernel/sleep.S. * The MMU and caches are re-enabled using the saved state on the provided stack, just like if this was a resume operation from a suspended state. * Then cpu_suspend() returns, although this is on the inbound CPU rather than the outbound CPU which called it initially. * The function cpu_pm_exit() is called which effect is to restore the CPU interface state in the GIC using the state previously saved by the outbound CPU. * Exit of bL_switch_to() to resume normal kernel execution on the new CPU. However, the outbound CPU is potentially still running in parallel while the inbound CPU is resuming normal kernel execution, hence we need per CPU stack isolation to execute bL_do_switch(). After the outbound CPU has ungated the inbound CPU, it calls mcpm_cpu_power_down() to: * Clean its L1 cache. * If it is the last CPU still alive in its cluster (last man standing), it also cleans its L2 cache and disables cache snooping from the other cluster. * Power down the CPU (or whole cluster). Code called from bL_do_switch() might end up referencing 'current' for some reasons. However, 'current' is derived from the stack pointer. With any arbitrary stack, the returned value for 'current' and any dereferenced values through it are just random garbage which may lead to segmentation faults. The active page table during the execution of bL_do_switch() is also a problem. There is no guarantee that the inbound CPU won't destroy the corresponding task which would free the attached page table while the outbound CPU is still running and relying on it. To solve both issues, we borrow some of the task space belonging to the init/idle task which, by its nature, is lightly used and therefore is unlikely to clash with our usage. The init task is also never going away. Right now the logical CPU number is assumed to be equivalent to the physical CPU number within each cluster. The kernel should also be booted with only one cluster active. These limitations will be lifted eventually. Signed-off-by: Nicolas Pitre --- arch/arm/include/asm/bL_switcher.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 arch/arm/include/asm/bL_switcher.h (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/bL_switcher.h b/arch/arm/include/asm/bL_switcher.h new file mode 100644 index 000000000000..72efe3f349b9 --- /dev/null +++ b/arch/arm/include/asm/bL_switcher.h @@ -0,0 +1,17 @@ +/* + * arch/arm/include/asm/bL_switcher.h + * + * Created by: Nicolas Pitre, April 2012 + * Copyright: (C) 2012-2013 Linaro Limited + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef ASM_BL_SWITCHER_H +#define ASM_BL_SWITCHER_H + +void bL_switch_request(unsigned int cpu, unsigned int new_cluster_id); + +#endif -- cgit v1.2.3 From 71ce1deeff8f9341ae3b21983e9bdde28e8c96fe Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Fri, 26 Oct 2012 02:36:17 -0400 Subject: ARM: bL_switcher: move to dedicated threads rather than workqueues The workqueues are problematic as they may be contended. They can't be scheduled with top priority either. Also the optimization in bL_switch_request() to skip the workqueue entirely when the target CPU and the calling CPU were the same didn't allow for bL_switch_request() to be called from atomic context, as might be the case for some cpufreq drivers. Let's move to dedicated kthreads instead. Signed-off-by: Nicolas Pitre --- arch/arm/common/bL_switcher.c | 101 ++++++++++++++++++++++++++++--------- arch/arm/include/asm/bL_switcher.h | 2 +- 2 files changed, 79 insertions(+), 24 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c index ca04b5384bb0..407c4cc64c0b 100644 --- a/arch/arm/common/bL_switcher.c +++ b/arch/arm/common/bL_switcher.c @@ -15,8 +15,10 @@ #include #include #include +#include #include -#include +#include +#include #include #include #include @@ -219,15 +221,48 @@ static int bL_switch_to(unsigned int new_cluster_id) return ret; } -struct switch_args { - unsigned int cluster; - struct work_struct work; +struct bL_thread { + struct task_struct *task; + wait_queue_head_t wq; + int wanted_cluster; }; -static void __bL_switch_to(struct work_struct *work) +static struct bL_thread bL_threads[NR_CPUS]; + +static int bL_switcher_thread(void *arg) +{ + struct bL_thread *t = arg; + struct sched_param param = { .sched_priority = 1 }; + int cluster; + + sched_setscheduler_nocheck(current, SCHED_FIFO, ¶m); + + do { + if (signal_pending(current)) + flush_signals(current); + wait_event_interruptible(t->wq, + t->wanted_cluster != -1 || + kthread_should_stop()); + cluster = xchg(&t->wanted_cluster, -1); + if (cluster != -1) + bL_switch_to(cluster); + } while (!kthread_should_stop()); + + return 0; +} + +static struct task_struct * __init bL_switcher_thread_create(int cpu, void *arg) { - struct switch_args *args = container_of(work, struct switch_args, work); - bL_switch_to(args->cluster); + struct task_struct *task; + + task = kthread_create_on_node(bL_switcher_thread, arg, + cpu_to_node(cpu), "kswitcher_%d", cpu); + if (!IS_ERR(task)) { + kthread_bind(task, cpu); + wake_up_process(task); + } else + pr_err("%s failed for CPU %d\n", __func__, cpu); + return task; } /* @@ -236,26 +271,46 @@ static void __bL_switch_to(struct work_struct *work) * @cpu: the CPU to switch * @new_cluster_id: the ID of the cluster to switch to. * - * This function causes a cluster switch on the given CPU. If the given - * CPU is the same as the calling CPU then the switch happens right away. - * Otherwise the request is put on a work queue to be scheduled on the - * remote CPU. + * This function causes a cluster switch on the given CPU by waking up + * the appropriate switcher thread. This function may or may not return + * before the switch has occurred. */ -void bL_switch_request(unsigned int cpu, unsigned int new_cluster_id) +int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id) { - unsigned int this_cpu = get_cpu(); - struct switch_args args; + struct bL_thread *t; - if (cpu == this_cpu) { - bL_switch_to(new_cluster_id); - put_cpu(); - return; + if (cpu >= ARRAY_SIZE(bL_threads)) { + pr_err("%s: cpu %d out of bounds\n", __func__, cpu); + return -EINVAL; } - put_cpu(); - args.cluster = new_cluster_id; - INIT_WORK_ONSTACK(&args.work, __bL_switch_to); - schedule_work_on(cpu, &args.work); - flush_work(&args.work); + t = &bL_threads[cpu]; + if (IS_ERR(t->task)) + return PTR_ERR(t->task); + if (!t->task) + return -ESRCH; + + t->wanted_cluster = new_cluster_id; + wake_up(&t->wq); + return 0; } EXPORT_SYMBOL_GPL(bL_switch_request); + +static int __init bL_switcher_init(void) +{ + int cpu; + + pr_info("big.LITTLE switcher initializing\n"); + + for_each_online_cpu(cpu) { + struct bL_thread *t = &bL_threads[cpu]; + init_waitqueue_head(&t->wq); + t->wanted_cluster = -1; + t->task = bL_switcher_thread_create(cpu, t); + } + + pr_info("big.LITTLE switcher initialized\n"); + return 0; +} + +late_initcall(bL_switcher_init); diff --git a/arch/arm/include/asm/bL_switcher.h b/arch/arm/include/asm/bL_switcher.h index 72efe3f349b9..e0c0bba70bbf 100644 --- a/arch/arm/include/asm/bL_switcher.h +++ b/arch/arm/include/asm/bL_switcher.h @@ -12,6 +12,6 @@ #ifndef ASM_BL_SWITCHER_H #define ASM_BL_SWITCHER_H -void bL_switch_request(unsigned int cpu, unsigned int new_cluster_id); +int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id); #endif -- cgit v1.2.3 From c0f4375146a738bae23e48fa8b5383abf02177cb Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 10 Dec 2012 17:19:57 +0000 Subject: ARM: bL_switcher: Add synchronous enable/disable interface Some subsystems will need to know for sure whether the switcher is enabled or disabled during certain critical regions. This patch provides a simple mutex-based mechanism to discover whether the switcher is enabled and temporarily lock out further enable/disable: * bL_switcher_get_enabled() returns true iff the switcher is enabled and temporarily inhibits enable/disable. * bL_switcher_put_enabled() permits enable/disable of the switcher again after a previous call to bL_switcher_get_enabled(). Signed-off-by: Dave Martin Signed-off-by: Nicolas Pitre --- arch/arm/common/bL_switcher.c | 27 +++++++++++++++++++++++++-- arch/arm/include/asm/bL_switcher.h | 3 +++ 2 files changed, 28 insertions(+), 2 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c index 335ff76d4c5a..7d98629aa446 100644 --- a/arch/arm/common/bL_switcher.c +++ b/arch/arm/common/bL_switcher.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -302,6 +303,7 @@ EXPORT_SYMBOL_GPL(bL_switch_request); * Activation and configuration code. */ +static DEFINE_MUTEX(bL_switcher_activation_lock); static unsigned int bL_switcher_active; static unsigned int bL_switcher_cpu_original_cluster[NR_CPUS]; static cpumask_t bL_switcher_removed_logical_cpus; @@ -413,9 +415,11 @@ static int bL_switcher_enable(void) { int cpu, ret; + mutex_lock(&bL_switcher_activation_lock); cpu_hotplug_driver_lock(); if (bL_switcher_active) { cpu_hotplug_driver_unlock(); + mutex_unlock(&bL_switcher_activation_lock); return 0; } @@ -424,6 +428,7 @@ static int bL_switcher_enable(void) ret = bL_switcher_halve_cpus(); if (ret) { cpu_hotplug_driver_unlock(); + mutex_unlock(&bL_switcher_activation_lock); return ret; } @@ -436,9 +441,10 @@ static int bL_switcher_enable(void) } bL_switcher_active = 1; - cpu_hotplug_driver_unlock(); - pr_info("big.LITTLE switcher initialized\n"); + + cpu_hotplug_driver_unlock(); + mutex_unlock(&bL_switcher_activation_lock); return 0; } @@ -450,9 +456,11 @@ static void bL_switcher_disable(void) struct bL_thread *t; struct task_struct *task; + mutex_lock(&bL_switcher_activation_lock); cpu_hotplug_driver_lock(); if (!bL_switcher_active) { cpu_hotplug_driver_unlock(); + mutex_unlock(&bL_switcher_activation_lock); return; } bL_switcher_active = 0; @@ -497,6 +505,7 @@ static void bL_switcher_disable(void) bL_switcher_restore_cpus(); cpu_hotplug_driver_unlock(); + mutex_unlock(&bL_switcher_activation_lock); } static ssize_t bL_switcher_active_show(struct kobject *kobj, @@ -554,6 +563,20 @@ static int __init bL_switcher_sysfs_init(void) #endif /* CONFIG_SYSFS */ +bool bL_switcher_get_enabled(void) +{ + mutex_lock(&bL_switcher_activation_lock); + + return bL_switcher_active; +} +EXPORT_SYMBOL_GPL(bL_switcher_get_enabled); + +void bL_switcher_put_enabled(void) +{ + mutex_unlock(&bL_switcher_activation_lock); +} +EXPORT_SYMBOL_GPL(bL_switcher_put_enabled); + /* * Veto any CPU hotplug operation on those CPUs we've removed * while the switcher is active. diff --git a/arch/arm/include/asm/bL_switcher.h b/arch/arm/include/asm/bL_switcher.h index e0c0bba70bbf..05d7c4cb9473 100644 --- a/arch/arm/include/asm/bL_switcher.h +++ b/arch/arm/include/asm/bL_switcher.h @@ -14,4 +14,7 @@ int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id); +bool bL_switcher_get_enabled(void); +void bL_switcher_put_enabled(void); + #endif -- cgit v1.2.3 From 491990e29f5d285a1b75e74785e3160716b79040 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 10 Dec 2012 17:19:58 +0000 Subject: ARM: bL_switcher: Add runtime control notifier Some subsystems will need to respond synchronously to runtime enabling and disabling of the switcher. This patch adds a dedicated notifier interface to support such subsystems. Pre- and post- enable/disable notifications are sent to registered callbacks, allowing safe transition of non-b.L- transparent subsystems across these control transitions. Notifier callbacks may veto switcher (de)activation on pre notifications only. Post notifications won't revert the action. If enabling or disabling of the switcher fails after the pre-change notification has been sent, subsystems which have registered notifiers can be left in an inappropriate state. This patch sends a suitable post-change notification on failure, indicating that the old state has been reestablished. For example, a failed initialisation will result in the following sequence: BL_NOTIFY_PRE_ENABLE /* switcher initialisation fails */ BL_NOTIFY_POST_DISABLE It is the responsibility of notified subsystems to respond in an appropriate way. Signed-off-by: Dave Martin Signed-off-by: Nicolas Pitre --- arch/arm/common/bL_switcher.c | 60 +++++++++++++++++++++++++++++++------- arch/arm/include/asm/bL_switcher.h | 44 ++++++++++++++++++++++++++++ 2 files changed, 94 insertions(+), 10 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c index 7d98629aa446..016488730cb7 100644 --- a/arch/arm/common/bL_switcher.c +++ b/arch/arm/common/bL_switcher.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -304,10 +305,34 @@ EXPORT_SYMBOL_GPL(bL_switch_request); */ static DEFINE_MUTEX(bL_switcher_activation_lock); +static BLOCKING_NOTIFIER_HEAD(bL_activation_notifier); static unsigned int bL_switcher_active; static unsigned int bL_switcher_cpu_original_cluster[NR_CPUS]; static cpumask_t bL_switcher_removed_logical_cpus; +int bL_switcher_register_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_register(&bL_activation_notifier, nb); +} +EXPORT_SYMBOL_GPL(bL_switcher_register_notifier); + +int bL_switcher_unregister_notifier(struct notifier_block *nb) +{ + return blocking_notifier_chain_unregister(&bL_activation_notifier, nb); +} +EXPORT_SYMBOL_GPL(bL_switcher_unregister_notifier); + +static int bL_activation_notify(unsigned long val) +{ + int ret; + + ret = blocking_notifier_call_chain(&bL_activation_notifier, val, NULL); + if (ret & NOTIFY_STOP_MASK) + pr_err("%s: notifier chain failed with status 0x%x\n", + __func__, ret); + return notifier_to_errno(ret); +} + static void bL_switcher_restore_cpus(void) { int i; @@ -425,12 +450,13 @@ static int bL_switcher_enable(void) pr_info("big.LITTLE switcher initializing\n"); + ret = bL_activation_notify(BL_NOTIFY_PRE_ENABLE); + if (ret) + goto error; + ret = bL_switcher_halve_cpus(); - if (ret) { - cpu_hotplug_driver_unlock(); - mutex_unlock(&bL_switcher_activation_lock); - return ret; - } + if (ret) + goto error; for_each_online_cpu(cpu) { struct bL_thread *t = &bL_threads[cpu]; @@ -441,11 +467,18 @@ static int bL_switcher_enable(void) } bL_switcher_active = 1; + bL_activation_notify(BL_NOTIFY_POST_ENABLE); pr_info("big.LITTLE switcher initialized\n"); + goto out; + +error: + pr_warn("big.LITTLE switcher initialization failed\n"); + bL_activation_notify(BL_NOTIFY_POST_DISABLE); +out: cpu_hotplug_driver_unlock(); mutex_unlock(&bL_switcher_activation_lock); - return 0; + return ret; } #ifdef CONFIG_SYSFS @@ -458,11 +491,15 @@ static void bL_switcher_disable(void) mutex_lock(&bL_switcher_activation_lock); cpu_hotplug_driver_lock(); - if (!bL_switcher_active) { - cpu_hotplug_driver_unlock(); - mutex_unlock(&bL_switcher_activation_lock); - return; + + if (!bL_switcher_active) + goto out; + + if (bL_activation_notify(BL_NOTIFY_PRE_DISABLE) != 0) { + bL_activation_notify(BL_NOTIFY_POST_ENABLE); + goto out; } + bL_switcher_active = 0; /* @@ -504,6 +541,9 @@ static void bL_switcher_disable(void) } bL_switcher_restore_cpus(); + bL_activation_notify(BL_NOTIFY_POST_DISABLE); + +out: cpu_hotplug_driver_unlock(); mutex_unlock(&bL_switcher_activation_lock); } diff --git a/arch/arm/include/asm/bL_switcher.h b/arch/arm/include/asm/bL_switcher.h index 05d7c4cb9473..b243ca93e8e9 100644 --- a/arch/arm/include/asm/bL_switcher.h +++ b/arch/arm/include/asm/bL_switcher.h @@ -12,9 +12,53 @@ #ifndef ASM_BL_SWITCHER_H #define ASM_BL_SWITCHER_H +#include +#include + int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id); +/* + * Register here to be notified about runtime enabling/disabling of + * the switcher. + * + * The notifier chain is called with the switcher activation lock held: + * the switcher will not be enabled or disabled during callbacks. + * Callbacks must not call bL_switcher_{get,put}_enabled(). + */ +#define BL_NOTIFY_PRE_ENABLE 0 +#define BL_NOTIFY_POST_ENABLE 1 +#define BL_NOTIFY_PRE_DISABLE 2 +#define BL_NOTIFY_POST_DISABLE 3 + +#ifdef CONFIG_BL_SWITCHER + +int bL_switcher_register_notifier(struct notifier_block *nb); +int bL_switcher_unregister_notifier(struct notifier_block *nb); + +/* + * Use these functions to temporarily prevent enabling/disabling of + * the switcher. + * bL_switcher_get_enabled() returns true if the switcher is currently + * enabled. Each call to bL_switcher_get_enabled() must be followed + * by a call to bL_switcher_put_enabled(). These functions are not + * recursive. + */ bool bL_switcher_get_enabled(void); void bL_switcher_put_enabled(void); +#else +static inline int bL_switcher_register_notifier(struct notifier_block *nb) +{ + return 0; +} + +static inline int bL_switcher_unregister_notifier(struct notifier_block *nb) +{ + return 0; +} + +static inline bool bL_switcher_get_enabled(void) { return false; } +static inline void bL_switcher_put_enabled(void) { } +#endif /* CONFIG_BL_SWITCHER */ + #endif -- cgit v1.2.3 From 0577fee283fb385afbcdb78d1f4c398d7326b68f Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Wed, 22 May 2013 19:08:16 +0100 Subject: ARM: bL_switcher: Add switch completion callback for bL_switch_request() There is no explicit way to know when a switch started via bL_switch_request() is complete. This can lead to unpredictable behaviour when the switcher is controlled by a subsystem which makes dynamic decisions (such as cpufreq). The CPU PM notifier is not really suitable for signalling completion, because the CPU could get suspended and resumed for other, independent reasons while a switch request is in flight. Adding a whole new notifier for this seems excessive, and may tempt people to put heavyweight code on this path. This patch implements a new bL_switch_request_cb() function that allows for a per-request lightweight callback, private between the switcher and the caller of bL_switch_request_cb(). Overlapping switches on a single CPU are considered incorrect if they are requested via bL_switch_request_cb() with a callback (they will lead to an unpredictable final state without explicit external synchronisation to force the requests into a particular order). Queuing requests robustly would be overkill because only one subsystem should be attempting to control the switcher at any time. Overlapping requests of this kind will be failed with -EBUSY to indicate that the second request won't take effect and the completer will never be called for it. bL_switch_request() is retained as a wrapper round the new function, with the old, fire-and-forget semantics. In this case the last request will always win. The request may still be denied if a previous request with a completer is still pending. Signed-off-by: Dave Martin Signed-off-by: Nicolas Pitre --- arch/arm/common/bL_switcher.c | 53 ++++++++++++++++++++++++++++++++++---- arch/arm/include/asm/bL_switcher.h | 10 ++++++- 2 files changed, 57 insertions(+), 6 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c index 016488730cb7..34316be404d5 100644 --- a/arch/arm/common/bL_switcher.c +++ b/arch/arm/common/bL_switcher.c @@ -9,6 +9,7 @@ * published by the Free Software Foundation. */ +#include #include #include #include @@ -25,6 +26,7 @@ #include #include #include +#include #include #include #include @@ -224,10 +226,13 @@ static int bL_switch_to(unsigned int new_cluster_id) } struct bL_thread { + spinlock_t lock; struct task_struct *task; wait_queue_head_t wq; int wanted_cluster; struct completion started; + bL_switch_completion_handler completer; + void *completer_cookie; }; static struct bL_thread bL_threads[NR_CPUS]; @@ -237,6 +242,8 @@ static int bL_switcher_thread(void *arg) struct bL_thread *t = arg; struct sched_param param = { .sched_priority = 1 }; int cluster; + bL_switch_completion_handler completer; + void *completer_cookie; sched_setscheduler_nocheck(current, SCHED_FIFO, ¶m); complete(&t->started); @@ -247,9 +254,21 @@ static int bL_switcher_thread(void *arg) wait_event_interruptible(t->wq, t->wanted_cluster != -1 || kthread_should_stop()); - cluster = xchg(&t->wanted_cluster, -1); - if (cluster != -1) + + spin_lock(&t->lock); + cluster = t->wanted_cluster; + completer = t->completer; + completer_cookie = t->completer_cookie; + t->wanted_cluster = -1; + t->completer = NULL; + spin_unlock(&t->lock); + + if (cluster != -1) { bL_switch_to(cluster); + + if (completer) + completer(completer_cookie); + } } while (!kthread_should_stop()); return 0; @@ -270,16 +289,30 @@ static struct task_struct *bL_switcher_thread_create(int cpu, void *arg) } /* - * bL_switch_request - Switch to a specific cluster for the given CPU + * bL_switch_request_cb - Switch to a specific cluster for the given CPU, + * with completion notification via a callback * * @cpu: the CPU to switch * @new_cluster_id: the ID of the cluster to switch to. + * @completer: switch completion callback. if non-NULL, + * @completer(@completer_cookie) will be called on completion of + * the switch, in non-atomic context. + * @completer_cookie: opaque context argument for @completer. * * This function causes a cluster switch on the given CPU by waking up * the appropriate switcher thread. This function may or may not return * before the switch has occurred. + * + * If a @completer callback function is supplied, it will be called when + * the switch is complete. This can be used to determine asynchronously + * when the switch is complete, regardless of when bL_switch_request() + * returns. When @completer is supplied, no new switch request is permitted + * for the affected CPU until after the switch is complete, and @completer + * has returned. */ -int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id) +int bL_switch_request_cb(unsigned int cpu, unsigned int new_cluster_id, + bL_switch_completion_handler completer, + void *completer_cookie) { struct bL_thread *t; @@ -289,16 +322,25 @@ int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id) } t = &bL_threads[cpu]; + if (IS_ERR(t->task)) return PTR_ERR(t->task); if (!t->task) return -ESRCH; + spin_lock(&t->lock); + if (t->completer) { + spin_unlock(&t->lock); + return -EBUSY; + } + t->completer = completer; + t->completer_cookie = completer_cookie; t->wanted_cluster = new_cluster_id; + spin_unlock(&t->lock); wake_up(&t->wq); return 0; } -EXPORT_SYMBOL_GPL(bL_switch_request); +EXPORT_SYMBOL_GPL(bL_switch_request_cb); /* * Activation and configuration code. @@ -460,6 +502,7 @@ static int bL_switcher_enable(void) for_each_online_cpu(cpu) { struct bL_thread *t = &bL_threads[cpu]; + spin_lock_init(&t->lock); init_waitqueue_head(&t->wq); init_completion(&t->started); t->wanted_cluster = -1; diff --git a/arch/arm/include/asm/bL_switcher.h b/arch/arm/include/asm/bL_switcher.h index b243ca93e8e9..7d1cce8b8a0d 100644 --- a/arch/arm/include/asm/bL_switcher.h +++ b/arch/arm/include/asm/bL_switcher.h @@ -15,7 +15,15 @@ #include #include -int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id); +typedef void (*bL_switch_completion_handler)(void *cookie); + +int bL_switch_request_cb(unsigned int cpu, unsigned int new_cluster_id, + bL_switch_completion_handler completer, + void *completer_cookie); +static inline int bL_switch_request(unsigned int cpu, unsigned int new_cluster_id) +{ + return bL_switch_request_cb(cpu, new_cluster_id, NULL, NULL); +} /* * Register here to be notified about runtime enabling/disabling of -- cgit v1.2.3 From 5135d875e1457ef946a055003d8f80713e862135 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Tue, 27 Nov 2012 21:54:41 -0500 Subject: ARM: SMP: basic IPI triggered completion support We need a mechanism to let an inbound CPU signal that it is alive before even getting into the kernel environment i.e. from early assembly code. Using an IPI is the simplest way to achieve that. This adds some basic infrastructure to register a struct completion pointer to be "completed" when the dedicated IPI for this task is received. Signed-off-by: Nicolas Pitre --- arch/arm/include/asm/hardirq.h | 2 +- arch/arm/include/asm/smp.h | 2 ++ arch/arm/kernel/smp.c | 21 +++++++++++++++++++++ 3 files changed, 24 insertions(+), 1 deletion(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/hardirq.h b/arch/arm/include/asm/hardirq.h index 2740c2a2df63..3d7351c844aa 100644 --- a/arch/arm/include/asm/hardirq.h +++ b/arch/arm/include/asm/hardirq.h @@ -5,7 +5,7 @@ #include #include -#define NR_IPI 6 +#define NR_IPI 7 typedef struct { unsigned int __softirq_pending; diff --git a/arch/arm/include/asm/smp.h b/arch/arm/include/asm/smp.h index a8cae71caceb..22a3b9b5d4a1 100644 --- a/arch/arm/include/asm/smp.h +++ b/arch/arm/include/asm/smp.h @@ -84,6 +84,8 @@ extern void arch_send_call_function_single_ipi(int cpu); extern void arch_send_call_function_ipi_mask(const struct cpumask *mask); extern void arch_send_wakeup_ipi_mask(const struct cpumask *mask); +extern int register_ipi_completion(struct completion *completion, int cpu); + struct smp_operations { #ifdef CONFIG_SMP /* diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 72024ea8a3a6..7d80a549cae5 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -66,6 +66,7 @@ enum ipi_msg_type { IPI_CALL_FUNC, IPI_CALL_FUNC_SINGLE, IPI_CPU_STOP, + IPI_COMPLETION, }; static DECLARE_COMPLETION(cpu_running); @@ -456,6 +457,7 @@ static const char *ipi_types[NR_IPI] = { S(IPI_CALL_FUNC, "Function call interrupts"), S(IPI_CALL_FUNC_SINGLE, "Single function call interrupts"), S(IPI_CPU_STOP, "CPU stop interrupts"), + S(IPI_COMPLETION, "completion interrupts"), }; void show_ipi_list(struct seq_file *p, int prec) @@ -515,6 +517,19 @@ static void ipi_cpu_stop(unsigned int cpu) cpu_relax(); } +static DEFINE_PER_CPU(struct completion *, cpu_completion); + +int register_ipi_completion(struct completion *completion, int cpu) +{ + per_cpu(cpu_completion, cpu) = completion; + return IPI_COMPLETION; +} + +static void ipi_complete(unsigned int cpu) +{ + complete(per_cpu(cpu_completion, cpu)); +} + /* * Main handler for inter-processor interrupts */ @@ -565,6 +580,12 @@ void handle_IPI(int ipinr, struct pt_regs *regs) irq_exit(); break; + case IPI_COMPLETION: + irq_enter(); + ipi_complete(cpu); + irq_exit(); + break; + default: printk(KERN_CRIT "CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr); -- cgit v1.2.3 From de885d147ad2c4a66777e3557440247efde1cc8d Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Tue, 27 Nov 2012 23:11:20 -0500 Subject: ARM: mcpm: add a simple poke mechanism to the early entry code This allows to poke a predetermined value into a specific address upon entering the early boot code in bL_head.S. Signed-off-by: Nicolas Pitre --- arch/arm/common/mcpm_entry.c | 12 ++++++++++++ arch/arm/common/mcpm_head.S | 16 ++++++++++++++-- arch/arm/include/asm/mcpm.h | 8 ++++++++ 3 files changed, 34 insertions(+), 2 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c index 370236dd1a03..4a2b32fd53a1 100644 --- a/arch/arm/common/mcpm_entry.c +++ b/arch/arm/common/mcpm_entry.c @@ -27,6 +27,18 @@ void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr) sync_cache_w(&mcpm_entry_vectors[cluster][cpu]); } +extern unsigned long mcpm_entry_early_pokes[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER][2]; + +void mcpm_set_early_poke(unsigned cpu, unsigned cluster, + unsigned long poke_phys_addr, unsigned long poke_val) +{ + unsigned long *poke = &mcpm_entry_early_pokes[cluster][cpu][0]; + poke[0] = poke_phys_addr; + poke[1] = poke_val; + __cpuc_flush_dcache_area((void *)poke, 8); + outer_clean_range(__pa(poke), __pa(poke + 2)); +} + static const struct mcpm_platform_ops *platform_ops; int __init mcpm_platform_register(const struct mcpm_platform_ops *ops) diff --git a/arch/arm/common/mcpm_head.S b/arch/arm/common/mcpm_head.S index 39c96df3477a..49dd5352fe70 100644 --- a/arch/arm/common/mcpm_head.S +++ b/arch/arm/common/mcpm_head.S @@ -71,12 +71,19 @@ ENTRY(mcpm_entry_point) * position independent way. */ adr r5, 3f - ldmia r5, {r6, r7, r8, r11} + ldmia r5, {r0, r6, r7, r8, r11} + add r0, r5, r0 @ r0 = mcpm_entry_early_pokes add r6, r5, r6 @ r6 = mcpm_entry_vectors ldr r7, [r5, r7] @ r7 = mcpm_power_up_setup_phys add r8, r5, r8 @ r8 = mcpm_sync add r11, r5, r11 @ r11 = first_man_locks + @ Perform an early poke, if any + add r0, r0, r4, lsl #3 + ldmia r0, {r0, r1} + teq r0, #0 + strne r1, [r0] + mov r0, #MCPM_SYNC_CLUSTER_SIZE mla r8, r0, r10, r8 @ r8 = sync cluster base @@ -195,7 +202,8 @@ mcpm_entry_gated: .align 2 -3: .word mcpm_entry_vectors - . +3: .word mcpm_entry_early_pokes - . + .word mcpm_entry_vectors - 3b .word mcpm_power_up_setup_phys - 3b .word mcpm_sync - 3b .word first_man_locks - 3b @@ -214,6 +222,10 @@ first_man_locks: ENTRY(mcpm_entry_vectors) .space 4 * MAX_NR_CLUSTERS * MAX_CPUS_PER_CLUSTER + .type mcpm_entry_early_pokes, #object +ENTRY(mcpm_entry_early_pokes) + .space 8 * MAX_NR_CLUSTERS * MAX_CPUS_PER_CLUSTER + .type mcpm_power_up_setup_phys, #object ENTRY(mcpm_power_up_setup_phys) .space 4 @ set by mcpm_sync_init() diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h index 0f7b7620e9a5..7626a7fd4938 100644 --- a/arch/arm/include/asm/mcpm.h +++ b/arch/arm/include/asm/mcpm.h @@ -41,6 +41,14 @@ extern void mcpm_entry_point(void); */ void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr); +/* + * This sets an early poke i.e a value to be poked into some address + * from very early assembly code before the CPU is ungated. The + * address must be physical, and if 0 then nothing will happen. + */ +void mcpm_set_early_poke(unsigned cpu, unsigned cluster, + unsigned long poke_phys_addr, unsigned long poke_val); + /* * CPU/cluster power operations API for higher subsystems to use. */ -- cgit v1.2.3 From 29064b88466ee725613db16d8c05b0ec5443a309 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Mon, 11 Feb 2013 14:39:19 +0000 Subject: ARM: bL_switcher/trace: Add kernel trace trigger interface This patch exports a bL_switcher_trace_trigger() function to provide a means for drivers using the trace events to get the current status when starting a trace session. Calling this function is equivalent to pinging the trace_trigger file in sysfs. Signed-off-by: Dave Martin --- arch/arm/common/bL_switcher.c | 3 ++- arch/arm/include/asm/bL_switcher.h | 3 +++ 2 files changed, 5 insertions(+), 1 deletion(-) (limited to 'arch/arm/include') diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c index f0dc025077d5..f4878a36047a 100644 --- a/arch/arm/common/bL_switcher.c +++ b/arch/arm/common/bL_switcher.c @@ -537,7 +537,7 @@ static void bL_switcher_trace_trigger_cpu(void *__always_unused info) trace_cpu_migrate_current(get_ns(), read_mpidr()); } -static int bL_switcher_trace_trigger(void) +int bL_switcher_trace_trigger(void) { int ret; @@ -550,6 +550,7 @@ static int bL_switcher_trace_trigger(void) return ret; } +EXPORT_SYMBOL_GPL(bL_switcher_trace_trigger); static int bL_switcher_enable(void) { diff --git a/arch/arm/include/asm/bL_switcher.h b/arch/arm/include/asm/bL_switcher.h index 7d1cce8b8a0d..8ada5a885c70 100644 --- a/arch/arm/include/asm/bL_switcher.h +++ b/arch/arm/include/asm/bL_switcher.h @@ -54,6 +54,8 @@ int bL_switcher_unregister_notifier(struct notifier_block *nb); bool bL_switcher_get_enabled(void); void bL_switcher_put_enabled(void); +int bL_switcher_trace_trigger(void); + #else static inline int bL_switcher_register_notifier(struct notifier_block *nb) { @@ -67,6 +69,7 @@ static inline int bL_switcher_unregister_notifier(struct notifier_block *nb) static inline bool bL_switcher_get_enabled(void) { return false; } static inline void bL_switcher_put_enabled(void) { } +static inline int bL_switcher_trace_trigger(void) { return 0; } #endif /* CONFIG_BL_SWITCHER */ #endif -- cgit v1.2.3 From d08e2e09042bd3f7ef66a35cb4bb92794ab26bb2 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Wed, 13 Feb 2013 16:20:44 +0000 Subject: ARM: bL_switcher: Add query interface to discover CPU affinities When the switcher is active, there is no straightforward way to figure out which logical CPU a given physical CPU maps to. This patch provides a function bL_switcher_get_logical_index(mpidr), which is analogous to get_logical_index(). This function returns the logical CPU on which the specified physical CPU is grouped (or -EINVAL if unknown). If the switcher is inactive or not present, -EUNATCH is returned instead. Signed-off-by: Dave Martin Signed-off-by: Nicolas Pitre --- arch/arm/common/bL_switcher.c | 20 ++++++++++++++++++++ arch/arm/include/asm/bL_switcher.h | 2 ++ 2 files changed, 22 insertions(+) (limited to 'arch/arm/include') diff --git a/arch/arm/common/bL_switcher.c b/arch/arm/common/bL_switcher.c index f4878a36047a..63bbc4f70564 100644 --- a/arch/arm/common/bL_switcher.c +++ b/arch/arm/common/bL_switcher.c @@ -532,6 +532,26 @@ static int bL_switcher_halve_cpus(void) return 0; } +/* Determine the logical CPU a given physical CPU is grouped on. */ +int bL_switcher_get_logical_index(u32 mpidr) +{ + int cpu; + + if (!bL_switcher_active) + return -EUNATCH; + + mpidr &= MPIDR_HWID_BITMASK; + for_each_online_cpu(cpu) { + int pairing = bL_switcher_cpu_pairing[cpu]; + if (pairing == -1) + continue; + if ((mpidr == cpu_logical_map(cpu)) || + (mpidr == cpu_logical_map(pairing))) + return cpu; + } + return -EINVAL; +} + static void bL_switcher_trace_trigger_cpu(void *__always_unused info) { trace_cpu_migrate_current(get_ns(), read_mpidr()); diff --git a/arch/arm/include/asm/bL_switcher.h b/arch/arm/include/asm/bL_switcher.h index 8ada5a885c70..1714800fa113 100644 --- a/arch/arm/include/asm/bL_switcher.h +++ b/arch/arm/include/asm/bL_switcher.h @@ -55,6 +55,7 @@ bool bL_switcher_get_enabled(void); void bL_switcher_put_enabled(void); int bL_switcher_trace_trigger(void); +int bL_switcher_get_logical_index(u32 mpidr); #else static inline int bL_switcher_register_notifier(struct notifier_block *nb) @@ -70,6 +71,7 @@ static inline int bL_switcher_unregister_notifier(struct notifier_block *nb) static inline bool bL_switcher_get_enabled(void) { return false; } static inline void bL_switcher_put_enabled(void) { } static inline int bL_switcher_trace_trigger(void) { return 0; } +static inline int bL_switcher_get_logical_index(u32 mpidr) { return -EUNATCH; } #endif /* CONFIG_BL_SWITCHER */ #endif -- cgit v1.2.3 From 49863894db3ed7bd41541b1c17733273966cea71 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Thu, 26 Sep 2013 12:36:35 +0100 Subject: ARM: perf: add support for perf registers API This patch implements the functions required for the perf registers API, allowing the perf tool to interface kernel register dumps with libunwind in order to provide userspace backtracing. Cc: Jean Pihet Signed-off-by: Will Deacon --- arch/arm/Kconfig | 2 ++ arch/arm/include/uapi/asm/Kbuild | 1 + arch/arm/include/uapi/asm/perf_regs.h | 23 +++++++++++++++++++++++ arch/arm/kernel/Makefile | 1 + arch/arm/kernel/perf_regs.c | 30 ++++++++++++++++++++++++++++++ 5 files changed, 57 insertions(+) create mode 100644 arch/arm/include/uapi/asm/perf_regs.h create mode 100644 arch/arm/kernel/perf_regs.c (limited to 'arch/arm/include') diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 1ad6fb6c094d..899d0c6ec2c5 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -51,6 +51,8 @@ config ARM select HAVE_MOD_ARCH_SPECIFIC if ARM_UNWIND select HAVE_OPROFILE if (HAVE_PERF_EVENTS) select HAVE_PERF_EVENTS + select HAVE_PERF_REGS + select HAVE_PERF_USER_STACK_DUMP select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_SYSCALL_TRACEPOINTS select HAVE_UID16 diff --git a/arch/arm/include/uapi/asm/Kbuild b/arch/arm/include/uapi/asm/Kbuild index 18d76fd5a2af..70a1c9da30ca 100644 --- a/arch/arm/include/uapi/asm/Kbuild +++ b/arch/arm/include/uapi/asm/Kbuild @@ -7,6 +7,7 @@ header-y += hwcap.h header-y += ioctls.h header-y += kvm_para.h header-y += mman.h +header-y += perf_regs.h header-y += posix_types.h header-y += ptrace.h header-y += setup.h diff --git a/arch/arm/include/uapi/asm/perf_regs.h b/arch/arm/include/uapi/asm/perf_regs.h new file mode 100644 index 000000000000..ce59448458b2 --- /dev/null +++ b/arch/arm/include/uapi/asm/perf_regs.h @@ -0,0 +1,23 @@ +#ifndef _ASM_ARM_PERF_REGS_H +#define _ASM_ARM_PERF_REGS_H + +enum perf_event_arm_regs { + PERF_REG_ARM_R0, + PERF_REG_ARM_R1, + PERF_REG_ARM_R2, + PERF_REG_ARM_R3, + PERF_REG_ARM_R4, + PERF_REG_ARM_R5, + PERF_REG_ARM_R6, + PERF_REG_ARM_R7, + PERF_REG_ARM_R8, + PERF_REG_ARM_R9, + PERF_REG_ARM_R10, + PERF_REG_ARM_FP, + PERF_REG_ARM_IP, + PERF_REG_ARM_SP, + PERF_REG_ARM_LR, + PERF_REG_ARM_PC, + PERF_REG_ARM_MAX, +}; +#endif /* _ASM_ARM_PERF_REGS_H */ diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile index 5140df5f23aa..9b818ca3610b 100644 --- a/arch/arm/kernel/Makefile +++ b/arch/arm/kernel/Makefile @@ -78,6 +78,7 @@ obj-$(CONFIG_CPU_XSC3) += xscale-cp0.o obj-$(CONFIG_CPU_MOHAWK) += xscale-cp0.o obj-$(CONFIG_CPU_PJ4) += pj4-cp0.o obj-$(CONFIG_IWMMXT) += iwmmxt.o +obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o perf_event_cpu.o AFLAGS_iwmmxt.o := -Wa,-mcpu=iwmmxt obj-$(CONFIG_ARM_CPU_TOPOLOGY) += topology.o diff --git a/arch/arm/kernel/perf_regs.c b/arch/arm/kernel/perf_regs.c new file mode 100644 index 000000000000..6e4379c67cbc --- /dev/null +++ b/arch/arm/kernel/perf_regs.c @@ -0,0 +1,30 @@ + +#include +#include +#include +#include +#include +#include + +u64 perf_reg_value(struct pt_regs *regs, int idx) +{ + if (WARN_ON_ONCE((u32)idx >= PERF_REG_ARM_MAX)) + return 0; + + return regs->uregs[idx]; +} + +#define REG_RESERVED (~((1ULL << PERF_REG_ARM_MAX) - 1)) + +int perf_reg_validate(u64 mask) +{ + if (!mask || mask & REG_RESERVED) + return -EINVAL; + + return 0; +} + +u64 perf_reg_abi(struct task_struct *task) +{ + return PERF_SAMPLE_REGS_ABI_32; +} -- cgit v1.2.3 From e744dff72bcd4d9588af91e1feb662702222ca12 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 26 Jun 2013 16:47:35 +0100 Subject: ARM: prefetch: remove redundant "cc" clobber The pld instruction does not affect the condition flags, so don't bother clobbering them. Acked-by: Nicolas Pitre Signed-off-by: Will Deacon --- arch/arm/include/asm/processor.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 413f3876341c..514a989cbd4b 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -97,9 +97,7 @@ static inline void prefetch(const void *ptr) { __asm__ __volatile__( "pld\t%a0" - : - : "p" (ptr) - : "cc"); + :: "p" (ptr)); } #define ARCH_HAS_PREFETCHW -- cgit v1.2.3 From 27a84793e42084392181ef2ef51a954f1cf0c519 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 2 Jul 2013 12:10:42 +0100 Subject: ARM: smp_on_up: move inline asm ALT_SMP patching macro out of spinlock.h Patching UP/SMP alternatives inside inline assembly blocks is useful outside of the spinlock implementation, where it is used for sev and wfe. This patch lifts the macro into processor.h and gives it a scarier name to (a) avoid conflicts in the global namespace and (b) to try and deter its usage unless you "know what you're doing". The W macro for generating wide instructions when targetting Thumb-2 is also made available under the name WASM, to reduce the potential for conflicts with other headers. Acked-by: Nicolas Pitre Signed-off-by: Will Deacon --- arch/arm/include/asm/processor.h | 12 ++++++++++++ arch/arm/include/asm/spinlock.h | 15 ++++----------- arch/arm/include/asm/unified.h | 4 ++++ 3 files changed, 20 insertions(+), 11 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 514a989cbd4b..26164c92fa30 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -22,6 +22,7 @@ #include #include #include +#include #ifdef __KERNEL__ #define STACK_TOP ((current->personality & ADDR_LIMIT_32BIT) ? \ @@ -87,6 +88,17 @@ unsigned long get_wchan(struct task_struct *p); #define KSTK_EIP(tsk) task_pt_regs(tsk)->ARM_pc #define KSTK_ESP(tsk) task_pt_regs(tsk)->ARM_sp +#ifdef CONFIG_SMP +#define __ALT_SMP_ASM(smp, up) \ + "9998: " smp "\n" \ + " .pushsection \".alt.smp.init\", \"a\"\n" \ + " .long 9998b\n" \ + " " up "\n" \ + " .popsection\n" +#else +#define __ALT_SMP_ASM(smp, up) up +#endif + /* * Prefetching support - only ARMv5. */ diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h index 4f2c28060c9a..e1ce45230913 100644 --- a/arch/arm/include/asm/spinlock.h +++ b/arch/arm/include/asm/spinlock.h @@ -11,15 +11,7 @@ * sev and wfe are ARMv6K extensions. Uniprocessor ARMv6 may not have the K * extensions, so when running on UP, we have to patch these instructions away. */ -#define ALT_SMP(smp, up) \ - "9998: " smp "\n" \ - " .pushsection \".alt.smp.init\", \"a\"\n" \ - " .long 9998b\n" \ - " " up "\n" \ - " .popsection\n" - #ifdef CONFIG_THUMB2_KERNEL -#define SEV ALT_SMP("sev.w", "nop.w") /* * For Thumb-2, special care is needed to ensure that the conditional WFE * instruction really does assemble to exactly 4 bytes (as required by @@ -31,17 +23,18 @@ * the assembler won't change IT instructions which are explicitly present * in the input. */ -#define WFE(cond) ALT_SMP( \ +#define WFE(cond) __ALT_SMP_ASM( \ "it " cond "\n\t" \ "wfe" cond ".n", \ \ "nop.w" \ ) #else -#define SEV ALT_SMP("sev", "nop") -#define WFE(cond) ALT_SMP("wfe" cond, "nop") +#define WFE(cond) __ALT_SMP_ASM("wfe" cond, "nop") #endif +#define SEV __ALT_SMP_ASM(WASM(sev), WASM(nop)) + static inline void dsb_sev(void) { #if __LINUX_ARM_ARCH__ >= 7 diff --git a/arch/arm/include/asm/unified.h b/arch/arm/include/asm/unified.h index f5989f46b4d2..b88beaba6b4a 100644 --- a/arch/arm/include/asm/unified.h +++ b/arch/arm/include/asm/unified.h @@ -38,6 +38,8 @@ #ifdef __ASSEMBLY__ #define W(instr) instr.w #define BSYM(sym) sym + 1 +#else +#define WASM(instr) #instr ".w" #endif #else /* !CONFIG_THUMB2_KERNEL */ @@ -50,6 +52,8 @@ #ifdef __ASSEMBLY__ #define W(instr) instr #define BSYM(sym) sym +#else +#define WASM(instr) #instr #endif #endif /* CONFIG_THUMB2_KERNEL */ -- cgit v1.2.3 From d8f57aa4bc5860df68d4c332d2a89c131417ee7b Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 26 Jun 2013 17:03:40 +0100 Subject: ARM: prefetch: add support for prefetchw using pldw on SMP ARMv7+ CPUs SMP ARMv7 CPUs implement the pldw instruction, which allows them to prefetch data cachelines in an exclusive state. This patch defines the prefetchw macro using pldw for CPUs that support it. Acked-by: Nicolas Pitre Signed-off-by: Will Deacon --- arch/arm/include/asm/processor.h | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h index 26164c92fa30..c3d5fc124a05 100644 --- a/arch/arm/include/asm/processor.h +++ b/arch/arm/include/asm/processor.h @@ -112,12 +112,19 @@ static inline void prefetch(const void *ptr) :: "p" (ptr)); } +#if __LINUX_ARM_ARCH__ >= 7 && defined(CONFIG_SMP) #define ARCH_HAS_PREFETCHW -#define prefetchw(ptr) prefetch(ptr) - -#define ARCH_HAS_SPINLOCK_PREFETCH -#define spin_lock_prefetch(x) do { } while (0) - +static inline void prefetchw(const void *ptr) +{ + __asm__ __volatile__( + ".arch_extension mp\n" + __ALT_SMP_ASM( + WASM(pldw) "\t%a0", + WASM(pld) "\t%a0" + ) + :: "p" (ptr)); +} +#endif #endif #define HAVE_ARCH_PICK_MMAP_LAYOUT -- cgit v1.2.3 From 9bb17be062de6f5a9c9643258951aa0935652ec3 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 2 Jul 2013 14:54:33 +0100 Subject: ARM: locks: prefetch the destination word for write prior to strex The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch prefixes our {spin,read,write}_[try]lock implementations with pldw instructions (on CPUs which support them) to try and grab the line in exclusive state from the start. arch_rwlock_t is changed to avoid using a volatile member, since this generates compiler warnings when falling back on the __builtin_prefetch intrinsic which expects a const void * argument. Acked-by: Nicolas Pitre Signed-off-by: Will Deacon --- arch/arm/include/asm/spinlock.h | 13 ++++++++++--- arch/arm/include/asm/spinlock_types.h | 2 +- 2 files changed, 11 insertions(+), 4 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/spinlock.h b/arch/arm/include/asm/spinlock.h index e1ce45230913..499900781d59 100644 --- a/arch/arm/include/asm/spinlock.h +++ b/arch/arm/include/asm/spinlock.h @@ -5,7 +5,7 @@ #error SMP not supported on pre-ARMv6 CPUs #endif -#include +#include /* * sev and wfe are ARMv6K extensions. Uniprocessor ARMv6 may not have the K @@ -70,6 +70,7 @@ static inline void arch_spin_lock(arch_spinlock_t *lock) u32 newval; arch_spinlock_t lockval; + prefetchw(&lock->slock); __asm__ __volatile__( "1: ldrex %0, [%3]\n" " add %1, %0, %4\n" @@ -93,6 +94,7 @@ static inline int arch_spin_trylock(arch_spinlock_t *lock) unsigned long contended, res; u32 slock; + prefetchw(&lock->slock); do { __asm__ __volatile__( " ldrex %0, [%3]\n" @@ -145,6 +147,7 @@ static inline void arch_write_lock(arch_rwlock_t *rw) { unsigned long tmp; + prefetchw(&rw->lock); __asm__ __volatile__( "1: ldrex %0, [%1]\n" " teq %0, #0\n" @@ -163,6 +166,7 @@ static inline int arch_write_trylock(arch_rwlock_t *rw) { unsigned long contended, res; + prefetchw(&rw->lock); do { __asm__ __volatile__( " ldrex %0, [%2]\n" @@ -196,7 +200,7 @@ static inline void arch_write_unlock(arch_rwlock_t *rw) } /* write_can_lock - would write_trylock() succeed? */ -#define arch_write_can_lock(x) ((x)->lock == 0) +#define arch_write_can_lock(x) (ACCESS_ONCE((x)->lock) == 0) /* * Read locks are a bit more hairy: @@ -214,6 +218,7 @@ static inline void arch_read_lock(arch_rwlock_t *rw) { unsigned long tmp, tmp2; + prefetchw(&rw->lock); __asm__ __volatile__( "1: ldrex %0, [%2]\n" " adds %0, %0, #1\n" @@ -234,6 +239,7 @@ static inline void arch_read_unlock(arch_rwlock_t *rw) smp_mb(); + prefetchw(&rw->lock); __asm__ __volatile__( "1: ldrex %0, [%2]\n" " sub %0, %0, #1\n" @@ -252,6 +258,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw) { unsigned long contended, res; + prefetchw(&rw->lock); do { __asm__ __volatile__( " ldrex %0, [%2]\n" @@ -273,7 +280,7 @@ static inline int arch_read_trylock(arch_rwlock_t *rw) } /* read_can_lock - would read_trylock() succeed? */ -#define arch_read_can_lock(x) ((x)->lock < 0x80000000) +#define arch_read_can_lock(x) (ACCESS_ONCE((x)->lock) < 0x80000000) #define arch_read_lock_flags(lock, flags) arch_read_lock(lock) #define arch_write_lock_flags(lock, flags) arch_write_lock(lock) diff --git a/arch/arm/include/asm/spinlock_types.h b/arch/arm/include/asm/spinlock_types.h index b262d2f8b478..47663fcb10ad 100644 --- a/arch/arm/include/asm/spinlock_types.h +++ b/arch/arm/include/asm/spinlock_types.h @@ -25,7 +25,7 @@ typedef struct { #define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } } typedef struct { - volatile unsigned int lock; + u32 lock; } arch_rwlock_t; #define __ARCH_RW_LOCK_UNLOCKED { 0 } -- cgit v1.2.3 From f38d999c4d16fc0fce4270374f15fbb2d8713c09 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Thu, 4 Jul 2013 11:43:18 +0100 Subject: ARM: atomics: prefetch the destination word for write prior to strex The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch prefixes our atomic access implementations with pldw instructions (on CPUs which support them) to try and grab the line in exclusive state from the start. Only the barrier-less functions are updated, since memory barriers can limit the usefulness of prefetching data. Acked-by: Nicolas Pitre Signed-off-by: Will Deacon --- arch/arm/include/asm/atomic.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h index da1c77d39327..55ffc3b850f4 100644 --- a/arch/arm/include/asm/atomic.h +++ b/arch/arm/include/asm/atomic.h @@ -12,6 +12,7 @@ #define __ASM_ARM_ATOMIC_H #include +#include #include #include #include @@ -41,6 +42,7 @@ static inline void atomic_add(int i, atomic_t *v) unsigned long tmp; int result; + prefetchw(&v->counter); __asm__ __volatile__("@ atomic_add\n" "1: ldrex %0, [%3]\n" " add %0, %0, %4\n" @@ -79,6 +81,7 @@ static inline void atomic_sub(int i, atomic_t *v) unsigned long tmp; int result; + prefetchw(&v->counter); __asm__ __volatile__("@ atomic_sub\n" "1: ldrex %0, [%3]\n" " sub %0, %0, %4\n" @@ -138,6 +141,7 @@ static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr) { unsigned long tmp, tmp2; + prefetchw(addr); __asm__ __volatile__("@ atomic_clear_mask\n" "1: ldrex %0, [%3]\n" " bic %0, %0, %4\n" @@ -283,6 +287,7 @@ static inline void atomic64_set(atomic64_t *v, u64 i) { u64 tmp; + prefetchw(&v->counter); __asm__ __volatile__("@ atomic64_set\n" "1: ldrexd %0, %H0, [%2]\n" " strexd %0, %3, %H3, [%2]\n" @@ -299,6 +304,7 @@ static inline void atomic64_add(u64 i, atomic64_t *v) u64 result; unsigned long tmp; + prefetchw(&v->counter); __asm__ __volatile__("@ atomic64_add\n" "1: ldrexd %0, %H0, [%3]\n" " adds %0, %0, %4\n" @@ -339,6 +345,7 @@ static inline void atomic64_sub(u64 i, atomic64_t *v) u64 result; unsigned long tmp; + prefetchw(&v->counter); __asm__ __volatile__("@ atomic64_sub\n" "1: ldrexd %0, %H0, [%3]\n" " subs %0, %0, %4\n" -- cgit v1.2.3 From cf154b7e22e9f6714f0a806c023c329b890132a2 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 20 Sep 2013 09:57:37 +0200 Subject: ARM: pull in from asm-generic Signed-off-by: Ard Biesheuvel --- arch/arm/include/asm/Kbuild | 1 + 1 file changed, 1 insertion(+) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index d3db39860b9c..6577b8aeb711 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -24,6 +24,7 @@ generic-y += sembuf.h generic-y += serial.h generic-y += shmbuf.h generic-y += siginfo.h +generic-y += simd.h generic-y += sizes.h generic-y += socket.h generic-y += sockios.h -- cgit v1.2.3 From ca5a45c06cd4764fb8510740f7fc550d9a0208d4 Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Wed, 31 Jul 2013 12:44:41 -0400 Subject: ARM: mm: use phys_addr_t appropriately in p2v and v2p conversions Fix remainder types used when converting back and forth between physical and virtual addresses. Cc: Russell King Acked-by: Nicolas Pitre Signed-off-by: Santosh Shilimkar --- arch/arm/include/asm/memory.h | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index e750a938fd3c..c133bd915f48 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -185,22 +185,32 @@ extern unsigned long __pv_phys_offset; : "=r" (to) \ : "r" (from), "I" (type)) -static inline unsigned long __virt_to_phys(unsigned long x) +static inline phys_addr_t __virt_to_phys(unsigned long x) { unsigned long t; __pv_stub(x, t, "add", __PV_BITS_31_24); return t; } -static inline unsigned long __phys_to_virt(unsigned long x) +static inline unsigned long __phys_to_virt(phys_addr_t x) { unsigned long t; __pv_stub(x, t, "sub", __PV_BITS_31_24); return t; } + #else -#define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET) -#define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET) + +static inline phys_addr_t __virt_to_phys(unsigned long x) +{ + return (phys_addr_t)x - PAGE_OFFSET + PHYS_OFFSET; +} + +static inline unsigned long __phys_to_virt(phys_addr_t x) +{ + return x - PHYS_OFFSET + PAGE_OFFSET; +} + #endif #endif #endif /* __ASSEMBLY__ */ @@ -238,14 +248,14 @@ static inline phys_addr_t virt_to_phys(const volatile void *x) static inline void *phys_to_virt(phys_addr_t x) { - return (void *)(__phys_to_virt((unsigned long)(x))); + return (void *)__phys_to_virt(x); } /* * Drivers should NOT use these either. */ #define __pa(x) __virt_to_phys((unsigned long)(x)) -#define __va(x) ((void *)__phys_to_virt((unsigned long)(x))) +#define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x))) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) /* -- cgit v1.2.3 From 4dc9a81715973cb137a14399420bb35b0ed7d6ef Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Wed, 31 Jul 2013 12:44:42 -0400 Subject: ARM: mm: Introduce virt_to_idmap() with an arch hook On some PAE systems (e.g. TI Keystone), memory is above the 32-bit addressable limit, and the interconnect provides an aliased view of parts of physical memory in the 32-bit addressable space. This alias is strictly for boot time usage, and is not otherwise usable because of coherency limitations. On such systems, the idmap mechanism needs to take this aliased mapping into account. This patch introduces virt_to_idmap() and a arch function pointer which can be populated by platform which needs it. Also populate necessary idmap spots with now available virt_to_idmap(). Avoided #ifdef approach to be compatible with multi-platform builds. Most architecture won't touch it and in that case virt_to_idmap() fall-back to existing virt_to_phys() macro. Cc: Russell King Acked-by: Nicolas Pitre Signed-off-by: Santosh Shilimkar --- arch/arm/include/asm/memory.h | 16 ++++++++++++++++ arch/arm/kernel/smp.c | 2 +- arch/arm/mm/idmap.c | 5 +++-- 3 files changed, 20 insertions(+), 3 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index c133bd915f48..d9b96c65e594 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -173,6 +173,7 @@ */ #define __PV_BITS_31_24 0x81000000 +extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x); extern unsigned long __pv_phys_offset; #define PHYS_OFFSET __pv_phys_offset @@ -258,6 +259,21 @@ static inline void *phys_to_virt(phys_addr_t x) #define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x))) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) +/* + * These are for systems that have a hardware interconnect supported alias of + * physical memory for idmap purposes. Most cases should leave these + * untouched. + */ +static inline phys_addr_t __virt_to_idmap(unsigned long x) +{ + if (arch_virt_to_idmap) + return arch_virt_to_idmap(x); + else + return __virt_to_phys(x); +} + +#define virt_to_idmap(x) __virt_to_idmap((unsigned long)(x)) + /* * Virtual <-> DMA view memory address translations * Again, these are *only* valid on the kernel direct mapped RAM diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 72024ea8a3a6..a0eb830c3daf 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -80,7 +80,7 @@ void __init smp_set_ops(struct smp_operations *ops) static unsigned long get_arch_pgd(pgd_t *pgd) { - phys_addr_t pgdir = virt_to_phys(pgd); + phys_addr_t pgdir = virt_to_idmap(pgd); BUG_ON(pgdir & ARCH_PGD_MASK); return pgdir >> ARCH_PGD_SHIFT; } diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c index 83cb3ac27095..c0a1e48f6733 100644 --- a/arch/arm/mm/idmap.c +++ b/arch/arm/mm/idmap.c @@ -10,6 +10,7 @@ #include pgd_t *idmap_pgd; +phys_addr_t (*arch_virt_to_idmap) (unsigned long x); #ifdef CONFIG_ARM_LPAE static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end, @@ -67,8 +68,8 @@ static void identity_mapping_add(pgd_t *pgd, const char *text_start, unsigned long addr, end; unsigned long next; - addr = virt_to_phys(text_start); - end = virt_to_phys(text_end); + addr = virt_to_idmap(text_start); + end = virt_to_idmap(text_end); prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF; -- cgit v1.2.3 From f52bb722547f43caeaecbcc62db9f3c3b80ead9b Mon Sep 17 00:00:00 2001 From: Sricharan R Date: Mon, 29 Jul 2013 20:26:22 +0530 Subject: ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses The current phys_to_virt patching mechanism works only for 32 bit physical addresses and this patch extends the idea for 64bit physical addresses. The 64bit v2p patching mechanism patches the higher 8 bits of physical address with a constant using 'mov' instruction and lower 32bits are patched using 'add'. While this is correct, in those platforms where the lowmem addressable physical memory spawns across 4GB boundary, a carry bit can be produced as a result of addition of lower 32bits. This has to be taken in to account and added in to the upper. The patched __pv_offset and va are added in lower 32bits, where __pv_offset can be in two's complement form when PA_START < VA_START and that can result in a false carry bit. e.g 1) PA = 0x80000000; VA = 0xC0000000 __pv_offset = PA - VA = 0xC0000000 (2's complement) 2) PA = 0x2 80000000; VA = 0xC000000 __pv_offset = PA - VA = 0x1 C0000000 So adding __pv_offset + VA should never result in a true overflow for (1). So in order to differentiate between a true carry, a __pv_offset is extended to 64bit and the upper 32bits will have 0xffffffff if __pv_offset is 2's complement. So 'mvn #0' is inserted instead of 'mov' while patching for the same reason. Since mov, add, sub instruction are to patched with different constants inside the same stub, the rotation field of the opcode is using to differentiate between them. So the above examples for v2p translation becomes for VA=0xC0000000, 1) PA[63:32] = 0xffffffff PA[31:0] = VA + 0xC0000000 --> results in a carry PA[63:32] = PA[63:32] + carry PA[63:0] = 0x0 80000000 2) PA[63:32] = 0x1 PA[31:0] = VA + 0xC0000000 --> results in a carry PA[63:32] = PA[63:32] + carry PA[63:0] = 0x2 80000000 The above ideas were suggested by Nicolas Pitre as part of the review of first and second versions of the subject patch. There is no corresponding change on the phys_to_virt() side, because computations on the upper 32-bits would be discarded anyway. Cc: Russell King Reviewed-by: Nicolas Pitre Signed-off-by: Sricharan R Signed-off-by: Santosh Shilimkar --- arch/arm/include/asm/memory.h | 37 ++++++++++++++++++++++--- arch/arm/kernel/armksyms.c | 1 + arch/arm/kernel/head.S | 63 ++++++++++++++++++++++++++++++++----------- 3 files changed, 82 insertions(+), 19 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index d9b96c65e594..6748d6295a1a 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -172,9 +172,14 @@ * so that all we need to do is modify the 8-bit constant field. */ #define __PV_BITS_31_24 0x81000000 +#define __PV_BITS_7_0 0x81 extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x); -extern unsigned long __pv_phys_offset; +extern u64 __pv_phys_offset; +extern u64 __pv_offset; +extern void fixup_pv_table(const void *, unsigned long); +extern const void *__pv_table_begin, *__pv_table_end; + #define PHYS_OFFSET __pv_phys_offset #define __pv_stub(from,to,instr,type) \ @@ -186,10 +191,36 @@ extern unsigned long __pv_phys_offset; : "=r" (to) \ : "r" (from), "I" (type)) +#define __pv_stub_mov_hi(t) \ + __asm__ volatile("@ __pv_stub_mov\n" \ + "1: mov %R0, %1\n" \ + " .pushsection .pv_table,\"a\"\n" \ + " .long 1b\n" \ + " .popsection\n" \ + : "=r" (t) \ + : "I" (__PV_BITS_7_0)) + +#define __pv_add_carry_stub(x, y) \ + __asm__ volatile("@ __pv_add_carry_stub\n" \ + "1: adds %Q0, %1, %2\n" \ + " adc %R0, %R0, #0\n" \ + " .pushsection .pv_table,\"a\"\n" \ + " .long 1b\n" \ + " .popsection\n" \ + : "+r" (y) \ + : "r" (x), "I" (__PV_BITS_31_24) \ + : "cc") + static inline phys_addr_t __virt_to_phys(unsigned long x) { - unsigned long t; - __pv_stub(x, t, "add", __PV_BITS_31_24); + phys_addr_t t; + + if (sizeof(phys_addr_t) == 4) { + __pv_stub(x, t, "add", __PV_BITS_31_24); + } else { + __pv_stub_mov_hi(t); + __pv_add_carry_stub(x, t); + } return t; } diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c index 60d3b738d420..1f031ddd0667 100644 --- a/arch/arm/kernel/armksyms.c +++ b/arch/arm/kernel/armksyms.c @@ -155,4 +155,5 @@ EXPORT_SYMBOL(__gnu_mcount_nc); #ifdef CONFIG_ARM_PATCH_PHYS_VIRT EXPORT_SYMBOL(__pv_phys_offset); +EXPORT_SYMBOL(__pv_offset); #endif diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S index 2c7cc1e03473..54547947a4e9 100644 --- a/arch/arm/kernel/head.S +++ b/arch/arm/kernel/head.S @@ -536,6 +536,14 @@ ENTRY(fixup_smp) ldmfd sp!, {r4 - r6, pc} ENDPROC(fixup_smp) +#ifdef __ARMEB_ +#define LOW_OFFSET 0x4 +#define HIGH_OFFSET 0x0 +#else +#define LOW_OFFSET 0x0 +#define HIGH_OFFSET 0x4 +#endif + #ifdef CONFIG_ARM_PATCH_PHYS_VIRT /* __fixup_pv_table - patch the stub instructions with the delta between @@ -546,17 +554,20 @@ ENDPROC(fixup_smp) __HEAD __fixup_pv_table: adr r0, 1f - ldmia r0, {r3-r5, r7} - sub r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET + ldmia r0, {r3-r7} + mvn ip, #0 + subs r3, r0, r3 @ PHYS_OFFSET - PAGE_OFFSET add r4, r4, r3 @ adjust table start address add r5, r5, r3 @ adjust table end address - add r7, r7, r3 @ adjust __pv_phys_offset address - str r8, [r7] @ save computed PHYS_OFFSET to __pv_phys_offset + add r6, r6, r3 @ adjust __pv_phys_offset address + add r7, r7, r3 @ adjust __pv_offset address + str r8, [r6, #LOW_OFFSET] @ save computed PHYS_OFFSET to __pv_phys_offset + strcc ip, [r7, #HIGH_OFFSET] @ save to __pv_offset high bits mov r6, r3, lsr #24 @ constant for add/sub instructions teq r3, r6, lsl #24 @ must be 16MiB aligned THUMB( it ne @ cross section branch ) bne __error - str r6, [r7, #4] @ save to __pv_offset + str r3, [r7, #LOW_OFFSET] @ save to __pv_offset low bits b __fixup_a_pv_table ENDPROC(__fixup_pv_table) @@ -565,10 +576,19 @@ ENDPROC(__fixup_pv_table) .long __pv_table_begin .long __pv_table_end 2: .long __pv_phys_offset + .long __pv_offset .text __fixup_a_pv_table: + adr r0, 3f + ldr r6, [r0] + add r6, r6, r3 + ldr r0, [r6, #HIGH_OFFSET] @ pv_offset high word + ldr r6, [r6, #LOW_OFFSET] @ pv_offset low word + mov r6, r6, lsr #24 + cmn r0, #1 #ifdef CONFIG_THUMB2_KERNEL + moveq r0, #0x200000 @ set bit 21, mov to mvn instruction lsls r6, #24 beq 2f clz r7, r6 @@ -582,18 +602,28 @@ __fixup_a_pv_table: b 2f 1: add r7, r3 ldrh ip, [r7, #2] - and ip, 0x8f00 - orr ip, r6 @ mask in offset bits 31-24 + tst ip, #0x4000 + and ip, #0x8f00 + orrne ip, r6 @ mask in offset bits 31-24 + orreq ip, r0 @ mask in offset bits 7-0 strh ip, [r7, #2] + ldrheq ip, [r7] + biceq ip, #0x20 + orreq ip, ip, r0, lsr #16 + strheq ip, [r7] 2: cmp r4, r5 ldrcc r7, [r4], #4 @ use branch for delay slot bcc 1b bx lr #else + moveq r0, #0x400000 @ set bit 22, mov to mvn instruction b 2f 1: ldr ip, [r7, r3] bic ip, ip, #0x000000ff - orr ip, ip, r6 @ mask in offset bits 31-24 + tst ip, #0xf00 @ check the rotation field + orrne ip, ip, r6 @ mask in offset bits 31-24 + biceq ip, ip, #0x400000 @ clear bit 22 + orreq ip, ip, r0 @ mask in offset bits 7-0 str ip, [r7, r3] 2: cmp r4, r5 ldrcc r7, [r4], #4 @ use branch for delay slot @@ -602,28 +632,29 @@ __fixup_a_pv_table: #endif ENDPROC(__fixup_a_pv_table) +3: .long __pv_offset + ENTRY(fixup_pv_table) stmfd sp!, {r4 - r7, lr} - ldr r2, 2f @ get address of __pv_phys_offset mov r3, #0 @ no offset mov r4, r0 @ r0 = table start add r5, r0, r1 @ r1 = table size - ldr r6, [r2, #4] @ get __pv_offset bl __fixup_a_pv_table ldmfd sp!, {r4 - r7, pc} ENDPROC(fixup_pv_table) - .align -2: .long __pv_phys_offset - .data .globl __pv_phys_offset .type __pv_phys_offset, %object __pv_phys_offset: - .long 0 - .size __pv_phys_offset, . - __pv_phys_offset + .quad 0 + .size __pv_phys_offset, . -__pv_phys_offset + + .globl __pv_offset + .type __pv_offset, %object __pv_offset: - .long 0 + .quad 0 + .size __pv_offset, . -__pv_offset #endif #include "head-common.S" -- cgit v1.2.3 From a77e0c7b2774fd52ce6bf25c2c3ffdccb7b110ff Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Wed, 31 Jul 2013 12:44:46 -0400 Subject: ARM: mm: Recreate kernel mappings in early_paging_init() This patch adds a step in the init sequence, in order to recreate the kernel code/data page table mappings prior to full paging initialization. This is necessary on LPAE systems that run out of a physical address space outside the 4G limit. On these systems, this implementation provides a machine descriptor hook that allows the PHYS_OFFSET to be overridden in a machine specific fashion. Cc: Russell King Acked-by: Nicolas Pitre Signed-off-by: R Sricharan Signed-off-by: Santosh Shilimkar --- arch/arm/include/asm/mach/arch.h | 1 + arch/arm/kernel/setup.c | 4 ++ arch/arm/mm/mmu.c | 82 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 87 insertions(+) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h index 402a2bc6aa68..17a3fa2979e8 100644 --- a/arch/arm/include/asm/mach/arch.h +++ b/arch/arm/include/asm/mach/arch.h @@ -49,6 +49,7 @@ struct machine_desc { bool (*smp_init)(void); void (*fixup)(struct tag *, char **, struct meminfo *); + void (*init_meminfo)(void); void (*reserve)(void);/* reserve mem blocks */ void (*map_io)(void);/* IO mapping function */ void (*init_early)(void); diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index 0e1e2b3afa45..af7b7db4699e 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -73,6 +73,8 @@ __setup("fpe=", fpe_setup); #endif extern void paging_init(const struct machine_desc *desc); +extern void early_paging_init(const struct machine_desc *, + struct proc_info_list *); extern void sanity_check_meminfo(void); extern enum reboot_mode reboot_mode; extern void setup_dma_zone(const struct machine_desc *desc); @@ -878,6 +880,8 @@ void __init setup_arch(char **cmdline_p) parse_early_param(); sort(&meminfo.bank, meminfo.nr_banks, sizeof(meminfo.bank[0]), meminfo_cmp, NULL); + + early_paging_init(mdesc, lookup_processor_type(read_cpuid_id())); sanity_check_meminfo(); arm_memblock_init(&meminfo, mdesc); diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index b1d17eeb59b8..78eeeca78f5a 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -28,6 +28,8 @@ #include #include #include +#include +#include #include #include @@ -1315,6 +1317,86 @@ static void __init map_lowmem(void) } } +#ifdef CONFIG_ARM_LPAE +/* + * early_paging_init() recreates boot time page table setup, allowing machines + * to switch over to a high (>4G) address space on LPAE systems + */ +void __init early_paging_init(const struct machine_desc *mdesc, + struct proc_info_list *procinfo) +{ + pmdval_t pmdprot = procinfo->__cpu_mm_mmu_flags; + unsigned long map_start, map_end; + pgd_t *pgd0, *pgdk; + pud_t *pud0, *pudk, *pud_start; + pmd_t *pmd0, *pmdk; + phys_addr_t phys; + int i; + + if (!(mdesc->init_meminfo)) + return; + + /* remap kernel code and data */ + map_start = init_mm.start_code; + map_end = init_mm.brk; + + /* get a handle on things... */ + pgd0 = pgd_offset_k(0); + pud_start = pud0 = pud_offset(pgd0, 0); + pmd0 = pmd_offset(pud0, 0); + + pgdk = pgd_offset_k(map_start); + pudk = pud_offset(pgdk, map_start); + pmdk = pmd_offset(pudk, map_start); + + mdesc->init_meminfo(); + + /* Run the patch stub to update the constants */ + fixup_pv_table(&__pv_table_begin, + (&__pv_table_end - &__pv_table_begin) << 2); + + /* + * Cache cleaning operations for self-modifying code + * We should clean the entries by MVA but running a + * for loop over every pv_table entry pointer would + * just complicate the code. + */ + flush_cache_louis(); + dsb(); + isb(); + + /* remap level 1 table */ + for (i = 0; i < PTRS_PER_PGD; pud0++, i++) { + set_pud(pud0, + __pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER)); + pmd0 += PTRS_PER_PMD; + } + + /* remap pmds for kernel mapping */ + phys = __pa(map_start) & PMD_MASK; + do { + *pmdk++ = __pmd(phys | pmdprot); + phys += PMD_SIZE; + } while (phys < map_end); + + flush_cache_all(); + cpu_switch_mm(pgd0, &init_mm); + cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET); + local_flush_bp_all(); + local_flush_tlb_all(); +} + +#else + +void __init early_paging_init(const struct machine_desc *mdesc, + struct proc_info_list *procinfo) +{ + if (mdesc->init_meminfo) + mdesc->init_meminfo(); +} + +#endif + /* * paging_init() sets up the page tables, initialises the zone memory * maps, and sets up the zero page, bad page and bad page tables. -- cgit v1.2.3 From 457c2403c513c74f60d5757fd11ae927e5554a38 Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Tue, 12 Feb 2013 18:59:57 +0000 Subject: ARM: asm: Add ARM_BE8() assembly helper Add ARM_BE8() helper to wrap any code conditional on being compile when CONFIG_ARM_ENDIAN_BE8 is selected and convert existing places where this is to use it. Acked-by: Nicolas Pitre Reviewed-by: Will Deacon Signed-off-by: Ben Dooks --- arch/arm/boot/compressed/head.S | 8 ++------ arch/arm/include/asm/assembler.h | 7 +++++++ arch/arm/kernel/entry-armv.S | 5 ++--- arch/arm/kernel/entry-common.S | 4 +--- arch/arm/mm/abort-ev6.S | 5 ++--- arch/arm/mm/proc-v6.S | 4 +--- arch/arm/mm/proc-v7.S | 4 +--- 7 files changed, 16 insertions(+), 21 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S index 75189f13cf54..c912c2a95de8 100644 --- a/arch/arm/boot/compressed/head.S +++ b/arch/arm/boot/compressed/head.S @@ -699,9 +699,7 @@ __armv4_mmu_cache_on: mrc p15, 0, r0, c1, c0, 0 @ read control reg orr r0, r0, #0x5000 @ I-cache enable, RR cache replacement orr r0, r0, #0x0030 -#ifdef CONFIG_CPU_ENDIAN_BE8 - orr r0, r0, #1 << 25 @ big-endian page tables -#endif + ARM_BE8( orr r0, r0, #1 << 25 ) @ big-endian page tables bl __common_mmu_cache_on mov r0, #0 mcr p15, 0, r0, c8, c7, 0 @ flush I,D TLBs @@ -728,9 +726,7 @@ __armv7_mmu_cache_on: orr r0, r0, #1 << 22 @ U (v6 unaligned access model) @ (needed for ARM1176) #ifdef CONFIG_MMU -#ifdef CONFIG_CPU_ENDIAN_BE8 - orr r0, r0, #1 << 25 @ big-endian page tables -#endif + ARM_BE8( orr r0, r0, #1 << 25 ) @ big-endian page tables mrcne p15, 0, r6, c2, c0, 2 @ read ttb control reg orrne r0, r0, #1 @ MMU enabled movne r1, #0xfffffffd @ domain 0 = client diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index fcc1b5bf6979..5c2285160575 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -53,6 +53,13 @@ #define put_byte_3 lsl #0 #endif +/* Select code for any configuration running in BE8 mode */ +#ifdef CONFIG_CPU_ENDIAN_BE8 +#define ARM_BE8(code...) code +#else +#define ARM_BE8(code...) +#endif + /* * Data preload for architectures that support it */ diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S index 9cbe70c8b0ef..55090fbb81a2 100644 --- a/arch/arm/kernel/entry-armv.S +++ b/arch/arm/kernel/entry-armv.S @@ -416,9 +416,8 @@ __und_usr: bne __und_usr_thumb sub r4, r2, #4 @ ARM instr at LR - 4 1: ldrt r0, [r4] -#ifdef CONFIG_CPU_ENDIAN_BE8 - rev r0, r0 @ little endian instruction -#endif + ARM_BE8(rev r0, r0) @ little endian instruction + @ r0 = 32-bit ARM instruction which caused the exception @ r2 = PC value for the following instruction (:= regs->ARM_pc) @ r4 = PC value for the faulting instruction diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S index bc6bd9683ba4..a2dcafdf1bc8 100644 --- a/arch/arm/kernel/entry-common.S +++ b/arch/arm/kernel/entry-common.S @@ -393,9 +393,7 @@ ENTRY(vector_swi) #else USER( ldr r10, [lr, #-4] ) @ get SWI instruction #endif -#ifdef CONFIG_CPU_ENDIAN_BE8 - rev r10, r10 @ little endian instruction -#endif + ARM_BE8(rev r10, r10) @ little endian instruction #elif defined(CONFIG_AEABI) diff --git a/arch/arm/mm/abort-ev6.S b/arch/arm/mm/abort-ev6.S index 80741992a9fc..3815a8262af0 100644 --- a/arch/arm/mm/abort-ev6.S +++ b/arch/arm/mm/abort-ev6.S @@ -38,9 +38,8 @@ ENTRY(v6_early_abort) bne do_DataAbort bic r1, r1, #1 << 11 @ clear bit 11 of FSR ldr r3, [r4] @ read aborted ARM instruction -#ifdef CONFIG_CPU_ENDIAN_BE8 - rev r3, r3 -#endif + ARM_BE8(rev r3, r3) + do_ldrd_abort tmp=ip, insn=r3 tst r3, #1 << 20 @ L = 0 -> write orreq r1, r1, #1 << 11 @ yes. diff --git a/arch/arm/mm/proc-v6.S b/arch/arm/mm/proc-v6.S index 1128064fddcb..45dc29f85d56 100644 --- a/arch/arm/mm/proc-v6.S +++ b/arch/arm/mm/proc-v6.S @@ -220,9 +220,7 @@ __v6_setup: #endif /* CONFIG_MMU */ adr r5, v6_crval ldmia r5, {r5, r6} -#ifdef CONFIG_CPU_ENDIAN_BE8 - orr r6, r6, #1 << 25 @ big-endian page tables -#endif + ARM_BE8(orr r6, r6, #1 << 25) @ big-endian page tables mrc p15, 0, r0, c1, c0, 0 @ read control register bic r0, r0, r5 @ clear bits them orr r0, r0, r6 @ set them diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S index c63d9bdee51e..60920f62fdf5 100644 --- a/arch/arm/mm/proc-v7.S +++ b/arch/arm/mm/proc-v7.S @@ -367,9 +367,7 @@ __v7_setup: #endif adr r5, v7_crval ldmia r5, {r5, r6} -#ifdef CONFIG_CPU_ENDIAN_BE8 - orr r6, r6, #1 << 25 @ big-endian page tables -#endif + ARM_BE8(orr r6, r6, #1 << 25) @ big-endian page tables #ifdef CONFIG_SWP_EMULATE orr r5, r5, #(1 << 10) @ set SW bit in "clear" bic r6, r6, #(1 << 10) @ clear it in "mmuset" -- cgit v1.2.3 From 76e3faf156fa95b6465e747d702b94faf67117fc Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Wed, 6 Feb 2013 18:25:36 +0000 Subject: ARM: pl01x debug code endian fix The PL01X debug code needs to take into account which endian mode the processor is running in. If it is big-endian, ensure the data is swapped appropriately. Note, we could do this slightly more efficiently if we have an macro to do the necessary swap for the bits used by test. Reviewed-by: Will Deacon Signed-off-by: Ben Dooks --- arch/arm/include/debug/pl01x.S | 2 ++ 1 file changed, 2 insertions(+) (limited to 'arch/arm/include') diff --git a/arch/arm/include/debug/pl01x.S b/arch/arm/include/debug/pl01x.S index 37c6895b87e6..92ef808a2337 100644 --- a/arch/arm/include/debug/pl01x.S +++ b/arch/arm/include/debug/pl01x.S @@ -25,12 +25,14 @@ .macro waituart,rd,rx 1001: ldr \rd, [\rx, #UART01x_FR] + ARM_BE8( rev \rd, \rd ) tst \rd, #UART01x_FR_TXFF bne 1001b .endm .macro busyuart,rd,rx 1001: ldr \rd, [\rx, #UART01x_FR] + ARM_BE8( rev \rd, \rd ) tst \rd, #UART01x_FR_BUSY bne 1001b .endm -- cgit v1.2.3 From bfdef3b32d2f36bf137c039de9a545cdfcfbafe2 Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Wed, 24 Jul 2013 16:09:57 +0100 Subject: ARM: hardware: fix endian-ness in The needs to take into account the endian-ness of the processor when reading and writing data, so change to using the readl/writel relaxed variants from the raw ones. Signed-off-by: Ben Dooks --- arch/arm/include/asm/hardware/coresight.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/hardware/coresight.h b/arch/arm/include/asm/hardware/coresight.h index 0cf7a6b842ff..ad774f37c47c 100644 --- a/arch/arm/include/asm/hardware/coresight.h +++ b/arch/arm/include/asm/hardware/coresight.h @@ -24,8 +24,8 @@ #define TRACER_TIMEOUT 10000 #define etm_writel(t, v, x) \ - (__raw_writel((v), (t)->etm_regs + (x))) -#define etm_readl(t, x) (__raw_readl((t)->etm_regs + (x))) + (writel_relaxed((v), (t)->etm_regs + (x))) +#define etm_readl(t, x) (readl_relaxed((t)->etm_regs + (x))) /* CoreSight Management Registers */ #define CSMR_LOCKACCESS 0xfb0 @@ -142,8 +142,8 @@ #define ETBFF_TRIGFL BIT(10) #define etb_writel(t, v, x) \ - (__raw_writel((v), (t)->etb_regs + (x))) -#define etb_readl(t, x) (__raw_readl((t)->etb_regs + (x))) + (writel_relaxed((v), (t)->etb_regs + (x))) +#define etb_readl(t, x) (readl_relaxed((t)->etb_regs + (x))) #define etm_lock(t) do { etm_writel((t), 0, CSMR_LOCKACCESS); } while (0) #define etm_unlock(t) \ -- cgit v1.2.3 From 63328070eff2f4fd730c86966a0dbc976147c39f Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Thu, 25 Jul 2013 14:38:03 +0100 Subject: ARM: Correct BUG() assembly to ensure it is endian-agnostic Currently BUG() uses .word or .hword to create the necessary illegal instructions. However if we are building BE8 then these get swapped by the linker into different illegal instructions in the text. This means that the BUG() macro does not get trapped properly. Change to using to provide the necessary ARM instruction building as we cannot rely on gcc/gas having the `.inst` instructions which where added to try and resolve this issue (reported by Dave Martin ). Signed-off-by: Ben Dooks Reviewed-by: Dave Martin --- arch/arm/include/asm/bug.h | 10 ++++++---- arch/arm/kernel/traps.c | 8 +++++--- 2 files changed, 11 insertions(+), 7 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/bug.h b/arch/arm/include/asm/bug.h index 7af5c6c3653a..b274bde24905 100644 --- a/arch/arm/include/asm/bug.h +++ b/arch/arm/include/asm/bug.h @@ -2,6 +2,8 @@ #define _ASMARM_BUG_H #include +#include +#include #ifdef CONFIG_BUG @@ -12,10 +14,10 @@ */ #ifdef CONFIG_THUMB2_KERNEL #define BUG_INSTR_VALUE 0xde02 -#define BUG_INSTR_TYPE ".hword " +#define BUG_INSTR(__value) __inst_thumb16(__value) #else #define BUG_INSTR_VALUE 0xe7f001f2 -#define BUG_INSTR_TYPE ".word " +#define BUG_INSTR(__value) __inst_arm(__value) #endif @@ -33,7 +35,7 @@ #define __BUG(__file, __line, __value) \ do { \ - asm volatile("1:\t" BUG_INSTR_TYPE #__value "\n" \ + asm volatile("1:\t" BUG_INSTR(__value) "\n" \ ".pushsection .rodata.str, \"aMS\", %progbits, 1\n" \ "2:\t.asciz " #__file "\n" \ ".popsection\n" \ @@ -48,7 +50,7 @@ do { \ #define __BUG(__file, __line, __value) \ do { \ - asm volatile(BUG_INSTR_TYPE #__value); \ + asm volatile(BUG_INSTR(__value) "\n"); \ unreachable(); \ } while (0) #endif /* CONFIG_DEBUG_BUGVERBOSE */ diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c index caf96da27360..6125f259b7b5 100644 --- a/arch/arm/kernel/traps.c +++ b/arch/arm/kernel/traps.c @@ -342,15 +342,17 @@ void arm_notify_die(const char *str, struct pt_regs *regs, int is_valid_bugaddr(unsigned long pc) { #ifdef CONFIG_THUMB2_KERNEL - unsigned short bkpt; + u16 bkpt; + u16 insn = __opcode_to_mem_thumb16(BUG_INSTR_VALUE); #else - unsigned long bkpt; + u32 bkpt; + u32 insn = __opcode_to_mem_arm(BUG_INSTR_VALUE); #endif if (probe_kernel_address((unsigned *)pc, bkpt)) return 0; - return bkpt == BUG_INSTR_VALUE; + return bkpt == insn; } #endif -- cgit v1.2.3 From 5a8b93fc9457be90adfa10d3df6497393c5e2dc2 Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Thu, 25 Jul 2013 15:47:40 +0100 Subject: ARM: kdgb: use for data to be assembled as intruction The arch_kgdb_breakpoint() function uses an inline assembly directive to assemble a specific instruction using .word. This means the linker will not treat is as an instruction, and therefore incorrectly swap the endian-ness if running BE8. As noted, this code means that kgdb is really only usable on arm32 kernels, and should be made dependant on not being a thumb2 kernel until fixed. However this is not something to be added to this patch. Signed-off-by: Ben Dooks Reviewed-by: Dave Martin --- arch/arm/include/asm/kgdb.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/kgdb.h b/arch/arm/include/asm/kgdb.h index 48066ce9ea34..0a9d5dd93294 100644 --- a/arch/arm/include/asm/kgdb.h +++ b/arch/arm/include/asm/kgdb.h @@ -11,6 +11,7 @@ #define __ARM_KGDB_H__ #include +#include /* * GDB assumes that we're a user process being debugged, so @@ -41,7 +42,7 @@ static inline void arch_kgdb_breakpoint(void) { - asm(".word 0xe7ffdeff"); + asm(__inst_arm(0xe7ffdeff)); } extern void kgdb_handle_bus_error(void); -- cgit v1.2.3 From 2245f92498b216b50e744423bde17626287409d8 Mon Sep 17 00:00:00 2001 From: Victor Kamensky Date: Fri, 26 Jul 2013 09:28:53 -0700 Subject: ARM: atomic64: fix endian-ness in atomic.h Fix inline asm for atomic64_xxx functions in arm atomic.h. Instead of %H operand specifiers code should use %Q for least significant part of the value, and %R for the most significant part of the value. %H always returns the higher of the two register numbers, and therefore it is not endian neutral. %H should be used with ldrexd and strexd instructions. Signed-off-by: Victor Kamensky Acked-by: Will Deacon Signed-off-by: Ben Dooks --- arch/arm/include/asm/atomic.h | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h index da1c77d39327..6447a0b7b127 100644 --- a/arch/arm/include/asm/atomic.h +++ b/arch/arm/include/asm/atomic.h @@ -301,8 +301,8 @@ static inline void atomic64_add(u64 i, atomic64_t *v) __asm__ __volatile__("@ atomic64_add\n" "1: ldrexd %0, %H0, [%3]\n" -" adds %0, %0, %4\n" -" adc %H0, %H0, %H4\n" +" adds %Q0, %Q0, %Q4\n" +" adc %R0, %R0, %R4\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -320,8 +320,8 @@ static inline u64 atomic64_add_return(u64 i, atomic64_t *v) __asm__ __volatile__("@ atomic64_add_return\n" "1: ldrexd %0, %H0, [%3]\n" -" adds %0, %0, %4\n" -" adc %H0, %H0, %H4\n" +" adds %Q0, %Q0, %Q4\n" +" adc %R0, %R0, %R4\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -341,8 +341,8 @@ static inline void atomic64_sub(u64 i, atomic64_t *v) __asm__ __volatile__("@ atomic64_sub\n" "1: ldrexd %0, %H0, [%3]\n" -" subs %0, %0, %4\n" -" sbc %H0, %H0, %H4\n" +" subs %Q0, %Q0, %Q4\n" +" sbc %R0, %R0, %R4\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -360,8 +360,8 @@ static inline u64 atomic64_sub_return(u64 i, atomic64_t *v) __asm__ __volatile__("@ atomic64_sub_return\n" "1: ldrexd %0, %H0, [%3]\n" -" subs %0, %0, %4\n" -" sbc %H0, %H0, %H4\n" +" subs %Q0, %Q0, %Q4\n" +" sbc %R0, %R0, %R4\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" " bne 1b" @@ -428,9 +428,9 @@ static inline u64 atomic64_dec_if_positive(atomic64_t *v) __asm__ __volatile__("@ atomic64_dec_if_positive\n" "1: ldrexd %0, %H0, [%3]\n" -" subs %0, %0, #1\n" -" sbc %H0, %H0, #0\n" -" teq %H0, #0\n" +" subs %Q0, %Q0, #1\n" +" sbc %R0, %R0, #0\n" +" teq %R0, #0\n" " bmi 2f\n" " strexd %1, %0, %H0, [%3]\n" " teq %1, #0\n" @@ -459,8 +459,8 @@ static inline int atomic64_add_unless(atomic64_t *v, u64 a, u64 u) " teqeq %H0, %H5\n" " moveq %1, #0\n" " beq 2f\n" -" adds %0, %0, %6\n" -" adc %H0, %H0, %H6\n" +" adds %Q0, %Q0, %Q6\n" +" adc %R0, %R0, %R6\n" " strexd %2, %0, %H0, [%4]\n" " teq %2, #0\n" " bne 1b\n" -- cgit v1.2.3 From a1af3474487cc3b8731b990dceac6b6aad7f3ed8 Mon Sep 17 00:00:00 2001 From: Victor Kamensky Date: Mon, 7 Oct 2013 08:48:23 -0700 Subject: ARM: tlb: ASID macro should give 32bit result for BE correct operation In order for ASID macro to be used as expression passed to inline asm as 'r' operand it needs to give 32 bit unsigned result, not unsigned 64bit expression. Otherwise when 64bit ASID is passed to inline assembler statement as 'r' operand (32bit) compiler behavior is not well specified. For example when __flush_tlb_mm function compiled in big endian case, and ASID is passed to tlb_op macro directly, 0 will be passed as 'mcr 15, 0, r4, cr8, cr3, {2}' argument in r4, unless ASID macro changed to produce 32 bit result. Signed-off-by: Victor Kamensky Acked-by: Will Deacon Signed-off-by: Ben Dooks --- arch/arm/include/asm/mmu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/mmu.h b/arch/arm/include/asm/mmu.h index 6f18da09668b..64fd15159b7d 100644 --- a/arch/arm/include/asm/mmu.h +++ b/arch/arm/include/asm/mmu.h @@ -16,7 +16,7 @@ typedef struct { #ifdef CONFIG_CPU_HAS_ASID #define ASID_BITS 8 #define ASID_MASK ((~0ULL) << ASID_BITS) -#define ASID(mm) ((mm)->context.id.counter & ~ASID_MASK) +#define ASID(mm) ((unsigned int)((mm)->context.id.counter & ~ASID_MASK)) #else #define ASID(mm) (0) #endif -- cgit v1.2.3 From 5e4432d3bd6b5b19e10bb263e7dbe8e74d7cf1c2 Mon Sep 17 00:00:00 2001 From: Russell King Date: Tue, 29 Oct 2013 23:06:44 +0000 Subject: ARM: fix misplaced arch_virt_to_idmap() Olof Johansson reported: In file included from arch/arm/include/asm/page.h:163:0, from include/linux/mm_types.h:16, from include/linux/sched.h:24, from arch/arm/kernel/asm-offsets.c:13: arch/arm/include/asm/memory.h: In function '__virt_to_idmap': arch/arm/include/asm/memory.h:300:6: error: 'arch_virt_to_idmap' undeclared (first use in this function) caused by arch_virt_to_idmap being placed inside a different preprocessor conditional to its user. Move it along side its user. Signed-off-by: Russell King --- arch/arm/include/asm/memory.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'arch/arm/include') diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h index 6748d6295a1a..4dd21457ef9d 100644 --- a/arch/arm/include/asm/memory.h +++ b/arch/arm/include/asm/memory.h @@ -174,7 +174,6 @@ #define __PV_BITS_31_24 0x81000000 #define __PV_BITS_7_0 0x81 -extern phys_addr_t (*arch_virt_to_idmap) (unsigned long x); extern u64 __pv_phys_offset; extern u64 __pv_offset; extern void fixup_pv_table(const void *, unsigned long); @@ -290,6 +289,8 @@ static inline void *phys_to_virt(phys_addr_t x) #define __va(x) ((void *)__phys_to_virt((phys_addr_t)(x))) #define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT) +extern phys_addr_t (*arch_virt_to_idmap)(unsigned long x); + /* * These are for systems that have a hardware interconnect supported alias of * physical memory for idmap purposes. Most cases should leave these -- cgit v1.2.3