2016-03-23  sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT  (Joonwoo Park)

Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform
migration due to EA without checking frequency throttling. This option
can give us better debugging and verification capability.

Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583
[joonwoop@codeaurora.org: fixed trivial conflict in kernel/sched/features.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>

2016-03-23sched: continue to search less power efficient cpu for load balancerJoonwoo Park
When choosing a CPU to do power-aware active balance from the load balancer currently selects the first eligible CPU it finds, even if there is another eligible CPU which is higher-power. This can lead to suboptimal load balancing behavior and extra migrations. Power and performance will be impacted. Achieve better power and performance by continuing to search the least power efficient cpu as long as the cpu's load average is higher than or equal to the busiest cpu found by far. CRs-fixed: 777341 Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: Update cur_freq for offline CPUs in notifier callbackSyed Rameez Mustafa
cpufreq governor does not send frequency change notifications for offline CPUs. This means that a hot removed CPU's cur_freq information can get stale if there is a frequency change while that CPU is offline. When the offline CPU is hotplugged back in, all subsequent load calculations are based off the stale information until another frequency change occurs and the corresponding set of notifications are sent out. Avoid this incorrect load tracking by updating the cur_freq for all CPUs in the same frequency domain. Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: Fix overflow in max possible capacity calculationOlav Haugan
The max possible capacity calculation might overflow given large enough max possible frequency and capacity. Fix potential for overflow. Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
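The usual fix for this class of bug is to widen to 64 bits before
multiplying. A minimal standalone sketch, with hypothetical names (not
the actual kernel code):

    #include <stdint.h>

    /* Widen to 64 bits before the multiply so a large
     * capacity * max_possible_freq product cannot wrap a 32-bit int. */
    static uint32_t max_possible_capacity(uint32_t capacity,
                                          uint32_t max_possible_freq,
                                          uint32_t max_freq)
    {
            uint64_t tmp = (uint64_t)capacity * max_possible_freq;

            return (uint32_t)(tmp / max_freq);
    }
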
2016-03-23sched: add preference for prev_cpu in HMP task placementSteve Muckle
At present the HMP task placement algorithm scans CPUs in numerical order and if two identical options are found, the first one encountered is chosen, even if it is different from the task's previous CPU. Add a bias towards the task's previous CPU in such situations. Any time two or more CPUs are considered equivalent (load, C-state, power cost), if one of them is the task's previous CPU, bias towards that CPU. The algorithm is otherwise unchanged. CRs-Fixed: 772033 Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
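A minimal sketch of just the tie-break, assuming the candidate set has
already been filtered to CPUs that are equivalent on load, C-state and
power cost (names are illustrative):

    /* Prefer the task's previous CPU among otherwise-equivalent candidates. */
    static int pick_from_ties(const int *candidates, int n, int prev_cpu)
    {
            int i;

            for (i = 0; i < n; i++)
                    if (candidates[i] == prev_cpu)
                            return prev_cpu;        /* bias toward prev_cpu */
            return candidates[0];                   /* else: first found, as before */
    }
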
2016-03-23sched: Per-cpu prefer_idle flagSrivatsa Vaddagiri
Remove the global sysctl_sched_prefer_idle flag and replace it with a per-cpu prefer_idle flag. The per-cpu flag is expected to same for all cpus in a cluster. It thus provides convenient means to disable packing in one cluster while allowing packing in another cluster. Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()Srivatsa Vaddagiri
sysctl_sched_prefer_idle controls selection of idle cpus for waking tasks. In some cases, waking to idle cpus help performance while in other cases it hurts (as tasks incur latency associated with C-state wakeup). Its ideal if scheduler can adapt prefer_idle behavior based on the task that is waking up, but that's hard for scheduler to figure by itself. PF_WAKE_UP_IDLE hint can be provided by external module/driver in such case to guide scheduler in preferring an idle cpu for select tasks irrespective of sysctl_sched_prefer_idle flag. This patch enhances select_best_cpu() to consider PF_WAKE_UP_IDLE hint. Wakeup posted from any task that has PF_WAKE_UP_IDLE set is a hint for scheduler to prefer idle cpu for waking tasks. Similarly scheduler will attempt to place any task with PF_WAKE_UP_IDLE set on idle cpu when they wakeup. CRs-Fixed: 773101 Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
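A sketch of the resulting policy as a standalone predicate (the struct
and flag value are stand-ins, not the kernel's definitions):

    #include <stdbool.h>

    #define PF_WAKE_UP_IDLE 0x01            /* illustrative value only */

    struct task {                           /* minimal stand-in for task_struct */
            unsigned int flags;
    };

    static int sysctl_sched_prefer_idle;

    /* Prefer an idle CPU if the sysctl asks for it, or if either the
     * waker or the waking task carries the PF_WAKE_UP_IDLE hint. */
    static bool wake_to_idle(const struct task *waker, const struct task *wakee)
    {
            return sysctl_sched_prefer_idle ||
                   (waker->flags & PF_WAKE_UP_IDLE) ||
                   (wakee->flags & PF_WAKE_UP_IDLE);
    }
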
2016-03-23sched: Add sysctl to enable power aware schedulingOlav Haugan
Add sysctl to enable energy awareness at runtime. This is useful for performance/power tuning/measurements and debugging. In addition this will match up with the Documentation/scheduler/sched-hmp.txt documentation. Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
2016-03-23sched: Ensure no active EA migration occurs when EA is disabledOlav Haugan
There exists a flag called "sched_enable_power_aware" that is not honored everywhere. Fix this. Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
2016-03-23sched: take account of irq preemption when calculating irqload deltaJoonwoo Park
If irq raises while sched_irqload() is calculating irqload delta, sched_account_irqtime() can update rq's irqload_ts which can be greater than the jiffies stored in sched_irqload()'s context so delta can be negative. This negative delta means there was recent irq occurence. So remove improper BUG_ON(). CRs-fixed: 771894 Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
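A sketch of the tolerant delta handling, assuming a recency window below
which a CPU's irqload still counts (all names illustrative):

    #include <stdint.h>

    /* A negative delta means an irq updated irqload_ts after we sampled
     * jiffies, i.e. very recent irq activity, so it naturally falls
     * inside the recency window instead of tripping a BUG_ON(). */
    static uint64_t sched_irqload(uint64_t jiffies_now, uint64_t irqload_ts,
                                  uint64_t avg_irqload, int64_t window)
    {
            int64_t delta = (int64_t)(jiffies_now - irqload_ts);

            if (delta < window)     /* covers delta < 0 as "recent" */
                    return avg_irqload;
            return 0;
    }
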
2016-03-23sched: Prevent race conditions where upmigrate_min_nice changesJoonwoo Park
When upmigrate_min_nice is changed dec_nr_big_small_task() can trigger BUG_ON(rq->nr_big_tasks < 0). This happens when there is a task which was considered as non-big task due to its nice > upmigrate_min_nice and later upmigrate_min_nice is changed to higher value so the task becomes big task. In this case runqueue still has nr_big_tasks = 0 incorrectly with current implementation. Consequently next scheduler tick sees a big task to schedule and try to decrease nr_big_tasks which is already 0. Introduce sched_upmigrate_min_nice which is updated atomically and re-count the number of big and small tasks to fix BUG_ON() triggering. Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
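One way to picture the fix, as a hedged sketch (the per-runqueue locking
is only implied, and all names and the classification rule shown are
illustrative): publish the new threshold atomically, then rebuild the
counters against it rather than patching them incrementally.

    #include <stdatomic.h>
    #include <stddef.h>

    static _Atomic int sched_upmigrate_min_nice;

    /* Reclassify every runnable task against the new threshold so
     * rq->nr_big_tasks can never underflow at the next tick. */
    static int count_big_tasks(const int *task_nice, size_t n, int min_nice)
    {
            int big = 0;
            size_t i;

            for (i = 0; i < n; i++)
                    if (task_nice[i] <= min_nice)   /* illustrative rule */
                            big++;
            return big;
    }

    static void set_upmigrate_min_nice(int nice)
    {
            atomic_store(&sched_upmigrate_min_nice, nice);
            /* then, per runqueue and under its lock, rebuild
             * nr_big_tasks with count_big_tasks() */
    }
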
2016-03-23sched: Avoid frequent task migration due to EA in lbOlav Haugan
A new tunable exists that allow task migration to be throttled when the scheduler tries to do task migrations due to Energy Awareness (EA). This tunable is only taken into account when migrations occur in the tick path. Extend the usage of the tunable to take into account the load balancer (lb) path also. In addition ensure that the start of task execution on a CPU is updated correctly. If a task is preempted but still runnable on the same CPU the start of execution should not be updated. Only update the start of execution when a task wakes up after sleep or moves to a new CPU. Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org [joonwoop@codeaurora.org: fixed conflict in group_classify() and set_task_cpu().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23sched: Avoid migrating tasks to little cores due to EAOlav Haugan
If during the check whether migration is needed we find that there is a lower power CPU available we commence to find a new CPU for this task. However, by the time we search for a new CPU the lower power CPU might no longer be available. We should abort the attempt to migrate a task in this case. CRs-Fixed: 764788 Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23sched: Add temperature to cpu_load trace pointOlav Haugan
Add the current CPU temperature to the sched_cpu_load trace point. This will allow us to track the CPU temperature. CRs-Fixed: 764788 Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23sched: Only do EA migration when CPU throttling is imminentOlav Haugan
We do not want to migrate tasks unnecessary to avoid cache hit and other migration latencies that could affect the performance of the system. Add a check to only try EA migration when CPU frequency throttling is imminent. CRs-Fixed: 764788 Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
2016-03-23sched: Avoid frequent migration of running taskSrivatsa Vaddagiri
Power values for cpus can drop quite considerably when it goes idle. As a result, the best choice for running a single task in a cluster can vary quite rapidly. As the task keeps hopping cpus, other cpus go idle and start being seen as more favorable target for running a task, leading to task migrating almost every scheduler tick! Prevent this by keeping track of when a task started running on a cpu and allowing task migration in tick path (migration_needed()) on account of energy efficiency reasons only if the task has run sufficiently long (as determined by sysctl_sched_min_runtime variable). Note that currently sysctl_sched_min_runtime setting is considered only in scheduler_tick()->migration_needed() path and not in idle_balance() path. In other words, a task could be migrated to another cpu which did a idle_balance(). This limitation should not affect high-frequency migrations seen typically (when a single high-demand task runs on high-performance cpu). CRs-Fixed: 756570 Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in set_task_cpu() and __schedule().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
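The guard itself is tiny; a sketch under the assumption that the task
records when it last started running on its current CPU (names are
hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    static uint64_t sysctl_sched_min_runtime;   /* ns; runtime tunable */

    /* In the tick path, allow an EA-motivated migration only once the
     * task has run long enough on its current CPU. */
    static bool ea_migration_allowed(uint64_t now_ns, uint64_t run_start_ns)
    {
            return now_ns - run_start_ns >= sysctl_sched_min_runtime;
    }
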
2016-03-23sched: treat sync waker CPUs with 1 task as idleSteve Muckle
When a CPU with one task performs a sync wakeup, its one task is expected to sleep immediately so this CPU should be treated as idle for the purposes of CPU selection for the waking task. This is only done when idle CPUs are the preferred targets for non-small task wakeups. When prefer_idle is 0, the CPU is left as non-idle in the selection logic so it is still a preferred candidate for the sync wakeup. Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
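Condensed into a predicate, the rule reads roughly as follows (a sketch;
names are illustrative):

    #include <stdbool.h>

    /* The waker's CPU counts as idle for placement purposes iff this is
     * a sync wakeup, that CPU runs exactly one task (the waker, which is
     * about to sleep), and idle CPUs are preferred for this wakeup. */
    static bool waker_cpu_counts_as_idle(bool sync, bool is_waker_cpu,
                                         int nr_running, bool prefer_idle)
    {
            return sync && is_waker_cpu && nr_running == 1 && prefer_idle;
    }
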
2016-03-23sched: extend sched_task_load tracepoint to indicate prefer_idleSyed Rameez Mustafa
Prefer idle determines whether the scheduler prefers an idle CPU over a busy CPU or not to wake up a task on. Knowing the correct value of this tunable is essential in understanding placement decisions made in select_best_cpu(). Change-Id: I955d7577061abccb65d01f560e1911d9db70298a Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: extend sched_task_load tracepoint to indicate sync wakeupSteve Muckle
Sync wakeups provide a hint to the scheduler about upcoming task activity. Knowing which wakeups are sync wakeups from logs will assist in workload analysis. Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23sched: add sync wakeup recognition in select_best_cpuSteve Muckle
If a wakeup is a sync wakeup, we need to discount the currently running task's load from the waker's CPU as we calculate the best CPU for the waking task to land on. Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
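The discount amounts to one subtraction when evaluating the waker's CPU;
a sketch with illustrative names:

    #include <stdbool.h>
    #include <stdint.h>

    /* On a sync wakeup the waker is expected to sleep immediately, so
     * its load should not count against its own CPU as a target. */
    static uint64_t effective_load(uint64_t cpu_load, uint64_t waker_load,
                                   bool sync, bool is_waker_cpu)
    {
            if (sync && is_waker_cpu && cpu_load >= waker_load)
                    return cpu_load - waker_load;
            return cpu_load;
    }
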
2016-03-23sched: Provide knob to prefer mostly_idle over idle cpusSrivatsa Vaddagiri
sysctl_sched_prefer_idle lets the scheduler bias selection of idle cpus over mostly idle cpus for tasks. This knob could be useful to control balance between power and performance. Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: make sched_cpu_high_irqload a runtime tunableSteve Muckle
It may be desirable to be able to alter the scehd_cpu_high_irqload setting easily, so make it a runtime tunable value. Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23sched: trace: extend sched_cpu_load to print irqloadSteve Muckle
The irqload is used in determining whether CPUs are mostly idle so it is useful to know this value while viewing scheduler traces. Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23sched: avoid CPUs with high irq activitySteve Muckle
CPUs with significant IRQ activity will not be able to serve tasks quickly. Avoid them if possible by disqualifying such CPUs from being recognized as mostly idle. Change-Id: I2c09272a4f259f0283b272455147d288fce11982 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
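Combined with the per-cpu packing limits elsewhere in this series, the
mostly-idle test then looks roughly like this (a sketch; thresholds and
names are illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* A CPU is mostly idle only if load and runqueue depth are low AND
     * its decayed irqload is below the high-irq threshold. */
    static bool cpu_mostly_idle(uint64_t load, int nr_running, uint64_t irqload,
                                uint64_t mostly_idle_load, int mostly_idle_nr_run,
                                uint64_t high_irqload)
    {
            return load <= mostly_idle_load &&
                   nr_running <= mostly_idle_nr_run &&
                   irqload < high_irqload;
    }
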
2016-03-23sched: refresh sched_clock() after acquiring rq lock in irq pathSteve Muckle
The wallclock time passed to sched_account_irqtime() may be stale after we wait to acquire the runqueue lock. This could cause problems in update_task_ravg because a different CPU may have advanced this CPU's window_start based on a more up-to-date wallclock value, triggering a BUG_ON(window_start > wallclock). Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23sched: track soft/hard irqload per-RQ with decaying avgSteve Muckle
The scheduler currently ignores irq activity when deciding which CPUs to place tasks on. If a CPU is getting hammered with IRQ activity but has no tasks it will look attractive to the scheduler as it will not be in a low power mode. Track irqload with a decaying average. This quantity can be used in the task placement logic to avoid CPUs which are under high irqload. The decay factor is 3/4. Note that with this algorithm the tracked irqload quantity will be higher than the actual irq time observed in any single window. Some sample outcomes with steady irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is used as a threshold in a subsequent patch): irqload per window load value asymptote # windows to > 10 2ms 8 n/a 3ms 12 7 4ms 16 4 5ms 20 3 Of course irqload will not be constant in each window, these are just given as simple examples. Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
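The per-window update implied by those numbers is a one-liner. With a
steady s of irq time per window the average converges to the fixed point
L = (3/4)L + s, i.e. L = 4s, matching the asymptotes in the table
(4ms -> 16, and so on). A sketch with illustrative field names:

    #include <stdint.h>

    /* Decay prior history by 3/4, then add the irq time observed in the
     * window that just ended. */
    static uint64_t update_avg_irqload(uint64_t avg_irqload,
                                       uint64_t window_irq_time)
    {
            return avg_irqload * 3 / 4 + window_irq_time;
    }
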
2016-03-23sched: do not set window until sched_clock is fully initializedSteve Muckle
The system initially uses a jiffy-based sched clock. When the platform registers a new timer for sched_clock, sched_clock can jump backwards. Once sched_clock_postinit() runs it should be safe to rely on it. Also sched_clock_cpu() relies on completion of sched_clock_init() and until that happens sched_clock_cpu() returns zero. This is used in the irq accounting path which window-based stats relies upon. So do not set window_start until sched_clock_cpu() is working. Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68 Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23sched: Make RT tasks eligible for boostSyed Rameez Mustafa
During sched boost RT tasks currently end up going to the lowest power cluster. This can be a performance bottleneck especially if the frequency and IPC differences between clusters are high. Furthermore, when RT tasks go over to the little cluster during boost, the load balancer keeps attempting to pull work over to the big cluster. This results in pre-emption of the executing RT task causing more delays. Finally, containing more work on a single cluster during boost might help save some power if the little cluster can then enter deeper low power modes. Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: Limit LBF_PWR_ACTIVE_BALANCE to within clusterSrivatsa Vaddagiri
When higher power (performance) cluster has only one online cpu, we currently let an idle cpu in lower power cluster pull a running task from performance cluster via active balance. Active balance for power-aware reasons is supposed to be restricted to balance within cluster, the check for which is not correctly implemented. Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: Packing support until a frequency thresholdSrivatsa Vaddagiri
Add another dimension for task packing based on frequency. This patch adds a per-cpu tunable, rq->mostly_idle_freq, which when set will result in tasks being packed on a single cpu in cluster as long as cluster frequency is less than set threshold. Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: tighten up jiffy to sched_clock mappingSteve Muckle
The tick code already tracks exact time a tick is expected to arrive. This can be used to eliminate slack in the jiffy to sched_clock mapping that aligns windows between a caller of sched_set_window and the scheduler itself. Change-Id: I9d47466658d01e6857d7457405459436d504a2ca Signed-off-by: Steve Muckle <smuckle@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23sched: Avoid unnecessary load balance when tasks don't fit on dst_cpuSyed Rameez Mustafa
When considering to pull over a task that does not fit on the destination CPU make sure that the busiest group has exceeded its capacity. While the change is applicable to all groups, the biggest impact will be on migrating big tasks to little CPUs. This should only happen when the big cluster is no longer capable of balancing load within the cluster. This change should have no impact on single cluster systems. Change-Id: I6d1ef0e0d878460530f036921ce4a4a9c1e1394b Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: print sched_cpu_load tracepoint for all CPUsSteve Muckle
When select_best_cpu() is called because a task is on a suboptimal CPU, certain CPUs are skipped because moving the task there would not make things any better. For the purposes of debugging though it is useful to always see the state of all CPUs. Change-Id: I76965663c1feef5c4cfab9909e477b0dcf67272d Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23sched: per-cpu mostly_idle thresholdSrivatsa Vaddagiri
sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack tasks on cpus to some extent. In some cases, it may be desirable to have different packing limits for different cpus. For example, pack to a higher limit on high-performance cpus compared to power-efficient cpus. This patch removes the global mostly_idle tunables and makes them per-cpu, thus letting task packing behavior to be controlled in a fine-grained manner. Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: Add API to set task's initial task loadSrivatsa Vaddagiri
Add a per-task attribute, init_load_pct, that is used to initialize newly created children's initial task load. This helps important applications launch their child tasks on cpus with highest capacity. Change-Id: Ie9665fd2aeb15203f95fd7f211c50bebbaa18727 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict int init_new_task_load. se.avg.runnable_avg_sum has deprecated.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23sched: Document HMP schedulerSrivatsa Vaddagiri
Documentation on HMP scheduler design. Change-Id: I594e55531dafa5cf8f41ba34e1ae5bed0473c18a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: use C-states in non-small task wakeup placement logicSyed Rameez Mustafa
Currently when a non-small task wakes up, the task placement logic first tries to find the least loaded CPU before breaking any ties via the power cost of running the task on those CPUs. When the power cost is also same, however, the scheduler just selects the first CPU it came across. Use C-states to further break ties when the power cost is the same for multiple CPUs. The scheduler will now pick a CPU in the shallowest C-state. Change-Id: Ie1401b305fa02758a2f7b30cfca1afe64459fc2b Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: take rq lock prior to saving idle task's mark_startSteve Muckle
When the idle task is being re-initialized during hotplug its mark_start value must be retained. The runqueue lock must be held when reading this value though to serialize this with other CPUs that could update the idle task's window-based statistics. CRs-Fixed: 743991 Change-Id: I1bca092d9ebc32a808cea2b9fe890cd24dc868cd Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23sched: update governor notification logicSrivatsa Vaddagiri
Make criteria for notifying governor to be per-cpu. Governor is notified of any large change in cpu's busy time statistics (rq->prev_runnable_sum) since the last reported value. Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: window-stats: Retain idle thread's mark_startSrivatsa Vaddagiri
init_idle() is called on a cpu's idle-thread once at bootup and subsequently everytime the cpu is hot-added. Since init_idle() calls __sched_fork(), we end up blowing idle thread's ravg.mark_start value. As a result we will fail to accurately maintain cpu's curr/prev_runnable_sum counters. Below example illustrates such a failure: CS = curr_runnable_sum, PS = prev_runnable_sum t0 -> New window starts for CPU2 <after some_task_activity> CS = X, PS = Y t1 -> <cpu2 is hot-removed. idle_task start's running on cpu2> At this time, cpu2_idle_thread.ravg.mark_start = t1 t1 -> t0 + W. One window elapses. CPU2 still hot-removed. We defer swapping CS and PS until some future task event occurs t2 -> CPU2 hot-added. _cpu_up()->idle_thread_get()->init_idle() ->__sched_fork() results in cpu2_idle_thread.ravg.mark_start = 0 t3 -> Some task wakes on cpu2. Since mark_start = 0, we don't swap CS and PS => which is a BUG! Fix this by retaining idle task's original mark_start value during init_idle() call. Change-Id: I4ac9bfe3a58fb5da8a6c7bc378c79d9930d17942 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
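The fix pattern is save/restore around the re-initialization; a minimal
standalone sketch, where the struct and helper are stand-ins for
task_struct and __sched_fork():

    #include <stdint.h>
    #include <string.h>

    struct task {                       /* stand-in for task_struct */
            uint64_t ravg_mark_start;
            /* ... other window-stats fields ... */
    };

    static void sched_fork_stub(struct task *p)
    {
            memset(p, 0, sizeof(*p));   /* models __sched_fork() zeroing state */
    }

    /* Retain mark_start across re-initialization so the next task event
     * on the hot-added CPU still swaps CS/PS correctly. */
    static void init_idle_sketch(struct task *idle)
    {
            uint64_t saved = idle->ravg_mark_start;

            sched_fork_stub(idle);
            idle->ravg_mark_start = saved;
    }
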
2016-03-23sched: Add checks for frequency changeOlav Haugan
We need to check for frequency change when a task is migrated due to affinity change and during active balance. Change-Id: I96676db04d34b5b91edd83431c236a1c28166985 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org> [rameezmustafa@codeaurora.org]: Port to msm-3.18] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org [joonwoop@codeaurora.org: fixed minor conflict in core.c] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23sched: Use absolute scale for notifying governorSrivatsa Vaddagiri
Make the tunables used for deciding the need for notification to be on absolute scale. The earlier scale (in percent terms relative to cur_freq) does not work well with available range of frequencies. For example, 100% tunable value would work well for lower range of frequencies and not for higher range. Having the tunable to be on absolute scale makes tuning more realistic. Change-Id: I35a8c4e2f2e9da57f4ca4462072276d06ad386f1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
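The contrast is easiest to see side by side; a sketch with hypothetical
names, where the percent form scales its trigger band with cur_freq
while the absolute form does not:

    #include <stdbool.h>
    #include <stdint.h>

    /* Old: band grows with cur_freq, so 100% behaves very differently
     * at 300 MHz than at 2 GHz. */
    static bool need_notify_percent(uint32_t demand, uint32_t cur_freq,
                                    uint32_t pct)
    {
            return demand > cur_freq + (uint64_t)cur_freq * pct / 100;
    }

    /* New: a fixed kHz band behaves uniformly across the freq range. */
    static bool need_notify_absolute(uint32_t demand, uint32_t cur_freq,
                                     uint32_t band_khz)
    {
            return demand > cur_freq + band_khz;
    }
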
2016-03-23sched: window-stats: Enhance cpu busy time accountingSrivatsa Vaddagiri
rq->curr/prev_runnable_sum counters represent cpu demand from various tasks that have run on a cpu. Any task that runs on a cpu will have a representation in rq->curr_runnable_sum. Their partial_demand value will be included in rq->curr_runnable_sum. Since partial_demand is derived from historical load samples for a task, rq->curr_runnable_sum could represent "inflated/un-realistic" cpu usage. As an example, lets say that task with partial_demand of 10ms runs for only 1ms on a cpu. What is included in rq->curr_runnable_sum is 10ms (and not the actual execution time of 1ms). This leads to cpu busy time being reported on the upside causing frequency to stay higher than necessary. This patch fixes cpu busy accounting scheme to strictly represent actual usage. It also provides for conditional fixup of busy time upon migration and upon heavy-task wakeup. CRs-Fixed: 691443 Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in init_task_load(), se.avg.decay_count has deprecated.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23sched: window-stats: ftrace event improvementsSrivatsa Vaddagiri
Add two new ftrace event: * trace_sched_freq_alert, to log notifications sent to governor for requesting change in frequency. * trace_sched_get_busy, to log cpu busytime information returned by scheduler Extend existing ftrace events as follows: * sched_update_task_ravg() event to log irqtime parameter * sched_migration_update_sum() to log threadid which is being migrated (and thus responsible for update of curr_runnable_sum and prev_runnable_sum counters) Change-Id: Ia68ce0953a2d21d319a1db7f916c51ff6a91557c Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: improve logic for alerting governorSrivatsa Vaddagiri
Currently we send notification to governor not taking note of cpus that are synchronized with regard to their frequency. As a result, scheduler could send pointless notifications (notification spam!). Avoid this by considering synchronized cpus and alerting governor only when the highest demand of any cpu within cluster far exceeds or falls behind current frequency. Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
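For a frequency-synchronized cluster, the check reduces to one
comparison against the cluster-wide maximum demand; a sketch (names and
the tolerance band are illustrative):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Alert only when the busiest cpu's demand falls outside a band
     * around the cluster's current frequency. */
    static bool cluster_needs_alert(const uint32_t *demand_khz, size_t ncpus,
                                    uint32_t cur_freq_khz, uint32_t band_khz)
    {
            uint32_t max_demand = 0;
            size_t i;

            for (i = 0; i < ncpus; i++)
                    if (demand_khz[i] > max_demand)
                            max_demand = demand_khz[i];

            return max_demand > cur_freq_khz + band_khz ||
                   max_demand + band_khz < cur_freq_khz;
    }
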
2016-03-23sched: Stop task migration to busy CPUs due to power active balanceSyed Rameez Mustafa
Power active balance should only be invoked when the destination CPU is calling load balance with either a CPU_IDLE or a CPU_NEWLY_IDLE environment. We do not want to push tasks towards busy CPUs even they are a more power efficient place to run that task. This can cause higher scheduling latencies due to the resulting load imbalance. Change-Id: I8e0f242338887d189e2fc17acfb63586e7c40839 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: window-stats: Fix accounting bug in legacy modeSrivatsa Vaddagiri
TASK_UPDATE event currently does not result in increment of rq->curr_runnable_sum in legacy mode, which is wrong. As a result, cpu busy time reported under legacy mode could be incorrect. Change-Id: Ifa76c735a0ead23062c1a64faf97e7b801b66bf9 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: window-stats: Note legacy mode in fork() and exit()Srivatsa Vaddagiri
In legacy mode, mark_task_starting() should avoid adding (new) task's (initial) demand to rq->curr_runnable_sum and rq->prev_runnable_sum. Similarly exit() should avoid removing (exiting) task's demand from rq->curr_runnable_sum and rq->prev_runnable_sum (as those counters don't include task's demand and partial_demand values in legacy mode). Change-Id: I26820b1ac5885a9d681d363ec53d6866a2ea2e6f Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23sched: Fix reference to stale task_struct in try_to_wake_up()Srivatsa Vaddagiri
try_to_wake_up() currently drops p->pi_lock and later checks for need to notify cpufreq governor on task migrations or wakeups. However the woken task could exit between the time p->pi_lock is released and the time the test for notification is run. As a result, the test for notification could refer to an exited task. task_notify_on_migrate(p) could thus lead to invalid memory reference. Fix this by running the test for notification with task's pi_lock held. Change-Id: I1c7a337473d2d8e79342a015a179174ce00702e1 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23sched: Remove hack to enable/disable HMP scheduling extensionsSyed Rameez Mustafa
The current method of turning HMP scheduling extensions on or off based on the number of CPUs is inappropriate as there may be SoCs with 4 or less cores that require the use of these extensions. Remove this hack as HMP extensions will now be enabled/disabled via command line options. Change-Id: Id44b53c2c3b3c3b83e1911a834e2c824f3958135 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>