Age          Commit message          (Author)
2016-03-23  sched: add preference for prev and sibling CPU in HMP task placement  (Joonwoo Park)
At present the HMP task placement algorithm places a waking task on any lowest power cost CPU in the system, even if the task's previous CPU is also one of the lowest power cost CPUs. Placing the task on its previous CPU can reduce cache bouncing. Add a bias towards the task's previous CPU and towards CPUs in the same cache domain as the previous CPU.

Change-Id: Ieab3840432e277048058da76764b3a3f16e20c56
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Update task->on_rq when tasks are moving between runqueues  (Olav Haugan)
task->on_rq has three states:

  0                        - Task is not on a runqueue (rq)
  1 (TASK_ON_RQ_QUEUED)    - Task is on an rq
  2 (TASK_ON_RQ_MIGRATING) - Task is on an rq but in the process of being migrated to another rq

When a task is moving between rqs, task->on_rq should be TASK_ON_RQ_MIGRATING.

CRs-fixed: 884720
Change-Id: I1572aba00a0273d4ad5bc9a3dd60fb68e2f0b895
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
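As an illustration of the three on_rq states described above, a minimal user-space model (the constant values match the upstream <linux/sched.h> definitions; the rest is illustrative, not the kernel code):

    #include <stdio.h>

    /* Values as defined upstream in <linux/sched.h>. */
    #define TASK_ON_RQ_QUEUED    1
    #define TASK_ON_RQ_MIGRATING 2

    struct task { int on_rq; int cpu; };

    /* Simplified model of moving a queued task from one rq to another:
     * the task stays marked as "on a runqueue" throughout, but is in the
     * MIGRATING state while it is between the two runqueues. */
    static void move_task(struct task *p, int dst_cpu)
    {
        p->on_rq = TASK_ON_RQ_MIGRATING;  /* dequeued from the source rq */
        p->cpu = dst_cpu;                 /* set_task_cpu() equivalent */
        p->on_rq = TASK_ON_RQ_QUEUED;     /* enqueued on the destination rq */
    }

    int main(void)
    {
        struct task p = { .on_rq = TASK_ON_RQ_QUEUED, .cpu = 0 };
        move_task(&p, 2);
        printf("on_rq=%d cpu=%d\n", p.on_rq, p.cpu);
        return 0;
    }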
2016-03-23  sched: remove temporary demand fixups in fixup_busy_time()  (Syed Rameez Mustafa)
On older kernel versions p->on_rq was a binary value that did not allow distinguishing between enqueued and migrating tasks. As a result fixup_busy_time() had to do temporary load adjustments to ensure that update_history() did not make incorrect demand adjustments for migrating tasks. Since p->on_rq can now be used to distinguish between migrating and enqueued tasks, there is no need for these temporary load calculations. Instead make sure update_history() only does load adjustments on enqueued tasks.

Change-Id: I1f800ac61a045a66ab44b9219516c39aa08db087
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: add frequency zone awareness to the load balancer  (Syed Rameez Mustafa)
Add zone awareness to the load balancer. Remove all earlier restrictions that the load balancer had for inter cluster kicks and migration. Change-Id: I12ad3d0c2d2e9bb498f49a231810f2ad418b061f Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in nohz_kick_needed() due to its return type change.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Update the wakeup placement logic for fair and rt tasks  (Syed Rameez Mustafa)
For the fair sched class, update the select_best_cpu() policy to do power based placement. The hope is to minimize the voltage at which the CPU runs. While RT tasks already do power based placement, their placement preference has to now take into account the power cost of all tasks on a given CPU. Also remove the check for sched_boost since sched_boost no longer intends to elevate all tasks to the highest capacity cluster. Change-Id: Ic6a7625c97d567254d93b94cec3174a91727cb87 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: remove the notion of small tasks and small task packing  (Syed Rameez Mustafa)
Task packing will now be determined solely on the basis of the power cost of task placement. All tasks are eligible for packing. Remove the notion of "small" tasks from the scheduler. Change-Id: I72d52d04b2677c6a8d0bc6aa7d50ff0f1a4f5ebb Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: Rework energy aware scheduling  (Syed Rameez Mustafa)
Energy aware core rotation is not compatible with the power based task placement being introduced in subsequent patches. Remove all existing EA based task placement/migration logic. power_cost() is the only function remaining. This function has been modified to return the total power cost associated with a task on a given CPU taking existing load on that CPU into account. Change-Id: Ia00501e3cbfc6e11446a9a2e93e318c4c42bdab4 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed multiple conflicts in fair.c and minor conflict in features.h] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: encourage idle load balance and discourage active load balance  (Joonwoo Park)
Encourage IDLE and NEWLY_IDLE load balance by ignoring cache hotness and discourage active load balance by increasing busy balancing failure threshold. Such changes are for idle CPUs to help out busy CPUs more aggressively and reduce unnecessary active load balance within the same CPU domain. Change-Id: I22f6aba11932ccbb82a436c0532589c46f9148ed [joonwoop@codeaurora.org: fixed conflict in need_active_balance() and can_migrate_task().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: avoid stale cumulative_runnable_avg HMP statistics  (Joonwoo Park)
When a new window starts for a task that is on a runqueue, the scheduler momentarily decreases the rq's cumulative_runnable_avg, re-accounts the task's demand, and then increases the rq's cumulative_runnable_avg with the newly accounted demand. There is therefore a short period during which the rq's cumulative_runnable_avg is less than what it is supposed to be. Meanwhile, another CPU searching for the best CPU on which to place a task may make a suboptimal decision based on this momentarily stale cumulative_runnable_avg. Fix this by adding or subtracting the delta between the task's old and new demand instead of decrementing and incrementing by the task's entire load.

Change-Id: I3c9329961e6f96e269fa13359e7d1c39c4973ff2
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
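The fix described above amounts to applying only the demand delta to the runqueue statistic, so it never transiently dips. A simplified model (names are illustrative, not the kernel's):

    /* Re-account a task's demand at window rollover.  Instead of
     * "rq -= old demand; ...; rq += new demand" (which leaves a window in
     * which the rq statistic is too small), apply only the signed delta. */
    struct rq_stats { unsigned long long cumulative_runnable_avg; };

    static void fixup_demand(struct rq_stats *rq, unsigned long long *task_demand,
                             unsigned long long new_demand)
    {
        long long delta = (long long)new_demand - (long long)*task_demand;

        *task_demand = new_demand;
        rq->cumulative_runnable_avg += delta;   /* single adjustment, no dip */
    }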
2016-03-23  sched: Add load based placement for RT tasks  (Syed Rameez Mustafa)
Currently RT tasks prefer to go to the lowest power CPU in the system. This can end up causing contention on the lowest power CPU. Instead ensure that RT tasks end up on the lowest power cluster and the least loaded CPU within that cluster. Change-Id: I363b3d43236924962c67d2fb5d3d2d09800cd994 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: Avoid running idle_balance() consecutively  (Syed Rameez Mustafa)
With the introduction of "6dd123a sched: update ld_moved for active balance from the load balancer" the function load_balance() returns a non zero number of migrated tasks in anticipation of tasks that will end up on that CPU via active migration. Unfortunately on kernel versions 3.14 and beyond this ends up breaking pick_next_task_fair() which assumes that the load balancer only returns non zero numbers for tasks already migrated on to the destination CPU. A non zero number then triggers a rerun of the pick_next_task_fair() logic so that it can return one of the migrated tasks as the next task. When the load balancer returns a non zero number for tasks that will be moved via active migration, the rerun of pick_next_task_fair() finds the CPU to still have no runnable tasks. This in turn causes a rerun of idle_balance() and possibly migrating another task. Hence the destination CPU can unintentionally end up pulling several tasks. The intent of the change above is still necessary though to indicate termination of load balance at higher scheduling domains when active migration occurs. Achieve the same effect by using continue_balancing instead of faking the number of pulled tasks. This way pick_next_task_fair() stays happy and load balance stops at higher scheduling domains. Change-Id: Id223a3287e5d401e10fbc67316f8551303c7ff96 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: inline function scale_load_to_cpu()  (Joonwoo Park)
Inline relatively small and frequently used function scale_load_to_cpu(). CRs-fixed: 849655 Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: look for least busy and fallback CPU only when it's needed  (Joonwoo Park)
Function best_small_task_cpu() is biased towards mostly idle CPUs and shallow C-state CPUs, so the chance of needing to find the least busy CPU or the least power cost fallback CPU is typically quite rare. At present, however, the function always finds both of those CPUs, which is unnecessary most of the time. Optimize the function to look for the least busy CPU and the least power cost fallback CPU only when they are needed. This change is solely an optimization and makes no functional change.

CRs-fixed: 849655
Change-Id: I5eca11436e85b448142a7a7644f422c71eb25e8e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: iterate search CPUs starting from prev_cpu for optimization  (Joonwoo Park)
Function best_small_task_cpu() looks for a mostly idle CPU and returns it as the best CPU for a given small task. At present, however, it cannot break out of the CPU search loop when it finds a mostly idle CPU, because it must also find and return the task's previous CPU as the best CPU, to avoid an unnecessary migration when the previous CPU is mostly idle. Optimize best_small_task_cpu() to iterate over the search CPUs starting from the task's previous CPU so the loop can be broken as soon as a mostly idle CPU is found. This optimization saves the few hundred nanoseconds spent by the function and makes no functional change.

CRs-fixed: 849655
Change-Id: I8c540963487f4102dac4d54e9f98e24a4a92a7b3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Optimize the select_best_cpu() "for" loop  (Syed Rameez Mustafa)
select_best_cpu() is agnostic of the hardware topology. This means that certain functions such as task_will_fit() and skip_cpu() are run unnecessarily for every CPU in a cluster whereas they need to run only once per cluster. Reduce the execution time of select_best_cpu() by ensuring these functions run only once per cluster. The frequency domain mask is used to identify CPUs that fall in the same cluster. CRs-fixed: 849655 Change-Id: Id24208710a0fc6321e24d9a773f00be9312b75de Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: added continue after clearing search_cpus. fixed indentations with space. fixed skip_cpu() to return true when rq == task_rq.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Optimize select_best_cpu() to reduce execution time  (Syed Rameez Mustafa)
select_best_cpu() is a crucial wakeup routine that determines the time taken by the scheduler to wake up a task. Optimize this routine to get higher performance. The following changes have been made as part of the optimization, listed in the order in which they build on one another:

* Several routines called by select_best_cpu() recalculate task load and CPU load even though these are already known quantities. For example mostly_idle_cpu_sync() calculates CPU load; task_will_fit() calculates task load before spill_threshold_crossed() recalculates both. Remove these redundant calculations by moving the task load and CPU load computations into the select_best_cpu() 'for' loop and passing them to any functions that need the information.

* Rewrite best_small_task_cpu() to avoid the existing two pass approach. The two pass approach was only in place to find the minimum power cluster for small task placement. This information can easily be established by looking at runqueue capacities. The cluster that does not have the highest capacity constitutes the minimum power cluster. A special CPU mask called the mpc_mask is required to safeguard against undue side effects on SMP systems. Also terminate the function early if the previous CPU is found to be mostly_idle.

* Reorganize code to ensure that no unnecessary computations or variable assignments are done. For example there is no need to compute CPU load if that information does not end up getting used in any iteration of the 'for' loop.

* The tick logic for EA migrations unnecessarily checks the power of all CPUs only for skip_cpu() to throw away the result later. Ensure that for EA we only check CPUs within the same cluster and avoid running select_best_cpu() whenever possible.

CRs-fixed: 849655
Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask. added a comment about prerequisite of lower_power_cpu_available(). s/struct rq * rq/struct rq *rq/. s/TASK_NICE/task_nice/]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched/debug: Add Kconfig to trigger panics on all 'BUG:' conditions  (Matt Wagantall)
Introduce CONFIG_PANIC_ON_SCHED_BUG to trigger panics along with all 'BUG:' prints from the scheduler core, even potentially-recoverable ones such as scheduling while atomic, sleeping from invalid context, and detection of broken arch topologies. Change-Id: I5d2f561614604357a2bc7900b047e53b3a0b7c6d Signed-off-by: Matt Wagantall <mattw@codeaurora.org> [joonwoop@codeaurora.org: fixed trivial merge conflict in lib/Kconfig.debug.] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: fix incorrect prev_runnable_sum accounting with long ISR run  (Joonwoo Park)
At present, when an IRQ handler spans multiple scheduler windows, the HMP scheduler resets the IRQ CPU's prev_runnable_sum with its current max capacity under the assumption that there can be no other contribution to that CPU's prev_runnable_sum. This isn't correct, as another CPU can migrate tasks to the IRQ CPU. Furthermore this can trigger a BUG_ON() if the migrated task's prev_window is larger than the migrating CPU's current capacity, in the following scenario:

1. An ISR on the power efficient CPU has been running for multiple windows.
2. A task whose prev_window is higher than the IRQ CPU's current capacity migrates to the IRQ CPU.
3. IRQ servicing completes and the IRQ CPU resets its prev_runnable_sum to the CPU's current capacity.
4. Before window rollover, the task on the IRQ CPU migrates to another CPU and fixes up the source and destination CPUs' busy time.
5. BUG_ON(src_rq->prev_runnable_sum < 0) triggers, as p->ravg.prev_window is larger than src_rq->prev_runnable_sum.

Fix this by preserving prev_runnable_sum when an ISR spans multiple scheduler windows. There is no need to reset it.

CRs-fixed: 828055
Change-Id: I1f95ece026493e49d3810f9c940ec5f698cc0b81
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: prevent task migration while governor queries CPUs' load  (Joonwoo Park)
At present, the governor retrieves each CPU's load sequentially. This leaves a window for a race between the governor's CPU load query and task migration, which can result in less CPU load being reported than actual. For example, with CPU0 load = 30% and CPU1 load = 50%:

  Governor                        Load balancer
  - sched_get_busy(cpu 0) = 30%
                                  - A task 'p' migrates from CPU 1 to CPU 0.
                                    p->ravg->prev_window = 50.
                                    Now CPU 0's load = 80%, CPU 1's load = 0%.
  - sched_get_busy(cpu 1) = 0%

The 50% of load that moved from CPU 1 to CPU 0 is never accounted. Fix this by introducing a new API, sched_get_cpus_busy(), which lets the governor retrieve the load of a set of CPUs. The load set is constructed internally with the load balancer blocked, to ensure migration cannot occur in the meantime.

Change-Id: I4fa4dd1195eff26aa603829aca2054871521495e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
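The essence of the new API is a single critical section covering all queried CPUs, so a migration cannot fall between two reads. A minimal user-space model (names are illustrative; this is not the kernel's sched_get_cpus_busy() signature):

    #include <pthread.h>

    #define NR_CPUS 4

    static pthread_mutex_t migration_lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned int cpu_load[NR_CPUS];     /* percent busy per CPU */

    /* Snapshot the load of a set of CPUs atomically with respect to
     * migration, instead of one lock/unlock per CPU. */
    static void get_cpus_busy(unsigned int *busy, const int *cpus, int n)
    {
        pthread_mutex_lock(&migration_lock);   /* migrations excluded meanwhile */
        for (int i = 0; i < n; i++)
            busy[i] = cpu_load[cpus[i]];
        pthread_mutex_unlock(&migration_lock);
    }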
2016-03-23  sched: report loads greater than 100% only during load alert notifications  (Srivatsa Vaddagiri)
The busy time of CPUs is adjusted during task migrations. This can result in reporting the load greater than 100% to the governor and causes direct jumps to the higher frequencies during the intra cluster migrations. Hence clip the load to 100% during the load reporting at the end of the window. The load is not clipped for load alert notifications which allows ramping up the frequency faster for inter cluster migrations and heavy task wakeup scenarios. Change-Id: I7347260aa476287ecfc706d4dd0877f4b75a1089 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org> Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
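In code terms the policy amounts to clipping only the end-of-window report, not the load-alert path. A simplified sketch (names are illustrative):

    /* Busy time reported for a window.  Migration fixups can push the
     * accumulated busy time past the window size; clip it for the normal
     * end-of-window report, but let load-alert notifications see the raw
     * value so the governor can ramp up faster. */
    static unsigned long long reported_busy(unsigned long long busy_ns,
                                            unsigned long long window_ns,
                                            int load_alert)
    {
        if (!load_alert && busy_ns > window_ns)
            busy_ns = window_ns;
        return busy_ns;
    }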
2016-03-23  sched: turn off the TTWU_QUEUE feature  (Syed Rameez Mustafa)
While the feature TTWU_QUEUE has the advantage of reducing cache bouncing of runqueue locks, it has the side effect that runqueue statistics are not updated until the remote CPU has a chance to enqueue the task. Since there is no upper bound on the amount of time it can take the remote CPU to enqueue the task, several sequential wakeups can result in suboptimal task placement based on the stale statistics. Turn off the feature as the cost of sub-optimal placement is much higher than the cost of cache bouncing spinlocks for msm based systems. Change-Id: I0b85c0225237b2bc44f54934769f5e3750c0f3d6 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: avoid unnecessary HMP scheduler stat re-accounting  (Joonwoo Park)
When a sched_entity's runnable average changes, we decrease and then increase the HMP scheduler's statistics for that sched_entity, before and after the update, to take the updated runnable average into account. During that period, however, other CPUs see the updating CPU's load as less than it actually is. This is suboptimal and can lead to improper task placement and load balance decisions. Such a situation can be avoided, at least with window based load tracking, since the sched_entity's PELT load average does not affect the HMP scheduler's load tracking statistics. Therefore update the HMP statistics only when the HMP scheduler uses PELT based load statistics.

Change-Id: I9eb615c248c79daab5d22cbb4a994f94be6a968d
[joonwoop@codeaurora.org: applied fix into __update_load_avg() instead of update_entity_load_avg().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched/fair: Fix capacity and nr_run comparisons in can_migrate_task()  (Syed Rameez Mustafa)
Kernel version 3.18 and beyond alter the definition of sgs->group_capacity whereby it reflects the load a group is capable of taking. In previous kernel versions the term used to refer to the number of effective CPUs available. This change breaks the comparison of capacity with the number of running tasks on a group. To fix this convert the capacity metric before doing the comparison. Change-Id: I3ebd941273edbcc903a611d9c883773172e86c8e Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in can_migrate_task().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  Revert "sched: Use only partial wait time as task demand"  (Joonwoo Park)
This reverts commit 0e2092e47488 ("sched: Use only partial wait time as task demand") as it causes performance regression. Change-Id: I3917858be98530807c479fc31eb76c0f22b4ea89 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched/deadline: Add basic HMP extensions  (Syed Rameez Mustafa)
Some HMP extensions have to be supported by all scheduling classes irrespective of them using HMP task placement or not. Add these basic extensions to make deadline scheduling work. Also during the tick, if a deadline task gets throttled, its HMP stats get decremented as part of the dequeue. However, the throttled task does not update its on_rq flag causing HMP stats to be double decremented when update_history() is called as part of a window rollover. Avoid this by checking for throttled deadline tasks before subtracting and adding the deadline tasks load from the rq cumulative runnable avg. Change-Id: I9e2ed6675a730f2ec830f764f911e71c00a7d87a Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: Fix racy invocation of fixup_busy_time via move_queued_task  (Vikram Mulukutla)
set_task_cpu uses fixup_busy_time to redistribute a task's load information between source and destination runqueues. fixup_busy_time assumes that both source and destination runqueue locks have been acquired if the task is not being concurrently woken up. However this is no longer true, since move_queued_task does not acquire the destination CPU's runqueue lock due to optimizations brought in by recent kernels. Acquire both source and destination runqueue locks before invoking set_task_cpu in move_queued_tasks. Change-Id: I39fadf0508ad42e511db43428e52c8aa8bf9baf6 Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org> [joonwoop@codeaurora.org: fixed conflict in move_queued_task().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: don't inflate the task load when the CPU max freq is restricted  (Pavankumar Kondeti)
When the CPU max freq is restricted and the CPU is running at the max freq, the task load is inflated by max_possible_freq/max_freq factor. This results in tasks migrating early to the better capacity CPUs which makes things worse if the frequency restriction is due to the thermal condition. Change-Id: Ie0ea405d7005764a6fb852914e88cf97102c138a Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23  sched: auto adjust the upmigrate and downmigrate thresholds  (Pavankumar Kondeti)
The load scale factor of a CPU gets boosted when its max freq is restricted. A task's load at the same frequency is scaled higher than normal under this scenario. This results in tasks migrating early to the better capacity CPUs, and their residency there also increases since their inflated load would be relatively higher than the downmigrate threshold. Auto adjust the upmigrate and downmigrate thresholds by a factor equal to rq->max_possible_freq/rq->max_freq of a lower capacity CPU. If the adjusted upmigrate threshold exceeds the window size, it is clipped to the window size. If the adjusted downmigrate threshold decreases the difference between the upmigrate and downmigrate, it is clipped to a value such that the difference between the modified and the original thresholds is the same.

Change-Id: Ifa70ee5d4ca5fe02789093c7f070c77629907f04
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
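One reading of the adjustment and clipping rules above, as a self-contained sketch (units and names are illustrative, not the kernel's):

    #include <stdio.h>

    /* Scale both thresholds by max_possible_freq/max_freq of a lower
     * capacity CPU, clip the upmigrate threshold to the window size, and
     * keep the up/down gap no smaller than the original gap. */
    static void adjust_thresholds(unsigned int up, unsigned int down,
                                  unsigned int window,
                                  unsigned int max_possible_freq,
                                  unsigned int max_freq,
                                  unsigned int *up_adj, unsigned int *down_adj)
    {
        *up_adj = (unsigned long long)up * max_possible_freq / max_freq;
        *down_adj = (unsigned long long)down * max_possible_freq / max_freq;

        if (*up_adj > window)
            *up_adj = window;                   /* clip to window size */
        if (*up_adj - *down_adj < up - down)
            *down_adj = *up_adj - (up - down);  /* preserve the original gap */
    }

    int main(void)
    {
        unsigned int up_adj, down_adj;

        /* up=80, down=60, window=100, max freq restricted to 2/3 of possible */
        adjust_thresholds(80, 60, 100, 1500000, 1000000, &up_adj, &down_adj);
        printf("up=%u down=%u\n", up_adj, down_adj);   /* prints up=100 down=80 */
        return 0;
    }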
2016-03-23  sched: don't inherit initial task load from the parent  (Pavankumar Kondeti)
A child task is not supposed to inherit the initial task load attribute from its parent. Reset the child's init_load_pct attribute during fork.

Change-Id: I458b121f10f996fda364e97b51aaaf6c345c1dbb
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23  sched/fair: Add irq load awareness to the tick CPU selection logic  (Olav Haugan)
IRQ load is not taken into account when determining whether a task should be migrated to a different CPU. A task that runs for a long time could get stuck on CPU with high IRQ load causing degraded performance. Add irq load awareness to the tick CPU selection logic. CRs-fixed: 809119 Change-Id: I7969f7dd947fb5d66fce0bedbc212bfb2d42c8c1 Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23  sched: disable IRQs in update_min_max_capacity  (Steve Muckle)
IRQs must be disabled while locking runqueues since an interrupt may cause a runqueue lock to be acquired. CRs-fixed: 828598 Change-Id: Id66f2e25ed067fc4af028482db8c3abd3d10c20f Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23  sched: Use only partial wait time as task demand  (Syed Rameez Mustafa)
The scheduler currently either considers a task's entire wait time as task demand or completely ignores wait time, based on the tunable sched_account_wait_time. Both approaches have their limitations, however. The former artificially boosts a task's demand when it may not actually be justified. With the latter, the scheduler runs the risk of never being able to recognize true load (consider two CPU hogs on a single little CPU). To achieve a compromise between these two extremes, change the load tracking algorithm to only consider part of a task's wait time as its demand. The portion of wait time accounted as demand is determined by each task's percent load: for a task that waits for 10 ms and has 60% task load, only 6 ms of the wait contributes to task demand. This approach is more fair, as the scheduler now tries to determine how much of its wait time a task would actually have spent using the CPU had it been executing. It ensures that tasks with high demand continue to see most of the benefit of accounting wait time as busy time, while lower demand tasks don't experience a disproportionately high boost to demand that would trigger unjustified big CPU usage. Note that this new approach applies only to wait time considered as task demand and not to wait time considered as CPU busy time. To achieve the above effect, ensure that any time a task is waiting, its runtime in every relevant window segment is appropriately adjusted using its percent load.

Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
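The 10 ms / 60% example above corresponds to scaling the wait time by the task's percent load, e.g.:

    #include <stdio.h>

    /* Portion of wait time accounted as task demand (illustrative model). */
    static unsigned long long wait_demand_ns(unsigned long long wait_ns,
                                             unsigned int pct_load)
    {
        return wait_ns * pct_load / 100;
    }

    int main(void)
    {
        /* A task waiting 10 ms at 60% load contributes 6 ms of demand. */
        printf("%llu ns\n", wait_demand_ns(10000000ULL, 60));   /* 6000000 ns */
        return 0;
    }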
2016-03-23  sched: fix race conditions where HMP tunables change  (Joonwoo Park)
When multiple threads race to update HMP scheduler tunables, at present, tunables that require a big/small task count fix-up can be updated without the fix-up, which can trigger a BUG_ON(). This happens because sched_hmp_proc_update_handler() acquires rq locks and does the fix-up only when a tunable affecting the big/small task count is updated, even though it calls set_hmp_defaults(), which re-calculates all sysctl input data at that point. Consequently a thread updating a tunable that does not affect the big/small task count can call set_hmp_defaults() and thereby update a tunable that does affect the big/small task count without fix-up, if another thread has just set a fix-up-requiring sysctl value. Example of the problem scenario:

  thread 0                                     thread 1
  Set sched_small_task - needs fix up.
                                               Set sched_init_task_load - no fix up needed.
  proc_dointvec_minmax() completed, which
  means sysctl_sched_small_task has the
  new value.
                                               Call set_hmp_defaults() without lock/fixup.
                                               set_hmp_defaults() still updates
                                               sched_small_tasks with the new
                                               sysctl_sched_small_task value set by thread 0.

Fix this by wrapping the proc update handler with the already existing policy mutex.

CRs-fixed: 812443
Change-Id: I7aa4c0efc1ca56e28dc0513480aca3264786d4f7
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: check HMP scheduler tunables validity  (Joonwoo Park)
Validate HMP scheduler tunables so that only valid values are accepted.

CRs-fixed: 812443
Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Update max_capacity when an entire cluster is hotplugged  (Syed Rameez Mustafa)
When an entire cluster is hotplugged, the scheduler's notion of max_capacity can get outdated. This introduces the following inefficiencies in behavior:

* task_will_fit() does not return true on all tasks. Consequently all big tasks go through fallback CPU selection logic skipping C-state and power checks in select_best_cpu().

* During boost, migration_needed() returns true unnecessarily causing an avoidable rerun of select_best_cpu().

* An unnecessary kick is sent to all little CPUs when boost is set.

* An opportunity for early bailout from nohz_kick_needed() is lost.

Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback, which indicates the last CPU in a cluster being hotplugged out. Also modify update_min_max_capacity() to only iterate through online CPUs instead of possible CPUs. While we can't guarantee the integrity of the cpu_online_mask in the notifier callback, the scheduler will fix up all state soon after any changes to the online mask.

The change does have one side effect: early termination from the notifier callback when min_max_freq or max_possible_freq remain unchanged is no longer possible. This is because when the last CPU in a cluster is hot removed, only max_capacity is updated without affecting min_max_freq or max_possible_freq. Therefore, when the first CPU in the same cluster gets hot added at a later point, max_capacity must once again be recomputed despite there being no change in min_max_freq or max_possible_freq.

Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: Ensure attempting load balance when HMP active balance flags are set  (Syed Rameez Mustafa)
find_busiest_group() can end up returning a NULL group due to load based checks even though there are tasks that can be migrated to higher capacity CPUs (LBF_BIG_TASK_ACTIVE_BALANCE) or EA core rotation is possible (LBF_EA_ACTIVE_BALANCE). To get best power and performance ensure that load balance does attempt to pull tasks when HMP_ACTIVE_BALANCE flag is set. Since sched boost also falls under the same category club it into the same generic condition. Change-Id: I3db7ec200d2a038917b1f2341602eb87b5aed289 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: add scheduling latency tracking procfs node  (Joonwoo Park)
Add a new procfs node /proc/sys/kernel/sched_max_latency_us to track the worst scheduling latency. It provides easier way to identify maximum scheduling latency seen across the CPUs. Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e [joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: warn/panic upon excessive scheduling latency  (Joonwoo Park)
Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and /proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic for the cases that tasks are runnable but not scheduled more than configured time. This helps to find out unacceptably high scheduling latency more easily. Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82 [joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched/core: Fix incorrect wait time and wait count statistics  (Joonwoo Park)
At present scheduler resets task's wait start timestamp when the task migrates to another rq. This misleads scheduler itself into reporting less wait time than actual by omitting time spent for waiting prior to migration and also more wait count than actual by counting migration as wait end event which can be seen by trace or /proc/<pid>/sched with CONFIG_SCHEDSTATS=y. Carry forward migrating task's wait time prior to migration and don't count migration as a wait end event to fix such statistics error. In order to determine whether task is migrating mark task->on_rq with TASK_ON_RQ_MIGRATING while dequeuing and enqueuing due to migration. Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: ohaugan@codeaurora.org Link: http://lkml.kernel.org/r/20151113033854.GA4247@codeaurora.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [joonwoop@codeaurora.org: fixed minor conflict in detach_task().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org> Change-Id: I2d7f7d9895815430ad61383e62d28d889cce66c3
2016-03-23  sched: Update cur_freq in the cpufreq policy notifier callback  (Syed Rameez Mustafa)
At boot, the cpufreq framework sends transition notifiers before sending out the policy notifier. Since the scheduler relies on the policy notifier to build up the frequency domain masks, when the initial set of transition notifiers are sent, the scheduler has no frequency domains. As a result the scheduler fails to update the cur_freq information. Update cur_freq as part of the policy notifier so that the scheduler always has the current frequency information. Change-Id: I7bd2958dfeb064dd20b9ccebafd372436484e5d6 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: avoid CPUs with high irq activity for non-small tasks  (Joonwoo Park)
The irq-aware scheduler is meant to achieve better performance by avoiding task placement on CPUs with high irq activity. However, the current scheduler preferentially places non-small tasks on CPUs that are loaded with irq activity, the opposite of what is intended. This is suboptimal for both power and performance. Fix the task placement algorithm to avoid CPUs with significant irq activity.

Change-Id: Ifa5a6ac186241bd58fa614e93e3d873a5f5ad4ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: actively migrate big tasks on power CPU to idle performance CPU  (Joonwoo Park)
When a performance CPU runs the idle or newly-idle load balancer to pull a task from a power efficient CPU, the load balancer always fails and enters idle mode if the big task on the power efficient CPU is running. This is suboptimal when the running task doesn't fit on the power efficient CPU, as it is quite possible that the big task will remain on the power efficient CPU until it is preempted while a performance CPU sits idle. Revise the load balancer algorithm to actively migrate big tasks from a power efficient CPU to a performance CPU when the performance CPU runs the idle or newly-idle load balancer.

Change-Id: Iaf05e0236955fdcc7ded0ff09af0880050a2be32
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in group_classify().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Add cgroup-based criteria for upmigration  (Srivatsa Vaddagiri)
It may be desirable to discourage upmigration of tasks belonging to some cgroups. Add a per-cgroup flag (upmigrate_discourage) that discourages upmigration of tasks of a cgroup. Tasks of the cgroup are allowed to upmigrate only under overcommitted scenario. Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [rameezmustafa@codeaurora.org: Use new cgroup APIs] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: avoid running idle_balance() on behalf of wrong CPU  (Joonwoo Park)
With EA (Energy Awareness), idle_balance() on a CPU runs on behalf of the most power efficient idle CPU among the CPUs in its sched domain level, under the condition that the substitute idle CPU must have the same capacity as the original idle CPU. It is found that at present idle_balance() spans all the CPUs in its sched domain and may run the idle balancer on behalf of any CPU within the domain, which can be any CPU in the system. Consequently the idle balancer on a performance CPU always runs on behalf of a power efficient idle CPU. This causes idle performance CPUs to always fail to pull tasks from power efficient CPUs when there is only one online performance CPU. Fix this by limiting the search to CPUs that share a cache with the original idle CPU, so the idle balancer still runs on behalf of a more power efficient CPU that has the same capacity as the original CPU.

Change-Id: I0575290c24f28db011d9353915186e64df7e57fe
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23  sched: Keep track of average nr_big_tasks  (Srivatsa Vaddagiri)
Extend sched_get_nr_running_avg() API to return average nr_big_tasks, in addition to average nr_running and average nr_io_wait tasks. Also add a new trace point to record values returned by sched_get_nr_running_avg() API. Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [rameezmustafa@codeaurora.org: Resolve trivial merge conflicts] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: Fix bug in average nr_running and nr_iowait calculation  (Srivatsa Vaddagiri)
sched_get_nr_running_avg() returns the average nr_running and nr_iowait task counts since it was last invoked. Fix several bugs in their calculation:

* sched_update_nr_prod() needs to consider that the nr_running count can change by more than 1 when the CFS_BANDWIDTH feature is used.

* sched_get_nr_running_avg() needs to sum up the nr_iowait count across all cpus, rather than just one.

* sched_get_nr_running_avg() could race with sched_update_nr_prod(), as a result of which it could use a curr_time which is behind a cpu's 'last_time' value. That would lead to erroneous calculation of average nr_running or nr_iowait.

While at it, also fix a bug in the BUG_ON() check in sched_update_nr_prod() and remove the unnecessary nr_running argument to sched_update_nr_prod().

Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23  sched: Avoid pulling all tasks from a CPU during load balance  (Syed Rameez Mustafa)
When running load balance, the destination CPU checks the number of running tasks on the busiest CPU without holding the busiest CPU's runqueue lock. This opens the load balancer to a race whereby a third CPU running load balance at the same time, having found the same busiest group and queue, may have already pulled one of the waiting tasks from the busiest CPU. Under scenarios where the source CPU is running the idle task and only a single task remains waiting on the busiest runqueue (nr_running = 1), the destination CPU will end up pulling the only enqueued task from that CPU, leaving the source CPU with nothing left to run. Fix this race by reconfirming nr_running for the busiest CPU after its runqueue lock has been obtained.

Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
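The race closes by re-reading the count after the lock is taken; a minimal user-space model of that pattern (names are illustrative, not the kernel's load balance code):

    #include <pthread.h>

    struct rq_model {
        pthread_mutex_t lock;
        int nr_running;
    };

    /* Re-confirm under the lock that the busiest rq still has more than one
     * runnable task before detaching one; another balancer may have already
     * pulled it since the unlocked check. */
    static int try_pull_one(struct rq_model *busiest)
    {
        int pulled = 0;

        pthread_mutex_lock(&busiest->lock);
        if (busiest->nr_running > 1) {
            busiest->nr_running--;          /* detach one task */
            pulled = 1;
        }
        pthread_mutex_unlock(&busiest->lock);
        return pulled;
    }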
2016-03-23  sched: Avoid pulling big tasks to the little cluster during load balance  (Syed Rameez Mustafa)
When a lower capacity CPU attempts to pull work from a higher capacity CPU, during load balance, it does not distinguish between tasks that will fit or not fit on the destination CPU. This causes suboptimal load balancing decisions whereby big tasks end up on the lower capacity CPUs and little tasks remain on higher capacity CPUs. Avoid this behavior, by first restricting search to only include tasks that fit on the destination CPU. If such a task cannot be found, remove this restriction so that any task can be pulled over to the destination CPU. This behavior is not applicable during sched_boost, however, as none of the tasks will fit on a lower capacity CPU. Change-Id: I1093420a629a0886fc3375849372ab7cf42e928e Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org> [joonwoop@codeaurora.org: fixed minor conflict in can_migrate_task().] Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
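The restriction-then-fallback search described above, as a small self-contained sketch (illustrative; not the kernel's detach path):

    #include <stdbool.h>
    #include <stddef.h>

    struct task_model { unsigned int load; };

    static bool task_fits(const struct task_model *t, unsigned int dst_capacity)
    {
        return t->load <= dst_capacity;
    }

    /* First pass: only consider tasks that fit on the lower capacity
     * destination CPU.  Second pass (only if the first found nothing):
     * drop the restriction and take any task.  Under sched_boost the
     * restriction is not applied at all. */
    static const struct task_model *pick_task(const struct task_model *tasks,
                                              size_t n, unsigned int dst_capacity,
                                              bool sched_boost)
    {
        if (!sched_boost) {
            for (size_t i = 0; i < n; i++)
                if (task_fits(&tasks[i], dst_capacity))
                    return &tasks[i];
        }
        return n ? &tasks[0] : NULL;
    }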
2016-03-23  sched: fix rounding error on scaled execution time calculation  (Joonwoo Park)
It's found that the scaled execution time can be less than its actual time due to rounding errors. The HMP scheduler accumulates scaled execution time of tasks to determine if tasks are in need of up-migration. But the rounding error prevents the HMP scheduler from accumulating 100% load which prevents us from ever reaching an up-migrate of 100%. Fix rounding error by rounding quotient up. CRs-fixed: 759041 Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52 Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
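To see why truncation keeps the accumulated load from reaching 100%, and how rounding the quotient up avoids the shortfall, a small self-contained demonstration (the scaling factors here are made up; the kernel's are different):

    #include <stdio.h>

    /* Scale an execution-time chunk by num/den, truncating vs rounding up. */
    static unsigned long long scale_trunc(unsigned long long ns,
                                          unsigned int num, unsigned int den)
    {
        return ns * num / den;
    }

    static unsigned long long scale_round_up(unsigned long long ns,
                                             unsigned int num, unsigned int den)
    {
        return (ns * num + den - 1) / den;
    }

    int main(void)
    {
        unsigned long long t = 0, u = 0;

        /* A 3000 ns busy period accounted in three 1000 ns chunks. */
        for (int i = 0; i < 3; i++) {
            t += scale_trunc(1000, 2, 3);      /* 666 per chunk */
            u += scale_round_up(1000, 2, 3);   /* 667 per chunk */
        }
        /* Scaling the whole period at once would give 2000; truncating each
         * chunk accumulates only 1998 and can never catch up. */
        printf("trunc=%llu round_up=%llu exact=%llu\n", t, u, 3000ULL * 2 / 3);
        return 0;
    }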
2016-03-23  sched/fair: Respect wake to idle over sync wakeup  (Olav Haugan)
Sync wakeup currently takes precedence over the wake-to-idle flag. A sync wakeup causes a task to be placed on a non-idle CPU because we expect this CPU to become idle very shortly. However, even though the sync flag is set, there is no guarantee that the task will go to sleep right away. As a consequence performance suffers. Fix this by preferring an idle CPU over a potentially busy CPU when both wake to idle and sync wakeup are set.

Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d
CRs-fixed: 794424
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>