path: root/kernel/sched/sched.h
author	Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2016-05-19 17:06:47 -0700
committer	Syed Rameez Mustafa <rameezmustafa@codeaurora.org>	2016-10-17 12:43:54 -0700
commit	7e1a4f15b2c38ea0d0207a6fc95b721c09d6f994 (patch)
tree	b1cf4fc3beaaf60acd4dbe14af477b6b75475771 /kernel/sched/sched.h
parent	eb7300e9a89edf0692fa53dbb6cb4214f9130927 (diff)
sched: Enhance the scheduler migration load fixup feature
In the current frequency guidance implementation, the scheduler migrates task load from the source CPU to the destination CPU when a task migrates. The underlying assumption is that a task will stay on the destination CPU following the migration. Hence, a CPU's load should reflect the sum of all tasks that last ran on that CPU prior to window expiration, even if these tasks executed on some other CPU in that window prior to being migrated.

However, given the ubiquitous nature of migrations, the above assumption is flawed, causing the scheduler to often add up load on a single CPU that in reality ran concurrently on multiple CPUs and will continue to run concurrently in subsequent windows. This leads to load over-reporting on a single CPU, which in turn causes CPU frequency to be higher than necessary.

This is the first patch in a series of patches that attempts to change how load fixups are done upon migration to prevent load over-reporting. In this patch, we stop doing migration fixups for intra-cluster migrations. Inter-cluster migration fixups are still retained.

In order to achieve the above, we make use of the per-CPU footprint of each task introduced in the previous patch. Upon inter-cluster migration, we go through every CPU in the source cluster to subtract the migrating task's contribution to the busy time on each one of those CPUs. The sum of the contributions is then added to the destination CPU, allowing it to ramp up to the appropriate frequency for that task.

Subtracting load from each of the source CPUs is not trivial, however, as it would require all runqueue locks to be held. To get around this, we introduce a deferred load subtraction mechanism whereby subtracting load from each of the source CPUs is deferred until an opportune moment. This opportune moment is when the governor comes asking the scheduler for load. At that time, all necessary runqueue locks are already held.

There are a few cases to consider when doing deferred subtraction. Since we are not holding all runqueue locks, other CPUs in the source cluster can be in a different window than the source CPU where the task is migrating from.

Case 1: Other CPU in the source cluster is in the same window
No special consideration is needed.

Case 2: Other CPU in the source cluster is ahead by 1 window
In this case, we will be doing redundant updates to the subtraction load for the prev window. There is no way to avoid this redundant update, though, without holding the rq lock.

Case 3: Other CPU in the source cluster is trailing by 1 window
In this case, we might end up overwriting old data for that CPU. But this is not a problem: when the other CPU calls update_task_ravg(), it will move to the same window. This relies on maintaining synchronized windows between CPUs, which is true today.

Finally, we must deal with frequency aggregation. When frequency aggregation is in effect, there is little point in dealing with the per-CPU footprint, since the load of all related tasks has to be reported on a single CPU. Therefore, when a task enters a related group, we clear out all per-CPU contributions and add their sum to the task CPU's cpu_time struct. From that point onwards we stop managing per-CPU contributions upon inter-cluster migrations, since that work is redundant. When a task exits a related group, we must walk every CPU to reset all per-CPU contributions. We then set the task's CPU contribution to the respective curr/prev sum values and add that sum to the task CPU's rq runnable sum.
Change-Id: I1f8d596e6c930f3f6f00e24109ddbe8b121f8d6b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
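As a rough illustration of the deferred load-subtraction bookkeeping described above, the following minimal userspace C sketch models how pending subtractions might be recorded lock-free and applied later. Only the load_subtractions layout and NUM_SUBTRACTION_WINDOWS come from the diff below; the helper names (record_subtraction, apply_pending_subtractions), the slot-reuse policy, and the mapping of subs/new_subs onto prev_runnable_sum/nt_prev_runnable_sum are assumptions made for illustration, not the kernel's actual interfaces.

/*
 * Minimal userspace model of the deferred load-subtraction idea above.
 * The struct layout mirrors the patch; everything else is hypothetical.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_SUBTRACTION_WINDOWS 2		/* slots for two windows */

struct load_subtractions {
	uint64_t window_start;			/* window the subtraction targets */
	uint64_t subs;				/* assumed: pending subtraction from prev_runnable_sum */
	uint64_t new_subs;			/* assumed: pending subtraction from nt_prev_runnable_sum */
};

struct cpu_load {				/* stand-in for the per-rq load fields */
	uint64_t prev_runnable_sum;
	uint64_t nt_prev_runnable_sum;
	struct load_subtractions load_subs[NUM_SUBTRACTION_WINDOWS];
};

/*
 * Record a subtraction against a source-cluster CPU without taking its
 * runqueue lock: merge into a slot that already tracks this window, or
 * overwrite the oldest slot (case 3 above: stale data may be clobbered,
 * which is harmless once that CPU rolls over to the same window).
 */
static void record_subtraction(struct cpu_load *cpu, uint64_t window_start,
			       uint64_t subs, uint64_t new_subs)
{
	int i, oldest = 0;

	for (i = 0; i < NUM_SUBTRACTION_WINDOWS; i++) {
		if (cpu->load_subs[i].window_start == window_start) {
			cpu->load_subs[i].subs += subs;
			cpu->load_subs[i].new_subs += new_subs;
			return;
		}
		if (cpu->load_subs[i].window_start <
		    cpu->load_subs[oldest].window_start)
			oldest = i;
	}
	cpu->load_subs[oldest].window_start = window_start;
	cpu->load_subs[oldest].subs = subs;
	cpu->load_subs[oldest].new_subs = new_subs;
}

/*
 * Apply the pending subtractions at the "opportune moment": in the kernel,
 * when the governor queries load, the runqueue lock is already held.
 */
static void apply_pending_subtractions(struct cpu_load *cpu)
{
	int i;

	for (i = 0; i < NUM_SUBTRACTION_WINDOWS; i++) {
		cpu->prev_runnable_sum -= cpu->load_subs[i].subs;
		cpu->nt_prev_runnable_sum -= cpu->load_subs[i].new_subs;
		memset(&cpu->load_subs[i], 0, sizeof(cpu->load_subs[i]));
	}
}

int main(void)
{
	struct cpu_load cpu = {
		.prev_runnable_sum = 1000,
		.nt_prev_runnable_sum = 400,
	};

	/* A task that contributed 300/100 here migrates to another cluster. */
	record_subtraction(&cpu, 8000, 300, 100);

	/* Later, a governor load query applies the deferred subtraction. */
	apply_pending_subtractions(&cpu);
	printf("prev=%llu nt_prev=%llu\n",
	       (unsigned long long)cpu.prev_runnable_sum,
	       (unsigned long long)cpu.nt_prev_runnable_sum);
	return 0;
}

Two slots suffice because, with synchronized windows, the other CPUs in the source cluster can only be one window ahead of or behind the CPU the task is migrating from, which is exactly cases 1-3 in the commit text.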
Diffstat (limited to 'kernel/sched/sched.h')
-rw-r--r--	kernel/sched/sched.h	12
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f786767aa353..c107712643dc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -351,13 +351,22 @@ struct cfs_bandwidth { };
#ifdef CONFIG_SCHED_HMP
+#define NUM_SUBTRACTION_WINDOWS 2
+
struct hmp_sched_stats {
int nr_big_tasks;
u64 cumulative_runnable_avg;
u64 pred_demands_sum;
};
+struct load_subtractions {
+ u64 window_start;
+ u64 subs;
+ u64 new_subs;
+};
+
struct sched_cluster {
+ raw_spinlock_t load_lock;
struct list_head list;
struct cpumask cpus;
int id;
@@ -742,6 +751,7 @@ struct rq {
u64 prev_runnable_sum;
u64 nt_curr_runnable_sum;
u64 nt_prev_runnable_sum;
+ struct load_subtractions load_subs[NUM_SUBTRACTION_WINDOWS];
#endif
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
@@ -1572,8 +1582,6 @@ static inline int update_preferred_cluster(struct related_thread_group *grp,
static inline void add_new_task_to_grp(struct task_struct *new) {}
#define sched_enable_hmp 0
-#define sched_freq_legacy_mode 1
-#define sched_migration_fixup 0
#define PRED_DEMAND_DELTA (0)
static inline void