Age | Commit message | Author |
The sched_group_upmigrate tunable can accept values greater than
100%. Don't limit it to 100% when doing the auto adjustment.
Change-Id: I3d1c1e84f2f4dec688235feb1536b9261a3e808b
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
The sched_new_task_windows tunable is set to 5 in the scheduler
and it is not changed from user space. Remove this unused tunable.
Change-Id: I771e12b44876efe75ce87a90e4e9d69c22168b64
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
Setting the sched_freq_reporting_policy tunable to an unsupported
value results in a warning from the scheduler, and the previous
policy setting is also lost.
As sched_freq_reporting_policy can no longer be set to an incorrect
value, remove the WARN_ON_ONCE from the scheduler.
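A minimal sketch of the kind of sysctl validation this relies on (the
handler, variable and limit names below are assumptions for illustration,
not the actual code): reject an out-of-range write and restore the
previous value instead of warning.

    #include <linux/sysctl.h>

    static int sched_freq_reporting_policy_handler(struct ctl_table *table,
                    int write, void __user *buffer, size_t *lenp, loff_t *ppos)
    {
            int old = sysctl_sched_freq_reporting_policy;   /* assumed variable */
            int ret = proc_dointvec(table, write, buffer, lenp, ppos);

            if (ret || !write)
                    return ret;

            if (sysctl_sched_freq_reporting_policy >= FREQ_REPORT_INVALID) {
                    /* Unsupported value: keep the previous policy, no WARN.
                     * FREQ_REPORT_INVALID is an assumed upper bound. */
                    sysctl_sched_freq_reporting_policy = old;
                    return -EINVAL;
            }
            return 0;
    }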
Change-Id: I58d7e5dfefb7d11d2309bc05a1dd66acdc11b766
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
Clean up the code and make it more maintainable by removing the dependency
on the sched_enable_hmp flag. Switching the HMP scheduler on or off is not
supported without recompiling; it is enabled through the CONFIG_SCHED_HMP
config option.
Change-Id: I246c1b1889f8dcbc8f0a0805077c0ce5d4f083b0
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
|
|
The sched average period cannot be 0, so disallow setting
it to 0. Otherwise CPUs get stuck in sched_avg_update().
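One minimal way to enforce this, sketched here under the assumption that
the tunable is exposed through a ctl_table entry (variable and table names
are illustrative, not the actual code): bound the value with
proc_dointvec_minmax and a minimum of 1.

    static int one = 1;
    static int sched_avg_period_max = INT_MAX;

    static struct ctl_table sched_avg_table[] = {
            {
                    .procname       = "sched_avg_period_ms",        /* assumed name */
                    .data           = &sysctl_sched_avg_period_ms,  /* assumed variable */
                    .maxlen         = sizeof(int),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec_minmax,
                    .extra1         = &one,                         /* disallow 0 */
                    .extra2         = &sched_avg_period_max,
            },
            { }
    };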
Change-Id: Ib9fcc5b35dface09d848ba7a737dc4ac0f05d8ee
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
The update_task_burst() function's runtime argument should be of type
u64, not int. Fix this to avoid a potential overflow.
Change-Id: I33757b7b42f142138c1a099bb8be18c2a3bed331
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
There is no advantage to tracking busy time counters per related
thread group. We need busy time across all groups for either a CPU
or a frequency domain. Hence maintain the group busy time counters in
the runqueue itself. When the CPU window is rolled over, the group busy
counters are also rolled over. This eliminates the overhead of
maintaining each individual group's window_start.
As we are preallocating related thread groups now, this patch saves
40 * nr_cpu_ids * (nr_grp - 1) bytes of memory.
Change-Id: Ieaaccea483b377f54ea1761e6939ee23a78a5e9c
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
The LBF_IGNORE_PREFERRED_CLUSTER_TASKS flag needs to be set
for all types of inter-cluster load balance. Currently it is
set only when a higher capacity CPU is pulling tasks from
a lower capacity CPU. This can result in the migration of grouped
tasks from a higher capacity cluster to a lower capacity cluster.
Change-Id: Ib0476c5c85781804798ef49268e1b193859ff5ef
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
A use-after-free is reported because the group leader
task is already freed while other tasks are still holding
the group leader task's address in their
task->group_leader pointer.
pid_nr_ns+0x10/0x38
cgroup_pidlist_start+0x144/0x400
cgroup_seqfile_start+0x1c/0x24
kernfs_seq_start+0x54/0x90
seq_read+0x15c/0x3a8
kernfs_fop_read+0x38/0x160
__vfs_read+0x28/0xc8
vfs_read+0x84/0xfc
Change-Id: Ib6b3fc75bf0d24a04455bf81d54900c21c434958
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
|
|
Add the IRQ_AFFINITY_MANAGED flag and related kernel APIs so that
a kernel driver can mark an irq in such a way that user space
affinity changes are ignored. Kernel space affinity settings
are not affected.
Change-Id: Ib2d5ea651263bff4317562af69079ad950c9e71e
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
|
|
Interrupts marked with this flag are excluded from user space interrupt
affinity changes. Contrary to the IRQ_NO_BALANCING flag, the kernel internal
affinity mechanism is not blocked.
This flag will be used for multi-queue device interrupts.
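In sketch form, the user-space gate can look like this (simplified from
the upstream change; helper names follow the upstream genirq code, the
surrounding wiring is omitted): writes via /proc/irq/<n>/smp_affinity go
through a check that additionally rejects managed interrupts, while
kernel-internal irq_set_affinity() callers are unaffected.

    /* Consulted only by the /proc/irq/<n>/smp_affinity write path. */
    int irq_can_set_affinity_usr(unsigned int irq)
    {
            struct irq_desc *desc = irq_to_desc(irq);

            return __irq_can_set_affinity(desc) &&
                   !irqd_affinity_is_managed(&desc->irq_data);
    }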
Change-Id: I204c49bb1c8ce87fbcd163119093163b120bfe83
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-block@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: axboe@fb.com
Cc: agordeev@redhat.com
Link: http://lkml.kernel.org/r/1467621574-8277-3-git-send-email-hch@lst.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Git-commit: 9c2555835bb3d34dfac52a0be943dcc4bedd650f
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[runminw@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
|
|
Cluster capacities should reflect differences in efficiency of
different clusters even in the absence of cpufreq. Currently
capacity is updated only when cpufreq policy notifier is received.
Therefore placement is suboptimal when cpufreq is turned off. Fix
this by updating capacities and load scaling factors during cluster
detection.
Change-Id: I47f63c1e374bbfd247a4302525afb37d55334bad
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
|
results"
|
|
We treat the boot CPU as the sync CPU and initialize its window_start
to sched_ktime_clock(). As windows are synchronized across all
CPUs, the secondary CPUs' window_start is initialized from the
sync CPU's window_start. A CPU's window_start is never reset, so
this synchronization happens only once for a given CPU. Given this
fact, there is no need to reassign the sync_cpu role to another
CPU when the boot CPU is going offline. Remove this unnecessary
maintenance of sync_cpu and use any online CPU's window_start as
the reference.
Change-Id: I169a8e80573c6dbcb1edeab0659c07c17102f4c9
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
Colocation in HMP includes a tunable that turns on or off the feature
globally across all colocation groups. Supporting this tunable correctly
would result in complexity that would outweigh any foreseeable benefits.
For example, disabling the feature globally would involve deleting all
colocation groups one by one while ensuring no placement decisions are
made during the process.
Remove the tunable. Adding or removing a task from a colocation group is
still possible and so we're not losing functionality.
Change-Id: I4cb8bcdbee98d3bdd168baacbac345eca9ea8879
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
There are certain conditions under which group_will_fit() may return 0 for
all clusters in the system, especially under changing thermal conditions.
This may result in crashes such as this one:
CPU 0 | CPU 1
====================================================================
select_best_cpu() |
-> env.rtg = rtgA |
rtgA.pref_cluster=C_big |
| set_pref_cluster() for rtgA
| -> best_cluster()
| C_little doesn't fit
|
| IRQ: thermal mitigation
| C_big capacity now less
| than C_little capacity
|
| -> best_cluster() continues
| C_big doesn't fit
| set_pref_cluster() sets
| rtgA.pref_cluster = NULL
|
select_least_power_cluster() |
-> cluster_first_cpu() |
-> BUG() |
Adding lock protection around accesses to the group's preferred cluster
would be expensive and would defeat the point of using RCU to protect
access to the related_thread_group structure. Therefore, ensure that
best_cluster() can never return NULL. In the worst case, we'll select the
wrong cluster for a related_thread_group's demand, but this should be
corrected in the next tick or wakeup etc. Locking would still have led to
the momentarily wrong decision, with the additional expense!
Also, don't set the preferred cluster to NULL when colocation is disabled.
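As a rough illustration of the fix (a sketch only; the iterator, helper
names and signature are assumptions based on the functions named above),
best_cluster() can fall back to the last evaluated candidate instead of
returning NULL:

    static struct sched_cluster *best_cluster(struct related_thread_group *grp,
                                              u64 total_demand)
    {
            struct sched_cluster *cluster, *fallback = NULL;

            for_each_sched_cluster(cluster) {
                    fallback = cluster;     /* remember the last candidate */
                    if (group_will_fit(cluster, grp, total_demand))
                            return cluster;
            }
            /* Nothing fits (e.g. capacities reshuffled by thermal
             * mitigation): return a non-NULL best effort rather than
             * letting the caller crash. */
            return fallback;
    }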
Change-Id: Id3f514b149add9b3ed33d104fa6a9bd57bec27e2
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
|
|
The 's' flag is supposed to indicate that a softirq is running. This
can be detected by testing the preempt_count with SOFTIRQ_OFFSET.
The current code tests the preempt_count with SOFTIRQ_MASK, which
would be true even when softirqs are disabled but not serving a
softirq.
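The distinction, as a minimal sketch (the helpers below are illustrative,
not kernel APIs): SOFTIRQ_OFFSET is only folded into preempt_count while a
softirq handler is actually executing, whereas SOFTIRQ_MASK also covers
sections that merely have softirqs disabled.

    #include <linux/preempt.h>

    static inline bool serving_softirq(unsigned long pc)
    {
            return pc & SOFTIRQ_OFFSET;     /* a softirq handler is running */
    }

    static inline bool softirq_disabled_or_serving(unsigned long pc)
    {
            return pc & SOFTIRQ_MASK;       /* also true under local_bh_disable() */
    }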
Link: http://lkml.kernel.org/r/1481300417-3564-1-git-send-email-pkondeti@codeaurora.org
Change-Id: I084531ce806e0f7d42a38be0a7ad45977c43d158
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Git-commit: c59f29cb144a6a0dfac16ede9dc8eafc02dc56ca
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
|
|
Initialize variable at definition to avoid compiler warning when
compiling with CONFIG_OPTIMIZE_FOR_SIZE=n.
Change-Id: Ibd201877b2274c70ced9d7240d0e527bc77402f3
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
|
|
SYSCTL_WRITES_WARN was added in commit f4aacea2f5d1 ("sysctl: allow for
strict write position handling"), and released in v3.16 in August of
2014. Since then I can find only 1 instance of non-zero offset
writing[1], and it was fixed immediately in CRIU[2]. As such, it
appears safe to flip this to the strict state now.
[1] https://www.google.com/search?q="when%20file%20position%20was%20not%200"
[2] http://lists.openvz.org/pipermail/criu/2015-April/019819.html
Change-Id: Ibf8d46fa34fa9fd4df3527dc4dfc3e3d31b2f7e0
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 41662f5cc55335807d39404371cfcbb1909304c4
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
|
|
There are a few compiler errors and warnings when the CFS_BANDWIDTH
config is enabled but SCHED_HMP is not.
Change-Id: Idaf4a7364564b6faf56df2eb3a1a74eeb242d57e
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
HMP scheduler boost related functions are referenced in the SMP
load balancer. Add nop versions of these functions to fix the
compiler errors with !SCHED_HMP.
Change-Id: I1cbcf67f728c2cbc7c0f47e8eaf1f4165649dce8
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
When mod_delayed_work() is executed concurrently, there is a potential
livelock scenario due to pool->lock contention.
Let's say both CPU#0 and CPU#4 call mod_delayed_work() on the same
work item with 0 delay on a bound workqueue. This work item has
run on CPU#4 previously. CPU#0 wins the work item PENDING bit race
and proceeds to queueing. As this work has previously run on CPU#4,
it tries to acquire the corresponding pool->lock to check if it is
still running there. In the meantime, CPU#4 loops in
try_to_grab_pending() waiting for the work item to be linked with a pwq
so that it can steal it from pwq->pool->worklist. CPU#4 essentially
acquires and releases the pool->lock in a busy loop and CPU#0 may
never get this lock.
CPU#0                                     CPU#4
----------------------------------        ----------------------------------
blk_run_queue_async()
  mod_delayed_work_on()                    queue_unplugged()
  --> try_to_grab_pending() returns          blk_run_queue_async()
      0, indicating the PENDING bit
      is set now
  __queue_delayed_work()                     mod_delayed_work_on()
    __queue_work()                             try_to_grab_pending()
    {                                          {
      --> waiting for CPU#4's                    acquire pool->lock()
          pool->lock                             release pool->lock()
    }                                          }
Change-Id: I9aeab111f55a19478a9d045c8e3576bce3b7a7c5
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
When perf_group_detach is called on a group leader,
it should empty its sibling list. Otherwise, when
a sibling is later deallocated, list_del_event()
removes the sibling's group_entry from its current
list, which can be the now-deallocated group leader's
sibling list (use-after-free bug).
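A self-contained sketch of the failure pattern (generic list code for
illustration, not the perf implementation): if the leader is freed while
entries are still linked into its sibling list, a later list_del() on a
sibling dereferences pointers into freed memory; emptying the list when
the leader is detached avoids that.

    #include <linux/list.h>
    #include <linux/slab.h>

    struct demo_event {
            struct list_head sibling_list;  /* head, lives in the leader */
            struct list_head group_entry;   /* a sibling's link into the leader */
    };

    static void demo_detach_leader(struct demo_event *leader)
    {
            struct demo_event *sibling, *tmp;

            /* Unlink every sibling now; otherwise sibling->group_entry keeps
             * pointing into 'leader' after kfree(leader), and a later
             * list_del(&sibling->group_entry) is a use-after-free. */
            list_for_each_entry_safe(sibling, tmp, &leader->sibling_list,
                                     group_entry)
                    list_del_init(&sibling->group_entry);

            kfree(leader);
    }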
Bug: 32402548
Change-Id: I99f6bc97c8518df1cb0035814368012ba72ab1f1
Signed-off-by: John Dias <joaodias@google.com>
Git-repo: https://android.googlesource.com/kernel/msm
Git-commit: 6b6cfb2362f09553b46b3b7e5684b16b6e53e373
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
|
|
Since clusters can vary significantly in their power and performance
characteristics, there may be a need to have different CPU selection
policies based on which cluster a task is being placed on. For example,
the placement policy can be more aggressive in using idle CPUs on
clusters that are power efficient and less aggressive on clusters
that are geared towards performance. Add support for a per-cluster
wake_up_idle flag to allow greater flexibility in placement policies.
Change-Id: I18cd3d907cd965db03a13f4655870dc10c07acfe
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
|
When frequency aggregation is enabled, there is a possibility of
rolling over the top task table multiple times in a single
window.
For example
- utra() is called with PUT_PREV_TASK for task 'A' which does not
belong to any related thread grp. Let's say a window rollover happens.
The rq counters and the top task table are rolled over.
- utra() is called with PICK_NEXT_TASK/TASK_WAKE for task 'B' which
belongs to a related thread grp. Let's say this happens before
the grp's cpu_time->window_start is in sync with rq->window_start.
In this case, the grp's cpu_time counters are rolled over and the
top task table is also rolled over again.
Roll over the top task table in the context of current running task
to fix this.
Change-Id: Iea3075e0ea460a9279a01ba42725890c46edd713
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
When early detection notification is pending, we skip calculating
predicted load. Initialize it to 0 so that stale value does not
get printed in trace_sched_get_busy().
Change-Id: I36287c0081f6c12191235104666172b7cae2a583
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
|
|
The documentation for schedule_timeout(), schedule_hrtimeout(), and
schedule_hrtimeout_range() all claim that the routines couldn't possibly
return early if the task state was TASK_UNINTERRUPTIBLE. This is simply
not true since wake_up_process() will cause those routines to exit early.
We cannot make schedule_[hr]timeout() loop until the timeout expires if the
task state is uninterruptible because we have users which rely on the
existing and designed behaviour.
Make the documentation match the (correct) implementation.
schedule_hrtimeout() returns -EINTR even when an uninterruptible task was
woken up. This might look strange, but making the return code depend on the
state is too much of an effort as it would affect all the call sites. There
is no value in doing so, but we spell it out clearly in the documentation.
Change-Id: I3e9bf91e7d285abcac134e32b02131b999d79f40
Suggested-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: huangtao@rock-chips.com
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr <andi@lisas.de>
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-2-git-send-email-dianders@chromium.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Git-commit: 4b7e9cf9c84b09adc428e0433cd376b91f9c52a7
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
|
|
Users of usleep_range() expect that it will _never_ return in less time
than the minimum passed parameter. However, nothing in the code ensures
this, when the sleeping task is woken by wake_up_process() or any other
mechanism which can wake a task from uninterruptible state.
Neither usleep_range() nor schedule_hrtimeout_range*() have any protection
against wakeups. schedule_hrtimeout_range*() is designed this way despite
the fact that the API documentation does not mention it.
msleep() already has code to handle this case since it will loop as long
as there was still time left. usleep_range() has no such loop, add it.
Presumably this problem was not detected before because usleep_range() is
only used in a few places and the function is mostly used in contexts which
are not exposed to wakeups of any form.
An effort was made to look for users relying on the old behavior by
looking for usleep_range() in the same file as wake_up_process().
No problems were found by this search, though it is conceivable that
someone could have put the sleep and wakeup in two different files.
An effort was made to ask several upstream maintainers if they were aware
of people relying on wake_up_process() to wake up usleep_range(). No
maintainers were aware of that but they were aware of many people relying
on usleep_range() never returning before the minimum.
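A sketch of the looping approach (close to what the referenced upstream
commit does, lightly simplified): re-arm an absolute-deadline hrtimer
sleep until it really expires, so an early wakeup cannot end the sleep
before the requested minimum.

    static void __sched do_usleep_range(unsigned long min_us, unsigned long max_us)
    {
            ktime_t exp = ktime_add_us(ktime_get(), min_us);
            u64 slack = (u64)(max_us - min_us) * NSEC_PER_USEC;

            for (;;) {
                    __set_current_state(TASK_UNINTERRUPTIBLE);
                    /* Returns 0 only when the timer really expired; a stray
                     * wake_up_process() just makes us loop again. */
                    if (!schedule_hrtimeout_range(&exp, slack, HRTIMER_MODE_ABS))
                            break;
            }
    }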
Change-Id: Ia403f0dc9cac711c8a4b6fcc4cf0094ad1358ed7
Reported-by: Tao Huang <huangtao@rock-chips.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: heiko@sntech.de
Cc: broonie@kernel.org
Cc: briannorris@chromium.org
Cc: Andreas Mohr <andi@lisas.de>
Cc: linux-rockchip@lists.infradead.org
Cc: tony.xie@rock-chips.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: djkurtz@chromium.org
Cc: linux@roeck-us.net
Cc: tskd08@gmail.com
Link: http://lkml.kernel.org/r/1477065531-30342-1-git-send-email-dianders@chromium.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Git-commit: 6c5e9059692567740a4ee51530dffe51a4b9584d
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
|
|
Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATION and the following timer base
computation and the spin lock of the base.
While this has not been observed (yet), we need to make sure that it never
happens.
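In sketch form (simplified from the upstream version; the base-lookup
helper and base structure may differ in this tree): take a single
READ_ONCE() snapshot of the flags and use that same snapshot for the
migration check, the base lookup and the re-check under the lock.

    static struct timer_base *lock_timer_base(struct timer_list *timer,
                                              unsigned long *flags)
    {
            for (;;) {
                    struct timer_base *base;
                    u32 tf = READ_ONCE(timer->flags);  /* one snapshot, no reload */

                    if (!(tf & TIMER_MIGRATING)) {
                            base = get_timer_base(tf);
                            spin_lock_irqsave(&base->lock, *flags);
                            if (timer->flags == tf)
                                    return base;       /* still the right base */
                            spin_unlock_irqrestore(&base->lock, *flags);
                    }
                    cpu_relax();
            }
    }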
Change-Id: I577327e02ab77b6de951ac2aa936cb5d5a4f477a
Fixes: 0eeda71bc30d ("timer: Replace timer base by a cpu index")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Git-commit: b831275a3553c32091222ac619cfddd73a5553fb
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[runminw@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
|
|
Heavy task prediction code needs further tuning to avoid any
negative power impact. Delete the code for now instead of adding
tunables to avoid inefficiencies in the scheduler path.
Change-Id: I71e3b37a5c99e24bc5be93cc825d7e171e8ff7ce
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
|
|
In transfer_busy_time(), the new_task flag is set based on the active
window count prior to the call to update_task_ravg(). update_task_ravg()
however, can then increment the active window count and consequently
the new_task flag above becomes stale. This in turn leads to inaccurate
accounting whereby update_task_ravg() does accounting based on the fact
that the task is not new whereas transfer_busy_time() then continues to
do further accounting assuming that the task is new. The accounting
discrepancies are sometimes caught by some of the scheduler BUGs.
Fix the described problem by moving the is_new_task() check after the
call to update_task_ravg(). Also add two missing BUGs that would catch
the problem sooner rather than later.
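Roughly, in sketch form (arguments and the surrounding code are omitted
or assumed; only the ordering matters):

    /* Roll the windows first, then evaluate newness, so the flag used by
     * the rest of transfer_busy_time() cannot go stale. */
    update_task_ravg(p, task_rq(p), TASK_MIGRATE, wallclock, 0);
    new_task = is_new_task(p);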
Change-Id: I8dc4822e97cc03ebf2ca1ee2de95eb4e5851f459
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|