summaryrefslogtreecommitdiff
path: root/kernel/locking
AgeCommit message (Collapse)Author
2015-04-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "Main changes: - jump label asm preparatory work for PowerPC (Anton Blanchard) - rwsem optimizations and cleanups (Davidlohr Bueso) - mutex optimizations and cleanups (Jason Low) - futex fix (Oleg Nesterov) - remove broken atomicity checks from {READ,WRITE}_ONCE() (Peter Zijlstra)" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc, jump_label: Include linux/jump_label.h to get HAVE_JUMP_LABEL define jump_label: Allow jump labels to be used in assembly jump_label: Allow asm/jump_label.h to be included in assembly locking/mutex: Further simplify mutex_spin_on_owner() locking: Remove atomicy checks from {READ,WRITE}_ONCE locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() documentation as well locking/rwsem: Fix lock optimistic spinning when owner is not running locking: Remove ACCESS_ONCE() usage locking/rwsem: Check for active lock before bailing on spinning locking/rwsem: Avoid deceiving lock spinners locking/rwsem: Set lock ownership ASAP locking/rwsem: Document barrier need when waking tasks locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads locking/mutex: Refactor mutex_spin_on_owner() locking/mutex: In mutex_spin_on_owner(), return true when owner changes
2015-04-09locking/mutex: Further simplify mutex_spin_on_owner()Jason Low
Similar to what Linus suggested for rwsem_spin_on_owner(), in mutex_spin_on_owner() instead of having while (true) and breaking out of the spin loop on lock->owner != owner, we can have the loop directly check for while (lock->owner == owner) to improve the readability of the code. It also shrinks the code a bit: text data bss dec hex filename 3721 0 0 3721 e89 mutex.o.before 3705 0 0 3705 e79 mutex.o.after Signed-off-by: Jason Low <jason.low2@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com [ Added code generation info. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-25locking/rtmutex: Rename argument in the rt_mutex_adjust_prio_chain() ↵Tom(JeHyeon) Yeon
documentation as well The following commit changed "deadlock_detect" to "chwalk": 8930ed80f970 ("rtmutex: Cleanup deadlock detector debug logic") do that rename in the function's documentation as well. Signed-off-by: Tom(JeHyeon) Yeon <tom.yeon@windriver.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1426655010-31651-1-git-send-email-tom.yeon@windriver.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23lockdep: Fix the module unload key range freeing logicPeter Zijlstra
Module unload calls lockdep_free_key_range(), which removes entries from the data structures. Most of the lockdep code OTOH assumes the data structures are append only; in specific see the comments in add_lock_to_list() and look_up_lock_class(). Clearly this has only worked by accident; make it work proper. The actual scenario to make it go boom would involve the memory freed by the module unlock being re-allocated and re-used for a lock inside of a rcu-sched grace period. This is a very unlikely scenario, still better plug the hole. Use RCU list iteration in all places and ammend the comments. Change lockdep_free_key_range() to issue a sync_sched() between removal from the lists and returning -- which results in the memory being freed. Further ensure the callers are placed correctly and comment the requirements. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrey Tsyvarev <tsyvarev@ispras.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07locking/rwsem: Fix lock optimistic spinning when owner is not runningJason Low
Ming reported soft lockups occurring when running xfstest due to the following tip:locking/core commit: b3fd4f03ca0b ("locking/rwsem: Avoid deceiving lock spinners") When doing optimistic spinning in rwsem, threads should stop spinning when the lock owner is not running. While a thread is spinning on owner, if the owner reschedules, owner->on_cpu returns false and we stop spinning. However, this commit essentially caused the check to get ignored because when we break out of the spin loop due to !on_cpu, we continue spinning if sem->owner != NULL. This patch fixes this by making sure we stop spinning if the owner is not running. Furthermore, just like with mutexes, refactor the code such that we don't have separate checks for owner_running(). This makes it more straightforward in terms of why we exit the spin on owner loop and we would also avoid needing to "guess" why we broke out of the loop to make this more readable. Reported-and-tested-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jason Low <jason.low2@hp.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1425714331.2475.388.camel@j-VirtualBox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-01locking/rtmutex: Set state back to running on errorSebastian Andrzej Siewior
The "usual" path is: - rt_mutex_slowlock() - set_current_state() - task_blocks_on_rt_mutex() (ret 0) - __rt_mutex_slowlock() - sleep or not but do return with __set_current_state(TASK_RUNNING) - back to caller. In the early error case where task_blocks_on_rt_mutex() return -EDEADLK we never change the task's state back to RUNNING. I assume this is intended. Without this change after ww_mutex using rt_mutex the selftest passes but later I get plenty of: | bad: scheduling from the idle thread! backtraces. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: afffc6c1805d ("locking/rtmutex: Optimize setting task running after being blocked") Link: http://lkml.kernel.org/r/1425056229-22326-4-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-24locking: Remove ACCESS_ONCE() usageDavidlohr Bueso
With the new standardized functions, we can replace all ACCESS_ONCE() calls across relevant locking - this includes lockref and seqlock while at it. ACCESS_ONCE() does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145 Update the new calls regardless of if it is a scalar type, this is cleaner than having three alternatives. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1424662301.6539.18.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-24Merge tag 'v4.0-rc1' into locking/core, to refresh the tree before merging ↵Ingo Molnar
new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-21Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Two fixes: the paravirt spin_unlock() corruption/crash fix, and an rtmutex NULL dereference crash fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/spinlocks/paravirt: Fix memory corruption on unlock locking/rtmutex: Avoid a NULL pointer dereference on deadlock
2015-02-18locking/rwsem: Check for active lock before bailing on spinningDavidlohr Bueso
37e9562453b ("locking/rwsem: Allow conservative optimistic spinning when readers have lock") forced the default for optimistic spinning to be disabled if the lock owner was nil, which makes much sense for readers. However, while it is not our priority, we can make some optimizations for write-mostly workloads. We can bail the spinning step and still be conservative if there are any active tasks, otherwise there's really no reason not to spin, as the semaphore is most likely unlocked. This patch recovers most of a Unixbench 'execl' benchmark throughput by sleeping less and making better average system usage: before: CPU %user %nice %system %iowait %steal %idle all 0.60 0.00 8.02 0.00 0.00 91.38 after: CPU %user %nice %system %iowait %steal %idle all 1.22 0.00 70.18 0.00 0.00 28.60 Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1422609267-15102-6-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18locking/rwsem: Avoid deceiving lock spinnersDavidlohr Bueso
When readers hold the semaphore, the ->owner is nil. As such, and unlike mutexes, '!owner' does not necessarily imply that the lock is free. This will cause writers to potentially spin excessively as they've been mislead to thinking they have a chance of acquiring the lock, instead of blocking. This patch therefore enhances the counter check when the owner is not set by the time we've broken out of the loop. Otherwise we can return true as a new owner has the lock and thus we want to continue spinning. While at it, we can make rwsem_spin_on_owner() less ambiguos and return right away under need_resched conditions. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1422609267-15102-5-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18locking/rwsem: Set lock ownership ASAPDavidlohr Bueso
In order to optimize the spinning step, we need to set the lock owner as soon as the lock is acquired; after a successful counter cmpxchg operation, that is. This is particularly useful as rwsems need to set the owner to nil for readers, so there is a greater chance of falling out of the spinning. Currently we only set the owner much later in the game, in the more generic level -- latency can be specially bad when waiting for a node->next pointer when releasing the osq in up_write calls. As such, update the owner inside rwsem_try_write_lock (when the lock is obtained after blocking) and rwsem_try_write_lock_unqueued (when the lock is obtained while spinning). This requires creating a new internal rwsem.h header to share the owner related calls. Also cleanup some headers for mutex and rwsem. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1422609267-15102-4-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18locking/rwsem: Document barrier need when waking tasksDavidlohr Bueso
The need for the smp_mb() in __rwsem_do_wake() should be properly documented. Applies to both xadd and spinlock variants. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1422609267-15102-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18locking/mutex: Refactor mutex_spin_on_owner()Jason Low
As suggested by Davidlohr, we could refactor mutex_spin_on_owner(). Currently, we split up owner_running() with mutex_spin_on_owner(). When the owner changes, we make duplicate owner checks which are not necessary. It also makes the code a bit obscure as we are using a second check to figure out why we broke out of the loop. This patch modifies it such that we remove the owner_running() function and the mutex_spin_on_owner() loop directly checks for if the owner changes, if the owner is not running, or if we need to reschedule. If the owner changes, we break out of the loop and return true. If the owner is not running or if we need to reschedule, then break out of the loop and return false. Suggested-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: chegu_vinod@hp.com Cc: tglx@linutronix.de Link: http://lkml.kernel.org/r/1422914367-5574-3-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18locking/mutex: In mutex_spin_on_owner(), return true when owner changesJason Low
In the mutex_spin_on_owner(), we return true only if lock->owner == NULL. This was beneficial in situations where there were multiple threads simultaneously spinning for the mutex. If another thread got the lock while other spinner(s) were also doing mutex_spin_on_owner(), then the other spinners would stop spinning. This workaround helped reduce the chance that many spinners were simultaneously spinning for the mutex which can help reduce contention in highly contended cases. However, recent changes were made to the optimistic spinning code such that instead of having all spinners simultaneously spin for the mutex, we queue the spinners with an MCS lock such that only one thread spins for the mutex at a time. Furthermore, the OSQ optimizations ensure that spinners in the queue will stop waiting if it needs to reschedule. Now, we don't have to worry about multiple threads spinning on owner at the same time, and if lock->owner is not NULL at this point, it likely means another thread happens to obtain the lock in the fastpath. In this case, it would make sense for the spinner to continue spinning as long as the spinner doesn't need to schedule and the mutex owner is running. This patch changes this so that mutex_spin_on_owner() returns true when the lock owner changes, which means a thread will only stop spinning if it either needs to reschedule or if the lock owner is not running. We saw up to a 5% performance improvement in the fserver workload with this patch. Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: chegu_vinod@hp.com Cc: tglx@linutronix.de Link: http://lkml.kernel.org/r/1422914367-5574-2-git-send-email-jason.low2@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18locking/rtmutex: Avoid a NULL pointer dereference on deadlockSebastian Andrzej Siewior
With task_blocks_on_rt_mutex() returning early -EDEADLK we never add the waiter to the waitqueue. Later, we try to remove it via remove_waiter() and go boom in rt_mutex_top_waiter() because rb_entry() gives a NULL pointer. ( Tested on v3.18-RT where rtmutex is used for regular mutex and I tried to get one twice in a row. ) Not sure when this started but I guess 397335f004f4 ("rtmutex: Fix deadlock detector for real") or commit 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter"). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # for v3.16 and later kernels Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-11Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: - The remaining patches for the z13 machine support: kernel build option for z13, the cache synonym avoidance, SMT support, compare-and-delay for spinloops and the CES5S crypto adapater. - The ftrace support for function tracing with the gcc hotpatch option. This touches common code Makefiles, Steven is ok with the changes. - The hypfs file system gets an extension to access diagnose 0x0c data in user space for performance analysis for Linux running under z/VM. - The iucv hvc console gets wildcard spport for the user id filtering. - The cacheinfo code is converted to use the generic infrastructure. - Cleanup and bug fixes. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits) s390/process: free vx save area when releasing tasks s390/hypfs: Eliminate hypfs interval s390/hypfs: Add diagnose 0c support s390/cacheinfo: don't use smp_processor_id() in preemptible context s390/zcrypt: fixed domain scanning problem (again) s390/smp: increase maximum value of NR_CPUS to 512 s390/jump label: use different nop instruction s390/jump label: add sanity checks s390/mm: correct missing space when reporting user process faults s390/dasd: cleanup profiling s390/dasd: add locking for global_profile access s390/ftrace: hotpatch support for function tracing ftrace: let notrace function attribute disable hotpatching if necessary ftrace: allow architectures to specify ftrace compile options s390: reintroduce diag 44 calls for cpu_relax() s390/zcrypt: Add support for new crypto express (CEX5S) adapter. s390/zcrypt: Number of supported ap domains is not retrievable. s390/spinlock: add compare-and-delay to lock wait loops s390/tape: remove redundant if statement s390/hvc_iucv: add simple wildcard matches to the iucv allow filter ...
2015-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) More iov_iter conversion work from Al Viro. [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was wrong, and this pull actually adds an extra commit on top of the branch I'm pulling to fix that up, so that the pre-merge state is ok. - Linus ] 2) Various optimizations to the ipv4 forwarding information base trie lookup implementation. From Alexander Duyck. 3) Remove sock_iocb altogether, from CHristoph Hellwig. 4) Allow congestion control algorithm selection via routing metrics. From Daniel Borkmann. 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet. 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet. 7) Add xmit_more support to r8169, e1000, and e1000e drivers. From Florian Westphal. 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross. 9) Add BPF packet actions to packet scheduler, from Jiri Pirko. 10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer. 11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman Kwok. 12) More sanely handle out-of-window dupacks, which can result in serious ACK storms. From Neal Cardwell. 13) Various rhashtable bug fixes and enhancements, from Herbert Xu, Patrick McHardy, and Thomas Graf. 14) Support xmit_more in be2net, from Sathya Perla. 15) Group Policy extensions for vxlan, from Thomas Graf. 16) Remove Checksum Offload support for vxlan, from Tom Herbert. 17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits) crypto: fix af_alg_make_sg() conversion to iov_iter ipv4: Namespecify TCP PMTU mechanism i40e: Fix for stats init function call in Rx setup tcp: don't include Fast Open option in SYN-ACK on pure SYN-data openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set ipv6: Make __ipv6_select_ident static ipv6: Fix fragment id assignment on LE arches. bridge: Fix inability to add non-vlan fdb entry net: Mellanox: Delete unnecessary checks before the function call "vunmap" cxgb4: Add support in cxgb4 to get expansion rom version via ethtool ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version net: dsa: Remove redundant phy_attach() IB/mlx4: Reset flow support for IB kernel ULPs IB/mlx4: Always use the correct port for mirrored multicast attachments net/bonding: Fix potential bad memory access during bonding events tipc: remove tipc_snprintf tipc: nl compat add noop and remove legacy nl framework tipc: convert legacy nl stats show to nl compat tipc: convert legacy nl net id get to nl compat tipc: convert legacy nl net id set to nl compat ...
2015-02-09Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main scheduler changes in this cycle were: - various sched/deadline fixes and enhancements - rescheduling latency fixes/cleanups - rework the rq->clock code to be more consistent and more robust. - minor micro-optimizations - ->avg.decay_count fixes - add a stack overflow check to might_sleep() - idle-poll handler fix, possibly resulting in power savings - misc smaller updates and fixes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/Documentation: Remove unneeded word sched/wait: Introduce wait_on_bit_timeout() sched: Pull resched loop to __schedule() callers sched/deadline: Remove cpu_active_mask from cpudl_find() sched: Fix hrtick_start() on UP sched/deadline: Avoid pointless __setscheduler() sched/deadline: Fix stale yield state sched/deadline: Fix hrtick for a non-leftmost task sched/deadline: Modify cpudl::free_cpus to reflect rd->online sched/idle: Add missing checks to the exit condition of cpu_idle_poll() sched: Fix missing preemption opportunity sched/rt: Reduce rq lock contention by eliminating locking of non-feasible target sched/debug: Print rq->clock_task sched/core: Rework rq->clock update skips sched/core: Validate rq_clock*() serialization sched/core: Remove check of p->sched_class sched/fair: Fix sched_entity::avg::decay_count initialization sched/debug: Fix potential call to __ffs(0) in sched_show_task() sched/debug: Check for stack overflow in ___might_sleep() sched/fair: Fix the dealing with decay_count in __synchronize_entity_decay()
2015-02-04locking/rtmutex: Optimize setting task running after being blockedDavidlohr Bueso
We explicitly mark the task running after returning from a __rt_mutex_slowlock() call, which does the actual sleeping via wait-wake-trylocking. As such, this patch does two things: (1) refactors the code so that setting current to TASK_RUNNING is done by __rt_mutex_slowlock(), and not by the callers. The downside to this is that it becomes a bit unclear when at what point we block. As such I've added a comment that the task blocks when calling __rt_mutex_slowlock() so readers can figure out when it is running again. (2) relaxes setting current's state through __set_current_state(), instead of it's more expensive barrier alternative. There was no need for the implied barrier as we're obviously not planning on blocking. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422857784.18096.1.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04locking/rwsem: Use task->state helpersDavidlohr Bueso
Call __set_task_state() instead of assigning the new state directly. These interfaces also aid CONFIG_DEBUG_ATOMIC_SLEEP environments, keeping track of who last changed the state. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Michel Lespinasse <walken@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422257769-14083-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04locking/mutex: Explicitly mark task as running after wakeupDavidlohr Bueso
By the time we wake up and get the lock after being asleep in the slowpath, we better be running. As good practice, be explicit about this and avoid any mischief. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1421717961.4903.11.camel@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04sched/Documentation: Remove unneeded wordSharon Dvir
The second 'mutex' shouldn't be there, it can't be about the mutex, as the mutex can't be freed, but unlocked, the memory where the mutex resides however, can be freed. Signed-off-by: Sharon Dvir <sharon.dvir1@mail.huji.ac.il> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1422827252-31363-1-git-send-email-sharon.dvir1@mail.huji.ac.il Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-29ftrace: allow architectures to specify ftrace compile optionsHeiko Carstens
If the kernel is compiled with function tracer support the -pg compile option is passed to gcc to generate extra code into the prologue of each function. This patch replaces the "open-coded" -pg compile flag with a CC_FLAGS_FTRACE makefile variable which architectures can override if a different option should be used for code generation. Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/xen-netfront.c Minor overlapping changes in xen-netfront.c, mostly to do with some buffer management changes alongside the split of stats into TX and RX. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14locking/osq: No need for load/acquire when acquire-pollingDavidlohr Bueso
Both mutexes and rwsems took a performance hit when we switched over from the original mcs code to the cancelable variant (osq). The reason being the use of smp_load_acquire() when polling for node->locked. This is not needed as reordering is not an issue, as such, relax the barrier semantics. Paul describes the scenario nicely: https://lkml.org/lkml/2013/11/19/405 - If we start polling before the insertion is complete, all that happens is that the first few polls have no chance of seeing a lock grant. - Ordering the polling against the initialization -- the above xchg() is already doing that for us. The smp_load_acquire() when unqueuing make sense. In addition, we don't need to worry about leaking the critical region as osq is only used internally. This impacts both regular and large levels of concurrency, ie on a 40 core system with a disk intensive workload: disk-1 804.83 ( 0.00%) 828.16 ( 2.90%) disk-61 8063.45 ( 0.00%) 18181.82 (125.48%) disk-121 7187.41 ( 0.00%) 20119.17 (179.92%) disk-181 6933.32 ( 0.00%) 20509.91 (195.82%) disk-241 6850.81 ( 0.00%) 20397.80 (197.74%) disk-301 6815.22 ( 0.00%) 20287.58 (197.68%) disk-361 7080.40 ( 0.00%) 20205.22 (185.37%) disk-421 7076.13 ( 0.00%) 19957.33 (182.04%) disk-481 7083.25 ( 0.00%) 19784.06 (179.31%) disk-541 7038.39 ( 0.00%) 19610.92 (178.63%) disk-601 7072.04 ( 0.00%) 19464.53 (175.23%) disk-661 7010.97 ( 0.00%) 19348.23 (175.97%) disk-721 7069.44 ( 0.00%) 19255.33 (172.37%) disk-781 7007.58 ( 0.00%) 19103.14 (172.61%) disk-841 6981.18 ( 0.00%) 18964.22 (171.65%) disk-901 6968.47 ( 0.00%) 18826.72 (170.17%) disk-961 6964.61 ( 0.00%) 18708.02 (168.62%) Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420573509-24774-7-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-14locking/mcs: Better differentiate between MCS variantsDavidlohr Bueso
We have two flavors of the MCS spinlock: standard and cancelable (OSQ). While each one is independent of the other, we currently mix and match them. This patch: - Moves the OSQ code out of mcs_spinlock.h (which only deals with the traditional version) into include/linux/osq_lock.h. No unnecessary code is added to the more global header file, anything locks that make use of OSQ must include it anyway. - Renames mcs_spinlock.c to osq_lock.c. This file only contains osq code. - Introduces a CONFIG_LOCK_SPIN_ON_OWNER in order to only build osq_lock if there is support for it. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1420573509-24774-5-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-14locking/mutex: Introduce ww_mutex_set_context_slowpath()Davidlohr Bueso
... which is equivalent to the fastpath counter part. This mainly allows getting some WW specific code out of generic mutex paths. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420573509-24774-4-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-14locking/mutex: Move MCS related comments to proper locationDavidlohr Bueso
It serves much better if the comments are right before the osq_lock() call. Also delete a useless comment. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420573509-24774-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-14locking/mutex: Checking the stamp is WW onlyDavidlohr Bueso
Mark it so by renaming __mutex_lock_check_stamp(). Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420573509-24774-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09mutex: Always clear owner field upon mutex_unlock()Chris Wilson
Currently if DEBUG_MUTEXES is enabled, the mutex->owner field is only cleared iff debug_locks is active. This exposes a race to other users of the field where the mutex->owner may be still set to a stale value, potentially upsetting mutex_spin_on_owner() among others. References: https://bugs.freedesktop.org/show_bug.cgi?id=87955 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1420540175-30204-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-03spinlock: Add spin_lock_bh_nested()Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28locking/mutex: Don't assume TASK_RUNNINGPeter Zijlstra
We're going to make might_sleep() test for TASK_RUNNING, because blocking without TASK_RUNNING will destroy the task state by setting it to TASK_RUNNING. There are a few occasions where its 'valid' to call blocking primitives (and mutex_lock in particular) and not have TASK_RUNNING, typically such cases are right before we set TASK_RUNNING anyhow. Robustify the code by not assuming this; this has the beneficial side effect of allowing optional code emission for fixing the above might_sleep() false positives. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: tglx@linutronix.de Cc: ilya.dryomov@inktank.com Cc: umgwanakikbuti@gmail.com Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140924082241.988560063@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-13Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The main updates in this cycle were: - mutex MCS refactoring finishing touches: improve comments, refactor and clean up code, reduce debug data structure footprint, etc. - qrwlock finishing touches: remove old code, self-test updates. - small rwsem optimization - various smaller fixes/cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Revert qrwlock recusive stuff locking/rwsem: Avoid double checking before try acquiring write lock locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code locking/semaphore: Resolve some shadow warnings locking/selftest: Support queued rwlock locking/lockdep: Restrict the use of recursive read_lock() with qrwlock locking/spinlocks: Always evaluate the second argument of spin_lock_nested() locking/Documentation: Update locking/mutex-design.txt disadvantages locking/Documentation: Move locking related docs into Documentation/locking/ locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate locking/mutexes: Refactor optimistic spinning code locking/mcs: Remove obsolete comment locking/mutexes: Document quick lock release when unlocking locking/mutexes: Standardize arguments in lock/unlock slowpaths locking: Remove deprecated smp_mb__() barriers
2014-10-03locking/lockdep: Revert qrwlock recusive stuffPeter Zijlstra
Commit f0bab73cb539 ("locking/lockdep: Restrict the use of recursive read_lock() with qrwlock") changed lockdep to try and conform to the qrwlock semantics which differ from the traditional rwlock semantics. In particular qrwlock is fair outside of interrupt context, but in interrupt context readers will ignore all fairness. The problem modeling this is that read and write side have different lock state (interrupts) semantics but we only have a single representation of these. Therefore lockdep will get confused, thinking the lock can cause interrupt lock inversions. So revert it for now; the old rwlock semantics were already imperfectly modeled and the qrwlock extra won't fit either. If we want to properly fix this, I think we need to resurrect the work by Gautham did a few years ago that split the read and write state of locks: http://lwn.net/Articles/332801/ FWIW the locking selftest that would've failed (and was reported by Borislav earlier) is something like: RL(X1); /* IRQ-ON */ LOCK(A); UNLOCK(A); RU(X1); IRQ_ENTER(); RL(X1); /* IN-IRQ */ RU(X1); IRQ_EXIT(); At which point it would report that because A is an IRQ-unsafe lock we can suffer the following inversion: CPU0 CPU1 lock(A) lock(X1) lock(A) <IRQ> lock(X1) And this is 'wrong' because X1 can recurse (assuming the above lock are in fact read-lock) but lockdep doesn't know about this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Waiman Long <Waiman.Long@hp.com> Cc: ego@linux.vnet.ibm.com Cc: bp@alien8.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20140930132600.GA7444@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-10-03locking/rwsem: Avoid double checking before try acquiring write lockJason Low
Commit 9b0fc9c09f1b ("rwsem: skip initial trylock in rwsem_down_write_failed") checks for if there are known active lockers in order to avoid write trylocking using expensive cmpxchg() when it likely wouldn't get the lock. However, a subsequent patch was added such that we directly check for sem->count == RWSEM_WAITING_BIAS right before trying that cmpxchg(). Thus, commit 9b0fc9c09f1b now just adds overhead. This patch modifies it so that we only do a check for if count == RWSEM_WAITING_BIAS. Also, add a comment on why we do an "extra check" of count before the cmpxchg(). Signed-off-by: Jason Low <jason.low2@hp.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410913017.2447.22.camel@j-VirtualBox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-30locktorture: Cleanup header usageDavidlohr Bueso
Remove some unnecessary ones and explicitly include rwsem.h Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-30locktorture: Cannot hold read and write lockDavidlohr Bueso
... trigger an error if so. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-30locktorture: Fix __acquire annotation for spinlock irqDavidlohr Bueso
Its quite easy to get mixed up with the names -- 'torture_spinlock_irq' is not actually a valid spinlock name. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-30locktorture: Support rwlocksDavidlohr Bueso
Add a "rw_lock" torture test to stress kernel rwlocks and their irq variant. Reader critical regions are 5x longer than writers. As such a similar ratio of lock acquisitions is seen in the statistics. In the case of massive contention, both hold the lock for 1/10 of a second. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locktorture: Introduce torture contextDavidlohr Bueso
The amount of global variables is getting pretty ugly. Group variables related to the execution (ie: not parameters) in a new context structure. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locktorture: Support rwsemsDavidlohr Bueso
We can easily do so with our new reader lock support. Just an arbitrary design default: readers have higher (5x) critical region latencies than writers: 50 ms and 10 ms, respectively. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locktorture: Add infrastructure for torturing read locksDavidlohr Bueso
Most of it is based on what we already have for writers. This allows readers to be very independent (and thus configurable), enabling future module parameters to control things such as rw distribution. Furthermore, readers have their own delaying function, allowing us to test different rw critical region latencies, and stress locking internals. Similarly, statistics, for now will only serve for the number of lock acquisitions -- as opposed to writers, readers have no failure detection. In addition, introduce a new nreaders_stress module parameter. The default number of readers will be the same number of writers threads. Writer threads are interleaved with readers. Documentation is updated, respectively. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16torture: Address race in module cleanupDavidlohr Bueso
When performing module cleanups by calling torture_cleanup() the 'torture_type' string in nullified However, callers are not necessarily done, and might still need to reference the variable. This impacts both rcutorture and locktorture, causing printing things like: [ 94.226618] (null)-torture: Stopping lock_torture_writer task [ 94.226624] (null)-torture: Stopping lock_torture_stats task Thus delay this operation until the very end of the cleanup process. The consequence (which shouldn't matter for this kid of program) is, of course, that we delay the window between rmmod and modprobing, for instance in module_torture_begin(). Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locktorture: Make statistics genericDavidlohr Bueso
The statistics structure can serve well for both reader and writer locks, thus simply rename some fields that mention 'write' and leave the declaration of lwsa. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locktorture: Teach about lock debuggingDavidlohr Bueso
Regular locks are very different than locks with debugging. For instance for mutexes, debugging forces to only take the slowpaths. As such, the locktorture module should take this into account when printing related information -- specifically when printing user passed parameters, it seems the right place for such info. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locktorture: Support mutexesDavidlohr Bueso
Add a "mutex_lock" torture test. The main difference with the already existing spinlock tests is that the latency of the critical region is much larger. We randomly delay for (arbitrarily) either 500 ms or, otherwise, 25 ms. While this can considerably reduce the amount of writes compared to non blocking locks, if run long enough it can have the same torturous effect. Furthermore it is more representative of mutex hold times and can stress better things like thrashing. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locktorture: Rename locktorture_runnable parameterDavidlohr Bueso
... to just 'torture_runnable'. It follows other variable naming and is shorter. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-09-16locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definitionDavidlohr Bueso
rw-semaphore is the only type of lock doing this ugliness of exporting at the end of the file. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: dave@stgolabs.net Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1410500066-5909-1-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-04locking/semaphore: Resolve some shadow warningsMark Rustad
Resolve some shadow warnings resulting from using the name jiffies, which is a well-known global. This is not a problem of course, but it could be a trap for someone copying and pasting code, and it just makes W=2 a little cleaner. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1409739444-13635-1-git-send-email-jeffrey.t.kirsher@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>