summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2014-08-09Merge tag 'trace-fixes-3.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull trace file read iterator fixes from Steven Rostedt: "This contains a fix for two long standing bugs. Both of which are rarely ever hit, and requires the user to do something that users rarely do. It took a few special test cases to even trigger this bug, and one of them was just one test in the process of finishing up as another one started. Both bugs have to do with the ring buffer iterator rb_iter_peek(), but one is more indirect than the other. The fist bug fix is simply an increase in the safety net loop counter. The counter makes sure that the rb_iter_peek() only iterates the number of times we expect it can, and no more. Well, there was one way it could iterate one more than we expected, and that caused the ring buffer to shutdown with a nasty warning. The fix was simply to up that counter by one. The other bug has to be with rb_iter_reset() (called by rb_iter_peek()). This happens when a user reads both the trace_pipe and trace files. The trace_pipe is a consuming read and does not use the ring buffer iterator, but the trace file is not a consuming read and does use the ring buffer iterator. When the trace file is being read, if it detects that a consuming read occurred, it resets the iterator and starts over. But the reset code that does this (rb_iter_reset()), checks if the reader_page is linked to the ring buffer or not, and will look into the ring buffer itself if it is not. This is wrong, as it should always try to read the reader page first. Not to mention, the code that looked into the ring buffer did it wrong, and used the header_page "read" offset to start reading on that page. That offset is bogus for pages in the writable ring buffer, and was corrupting the iterator, and it would start returning bogus events" * tag 'trace-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Always reset iterator to reader page ring-buffer: Up rb_iter_peek() loop count to 3
2014-08-06ring-buffer: Always reset iterator to reader pageSteven Rostedt (Red Hat)
When performing a consuming read, the ring buffer swaps out a page from the ring buffer with a empty page and this page that was swapped out becomes the new reader page. The reader page is owned by the reader and since it was swapped out of the ring buffer, writers do not have access to it (there's an exception to that rule, but it's out of scope for this commit). When reading the "trace" file, it is a non consuming read, which means that the data in the ring buffer will not be modified. When the trace file is opened, a ring buffer iterator is allocated and writes to the ring buffer are disabled, such that the iterator will not have issues iterating over the data. Although the ring buffer disabled writes, it does not disable other reads, or even consuming reads. If a consuming read happens, then the iterator is reset and starts reading from the beginning again. My tests would sometimes trigger this bug on my i386 box: WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa() Modules linked in: CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M Call Trace: [<c18796b3>] dump_stack+0x4b/0x75 [<c103a0e3>] warn_slowpath_common+0x7e/0x95 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa [<c103a185>] warn_slowpath_fmt+0x33/0x35 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M [<c10bed04>] trace_find_cmdline+0x40/0x64 [<c10c3c16>] trace_print_context+0x27/0xec [<c10c4360>] ? trace_seq_printf+0x37/0x5b [<c10c0b15>] print_trace_line+0x319/0x39b [<c10ba3fb>] ? ring_buffer_read+0x47/0x50 [<c10c13b1>] s_show+0x192/0x1ab [<c10bfd9a>] ? s_next+0x5a/0x7c [<c112e76e>] seq_read+0x267/0x34c [<c1115a25>] vfs_read+0x8c/0xef [<c112e507>] ? seq_lseek+0x154/0x154 [<c1115ba2>] SyS_read+0x54/0x7f [<c188488e>] syscall_call+0x7/0xb ---[ end trace 3f507febd6b4cc83 ]--- >>>> ##### CPU 1 buffer started #### Which was the __trace_find_cmdline() function complaining about the pid in the event record being negative. After adding more test cases, this would trigger more often. Strangely enough, it would never trigger on a single test, but instead would trigger only when running all the tests. I believe that was the case because it required one of the tests to be shutting down via delayed instances while a new test started up. After spending several days debugging this, I found that it was caused by the iterator becoming corrupted. Debugging further, I found out why the iterator became corrupted. It happened with the rb_iter_reset(). As consuming reads may not read the full reader page, and only part of it, there's a "read" field to know where the last read took place. The iterator, must also start at the read position. In the rb_iter_reset() code, if the reader page was disconnected from the ring buffer, the iterator would start at the head page within the ring buffer (where writes still happen). But the mistake there was that it still used the "read" field to start the iterator on the head page, where it should always start at zero because readers never read from within the ring buffer where writes occur. I originally wrote a patch to have it set the iter->head to 0 instead of iter->head_page->read, but then I questioned why it wasn't always setting the iter to point to the reader page, as the reader page is still valid. The list_empty(reader_page->list) just means that it was successful in swapping out. But the reader_page may still have data. There was a bug report a long time ago that was not reproducible that had something about trace_pipe (consuming read) not matching trace (iterator read). This may explain why that happened. Anyway, the correct answer to this bug is to always use the reader page an not reset the iterator to inside the writable ring buffer. Cc: stable@vger.kernel.org # 2.6.28+ Fixes: d769041f8653 "ring_buffer: implement new locking" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-06ring-buffer: Up rb_iter_peek() loop count to 3Steven Rostedt (Red Hat)
After writting a test to try to trigger the bug that caused the ring buffer iterator to become corrupted, I hit another bug: WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238() Modules linked in: ipt_MASQUERADE sunrpc [...] CPU: 1 PID: 5281 Comm: grep Tainted: G W 3.16.0-rc3-test+ #143 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000 ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010 ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003 Call Trace: [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75 [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97 [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238 [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238 [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96 [<ffffffff810c74a3>] ? s_start+0xd7/0x17b [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea [<ffffffff8114cf94>] ? seq_read+0x148/0x361 [<ffffffff81132d98>] ? vfs_read+0x93/0xf1 [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2 Debugging this bug, which triggers when the rb_iter_peek() loops too many times (more than 2 times), I discovered there's a case that can cause that function to legitimately loop 3 times! rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek() only deals with the reader page (it's for consuming reads). The rb_iter_peek() is for traversing the buffer without consuming it, and as such, it can loop for one more reason. That is, if we hit the end of the reader page or any page, it will go to the next page and try again. That is, we have this: 1. iter->head > iter->head_page->page->commit (rb_inc_iter() which moves the iter to the next page) try again 2. event = rb_iter_head_event() event->type_len == RINGBUF_TYPE_TIME_EXTEND rb_advance_iter() try again 3. read the event. But we never get to 3, because the count is greater than 2 and we cause the WARNING and return NULL. Up the counter to 3. Cc: stable@vger.kernel.org # 2.6.37+ Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-08-05Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...
2014-08-04Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Kernel side changes: - Consolidate the PMU interrupt-disabled code amongst architectures (Vince Weaver) - misc fixes Tooling changes (new features, user visible changes): - Add support for pagefault tracing in 'trace', please see multiple examples in the changeset messages (Stanislav Fomichev). - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add header for columns in 'top' and 'report' TUI browsers (Jiri Olsa) - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add IO mode into timechart command (Stanislav Fomichev) - Fallback to syscalls:* when raw_syscalls:* is not available in the perl and python perf scripts. (Daniel Bristot de Oliveira) - Add --repeat global option to 'perf bench' to be used in benchmarks such as the existing 'futex' one, that was modified to use it instead of a local option. (Davidlohr Bueso) - Fix fd -> pathname resolution in 'trace', be it using /proc or a vfs_getname probe point. (Arnaldo Carvalho de Melo) - Add suggestion of how to set perf_event_paranoid sysctl, to help non-root users trying tools like 'trace' to get a working environment. (Arnaldo Carvalho de Melo) - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup (Steven Rostedt, Jan Kiszka) - Support S/390 in 'perf kvm stat' (Alexander Yarygin) Tooling infrastructure changes: - Allow reserving a row for header purposes in the hists browser (Arnaldo Carvalho de Melo) - Various fixes and prep work related to supporting Intel PT (Adrian Hunter) - Introduce multiple debug variables control (Jiri Olsa) - Add callchain and additional sample information for python scripts (Joseph Schuchart) - More prep work to support Intel PT: (Adrian Hunter) - Polishing 'script' BTS output - 'inject' can specify --kallsym - VDSO is per machine, not a global var - Expose data addr lookup functions previously private to 'script' - Large mmap fixes in events processing - Include standard stringify macros in power pc code (Sukadev Bhattiprolu) Tooling cleanups: - Convert open coded equivalents to asprintf() (Andy Shevchenko) - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo) - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo) - No need to reimplement err() in 'perf bench sched-messaging', drop barf(). (Davidlohr Bueso). - Remove ev_name argument from perf_evsel__hists_browse, can be obtained from the other parameters. (Jiri Olsa) Tooling fixes: - Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso) - The -o and -n 'perf bench mem' options are mutually exclusive, emit error when both are specified. (Davidlohr Bueso) - Fix scrollbar refresh row index in the ui browser, problem exposed now that headers will be added and will be allowed to be switched on/off. (Jiri Olsa) - Handle the num array type in python properly (Sebastian Andrzej Siewior) - Fix wrong condition for allocation failure (Jiri Olsa) - Adjust callchain based on DWARF debug info on powerpc (Sukadev Bhattiprolu) - Fix a risk for doing free on uninitialized pointer in traceevent lib (Rickard Strandqvist) - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa) - Enable close-on-exec flag on perf file descriptor (Yann Droneaud) - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo) - Event ordering fixes (Jiri Olsa)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits) Revert "perf tools: Fix jump label always changing during tracing" perf tools: Fix perf usage string leftover perf: Check permission only for parent tracepoint event perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds perf record: Always force PERF_RECORD_FINISHED_ROUND event perf inject: Add --kallsyms parameter perf tools: Expose 'addr' functions so they can be reused perf session: Fix accounting of ordered samples queue perf powerpc: Include util/util.h and remove stringify macros perf tools: Fix build on gcc 4.4.7 perf tools: Add thread parameter to vdso__dso_findnew() perf tools: Add dso__type() perf tools: Separate the VDSO map name from the VDSO dso name perf tools: Add vdso__new() perf machine: Fix the lifetime of the VDSO temporary file perf tools: Group VDSO global variables into a structure perf session: Add ability to skip 4GiB or more perf session: Add ability to 'skip' a non-piped event stream perf tools: Pass machine to vdso__dso_findnew() perf tools: Add dso__data_size() ...
2014-08-04Merge tag 'trace-3.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing filter cleanups from Steven Rostedt: "Oleg Nesterov did several clean ups with the tracing filter code. As he found some small bugs that went into 3.16, and these changes were based on that, I had to apply his changes to a separate branch than my main development branch. This was based on work that was already pulled into 3.16, and is a separate pull request to keep from having local merges in my pull request" * tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Kill "filter_string" arg of replace_preds() tracing: Change apply_subsystem_event_filter() paths to check file->system == dir tracing: Kill ftrace_event_call->files tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic tracing: Kill call_filter_disable() tracing: Kill destroy_call_preds() tracing: Kill destroy_preds() and destroy_file_preds()
2014-08-04Merge tag 'trace-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume Other changes include clean ups of uprobe code, clean up of the trace_seq() code, and other various small fixes and clean ups to ftrace and tracing" * tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) ftrace: Add warning if tramp hash does not match nr_trampolines ftrace: Fix trampoline hash update check on rec->flags ring-buffer: Use rb_page_size() instead of open coded head_page size ftrace: Rename ftrace_ops field from trampolines to nr_trampolines tracing: Convert local function_graph functions to static ftrace: Do not copy old hash when resetting tracing: let user specify tracing_thresh after selecting function_graph ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on() tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST s390/ftrace: remove check of obsolete variable function_trace_stop arm64, ftrace: Remove check of obsolete variable function_trace_stop Blackfin: ftrace: Remove check of obsolete variable function_trace_stop metag: ftrace: Remove check of obsolete variable function_trace_stop microblaze: ftrace: Remove check of obsolete variable function_trace_stop MIPS: ftrace: Remove check of obsolete variable function_trace_stop parisc: ftrace: Remove check of obsolete variable function_trace_stop sh: ftrace: Remove check of obsolete variable function_trace_stop sparc64,ftrace: Remove check of obsolete variable function_trace_stop tile: ftrace: Remove check of obsolete variable function_trace_stop ftrace: x86: Remove check of obsolete variable function_trace_stop ...
2014-07-28perf: Check permission only for parent tracepoint eventJiri Olsa
There's no need to check cloned event's permission once the parent was already checked. Also the code is checking 'current' process permissions, which is not owner process for cloned events, thus could end up with wrong permission check result. Reported-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Tested-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-24ftrace: Add warning if tramp hash does not match nr_trampolinesSteven Rostedt (Red Hat)
After adding all the records to the tramp_hash, add a check that makes sure that the number of records added matches the number of records expected to match and do a WARN_ON and disable ftrace if they do not match. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-24ftrace: Fix trampoline hash update check on rec->flagsSteven Rostedt (Red Hat)
In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs to the ops hash if the rec has only one callback attached and the ops is connected to the rec. It gives a nasty warning and shuts down ftrace if the rec doesn't have a trampoline set for it. But this can happen with the following scenario: # cd /sys/kernel/debug/tracing # echo schedule do_IRQ > set_ftrace_filter # mkdir instances/foo # echo schedule > instances/foo/set_ftrace_filter # echo function_graph > current_function # echo function > instances/foo/current_function # echo nop > instances/foo/current_function The above would then trigger the following warning and disable ftrace: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b() Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...] CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000 ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001 ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be Call Trace: [<ffffffff81502130>] ? dump_stack+0x4a/0x75 [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97 [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b [<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b [<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38 [<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28 [<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276 [<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91 [<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc [<ffffffff8120478f>] ? security_file_permission+0x1b/0x31 [<ffffffff81130e49>] ? vfs_write+0xac/0x11c [<ffffffff8113115d>] ? SyS_write+0x60/0x8e [<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b ---[ end trace 938c4415cbc7dc96 ]--- ------------[ cut here ]------------ Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-23ring-buffer: Use rb_page_size() instead of open coded head_page sizeSteven Rostedt (Red Hat)
There's a helper function to get a ring buffer page size (the number of bytes of data recorded on the page), called rb_page_size(). Use that instead of open coding it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-23ftrace: Provide trace clocks monotonicThomas Gleixner
Expose the new NMI safe accessor to clock monotonic to the tracer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
2014-07-23ftrace: Rename ftrace_ops field from trampolines to nr_trampolinesSteven Rostedt (Red Hat)
Having two fields within the same struct that is off by one character can be confusing and error prone. Rename the counter "trampolines" to "nr_trampolines" to explicitly show it is a counter and not to be confused by the "trampoline" field. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-21tracing: Fix wraparound problems in "uptime" trace clockTony Luck
The "uptime" trace clock added in: commit 8aacf017b065a805d27467843490c976835eb4a5 tracing: Add "uptime" trace clock that uses jiffies has wraparound problems when the system has been up more than 1 hour 11 minutes and 34 seconds. It converts jiffies to nanoseconds using: (u64)jiffies_to_usecs(jiffy) * 1000ULL but since jiffies_to_usecs() only returns a 32-bit value, it truncates at 2^32 microseconds. An additional problem on 32-bit systems is that the argument is "unsigned long", so fixing the return value only helps until 2^32 jiffies (49.7 days on a HZ=1000 system). Avoid these problems by using jiffies_64 as our basis, and not converting to nanoseconds (we do convert to clock_t because user facing API must not be dependent on internal kernel HZ values). Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com Cc: stable@vger.kernel.org # 3.10+ Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies" Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18tracing: Convert local function_graph functions to staticSteven Rostedt (Red Hat)
Local functions should be static. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace: Do not copy old hash when resettingWang Nan
Do not waste time copying the old hash if the hash is going to be reset. Just allocate a new hash and free the old one, as that is the same result as copying te old one and then resetting it. Link: http://lkml.kernel.org/p/1405384820-48837-1-git-send-email-wangnan0@huawei.com Signed-off-by: Wang Nan <wangnan0@huawei.com> [ SDR: Removed unused ftrace_filter_reset() function ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18tracing: let user specify tracing_thresh after selecting function_graphStanislav Fomichev
Currently, tracing_thresh works only if we specify it before selecting function_graph tracer. If we do the opposite, tracing_thresh will change it's value, but it will not be applied. To fix it, we add update_thresh callback which is called whenever tracing_thresh is updated and for function_graph tracer we register handler which reinitializes tracer depending on tracing_thresh. Link: http://lkml.kernel.org/p/20140718111727.GA3206@stfomichev-desktop.yandex.net Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on()Corey Minyard
The code for resizing the trace ring buffers has to run the per-cpu resize on the CPU itself. The code was using preempt_off() and running the code for the current CPU directly, otherwise calling schedule_work_on(). At least on RT this could result in the following: |BUG: sleeping function called from invalid context at kernel/rtmutex.c:673 |in_atomic(): 1, irqs_disabled(): 0, pid: 607, name: bash |3 locks held by bash/607: |CPU: 0 PID: 607 Comm: bash Not tainted 3.12.15-rt25+ #124 |(rt_spin_lock+0x28/0x68) |(free_hot_cold_page+0x84/0x3b8) |(free_buffer_page+0x14/0x20) |(rb_update_pages+0x280/0x338) |(ring_buffer_resize+0x32c/0x3dc) |(free_snapshot+0x18/0x38) |(tracing_set_tracer+0x27c/0x2ac) probably via |cd /sys/kernel/debug/tracing/ |echo 1 > events/enable ; sleep 2 |echo 1024 > buffer_size_kb If we just always use schedule_work_on(), there's no need for the preempt_off(). So do that. Link: http://lkml.kernel.org/p/1405537633-31518-1-git-send-email-cminyard@mvista.com Reported-by: Stanislav Meduna <stano@meduna.org> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TESTSteven Rostedt (Red Hat)
All users of function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST have been removed. We can safely remove them from the kernel. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace: Remove function_trace_stop check from list funcSteven Rostedt (Red Hat)
function_trace_stop is no longer used to stop function tracing. Remove the check from __ftrace_ops_list_func(). Also, call FTRACE_WARN_ON() instead of setting function_trace_stop if a ops has no func to call. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace: Do no disable function tracing on enabling function tracingSteven Rostedt (Red Hat)
When function tracing is being updated function_trace_stop is set to keep from tracing the updates. This was fine when function tracing was done from stop machine. But it is no longer done that way and this can cause real tracing to be missed. Remove it. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-18ftrace-graph: Remove usage of ftrace_stop() in ftrace_graph_stop()Steven Rostedt (Red Hat)
All archs now use ftrace_graph_is_dead() to stop function graph tracing. Remove the usage of ftrace_stop() as that is no longer needed. Cc: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-17ftrace-graph: Remove dependency of ftrace_stop() from ftrace_graph_stop()Steven Rostedt (Red Hat)
ftrace_stop() is going away as it disables parts of function tracing that affects users that should not be affected. But ftrace_graph_stop() is built on ftrace_stop(). Here's another example of killing all of function tracing because something went wrong with function graph tracing. Instead of disabling all users of function tracing on function graph error, disable only function graph tracing. A new function is created called ftrace_graph_is_dead(). This is called in strategic paths to prevent function graph from doing more harm and allowing at least a warning to be printed before the system crashes. NOTE: ftrace_stop() is still used until all the archs are converted over to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed. Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16tracing: Kill "filter_string" arg of replace_preds()Oleg Nesterov
Cosmetic, but replace_preds() doesn't need/use "char *filter_string". Remove it to microsimplify the code. Link: http://lkml.kernel.org/p/20140715184832.GA20519@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16tracing: Change apply_subsystem_event_filter() paths to check file->system ↵Oleg Nesterov
== dir filter_free_subsystem_preds(), filter_free_subsystem_filters() and replace_system_preds() can simply check file->system->subsystem and avoid strcmp(call->class->system). Better yet, we can pass "struct ftrace_subsystem_dir *dir" instead of event_subsystem and just check file->system == dir. Thanks to Namhyung Kim who pointed out that replace_system_preds() can be changed too. Link: http://lkml.kernel.org/p/20140715184829.GA20516@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logicOleg Nesterov
alloc_trace_uprobe() sets TRACE_EVENT_FL_USE_CALL_FILTER for unknown reason and this is simply wrong. Fortunately this has no effect because register_uprobe_event() clears call->flags after that. Kill both. This trace_uprobe was kzalloc'ed and we rely on this fact anyway. Link: http://lkml.kernel.org/p/20140715184824.GA20505@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16tracing: Kill call_filter_disable()Oleg Nesterov
It seems that the only purpose of call_filter_disable() is to make filter_disable() less clear and symmetrical, remove it. Link: http://lkml.kernel.org/p/20140715184821.GA20498@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16tracing: Kill destroy_call_preds()Oleg Nesterov
Remove destroy_call_preds(). Its only caller, __trace_remove_event_call(), can use free_event_filter() and nullify ->filter by hand. Perhaps we could keep this trivial helper although imo it is pointless, but then it should be static in trace_events.c. Link: http://lkml.kernel.org/p/20140715184816.GA20495@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16tracing: Kill destroy_preds() and destroy_file_preds()Oleg Nesterov
destroy_preds() makes no sense. The only caller, event_remove(), actually wants destroy_file_preds(). __trace_remove_event_call() does destroy_call_preds() which takes care of call->filter. And after the previous change we can simply remove destroy_preds() from event_remove(), we are going to call remove_event_from_tracers() which in turn calls remove_event_file_dir()->free_event_filter(). Link: http://lkml.kernel.org/p/20140715184813.GA20488@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-16ftrace: Allow archs to specify if they need a separate function graph trampolineSteven Rostedt (Red Hat)
Currently if an arch supports function graph tracing, the core code will just assign the function graph trampoline to the function graph addr that gets called. But as the old method for function graph tracing always calls the function trampoline first and that calls the function graph trampoline, some archs may have the function graph trampoline dependent on operations that were done in the function trampoline. This causes function graph tracer to break on those archs. Instead of having the default be to set the function graph ftrace_ops to the function graph trampoline, have it instead just set it to zero which will keep it from jumping to a trampoline that is not set up to be jumped directly too. Link: http://lkml.kernel.org/r/53BED155.9040607@nvidia.com Reported-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com> Tested-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15ring-buffer: Fix polling on trace_pipeMartin Lau
ring_buffer_poll_wait() should always put the poll_table to its wait_queue even there is immediate data available. Otherwise, the following epoll and read sequence will eventually hang forever: 1. Put some data to make the trace_pipe ring_buffer read ready first 2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee) 3. epoll_wait() 4. read(trace_pipe_fd) till EAGAIN 5. Add some more data to the trace_pipe ring_buffer 6. epoll_wait() -> this epoll_wait() will block forever ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2, ring_buffer_poll_wait() returns immediately without adding poll_table, which has poll_table->_qproc pointing to ep_poll_callback(), to its wait_queue. ~ During the epoll_wait() call in step 3 and step 6, ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue because the poll_table->_qproc is NULL and it is how epoll works. ~ When there is new data available in step 6, ring_buffer does not know it has to call ep_poll_callback() because it is not in its wait queue. Hence, block forever. Other poll implementation seems to call poll_wait() unconditionally as the very first thing to do. For example, tcp_poll() in tcp.c. Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com Cc: stable@vger.kernel.org # 2.6.27 Fixes: 2a2cc8f7c4d0 "ftrace: allow the event pipe to be polled" Reviewed-by: Chris Mason <clm@fb.com> Signed-off-by: Martin Lau <kafai@fb.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputszhangwei(Jovi)
The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing, so add it, to be consistent with __trace_printk/__trace_bprintk. Those functions are all called by the same function: trace_printk(). Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15tracing: Fix graph tracer with stack tracer on other archsSteven Rostedt (Red Hat)
Running my ftrace tests on PowerPC, it failed the test that checks if function_graph tracer is affected by the stack tracer. It was. Looking into this, I found that the update_function_graph_func() must be called even if the trampoline function is not changed. This is because archs like PowerPC do not support ftrace_ops being passed by assembly and instead uses a helper function (what the trampoline function points to). Since this function is not changed even when multiple ftrace_ops are added to the code, the test that falls out before calling update_function_graph_func() will miss that the update must still be done. Call update_function_graph_function() for all calls to update_ftrace_function() Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-15tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputszhangwei(Jovi)
Currently trace option stacktrace is not applicable for trace_printk with constant string argument, the reason is in __trace_puts/__trace_bputs ftrace_trace_stack is missing. In contrast, when using trace_printk with non constant string argument(will call into __trace_printk/__trace_bprintk), then trace option stacktrace is workable, this inconstant result will confuses users a lot. Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-14tracing: instance_rmdir() leaks ftrace_event_file->filterOleg Nesterov
instance_rmdir() path destroys the event files but forgets to free file->filter. Change remove_event_file_dir() to free_event_filter(). Link: http://lkml.kernel.org/p/20140711190638.GA19517@redhat.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com> Cc: stable@vger.kernel.org # 3.11+ Fixes: f6a84bdc75b5 "tracing: Introduce remove_event_file_dir()" Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-09Merge branch 'trace/ftrace/urgent' into trace/ftrace/coreSteven Rostedt (Red Hat)
Needed 099ed151675c "tracing: Remove ftrace_stop/start() from reading the trace file" for the removal of ftrace_start/stop().
2014-07-01tracing: Remove ftrace_stop/start() from reading the trace fileSteven Rostedt (Red Hat)
Disabling reading and writing to the trace file should not be able to disable all function tracing callbacks. There's other users today (like kprobes and perf). Reading a trace file should not stop those from happening. Cc: stable@vger.kernel.org # 3.0+ Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Add description of set_graph_notrace to tracing/READMENamhyung Kim
It was missing the description of set_graph_notrace file. Add it. Link: http://lkml.kernel.org/p/1402590233-22321-5-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Improve message of empty set_ftrace_notrace fileNamhyung Kim
When there's no entry in set_ftrace_notrace, it'll print nothing, but it's better to print something like below like set_graph_notrace does: #### no functions disabled #### Link: http://lkml.kernel.org/p/1402644246-4649-1-git-send-email-namhyung@kernel.org Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Improve message of empty set_graph_notrace fileNamhyung Kim
When there's no entry in set_graph_notrace, it'll print below message #### all functions enabled #### While this is technically correct, it's better to print like below: #### no functions disabled #### Link: http://lkml.kernel.org/p/1402590233-22321-3-git-send-email-namhyung@kernel.org Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Add ftrace_graph_notrace boot parameterNamhyung Kim
The ftrace_graph_notrace option is for specifying notrace filter for function graph tracer at boot time. It can be altered after boot using set_graph_notrace file on the debugfs. Link: http://lkml.kernel.org/p/1402590233-22321-2-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Convert pr_warning() to pr_warn() in trace_events.cFabian Frederick
Convert pr_warning to standard pr_warn Define pr_fmt(fmt) fmt to avoid any future default fmt definition Link: http://lkml.kernel.org/p/1402141388-21144-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Do not copy hash if O_TRUNC is setNamhyung Kim
When a filter file is open for writing and O_TRUNC is set, there's no need to copy and free the filter entries. Link: http://lkml.kernel.org/p/1402474014-28655-2-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Fix memory leak on failure path in ftrace_allocate_pages()Namhyung Kim
As struct ftrace_page is managed in a single linked list, it should free from the start page. Link: http://lkml.kernel.org/p/1402474014-28655-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01ftrace: Get rid of obsolete global_start_up variableNamhyung Kim
It seems like it's a leftover from commit 4104d326b670 ("ftrace: Remove global function list and call function directly"). As it isn't updated at all, checking its value is meaningless. Let's get rid of it. Link: http://lkml.kernel.org/p/1402584972-17824-1-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Add trace_seq_buffer_ptr() helper functionSteven Rostedt (Red Hat)
There's several locations in the kernel that open code the calculation of the next location in the trace_seq buffer. This is usually done with p->buffer + p->len Instead of having this open coded, supply a helper function in the header to do it for them. This function is called trace_seq_buffer_ptr(). Link: http://lkml.kernel.org/p/20140626220129.452783019@goodmis.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Remove unnecessary null test before debugfs_remove()Fabian Frederick
This fixes checkpatch warning: "WARNING: debugfs_remove(NULL) is safe this check is probably not required" Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Remove trace_seq_reserve()Steven Rostedt (Red Hat)
trace_seq_reserve() has no users in the kernel, it just wastes space. Remove it. Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Make trace_seq_putmem_hex() more robustSteven Rostedt (Red Hat)
Currently trace_seq_putmem_hex() can only take as a parameter a pointer to something that is 8 bytes or less, otherwise it will overflow the buffer. This is protected by a macro that encompasses the call to trace_seq_putmem_hex() that has a BUILD_BUG_ON() for the variable before it is passed in. This is not very robust and if trace_seq_putmem_hex() ever gets used outside that macro it will cause issues. Instead of only being able to produce a hex output of memory that is for a single word, change it to be more robust and allow any size input. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-01tracing: Clean up trace_seq.cSteven Rostedt (Red Hat)
For using trace_seq_*() functions in NMI context, I posted a patch to move it to the lib/ directory. This caused Andrew Morton to take a look at the code. He went through and gave a lot of comments about missing kernel doc, inconsistent types for the save variable, mix match of EXPORT_SYMBOL_GPL() and EXPORT_SYMBOL() as well as missing EXPORT_SYMBOL*()s. There were a few comments about the way variables were being compared (int vs uint). All these were good review comments and should be implemented regardless of if trace_seq.c should be moved to lib/ or not. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>