summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-07-20perf record: Apply filter to all events in a glob matchingWang Nan
There is an old problem in perf's filter applying which first posted at Sep. 2014 at https://lkml.org/lkml/2014/9/9/944 that, if passing multiple events in a glob matching expression in cmdline then add '--filter' after them, the filter will be applied on only the last one. For example: # dd if=/dev/zero of=/dev/null & [1] 464 # perf record -a -e 'syscalls:sys_*_read' --filter 'common_pid != 464' sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.239 MB perf.data (2094 samples) ] # perf report --stdio | tee ... # Samples: 2K of event 'syscalls:sys_enter_read' # Event count (approx.): 2092 ... # Samples: 2 of event 'syscalls:sys_exit_read' # Event count (approx.): 2 ... In this example, filter only applied on 'syscalls:sys_exit_read', and there's no way to set filter for ''syscalls:sys_enter_read'. This patch adds a 'cmdline_group_boundary' for 'struct evsel', and apply filter on all events between two boundary marks. After applying this patch: # perf record -a -e 'syscalls:sys_*_read' --filter 'common_pid != 464' sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.031 MB perf.data (3 samples) ] # perf report --stdio | tee ... # Samples: 1 of event 'syscalls:sys_enter_read' # Event count (approx.): 1 ... # Samples: 2 of event 'syscalls:sys_exit_read' # Event count (approx.): 2 ... Signed-off-by: Wang Nan <wangnan0@huawei.com> Reported-by: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1436513770-8896-1-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-20perf trace: Support 'strace' syscall event groupsArnaldo Carvalho de Melo
I.e.: $ cat ~/share/perf-core/strace/groups/file access chmod creat execve faccessat getcwd lstat mkdir open openat quotactl readlink rename rmdir stat statfs symlink unlink $ Then, on a quiet desktop, try running this and then moving your mouse to see the deluge of mouse related activity: # perf probe 'vfs_getname=getname_flags:72 pathname=filename:string' Added new event: probe:vfs_getname (on getname_flags:72 with pathname=filename:string) You can now use it in all perf tools, such as: perf record -e probe:vfs_getname -aR sleep 1 # # trace --ev probe:vfs_getname --filter-pids 2232 -e file 0.042 (0.042 ms): mousetweaks/2235 open(filename: 0x14e3910, mode: 438 ) ... 0.042 ( ): probe:vfs_getname:(ffffffff812230bc) pathname="/home/acme/.icons/Adwaita/cursors/xterm") 0.100 (0.100 ms): mousetweaks/2235 ... [continued]: open()) = -1 ENOENT No such file or directory 0.142 (0.018 ms): mousetweaks/2235 open(filename: 0x14c3c10, mode: 438 ) ... 0.142 ( ): probe:vfs_getname:(ffffffff812230bc) pathname="/home/acme/.icons/Adwaita/index.theme") 0.192 (0.069 ms): mousetweaks/2235 ... [continued]: open()) = -1 ENOENT No such file or directory 0.230 (0.017 ms): mousetweaks/2235 open(filename: 0x14c3c10, mode: 438 ) ... 0.230 ( ): probe:vfs_getname:(ffffffff812230bc) pathname="/usr/share/icons/Adwaita/cursors/xterm") 0.253 (0.041 ms): mousetweaks/2235 ... [continued]: open()) = 14 0.459 (0.008 ms): mousetweaks/2235 open(filename: 0x14e3910, mode: 438 ) ... 0.459 ( ): probe:vfs_getname:(ffffffff812230bc) pathname="/home/acme/.icons/Adwaita/cursors/left_side") 0.468 (0.017 ms): mousetweaks/2235 ... [continued]: open()) = -1 ENOENT No such file or directory Need to combine that raw_syscalls:sys_enter(open) + probe:vfs_getname + raw_syscalls:sys_exit(open) sequence... Now, if you're bored, please write some more syscall groups, like the ones in 'strace' and send it our way :-) Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Milian Wolff <mail@milianw.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-a42xklu59lcbxp7bbnic74a8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-20perf strlist: Make parse_list() privateArnaldo Carvalho de Melo
It is not used anywhere, expose it when/if needed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-f6in51stj17avhk4rv11gjgg@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-20perf strlist: Allow substitutions from file contents in a given directoryArnaldo Carvalho de Melo
So, if we have an strlist equal to: "file,close" And we call it as: struct strlist_config *config = { .dirname = "~/strace/groups", }; struct strlist *slist = strlist__new("file, close", &config); And we have: $ cat ~/strace/groups/file access open openat statfs Then the resulting strlist will have these contents: [ "access", "open", "openat", "statfs", "close" ] This will be used to implement strace syscall groups in 'perf trace', but can be used in some other tool, thus being implemented in 'strlist'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-wi6l6qtomqlywwr6005jvs05@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-20perf strlist: Make dupstr be the default and part of an extensible config parmArnaldo Carvalho de Melo
So that we can pass more info to strlist__new() without having to change its function signature, just adding entries to the strlist_config struct with sensible defaults for when those fields are not specified. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-5uaaler4931i0s9sedxjquhq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-16perf strlist: load() should return a negative errnoArnaldo Carvalho de Melo
To match what its users return. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-jntpe2lwg1fxn1bku7uccan0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-14perf record: Document setting '-e pmu/period=N/' in man pageKan Liang
The 'period' param is not defined in /sys/bus/event_sources/devices/<pmu>/format/*, but can be used, document it. Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1436345097-11113-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06Merge tag 'perf-core-for-mingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Take tracefs into account when reporting errors about accessing tracepoint information in tools like 'perf trace' (Arnaldo Carvalho de Melo) - Let user have timestamps with per-thread recording in 'perf record' (Adrian Hunter) Infrastructure changes: - Introduce series of functions to build event filters so that we can set them in just one ioctl call, useful to set up common_pid, raw_syscalls:sys_{enter,exit}'s "id" filters to use with 'perf trace' (Arnaldo Carvalho de Melo) - Delete an unnecessary check before calling strfilter__delete() (Markus Elfring) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06tools lib api debugfs: Check for tracefs when reporting errorsArnaldo Carvalho de Melo
Now that we have two mountpoints, one for debugfs and another, for tracefs, we end up needing to check permissions for both, so, on a system with default config we were always asking the user to check the permission of the debugfs mountpoint, even when it was already sufficient. Fix it. E.g.: $ trace -e nanosleep usleep 1 Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit) Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug' $ sudo mount -o remount,mode=755 /sys/kernel/debug $ trace -e nanosleep usleep 1 Error: No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit) Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing' $ sudo mount -o remount,mode=755 /sys/kernel/debug/tracing $ trace -e nanosleep usleep 1 0.326 ( 0.061 ms): usleep/11961 nanosleep(rqtp: 0x7ffef1081c50) = 0 $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-0viljeuhc7q84ic8kobsna43@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf/x86: Fix copy_from_user_nmi() return if range is not okYann Droneaud
Commit 0a196848ca36 ("perf: Fix arch_perf_out_copy_user default"), changes copy_from_user_nmi() to return the number of remaining bytes so that it behave like copy_from_user(). Unfortunately, when the range is outside of the process memory, the return value is still the number of byte copied, eg. 0, instead of the remaining bytes. As all users of copy_from_user_nmi() were modified as part of commit 0a196848ca36, the function should be fixed to return the total number of bytes if range is not correct. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435001923-30986-1-git-send-email-ydroneaud@opteya.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06perf: Fix AUX buffer refcountingPeter Zijlstra
Its currently possible to drop the last refcount to the aux buffer from NMI context, which results in the expected fireworks. The refcounting needs a bigger overhaul, but to cure the immediate problem, delay the freeing by using an irq_work. Reviewed-and-tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150618103249.GK19282@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-06perf record: Let user have timestamps with per-thread recordingAdrian Hunter
If the option -T is used with option --per-thread, then time is still not sampled. Fix that by using OPT_BOOLEAN_SET to distinguish when the user used the -T option as opposed to the default case when timestamps are enabled but only for per-cpu recording. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1436183461-1918-1-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf probe: Delete an unnecessary check before the function call ↵Markus Elfring
"strfilter__delete" The strfilter__delete() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/5597751A.5000506@users.sourceforge.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf trace: Use event filters for the event qualifier listArnaldo Carvalho de Melo
We use raw_syscalls:sys_{enter,exit} events to show the syscalls, but were using a rather lazy/inneficient way to implement our 'strace -e' equivalent: filter out after reading the events in the ring buffer. Deflect more work to the kernel by appending a filter expression for that, that, together with the pid list, that is always present, if only to filter the tracer itself, reduces pressure on the ring buffer and otherwise use infrastructure already in place in the kernel to do early filtering. If we use it with -v we can see the filter passed to the kernel, for instance, for this contrieved case: # trace -v -e \!open,close,write,poll,recvfrom,select,recvmsg,writev,sendmsg,read,futex,epoll_wait,ioctl,eventfd --filter-pids 2189,2566,1398,2692,4475,4532 <SNIP> (common_pid != 2514 && common_pid != 1398 && common_pid != 2189 && common_pid != 2566 && common_pid != 2692 && common_pid != 4475 && common_pid != 4532) && (id != 3 && id != 232 && id != 284 && id != 202 && id != 16 && id != 2 && id != 7 && id != 0 && id != 45 && id != 47 && id != 23 && id != 46 && id != 1 && id != 20) 0.011 (0.011 ms): caribou/2295 eventfd2(flags: CLOEXEC|NONBLOCK) = 18 16.946 (0.019 ms): caribou/2295 eventfd2(flags: CLOEXEC|NONBLOCK) = 18 38.598 (0.167 ms): chronyd/794 socket(family: INET, type: DGRAM ) = 4 38.603 (0.002 ms): chronyd/794 fcntl(fd: 4<socket:[239307]>, cmd: GETFD) = 0 38.605 (0.001 ms): chronyd/794 fcntl(fd: 4<socket:[239307]>, cmd: SETFD, arg: 1) = 0 ^C # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ti2tg18atproqpguc2moinp6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf evsel: Introduce append_filter() methodArnaldo Carvalho de Melo
To allow building filters in evsel->filter, that will eventually be applied via perf_evsel__apply_filter(). Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-sjfoes3pycx7nlpmgedca13v@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf evlist: Make perf_evlist__set_filter use perf_evsel__set_filterArnaldo Carvalho de Melo
Instead of calling perf_evsel__apply_filter straight away, so that we can, in the next patches, expand the filter with more conditions before actually calling the ioctl to pass the end result filter to the kernel. Now we need to call perf_evlist__apply_filters() after the filter is completely setup, i.e. do the ioctl calls. The perf_evlist__apply_filters() method was already in place, because that is the model for the other tools that receives filters in the command line: go on setting then in the evsel->filter and only at the end, after parsing the whole command line, apply them. We get, as a bonus, a more expressive message that states which event, if any, failed to have the filter applied to, with an error message stating what happened. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-f429pgz75ryz7tpe6v74etre@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf evsel: Introduce set_filter methodArnaldo Carvalho de Melo
Replaces existing filter string with the one provided. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-jst49z83li0yx3g18o54u51a@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf evsel: Rename set_filter to apply_filterArnaldo Carvalho de Melo
We need to be able to go on constructing a complex filter in multiple stages, since we can only set one filter per event. For instance, we need to be able, in 'perf trace' to filter by the 'common_pid' field all the time, if only for the tracer itself, to avoid a feedback loop, and, in addition, we may want to filter the raw_syscalls:sys_{enter,exit} events by its 'id' filter, when using 'perf trace -e open,close' or 'perf trace -e !open,close', i.e. when we are interested in just a subset of syscalls or when we are not interested in it. So we will have: perf_evsel__set_filter(evsel, char *filter) Replaces whatever is in evsel->filter. perf_evsel__append_filter(evsel, const char *op, char *filter) Appends, using op ("&&" or "||") with what is in evsel->filter. perf_evsel__apply_filter(evsel, filter): That actually applies a filter, be it the one being constructed in evsel->filter, or any other, for tools with more specific ways to build the filter, issuing the appropriate ioctl for all the evsel fds. The same changes will be made to the evlist__{set,apply} variants to keep everything consistent. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-2s5z9xtpnc2lwio3cv5x0jek@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf trace: Store the syscall ids for the event qualifiers in a tableArnaldo Carvalho de Melo
That we will use to set a filter on raw_syscalls:sys_{enter,exit} events. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-2acxrcxyu7tlolrfilpty38y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf trace: Remember what are the syscalls tracepoint evselsArnaldo Carvalho de Melo
We will need to set filters on then. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-u8hpgjpf3w8o1prnnjnwegwf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06perf tools: Asprintf like functions to format integer filter expressionArnaldo Carvalho de Melo
char *asprintf_expr_in_ints(const char *var, size_t nints, int *ints); char *asprintf_expr_not_in_ints(const char *var, size_t nints, int *ints); Example of output formatted with those functions: # ./tp_filter 6 12 2015 asprintf_expr_in_ints: id == 6 || id == 12 || id == 2015 asprintf_expr_not_in_ints: id != 6 && id != 12 && id != 2015 # It'll be used with, for instance, perf_evsel__set_filter_in_ints(), that will be used in turn to ask the kernel to filter out all raw_syscalls:* except for the ones specified by the user via: $ perf trace -e some,list,of,syscalls Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-jt07vfp6bd8y50c05j1t7hrn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-06Merge branch 'perf/rbtree_copy' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull rbtree build fix from Arnaldo Carvalho de Melo. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-05tools: Copy rbtree_augmented.h from the kernelArnaldo Carvalho de Melo
To complete the transitioning to not to share the same files with the kernel, also moving it from tools/perf/include/linux/ to tools/include/linux to make the whoke rbtree kit to other tools/ living codebases. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-5bxyehixafckqm6ez25alnfo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05tools: Move rbtree.h from tools/perf/Arnaldo Carvalho de Melo
The previous step, copying the contents minus the rcupdate.h parts, was done as a minimal fix, now do the move from tools/perf/. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-52fllxtsgmtke66pmv98mcma@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05tools: Copy lib/rbtree.c to tools/lib/Arnaldo Carvalho de Melo
So that we can remove kernel specific stuff we've been stubbing out via a tools/include/linux/export.h that gets removed in this patch and to avoid breakages in the future like the one fixed recently where rcupdate.h started being used in rbtree.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-rxuzfsozpb8hv1emwpx06rm6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05perf tools: Copy rbtree.h from the kernelArnaldo Carvalho de Melo
We were using the include/linux/rbtree.h directly from the kernel, which broke the build as soon as it started using rcupdate.h, to avoid dragging the rcu header files into tools/, for which there is no use so far, grab a copy of rbtree.h. This is the minimal fix, later patches will copy as well lib/rbtree.c and move rbtree.h into tools/include/, etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-dfmuj0j63w4by7vhlh4hhn74@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05tools: Adopt {READ,WRITE_ONCE} from the kernelArnaldo Carvalho de Melo
We need it to build rbtree.c after this cset: commit d72da4a4d973 Author: Peter Zijlstra <peterz@infradead.org> Date: Wed May 27 11:09:36 2015 +0930 rbtree: Make lockless searches non-fatal Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-qlnzhezv5ddwst0w9fydju0y@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-05Linux 4.2-rc1Linus Torvalds
2015-07-05Merge tag 'platform-drivers-x86-v4.2-2' of ↵Linus Torvalds
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull late x86 platform driver updates from Darren Hart: "The following came in a bit later and I wanted them to bake in next a few more days before submitting, thus the second pull. A new intel_pmc_ipc driver, a symmetrical allocation and free fix in dell-laptop, a couple minor fixes, and some updated documentation in the dell-laptop comments. intel_pmc_ipc: - Add Intel Apollo Lake PMC IPC driver tc1100-wmi: - Delete an unnecessary check before the function call "kfree" dell-laptop: - Fix allocating & freeing SMI buffer page - Show info about WiGig and UWB in debugfs - Update information about wireless control" * tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver tc1100-wmi: Delete an unnecessary check before the function call "kfree" dell-laptop: Fix allocating & freeing SMI buffer page dell-laptop: Show info about WiGig and UWB in debugfs dell-laptop: Update information about wireless control
2015-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-07-04bluetooth: fix list handlingLinus Torvalds
Commit 835a6a2f8603 ("Bluetooth: Stop sabotaging list poisoning") thought that the code was sabotaging the list poisoning when NULL'ing out the list pointers and removed it. But what was going on was that the bluetooth code was using NULL pointers for the list as a way to mark it empty, and that commit just broke it (and replaced the test with NULL with a "list_empty()" test on a uninitialized list instead, breaking things even further). So fix it all up to use the regular and real list_empty() handling (which does not use NULL, but a pointer to itself), also making sure to initialize the list properly (the previous NULL case was initialized implicitly by the session being allocated with kzalloc()) This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong An. [ I would normally expect to get this through the bt tree, but I'm going to release -rc1, so I'm just committing this directly - Linus ] Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Original-by: Tedd Ho-Jeong An <tedd.an@intel.com> Original-by: Marcel Holtmann <marcel@holtmann.org>: Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-07-04Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "It's been a busy development cycle for target-core in a number of different areas. The fabric API usage for se_node_acl allocation is now within target-core code, dropping the external API callers for all fabric drivers tree-wide. There is a new conversion to RCU hlists for se_node_acl and se_portal_group LUN mappings, that turns fast-past LUN lookup into a completely lockless code-path. It also removes the original hard-coded limitation of 256 LUNs per fabric endpoint. The configfs attributes for backends can now be shared between core and driver code, allowing existing drivers to use common code while still allowing flexibility for new backend provided attributes. The highlights include: - Merge sbc_verify_dif_* into common code (sagi) - Remove iscsi-target support for obsolete IFMarker/OFMarker (Christophe Vu-Brugier) - Add bidi support in target/user backend (ilias + vangelis + agover) - Move se_node_acl allocation into target-core code (hch) - Add crc_t10dif_update common helper (akinobu + mkp) - Handle target-core odd SGL mapping for data transfer memory (akinobu) - Move transport ID handling into target-core (hch) - Move task tag into struct se_cmd + support 64-bit tags (bart) - Convert se_node_acl->device_list[] to RCU hlist (nab + hch + paulmck) - Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch + paulmck) - Simplify target backend driver registration (hch) - Consolidate + simplify target backend attribute implementations (hch + nab) - Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch) - Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab) - Drop unnecessary core_tpg_register TFO parameter (nab) - Use 64-bit LUNs tree-wide (hannes) - Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits) target: Bump core version to v5.0 target: remove target_core_configfs.h target: remove unused TARGET_CORE_CONFIG_ROOT define target: consolidate version defines target: implement WRITE_SAME with UNMAP bit using ->execute_unmap target: simplify UNMAP handling target: replace se_cmd->execute_rw with a protocol_data field target/user: Fix inconsistent kmap_atomic/kunmap_atomic target: Send UA when changing LUN inventory target: Send UA upon LUN RESET tmr completion target: Send UA on ALUA target port group change target: Convert se_lun->lun_deve_lock to normal spinlock target: use 'se_dev_entry' when allocating UAs target: Remove 'ua_nacl' pointer from se_ua structure target_core_alua: Correct UA handling when switching states xen-scsiback: Fix compile warning for 64-bit LUN target: Remove TARGET_MAX_LUNS_PER_TRANSPORT target: use 64-bit LUNs target: Drop duplicate + unused se_dev_check_wce target: Drop unnecessary core_tpg_register TFO parameter ...
2015-07-04Merge tag 'ntb-4.2' of git://github.com/jonmason/ntbLinus Torvalds
Pull NTB updates from Jon Mason: "This includes a pretty significant reworking of the NTB core code, but has already produced some significant performance improvements. An abstraction layer was added to allow the hardware and clients to be easily added. This required rewriting the NTB transport layer for this abstraction layer. This modification will allow future "high performance" NTB clients. In addition to this change, a number of performance modifications were added. These changes include NUMA enablement, using CPU memcpy instead of asyncdma, and modification of NTB layer MTU size" * tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits) NTB: Add split BAR output for debugfs stats NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe NTB: Print driver name and version in module init NTB: Increase transport MTU to 64k from 16k NTB: Rename Intel code names to platform names NTB: Default to CPU memcpy for performance NTB: Improve performance with write combining NTB: Use NUMA memory in Intel driver NTB: Use NUMA memory and DMA chan in transport NTB: Rate limit ntb_qp_link_work NTB: Add tool test client NTB: Add ping pong test client NTB: Add parameters for Intel SNB B2B addresses NTB: Reset transport QP link stats on down NTB: Do not advance transport RX on link down NTB: Differentiate transport link down messages NTB: Check the device ID to set errata flags NTB: Enable link for Intel root port mode in probe NTB: Read peer info from local SPAD in transport NTB: Split ntb_hw_intel and ntb_transport drivers ...
2015-07-049p: cope with bogus responses from server in p9_client_{read,write}Al Viro
if server claims to have written/read more than we'd told it to, warn and cap the claimed byte count to avoid advancing more than we are ready to.
2015-07-04p9_client_write(): avoid double p9_free_req()Al Viro
Braino in "9p: switch p9_client_write() to passing it struct iov_iter *"; if response is impossible to parse and we discard the request, get the out of the loop right there. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-049p: forgetting to cancel request on interrupted zero-copy RPCAl Viro
If we'd already sent a request and decide to abort it, we *must* issue TFLUSH properly and not just blindly reuse the tag, or we'll get seriously screwed when response eventually arrives and we confuse it for response to later request that had reused the same tag. Cc: stable@vger.kernel.org # v3.2 and later Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: bdev_direct_access() may sleepMatthew Wilcox
The brd driver is the only in-tree driver that may sleep currently. After some discussion on linux-fsdevel, we decided that any driver may choose to sleep in its ->direct_access method. To ensure that all callers of bdev_direct_access() are prepared for this, add a call to might_sleep(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04block: Add support for DAX reads/writes to block devicesMatthew Wilcox
If a block device supports the ->direct_access methods, bypass the normal DIO path and use DAX to go straight to memcpy() instead of allocating a DIO and a BIO. Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in do_blockdev_direct_IO(). Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Use copy_from_iter_nocacheMatthew Wilcox
When userspace does a write, there's no need for the written data to pollute the CPU cache. This matches the original XIP code. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04dax: Add block size note to documentationMatthew Wilcox
For block devices which are small enough, mkfs will default to creating a filesystem with block sizes smaller than page size. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Except for the preempt notifiers fix, these are all small bugfixes that could have been waited for -rc2. Sending them now since I was taking care of Peter's patch anyway" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: add hyper-v crash msrs values KVM: x86: remove data variable from kvm_get_msr_common KVM: s390: virtio-ccw: don't overwrite config space values KVM: x86: keep track of LVT0 changes under APICv KVM: x86: properly restore LVT0 KVM: x86: make vapics_in_nmi_mode atomic sched, preempt_notifier: separate notifier registration from static_key inc/dec
2015-07-04NTB: Add split BAR output for debugfs statsDave Jiang
When split BAR is enabled, the driver needs to dump out the split BAR registers rather than the original 64bit BAR registers. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Change WARN_ON_ONCE to pr_warn_once on unsafeDave Jiang
The unsafe doorbell and scratchpad access should display reason when WARN is called. Otherwise we get a stack dump without any explanation. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Print driver name and version in module initDave Jiang
Printouts driver name and version to indicate what is being loaded. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Increase transport MTU to 64k from 16kDave Jiang
Benchmarking showed a significant performance increase with the MTU size to 64k instead of 16k. Change the driver default to 64k. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Rename Intel code names to platform namesDave Jiang
Instead of using the platform code names, use the correct platform names to identify the respective Intel NTB hardware. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Default to CPU memcpy for performanceDave Jiang
Disable DMA usage by default, since the CPU provides much better performance with write combining. Provide a module parameter to enable DMA usage when offloading the memcpy is preferred. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Improve performance with write combiningDave Jiang
Changing the memory window BAR mappings to write combining significantly boosts the performance. We will also use memcpy that uses non-temporal store, which showed performance improvement when doing non-cached memcpys. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory in Intel driverAllen Hubbe
Allocate memory for the NUMA node of the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>
2015-07-04NTB: Use NUMA memory and DMA chan in transportAllen Hubbe
Allocate memory and request the DMA channel for the same NUMA node as the NTB device. Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>