Age  Commit message  Author
2011-06-14  tracing: Add a proc file to stop tracing and free buffer  Vaibhav Nagarnaik
The proc file entry buffer_size_kb is used to set the size of tracing buffer. The memory to expand the buffer size is kernel memory. Consider a use case where tracing is handled by a user space utility, which acts as a gate keeper for tracing requests. In an OOM condition, tracing is considered a low priority task and if the utility gets killed the ring buffer memory cannot be released back to the kernel. This patch adds a proc file called "free_buffer" whose purpose is to stop tracing and free up the ring buffer when it is closed. The user space process can then set the desired size in buffer_size_kb file and open the fd to the "free_buffer" file. Under OOM condition, if the process gets killed, the kernel closes the file descriptor. The release handler stops the tracing and releases the kernel memory automatically. Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
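Seen from the user space side, the workflow described above might look roughly like the following sketch. The helper name hold_trace_buffer and the tracing_dir parameter are hypothetical, and the location of the tracing directory has to be checked on the target system. Only the two file names, buffer_size_kb and free_buffer, come from the commit message.

```python
import os


def hold_trace_buffer(tracing_dir, size_kb):
    """Set the ring buffer size, then open free_buffer and return its fd.

    tracing_dir is an assumption: pass whatever directory exposes the
    buffer_size_kb and free_buffer files on your system.
    """
    with open(os.path.join(tracing_dir, "buffer_size_kb"), "w") as f:
        f.write(str(size_kb))
    # Keep the returned descriptor open for the life of the process. If the
    # process is killed, the kernel closes it, and per the commit message the
    # release handler then stops tracing and frees the ring buffer.
    return os.open(os.path.join(tracing_dir, "free_buffer"), os.O_WRONLY)
```

The design point is that cleanup is tied to the file descriptor's lifetime, so it still happens when the gatekeeper process dies without running any of its own exit code.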
2011-06-14  tracing: Use NUMA allocation for per-cpu ring buffer pages  Vaibhav Nagarnaik
The tracing ring buffer is a group of per-cpu ring buffers where allocation and logging is done on a per-cpu basis. The events that are generated on a particular CPU are logged in the corresponding buffer. This is to provide wait-free writes between CPUs and good NUMA node locality while accessing the ring buffer. However, the allocation routines consider NUMA locality only for buffer page metadata and not for the actual buffer page. This causes the pages to be allocated on the NUMA node local to the CPU where the allocation routine is running at the time. This patch fixes the problem by using a NUMA node specific allocation routine so that the pages are allocated from a NUMA node local to the logging CPU. I tested with the getuid_microbench from autotest. It is a simple binary that calls getuid() in a loop and measures the average time for the syscall to complete. The following command was used to test: $ getuid_microbench 1000000 Compared the numbers found on kernel with and without this patch and found that logging latency decreases by 30-50 ns/call. tracing with non-NUMA allocation - 569 ns/call tracing with NUMA allocation - 512 ns/call Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-06-14  tracing: Schedule a delayed work to call wakeup()  Vaibhav Nagarnaik
In using syscall tracing by concurrent processes, the wakeup() that is called in the event commit function causes contention on the spin lock of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid tracepoints, and by running getuid_microbench from autotest in parallel I found that the contention causes exponential latency increase in the tracing path.

The autotest binary getuid_microbench calls getuid() in a tight loop for the given number of iterations and measures the average time required to complete a single invocation of syscall.

The patch schedules a delayed work after 2 ms once an event commit calls to wake up the trace wait_queue. This removes the delay caused by contention on spin lock in wakeup() and amortizes the wakeup() calls scheduled over the 2 ms period.

In the following example, the script enables the sys_enter_getuid and sys_exit_getuid tracepoints and runs the getuid_microbench in parallel with the given number of processes. The output clearly shows the latency increase caused by contentions.

    $ ~/getuid.sh 1
    1000000 calls in 0.720974253 s (720.974253 ns/call)
    $ ~/getuid.sh 2
    1000000 calls in 1.166457554 s (1166.457554 ns/call)
    1000000 calls in 1.168933765 s (1168.933765 ns/call)
    $ ~/getuid.sh 3
    1000000 calls in 1.783827516 s (1783.827516 ns/call)
    1000000 calls in 1.795553270 s (1795.553270 ns/call)
    1000000 calls in 1.796493376 s (1796.493376 ns/call)
    $ ~/getuid.sh 4
    1000000 calls in 4.483041796 s (4483.041796 ns/call)
    1000000 calls in 4.484165388 s (4484.165388 ns/call)
    1000000 calls in 4.484850762 s (4484.850762 ns/call)
    1000000 calls in 4.485643576 s (4485.643576 ns/call)
    $ ~/getuid.sh 5
    1000000 calls in 6.497521653 s (6497.521653 ns/call)
    1000000 calls in 6.502000236 s (6502.000236 ns/call)
    1000000 calls in 6.501709115 s (6501.709115 ns/call)
    1000000 calls in 6.502124100 s (6502.124100 ns/call)
    1000000 calls in 6.502936358 s (6502.936358 ns/call)

After the patch, the latencies scale better.

    1000000 calls in 0.728720455 s (728.720455 ns/call)
    1000000 calls in 0.842782857 s (842.782857 ns/call)
    1000000 calls in 0.883803135 s (883.803135 ns/call)
    1000000 calls in 0.902077764 s (902.077764 ns/call)
    1000000 calls in 0.902838202 s (902.838202 ns/call)
    1000000 calls in 0.908896885 s (908.896885 ns/call)
    1000000 calls in 0.932523515 s (932.523515 ns/call)
    1000000 calls in 0.958009672 s (958.009672 ns/call)
    1000000 calls in 0.986188020 s (986.188020 ns/call)
    1000000 calls in 0.989771102 s (989.771102 ns/call)
    1000000 calls in 0.933518391 s (933.518391 ns/call)
    1000000 calls in 0.958897947 s (958.897947 ns/call)
    1000000 calls in 1.031038897 s (1031.038897 ns/call)
    1000000 calls in 1.089516025 s (1089.516025 ns/call)
    1000000 calls in 1.141998347 s (1141.998347 ns/call)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
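The amortization idea in this commit can be illustrated with a small model. This is not the kernel's workqueue code: count_wakeups is a hypothetical helper that only counts how many wakeups would fire if each one were deferred by a fixed delay and every commit arriving while one is pending were absorbed into it. The 2 ms default comes from the commit message; everything else is an assumption made for illustration.

```python
def count_wakeups(commit_times_ms, delay_ms=2.0):
    """Count deferred wakeups for a series of event commit times (in ms)."""
    wakeups = 0
    pending_until = None  # time at which the already scheduled work will run
    for t in sorted(commit_times_ms):
        if pending_until is None or t >= pending_until:
            # Nothing is pending, so this commit schedules one wakeup.
            pending_until = t + delay_ms
            wakeups += 1
        # Otherwise the commit rides on the wakeup that is already scheduled.
    return wakeups


if __name__ == "__main__":
    burst = [0.0, 0.5, 1.0, 1.9, 2.5, 3.0]
    print(len(burst), "commits ->", count_wakeups(burst), "wakeups")
```

In this model a burst of commits inside one delay window costs a single wakeup instead of one per commit, which is the contention the patch is trying to avoid on the waitqueue spin lock.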
2011-06-07  perf, core: Fix initial task_ctx/event installation  Peter Zijlstra
A lost Quilt refresh of 2c29ef0fef8 (perf: Simplify and fix __perf_install_in_context()) is causing grief and lockups, reported by Jiri Olsa.

When installing an event in a task context, there's a number of issues:

 - there might not be an existing task context, in which case we should install the now current context;

 - there might already be a context, not the current one, in which case we should de-schedule the old and install the new;

these cases were dealt with in the lost refresh, however there is one further case that was found in testing:

 - there might already be a context, the current one, in which case we should still de-schedule, and should take care to re-install it (note that task_ctx_sched_out() clears cpuctx->task_ctx).

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1307399008.2497.971.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-04  Merge branch 'perf/urgent' into perf/core  Ingo Molnar
Conflicts: tools/perf/util/python.c Merge reason: resolve the conflict with perf/urgent. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-04  perf: Comment /proc/sys/kernel/perf_event_paranoid to be part of user ABI  Vince Weaver
Turns out that distro packages use this file as an indicator of the perf event subsystem - this is easier to check for from scripts than the existence of the system call. This is easy enough to keep around for the kernel, so add a comment to make sure it stays so. Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus@samba.org Cc: acme@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106031751170.29381@cl320.eecs.utk.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
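The kind of check the commit refers to could be sketched as follows. The function name perf_events_available and the proc_root parameter are hypothetical (the parameter exists only so the check can be pointed at a different root for testing). The file path itself is the one named in the commit subject.

```python
import os


def perf_events_available(proc_root="/proc"):
    """Return True if the perf_event_paranoid sysctl file is present."""
    path = os.path.join(proc_root, "sys", "kernel", "perf_event_paranoid")
    return os.path.exists(path)


if __name__ == "__main__":
    print("perf events available:", perf_events_available())
```

A file existence test like this is what makes the file part of the de facto ABI: scripts rely on it instead of probing the system call directly.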
2011-06-04  Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent  Ingo Molnar
2011-06-03  perf python: Fix argument name list of read_on_cpu()  Frederic Weisbecker
Mandatory arguments need to be present in the argument name list, as well as optional arguments, otherwise python barfs:

    # ./python/twatch.py
    Traceback (most recent call last):
      File "./python/twatch.py", line 41, in <module>
        main()
      File "./python/twatch.py", line 32, in main
        event = evlist.read_on_cpu(cpu)
    RuntimeError: more argument specifiers than keyword list entries

Hence, add cpu to the name list.

Cc: David Ahern <daahern@cisco.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/1301588863-20210-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-03  perf evlist: Don't die if sample_{id_all|type} is invalid  Arnaldo Carvalho de Melo
Fixes two more cases where the python binding would not load:

. Not finding die(), which it shouldn't anyway, not good to just stop the world because some particular perf.data file is invalid, just propagate the error to the caller.

. Not finding perf_sample_size: fix it by moving it from event.c to evsel, where it belongs, as most cases are moving to operate on an evsel object.

One of the fixed problems:

    [root@emilia ~]# python
    >>> import perf
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: perf_sample_size
    >>>
    [root@emilia ~]#

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-1hkj7b2cvgbfnoizsekjb6c9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-03  perf python: Use exception to propagate errors  Arnaldo Carvalho de Melo
We were using pr_debug to tell the user about not being able to parse a sample where we should really use the python way of reporting errors: exceptions.

Fixes this problem:

    [root@emilia ~]# python
    >>> import perf
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: eprintf
    >>>
    [root@emilia ~]

As we want to keep the objects linked in the python binding (and in the future in a shared library) minimal.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-m9dba9kaluas0kq8r58z191c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-03  perf evlist: Remove dependency on debug routines  Arnaldo Carvalho de Melo
So far we avoided having to link debug.o in the python binding, keep it that way by not using ui__warning() in evlist.c. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4wtew8hd3g7ejnlehtspys2t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-03  Merge branch 'perf/core' of ssh://k/pub/scm/linux/kernel/git/acme/linux into perf/core  Ingo Molnar
2011-06-03  Merge commit 'v3.0-rc1' into perf/core  Ingo Molnar
Merge reason: merge in the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-02  perf script: Add printing of sample address  David Ahern
Resolve to a function or variable if possible and if the sym option is enabled. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306782503-22002-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  perf script: Make printing of dso a separate field option  David Ahern
The 'sym' option displays both the function name and the DSO it comes from. Split the display of the dso into a separate option. This allows display of the ip address and symbol without the dso, thus shortening line lengths - and decluttering the output a bit. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306528124-25861-3-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  perf script: "sym" field really means show IP data  David Ahern
Currently the "sym" output field is used to dump instruction pointers and callchain stack. Sample addresses can also be converted to symbols, so the meaning of "sym" needs to be fixed. This patch adds an "ip" option and if it is selected the user can also opt to dump symbols for them. If the user opts to dump IP without syms only the address is shown. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306528124-25861-2-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  perf stat: clarify unsupported events from uncounted events  David Ahern
perf stat continues running even if the event list contains counters that are not supported. The resulting output then contains <not counted> for those events which gets confusing as to which events are supported, but not counted and which are not supported.

Before:

    perf stat -ddd -- sleep 1

    Performance counter stats for 'sleep 1':

    0.571283 task-clock # 0.001 CPUs utilized
    1 context-switches # 0.002 M/sec
    0 CPU-migrations # 0.000 M/sec
    157 page-faults # 0.275 M/sec
    1,037,707 cycles # 1.816 GHz
    <not counted> stalled-cycles-frontend
    <not counted> stalled-cycles-backend
    654,499 instructions # 0.63 insns per cycle
    136,129 branches # 238.286 M/sec
    <not counted> branch-misses
    <not counted> L1-dcache-loads
    <not counted> L1-dcache-load-misses
    <not counted> LLC-loads
    <not counted> LLC-load-misses
    <not counted> L1-icache-loads
    <not counted> L1-icache-load-misses
    <not counted> dTLB-loads
    <not counted> dTLB-load-misses
    <not counted> iTLB-loads
    <not counted> iTLB-load-misses
    <not counted> L1-dcache-prefetches
    <not counted> L1-dcache-prefetch-misses

    1.001004836 seconds time elapsed

After:

    perf stat -ddd -- sleep 1

    Performance counter stats for 'sleep 1':

    1.350326 task-clock # 0.001 CPUs utilized
    2 context-switches # 0.001 M/sec
    0 CPU-migrations # 0.000 M/sec
    157 page-faults # 0.116 M/sec
    11,986 cycles # 0.009 GHz
    <not supported> stalled-cycles-frontend
    <not supported> stalled-cycles-backend
    496,986 instructions # 41.46 insns per cycle
    138,065 branches # 102.246 M/sec
    7,245 branch-misses # 5.25% of all branches
    <not counted> L1-dcache-loads
    <not counted> L1-dcache-load-misses
    <not counted> LLC-loads
    <not counted> LLC-load-misses
    <not counted> L1-icache-loads
    <not counted> L1-icache-load-misses
    <not counted> dTLB-loads
    <not counted> dTLB-load-misses
    <not counted> iTLB-loads
    <not counted> iTLB-load-misses
    <not counted> L1-dcache-prefetches
    <not supported> L1-dcache-prefetch-misses

    1.002397333 seconds time elapsed

v1->v2: changed supported type from int to bool
v2->v3 fixed vertical alignment of new struct element

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
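The distinction this commit introduces can be modelled in a few lines. This is an illustrative sketch, not perf's actual printing code: format_count and its parameters are hypothetical names, and the column width is arbitrary. The only parts taken from the commit are the two labels and the fact that "supported" is tracked as a boolean.

```python
def format_count(name, value, supported, counted):
    """Render one perf-stat-like row, separating unsupported from uncounted."""
    if not supported:
        shown = "<not supported>"  # the event could not be opened at all
    elif not counted:
        shown = "<not counted>"    # the event is supported but produced no count
    else:
        shown = f"{value:,}"
    return f"{shown:>18} {name}"


if __name__ == "__main__":
    print(format_count("cycles", 11986, True, True))
    print(format_count("stalled-cycles-frontend", None, False, False))
    print(format_count("L1-dcache-loads", None, True, False))
```

Checking "supported" before "counted" matters: an unsupported event is also uncounted, so the more specific label has to win.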
2011-06-02  perf python: Cleanup useless double NULL termination in method arg names  Frederic Weisbecker
The list of methods argument names only needs to be NULL terminated once. Remove the second ones. Cc: David Ahern <daahern@cisco.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Zanussi <tzanussi@gmail.com> Link: http://lkml.kernel.org/r/1301588863-20210-2-git-send-email-fweisbec@gmail.com Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  perf python: Fix argument name list of read_on_cpu()  Frederic Weisbecker
Mandatory arguments need to be present in the argument name list, as well as optional arguments, otherwise python barfs:

    # ./python/twatch.py
    Traceback (most recent call last):
      File "./python/twatch.py", line 41, in <module>
        main()
      File "./python/twatch.py", line 32, in main
        event = evlist.read_on_cpu(cpu)
    RuntimeError: more argument specifiers than keyword list entries

Hence, add cpu to the name list.

Cc: David Ahern <daahern@cisco.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Link: http://lkml.kernel.org/r/1301588863-20210-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  perf evlist: Don't die if sample_{id_all|type} is invalid  Arnaldo Carvalho de Melo
Fixes two more cases where the python binding would not load:

. Not finding die(), which it shouldn't anyway, not good to just stop the world because some particular perf.data file is invalid, just propagate the error to the caller.

. Not finding perf_sample_size: fix it by moving it from event.c to evsel, where it belongs, as most cases are moving to operate on an evsel object.

One of the fixed problems:

    [root@emilia ~]# python
    >>> import perf
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: perf_sample_size
    >>>
    [root@emilia ~]#

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-1hkj7b2cvgbfnoizsekjb6c9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  perf python: Use exception to propagate errors  Arnaldo Carvalho de Melo
We were using pr_debug to tell the user about not being able to parse a sample where we should really use the python way of reporting errors: exceptions.

Fixes this problem:

    [root@emilia ~]# python
    >>> import perf
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: eprintf
    >>>
    [root@emilia ~]

As we want to keep the objects linked in the python binding (and in the future in a shared library) minimal.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-m9dba9kaluas0kq8r58z191c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  perf evlist: Remove dependency on debug routines  Arnaldo Carvalho de Melo
So far we avoided having to link debug.o in the python binding, keep it that way by not using ui__warning() in evlist.c. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4wtew8hd3g7ejnlehtspys2t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-06-02  Revert "mm: fail GFP_DMA allocations when ZONE_DMA is not configured"  Linus Torvalds
This reverts commit a197b59ae6e8bee56fcef37ea2482dc08414e2ac. As rmk says: "Commit a197b59ae6e8 (mm: fail GFP_DMA allocations when ZONE_DMA is not configured) is causing regressions on ARM with various drivers which use GFP_DMA. The behaviour up until now has been to silently ignore that flag when CONFIG_ZONE_DMA is not enabled, and to allocate from the normal zone. However, as a result of the above commit, such allocations now fail which causes drivers to fail. These are regressions compared to the previous kernel version." so just revert it. Requested-by: Russell King <linux@arm.linux.org.uk> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-02  Merge git://git.infradead.org/iommu-2.6  Linus Torvalds
* git://git.infradead.org/iommu-2.6:
  intel-iommu: Fix off-by-one in RMRR setup
  intel-iommu: Add domain check in domain_remove_one_dev_info
  intel-iommu: Remove Host Bridge devices from identity mapping
  intel-iommu: Use coherent DMA mask when requested
  intel-iommu: Dont cache iova above 32bit
  intel-iommu: Speed up processing of the identity_mapping function
  intel-iommu: Check for identity mapping candidate using system dma mask
  intel-iommu: Only unlink device domains from iommu
  intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
  intel-iommu: Flush unmaps at domain_exit
  intel-iommu: Remove obsolete comment from detect_intel_iommu
  intel-iommu: fix VT-d PMR disable for TXT on S3 resume
2011-06-02  block: fix mismerge of the DISK_EVENT_MEDIA_CHANGE removal  Linus Torvalds
Jens' back-merge commit 698567f3fa79 ("Merge commit 'v2.6.39' into for-2.6.40/core") was incorrectly done, and re-introduced the DISK_EVENT_MEDIA_CHANGE lines that had been removed earlier in commits - 9fd097b14918 ("block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers") - 7eec77a1816a ("ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd") because of conflicts with the "g->flags" updates near-by by commit d4dc210f69bc ("block: don't block events on excl write for non-optical devices") As a result, we re-introduced the hanging behavior due to infinite disk media change reports. Tssk, tssk, people! Don't do back-merges at all, and *definitely* don't do them to hide merge conflicts from me - especially as I'm likely better at merging them than you are, since I do so many merges. Reported-by: Steven Rostedt <rostedt@goodmis.org> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-01  Merge git://git.infradead.org/mtd-2.6  Linus Torvalds
* git://git.infradead.org/mtd-2.6:
  mtd: fix physmap.h warnings
2011-06-01  intel-iommu: Fix off-by-one in RMRR setup  David Woodhouse
We were mapping an extra byte (and hence usually an extra page): iommu_prepare_identity_map() expects to be given an 'end' argument which is the last byte to be mapped; not the first byte *not* to be mapped. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
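The off-by-one described above comes down to inclusive versus exclusive end addresses. The sketch below is a Python model of that arithmetic, not the driver code: region_last_byte and pages_covered are hypothetical helpers, and the 4 KiB page size is an assumption chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def region_last_byte(base, size):
    """Inclusive end of a region: the last byte that belongs to it."""
    return base + size - 1


def pages_covered(start, end_inclusive):
    """Number of pages touched when mapping start..end_inclusive."""
    return end_inclusive // PAGE_SIZE - start // PAGE_SIZE + 1


if __name__ == "__main__":
    base, size = 0x1000, 0x2000
    print(pages_covered(base, region_last_byte(base, size)))  # inclusive end
    print(pages_covered(base, base + size))  # exclusive end passed by mistake
```

When the region is page aligned, passing the exclusive end to a function that expects the last byte pulls in one whole extra page, which matches the "extra byte (and hence usually an extra page)" wording of the commit.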
2011-06-01  intel-iommu: Add domain check in domain_remove_one_dev_info  Mike Habeck
The comment in domain_remove_one_dev_info() states "No need to compare PCI domain; it has to be the same". But for the si_domain that isn't going to be true, as it consists of all the PCI devices that are identity mapped thus multiple PCI domains can be in si_domain. The code needs to validate the PCI domain too. Signed-off-by: Mike Habeck <habeck@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Cc: stable@kernel.org Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01  intel-iommu: Remove Host Bridge devices from identity mapping  Mike Travis
When using the 1:1 (identity) PCI DMA remapping, PCI Host Bridge devices that do not use the IOMMU cause a kernel panic. Fix that by not inserting those devices into the si_domain. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Mike Habeck <habeck@sgi.com> Cc: stable@kernel.org Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01  intel-iommu: Use coherent DMA mask when requested  Mike Travis
The __intel_map_single function is not honoring the passed in DMA mask. This results in not using the coherent DMA mask when called from intel_alloc_coherent(). Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Reviewed-by: Mike Habeck <habeck@sgi.com> Cc: stable@kernel.org Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01  intel-iommu: Dont cache iova above 32bit  Chris Wright
Mike Travis and Mike Habeck reported an issue where iova allocation would return a range that was larger than a device's dma mask. https://lkml.org/lkml/2011/3/29/423 The dmar initialization code will reserve all PCI MMIO regions and copy those reservations into a domain specific iova tree. It is possible for one of those regions to be above the dma mask of a device. It is typical to allocate iovas with a 32bit mask (despite device's dma mask possibly being larger) and cache the result until it exhausts the lower 32bit address space. Freeing the iova range that is >= the last iova in the lower 32bit range when there is still an iova above the 32bit range will corrupt the cached iova by pointing it to a region that is above 32bit. If that region is also larger than the device's dma mask, a subsequent allocation will return an unusable iova and cause dma failure. Simply don't cache an iova that is above the 32bit caching boundary. Reported-by: Mike Travis <travis@sgi.com> Reported-by: Mike Habeck <habeck@sgi.com> Cc: stable@kernel.org Acked-by: Mike Travis <travis@sgi.com> Tested-by: Mike Habeck <habeck@sgi.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01  intel-iommu: Speed up processing of the identity_mapping function  Mike Travis
When there are a large number of PCI devices, and the pass through option for iommu is set, much time is spent in the identity_mapping function hunting through the iommu domains to check if a specific device is "identity mapped". Speed up the function by checking the cached info to see if it's mapped to the static identity domain. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Mike Habeck <habeck@sgi.com> Cc: stable@kernel.org Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01  intel-iommu: Check for identity mapping candidate using system dma mask  Chris Wright
The identity mapping code appears to make the assumption that if the device's dma_mask is greater than 32bits the device can use identity mapping. But that is not true: take the case where we have a 40bit device in a 44bit architecture. The device can potentially receive a physical address that it will truncate and cause incorrect addresses to be used. Instead check to see if the device's dma_mask is large enough to address the system's dma_mask. Signed-off-by: Mike Travis <travis@sgi.com> Reviewed-by: Mike Habeck <habeck@sgi.com> Cc: stable@kernel.org Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
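The 40bit-on-44bit example from this commit can be worked through numerically. The helpers below are hypothetical and model only the mask comparison; they are not the driver's functions, and old_check is a paraphrase of the assumption the commit message criticizes.

```python
def dma_bit_mask(bits):
    """Mask with the low `bits` bits set, e.g. 40 -> 0xFFFFFFFFFF."""
    return (1 << bits) - 1


def old_check(device_mask):
    """The criticized assumption: anything above 32 bits is good enough."""
    return device_mask > dma_bit_mask(32)


def can_identity_map(device_mask, system_mask):
    """The device must be able to address everything the system can."""
    return device_mask >= system_mask


if __name__ == "__main__":
    dev, system = dma_bit_mask(40), dma_bit_mask(44)
    addr = 1 << 42  # a physical address the 44bit system can hand out
    print(old_check(dev), can_identity_map(dev, system))
    print(hex(addr), "->", hex(addr & dev))  # what a 40bit device would see
```

With these numbers the old test accepts the 40bit device, while an address at bit 42 would be truncated to 0 by the device's mask. Comparing against the system's mask rejects the device instead.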
2011-06-01  intel-iommu: Only unlink device domains from iommu  Alex Williamson
Commit a97590e5 added unlinking domains from iommus to reciprocate the iommu from domains unlinking that was already done. We actually want to only do this for device domains and never for the static identity map domain or VM domains. The SI domain is special and never freed, while VM domain->id lives in their own special address space, separate from iommu->domain_ids. In the current code, a VM can get domain->id zero, then mark that domain unused when unbound from pci-stub. This leads to DMAR write faults when the device is re-bound to the host driver. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@kernel.org Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01  intel-iommu: Enable super page (2MiB, 1GiB, etc.) support  Youquan Song
There are no externally-visible changes with this. In the loop in the internal __domain_mapping() function, we simply detect if we are mapping:
  - size >= 2MiB, and
  - virtual address aligned to 2MiB, and
  - physical address aligned to 2MiB, and
  - on hardware that supports superpages.

(and likewise for larger superpages).

We automatically use a superpage for such mappings. We never have to worry about *breaking* superpages, since we trust that we will always *unmap* the same range that was mapped. So all we need to do is ensure that dma_pte_clear_range() will also cope with superpages.

Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so it can return a PTE at the appropriate level rather than always extending the page tables all the way down to level 1. Again, this is simplified by the fact that we should never encounter existing small pages when we're creating a mapping; any old mapping that used the same virtual range will have been entirely removed and its obsolete page tables freed.

Provide an 'intel_iommu=sp_off' argument on the command line as a chicken bit. Not that it should ever be required.

==

The original commit seen in the iommu-2.6.git was Youquan's implementation (and completion) of my own half-baked code which I'd typed into an email. Followed by half a dozen subsequent 'fixes'.

I've taken the unusual step of rewriting history and collapsing the original commits in order to keep the main history simpler, and make life easier for the people who are going to have to backport this to older kernels. And also so I can give it a more coherent commit comment which (hopefully) gives a better explanation of what's going on.

The original sequence of commits leading to identical code was:

Youquan Song (3):
  intel-iommu: super page support
  intel-iommu: Fix superpage alignment calculation error
  intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()

David Woodhouse (4):
  intel-iommu: Precalculate superpage support for dmar_domain
  intel-iommu: Fix hardware_largepage_caps()
  intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
  intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages

Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
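The four conditions listed in this commit for using a superpage can be written as a single predicate. This is a sketch of the stated conditions only, not the logic of __domain_mapping(): can_use_superpage and its parameter names are hypothetical, and addresses are plain byte values.

```python
SZ_2M = 2 * 1024 * 1024
SZ_1G = 1024 * 1024 * 1024


def can_use_superpage(iova, phys, size, hw_supports, sp_size=SZ_2M):
    """True if a mapping meets all four conditions for a superpage of sp_size."""
    return bool(
        hw_supports                # hardware supports superpages of this size
        and size >= sp_size        # mapping is at least one superpage long
        and iova % sp_size == 0    # virtual (IOVA) address is aligned
        and phys % sp_size == 0    # physical address is aligned
    )


if __name__ == "__main__":
    print(can_use_superpage(0x200000, 0x40000000, 4 * SZ_2M, True))
    print(can_use_superpage(0x200000, 0x40001000, 4 * SZ_2M, True))
```

The same predicate covers larger superpages ("and likewise for larger superpages") by passing a bigger sp_size such as SZ_1G.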
2011-06-01  mtd: fix physmap.h warnings  Randy Dunlap
Fix build warnings in physmap.h:

    include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
    include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
    include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
    include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2011-06-01  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6  Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
  AppArmor: fix oops in apparmor_setprocattr
2011-06-01  kgdbts: only use new asm-generic/ptrace.h api when needed  Mike Frysinger
The new instruction_pointer_set helper is defined for people who have converted to asm-generic/ptrace.h, so don't use it generally unless the arch needs it (in which case it has been converted). This should fix building of kgdb tests for arches not yet converted. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-01AppArmor: fix oops in apparmor_setprocattrKees Cook
When invalid parameters are passed to apparmor_setprocattr a NULL deref oops occurs when it tries to record an audit message. This is because it is passing NULL for the profile parameter for aa_audit. But aa_audit now requires that the profile passed is not NULL. Fix this by passing the current profile on the task that is trying to setprocattr. Signed-off-by: Kees Cook <kees@ubuntu.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Cc: stable@kernel.org Signed-off-by: James Morris <jmorris@namei.org>
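The pattern described above can be illustrated with a small user-space sketch. Everything here is hypothetical (profile_sketch, audit_sketch, setprocattr_sketch and the "changeprofile" command string are invented for illustration and are not the AppArmor API); it only shows why an audit helper that dereferences its profile argument must be handed the caller's current profile rather than NULL.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a security profile. */
struct profile_sketch {
	const char *name;
};

/* Like the helper in the commit message, this requires a non-NULL
 * profile: it dereferences the pointer unconditionally. */
static int audit_sketch(const struct profile_sketch *profile,
			const char *info, char *out, size_t len)
{
	return snprintf(out, len, "profile=%s info=%s", profile->name, info);
}

/* On an invalid command, audit against the caller's current profile
 * instead of passing NULL (which would crash in audit_sketch). */
int setprocattr_sketch(const struct profile_sketch *current_profile,
		       const char *command, char *out, size_t len)
{
	if (strcmp(command, "changeprofile") == 0) {
		if (len > 0)
			out[0] = '\0';
		return 0;
	}

	audit_sketch(current_profile, "invalid command", out, len);
	return -EINVAL;
}
```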
2011-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio_net: delay TX callbacks virtio: add api for delayed callbacks virtio_test: support event index vhost: support event index virtio_ring: support event idx feature virtio ring: inline function to check for events virtio: event index interface virtio: add full three-clause BSD text to headers. virtio balloon: kill tell-host-first logic virtio console: don't manually set or finalize VIRTIO_CONSOLE_F_MULTIPORT. drivers, block: virtio_blk: Replace cryptic number with the macro virtio_blk: allow re-reading config space at runtime lguest: remove support for VIRTIO_F_NOTIFY_ON_EMPTY. lguest: fix up compilation after move lguest: fix timer interrupt setup
2011-06-01Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6Linus Torvalds
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] wire up sendmmsg() syscall for Itanium
2011-05-31[IA64] wire up sendmmsg() syscall for ItaniumTony Luck
Add entries in unistd.h and entry.S to make this new syscall visible. Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-06-01Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix mwait_play_dead() faulting on mwait-incapable cpus x86 idle: Fix mwait deprecation warning message Evil merge to remove extra quote noticed by Joe Perches
2011-06-01Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Cure load woes
2011-05-31perf, cgroups: Fix up for new APIPeter Zijlstra
Ben changed the cgroup API in commit f780bdb7c1c (cgroups: add per-thread subsystem callbacks) in an incompatible way, but forgot to convert the perf cgroup bits. Avoid compile warnings and runtime splats and convert perf too ;-) Acked-by: Ben Blum <bblum@andrew.cmu.edu> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1306767651.1200.2990.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-31Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Put back -pg to tsc.o and add no GCOV to vread_tsc_64.o
2011-05-31Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: autofs4: bogus dentry_unhash() added in ->unlink() vfs: shrink_dcache_parent before rmdir, dir rename
2011-05-31powerpc/pmac: Don't register pmac PIC syscore ops when HW not presentBenjamin Herrenschmidt
The Apple custom PIC only exists in some earlier machine models; anything with an MPIC will crash on suspend if we register those syscore ops unconditionally. This is a regression caused by commit f5a592f7d74e ("PM / PowerPC: Use struct syscore_ops instead of sysdevs for PM") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
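The idea of registering suspend/resume hooks only when the hardware was actually found can be sketched in plain C as below. All names (syscore_ops_sketch, register_ops_sketch, init_pic_ops_sketch, ops_registered_sketch) are hypothetical stand-ins and do not reproduce the kernel's syscore API or the pmac PIC driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a table of suspend/resume callbacks. */
struct syscore_ops_sketch {
	int  (*suspend)(void);
	void (*resume)(void);
};

/* Toy registry holding at most one set of callbacks. */
static const struct syscore_ops_sketch *registered_ops;

static void register_ops_sketch(const struct syscore_ops_sketch *ops)
{
	registered_ops = ops;
}

static int pic_suspend_sketch(void)
{
	return 0;
}

static void pic_resume_sketch(void)
{
}

static const struct syscore_ops_sketch pic_ops_sketch = {
	pic_suspend_sketch,
	pic_resume_sketch,
};

/* Register the callbacks only if the controller was detected, so that
 * suspend never runs code that touches hardware that is not there. */
bool init_pic_ops_sketch(bool pic_present)
{
	if (!pic_present)
		return false;

	register_ops_sketch(&pic_ops_sketch);
	return true;
}

bool ops_registered_sketch(void)
{
	return registered_ops != NULL;
}
```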
2011-05-31rcu: Cure load woesPeter Zijlstra
Commit cc3ce5176d83 (rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state) fudges a sleeping task's state, resulting in the scheduler seeing a TASK_UNINTERRUPTIBLE task going to sleep, but a TASK_INTERRUPTIBLE task waking up. The result is unbalanced load calculation. The problem that patch tried to address is that the RCU threads could stay in UNINTERRUPTIBLE state for quite a while and trigger the hung task detector due to on-demand wake-ups. Cure the problem differently by always giving the tasks at least one wake-up once the CPU is fully up and running, this will kick them out of the initial UNINTERRUPTIBLE state and into the regular INTERRUPTIBLE wait state. [ The alternative would be teaching kthread_create() to start threads as INTERRUPTIBLE but that needs a tad more thought. ] Reported-by: Damien Wyart <damien.wyart@free.fr> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul E. McKenney <paul.mckenney@linaro.org> Link: http://lkml.kernel.org/r/1306755291.1200.2872.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
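The imbalance described above can be shown with a toy model. This is a deliberately simplified sketch, not the scheduler's real accounting: it assumes a single per-runqueue counter that is incremented when a task goes to sleep uninterruptibly and decremented only if the task is still marked uninterruptible when it wakes. All type and function names are hypothetical.

```c
/* Hypothetical task states for the toy model. */
enum task_state_sketch {
	SK_RUNNING,
	SK_INTERRUPTIBLE,
	SK_UNINTERRUPTIBLE
};

struct rq_sketch {
	long nr_uninterruptible;	/* feeds the load figure in this model */
};

struct task_sketch {
	enum task_state_sketch state;
};

/* Going to sleep: uninterruptible sleepers are counted. */
static void sketch_sleep(struct rq_sketch *rq, struct task_sketch *t,
			 enum task_state_sketch s)
{
	t->state = s;
	if (s == SK_UNINTERRUPTIBLE)
		rq->nr_uninterruptible++;
}

/* Waking up: the count drops only if the task still looks uninterruptible. */
static void sketch_wake(struct rq_sketch *rq, struct task_sketch *t)
{
	if (t->state == SK_UNINTERRUPTIBLE)
		rq->nr_uninterruptible--;
	t->state = SK_RUNNING;
}

/* State is rewritten while asleep, so the decrement is skipped. */
long sketch_fudged_cycle(void)
{
	struct rq_sketch rq = { 0 };
	struct task_sketch t = { SK_RUNNING };

	sketch_sleep(&rq, &t, SK_UNINTERRUPTIBLE);
	t.state = SK_INTERRUPTIBLE;	/* the "fudge" */
	sketch_wake(&rq, &t);
	return rq.nr_uninterruptible;
}

/* Sleep and wake see the same state, so the counter returns to zero. */
long sketch_balanced_cycle(void)
{
	struct rq_sketch rq = { 0 };
	struct task_sketch t = { SK_RUNNING };

	sketch_sleep(&rq, &t, SK_UNINTERRUPTIBLE);
	sketch_wake(&rq, &t);
	return rq.nr_uninterruptible;
}
```

In this model every fudged cycle leaves one phantom uninterruptible sleeper behind, which is the kind of drift that would show up as a load figure that never settles.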
2011-05-30x86: Fix mwait_play_dead() faulting on mwait-incapable cpusAvi Kivity
A logic error in mwait_play_dead() causes the kernel to use mwait even on cpus which don't support it, such as KVM virtual cpus. Introduced by: 349c004e3d31: x86: A fast way to check capabilities of the current cpu Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=36222 Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1306758237-9327-1-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
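The message does not spell out the logic error, so the following is only an assumed illustration of one common way such a check goes wrong: a negation applied to the first operand instead of to the whole conjunction. The struct and function names are hypothetical and are not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two conditions being tested. */
struct cpu_sketch {
	bool has_mwait;		/* CPU advertises the instruction */
	bool mwait_usable;	/* platform says it may be used */
};

/* Assumed buggy form: "!a && b" bails out only when a is false and b
 * is true, so a CPU with neither property falls through to mwait. */
bool skip_mwait_buggy(const struct cpu_sketch *c)
{
	return !c->has_mwait && c->mwait_usable;
}

/* Intended form: bail out unless both conditions hold. */
bool skip_mwait_fixed(const struct cpu_sketch *c)
{
	return !(c->has_mwait && c->mwait_usable);
}
```

For a virtual CPU with neither property set, the first predicate returns false (mwait would be attempted) while the second returns true (mwait is skipped).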