summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2008-07-25kprobes: remove redundant config checkAbhishek Sagar
I noticed that there's a CONFIG_KPROBES check inside kernel/kprobes.c, which is redundant. Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25kprobes: improve kretprobe scalability with hashed lockingSrinivasa D S
Currently list of kretprobe instances are stored in kretprobe object (as used_instances,free_instances) and in kretprobe hash table. We have one global kretprobe lock to serialise the access to these lists. This causes only one kretprobe handler to execute at a time. Hence affects system performance, particularly on SMP systems and when return probe is set on lot of functions (like on all systemcalls). Solution proposed here gives fine-grain locks that performs better on SMP system compared to present kretprobe implementation. Solution: 1) Instead of having one global lock to protect kretprobe instances present in kretprobe object and kretprobe hash table. We will have two locks, one lock for protecting kretprobe hash table and another lock for kretporbe object. 2) We hold lock present in kretprobe object while we modify kretprobe instance in kretprobe object and we hold per-hash-list lock while modifying kretprobe instances present in that hash list. To prevent deadlock, we never grab a per-hash-list lock while holding a kretprobe lock. 3) We can remove used_instances from struct kretprobe, as we can track used instances of kretprobe instances using kretprobe hash table. Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system with return probes set on all systemcalls looks like this. cacheline non-cacheline Un-patched kernel aligned patch aligned patch =============================================================================== real 9m46.784s 9m54.412s 10m2.450s user 40m5.715s 40m7.142s 40m4.273s sys 2m57.754s 2m58.583s 3m17.430s =========================================================== Time duration for kernel compilation ("make -j 8) on the same system, when kernel is not probed. ========================= real 9m26.389s user 40m8.775s sys 2m7.283s ========================= Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25printk ratelimiting rewriteDave Young
All ratelimit user use same jiffies and burst params, so some messages (callbacks) will be lost. For example: a call printk_ratelimit(5 * HZ, 1) b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will will be supressed. - rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for hints from andrew. - Add WARN_ON_RATELIMIT, update rcupreempt.h - remove __printk_ratelimit - use __ratelimit in net_ratelimit Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25kernel/irq/manage.c: replace a printk + WARN_ON() to a WARN()Arjan van de Ven
Replace a printk+WARN_ON() by a WARN(); this increases the chance of the string making it into the bugreport (ie: it goes inside the ---[ cut here ]--- section) Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25Add a WARN() macro; this is WARN_ON() + printk argumentsArjan van de Ven
Add a WARN() macro that acts like WARN_ON(), with the added feature that it takes a printk like argument that is printed as part of the warning message. [akpm@linux-foundation.org: fix printk arguments] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25clean up duplicated alloc/free_thread_infoFUJITA Tomonori
We duplicate alloc/free_thread_info defines on many platforms (the majority uses __get_free_pages/free_pages). This patch defines common defines and removes these duplicated defines. __HAVE_ARCH_THREAD_INFO_ALLOCATOR is introduced for platforms that do something different. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25call_usermodehelper(): increase reliabilityKOSAKI Motohiro
Presently call_usermodehelper_setup() uses GFP_ATOMIC. but it can return NULL _very_ easily. GFP_ATOMIC is needed only when we can't sleep. and, GFP_KERNEL is robust and better. thus, I add gfp_mask argument to call_usermodehelper_setup(). So, its callers pass the gfp_t as below: call_usermodehelper() and call_usermodehelper_keys(): depend on 'wait' argument. call_usermodehelper_pipe(): always GFP_KERNEL because always run under process context. orderly_poweroff(): pass to GFP_ATOMIC because may run under interrupt context. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Paul Menage" <menage@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25build kernel/profile.o only when requestedAdrian Bunk
Build kernel/profile.o only if CONFIG_PROFILING is enabled. This makes CONFIG_PROFILING=n kernels smaller. As a bonus, some profile_tick() calls and one branch from schedule() are now eliminated with CONFIG_PROFILING=n (but I doubt these are measurable effects). This patch changes the effects of CONFIG_PROFILING=n, but I don't think having more than two choices would be the better choice. This patch also adds the name of the first parameter to the prototypes of profile_{hits,tick}() since I anyway had to add them for the dummy functions. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25kallsyms: fix potential overflow in binary searchVegard Nossum
This will probably never trigger... but it won't hurt to be careful. http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Joshua Bloch <jjb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25flag parameters: fix compile error of sys_epoll_create1Wang Chen
GEN .version CHK include/linux/compile.h UPD include/linux/compile.h CC init/version.o LD init/built-in.o LD vmlinux arch/x86/kernel/built-in.o: In function `sys_call_table': (.rodata+0x8a4): undefined reference to `sys_epoll_create1' make: *** [vmlinux] Error 1 Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well nohz: prevent tick stop outside of the idle loop
2008-07-24Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: arch/mips/kernel/stacktrace.c: Heiko can't type kthread: reduce stack pressure in create_kthread and kthreadd fix core/stacktrace changes on avr32, mips, sh
2008-07-24Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: clean up compiler warning sched: fix hrtick & generic-ipi dependency
2008-07-24flag parameters: NONBLOCK in pipeUlrich Drepper
This patch adds O_NONBLOCK support to pipe2. It is minimally more involved than the patches for eventfd et.al but still trivial. The interfaces of the create_write_pipe and create_read_pipe helper functions were changed and the one other caller as well. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_pipe2 # ifdef __x86_64__ # define __NR_pipe2 293 # elif defined __i386__ # define __NR_pipe2 331 # else # error "need __NR_pipe2" # endif #endif int main (void) { int fds[2]; if (syscall (__NR_pipe2, fds, 0) == -1) { puts ("pipe2(0) failed"); return 1; } for (int i = 0; i < 2; ++i) { int fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if (fl & O_NONBLOCK) { printf ("pipe2(0) set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } if (syscall (__NR_pipe2, fds, O_NONBLOCK) == -1) { puts ("pipe2(O_NONBLOCK) failed"); return 1; } for (int i = 0; i < 2; ++i) { int fl = fcntl (fds[i], F_GETFL); if (fl == -1) { puts ("fcntl failed"); return 1; } if ((fl & O_NONBLOCK) == 0) { printf ("pipe2(O_NONBLOCK) does not set non-blocking mode for fds[%d]\n", i); return 1; } close (fds[i]); } puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24flag parameters: inotify_initUlrich Drepper
This patch introduces the new syscall inotify_init1 (note: the 1 stands for the one parameter the syscall takes, as opposed to no parameter before). The values accepted for this parameter are function-specific and defined in the inotify.h header. Here the values must match the O_* flags, though. In this patch CLOEXEC support is introduced. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_inotify_init1 # ifdef __x86_64__ # define __NR_inotify_init1 294 # elif defined __i386__ # define __NR_inotify_init1 332 # else # error "need __NR_inotify_init1" # endif #endif #define IN_CLOEXEC O_CLOEXEC int main (void) { int fd; fd = syscall (__NR_inotify_init1, 0); if (fd == -1) { puts ("inotify_init1(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("inotify_init1(0) set close-on-exit"); return 1; } close (fd); fd = syscall (__NR_inotify_init1, IN_CLOEXEC); if (fd == -1) { puts ("inotify_init1(IN_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24flag parameters: eventfdUlrich Drepper
This patch adds the new eventfd2 syscall. It extends the old eventfd syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is EFD_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name EFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_eventfd2 # ifdef __x86_64__ # define __NR_eventfd2 290 # elif defined __i386__ # define __NR_eventfd2 328 # else # error "need __NR_eventfd2" # endif #endif #define EFD_CLOEXEC O_CLOEXEC int main (void) { int fd = syscall (__NR_eventfd2, 1, 0); if (fd == -1) { puts ("eventfd2(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("eventfd2(0) sets close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC); if (fd == -1) { puts ("eventfd2(EFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24flag parameters: signalfdUlrich Drepper
This patch adds the new signalfd4 syscall. It extends the old signalfd syscall by one parameter which is meant to hold a flag value. In this patch the only flag support is SFD_CLOEXEC which causes the close-on-exec flag for the returned file descriptor to be set. A new name SFD_CLOEXEC is introduced which in this implementation must have the same value as O_CLOEXEC. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #ifndef __NR_signalfd4 # ifdef __x86_64__ # define __NR_signalfd4 289 # elif defined __i386__ # define __NR_signalfd4 327 # else # error "need __NR_signalfd4" # endif #endif #define SFD_CLOEXEC O_CLOEXEC int main (void) { sigset_t ss; sigemptyset (&ss); sigaddset (&ss, SIGUSR1); int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0); if (fd == -1) { puts ("signalfd4(0) failed"); return 1; } int coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if (coe & FD_CLOEXEC) { puts ("signalfd4(0) set close-on-exec flag"); return 1; } close (fd); fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC); if (fd == -1) { puts ("signalfd4(SFD_CLOEXEC) failed"); return 1; } coe = fcntl (fd, F_GETFD); if (coe == -1) { puts ("fcntl failed"); return 1; } if ((coe & FD_CLOEXEC) == 0) { puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag"); return 1; } close (fd); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24flag parameters: pacceptUlrich Drepper
This patch is by far the most complex in the series. It adds a new syscall paccept. This syscall differs from accept in that it adds (at the userlevel) two additional parameters: - a signal mask - a flags value The flags parameter can be used to set flag like SOCK_CLOEXEC. This is imlpemented here as well. Some people argued that this is a property which should be inherited from the file desriptor for the server but this is against POSIX. Additionally, we really want the signal mask parameter as well (similar to pselect, ppoll, etc). So an interface change in inevitable. The flag value is the same as for socket and socketpair. I think diverging here will only create confusion. Similar to the filesystem interfaces where the use of the O_* constants differs, it is acceptable here. The signal mask is handled as for pselect etc. The mask is temporarily installed for the thread and removed before the call returns. I modeled the code after pselect. If there is a problem it's likely also in pselect. For architectures which use socketcall I maintained this interface instead of adding a system call. The symmetry shouldn't be broken. The following test must be adjusted for architectures other than x86 and x86-64 and in case the syscall numbers changed. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <pthread.h> #include <signal.h> #include <stdio.h> #include <unistd.h> #include <netinet/in.h> #include <sys/socket.h> #include <sys/syscall.h> #ifndef __NR_paccept # ifdef __x86_64__ # define __NR_paccept 288 # elif defined __i386__ # define SYS_PACCEPT 18 # define USE_SOCKETCALL 1 # else # error "need __NR_paccept" # endif #endif #ifdef USE_SOCKETCALL # define paccept(fd, addr, addrlen, mask, flags) \ ({ long args[6] = { \ (long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \ syscall (__NR_socketcall, SYS_PACCEPT, args); }) #else # define paccept(fd, addr, addrlen, mask, flags) \ syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags) #endif #define PORT 57392 #define SOCK_CLOEXEC O_CLOEXEC static pthread_barrier_t b; static void * tf (void *arg) { pthread_barrier_wait (&b); int s = socket (AF_INET, SOCK_STREAM, 0); struct sockaddr_in sin; sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); s = socket (AF_INET, SOCK_STREAM, 0); sin.sin_port = htons (PORT); connect (s, (const struct sockaddr *) &sin, sizeof (sin)); close (s); pthread_barrier_wait (&b); pthread_barrier_wait (&b); sleep (2); pthread_kill ((pthread_t) arg, SIGUSR1); return NULL; } static void handler (int s) { } int main (void) { pthread_barrier_init (&b, NULL, 2); struct sockaddr_in sin; pthread_t th; if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0) { puts ("pthread_create failed"); return 1; } int s = socket (AF_INET, SOCK_STREAM, 0); int reuse = 1; setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse)); sin.sin_family = AF_INET; sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK); sin.sin_port = htons (PORT); bind (s, (struct sockaddr *) &sin, sizeof (sin)); listen (s, SOMAXCONN); pthread_barrier_wait (&b); int s2 = paccept (s, NULL, 0, NULL, 0); if (s2 < 0) { puts ("paccept(0) failed"); return 1; } int coe = fcntl (s2, F_GETFD); if (coe & FD_CLOEXEC) { puts ("paccept(0) set close-on-exec-flag"); return 1; } close (s2); pthread_barrier_wait (&b); s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC); if (s2 < 0) { puts ("paccept(SOCK_CLOEXEC) failed"); return 1; } coe = fcntl (s2, F_GETFD); if ((coe & FD_CLOEXEC) == 0) { puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag"); return 1; } close (s2); pthread_barrier_wait (&b); struct sigaction sa; sa.sa_handler = handler; sa.sa_flags = 0; sigemptyset (&sa.sa_mask); sigaction (SIGUSR1, &sa, NULL); sigset_t ss; pthread_sigmask (SIG_SETMASK, NULL, &ss); sigaddset (&ss, SIGUSR1); pthread_sigmask (SIG_SETMASK, &ss, NULL); sigdelset (&ss, SIGUSR1); alarm (4); pthread_barrier_wait (&b); errno = 0 ; s2 = paccept (s, NULL, 0, &ss, 0); if (s2 != -1 || errno != EINTR) { puts ("paccept did not fail with EINTR"); return 1; } close (s); puts ("OK"); return 0; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [akpm@linux-foundation.org: make it compile] [akpm@linux-foundation.org: add sys_ni stub] Signed-off-by: Ulrich Drepper <drepper@redhat.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <linux-arch@vger.kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Roland McGrath <roland@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24generic irqs: handle failure of irqchip->set_type in setup_irqUwe Kleine-König
set_type returns an int indicating success or failure, but up to now setup_irq ignores that. In my case this resulted in a machine hang: gpio-keys requested IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING, but arm/ns9xxx can only trigger on one direction so set_type didn't touch the configuration which happens do default to a level sensitiveness and returned -EINVAL. setup_irq ignored that and unmasked the irq. This resulted in an endless triggering of the gpio-key interrupt service routine which effectively killed the machine. With this patch applied setup_irq propagates the error to the caller. Note that before in the case chip && !chip->set_type && !chip->name a NULL pointer was feed to printk. This is fixed, too. Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24pm: fix try_to_freeze_tasks()'s use of do_div()David Howells
Fix try_to_freeze_tasks()'s use of do_div() on an s64 by making elapsed_csecs64 a u64 instead and dividing that. Possibly this should be guarded lest the interval calculation turn up negative, but the possible negativity of the result of the division is cast away anyway. This was introduced by patch 438e2ce68dfd4af4cfcec2f873564fb921db4bb5. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24pm: schedule sysrq poweroff on boot cpuZhang Rui
schedule sysrq poweroff on boot cpu. sysrq poweroff needs to disable nonboot cpus, and we need to run this on boot cpu to avoid any recursion. http://bugzilla.kernel.org/show_bug.cgi?id=10897 [kosaki.motohiro@jp.fujitsu.com: build fix] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Rus <harbour@sfinx.od.ua> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24pm: introduce new interfaces schedule_work_on() and queue_work_on()Zhang Rui
This interface allows adding a job on a specific cpu. Although a work struct on a cpu will be scheduled to other cpu if the cpu dies, there is a recursion if a work task tries to offline the cpu it's running on. we need to schedule the task to a specific cpu in this case. http://bugzilla.kernel.org/show_bug.cgi?id=10897 [oleg@tv-sign.ru: cleanups] Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Rus <harbour@sfinx.od.ua> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24pm: hibernation: simplify memory bitmapAkinobu Mita
This patch simplifies the memory bitmap manipulations. - remove the member size in struct bm_block It is not necessary for struct bm_block to have the number of bit chunks that can be calculated by using end_pfn and start_pfn. - use find_next_bit() for memory_bm_next_pfn No need to invent the bitmap library only for the memory bitmap. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24pm: boot time suspend selftestDavid Brownell
Boot-time test for system suspend states (STR or standby). The generic RTC framework triggers wakeup alarms, which are used to exit those states. - Measures some aspects of suspend time ... this uses "jiffies" until someone converts it to use a timebase that works properly even while timer IRQs are disabled. - Triggered by a command line parameter. By default nothing even vaguely troublesome will happen, but "test_suspend=mem" will give you a brief STR test during system boot. (Or you may need to use "test_suspend=standby" instead, if your hardware needs that.) This isn't without problems. It fires early enough during boot that for example both PCMCIA and MMC stacks have misbehaved. The workaround in those cases was to boot without such media cards inserted. [matthltc@us.ibm.com: fix compile failure in boot time suspend selftest] Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Machek <pavel@suse.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24swsusp: provide users with a hint about the no_console_suspend optionPavel Machek
Tell the user about the no_console_suspend option, so that we don't have to tell each bug reporter personally. [akpm@linux-foundation.org: clarify the text a little] Signed-off-by: Pavel Machek <pavel@suse.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24security: filesystem capabilities refactor kernel codeAndrew G. Morgan
To date, we've tried hard to confine filesystem support for capabilities to the security modules. This has left a lot of the code in kernel/capability.c in a state where it looks like it supports something that filesystem support for capabilities actually suppresses when the LSM security/commmoncap.c code runs. What is left is a lot of code that uses sub-optimal locking in the main kernel With this change we refactor the main kernel code and make it explicit which locks are needed and that the only remaining kernel races in this area are associated with non-filesystem capability code. Signed-off-by: Andrew G. Morgan <morgan@kernel.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: multiple hstates for multiple page sizesAndi Kleen
Add basic support for more than one hstate in hugetlbfs. This is the key to supporting multiple hugetlbfs page sizes at once. - Rather than a single hstate, we now have an array, with an iterator - default_hstate continues to be the struct hstate which we use by default - Add functions for architectures to register new hstates [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings ↵Mel Gorman
until fork() This patch reserves huge pages at mmap() time for MAP_PRIVATE mappings in a similar manner to the reservations taken for MAP_SHARED mappings. The reserve count is accounted both globally and on a per-VMA basis for private mappings. This guarantees that a process that successfully calls mmap() will successfully fault all pages in the future unless fork() is called. The characteristics of private mappings of hugetlbfs files behaviour after this patch are; 1. The process calling mmap() is guaranteed to succeed all future faults until it forks(). 2. On fork(), the parent may die due to SIGKILL on writes to the private mapping if enough pages are not available for the COW. For reasonably reliable behaviour in the face of a small huge page pool, children of hugepage-aware processes should not reference the mappings; such as might occur when fork()ing to exec(). 3. On fork(), the child VMAs inherit no reserves. Reads on pages already faulted by the parent will succeed. Successful writes will depend on enough huge pages being free in the pool. 4. Quotas of the hugetlbfs mount are checked at reserve time for the mapper and at fault time otherwise. Before this patch, all reads or writes in the child potentially needs page allocations that can later lead to the death of the parent. This applies to reads and writes of uninstantiated pages as well as COW. After the patch it is only a write to an instantiated page that causes problems. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24mm/vmstat.c: proper externsAdrian Bunk
This patch adds proper extern declarations for five variables in include/linux/vmstat.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24sched: clean up compiler warningPeter Zijlstra
Reported-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-23Merge branch 'x86/auditsc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland * 'x86/auditsc' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland: i386 syscall audit fast-path x86_64 ia32 syscall audit fast-path x86_64 syscall audit fast-path x86_64: remove bogus optimization in sysret_signal
2008-07-23Merge branch 'sched/for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: hrtick_enabled() should use cpu_active() sched, x86: clean up hrtick implementation sched: fix build error, provide partition_sched_domains() unconditionally sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed cpu hotplug: Make cpu_active_map synchronization dependency clear cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) sched: rework of "prioritize non-migratable tasks over migratable ones" sched: reduce stack size in isolated_cpu_setup() Revert parts of "ftrace: do not trace scheduler functions" Fixed up conflicts in include/asm-x86/thread_info.h (due to the TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr() introduction).
2008-07-23Merge branch 'cpus4096-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits) NR_CPUS: Replace NR_CPUS in speedstep-centrino.c cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP NR_CPUS: Replace NR_CPUS in cpufreq userspace routines NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target cpumask: Provide a generic set of CPUMASK_ALLOC macros cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr Revert "cpumask: introduce new APIs" cpumask: make for_each_cpu_mask a bit smaller net: Pass reference to cpumask variable in net/sunrpc/svc.c ... Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
2008-07-23Merge branch 'core/softlockup-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softlockup: fix invalid proc_handler for softlockup_panic softlockup: fix watchdog task wakeup frequency softlockup: fix watchdog task wakeup frequency softlockup: show irqtrace softlockup: print a module list on being stuck softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds softlockup: fix softlockup_thresh fix softlockup: fix softlockup_thresh unaligned access and disable detection at runtime softlockup: allow panic on lockup
2008-07-23x86_64 syscall audit fast-pathRoland McGrath
This adds a fast path for 64-bit syscall entry and exit when TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing. This path does not need to save and restore all registers as the general case of tracing does. Avoiding the iret return path when syscall audit is enabled helps performance a lot. Signed-off-by: Roland McGrath <roland@redhat.com>
2008-07-23set_irq_wake: fix return code and wake status trackingUwe Kleine-König
Since 15a647eba94c3da27ccc666bea72e7cca06b2d19 set_irq_wake returned -ENXIO if another device had it already enabled. Zero is the right value to return in this case. Moreover the change to desc->status was not reverted if desc->chip->set_wake returned an error. Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com> Acked-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-23sched: fix hrtick & generic-ipi dependencyIngo Molnar
Andrew Morton reported this s390 allmodconfig build failure: kernel/built-in.o: In function `hrtick_start_fair': sched.c:(.text+0x69c6): undefined reference to `__smp_call_function_single' the reason is that s390 is not a generic-ipi SMP platform yet, while the hrtick code relies on it. Fix the dependency. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: remove CONFIG_KMOD from core kernel code remove CONFIG_KMOD from lib remove CONFIG_KMOD from sparc64 rework try_then_request_module to do less in non-modular kernels remove mention of CONFIG_KMOD from documentation make CONFIG_KMOD invisible modules: Take a shortcut for checking if an address is in a module module: turn longs into ints for module sizes Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs module: reorder struct module to save space on 64 bit builds module: generic each_symbol iterator function module: don't use stop_machine for waiting rmmod
2008-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (79 commits) arm: bus_id -> dev_name() and dev_set_name() conversions sparc64: fix up bus_id changes in sparc core code 3c59x: handle pci_name() being const MTD: handle pci_name() being const HP iLO driver sysdev: Convert the x86 mce tolerant sysdev attribute to generic attribute sysdev: Add utility functions for simple int/ulong variable sysdev attributes sysdev: Pass the attribute to the low level sysdev show/store function driver core: Suppress sysfs warnings for device_rename(). kobject: Transmit return value of call_usermodehelper() to caller sysfs-rules.txt: reword API stability statement debugfs: Implement debugfs_remove_recursive() HOWTO: change email addresses of James in HOWTO always enable FW_LOADER unless EMBEDDED=y uio-howto.tmpl: use unique output names uio-howto.tmpl: use standard copyright/legal markings sysfs: don't call notify_change sysdev: fix debugging statements in registration code. kobject: should use kobject_put() in kset-example kobject: reorder kobject to save space on 64 bit builds ...
2008-07-22Fix build on COMPAT platforms when CONFIG_EPOLL is disabledAtsushi Nemoto
Add missing cond_syscall() entry for compat_sys_epoll_pwait. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22cpusets: fix wrong domain attr updatesMiao Xie
Fix wrong domain attr updates, or we will always update the first sched domain attr. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Paul Jackson <pj@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> [2.6.26.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-22remove CONFIG_KMOD from core kernel codeJohannes Berg
Always compile request_module when the kernel allows modules. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22modules: Take a shortcut for checking if an address is in a moduleRusty Russell
This patch keeps track of the boundaries of module allocation, in order to speed up module_text_address(). Inspired by Arjan's version, which required arch-specific defines: Various pieces of the kernel (lockdep, latencytop, etc) tend to store backtraces, sometimes at a relatively high frequency. In itself this isn't a big performance deal (after all you're using diagnostics features), but there have been some complaints from people who have over 100 modules loaded that this is a tad too slow. This is due to the new backtracer code which looks at every slot on the stack to see if it's a kernel/module text address, so that's 1024 slots. 1024 times 100 modules... that's a lot of list walking. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22module: turn longs into ints for module sizesDenys Vlasenko
This shrinks module.o and each *.ko file. And finally, structure members which hold length of module code (four such members there) and count of symbols are converted from longs to ints. We cannot possibly have a module where 32 bits won't be enough to hold such counts. For one, module loading checks module size for sanity before loading, so such insanely big module will fail that test first. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefsDenys Vlasenko
module.c and module.h conatains code for finding exported symbols which are declared with EXPORT_UNUSED_SYMBOL, and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway (because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then). This patch adds required #ifdefs. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22module: generic each_symbol iterator functionRusty Russell
Introduce an each_symbol() iterator to avoid duplicating the knowledge about the 5 different sections containing symbols. Currently only used by find_symbol(), but will be used by symbol_put_addr() too. (Includes NULL ptr deref fix by Jiri Kosina <jkosina@suse.cz>) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jiri Kosina <jkosina@suse.cz>
2008-07-22module: don't use stop_machine for waiting rmmodRusty Russell
rmmod has a little-used "-w" option, meaning that instead of failing if the module is in use, it should block until the module becomes unused. In this case, we don't need to use stop_machine: Max Krasnyansky indicated that would be useful for SystemTap which loads/unloads new modules frequently. Cc: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-21sysdev: Pass the attribute to the low level sysdev show/store functionAndi Kleen
This allow to dynamically generate attributes and share show/store functions between attributes. Right now most attributes are generated by special macros and lots of duplicated code. With the attribute passed it's instead possible to attach some data to the attribute and then use that in shared low level functions to do different things. I need this for the dynamically generated bank attributes in the x86 machine check code, but it'll allow some further cleanups. I converted all users in tree to the new show/store prototype. It's a single huge patch to avoid unbisectable sections. Runtime tested: x86-32, x86-64 Compiled only: ia64, powerpc Not compile tested/only grep converted: sh, arm, avr32 Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-07-20sched: hrtick_enabled() should use cpu_active()Ingo Molnar
Peter pointed out that hrtick_enabled() should use cpu_active(). Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-20Merge branch 'sched/urgent' into sched/develIngo Molnar