Age | Commit message | Author
2010-08-12mmc_block: add discard supportAdrian Hunter
Enable MMC to service discard requests. In the case of SD and MMC cards that do not support trim, discards become erases. In the case of cards (MMC) that only allow erases in multiples of erase group size, round to the nearest completely discarded erase group. Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Kyungmin Park <kmpark@infradead.org> Cc: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ben Gardiner <bengardiner@nanometrics.ca> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
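As a rough illustration of the rounding described in the commit above, the sketch below shrinks a discard range to the erase groups that lie completely inside it. The helper name `round_to_erase_groups` and its signature are assumptions made for this example; it is a userspace model, not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not the kernel's code): shrink the sector range
 * [*from, *from + *nr) to the erase groups that lie completely inside it.
 * Returns false when no whole group is covered, so nothing is erased. */
static bool round_to_erase_groups(uint64_t *from, uint64_t *nr, uint64_t group)
{
    assert(group > 0);

    uint64_t rem = *from % group;
    if (rem) {
        uint64_t skip = group - rem;  /* sectors up to the next group boundary */
        if (*nr <= skip)
            return false;
        *from += skip;
        *nr -= skip;
    }
    *nr -= *nr % group;               /* drop the partial group at the tail */
    return *nr > 0;
}
```

With a group size of 8 sectors, a request for sectors 5..24 would be narrowed to sectors 8..23, the two groups that are discarded in full.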
2010-08-12mmc: add erase, secure erase, trim and secure trim operationsAdrian Hunter
SD/MMC cards tend to support an erase operation. In addition, eMMC v4.4 cards can support secure erase, trim and secure trim operations that are all variants of the basic erase command.
SD/MMC device attributes "erase_size" and "preferred_erase_size" have been added.
"erase_size" is the minimum size, in bytes, of an erase operation. For MMC, "erase_size" is the erase group size reported by the card. Note that "erase_size" does not apply to trim or secure trim operations where the minimum size is always one 512 byte sector. For SD, "erase_size" is 512 if the card is block-addressed, 0 otherwise.
SD/MMC cards can erase an arbitrarily large area up to and including the whole card. When erasing a large area it may be desirable to do it in smaller chunks for three reasons:
1. A single erase command will make all other I/O on the card wait. This is not a problem if the whole card is being erased, but erasing one partition will make I/O for another partition on the same card wait for the duration of the erase, which could be several minutes.
2. To be able to inform the user of erase progress.
3. The erase timeout becomes too large to be very useful. Because the erase timeout contains a margin which is multiplied by the size of the erase area, the value can end up being several minutes for large areas.
"erase_size" is not the most efficient unit to erase (especially for SD where it is just one sector), hence "preferred_erase_size" provides a good chunk size for erasing large areas.
For MMC, "preferred_erase_size" is the high-capacity erase size if a card specifies one, otherwise it is based on the capacity of the card. For SD, "preferred_erase_size" is the allocation unit size specified by the card. "preferred_erase_size" is in bytes.
Signed-off-by: Adrian Hunter <adrian.hunter@nokia.com> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Kyungmin Park <kmpark@infradead.org> Cc: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ben Gardiner <bengardiner@nanometrics.ca> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
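The chunking argument in the commit above can be sketched as a loop that erases a large area in pieces of at most a preferred size. This is a userspace model under stated assumptions: `erase_fn`, `erase_in_chunks` and `count_erase` are hypothetical names, and the callback stands in for whatever actually issues one erase command.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback standing in for "issue one erase command". */
typedef int (*erase_fn)(uint64_t start, uint64_t len, void *ctx);

/* Erase [start, start + len) in pieces of at most `chunk` bytes.
 * Returns 0 on success or the first non-zero error from `erase`. */
static int erase_in_chunks(uint64_t start, uint64_t len, uint64_t chunk,
                           erase_fn erase, void *ctx)
{
    assert(chunk > 0);

    while (len > 0) {
        uint64_t n = len < chunk ? len : chunk;
        int err = erase(start, n, ctx);
        if (err)
            return err;
        start += n;
        len -= n;
        /* Between chunks, other I/O can run and progress can be reported. */
    }
    return 0;
}

/* Example callback: only counts how many erase commands were issued. */
static int count_erase(uint64_t start, uint64_t len, void *ctx)
{
    (void)start;
    (void)len;
    (*(unsigned *)ctx)++;
    return 0;
}
```

Erasing 10 MiB with a 4 MiB chunk size would issue three commands, each with a correspondingly smaller timeout than a single 10 MiB erase.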
2010-08-12mm: fix writeback_in_progress()Jan Kara
Commit 83ba7b071f3 ("writeback: simplify the write back thread queue") broke writeback_in_progress() as in that commit we started to remove work items from the list at the moment we start working on them and not at the moment they are finished. Thus if the flusher thread was doing some work but there was no other work queued, writeback_in_progress() returned false. This could in particular cause unnecessary queueing of background writeback from balance_dirty_pages() or writeout work from writeback_inodes_sb_if_idle(). This patch fixes the problem by introducing a bit in the bdi state which indicates that the flusher thread is processing some work and uses this bit for the writeback_in_progress() test. NOTE: Both callsites of writeback_in_progress() (namely, writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually need different information from what writeback_in_progress() provides. They would need to know whether *the kind of writeback they are going to submit* is already queued. But this information isn't that simple to provide so let's fix writeback_in_progress() for the time being. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
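The difference between the broken and the fixed test in the commit above can be modelled in a few lines. The struct and function names below are illustrative assumptions, not the kernel's; the point is only that an item is dequeued before it is processed, so a queue check alone misses work that is in flight.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a backing device; field names are illustrative. */
struct bdi_model {
    int  queued_work;  /* items still waiting on the work list */
    bool running;      /* set while the flusher processes an item */
};

/* Broken test: only looks at the queue. */
static bool in_progress_by_queue(const struct bdi_model *b)
{
    return b->queued_work > 0;
}

/* Fixed test: consults the state bit. */
static bool in_progress_by_flag(const struct bdi_model *b)
{
    return b->running;
}

/* The flusher removes an item from the list before working on it. */
static void flusher_start_item(struct bdi_model *b)
{
    assert(b->queued_work > 0);
    b->queued_work--;
    b->running = true;
}

static void flusher_finish_item(struct bdi_model *b)
{
    b->running = false;
}
```

After `flusher_start_item()` on a device with a single queued item, the queue-based test reports false while the flag-based test reports true.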
2010-08-12writeback: merge for_kupdate and !for_kupdate casesWu Fengguang
Unify the logic for kupdate and non-kupdate cases. There won't be starvation because the inodes requeued into b_more_io will later be spliced _after_ the remaining inodes in b_io, hence won't stand in the way of other inodes in the next run. It avoids unnecessary redirty_tail() calls, hence the update of i_dirtied_when. The timestamp update is undesirable because it could later delay the inode's periodic writeback, or may exclude the inode from the data integrity sync operation (which checks the timestamp to avoid extra work and livelock).
=== How the redirty_tail() comes about:
It is a long story. This redirty_tail() was introduced with wbc.more_io. The initial patch for more_io did not have the redirty_tail(), and when it was merged, several 100% iowait bug reports arose:
reiserfs: http://lkml.org/lkml/2007/10/23/93
jfs: commit 29a424f28390752a4ca2349633aaacc6be494db5 JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
ext2: http://www.spinics.net/linux/lists/linux-ext4/msg04762.html
They are all old bugs hidden in various filesystems that became "visible" with the more_io patch. At the time, the ext2 bug was thought to be "trivial", so it was not fixed. Instead the following updated more_io patch with redirty_tail() was merged: http://www.spinics.net/linux/lists/linux-ext4/msg04507.html
This will in general prevent 100% iowait on ext2 and possibly other unknown FS bugs. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Martin Bligh <mbligh@google.com> Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12writeback: fix queue_io() orderingWu Fengguang
This was not a bug, since b_io is empty for kupdate writeback. The next patch will do requeue_io() for non-kupdate writeback, so let's fix it. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Martin Bligh <mbligh@google.com> Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12writeback: don't redirty tail an inode with dirty pagesWu Fengguang
Avoid delaying writeback for an expired inode with lots of dirty pages, but no active dirtier at the moment. Previously we only did that for the kupdate case. Any filesystem that does delayed allocation or unwritten extent conversion after IO completion will cause this - for example, XFS. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12writeback: add comment to the dirty limit functionsWu Fengguang
Document global_dirty_limits() and bdi_dirty_limit(). Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12writeback: avoid unnecessary calculation of bdi dirty thresholdsWu Fengguang
Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so that the latter can be avoided when under global dirty background threshold (which is the normal state for most systems). Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
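The benefit of the split described above is that the per-device calculation can be skipped on the common path. The sketch below models that idea only; the function names, the percentage-based share and the exact comparison are assumptions for illustration and do not reproduce the kernel's thresholds.

```c
#include <assert.h>
#include <stdbool.h>

static int bdi_limit_calls;  /* counts how often the costly path runs */

/* Illustrative stand-in for the per-device threshold calculation. */
static unsigned long bdi_limit_model(unsigned long dirty_thresh,
                                     unsigned bdi_share_pct)
{
    assert(bdi_share_pct <= 100);
    bdi_limit_calls++;
    return dirty_thresh / 100 * bdi_share_pct;
}

/* Returns true when the caller should be throttled. The global numbers
 * are checked first; the per-device limit is computed only if needed. */
static bool over_limit(unsigned long nr_dirty, unsigned long bdi_dirty,
                       unsigned long background_thresh,
                       unsigned long dirty_thresh, unsigned bdi_share_pct)
{
    if (nr_dirty <= background_thresh)
        return false;  /* common case: skip the per-device work */
    return bdi_dirty >= bdi_limit_model(dirty_thresh, bdi_share_pct);
}
```

While the system stays under the background threshold, `bdi_limit_calls` never increases, which is the saving the commit is after.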
2010-08-12writeback: balance_dirty_pages(): reduce calls to global_page_stateWu Fengguang
Reducing the number of times balance_dirty_pages calls global_page_state reduces the cache references and so improves write performance on a variety of workloads.
'perf stats' of simple fio write tests shows the reduction in cache access. The test is fio 'write,mmap,600Mb,pre_read' on an AMD AthlonX2 with 3Gb memory (dirty_threshold approx 600 Mb), running each test 10 times, dropping the fastest & slowest values, then taking the average & standard deviation.
             average (s.d.) in millions (10^6)
2.6.31-rc8   648.6 (14.6)
+patch       620.1 (16.5)
The reduction is achieved by dropping clip_bdi_dirty_limit, as it rereads the counters to apply the dirty_threshold, and moving this check up into balance_dirty_pages where the counters have already been read.
Also, rearranging the for loop to contain only one copy of the limit tests allows the pdflush test after the loop to use the local copies of the counters rather than rereading them.
In the common case with no throttling it now calls global_page_state 5 fewer times and bdi_stat 2 fewer.
Fengguang:
This patch slightly changes behavior by replacing clip_bdi_dirty_limit() with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh) to avoid exceeding the dirty limit. Since the bdi dirty limit is mostly accurate we don't need to clip routinely. A simple dirty limit check would be enough.
The check is necessary because, in principle, we should throttle everything calling balance_dirty_pages() when we're over the total limit, as said by Peter.
We now set and clear dirty_exceeded not only based on bdi dirty limits, but also on the global dirty limit. The global limit check is added in place of clip_bdi_dirty_limit() for safety and is not intended as a behavior change. The bdi limits should be tight enough to keep all dirty pages under the global limit most of the time; occasionally exceeding it by a small amount should be OK though. The change makes the logic more obvious: the global limit is the ultimate goal and shall always be imposed.
We may now start background writeback work based on outdated conditions. That's safe because the bdi flush thread will (and has to) double check the states. It reduces overall overhead because the test based on old states still has a good chance of being right. [akpm@linux-foundation.org] fix uninitialized dirty_exceeded Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Jan Kara <jack@suse.cz> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12parisc: fix wrong page aligned size calculation in ioremapping codeFlorian Zumbiehl
parisc __ioremap(): fix off-by-one error in page alignment of allocation size for sizes where size%PAGE_SIZE==1. Signed-off-by: Florian Zumbiehl <florz@florz.de> Cc: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Helge Deller <deller@gmx.de> Tested-by: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
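The off-by-one above can be reproduced with plain arithmetic. The sketch assumes the mapping length is derived from the address of the last byte, a common ioremap pattern; it has not been checked against the parisc source, and the `_MODEL` macros and function names are made up for this example.

```c
#include <assert.h>

#define PAGE_SIZE_MODEL 4096UL
#define PAGE_MASK_MODEL (~(PAGE_SIZE_MODEL - 1))
/* Round x up to the next page boundary (x itself if already aligned). */
#define PAGE_ALIGN_MODEL(x) (((x) + PAGE_SIZE_MODEL - 1) & PAGE_MASK_MODEL)

/* Aligns the address of the LAST byte: one page short whenever that
 * address is itself page-aligned, i.e. when size % PAGE_SIZE == 1 for a
 * page-aligned start. */
static unsigned long map_len_buggy(unsigned long phys, unsigned long size)
{
    assert(size > 0);
    unsigned long last_addr = phys + size - 1;
    return PAGE_ALIGN_MODEL(last_addr) - (phys & PAGE_MASK_MODEL);
}

/* Aligns the first address PAST the region instead. */
static unsigned long map_len_fixed(unsigned long phys, unsigned long size)
{
    assert(size > 0);
    unsigned long last_addr = phys + size - 1;
    return PAGE_ALIGN_MODEL(last_addr + 1) - (phys & PAGE_MASK_MODEL);
}
```

For a page-aligned start and a size of 4097 bytes, the first variant yields one page and the second yields the two pages that are actually needed; for every other size the two agree.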
2010-08-12score: fix dereference of NULL pointer in local_flush_tlb_page()Roel Kluin
Don't dereference vma if it's NULL. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12pc8736x_gpio: depends on X86_32Randy Dunlap
Fix kconfig dependency warning for PC8736x_GPIO by restricting it to X86_32. warning: (SCx200_GPIO && SCx200 || PC8736x_GPIO && X86) selects NSC_GPIO which has unmet direct dependencies (X86_32) NSC_GPIO is X86_32 only. The other driver (SCx200_GPIO) that selects NSC_GPIO is X86_32 only (indirectly, since SCx200 depends on X86_32), so limit this driver also. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jordan Crouse <jordan.crouse@amd.com> Cc: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12mm: fix fatal kernel-doc errorRandy Dunlap
Fix a fatal kernel-doc error due to a #define coming between a function's kernel-doc notation and the function signature. (kernel-doc cannot handle this) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12acpi: fix bogus preemption logicThomas Gleixner
The ACPI_PREEMPTION_POINT() logic was introduced in commit 8bd108d (ACPICA: add preemption point after each opcode parse). The follow up commits abe1dfab6, 138d15692, c084ca70 tried to fix the preemption logic back and forth, but nobody noticed that the usage of in_atomic_preempt_off() in that context is wrong. The check which guards the call of cond_resched() is: if (!in_atomic_preempt_off() && !irqs_disabled()) in_atomic_preempt_off() is not intended for general use as the comment above the macro definition clearly says: * Check whether we were atomic before we did preempt_disable(): * (used by the scheduler, *after* releasing the kernel lock) On a CONFIG_PREEMPT=n kernel the usage of in_atomic_preempt_off() works by accident, but with CONFIG_PREEMPT=y it's just broken. The whole purpose of the ACPI_PREEMPTION_POINT() is to reduce the latency on a CONFIG_PREEMPT=n kernel, so make ACPI_PREEMPTION_POINT() depend on CONFIG_PREEMPT=n and remove the in_atomic_preempt_off() check. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16210 [akpm@linux-foundation.org: fix build] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org> Cc: Francois Valenduc <francois.valenduc@tvcablenet.be> Cc: Lin Ming <ming.m.lin@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12kernel/kfifo.c: add handling of chained scatterlistsStefani Seibold
The current kfifo scatterlist implementation will not work with chained scatterlists. It assumes that struct scatterlist arrays are allocated contiguously, which is not the case when chained scatterlists (struct sg_table) are in use. Signed-off-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
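Why contiguous indexing fails on chained lists can be shown with a toy model. In the sketch below, `struct sg_model` and its explicit `chain` and `is_last` fields are assumptions for illustration (the kernel encodes this information differently); `sg_model_next()` plays the role that `sg_next()` plays in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Toy scatterlist entry; the kernel encodes these flags differently. */
struct sg_model {
    size_t len;              /* bytes described by this entry */
    struct sg_model *chain;  /* non-NULL: this slot links to another array */
    int is_last;             /* marks the end of the whole list */
};

/* Step to the following data entry, following a chain link if one is in
 * the way. Plain `sg + 1` would treat the link slot as data and then run
 * off the end of the first array. */
static struct sg_model *sg_model_next(struct sg_model *sg)
{
    assert(sg != NULL);
    if (sg->is_last)
        return NULL;
    sg++;
    if (sg->chain)
        sg = sg->chain;
    return sg;
}

/* Sum the lengths of all data entries in a possibly chained list. */
static size_t sg_model_total(struct sg_model *sg)
{
    size_t total = 0;
    for (; sg; sg = sg_model_next(sg))
        total += sg->len;
    return total;
}
```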
2010-08-11Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: isofs: Fix lseek() to position beyond 4 GB vfs: remove unused MNT_STRICTATIME vfs: show unreachable paths in getcwd and proc vfs: only add " (deleted)" where necessary vfs: add prepend_path() helper vfs: __d_path: dont prepend the name of the root dentry ia64: perfmon: add d_dname method vfs: add helpers to get root and pwd cachefiles: use path_get instead of lone dget fs/sysv/super.c: add support for non-PDP11 v7 filesystems V7: Adjust sanity checks for some volumes Add v7 alias v9fs: fixup for inode_setattr being removed Manual merge to take Al's version of the fs/sysv/super.c file: it merged cleanly, but Al had removed an unnecessary header include, so his side was better.
2010-08-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linusLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus: Squashfs: fix checkpatch.pl warnings Squashfs: fix filename typo Squashfs: update Kconfig and documentation for LZO Squashfs: fix block size use in LZO decompressor Squashfs: Add LZO compression support squashfs: fix filename in header comment Squashfs: Make XATTR config name consistent with other file systems squashfs: fix compiler inline warning
2010-08-11Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds
* 'for-linus' of git://git.open-osd.org/linux-open-osd: exofs: Fix groups code when num_devices is not divisible by group_width exofs: Remove useless optimization exofs: exofs_file_fsync and exofs_file_flush correctness exofs: Remove superfluous dependency on buffer_head and writeback
2010-08-11Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits) ceph: generalize mon requests, add pool op support ceph: only queue async writeback on cap revocation if there is dirty data ceph: do not ignore osd_idle_ttl mount option ceph: constify dentry_operations ceph: whitespace cleanup ceph: add flock/fcntl lock support ceph: define on-wire types, constants for file locking support ceph: add CEPH_FEATURE_FLOCK to the supported feature bits ceph: support v2 reconnect encoding ceph: support v2 client_caps encoding ceph: move AES iv definition to shared header ceph: fix decoding of pool snap info ceph: make ->sync_fs not wait if wait==0 ceph: warn on missing snap realm ceph: print useful error message when crush rule not found ceph: use %pU to print uuid (fsid) ceph: sync header defs with server code ceph: clean up header guards ceph: strip misleading/obsolete version, feature info ceph: specify supported features in super.h ...
2010-08-11Merge branch 'msm-video' of git://codeaurora.org/quic/kernel/dwalker/linux-msmLinus Torvalds
* 'msm-video' of git://codeaurora.org/quic/kernel/dwalker/linux-msm: video: msm: Fix section mismatch in mddi.c. drivers: video: msm: drop some unused variables
2010-08-11Merge branch 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6Linus Torvalds
* 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6: IXP4xx: Fix LL debugging on little-endian CPU. IXP4xx: Fix sparse warnings in I/O primitives. IXP4xx: Make mdio_bus struct static in the Ethernet driver. IXP4xx: Fix ixp4xx_crypto little-endian operation. IXP4xx: Prevent HSS transmitter lockup by disabling FRaMe signals. ixp4xx/vulcan: add PCI support ixp4xx: base support for Arcom Vulcan
2010-08-11Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (226 commits) ARM: 6323/1: cam60: don't use __init for cam60_spi_{flash_platform_data,partitions} ARM: 6324/1: cam60: move cam60_spi_devices to .init.data ARM: 6322/1: imx/pca100: Fix name of spi platform data ARM: 6321/1: fix syntax error in main Kconfig file ARM: 6297/1: move U300 timer to dynamic clock lookup ARM: 6296/1: clock U300 intcon and timer properly ARM: 6295/1: fix U300 apb_pclk split ARM: 6306/1: fix inverted MMC card detect in U300 ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID ARM: 6294/1: etm: do a dummy read from OSSRR during initialization ARM: 6292/1: coresight: add ETM management registers ARM: 6288/1: ftrace: document mcount formats ARM: 6287/1: ftrace: clean up mcount assembly indentation ARM: 6286/1: fix Thumb-2 decompressor broken by "Auto calculate ZRELADDR" ARM: 6281/1: video/imxfb.c: allow usage without BACKLIGHT_CLASS_DEVICE ARM: 6280/1: imx: Fix build failure when including <mach/gpio.h> without <linux/spinlock.h> ARM: S5PV210: Fix on missing s3c-sdhci card detection method for hsmmc3 ARM: S5P: Fix on missing S5P_DEV_FIMC in plat-s5p/Kconfig ARM: S5PV210: Override FIMC driver name on Aquila board ARM: S5PC100: enable FIMC on SMDKC100 ... Fix up conflicts in arch/arm/mach-{s5pc100,s5pv210}/cpu.c due to different subsystem 'setname' calls, and trivial port types in include/linux/serial_core.h
2010-08-11lib/decompress_bunzip2.c: fix checkstack warningPrarit Bhargava
Fix checkstack error: lib/decompress_bunzip2.c: In function `get_next_block': lib/decompress_bunzip2.c:511: warning: the frame size of 1932 bytes is larger than 1024 bytes byteCount, symToByte, and mtfSymbol cannot be declared static or allocated dynamically so place them in the bunzip_data struct. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Phillip Lougher <phillip@lougher.demon.co.uk> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11kfifo: add example files to the kernel sample directoryStefani Seibold
Add four examples to the kernel sample directory. They show how to handle:
- a byte stream fifo
- an integer type fifo
- a dynamic record sized fifo
- the fifo DMA functions
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11kfifo: replace the old non generic APIStefani Seibold
Simply replace the whole kfifo.c and kfifo.h files with the new generic version and fix the kerneldoc API template file. Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11kfifo: add the new generic kfifo APIStefani Seibold
Add the new version of the kfifo API files kfifo.c and kfifo.h. Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11kfifo: fix kfifo miss use of nozami.cStefani Seibold
There are different types of fifo which cannot be handled in C without a lot of overhead. So I decided to write the API as a set of macros, which is the only way to do a kind of template meta programming without C++. These macros handle the different types of fifos in a transparent way. There are a lot of benefits:
- Compile time handling of the different fifo types
- Better performance (a safe put or get of an integer generates only 9 assembly instructions on an x86)
- Type safe
- Cleaner interface, the additional kfifo_..._rec() functions are gone
- Easier to use
- Less error prone
- Different types of fifos: it is now possible to define an int fifo or any other type. See below for an example.
- Smaller footprint for non-byte type fifos
- No need to create a second hidden variable, like in the old DEFINE_KFIFO
The API was not changed. There are now real in place fifos where the data space is a part of the structure. The fifo now needs 20 bytes plus the fifo space. Dynamically assigned or allocated fifos create a little more code. Most of the macro code will be optimized away and simply generates a function call. Only the really small ones generate inline code. Additionally you can now create fifos for any data type, not only the "unsigned char" byte streamed fifos. There are also new kfifo_put and kfifo_get functions to handle a single element in a fifo. These macros generate inline code, which is a little bit larger but faster. I know that this kind of macro is very sophisticated and not easy to maintain. But I have tested everything and it works as expected. I analyzed the output of the compiler and for the x86 the code is as good as hand written assembler code. For the byte stream fifo the generated code is exactly the same as with the current kfifo implementation. For all other types of fifos the code is smaller than before, because the interface is easier to use. The main goal was to provide an API which is very intuitive, safe and easy to use.
So Linux will now get a powerful fifo API which provides everything a developer needs. This will save a lot of kernel space in the future, since there is no need to write one's own implementation. Most device driver developers need a fifo, and deep kernel development will also benefit from this API. Here are the results of the text section usage:
Example 1:
                    kfifo_put/_get  kfifo_in/out  current kfifo
dynamic allocated   0x000002a8      0x00000291    0x00000299
in place            0x00000291      0x0000026e    0x00000273
kfifo.c             new             old
text section size   0x00000be5      0x000008b2
As you can see, kfifo_put/kfifo_get creates a little bit more code than kfifo_in/kfifo_out, but it is much faster (the code is inline). The code is completely hand crafted and optimized. The text section size is as small as possible. You get all the fifo handling in only 3 kb. This includes type safe fixed size records, dynamic records and DMA handling. This should be the final version. All requested features are implemented.
Note: Most features of this API don't have any users. All functions which are not used in the next 9 months will be removed. So, please adapt your drivers and other sources as soon as possible to the new API and post it. These are the features which are currently not used in the kernel:
kfifo_to_user()
kfifo_from_user()
kfifo_dma_....() macros
kfifo_esize()
kfifo_recsize()
kfifo_put()
kfifo_get()
The fixed size record elements, excluding "unsigned char" fifos, and the variable size record fifos
This patch: Users of the kernel fifo should never bypass the API and directly access the fifo structure. Otherwise it will be very hard to maintain the API. Signed-off-by: Stefani Seibold <stefani@seibold.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
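To give a feel for the "macros as templates" approach the commit message describes, here is a minimal userspace sketch of a type-generic, in-place fifo. None of these names are the kernel's: `DEFINE_FIFO_MODEL`, `fifo_model_put` and `fifo_model_get` are invented for this example, and the real kfifo macros differ in signature and in how much they delegate to out-of-line functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Declare an in-place fifo of `size` elements of `type`.
 * `size` must be a power of two so indices can be masked. */
#define DEFINE_FIFO_MODEL(name, type, size) \
    struct { unsigned in, out; type buf[size]; } name = { 0, 0, { 0 } }

#define FIFO_MODEL_SIZE(f)     (sizeof((f).buf) / sizeof((f).buf[0]))
#define fifo_model_len(f)      ((f).in - (f).out)
#define fifo_model_is_full(f)  (fifo_model_len(f) == FIFO_MODEL_SIZE(f))

/* Store one element; evaluates to false when the fifo is full. */
#define fifo_model_put(f, val) \
    (fifo_model_is_full(f) ? false : \
     ((f).buf[(f).in++ & (FIFO_MODEL_SIZE(f) - 1)] = (val), true))

/* Fetch one element into *ptr; evaluates to false when the fifo is empty. */
#define fifo_model_get(f, ptr) \
    (fifo_model_len(f) == 0 ? false : \
     (*(ptr) = (f).buf[(f).out++ & (FIFO_MODEL_SIZE(f) - 1)], true))

/* Example: a fifo of ints, filled past capacity and drained in order. */
static int fifo_model_demo(void)
{
    DEFINE_FIFO_MODEL(fifo, int, 4);
    int v = 0, sum = 0;

    for (int i = 1; i <= 5; i++)
        (void)fifo_model_put(fifo, i * 10);  /* the fifth put is rejected */
    assert(fifo_model_is_full(fifo));

    while (fifo_model_get(fifo, &v))
        sum += v;
    return sum;                              /* 10 + 20 + 30 + 40 */
}
```

Because the element type is part of the declaration, assigning a value of the wrong type is caught by the compiler, which is the type safety argument made in the message.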
2010-08-11kfifo: kfifo_is_{full,empty} should return bools, not intsRobert P. J. Day
For consistency with other kfifo routines, return bool, not int. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Cc: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11fs/sysv/super.c: add support for non-PDP11 v7 filesystemsLubomir Rintel
This adds byte order autodetection (of PDP-11 and LE filesystems). No attempt is made to detect big-endian filesystems -- were there any? Tested with PDP-11 v7 filesystems and PC-IX maintenance floppy. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
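For readers unfamiliar with the PDP-11 layout mentioned above: a 32-bit value is stored as two 16-bit words, high word first, with each word little-endian. The decoders below follow that definition. The `guess_order` helper is purely hypothetical; it only illustrates the general idea of trying each byte order and keeping the one that yields a plausible field, and says nothing about which fields or checks the driver really uses.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian: least significant byte first. */
static uint32_t le32_decode(const uint8_t *b)
{
    assert(b);
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* PDP-11: high 16-bit word first, each word little-endian. */
static uint32_t pdp32_decode(const uint8_t *b)
{
    assert(b);
    return (uint32_t)b[1] << 24 | (uint32_t)b[0] << 16 |
           (uint32_t)b[3] << 8 | (uint32_t)b[2];
}

enum order_model { ORDER_NONE, ORDER_LE, ORDER_PDP };

/* Hypothetical heuristic: accept the first byte order under which the
 * field decodes to a value no larger than `max_sane`. */
static enum order_model guess_order(const uint8_t *field, uint32_t max_sane)
{
    if (pdp32_decode(field) <= max_sane)
        return ORDER_PDP;
    if (le32_decode(field) <= max_sane)
        return ORDER_LE;
    return ORDER_NONE;
}
```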
2010-08-11fs/sysv: v7: adjust sanity checks for some volumesLubomir Rintel
Newly mkfs-ed filesystems from Seventh Edition have last modification time set to zero, but are otherwise perfectly valid. Also, tighten up other sanity checks to filter out most filesystems with [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11fs/sysv: add v7 aliasLubomir Rintel
So that the module gets autoloaded when a v7 filesystem is mounted. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11kexec: return -EFAULT on copy_to_user() failuresDan Carpenter
copy_to/from_user() returns the number of bytes remaining to be copied. It never returns a negative value. The correct return code is -EFAULT and not -EIO. All the callers check for non-zero returns so that's Ok, but the return code is passed to the user so we should fix this. Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Simon Kagstrom <simon.kagstrom@netinsight.net> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
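The return-code convention described above can be modelled in userspace. `copy_to_user_model` is a stand-in written for this example (its `writable` parameter simulates a partially faulting destination); only the convention it demonstrates, "non-zero means bytes were left uncopied, so report -EFAULT", comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for copy_to_user(): copies at most `writable` bytes
 * and returns how many bytes could NOT be copied (never negative). */
static size_t copy_to_user_model(void *dst, const void *src, size_t n,
                                 size_t writable)
{
    size_t done = n < writable ? n : writable;
    memcpy(dst, src, done);
    return n - done;
}

/* Hypothetical caller: turns a short copy into -EFAULT rather than -EIO
 * or the raw leftover byte count. */
static int export_value(void *ubuf, size_t writable, const void *val, size_t n)
{
    assert(ubuf && val);
    if (copy_to_user_model(ubuf, val, n, writable))
        return -EFAULT;
    return 0;
}
```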
2010-08-11parport_serial: use the PCI IRQ if offeredFrédéric Brière
Commit 51dcdfe ("parport: Use the PCI IRQ if offered") added IRQ support for PCI parallel port devices handled by parport_pc, but turned it off for parport_serial, despite a printk() message to the contrary. Signed-off-by: Frédéric Brière <fbriere@fbriere.net> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11lib/bug.c: add oops end marker to WARN implementationAnton Blanchard
We are missing the oops end marker for the exception based WARN implementation in lib/bug.c. This is useful for logfile analysis tools. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11lib/bug.c: make WARN implementation match the kernel/panic.c oneAnton Blanchard
There are a few issues with the exception based WARN implementation in lib/bug.c:
- Inconsistent printk flags. The "cut here" line is printed at KERN_EMERG, so the console and all logged in users see the single line:
------------[ cut here ]------------
for each WARN. Fix this so we print everything at KERN_WARNING to match the kernel/panic.c version.
- The lib/bug.c WARN would print "Badness at". Change it to match the kernel/panic.c version which prints "WARNING: at".
- Print the list of modules, similar to kernel/panic.c.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11panic: keep blinking in spite of long spin timer modeTAMUKI Shoichi
To keep panic_timeout accurate when running under a hypervisor, the current implementation spins only on long (1 second) calls to mdelay. That works well, but the problem is that the keyboard LEDs don't blink at all in that situation. This patch calls panic_blink_enter() between every mdelay so that the LEDs keep blinking in spite of long spin timer mode. Each mdelay call is now 100ms. Even with this change, panic_timeout stays accurate enough when running under a hypervisor. Signed-off-by: TAMUKI Shoichi <tamuki@linet.gr.jp> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: Dmitry Torokhov <dtor@mail.ru> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11afs: destroy work queue on init failureDan Carpenter
We can clean up the work queue on this error path. This function is called from afs_init(). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11dma-mapping: add DMA_xxBIT_MASK to feature-removal-schedule.txtFUJITA Tomonori
DMA_xxBIT_MASK macros were marked as deprecated in June 2009. One more year is long enough, I think. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11pci: add PCI DMA unmap state API to feature-removal-schedule.txtFUJITA Tomonori
It was replaced with the DMA unmap state API (which can be used for any bus). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11Documentation: DMA-API-HOWTO.txt: add multiple types of IOMMUs supportFUJITA Tomonori
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11dma-mapping: remove dma_is_consistent APIFUJITA Tomonori
Architectures implement dma_is_consistent() in different ways (some misinterpret the definition of the API in DMA-API.txt). So it hasn't been very useful for drivers. We have only one user of the API in tree. Out-of-tree drivers are unlikely to use the API. Even if we fix dma_is_consistent() in some architectures, it doesn't look useful at all. It was invented long ago for some old systems that can't allocate coherent memory at all. It's better to export only APIs that are definitely necessary for drivers. Let's remove this API. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11scsi: 53c700: remove dma_is_consistent usageFUJITA Tomonori
This driver is the only user of dma_is_consistent(). We plan to remove this API. The driver uses the API in the following way: BUG_ON(!dma_is_consistent(hostdata->dev, pScript) && L1_CACHE_BYTES < dma_get_cache_alignment()); The above code tries to see if L1_CACHE_BYTES is greater than dma_get_cache_alignment() on systems that cannot allocate coherent memory (some old systems can't). James Bottomley explained that this is necessary because the driver packs the set of mailboxes into a single coherent area and separates the different usages by a L1 cache stride. So it's fatal if the dma He also pointed out that we can kill this checking because we don't hit this BUG_ON on all architectures that actually use the driver. (akpm: stolen from the scsi tree because dma-mapping-remove-dma_is_consistent-api.patch needs it) Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: James Bottomley <James.Bottomley@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11dma-mapping: parisc: set ARCH_DMA_MINALIGNFUJITA Tomonori
Architectures that handle DMA-non-coherent memory need to set ARCH_DMA_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe: the buffer doesn't share a cache with the others. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11dma-mapping: unify dma_get_cache_alignment implementationsFUJITA Tomonori
dma_get_cache_alignment returns the minimum DMA alignment. Architectures define it as ARCH_DMA_MINALIGN (formerly ARCH_KMALLOC_MINALIGN). So we can unify dma_get_cache_alignment implementations. Note that some architectures implement dma_get_cache_alignment wrongly. dma_get_cache_alignment() should return the minimum DMA alignment. So fully-coherent architectures should return 1. This patch also fixes this issue. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
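A minimal sketch of what a unified helper could look like, assuming ARCH_DMA_MINALIGN is left undefined on fully-coherent architectures. The name sketch_dma_get_cache_alignment is hypothetical, and this is an illustration of the idea rather than the patch itself.

```c
/*
 * Sketch only: an architecture with a DMA alignment restriction is
 * assumed to define ARCH_DMA_MINALIGN before this point; a
 * fully-coherent architecture leaves it undefined.
 */
static inline int sketch_dma_get_cache_alignment(void)
{
#ifdef ARCH_DMA_MINALIGN
	return ARCH_DMA_MINALIGN;
#else
	return 1;	/* no restriction: minimum alignment is one byte */
#endif
}
```

Compiled as-is, with no ARCH_DMA_MINALIGN defined, the helper takes the fully-coherent branch.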
2010-08-11dma-mapping: rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGNFUJITA Tomonori
Now each architecture has its own dma_get_cache_alignment implementation. dma_get_cache_alignment returns the minimum DMA alignment. Architectures define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that a malloc'ed buffer is DMA-safe; the buffer doesn't share a cache with the others). So we can unify dma_get_cache_alignment implementations. This patch: dma_get_cache_alignment() needs to know whether an architecture defines ARCH_KMALLOC_MINALIGN or not (that is, whether the architecture has a DMA alignment restriction). However, slab.h defines ARCH_KMALLOC_MINALIGN if an architecture doesn't define it. Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN. ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub (except for crypto). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11edac: mpc85xx: add support for new MPCxxx/Pxxxx EDAC controllersAnton Vorontsov
Simply add proper IDs into the device table. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Dave Jiang <djiang@mvista.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11edac: i5400: improve handling of pci_enable_device() return valueKulikov Vasiliy
-EIO is not the only error code that pci_enable_device() may return, and the set of errors may grow in the future. We should compare the return code with zero, not with a specific error value. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Jeff Roberson <jroberson@jroberson.net> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11edac: i5000: improve handling of pci_enable_device() return valueKulikov Vasiliy
-EIO is not the only error code that pci_enable_device() may return, and the set of errors may grow in the future. We should compare the return code with zero, not with a specific error value. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Jeff Roberson <jroberson@jroberson.net> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
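The difference between the two checks in the i5400 and i5000 patches can be sketched as below. fake_enable_device() is a hypothetical stub standing in for pci_enable_device(), and probe_old()/probe_new() are illustrative names, not the drivers' real functions.

```c
#include <errno.h>

/* Hypothetical stub: returns whatever result the caller asks for. */
static int fake_enable_device(int rc)
{
	return rc;
}

/* Fragile check: only -EIO is treated as a failure. */
static int probe_old(int rc)
{
	if (fake_enable_device(rc) == -EIO)
		return -EIO;
	return 0;	/* any other error is silently ignored */
}

/* Robust check: any non-zero return is a failure and is propagated. */
static int probe_new(int rc)
{
	int ret = fake_enable_device(rc);

	if (ret)
		return ret;
	return 0;
}
```

With the old check, an error such as -ENOMEM would be treated as success; comparing against zero catches every error code, including ones added later.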
2010-08-11edac: add missing pieces from MPC85xx -> FSL_SOC_BOOKEChristoph Egger
In 5753c082f66eca5be81f6bda85c1718c5eea6ada ("powerpc/85xx: Kconfig cleanup") menuconfig MPC85xx was replaced by FSL_SOC_BOOKE but some references inside the code were not adjusted accordingly. This patch addresses these missing pieces. Signed-off-by: Christoph Egger <siccegge@cs.fau.de> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Peter Tyser <ptyser@xes-inc.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Scott Wood <scottwood@freescale.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11pids: alloc_pidmap: remove the unnecessary boundary checksOleg Nesterov
alloc_pidmap() calculates max_scan so that if the initial offset != 0 we inspect the first map->page twice. This is correct: we want to find the unused bits < offset in this bitmap block. Add the comment. But it doesn't make any sense to stop the find_next_offset() loop when we are looking into this map->page for the second time. We have already checked the bits >= offset during the first attempt, it is fine to do this again, no matter if we succeed this time or not. Remove this hard-to-understand code. It optimizes the very unlikely case when we are going to fail, but slows down the more likely case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Salman Qazi <sqazi@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
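The "first page twice" behaviour can be illustrated with a small sketch. The formula below is an assumption about how max_scan is derived (the commit message does not spell it out), and SKETCH_BITS_PER_PAGE and pages_scanned() are hypothetical names used only here.

```c
/* Assumed bitmap block size: one 4096-byte page worth of bits. */
#define SKETCH_BITS_PER_PAGE (4096 * 8)

/*
 * Number of bitmap pages the scan loop would visit, assuming the loop
 * runs from 0 to max_scan inclusive. With a non-zero starting offset
 * one extra pass is made, so the first page is revisited for the bits
 * below the offset.
 */
static int pages_scanned(int pid_max, int offset)
{
	int nr_pages = (pid_max + SKETCH_BITS_PER_PAGE - 1) / SKETCH_BITS_PER_PAGE;
	int max_scan = nr_pages - !offset;	/* assumed formula */

	return max_scan + 1;
}
```

Under these assumptions, a single-page pid space is scanned once when starting at offset 0 and twice otherwise, which matches the behaviour the message describes.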