summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2011-03-23regulator: Add a subdriver for TI TPS6105x regulator portions v2Linus Walleij
This adds a subdriver for the regulator found inside the TPS61050 and TPS61052 chips. Cc: Samuel Ortiz <samuel.ortiz@intel.com> Cc: Ola Lilja <ola.o.lilja@stericsson.com> Cc: Jonas Aberg <jonas.aberg@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Add a core driver for TI TPS61050/TPS61052 chips v2Linus Walleij
The TPS61050/TPS61052 are boost converters, LED drivers, LED flash drivers and a simple GPIO pin chips. Cc: Liam Girdwood <lrg@slimlogic.co.uk> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Jonas Aberg <jonas.aberg@stericsson.com> Cc: Ola Lilja <ola.o.lilja@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23pci_ids: Add Intel Tunnel Creek LPC Bridge device ID.Denis Turischev
Signed-off-by: Denis Turischev <denis@compulab.co.il> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23regulator: MAX8997/8966 supportMyungJoo Ham
This patch supports PMIC/Regulator part of MAX8997/MAX8966 MFD. In this initial release, selecting voltages or current-limit and switching on/off the regulators are supported. Controlling voltages for DVS with GPIOs is not implemented fully and requires more considerations: it controls multiple bucks (selection of 1, 2, and 5) at the same time with SET1~3 gpios. Thus, when DVS-GPIO is activated, we lose the ability to control the voltage of a single buck regulator independently; i.e., contolling a buck affects other two bucks. Therefore, using the conventional regulator framework directly might be problematic. However, in this driver, we try to choose a setting without such side effect of affecting other regulators and then try to choose a setting with the minimum side effect (the sum of voltage changes in other regulators). On the other hand, controlling all the three bucks simultenously based on the voltage set table may help build cpufreq and similar system more robust; i.e., all the three voltages are consistent every time without glitches during transition. Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Add WM8994 bulk register write operationMark Brown
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Append additional read write on 88pm860xHaojian Zhuang
Append the additional read/write operation on 88pm860x for accessing test page in 88PM860x. Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Adopt mfd_data in 88pm860x regulatorHaojian Zhuang
Copy 88pm860x platform data into different mfd_data structure for regulator driver. So move the identification of device node from regulator driver to mfd driver. Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Adopt mfd_data in 88pm860x ledHaojian Zhuang
Copy 88pm860x platform data into different mfd_data structure for led driver. So move the identification of device node from led driver to mfd driver. Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Adopt mfd_data in 88pm860x backlightHaojian Zhuang
Copy 88pm860x platform data into different mfd_data structure for backlight driver. So move the identification of device node from backlight driver to mfd driver. Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Reentrance and revamp ab8500 gpadc fetching interfaceDaniel Willerud
This revamps the interface so that AB8500 GPADCs are fetched by name. Probed GPADCs are added to a list and this list is searched for a matching GPADC. This makes it possible to have multiple AB8500 GPADC instances instead of it being a singleton, and rids the need to keep a GPADC pointer around in the core AB8500 MFD struct. Currently the match is made to the device name which is by default numbered from the device instance such as "ab8500-gpadc.0" but by using the .init_name field of the device a more intiutive naming for the GPADC blocks can be achieved if desired. Signed-off-by: Daniel Willerud <daniel.willerud@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Move ab8500 gpadc header to subdirDaniel Willerud
This moves the ab8500-gpadc.h header down into the ab8500/ subdir in include/linux/mfd and fixes some whitespace in the header in the process. Signed-off-by: Daniel Willerud <daniel.willerud@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: MAX8997/8966 supportMyungJoo Ham
MAX8997/MAX8966 chip is a multi-function device with I2C bussses. The chip includes PMIC, RTC, Fuel Gauge, MUIC, Haptic, Flash control, and Battery (charging) control. This patch is an initial release of a MAX8997/8966 driver that supports to enable the chip with its primary I2C bus that connects every device mentioned above except for Fuel Gauge, which uses another I2C bus. The fuel gauge is not supported by this mfd driver and is supported by a seperated driver of MAX17042 Fuel Gauge (yes, the fuel gauge part is compatible with MAX17042). Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Rename mfd_shared_cell_{en,dis}able to drop the "shared" partAndres Salomon
As requested by Samuel, there's not really any reason to have "shared" in the name. This also modifies the only user of the function, as well. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: ab8500 chip revision 3.0 supportMattias Wallin
This patch adds support for ab8500 chip revision cut 3.0. Also rephrased from Changes to Author in the header. Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Add platform data to support multiple WM831x devices per boardMark Brown
If a system contains multiple WM831x devices we need to pass a device number through to the MFD so that we use unique device IDs when we instantiate child devices. In order to get support for this into 2.6.39 add some platform data to support the configuration, but no implementation as yet. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Add twl4030 madc driverKeerthy
Introducing a driver for MADC on TWL4030 powerIC. MADC stands for monitoring ADC. This driver monitors the real time conversion of analog signals like battery temperature, battery cuurent etc. Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: add platform_device sharing support for mfdAndres Salomon
This adds functions to enable platform_device sharing for mfd clients. Each platform driver (mfd client) that wants to share an mfd_cell's platform_device uses the mfd_shared_platform_driver_{un,}register() functions instead of platform_driver_{un,}register(). Along with registering the platform driver, these also register a new platform device with the same characteristics as the original cell, but a different name. Given an mfd_cell with the name "foo", drivers that want to share access to its resources can call mfd_shared_platform_driver_register with platform drivers named (for example) "bar" and "baz". This will register two platform devices and drivers named "bar" and "baz" that share the same cell as the platform device "foo". The drivers can then call "foo" cell's enable hooks (or mfd_shared_cell_enable) to enable resources, and obtain platform resources as they normally would. This deals with platform handling only; mfd driver-specific details, hardware handling, refcounting, etc are all dealt with separately. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Add refcounting support to mfd_cellsAndres Salomon
This provides convenience functions for sharing of cells across multiple mfd clients. Mfd drivers can provide enable/disable hooks to actually tweak the hardware, and clients can call mfd_shared_cell_{en,dis}able without having to worry about whether or not another client happens to have enabled or disabled the cell/hardware. Note that this is purely optional; drivers can continue to use the mfd_cell's enable/disable hooks for their own purposes, if desired. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Remove driver_data field from mfd_cellAndres Salomon
All users of this have now been switched over to using mfd_data; it can go away now. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Rename platform_data field of mfd_cell to mfd_dataAndres Salomon
Rename the platform_data variable to imply a distinction between common platform_data driver usage (typically accessed via pdev->dev.platform_data) and the way MFD passes data down to clients (using a wrapper named mfd_get_data). All clients have already been changed to use the wrapper function, so this can be a quick single-commit change that only touches things in drivers/mfd. Signed-off-by: Andres Salomon <dilinger@queued.net> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Drop data_size from mfd_cell structAndres Salomon
Now that there are no more users of this, drop it. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: mfd_cell is now implicitly available to mc13xxx driversAndres Salomon
The cell's platform_data is now accessed with a helper function; change clients to use that, and remove the now-unused data_size. Note that mfd-core no longer makes a copy of platform_data, but the mc13xxx-core driver creates the pdata structures on the stack. In order to get around that, the various ARM mach types that set the pdata have been changed to hold the variable in static (global) memory. Also note that __initdata references in aforementioned pdata structs have been dropped. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: mfd_cell is now implicitly available to ab3550 driverAndres Salomon
No clients (in mainline kernel, I'm told that drivers exist in external trees that are planned for mainline inclusion) make use of this, nor do they make use of platform_data, so nothing really had to change here. The .data_size field is unused, so its usage gets removed. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd-core: Unconditionally add mfd_cell to every platform_deviceAndres Salomon
Previously, one would set the mfd_cell's platform_data/data_size to point to the current mfd_cell in order to pass that information along to drivers. This causes the current mfd_cell to always be available to drivers. It also adds a wrapper function for fetching the mfd cell from a platform device, similar to what originally existed for mfd devices. Drivers who previously used platform_data for other purposes can still use it; the difference is that mfd_get_data() must be used to access it (and the pdata structure is no longer allocated in mfd_add_devices). Note that mfd_get_data is intentionally vague (in name) about where the data is stored; variable name changes can come later without having to touch brazillions of drivers. Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd-core: Fix up typos/vagueness in commentAndres Salomon
Signed-off-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23regulator: twl: add clk32kg to twl-regulatorBalaji T K
In OMAP4 Blaze and Panda, 32KHz clock to WLAN is supplied from Phoenix TWL6030. The 32KHz clock state (ON/OFF) is configured in CLK32KG_CFG_[GRP, TRANS, STATE] register. This follows the same register programming model as other regulators in TWL6030. So add CLK32KG as pseudo regulator. Signed-off-by: Balaji T K <balajitk@ti.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Add new ab8500 GPADC driverArun Murthy
AB8500 GPADC driver used to convert Acc and battery/ac/usb voltage Signed-off-by: Arun Murthy <arun.murthy@stericsson.com> Acked-by: Linus Walleij <linus.walleij@stericsson.com> Acked-by: Mattias Wallin <mattias.wallin@stericsson.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: AB8500 system control driverMattias Nilsson
This adds a pretty straight-forward system control driver for the AB8500. This driver will be used from the core platform, e.g the clock tree implementation in the machine code, and is by nature singleton. There are a few simple functions to read, write, set and clear registers so that the machine code can control its own foundation. Cc: Mattias Wallin <mattias.wallin@stericsson.com> Signed-off-by: Mattias Nilsson <mattias.i.nilsson@stericsson.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-23mfd: Support configuration of WM831x /IRQ output in CMOS modeMark Brown
Provide platform data allowing the system to set the /IRQ pin into CMOS mode rather than the default open drain. The default value of this platform data reflects the default hardware configuration so there should be no change to existing users. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2011-03-22Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (66 commits) avr32: at32ap700x: fix typo in DMA master configuration dmaengine/dmatest: Pass timeout via module params dma: let IMX_DMA depend on IMX_HAVE_DMA_V1 instead of an explicit list of SoCs fsldma: make halt behave nicely on all supported controllers fsldma: reduce locking during descriptor cleanup fsldma: support async_tx dependencies and automatic unmapping fsldma: fix controller lockups fsldma: minor codingstyle and consistency fixes fsldma: improve link descriptor debugging fsldma: use channel name in printk output fsldma: move related helper functions near each other dmatest: fix automatic buffer unmap type drivers, pch_dma: Fix warning when CONFIG_PM=n. dmaengine/dw_dmac fix: use readl & writel instead of __raw_readl & __raw_writel avr32: at32ap700x: Specify DMA Flow Controller, Src and Dst msize dw_dmac: Setting Default Burst length for transfers as 16. dw_dmac: Allow src/dst msize & flow controller to be configured at runtime dw_dmac: Changing type of src_master and dest_master to u8. dw_dmac: Pass Channel Priority from platform_data dw_dmac: Pass Channel Allocation Order from platform_data ...
2011-03-22zlib: slim down zlib_deflate() workspace when possibleJim Keniston
Instead of always creating a huge (268K) deflate_workspace with the maximum compression parameters (windowBits=15, memLevel=8), allow the caller to obtain a smaller workspace by specifying smaller parameter values. For example, when capturing oops and panic reports to a medium with limited capacity, such as NVRAM, compression may be the only way to capture the whole report. In this case, a small workspace (24K works fine) is a win, whether you allocate the workspace when you need it (i.e., during an oops or panic) or at boot time. I've verified that this patch works with all accepted values of windowBits (positive and negative), memLevel, and compression level. Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22crc32: add missed brackets in macroKonstantin Khlebnikov
Add brackets around typecasted argument in crc32() macro. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22sigma-firmware: loader for Analog Devices' SigmaStudioMike Frysinger
Analog Devices' SigmaStudio can produce firmware blobs for devices with these DSPs embedded (like some audio codecs). Allow these device drivers to easily parse and load them. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22kstrto*: converting strings to integers done (hopefully) rightAlexey Dobriyan
1. simple_strto*() do not contain overflow checks and crufty, libc way to indicate failure. 2. strict_strto*() also do not have overflow checks but the name and comments pretend they do. 3. Both families have only "long long" and "long" variants, but users want strtou8() 4. Both "simple" and "strict" prefixes are wrong: Simple doesn't exactly say what's so simple, strict should not exist because conversion should be strict by default. The solution is to use "k" prefix and add convertors for more types. Enter kstrtoull() kstrtoll() kstrtoul() kstrtol() kstrtouint() kstrtoint() kstrtou64() kstrtos64() kstrtou32() kstrtos32() kstrtou16() kstrtos16() kstrtou8() kstrtos8() Include runtime testsuite (somewhat incomplete) as well. strict_strto*() become deprecated, stubbed to kstrto*() and eventually will be removed altogether. Use kstrto*() in code today! Note: on some archs _kstrtoul() and _kstrtol() are left in tree, even if they'll be unused at runtime. This is temporarily solution, because I don't want to hardcode list of archs where these functions aren't needed. Current solution with sizeof() and __alignof__ at least always works. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22smp: move smp setup functions to kernel/smp.cAmerigo Wang
Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter setup functions from init/main.c to kenrel/smp.c, saves some #ifdef CONFIG_SMP. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Rakib Mullick <rakib.mullick@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22include/linux/err.h: add a function to cast error-pointers to a return valueUwe Kleine-König
PTR_RET() can be used if you have an error-pointer and are only interested in the eventual error value, but not the pointer. Yields the usual 0 for no error, -ESOMETHING otherwise. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22fs.h: remove 8 bytes of padding from block_device on 64bit buildsRichard Kennedy
Re-ordering struct block_inode to remove 8 bytes of padding on 64 bit builds, which also shrinks bdev_inode by 8 bytes (776 -> 768) allowing it to fit into one fewer cache lines. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22include/linux/compiler-gcc*.h: unify macro definitionsBorislav Petkov
Unify identical gcc3.x and gcc4.x macros. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22add the common dma_addr_t typedef to include/linux/types.hFUJITA Tomonori
All architectures can use the common dma_addr_t typedef now. We can remove the arch specific dma_addr_t. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Cc: Matt Turner <mattst88@gmail.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: add __GFP_OTHER_NODE flagAndi Kleen
Add a new __GFP_OTHER_NODE flag to tell the low level numa statistics in zone_statistics() that an allocation is on behalf of another thread. This way the local and remote counters can be still correct, even when background daemons like khugepaged are changing memory mappings. This only affects the accounting, but I think it's worth doing that right to avoid confusing users. I first tried to just pass down the right node, but this required a lot of changes to pass down this parameter and at least one addition of a 10th argument to a 9 argument function. Using the flag is a lot less intrusive. Open: should be also used for migration? [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: vmscan: kswapd should not free an excessive number of pages when ↵Mel Gorman
balancing small zones When reclaiming for order-0 pages, kswapd requires that all zones be balanced. Each cycle through balance_pgdat() does background ageing on all zones if necessary and applies equal pressure on the inactive zone unless a lot of pages are free already. A "lot of free pages" is defined as a "balance gap" above the high watermark which is currently 7*high_watermark. Historically this was reasonable as min_free_kbytes was small. However, on systems using huge pages, it is recommended that min_free_kbytes is higher and it is tuned with hugeadm --set-recommended-min_free_kbytes. With the introduction of transparent huge page support, this recommended value is also applied. On X86-64 with 4G of memory, min_free_kbytes becomes 67584 so one would expect around 68M of memory to be free. The Normal zone is approximately 35000 pages so under even normal memory pressure such as copying a large file, it gets exhausted quickly. As it is getting exhausted, kswapd applies pressure equally to all zones, including the DMA32 zone. DMA32 is approximately 700,000 pages with a high watermark of around 23,000 pages. In this situation, kswapd will reclaim around (23000*8 where 8 is the high watermark + balance gap of 7 * high watermark) pages or 718M of pages before the zone is ignored. What the user sees is that free memory far higher than it should be. To avoid an excessive number of pages being reclaimed from the larger zones, explicitely defines the "balance gap" to be either 1% of the zone or the low watermark for the zone, whichever is smaller. While kswapd will check all zones to apply pressure, it'll ignore zones that meets the (high_wmark + balance_gap) watermark. To test this, 80G were copied from a partition and the amount of memory being used was recorded. A comparison of a patch and unpatched kernel can be seen at http://www.csn.ul.ie/~mel/postings/minfree-20110222/memory-usage-hydra.ps and shows that kswapd is not reclaiming as much memory with the patch applied. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: "Chen, Tim C" <tim.c.chen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22pagewalk: only split huge pages when necessaryDave Hansen
Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will unconditionally split any transparent huge pages it runs in to. In practice, that means that anyone doing a cat /proc/$pid/smaps will unconditionally break down every huge page in the process and depend on khugepaged to re-collapse it later. This is fairly suboptimal. This patch changes that behavior. It teaches each ->pmd_entry handler (there are five) that they must break down the THPs themselves. Also, the _generic_ code will never break down a THP unless a ->pte_entry handler is actually set. This means that the ->pmd_entry handlers can now choose to deal with THPs without breaking them down. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric B Munson <emunson@mgebm.net> Tested-by: Eric B Munson <emunson@mgebm.net> Cc: Michael J Wolf <mjwolf@us.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22memcg: move memcg reclaimable page into tail of inactive listMinchan Kim
The rotate_reclaimable_page function moves just written out pages, which the VM wanted to reclaim, to the end of the inactive list. That way the VM will find those pages first next time it needs to free memory. This patch applies the rule in memcg. It can help to prevent unnecessary working page eviction of memcg. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: deactivate invalidated pagesMinchan Kim
Recently, there are reported problem about thrashing. (http://marc.info/?l=rsync&m=128885034930933&w=2) It happens by backup workloads(ex, nightly rsync). That's because the workload makes just use-once pages and touches pages twice. It promotes the page into active list so that it results in working set page eviction. Some app developer want to support POSIX_FADV_NOREUSE. But other OSes don't support it, either. (http://marc.info/?l=linux-mm&m=128928979512086&w=2) By other approach, app developers use POSIX_FADV_DONTNEED. But it has a problem. If kernel meets page is writing during invalidate_mapping_pages, it can't work. It makes for application programmer to use it since they always have to sync data before calling fadivse(..POSIX_FADV_DONTNEED) to make sure the pages could be discardable. At last, they can't use deferred write of kernel so that they could see performance loss. (http://insights.oetiker.ch/linux/fadvise.html) In fact, invalidation is very big hint to reclaimer. It means we don't use the page any more. So let's move the writing page into inactive list's head if we can't truncate it right now. Why I move page to head of lru on this patch, Dirty/Writeback page would be flushed sooner or later. It can prevent writeout of pageout which is less effective than flusher's writeout. Originally, I reused lru_demote of Peter with some change so added his Signed-off-by. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reported-by: Ben Gamari <bgamari.foss@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: mm_struct: remove 16 bytes of alignment padding on 64 bit buildsRichard Kennedy
Reorder mm_struct to remove 16 bytes of alignment padding on 64 bit builds. On my config this shrinks mm_struct by enough to fit in one fewer cache lines and allows more objects per slab in mm_struct kmem_cache under SLUB. slabinfo before patch :- Sizes (bytes) Slabs -------------------------------- Object : 848 Total : 9 SlabObj: 896 Full : 2 SlabSiz: 16384 Partial: 5 Loss : 48 CpuSlab: 2 Align : 64 Objects: 18 slabinfo after :- Sizes (bytes) Slabs -------------------------------- Object : 832 Total : 7 SlabObj: 832 Full : 2 SlabSiz: 16384 Partial: 3 Loss : 0 CpuSlab: 2 Align : 64 Objects: 19 Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: remove unused TestSetPageLocked() interfaceMichel Lespinasse
TestSetPageLocked() isn't being used anywhere. Also, using it would likely be an error, since the proper interface trylock_page() provides stronger ordering guarantees. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: simplify anon_vma refcountsPeter Zijlstra
This patch changes the anon_vma refcount to be 0 when the object is free. It does this by adding 1 ref to being in use in the anon_vma structure (iow. the anon_vma->head list is not empty). This allows a simpler release scheme without having to check both the refcount and the list as well as avoids taking a ref for each entry on the list. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: move anon_vma ref out from under CONFIG_fooPeter Zijlstra
We need the anon_vma refcount unconditionally to simplify the anon_vma lifetime rules. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: rename drop_anon_vma() to put_anon_vma()Peter Zijlstra
The normal code pattern used in the kernel is: get/put. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-22mm: change __remove_from_page_cache()Minchan Kim
Now we renamed remove_from_page_cache with delete_from_page_cache. As consistency of __remove_from_swap_cache and remove_from_swap_cache, we change internal page cache handling function name, too. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>