summaryrefslogtreecommitdiff
path: root/mm/zbud.c
AgeCommit message (Collapse)Author
2016-08-25mm: zbud: fix the locking scenarios with zcacheVinayak Menon
With zcache using zbud, strange locking scenarios are observed. The first problem seen is: Core 2 waiting on mapping->tree_lock which is taken by core 6 do_raw_spin_lock raw_spin_lock_irq atomic_cmpxchg page_freeze_refs __remove_mapping shrink_page_list Core 6 after taking mapping->tree_lock is waiting on zbud pool lock which is held by core 5 zbud_alloc zcache_store_page __cleancache_put_page cleancache_put_page __delete_from_page_cache spin_unlock_irq __remove_mapping shrink_page_list shrink_inactive_list Core 5 after taking zbud pool lock from zbud_free received an IRQ, and after IRQ exit, softirqs were scheduled and end_page_writeback tried to lock on mapping->tree_lock which is already held by Core 6. Deadlock. do_raw_spin_lock raw_spin_lock_irqsave test_clear_page_writeba end_page_writeback ext4_finish_bio ext4_end_bio bio_endio blk_update_request end_clone_bio bio_endio blk_update_request blk_update_bidi_request blk_end_bidi_request blk_end_request mmc_blk_cmdq_complete_r mmc_cmdq_softirq_done blk_done_softirq static_key_count static_key_false trace_softirq_exit __do_softirq() tick_irq_exit irq_exit() set_irq_regs __handle_domain_irq gic_handle_irq el1_irq exception __list_del_entry list_del zbud_free zcache_load_page __cleancache_get_page(? This shows that allowing softirqs while holding zbud pool lock can result in deadlocks. To fix this, 'commit 6a1fdaa36272 ("mm: zbud: prevent softirq during zbud alloc, free and reclaim")' decided to take spin_lock_bh during zbud_free, zbud_alloc and zbud_reclaim. But this resulted in another deadlock. spin_bug() do_raw_spin_lock() _raw_spin_lock_irqsave() test_clear_page_writeback() end_page_writeback() ext4_finish_bio() ext4_end_bio() bio_endio() blk_update_request() end_clone_bio() bio_endio() blk_update_request() blk_update_bidi_request() blk_end_request() mmc_blk_cmdq_complete_rq() mmc_cmdq_softirq_done() blk_done_softirq() __do_softirq() do_softirq() __local_bh_enable_ip() _raw_spin_unlock_bh() zbud_alloc() zcache_store_page() __cleancache_put_page() __delete_from_page_cache() __remove_mapping() shrink_page_list() Here, the spin_unlock_bh resulted in explicit invocation of do_sofirq, which resulted in the acquisition of mapping->tree_lock which was already taken by __remove_mapping. The new fix considers the following facts. 1) zcache_store_page is always be called from __delete_from_page_cache with mapping->tree_lock held and interrupts disabled. Thus zbud_alloc which is called only from zcache_store_page is always called with interrupts disabled. 2) zbud_free and zbud_reclaim_page can be called from zcache with or without interrupts disabled. So an interrupt while holding zbud pool lock can result in do_softirq and acquisition of mapping->tree_lock. (1) implies zbud_alloc need not explicitly disable bh. But disable interrupts to make sure zbud_alloc is safe with zcache, irrespective of future changes. This will fix the second scenario. (2) implies zbud_free and zbud_reclaim_page should use spin_lock_irqsave, so that interrupts will not be triggered and inturn softirqs. spin_lock_bh can't be used because a spin_unlock_bh can triger a softirq even in interrupt context. This will fix the first scenario. Change-Id: Ibc810525dddf97614db41643642fec7472bd6a2c Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25mm: zbud: prevent softirq during zbud alloc, free and reclaimVinayak Menon
The following deadlock is observed. Core 2 waiting on mapping->tree_lock which is taken by core 6 do_raw_spin_lock raw_spin_lock_irq atomic_cmpxchg page_freeze_refs __remove_mapping shrink_page_list shrink_inactive_list shrink_list shrink_lruvec shrink_zone shrink_zones do_try_to_free_pages try_to_free_pages(?, ?, ?, ?) __perform_reclaim __alloc_pages_direct_reclaim __alloc_pages_slowpath __alloc_pages_nodemask alloc_kmem_pages_node alloc_thread_info_node dup_task_struct copy_process.part.56 do_fork sys_clone el0_svc_naked Core 6 after taking mapping->tree_lock is waiting on zbud pool lock which is held by core 5 zbud_alloc zcache_store_page __cleancache_put_page cleancache_put_page __delete_from_page_cache spin_unlock_irq __remove_mapping shrink_page_list shrink_inactive_list shrink_list shrink_lruvec shrink_zone bitmap_zero __nodes_clear kswapd_shrink_zone.constprop.58 balance_pgdat kswapd_try_to_sleep kswapd kthread ret_from_fork Core 5 after taking zbud pool lock from zbud_free received an IRQ, and after IRQ exit, softirqs were scheduled and end_page_writeback tried to lock on mapping->tree_lock which is already held by Core 6. Deadlock. do_raw_spin_lock raw_spin_lock_irqsave test_clear_page_writeba end_page_writeback ext4_finish_bio ext4_end_bio bio_endio blk_update_request end_clone_bio bio_endio blk_update_request blk_update_bidi_request blk_end_bidi_request blk_end_request mmc_blk_cmdq_complete_r mmc_cmdq_softirq_done blk_done_softirq static_key_count static_key_false trace_softirq_exit __do_softirq() tick_irq_exit irq_exit() set_irq_regs __handle_domain_irq gic_handle_irq el1_irq exception __list_del_entry list_del zbud_free zcache_load_page __cleancache_get_page(? So protect zbud_alloc/free/reclaim with spink_lock_bh CRs-Fixed: 986783 Change-Id: Ib0605b38e7371c29316ed81e43549a0b9503d531 Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25mm: zbud: initialize object to 0 on GFP_ZEROShiraz Hashim
zbud_alloc if returns free object from pool must also initialize it to 0 when asked to do so. The same is already taken care if a fresh object is allocated. CRs-fixed: 979234 Change-Id: Id171edf131df321385fcdcd7660d06da97689e3e Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2015-11-06mm: zsmalloc: constify struct zs_pool nameSergey SENOZHATSKY
Constify `struct zs_pool' ->name. [akpm@inux-foundation.org: constify zpool_create_pool()'s `type' arg also] Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Dan Streetman <ddstreet@ieee.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08mm: zbud: constify the zbud_opsKrzysztof Kozlowski
The structure zbud_ops is not modified so make the pointer to it a pointer to const. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08mm: zpool: constify the zpool_opsKrzysztof Kozlowski
The structure zpool_ops is not modified so make the pointer to it a pointer to const. Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25zpool: remove zpool_evict()Dan Streetman
Remove zpool_evict() helper function. As zbud is currently the only zpool implementation that supports eviction, add zpool and zpool_ops references to struct zbud_pool and directly call zpool_ops->evict(zpool, handle) on eviction. Currently zpool provides the zpool_evict helper which locks the zpool list lock and searches through all pools to find the specific one matching the caller, and call the corresponding zpool_ops->evict function. However, this is unnecessary, as the zbud pool can simply keep a reference to the zpool that created it, as well as the zpool_ops, and directly call the zpool_ops->evict function, when it needs to evict a page. This avoids a spinlock and list search in zpool for each eviction. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12mm/zpool: add name argument to create zpoolGanesh Mahendran
Currently the underlay of zpool: zsmalloc/zbud, do not know who creates them. There is not a method to let zsmalloc/zbud find which caller they belong to. Now we want to add statistics collection in zsmalloc. We need to name the debugfs dir for each pool created. The way suggested by Minchan Kim is to use a name passed by caller(such as zram) to create the zsmalloc pool. /sys/kernel/debug/zsmalloc/zram0 This patch adds an argument `name' to zs_create_pool() and other related functions. Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13mm/zbud: init user ops only when it is neededHeesub Shin
When zbud is initialized through the zpool wrapper, pool->ops which points to user-defined operations is always set regardless of whether it is specified from the upper layer. This causes zbud_reclaim_page() to iterate its loop for evicting pool pages out without any gain. This patch sets the user-defined ops only when it is needed, so that zbud_reclaim_page() can bail out the reclamation loop earlier if there is no user-defined operations specified. Signed-off-by: Heesub Shin <heesub.shin@samsung.com> Acked-by: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Sunae Seo <sunae.seo@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-20Merge Linus' tree to be be to apply submitted patches to newer code thanJiri Kosina
current trivial.git base
2014-11-13zbud, zswap: change module author emailSeth Jennings
Old email no longer viable. Signed-off-by: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-10-09zbud: avoid accessing last unused freelistChao Yu
For now, there are NCHUNKS of 64 freelists in zbud_pool, the last unbuddied[63] freelist linked with all zbud pages which have free chunks of 63. Calculating according to context of num_free_chunks(), our max chunk number of unbuddied zbud page is 62, so none of zbud pages will be added/removed in last freelist, but still we will try to find an unbuddied zbud page in the last unused freelist, it is unneeded. This patch redefines NCHUNKS to 63 as free chunk number in one zbud page, hence we can decrease size of zpool and avoid accessing the last unused freelist whenever failing to allocate zbud from freelist in zbud_alloc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Cc: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29mm/zpool: use prefixed module loadingKees Cook
To avoid potential format string expansion via module parameters, do not use the zpool type directly in request_module() without a format string. Additionally, to avoid arbitrary modules being loaded via zpool API (e.g. via the zswap_zpool_type module parameter) add a "zpool-" prefix to the requested module, as well as module aliases for the existing zpool types (zbud and zsmalloc). Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06mm/zpool: zbud/zsmalloc implement zpoolDan Streetman
Update zbud and zsmalloc to implement the zpool api. [fengguang.wu@intel.com: make functions static] Signed-off-by: Dan Streetman <ddstreet@ieee.org> Tested-by: Seth Jennings <sjennings@variantweb.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Weijie Yang <weijie.yang@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-06mm/zbud: change zbud_alloc size type to size_tDan Streetman
Change the type of the zbud_alloc() size param from unsigned int to size_t. Technically, this should not make any difference, as the zbud implementation already restricts the size to well within either type's limits; but as zsmalloc (and kmalloc) use size_t, and zpool will use size_t, this brings the size parameter type in line with zsmalloc/zpool. Signed-off-by: Dan Streetman <ddstreet@ieee.org> Acked-by: Seth Jennings <sjennings@variantweb.net> Tested-by: Seth Jennings <sjennings@variantweb.net> Cc: Weijie Yang <weijie.yang@samsung.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04mm/zbud.c: make size unsigned like unique callsiteFabian Frederick
zbud_alloc is only called by zswap_frontswap_store with unsigned int len. Change function parameter + update >= 0 check. Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Seth Jennings <sjennings@variantweb.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11mm/zbud: fix some trivial typos in commentsJianguo Wu
Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-31mm: zbud: fix condition check on allocation sizeHeesub Shin
zbud_alloc() incorrectly verifies the size of allocation limit. It should deny the allocation request greater than (PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE), not (PAGE_SIZE - ZHDR_SIZE_ALIGNED) which has no remaining spaces for its buddy. There is no point in spending the entire zbud page storing only a single page, since we don't have any benefits. Signed-off-by: Heesub Shin <heesub.shin@samsung.com> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Dongjun Shin <d.j.shin@samsung.com> Cc: Sunae Seo <sunae.seo@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-10zbud: add to mm/Seth Jennings
zbud is an special purpose allocator for storing compressed pages. It is designed to store up to two compressed pages per physical page. While this design limits storage density, it has simple and deterministic reclaim properties that make it preferable to a higher density approach when reclaim will be used. zbud works by storing compressed pages, or "zpages", together in pairs in a single memory page called a "zbud page". The first buddy is "left justifed" at the beginning of the zbud page, and the last buddy is "right justified" at the end of the zbud page. The benefit is that if either buddy is freed, the freed buddy space, coalesced with whatever slack space that existed between the buddies, results in the largest possible free region within the zbud page. zbud also provides an attractive lower bound on density. The ratio of zpages to zbud pages can not be less than 1. This ensures that zbud can never "do harm" by using more pages to store zpages than the uncompressed zpages would have used on their own. This implementation is a rewrite of the zbud allocator internally used by zcache in the driver/staging tree. The rewrite was necessary to remove some of the zcache specific elements that were ingrained throughout and provide a generic allocation interface that can later be used by zsmalloc and others. This patch adds zbud to mm/ for later use by zswap. Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Jenifer Hopper <jhopper@us.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave@sr71.net> Cc: Joe Perches <joe@perches.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Hugh Dickens <hughd@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>