path: root/block
Age | Commit message | Author
2016-03-22 | block: test-iosched: Fix compilation error in end_test_bio | Venkat Gopalakrishnan
The function signature of bio_end_io_t changed in the 4.4 kernel, and the error value is now assigned to the bi_error field of the bio struct, so just free the bio after bio completion. Change-Id: I08f64d8d51ae401fa608351b90b1120d8b84605f Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: don't allow nr_pending to go negative | Subhash Jadavani
nr_pending can go negative if we attempt to decrement it without a matching earlier increment. If nr_pending goes negative, the LLD's runtime suspend might race with an ongoing request. This change allows decrementing nr_pending only if it is non-zero. Change-Id: I5f1e93ab0e0f950307e2e3c4f95c7cb01e83ffdd Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
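The guard described in this commit can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct pm_counter` and `pm_put_request()` are hypothetical names introduced for illustration, and only the "decrement only if non-zero" rule is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue's pending-request counter. */
struct pm_counter {
	int nr_pending;
};

/*
 * Decrement nr_pending only when it is non-zero, so an unmatched
 * "put" can never drive the counter negative.
 * Returns true if the counter was decremented.
 */
static bool pm_put_request(struct pm_counter *c)
{
	if (c->nr_pending == 0)
		return false;	/* no matching increment: leave it alone */
	c->nr_pending--;
	return true;
}
```

Calling `pm_put_request()` on a counter that is already zero leaves it at zero, which is the property a suspend decision based on "nr_pending == 0" would rely on.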
2016-03-22 | block/fs: make tracking dirty task debug only | Venkat Gopalakrishnan
Adding a new element "tsk_dirty" to struct page increases the size of mem_map/vmemmap, so restrict this to debug-only functionality to save a few MB of memory. Consider a system with 1G of RAM: there will be nearly 262144 pages and thus that many page structures in mem_map/vmemmap. With a pointer size of 8 bytes on a 64-bit system, adding this pointer to "struct page" means an increase of "2MB" for mem_map. CRs-Fixed: 738692 Change-Id: Idf3217dcbe17cf1ab4d462d2aa8d39da1ffd8b13 Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org> [venkatg@codeaurora.org: Fixed trivial merge conflict] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
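The arithmetic in this commit message can be checked with a few lines of C. The sketch assumes a 4 KiB page size, which the message does not state but which is what yields 262144 pages for 1 GiB; `mem_map_growth_bytes()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extra bytes needed in mem_map when one pointer-sized field is
 * added to every struct page: one struct page per page of RAM.
 */
static uint64_t mem_map_growth_bytes(uint64_t ram_bytes,
				     uint64_t page_size,
				     uint64_t field_size)
{
	uint64_t nr_pages = ram_bytes / page_size;

	return nr_pages * field_size;
}
```

With 1 GiB of RAM, 4096-byte pages and an 8-byte pointer, this gives 262144 pages and 2097152 bytes, that is, 2 MiB.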
2016-03-22 | block/fs: keep track of the task that dirtied the page | Venkat Gopalakrishnan
Background writes happen in the context of a background thread. It is very useful to identify the actual task that generated the request instead of the background task that submitted the request. Hence keep track of the task when a page gets dirtied and dump this task info while tracing. Not all the pages in a bio are necessarily dirtied by the same task, but most likely they will be, since the sectors accessed on the device must be adjacent. Change-Id: I6afba85a2063dd3350a0141ba87cf8440ce9f777 Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org> [venkatg@codeaurora.org: Fixed trivial merge conflicts] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: test-iosched: fix spinlock recursion | Gilad Broner
spin_lock_irq() / spin_unlock_irq() is used so interrupts are enabled after unlocking the spinlock. However, it is not guaranteed they were enabled before. This change uses the proper irqsave / irqrestore variants instead. Without it, a spinlock recursion on the scsi request completion path is possible if completion interrupt occurs when used for UFS testing. Change-Id: I25a9bf6faaa2bbfedc807111fbcb32276cccea2f Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-22 | block: test-iosched: expose sector_range variable to user | Lee Susman
Expose "sector_range", which will indicate to the low-level driver unit-tests the size (in sectors, starting from "start_sector") of the address space in which they can perform I/O operations. This user-defined variable can be used to change the address space size from the default 512MiB. Change-Id: I515a6849eb39b78e653f4018993a2c8e64e2a77f Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2016-03-22 | block: test-iosched: fix bio allocation for test requests | Gilad Broner
Unit tests submit large requests of 512KB made of 128 bios. Current allocation was done via kmalloc which may not be able to allocate such a large buffer which is also physically contiguous. Using kmalloc to allocate each bio separately is also problematic as it might not be page aligned. Some bio may end up using more than a single memory segment which will fail the mapping of the bios to the request which supports up to 128 physical segments. To avoid such failure, allocate a separate page for each bio (bio size is single page size). Change-Id: Id0da394d458942e093d924bc7aa23aa3231cdca7 Signed-off-by: Gilad Broner <gbroner@codeaurora.org> [venkatg@codeaurora.org: Drop changes to mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
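Why page alignment matters here can be illustrated with a small user-space calculation. This is a sketch under stated assumptions: `pages_spanned()` is a hypothetical helper, the addresses are made up, and counting pages is used as an upper bound on segments (physically adjacent pages may still merge into one segment).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + len). */
static unsigned int pages_spanned(uintptr_t addr, size_t len,
				  size_t page_size)
{
	if (len == 0)
		return 0;
	return (unsigned int)((addr + len - 1) / page_size
			      - addr / page_size + 1);
}
```

A page-sized buffer that starts on a page boundary touches exactly one page, while the same buffer at an unaligned address touches two. With 128 bios per request and a limit of 128 physical segments, a single unaligned bio buffer can be enough to exceed the limit, which is why the commit allocates one whole page per bio.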
2016-03-22 | block: test-iosched: enable running of simultaneous tests | Gilad Broner
Current test-iosched design enables running only a single test for a single block device. This change modifies the test-iosched framework to allow running several tests on several block devices. Change-Id: I051d842733873488b64e89053d9c4e30e1249870 Signed-off-by: Gilad Broner <gbroner@codeaurora.org> [merez@codeaurora.org: fix conflicts due to removal of BKOPs UT] Signed-off-by: Maya Erez <merez@codeaurora.org> Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org> [venkatg@codeaurora.org: Drop changes to ufs_test.c and mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: test-iosched: fix spinlock recursion | Gilad Broner
blk_run_queue() takes the queue spinlock and disables irqs. Consider the following callstack: blk_run_queue -> __blk_run_queue -> scsi_request_fn -> blk_peek_request -> __elv_next_request -> elevator_dispatch_fn -> test_dispatch_requests -> test_dispatch_from. test_dispatch_from() will release the test-iosched spinlock using spin_unlock_irq, which will enable interrupts; however, the caller assumes interrupts are disabled. An interrupt can now occur, and the scsi soft-irq may be scheduled with the following call stack: scsi_softirq_done -> scsi_finish_command -> scsi_device_unbusy. scsi_device_unbusy() tries to lock the queue spinlock, which was already locked when blk_run_queue was called, resulting in a spinlock recursion. Change test_dispatch_from() to use the spinlock irq save/restore variants to avoid enabling irqs in case they were previously disabled. Change-Id: Icaea4f9ba54771edb0302c6005047fcc5478ce8d Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
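The difference between the two lock variants can be modelled in user space. The sketch below is not kernel code: the interrupt state is a plain boolean, no real locking takes place, and every `model_*` and `inner_section_*` name is hypothetical. It only shows what state the caller observes after a nested critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Modelled CPU interrupt state; true means interrupts are enabled. */
static bool irqs_enabled = true;

/* Model of spin_lock_irq()/spin_unlock_irq(): unconditional off/on. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Model of spin_lock_irqsave(): remember the previous state. */
static unsigned long model_lock_irqsave(void)
{
	unsigned long flags = irqs_enabled ? 1UL : 0UL;

	irqs_enabled = false;
	return flags;
}

/* Model of spin_unlock_irqrestore(): put the previous state back. */
static void model_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Nested section using the _irq variants; returns the state left behind. */
static bool inner_section_irq(void)
{
	model_lock_irq();
	model_unlock_irq();
	return irqs_enabled;
}

/* Nested section using the irqsave/irqrestore variants. */
static bool inner_section_irqsave(void)
{
	unsigned long flags = model_lock_irqsave();

	model_unlock_irqrestore(flags);
	return irqs_enabled;
}
```

If the caller entered with interrupts disabled, `inner_section_irq()` returns with them enabled, which is the window in which the completion interrupt can re-take the queue lock. `inner_section_irqsave()` leaves the caller's state untouched in both cases.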
2016-03-22 | block: test-iosched: remove test timeout timer | Gilad Broner
When running a test, a timer was set to detect test timeout and to unblock the wait_event() function which is waiting for the test to finish. This is redundant, as the wait_event timeout variant gives the same functionality without the overhead of managing a timer for this purpose, and removing the timer improves code readability. Change-Id: Icbd3cb0f3fcb5854673f4506b102b0c80e97d6bb Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-22 | block: test-iosched: expose APIs to allow compiling ufs_test as a module | Maya Erez
The UFS tests are used for testing the functionality and performance of the UFS driver. Some of the tests call compare_buffer_to_pattern for data integrity checking. This function should be exposed in order to allow compilation of ufs_test as a module. Change-Id: I2279b0ae9dbdf4ecad073fab2b15116be2ea1713 Signed-off-by: Gilad Broner <gbroner@codeaurora.org> Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 | scsi: ufs: mixed long sequential | Dolev Raviv
The test will verify the correctness of a sequential data pattern written to the device while new data (with the same pattern) is written simultaneously. First, this test will run a long sequential write scenario. This first stage will write the pattern that will be read later. Second, sequential read requests will read and compare the same data. The second-stage reads will be issued in parallel with write requests to the same LBA and of the same size. NOTE: The test requires a long timeout. The purpose of this test is to mix read and write requests on the same LBA while checking for the read data correctness. Change-Id: I6a437ce689b66233af3055d07a7f62f1e7b40765 Signed-off-by: Dolev Raviv <draviv@codeaurora.org> [venkatg@codeaurora.org: Changes to ufs_test.c are already present as part of earlier commit, hence drop them here] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | scsi: ufs: add support for test specific completion check | Dolev Raviv
Introduce a new callback 'check_test_completion_fn' to the test-iosched framework. This callback is necessary to determine whether a test has completed in situations where the request queue is empty but the test has not completed. Change-Id: I60bd8cccffacab11a5a7cba78caccf53fea3e1d8 Signed-off-by: Dolev Raviv <draviv@codeaurora.org> [venkatg@codeaurora.org: Changes to ufs_test.c are already present as part of earlier commit, hence drop them here] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | scsi: ufs: long sequential read/write tests | Lee Susman
This test adds the ability to test the UFS task management feature in the driver. It loads the queue with requests in order to allow the task management to operate in full capacity. Modify test-iosched infrastructure to support the new tests: - expose check_test_completion() Note: we submit 16-bio requests since the current HW is very slow and we don't want to exceed the timeout duration. Change-Id: I8ee752cba3c6838d8edc05747fa0288c4b347ef6 Signed-off-by: Dolev Raviv <draviv@codeaurora.org> Signed-off-by: Lee Susman <lsusman@codeaurora.org> [merez@codeaurora.org: fix trivial conflicts in ufs_test.c] Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Changes to ufs_test.c are already present as part of earlier commit, hence drop them here] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: test-iosched: fix test-iosched compilation | Maya Erez
Fix test-iosched compilation issues due to differences in data structures in kernel-3.14. Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 | block: add test bio size define to test-iosched | Lee Susman
Add a define for the test bio size (which is the size of a page); this is used for allocating the right-sized buffer for the bio during test request creation. Change-Id: I9505c85c4352009bdee442172eb8ae8f4254cfb0 Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2016-03-22 | mmc: Unit test fix for logging | Maya Erez
Update logging with: - prefix with module name - add '\n' in the end - test_pr_* removed Change-Id: I465c9809def9d294dcbb3f7cf7f474c189f5fdbf Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org> [merez@codeaurora.org: fix conflicts due to removal of bkops tests] Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Drop changes to mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: test-iosched: disable statistic flag on request | Dolev Raviv
The flag REQ_IO_STAT is enabled by default. This assumes statistics are initialized and might cause NULL references in the kernel. To avoid this, the flag is cleared in the request and stats are not updated. Change-Id: I6a1890dde51dfa8ffdd376b13f4466c9db0ae05b Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-03-22 | mmc: card: change long_sequential_test time measurements to ktime | Lee Susman
Change time measurements in long_sequential_test from jiffies to ktime, and make the relevant change in test-iosched infrastructure. In long_sequential_test we measure throughput, and the jiffies resolution is not sensitive enough for this calculation. Change-Id: If7c9a03c687f61996609c014e056bcd7132b9012 Signed-off-by: Lee Susman <lsusman@codeaurora.org> [venkatg@codeaurora.org: Drop changes to mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
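The resolution problem can be made concrete with a short calculation. This is an illustrative sketch only: the tick rate of 100 Hz, the 3 ms duration and the helper names are assumptions, not values taken from the test code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

/* Whole timer ticks that fit in elapsed_ns at a tick rate of hz. */
static uint64_t elapsed_ticks(uint64_t elapsed_ns, uint64_t hz)
{
	return elapsed_ns * hz / MODEL_NSEC_PER_SEC;
}

/* Throughput in bytes per second from a nanosecond-resolution duration. */
static uint64_t throughput_bytes_per_sec(uint64_t bytes, uint64_t elapsed_ns)
{
	if (elapsed_ns == 0)
		return 0;	/* duration too short to measure */
	return bytes * MODEL_NSEC_PER_SEC / elapsed_ns;
}
```

A 512 KiB transfer that takes 3 ms counts as zero ticks at 100 Hz, so no throughput can be derived from a tick count, whereas the nanosecond duration still yields a usable figure of roughly 175 MB/s.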
2016-03-22 | block: fix test crashing due to synchronization issue | Dolev Raviv
The __blk_run_queue function is called from several contexts. The fix replaces it with the blk_run_queue function, which is guarded with a lock, making it thread safe and preventing the crash. Change-Id: I3e12fa9c8b9e161375fffa3570abfa46b223a60b Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-03-22 | mmc: enhance long_sequential_test for higher throughput | Lee Susman
Change the test design so that requests are dynamically created and freed. This enables running tests with more than 128 requests, therefore more than 50MiB can be written/read and makes it possible to measure driver write/read throughput more accurately. Change-Id: I56c9d6c1afba5c91a0621a16d97feafd4689521d Signed-off-by: Lee Susman <lsusman@codeaurora.org> [merez@codeaurora.org: fix conflicts due to BKOPS tests removal] Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Drop changes to mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: test-iosched: Add support for setting rq_disk | Dolev Raviv
Some block devices require the rq_disk field to be assigned. This patch exposes a new API to the block device test utility for getting rq_disk assigned in the created request. Change-Id: I61dc4dad50eb7600728156a6cd08bb1ee134df0d Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-03-22 | mmc: new request notification unit-test | Lee Susman
The new request notification test checks the following scenario: A new request arrives after a NULL request was sent to the mmc_queue, which is waiting for completion of a former request. Change-Id: I05db0959ded400e292eb5e84e1ecfc579b78ee62 Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org> Signed-off-by: Lee Susman <lsusman@codeaurora.org> [merez@codeaurora.org: fixed conflicts due to removal of BKOPS tests] Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Drop changes to mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: test-iosched infrastructure enhancement | Lee Susman
Add functionality to test-iosched so that it can simulate the ROW scheduler behaviour. The main additions are: - 3 distinct request queues with counters - support for urgent request pending - reinsert request implementation (callback + dispatch behavior) Change-Id: I83b5d9e3d2b8cd9a2353afa6a3e6a4cbc83b0cd4 Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org> Signed-off-by: Lee Susman <lsusman@codeaurora.org> [merez@codeaurora.org: fixed conflicts due to bkops tests removal] Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Dropping elevator is_urgent_fn and reinsert_req_fn ops fn as they are not present in 3.18 kernel] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | mmc: Enable eMMC unit-tests | Maya Erez
Enable the compilation of eMMC4.5 unit-tests, required by APT team. This will allow the APT team to test the storage activity on released builds. The storage tests are disabled in normal operation and in order to activate them a test I/O scheduler should be chosen and the test should be triggered via debugfs. Therefore they have no effect on normal eMMC driver operation. Change-Id: I179c567f67cc8fab9ed1edab8246483de18bc76a Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Fixed trivial merge conflict] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | mmc: improve mmc_block_test printouts | Lee Susman
Change the printout format to be more readable. Specifically, add quotes around the test case name strings. Change-Id: I51b0c1b94389e4b51af84c5e993207b18efc2226 Signed-off-by: Lee Susman <lsusman@codeaurora.org> [merez@codeaurora.org: fix conflicts as BKOPS tests were removed] Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Drop changes to mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | mmc: card: Add long sequential read test to test-iosched | Lee Susman
Long sequential read test measures read throughput at the driver level by reading large requests sequentially. Change-Id: I3b6d685930e1d0faceabbc7d20489111734cc9d4 Signed-off-by: Lee Susman <lsusman@codeaurora.org> [merez@codeaurora.org: Fix conflicts as BKOPS tests were removed] Signed-off-by: Maya Erez <merez@codeaurora.org> [venkatg@codeaurora.org: Drop changes to mmc_block_test.c] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 | block: test-iosched: Sleep before each test | Tatyana Brokhman
In order to be sure that the packing statistics collected after the test reflect *only* requests issued by the test (and not real requests from the FS), sleep before each test in order to give already dispatched requests time to complete. Change-Id: If2f40efad1d79084a8ea85afe93cce58e49ff698 CRs-Fixed: 453712 Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
2016-03-22 | block: test-iosched error handling fixes | Maya Erez
- Fix test-iosched crash when running multiple tests - Free the BIOs memory when a request is not completed Change-Id: I1baa916c04ae73c809dee7e67ec63f4546dc71aa Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 | block: Add test-iosched scheduler | Maya Erez
The test scheduler allows testing a block device by dispatching specific requests according to the test case and declaring PASS/FAIL according to the requests' completion error codes. Change-Id: Ief91f9fed6e3c3c75627d27264d5252ea14f10ad Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 | block: blk-flush: Add support for Barrier flag | Dolev Raviv
A barrier request is used to control ordering of write requests without clearing the device's cache. LLD support for barrier is optional. If the LLD doesn't support barrier, a flush will be issued instead to ensure logical correctness. To maintain this fallback, a flush s/w path and flags are appended. This patch implements the necessary request marking in order to support the barrier feature in the block layer. It implements two major changes required for barrier support. (1) A new flush execution policy is added to support "ordered" requests, with a fallback in case barrier is not supported by the LLD. (2) If there is a flush pending in the flush-queue, the received barrier is ignored, in order not to miss a demand for an actual flush. Change-Id: I6072d759e5c3bd983105852d81732e949da3d448 Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-02-16 | block: genhd: Add disk/partition specific uevent callbacks for partition info | San Mehat
For disk devices, a new uevent parameter 'NPARTS' specifies the number of partitions detected by the kernel. Partition devices get 'PARTN', which specifies the partition's index in the table, and 'PARTNAME', which specifies the partition name of a partition device. Signed-off-by: Dima Zavin <dima@android.com>
2016-01-08 | Revert "block: Split bios on chunk boundaries" | Jens Axboe
This reverts commit d3805611130af9b911e908af9f67a3f64f4f0914. If we end up splitting on the first segment, we don't adjust the sector count. That results in hitting a BUG() when attempting to split 0 sectors. As this is just a performance issue and not a regression since the 4.3 release, let's just revert this change. That gives us more time to test a real fix for 4.5, which would be marked for stable anyway.
2015-12-28 | block: add blk_start_queue_async() | Jens Axboe
We currently only have an inline/sync helper to restart a stopped queue. If drivers need an async version, they have to roll their own. Add a generic helper instead. Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 | block: Split bios on chunk boundaries | Keith Busch
For h/w that advertises its block storage's underlying chunk size, it's a big performance win to not submit commands that cross chunk boundaries. This patch uses that criterion if it is provided. If it is not provided, this patch uses the max sectors as before. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
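The splitting rule described here can be sketched as a small user-space function. It is an illustration rather than the patch itself: `max_sectors_at()` is a hypothetical name, a chunk size of 0 is assumed to mean "not advertised", and capping the result at the queue's max sectors is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest number of sectors an I/O starting at sector 'offset' may
 * cover without crossing a chunk boundary, capped at max_sectors.
 */
static unsigned int max_sectors_at(uint64_t offset,
				   unsigned int chunk_sectors,
				   unsigned int max_sectors)
{
	unsigned int to_boundary;

	if (chunk_sectors == 0)
		return max_sectors;	/* no chunk size advertised */

	/* Sectors left before the next chunk boundary. */
	to_boundary = chunk_sectors - (unsigned int)(offset % chunk_sectors);
	return to_boundary < max_sectors ? to_boundary : max_sectors;
}
```

For a 128-sector chunk, an I/O starting at sector 100 is limited to 28 sectors, while one starting exactly on a boundary may cover the full chunk. A bio longer than the returned value would be split at that point.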
2015-12-22 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block layer fixes from Jens Axboe: "Three small fixes for 4.4 final. Specifically: - The segment issue fix from Junichi, where the old IO path does a bio limit split before potentially bouncing the pages. We need to do that in the right order, to ensure that limitations are met. - A NVMe surprise removal IO hang fix from Keith. - A use-after-free in null_blk, introduced by a previous patch in this series. From Mike Krinkin" * 'for-linus' of git://git.kernel.dk/linux-block: null_blk: fix use-after-free error block: ensure to split after potentially bouncing a bio NVMe: IO ending fixes on surprise removal
2015-12-22 | block: ensure to split after potentially bouncing a bio | Junichi Nomura
blk_queue_bio() does split then bounce, which makes the segment counting based on pages before bouncing and could go wrong. Move the split to after bouncing, like we do for blk-mq, and then we fix the issue of the bio's segment count being wrong. Fixes: 54efd50bfd87 ("block: make generic_make_request handle arbitrarily sized bios") Cc: stable@vger.kernel.org Tested-by: Artem S. Tashkinov <t.artem@lycos.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-12 | Merge branch 'for-linus' of git://git.kernel.dk/linux-block | Linus Torvalds
Pull block layer fixes from Jens Axboe: "A set of fixes for the current series. This contains: - A bunch of fixes for lightnvm, should be the last round for this series. From Matias and Wenwei. - A writeback detach inode fix from Ilya, also marked for stable. - A block (though it says SCSI) fix for an OOPS in SCSI runtime power management. - Module init error path fixes for null_blk from Minfei" * 'for-linus' of git://git.kernel.dk/linux-block: null_blk: Fix error path in module initialization lightnvm: do not compile in debugging by default lightnvm: prevent gennvm module unload on use lightnvm: fix media mgr registration lightnvm: replace req queue with nvmdev for lld lightnvm: comments on constants lightnvm: check mm before use lightnvm: refactor spin_unlock in gennvm_get_blk lightnvm: put blks when luns configure failed lightnvm: use flags in rrpc_get_blk block: detach bdev inode from its wb in __blkdev_put() SCSI: Fix NULL pointer dereference in runtime PM
2015-12-07 | Merge branch 'master' into for-4.4-fixes | Tejun Heo
The following commit which went into mainline through networking tree 3b13758f51de ("cgroups: Allow dynamically changing net_classid") conflicts in net/core/netclassid_cgroup.c with the following pending fix in cgroup/for-4.4-fixes. 1f7dd3e5a6e4 ("cgroup: fix handling of multi-destination migration from subtree_control enabling") The former separates out update_classid() from cgrp_attach() and updates it to walk all fds of all tasks in the target css so that it can be used from both migration and config change paths. The latter drops @css from cgrp_attach(). Resolve the conflict by making cgrp_attach() call update_classid() with the css from the first task. We can revive @tset walking in cgrp_attach() but given that net_cls is v1 only where there always is only one target css during migration, this is fine. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Nina Schiff <ninasc@fb.com>
2015-12-06 | Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi | Linus Torvalds
Pull SCSI fixes from James Bottomley: "This is quite a bumper crop of fixes: three from Arnd correcting various build issues in some configurations, a lock recursion in qla2xxx. Two potentially exploitable issues in hpsa and mvsas, a potential null deref in st, a revert of a bdi registration fix that turned out to cause even more problems, a set of fixes to allow people who only defined MPT2SAS to still work after the mpt2/mpt3sas merger and a couple of fixes for issues turned up by the hyper-v storvsc driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: mpt3sas: fix Kconfig dependency problem for mpt2sas back compatibility Revert "scsi: Fix a bdi reregistration race" mpt3sas: Add dummy Kconfig option for backwards compatibility Fix a memory leak in scsi_host_dev_release() block/sd: Fix device-imposed transfer length limits scsi_debug: fix prevent_allow+verify regressions MAINTAINERS: Add myself as co-maintainer of the SCSI subsystem. sd: Make discard granularity match logical block size when LBPRZ=1 scsi: hpsa: select CONFIG_SCSI_SAS_ATTR scsi: advansys needs ISA dma api for ISA support scsi_sysfs: protect against double execution of __scsi_remove_device() st: fix potential null pointer dereference. scsi: report 'INQUIRY result too short' once per host advansys: fix big-endian builds qla2xxx: Fix rwlock recursion hpsa: logical vs bitwise AND typo mvsas: don't allow negative timeouts mpt3sas: Fix use sas_is_tlr_enabled API before enabling MPI2_SCSIIO_CONTROL_TLR_ON flag
2015-12-03 | SCSI: Fix NULL pointer dereference in runtime PM | Ken Xue
The routines in scsi_pm.c assume that if a runtime-PM callback is invoked for a SCSI device, it can only mean that the device's driver has asked the block layer to handle the runtime power management (by calling blk_pm_runtime_init(), which among other things sets q->dev). However, this assumption turns out to be wrong for things like the ses driver. Normally ses devices are not allowed to do runtime PM, but userspace can override this setting. If this happens, the kernel gets a NULL pointer dereference when blk_post_runtime_resume() tries to use the uninitialized q->dev pointer. This patch fixes the problem by checking q->dev in the block layer before handling runtime PM. Since ses doesn't define any PM callbacks or call blk_pm_runtime_init(), the crash won't occur. This fixes Bugzilla #101371. https://bugzilla.kernel.org/show_bug.cgi?id=101371 More discussion can be found at the link below. http://marc.info/?l=linux-scsi&m=144163730531875&w=2 Signed-off-by: Ken Xue <Ken.Xue@amd.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Cc: Xiangliang Yu <Xiangliang.Yu@amd.com> Cc: James E.J. Bottomley <JBottomley@odin.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Michael Terry <Michael.terry@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-03 | Merge branch 'mkp-fixes' into fixes | James Bottomley
2015-12-03 | cgroup: fix handling of multi-destination migration from subtree_control enabling | Tejun Heo
Consider the following v2 hierarchy. P0 (+memory) --- P1 (-memory) --- A \- B P0 has memory enabled in its subtree_control while P1 doesn't. If both A and B contain processes, they would belong to the memory css of P1. Now if memory is enabled on P1's subtree_control, memory csses should be created on both A and B and A's processes should be moved to the former and B's processes the latter. IOW, enabling controllers can cause atomic migrations into different csses. The core cgroup migration logic has been updated accordingly but the controller migration methods haven't and still assume that all tasks migrate to a single target css; furthermore, the methods were fed the css in which subtree_control was updated which is the parent of the target csses. pids controller depends on the migration methods to move charges and this made the controller attribute charges to the wrong csses often triggering the following warning by driving a counter negative. WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40() Modules linked in: CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29 ... 
ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000 ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00 ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8 Call Trace: [<ffffffff81551ffc>] dump_stack+0x4e/0x82 [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0 [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40 [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0 [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330 [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190 [<ffffffff81189016>] cgroup_attach_task+0x176/0x200 [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460 [<ffffffff81189684>] cgroup_procs_write+0x14/0x20 [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0 [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190 [<ffffffff81265f88>] __vfs_write+0x28/0xe0 [<ffffffff812666fc>] vfs_write+0xac/0x1a0 [<ffffffff81267019>] SyS_write+0x49/0xb0 [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76 This patch fixes the bug by removing @css parameter from the three migration methods, ->can_attach, ->cancel_attach() and ->attach() and updating cgroup_taskset iteration helpers to also return the destination css in addition to the task being migrated. All controllers are updated accordingly. * Controllers which don't care whether there are one or multiple target csses can be converted trivially. cpu, io, freezer, perf, netclassid and netprio fall in this category. * cpuset's current implementation assumes that there's single source and destination and thus doesn't support v2 hierarchy already. The only change made by this patchset is how that single destination css is obtained. * memory migration path already doesn't do anything on v2. How the single destination css is obtained is updated and the prep stage of mem_cgroup_can_attach() is reordered to accommodate the change. * pids is the only controller which was affected by this bug. 
It now correctly handles multi-destination migrations and no longer causes counter underflow from incorrect accounting. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-11-30 | blk-merge: fix computing bio->bi_seg_front_size in case of single segment | Ming Lei
When a bio has only one physical segment, we should set the bio's bi_seg_front_size to the real (final) size of the single segment. Fixes: 02e707424c2ea(blk-merge: fix blk_bio_segment_split) Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-29 | block: Always check queue limits for cloned requests | Hannes Reinecke
When a cloned request is retried on other queues it always needs to be checked against the queue limits of that queue. Otherwise the calculations for nr_phys_segments might be wrong, leading to a crash in scsi_init_sgtable(). To clarify this the patch renames blk_rq_check_limits() to blk_cloned_rq_check_limits() and removes the symbol export, as the new function should only be used for cloned requests and never exported. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Ewan Milne <emilne@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Fixes: e2a60da74 ("block: Clean up special command handling logic") Cc: stable@vger.kernel.org # 3.7+ Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25 | Return EBUSY from BLKRRPART for mounted whole-dev fs | Eric Sandeen
Today, blockdev --rereadpt /dev/sda will fail with EBUSY if any partition of sda is mounted (and will fail with EINVAL if pointed at a partition). But it will pass if the entire block device is formatted with a filesystem and mounted. I don't think this makes sense; partitioning should surely not ever change out from under a mounted device. So check for bdev->bd_super, and fail that with -EBUSY as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25 | block/sd: Fix device-imposed transfer length limits | Martin K. Petersen
Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests") had the unfortunate side-effect of removing an implicit clamp to BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer code. This caused problems for some SMR drives. Debugging this issue revealed a few problems with the existing infrastructure since the block layer didn't know how to deal with device-imposed limits, only limits set by the I/O controller. - Introduce a new queue limit, max_dev_sectors, which is used by the ULD to signal the maximum sectors for a REQ_TYPE_FS request. - Ensure that max_dev_sectors is correctly stacked and taken into account when overriding max_sectors through sysfs. - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer values for later processing. - In sd_revalidate() set the queue's max_dev_sectors based on the MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value is not reported, fall back to a cap based on the CDB TRANSFER LENGTH field size. - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits VPD--if reported and sane--to signal the preferred device transfer size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS. - blk_limits_max_hw_sectors() is no longer used and can be removed. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581 Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: sweeneygj@gmx.com Tested-by: Arzeets <anatol.pomozov@gmail.com> Tested-by: David Eisner <david.eisner@oriel.oxon.org> Tested-by: Mario Kicherer <dev@kicherer.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
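How a device-imposed limit might interact with a user override of max_sectors can be sketched in user space. This is a model under stated assumptions, not the kernel implementation: `struct model_limits` and both functions are hypothetical, a value of 0 is assumed to mean "limit not reported", and rejecting an oversized request with -EINVAL (rather than clamping it) is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of queue limits, all in 512-byte sectors. */
struct model_limits {
	unsigned int max_hw_sectors;	/* limit set by the I/O controller */
	unsigned int max_dev_sectors;	/* limit reported by the device */
	unsigned int max_sectors;	/* current soft limit for FS requests */
};

/* Smaller of two limits, treating 0 as "not set". */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Apply a user-requested max_sectors, honouring both hard limits. */
static int model_set_max_sectors(struct model_limits *lim,
				 unsigned int requested)
{
	unsigned int cap = min_not_zero_u(lim->max_hw_sectors,
					  lim->max_dev_sectors);

	if (requested == 0 || requested > cap)
		return -EINVAL;
	lim->max_sectors = requested;
	return 0;
}
```

With a controller limit of 2048 sectors and a device limit of 1024, a request for 1024 is accepted and a request for 2048 is refused. If the device reports no limit, only the controller limit applies.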
2015-11-25Revert "blk-flush: Queue through IO scheduler when flush not required"Jens Axboe
This reverts commit 1b2ff19e6a957b1ef0f365ad331b608af80e932e.
Jan writes:
--
Thanks for report! After some investigation I found out we allocate elevator specific data in __get_request() only for non-flush requests. And this is actually required since the flush machinery uses the space in struct request for something else. Doh. So my patch is just wrong and not easy to fix since at the time __get_request() is called we are not sure whether the flush machinery will be used in the end. Jens, please revert 1b2ff19e6a957b1ef0f365ad331b608af80e932e. Thanks!
I'm somewhat surprised that you can reliably hit the race where flushing gets disabled for the device just while the request is in flight. But I guess during boot it makes some sense.
--
So let's just revert it; we can fix the queue run manually after the fact. This race is rare enough that it didn't trigger in testing, since it requires the specific disable-while-in-flight scenario.
2015-11-24block: fix blk_abort_request for blk-mq driversChristoph Hellwig
We only added the request to the request list for the !blk-mq case, so we should only delete it in that case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe:
"A round of fixes/updates for the current series. This looks a little bigger than it is, but that's mainly because we pushed the lightnvm enabled null_blk change out of the merge window so it could be updated a bit. The rest of the volume is also mostly lightnvm. In particular:
 - Lightnvm. Various fixes, additions, updates from Matias and Javier, as well as from Wenwei Tao.
 - NVMe:
     - Fix for potential arithmetic overflow from Keith.
     - Also from Keith, ensure that we reap pending completions from a completion queue before deleting it. Fixes kernel crashes when resetting a device with IO pending.
     - Various little lightnvm related tweaks from Matias.
 - Fixup flushes to go through the IO scheduler, for the cases where a flush is not required. Fixes a case in CFQ where we would be idling and not see this request, hence not break the idling. From Jan Kara.
 - Use list_{first,prev,next} in elevator.c for cleaner code. From Geliang Tang.
 - Fix for a warning trigger on btrfs and raid on single queue blk-mq devices, where we would flush plug callbacks with preemption disabled. From me.
 - A mac partition validation fix from Kees Cook.
 - Two merge fixes from Ming, marked stable. A third part is adding a new warning so we'll notice this quicker in the future, if we screw up the accounting.
 - Cleanup of thread name/creation in mtip32xx from Rasmus Villemoes"
* 'for-linus' of git://git.kernel.dk/linux-block: (32 commits)
  blk-merge: warn if figured out segment number is bigger than nr_phys_segments
  blk-merge: fix blk_bio_segment_split
  block: fix segment split
  blk-mq: fix calling unplug callbacks with preempt disabled
  mac: validate mac_partition is within sector
  mtip32xx: use formatting capability of kthread_create_on_node
  NVMe: reap completion entries when deleting queue
  lightnvm: add free and bad lun info to show luns
  lightnvm: keep track of block counts
  nvme: lightnvm: use admin queues for admin cmds
  lightnvm: missing free on init error
  lightnvm: wrong return value and redundant free
  null_blk: do not del gendisk with lightnvm
  null_blk: use device addressing mode
  null_blk: use ppa_cache pool
  NVMe: Fix possible arithmetic overflow for max segments
  blk-flush: Queue through IO scheduler when flush not required
  null_blk: register as a LightNVM device
  elevator: use list_{first,prev,next}_entry
  lightnvm: cleanup queue before target removal
  ...