summaryrefslogtreecommitdiff
path: root/drivers/md
AgeCommit message (Collapse)Author
2007-02-09[PATCH] md: avoid possible BUG_ON in md bitmap handlingNeil Brown
md/bitmap tracks how many active write requests are pending on blocks associated with each bit in the bitmap, so that it knows when it can clear the bit (when count hits zero). The counter has 14 bits of space, so if there are ever more than 16383, we cannot cope. Currently the code just calles BUG_ON as "all" drivers have request queue limits much smaller than this. However is seems that some don't. Apparently some multipath configurations can allow more than 16383 concurrent write requests. So, in this unlikely situation, instead of calling BUG_ON we now wait for the count to drop down a bit. This requires a new wait_queue_head, some waiting code, and a wakeup call. Tested by limiting the counter to 20 instead of 16383 (writes go a lot slower in that case...). Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-09[PATCH] md: fix various bugs with aligned reads in RAID5Neil Brown
It is possible for raid5 to be sent a bio that is too big for an underlying device. So if it is a READ that we pass stright down to a device, it will fail and confuse RAID5. So in 'chunk_aligned_read' we check that the bio fits within the parameters for the target device and if it doesn't fit, fall back on reading through the stripe cache and making lots of one-page requests. Note that this is the earliest time we can check against the device because earlier we don't have a lock on the device, so it could change underneath us. Also, the code for handling a retry through the cache when a read fails has not been tested and was badly broken. This patch fixes that code. Signed-off-by: Neil Brown <neilb@suse.de> Cc: "Kai" <epimetreus@fastmail.fm> Cc: <stable@suse.de> Cc: <org@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] md: remove unnecessary printk when raid5 gets an unaligned read.NeilBrown
raid5_mergeable_bvec tries to ensure that raid5 never sees a read request that does not fit within just one chunk. However as we must always accept a single-page read, that is not always possible. So when "in_chunk_boundary" fails, it might be unusual, but it is not a problem and printing a message every time is a bad idea. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] md: fix potential memalloc deadlock in mdNeilBrown
If a GFP_KERNEL allocation is attempted in md while the mddev_lock is held, it is possible for a deadlock to eventuate. This happens if the array was marked 'clean', and the memalloc triggers a write-out to the md device. For the writeout to succeed, the array must be marked 'dirty', and that requires getting the mddev_lock. So, before attempting a GFP_KERNEL allocation while holding the lock, make sure the array is marked 'dirty' (unless it is currently read-only). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] dm-multipath: fix stall on noflush suspend/resumeJun'ichi Nomura
Allow noflush suspend/resume of device-mapper device only for the case where the device size is unchanged. Otherwise, dm-multipath devices can stall when resumed if noflush was used when suspending them, all paths have failed and queue_if_no_path is set. Explanation: 1. Something is doing fsync() on the block dev, holding inode->i_sem 2. The fsync write is blocked by all-paths-down and queue_if_no_path 3. Someone requests to suspend the dm device with noflush. Pending writes are left in queue. 4. In the middle of dm_resume(), __bind() tries to get inode->i_sem to do __set_size() and waits forever. 'noflush suspend' is a new device-mapper feature introduced in early 2.6.20. So I hope the fix being included before 2.6.20 is released. Example of reproducer: 1. Create a multipath device by dmsetup 2. Fail all paths during mkfs 3. Do dmsetup suspend --noflush and load new map with healthy paths 4. Do dmsetup resume Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] md: avoid reading past the end of a bitmap fileNeilBrown
In most cases we check the size of the bitmap file before reading data from it. However when reading the superblock, we always read the first PAGE_SIZE bytes, which might not always be appropriate. So limit that read to the size of the file if appropriate. Also, we get the count of available bytes wrong in one place, so that too can read past the end of the file. Cc: "yang yin" <yinyang801120@gmail.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] md: make sure the events count in an md array never returns to zeroNeilBrown
Now that we sometimes step the array events count backwards (when transitioning dirty->clean where nothing else interesting has happened - so that we don't need to write to spares all the time), it is possible for the event count to return to zero, which is potentially confusing and triggers and MD_BUG. We could possibly remove the MD_BUG, but is just as easy, and probably safer, to make sure we never return to zero. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-26[PATCH] md: make 'repair' actually work for raid1NeilBrown
When 'repair' finds a block that is different one the various parts of the mirror. it is meant to write a chosen good version to the others. However it currently writes out the original data to each. The memcpy to make all the data the same is missing. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-11[PATCH] md: pass down BIO_RW_SYNC in raid{1,10}Lars Ellenberg
md raidX make_request functions strip off the BIO_RW_SYNC flag, thus introducing additional latency. Fixing this in raid1 and raid10 seems to be straightforward enough. For our particular usage case in DRBD, passing this flag improved some initialization time from ~5 minutes to ~5 seconds. Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Lars Ellenberg <lars@linbit.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-22[PATCH] md: fix a few problems with the interface (sysfs and ioctl) to mdNeilBrown
While developing more functionality in mdadm I found some bugs in md... - When we remove a device from an inactive array (write 'remove' to the 'state' sysfs file - see 'state_store') would should not update the superblock information - as we may not have read and processed it all properly yet. - initialise all raid_disk entries to '-1' else the 'slot sysfs file will claim '0' for all devices in an array before the array is started. - all '\n' not to be present at the end of words written to sysfs files - when we use SET_ARRAY_INFO to set the md metadata version, set the flag to say that there is persistant metadata. - allow GET_BITMAP_FILE to be called on an array that hasn't been started yet. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-13[PATCH] md: Don't assume that READ==0 and WRITE==1 - use the names explicitlyNeilBrown
Thanks Jens for alerting me to this. Cc: Jens Axboe <jens.axboe@oracle.com> Cc: <raziebe@gmail.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[CRYPTO] dm-crypt: Select CRYPTO_CBCHerbert Xu
As CBC is the default chaining method for cryptoloop, we should select it from cryptoloop to ease the transition. Spotted by Rene Herman. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: assorted md and raid1 one-linersNeilBrown
Fix few bugs that meant that: - superblocks weren't alway written at exactly the right time (this could show up if the array was not written to - writting to the array causes lots of superblock updates and so hides these errors). - restarting device recovery after a clean shutdown (version-1 metadata only) didn't work as intended (or at all). 1/ Ensure superblock is updated when a new device is added. 2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync. The body of this if takes one of two branches depending on whether MD_RECOVERY_SYNC is set, so testing it in the clause of the if is wrong. 3/ Flag superblock for updating after a resync/recovery finishes. 4/ If we find the neeed to restart a recovery in the middle (version-1 metadata only) make sure a full recovery (not just as guided by bitmaps) does get done. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: return a non-zero error to bi_end_io as appropriate in raid5NeilBrown
Currently raid5 depends on clearing the BIO_UPTODATE flag to signal an error to higher levels. While this should be sufficient, it is safer to explicitly set the error code as well - less room for confusion. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: remove some old ifdefed-out code from raid5.cNeilBrown
There are some vestiges of old code that was used for bypassing the stripe cache on reads in raid5.c. This was never updated after the change from buffer_heads to bios, but was left as a reminder. That functionality has nowe been implemented in a completely different way, so the old code can go. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] MD: conditionalize some codeJeff Garzik
The autorun code is only used if this module is built into the static kernel image. Adjust #ifdefs accordingly. Signed-off-by: Jeff Garzik <jeff@garzik.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: fix innocuous bug in raid6 stripe_to_pdidxNeilBrown
stripe_to_pdidx finds the index of the parity disk for a given stripe. It assumes raid5 in that it uses "disks-1" to determine the number of data disks. This is incorrect for raid6 but fortunately the two usages cancel each other out. The only way that 'data_disks' affects the calculation of pd_idx in raid5_compute_sector is when it is divided into the sector number. But as that sector number is calculated by multiplying in the wrong value of 'data_disks' the division produces the right value. So it is innocuous but needs to be fixed. Also change the calculation of raid_disks in compute_blocknr to make it more obviously correct (it seems at first to always use disks-1 too). Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: enable bypassing cache for readsRaz Ben-Jehuda(caro)
Call the chunk_aligned_read where appropriate. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: allow reads that have bypassed the cache to be retried on failureRaz Ben-Jehuda(caro)
If a bypass-the-cache read fails, we simply try again through the cache. If it fails again it will trigger normal recovery precedures. update 1: From: NeilBrown <neilb@suse.de> 1/ chunk_aligned_read and retry_aligned_read assume that data_disks == raid_disks - 1 which is not true for raid6. So when an aligned read request bypasses the cache, we can get the wrong data. 2/ The cloned bio is being used-after-free in raid5_align_endio (to test BIO_UPTODATE). 3/ We forgot to add rdev->data_offset when submitting a bio for aligned-read 4/ clone_bio calls blk_recount_segments and then we change bi_bdev, so we need to invalidate the segment counts. 5/ We don't de-reference the rdev when the read completes. This means we need to record the rdev to so it is still available in the end_io routine. Fortunately bi_next in the original bio is unused at this point so we can stuff it in there. 6/ We leak a cloned bio if the target rdev is not usable. From: NeilBrown <neilb@suse.de> update 2: 1/ When aligned requests fail (read error) they need to be retried via the normal method (stripe cache). As we cannot be sure that we can process a single read in one go (we may not be able to allocate all the stripes needed) we store a bio-being-retried and a list of bioes-that-still-need-to-be-retried. When find a bio that needs to be retried, we should add it to the list, not to single-bio... 2/ We were never incrementing 'scnt' when resubmitting failed aligned requests. [akpm@osdl.org: build fix] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: handle bypassing the read cache (assuming nothing fails)Raz Ben-Jehuda(caro)
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: define raid5_mergeable_bvecRaz Ben-Jehuda(caro)
This will encourage read request to be on only one device, so we will often be able to bypass the cache for read requests. Signed-off-by: Neil Brown <neilb@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-10[PATCH] md: tidy up device-change notification when an md array is stoppedNeilBrown
An md array can be stopped leaving all the setting still in place, or it can torn down and destroyed. set_capacity and other change notifications only happen in the latter case, but should happen in both. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] make drivers/md/dm-snap.c:ksnapd staticAdrian Bunk
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: raid1: reset sync_search on resumeJonathan E Brassow
Reset sync_search on resume. The effect is to retry syncing all out-of-sync regions when a mirror is resumed, including ones that previously failed. Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: log: rename complete_resync_workJonathan E Brassow
The complete_resync_work function only provides the ability to change an out-of-sync region to in-sync. This patch enhances the function to allow us to change the status from in-sync to out-of-sync as well, something that is needed when a mirror write to one of the devices or an initial resync on a given region fails. Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: snapshot: abstract memory releaseMilan Broz
Move the code that releases memory used by a snapshot into a separate function. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: mpath: use noflush suspendingKiyoshi Ueda
Implement the pushback feature for the multipath target. The pushback request is used when: 1) there are no valid paths; 2) queue_if_no_path was set; 3) a suspend is being issued with the DMF_NOFLUSH_SUSPENDING flag. Otherwise bios are returned to applications with -EIO. To check whether queue_if_no_path is specified or not, you need to check both queue_if_no_path and saved_queue_if_no_path, because presuspend saves the original queue_if_no_path value to saved_queue_if_no_path. The check for 1 already exists in both map_io() and do_end_io(). So this patch adds __must_push_back() to check 2 and 3. Test results: See the test results in the preceding patch. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: suspend: add noflush pushbackKiyoshi Ueda
In device-mapper I/O is sometimes queued within targets for later processing. For example the multipath target can be configured to store I/O when no paths are available instead of returning it -EIO. This patch allows the device-mapper core to instruct a target to transfer the contents of any such in-target queue back into the core. This frees up the resources used by the target so the core can replace that target with an alternative one and then resend the I/O to it. Without this patch the only way to change the target in such circumstances involves returning the I/O with an error back to the filesystem/application. In the multipath case, this patch will let us add new paths for existing I/O to try after all the existing paths have failed. DMF_NOFLUSH_SUSPENDING ---------------------- If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend(). It is always cleared before dm_suspend() returns. The flag must be visible while the target is flushing pending I/Os so it is set before presuspend where the flush starts and unset after the wait for md->pending where the flush ends. Target drivers can check this flag by calling dm_noflush_suspending(). DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE ----------------------------------- A target's map() function can now return DM_MAPIO_REQUEUE to request the device mapper core queue the bio. Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request the same. This has been labelled 'pushback'. The __map_bio() and clone_endio() functions in the core treat these return values as errors and call dec_pending() to end the I/O. dec_pending ----------- dec_pending() saves the pushback request in struct dm_io->error. Once all the split clones have ended, dec_pending() will put the original bio on the md->pushback list. Note that this supercedes any I/O errors. It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while in progress (e.g. by user interrupt). dec_pending() checks for this and returns -EIO if it happened. pushdback list and pushback_lock -------------------------------- The bio is queued on md->pushback temporarily in dec_pending(), and after all pending I/Os return, md->pushback is merged into md->deferred in dm_suspend() for re-issuing at resume time. md->pushback_lock protects md->pushback. The lock should be held with irq disabled because dec_pending() can be called from interrupt context. Queueing bios to md->pushback in dec_pending() must be done atomically with the check for DMF_NOFLUSH_SUSPENDING flag. So md->pushback_lock is held when checking the flag. Otherwise dec_pending() may queue a bio to md->pushback after the interrupted dm_suspend() flushes md->pushback. Then the bio would be left in md->pushback. Flag setting in dm_suspend() can be done without md->pushback_lock because the flag is checked only after presuspend and the set value is already made visible via the target's presuspend function. The flag can be checked without md->pushback_lock (e.g. the first part of the dec_pending() or target drivers), because the flag is checked again with md->pushback_lock held when the bio is really queued to md->pushback as described above. So even if the flag is cleared after the lockless checkings, the bio isn't left in md->pushback but returned to applications with -EIO. Other notes on the current patch -------------------------------- - md->pushback is added to the struct mapped_device instead of using md->deferred directly because md->io_lock which protects md->deferred is rw_semaphore and can't be used in interrupt context like dec_pending(), and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too. - Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG ioctl option is specified, because I/Os generated by lock_fs() would be pushed back and never return if there were no valid devices. - If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING flag is set, md->pushback must be flushed because I/Os may be queued to the list already. (flush_and_out label in dm_suspend()) Test results ------------ I have tested using multipath target with the next patch. The following tests are for regression/compatibility: - I/Os succeed when valid paths exist; - I/Os fail when there are no valid paths and queue_if_no_path is not set; - I/Os are queued in the multipath target when there are no valid paths and queue_if_no_path is set; - The queued I/Os above fail when suspend is issued without the DM_NOFLUSH_FLAG ioctl option. I/Os spanning 2 multipath targets also fail. The following tests are for the normal code path of new pushback feature: - Queued I/Os in the multipath target are flushed from the target but don't return when suspend is issued with the DM_NOFLUSH_FLAG ioctl option; - The I/Os above are queued in the multipath target again when resume is issued without path recovery; - The I/Os above succeed when resume is issued after path recovery or table load; - Queued I/Os in the multipath target succeed when resume is issued with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os spanning 2 multipath targets also succeed. The following tests are for the error paths of the new pushback feature: - When the bdget_disk() fails in dm_suspend(), the DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the pushback list are flushed properly. - When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted, o I/Os which had already been queued to the pushback list at the time don't return, and are re-issued at resume time; o I/Os which hadn't been returned at the time return with EIO. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: ioctl: add noflush suspendKiyoshi Ueda
Provide a dm ioctl option to request noflush suspending. (See next patch for what this is for.) As the interface is extended, the version number is incremented. Other than accepting the new option through the interface, There is no change to existing behaviour. Test results: Confirmed the option is given from user-space correctly. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: map and endio symbolic return codesKiyoshi Ueda
Update existing targets to use the new symbols for return values from target map and end_io functions. There is no effect on behaviour. Test results: Done build test without errors. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: map and endio return code clarificationKiyoshi Ueda
Tighten the use of return values from the target map and end_io functions. Values of 2 and above are now explictly reserved for future use. There are no existing targets using such values. The patch has no effect on existing behaviour. o Reserve return values of 2 and above from target map functions. Any positive value currently indicates "mapping complete", but all existing drivers use the value 1. We now make that a requirement so we can assign new meaning to higher values in future. The new definition of return values from target map functions is: < 0 : error = 0 : The target will handle the io (DM_MAPIO_SUBMITTED). = 1 : Mapping completed (DM_MAPIO_REMAPPED). > 1 : Reserved (undefined). Previously this was the same as '= 1'. o Reserve return values of 2 and above from target end_io functions for similar reasons. DM_ENDIO_INCOMPLETE is introduced for a return value of 1. Test results: I have tested by using the multipath target. I/Os succeed when valid paths exist. I/Os are queued in the multipath target when there are no valid paths and queue_if_no_path is set. I/Os fail when there are no valid paths and queue_if_no_path is not set. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: suspend: parameter changeKiyoshi Ueda
Change the interface of dm_suspend() so that we can pass several options without increasing the number of parameters. The existing 'do_lockfs' integer parameter is replaced by a flag DM_SUSPEND_LOCKFS_FLAG. There is no functional change to the code. Test results: I have tested 'dmsetup suspend' command with/without the '--nolockfs' option and confirmed the do_lockfs value is correctly set. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm: tidy core formattingKiyoshi Ueda
Remove unnecessary spaces in dm.c. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] dm io: fix bi_max_vecsHeinz Mauelshagen
The existing code allocates an extra slot in bi_io_vec[] and uses it to store the region number. This patch hides the extra slot from bio_add_page() so the region number can't get overwritten. Also remove a hard-coded SECTOR_SHIFT and fix a typo in a comment. Signed-off-by: Heinz Mauelshagen <hjm@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Milan Broz <mbroz@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] LOG2: Implement a general integer log2 facility in the kernelDavid Howells
This facility provides three entry points: ilog2() Log base 2 of unsigned long ilog2_u32() Log base 2 of u32 ilog2_u64() Log base 2 of u64 These facilities can either be used inside functions on dynamic data: int do_something(long q) { ...; y = ilog2(x) ...; } Or can be used to statically initialise global variables with constant values: unsigned n = ilog2(27); When performing static initialisation, the compiler will report "error: initializer element is not constant" if asked to take a log of zero or of something not reducible to a constant. They treat negative numbers as unsigned. When not dealing with a constant, they fall back to using fls() which permits them to use arch-specific log calculation instructions - such as BSR on x86/x86_64 or SCAN on FRV - if available. [akpm@osdl.org: MMC fix] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Howells <dhowells@redhat.com> Cc: Wojtek Kaniewski <wojtekka@toxygen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] struct path: convert mdJosef Sipek
Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] struct path: rename DM's struct pathJosef "Jeff" Sipek
Rename DM's struct path to struct dm_path to prevent name collision between it and struct path from fs/namei.c. Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> Acked-by: Alasdair G Kergon <agk@redhat.com> Cc: <reiserfs-dev@namesys.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] lockdep: avoid lockdep warning in mdNeilBrown
md_open takes ->reconfig_mutex which causes lockdep to complain. This (normally) doesn't have deadlock potential as the possible conflict is with a reconfig_mutex in a different device. I say "normally" because if a loop were created in the array->member hierarchy a deadlock could happen. However that causes bigger problems than a deadlock and should be fixed independently. So we flag the lock in md_open as a nested lock. This requires defining mutex_lock_interruptible_nested. Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-08[PATCH] remove the old bd_mutex lockdep annotationPeter Zijlstra
Remove the old complex and crufty bd_mutex annotation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Neil Brown <neilb@cse.unsw.edu.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits) [NETFILTER]: Fix non-ANSI func. decl. [TG3]: Identify Serdes devices more clearly. [TG3]: Use msleep. [TG3]: Use netif_msg_*. [TG3]: Allow partial speed advertisement. [TG3]: Add TG3_FLG2_IS_NIC flag. [TG3]: Add 5787F device ID. [TG3]: Fix Phy loopback. [WANROUTER]: Kill kmalloc debugging code. [TCP] inet_twdr_hangman: Delete unnecessary memory barrier(). [NET]: Memory barrier cleanups [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries. audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n audit: Add auditing to ipsec [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n [IrDA]: Incorrect TTP header reservation [IrDA]: PXA FIR code device model conversion [GENETLINK]: Fix misplaced command flags. [NETLIK]: Add a pointer to the Generic Netlink wiki page. [IPV6] RAW: Don't release unlocked sock. ...
2006-12-07[PATCH] Add include/linux/freezer.h and move definitions from sched.hNigel Cunningham
Move process freezing functions from include/linux/sched.h to freezer.h, so that modifications to the freezer or the kernel configuration don't require recompiling just about everything. [akpm@osdl.org: fix ueagle driver] Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-07[PATCH] slab: remove kmem_cache_tChristoph Lameter
Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-12-06[CRYPTO] dm-crypt: Make iv_gen_private a unionHerbert Xu
Rather than stuffing integers into pointers with casts, let's use a union. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-06[BLOCK] dm-crypt: Align IV to u64 for essivHerbert Xu
This patch makes the IV u64-aligned since essiv does a u64 store to it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-12-06[BLOCK] dm-crypt: benbi IV, big endian narrow block count for LRW-32-AESRik Snel
LRW-32-AES needs a certain IV. This IV should be provided dm-crypt. The block cipher mode could, in principle generate the correct IV from the plain IV, but I think that it is cleaner to supply the right IV directly. The sector -> narrow block calculation uses a shift for performance reasons. This shift is computed in .ctr and stored in cc->iv_gen_private (as a void *). Signed-off-by: Rik Snel <rsnel@cube.dyndns.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2006-11-22WorkStruct: make allyesconfigDavid Howells
Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
2006-11-08[PATCH] md: do not freeze md threads for suspendRafael J. Wysocki
If there's a swap file on a software RAID, it should be possible to use this file for saving the swsusp's suspend image. Also, this file should be available to the memory management subsystem when memory is being freed before the suspend image is created. For the above reasons it seems that md_threads should not be frozen during the suspend and the appended patch makes this happen, but then there is the question if they don't cause any data to be written to disks after the suspend image has been created, provided that all filesystems are frozen at that time. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08[PATCH] md: fix sizing problem with raid5-reshape and CONFIG_LBD=nNeilBrown
I forgot to has the size-in-blocks to (loff_t) before shifting up to a size-in-bytes. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08[PATCH] md: change ONLINE/OFFLINE events to a single CHANGE eventNeilBrown
It turns out that CHANGE is preferred to ONLINE/OFFLINE for various reasons (not least of which being that udev understands it already). So remove the recently added KOBJ_OFFLINE (no-one is likely to care anyway) and change the ONLINE to a CHANGE event Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Neil Brown <neilb@suse.de> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-11-08[PATCH] dm: raid1: fix waiting for io on suspendJonathan E Brassow
All device-mapper targets must complete outstanding I/O before suspending. The mirror target generates I/O in its recovery phase and fails to wait for it. It needs to be tracked so we can ensure that it has completed before we suspend. [akpm@osdl.org: cleanup] Signed-off-by: Jonathan E Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: <dm-devel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>