summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2012-07-10bnx2: Fix bug in bnx2_free_tx_skbs().Michael Chan
In rare cases, bnx2x_free_tx_skbs() can unmap the wrong DMA address when it gets to the last entry of the tx ring. We were not using the proper macro to skip the last entry when advancing the tx index. Reported-by: Zongyun Lai <zlai@vmware.com> Reviewed-by: Jeffrey Huang <huangjw@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10IPoIB: fix skb truesize underestimatiomEric Dumazet
Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in skb_try_coalesce() This warning tracks drivers that incorrectly set skb->truesize IPoIB indeed allocates a full page to store a fragment, but only accounts in skb->truesize the used part of the page (frame length) This patch fixes skb truesize underestimation, and also fixes a performance issue, because RX skbs have not enough tailroom to allow IP and TCP stacks to pull their header in skb linear part without an expensive call to pskb_expand_head() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Shlomo Pongartz <shlomop@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09gianfar: fix potential sk_wmem_alloc imbalanceEric Dumazet
commit db83d136d7f753 (gianfar: Fix missing sock reference when processing TX time stamps) added a potential sk_wmem_alloc imbalance If the new skb has a different truesize than old one, we can get a negative sk_wmem_alloc once new skb is orphaned at TX completion. Now we no longer early orphan skbs in dev_hard_start_xmit(), this probably can lead to fatal bugs. Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Manfred Rudigier <manfred.rudigier@omicron.at> Cc: Claudiu Manoil <claudiu.manoil@freescale.com> Cc: Jiajun Wu <b06378@freescale.com> Cc: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09drivers/net/ethernet/broadcom/cnic.c: remove invalid reference to list ↵Julia Lawall
iterator variable If list_for_each_entry, etc complete a traversal of the list, the iterator variable ends up pointing to an address at an offset from the list head, and not a meaningful structure. Thus this value should not be used after the end of the iterator. There does not seem to be a meaningful value to provide to netdev_warn. Replace with pr_warn, since pr_err is used elsewhere. This problem was found using Coccinelle (http://coccinelle.lip6.fr/). Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09drivers/isdn/mISDN/stack.c: remove invalid reference to list iterator variableJulia Lawall
If list_for_each_entry, etc complete a traversal of the list, the iterator variable ends up pointing to an address at an offset from the list head, and not a meaningful structure. Thus this value should not be used after the end of the iterator. The dereferences are just deleted from the debugging statement. This problem was found using Coccinelle (http://coccinelle.lip6.fr/). Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09bonding: debugfs and network namespaces are incompatibleEric W. Biederman
The bonding debugfs support has been broken in the presence of network namespaces since it has been added. The debugfs support does not handle multiple bonding devices with the same name in different network namespaces. I haven't had any bug reports, and I'm not interested in getting any. Disable the debugfs support when network namespaces are enabled. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09bonding: Manage /proc/net/bonding/ entries from the netdev eventsEric W. Biederman
It was recently reported that moving a bonding device between network namespaces causes warnings from /proc. It turns out after the move we were trying to add and to remove the /proc/net/bonding entries from the wrong network namespace. Move the bonding /proc registration code into the NETDEV_REGISTER and NETDEV_UNREGISTER events where the proc registration and unregistration will always happen at the right time. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09stmmac: Fix for higher mtu size handlingDeepak Sikri
For the higher mtu sizes requiring the buffer size greater than 8192, the buffers are sent or received using multiple dma descriptors/ same descriptor with option of multi buffer handling. It was observed during tests that the driver was missing on data packets during the normal ping operations if the data buffers being used catered to jumbo frame handling. The memory barrriers are added in between preparation of dma descriptors in the jumbo frame handling path to ensure all instructions before enabling the dma are complete. Signed-off-by: Deepak Sikri <deepak.sikri@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09stmmac: Fix for nfs hang on multiple rebootDeepak Sikri
It was observed that during multiple reboots nfs hangs. The status of receive descriptors shows that all the descriptors were in control of CPU, and none were assigned to DMA. Also the DMA status register confirmed that the Rx buffer is unavailable. This patch adds the fix for the same by adding the memory barriers to ascertain that the all instructions before enabling the Rx or Tx DMA are completed which involves the proper setting of the ownership bit in DMA descriptors. Signed-off-by: Deepak Sikri <deepak.sikri@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-07-09iwlegacy: don't mess up the SCD when removing a keyEmmanuel Grumbach
When we remove a key, we put a key index which was supposed to tell the fw that we are actually removing the key. But instead the fw took that index as a valid index and messed up the SRAM of the device. This memory corruption on the device mangled the data of the SCD. The impact on the user is that SCD queue 2 got stuck after having removed keys. Reported-by: Paul Bolle <pebolle@tiscali.nl> Cc: stable@vger.kernel.org Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-07-09iwlegacy: always monitor for stuck queuesStanislaw Gruszka
This is iwlegacy version of: commit 342bbf3fee2fa9a18147e74b2e3c4229a4564912 Author: Johannes Berg <johannes.berg@intel.com> Date: Sun Mar 4 08:50:46 2012 -0800 iwlwifi: always monitor for stuck queues If we only monitor while associated, the following can happen: - we're associated, and the queue stuck check runs, setting the queue "touch" time to X - we disassociate, stopping the monitoring, which leaves the time set to X - almost 2s later, we associate, and enqueue a frame - before the frame is transmitted, we monitor for stuck queues, and find the time set to X, although it is now later than X + 2000ms, so we decide that the queue is stuck and erroneously restart the device Cc: stable@vger.kernel.org Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-07-09rt2x00usb: fix indexes ordering on RX queue kickStanislaw Gruszka
On rt2x00_dmastart() we increase index specified by Q_INDEX and on rt2x00_dmadone() we increase index specified by Q_INDEX_DONE. So entries between Q_INDEX_DONE and Q_INDEX are those we currently process in the hardware. Entries between Q_INDEX and Q_INDEX_DONE are those we can submit to the hardware. According to that fix rt2x00usb_kick_queue(), as we need to submit RX entries that are not processed by the hardware. It worked before only for empty queue, otherwise was broken. Note that for TX queues indexes ordering are ok. We need to kick entries that have filled skb, but was not submitted to the hardware, i.e. started from Q_INDEX_DONE and have ENTRY_DATA_PENDING bit set. From practical standpoint this fixes RX queue stall, usually reproducible in AP mode, like for example reported here: https://bugzilla.redhat.com/show_bug.cgi?id=828824 Reported-and-tested-by: Franco Miceli <fmiceli@plan.ceibal.edu.uy> Reported-and-tested-by: Tom Horsley <horsley1953@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-07-09mwifiex: fix Coverity SCAN CID 709078: Resource leak (RESOURCE_LEAK)Bing Zhao
> *. CID 709078: Resource leak (RESOURCE_LEAK) > - drivers/net/wireless/mwifiex/cfg80211.c, line: 935 > Assigning: "bss_cfg" = storage returned from "kzalloc(132UL, 208U)" > - but was not free > drivers/net/wireless/mwifiex/cfg80211.c:935 Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-07-09cnic: Don't use netdev->base_addrMichael Chan
commit c0357e975afdbbedab5c662d19bef865f02adc17 bnx2: stop using net_device.{base_addr, irq}. removed netdev->base_addr so we need to update cnic to get the MMIO base address from pci_resource_start(). Otherwise, mmap of the uio device will fail. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09net: qmi_wwan: add ZTE MF60Bjørn Mork
Adding a device with limited QMI support. It does not support normal QMI_WDS commands for connection management. Instead, sending a QMI_CTL SET_INSTANCE_ID command is required to enable the network interface: 01 0f 00 00 00 00 00 00 20 00 04 00 01 01 00 00 A number of QMI_DMS and QMI_NAS commands are also supported for optional device management. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09netdev/phy: Fixup lockdep warnings in mdio-mux.cDavid Daney
With lockdep enabled we get: ============================================= [ INFO: possible recursive locking detected ] 3.4.4-Cavium-Octeon+ #313 Not tainted --------------------------------------------- kworker/u:1/36 is trying to acquire lock: (&bus->mdio_lock){+.+...}, at: [<ffffffff813da7e8>] mdio_mux_read+0x38/0xa0 but task is already holding lock: (&bus->mdio_lock){+.+...}, at: [<ffffffff813d79e4>] mdiobus_read+0x44/0x88 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&bus->mdio_lock); lock(&bus->mdio_lock); *** DEADLOCK *** May be due to missing lock nesting notation . . . This is a false positive, since we are indeed using 'nested' locking, we need to use mutex_lock_nested(). Now in theory we can stack multiple MDIO multiplexers, but that would require passing the nesting level (which is difficult to know) to mutex_lock_nested(). Instead we assume the simple case of a single level of nesting. Since these are only warning messages, it isn't so important to solve the general case. Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-09ixgbe: DCB and SR-IOV can not co-exist and will cause hangsAlexander Duyck
DCB and SR-IOV cannot currently be enabled at the same time as the queueing schemes are incompatible. If they are both enabled it will result in Tx hangs since only the first Tx queue will be able to transmit any traffic. This simple fix for this is to block us from enabling TCs in ixgbe_setup_tc if SR-IOV is enabled. This change will be reverted once we can support SR-IOV and DCB coexistence. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-08atl1c: fix issue of transmit queue 0 timed outCloud Ren
some people report atl1c could cause system hang with following kernel trace info: --------------------------------------- WARNING: at.../net/sched/sch_generic.c:258 dev_watchdog+0x1db/0x1d0() ... NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out ... --------------------------------------- This is caused by netif_stop_queue calling when cable Link is down. So remove netif_stop_queue, because link_watch will take it over. Signed-off-by: xiong <xiong@qca.qualcomm.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-08net: dont use __netdev_alloc_skb for bounce bufferEric Dumazet
commit a1c7fff7e1 (net: netdev_alloc_skb() use build_skb()) broke b44 on some 64bit machines. It appears b44 and b43 use __netdev_alloc_skb() instead of alloc_skb() for their bounce buffers. There is no need to add an extra NET_SKB_PAD reservation for bounce buffers : - In TX path, NET_SKB_PAD is useless - In RX path in b44, we force a copy of incoming frames if GFP_DMA allocations were needed. Reported-and-bisected-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-03Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linuxLinus Torvalds
Pull fix to common clk framework from Michael Turquette: "The previous set of common clk fixes for -rc5 left an uninitialized int which could lead to bad array indexing when switching clock parents. The issue is fixed with a trivial change to the code flow in __clk_set_parent." * tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux: clk: fix parent validation in __clk_set_parent()
2012-07-03Merge tag 'md-3.5-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull raid10 build failure fix from NeilBrown: "I really shouldn't do important things late in the day. It seems that I get careless." * tag 'md-3.5-fixes' of git://neil.brown.name/md: md/raid10: fix careless build error
2012-07-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking update from David Miller: 1) Fix RX sequence number handling in mwifiex, from Stone Piao. 2) Netfilter ipset mis-compares device names, fix from Florian Westphal. 3) Fix route leak in ipv6 IPVS, from Eric Dumazet. 4) NFS fixes. Several buffer overflows in NCI layer from Dan Rosenberg, and release sock OOPS'er fix from Eric Dumazet. 5) Fix WEP handling ath9k, we started using a bit the chip provides to indicate undecrypted packets but that bit turns out to be unreliable in certain configurations. Fix from Felix Fietkau. 6) Fix Kconfig dependency bug in wlcore, from Randy Dunlap. 7) New USB IDs for rtlwifi driver from Larry Finger. 8) Fix crashes in qmi_wwan usbnet driver when disconnecting, from Bjørn Mork. 9) Gianfar driver programs coalescing settings properly in single queue mode, but does not do so in multi-queue mode. Fix from Claudiu Manoil. 10) Missing module.h include in davinci_cpdma.c, from Daniel Mack. 11) Need dummy handler for IPSET_CMD_NONE otherwise we crash in ipset if we get this via nfnetlink, fix from Tomasz Bursztyka. 12) Missing RCU unlock in nfnetlink error path, also from Tomasz. 13) Fix divide by zero in igbvf when the user tries to set an RX coalescing value of 0 usecs, from Mitch A Williams. 14) We can process SCTP sacks for the wrong transport, oops. Fix from Neil Horman. 15) Remove hw IP payload checksumming from e1000e driver. This has zery value in our stack, and turning it on creates a very unintuitive restriction for users when using jumbo MTUs. Specifically, when IP payload checksums are on you cannot use both receive hashing offload and jumbo MTU. Fix from Bruce Allan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits) e1000e: remove use of IP payload checksum sctp: be more restrictive in transport selection on bundled sacks igbvf: fix divide by zero netfilter: nfnetlink: fix missing rcu_read_unlock in nfnetlink_rcv_msg netfilter: ipset: fix crash if IPSET_CMD_NONE command is sent davinci_cpdma: include linux/module.h gianfar: Fix RXICr/TXICr programming for multi-queue mode net: Downgrade CAP_SYS_MODULE deprecated message from error to warning. net: qmi_wwan: fix Oops while disconnecting mwifiex: fix memory leak associated with IE manamgement ath9k: fix panic caused by returning a descriptor we have queued for reuse mac80211: correct behaviour on unrecognised action frames ath9k: enable serialize_regmode for non-PCIE AR9287 rtlwifi: rtl8192cu: New USB IDs NFC: Return from rawsock_release when sk is NULL iwlwifi: fix activating inactive stations wlcore: drop INET dependency ath9k: fix dynamic WEP related regression NFC: Prevent multiple buffer overflows in NCI netfilter: update location of my trees ...
2012-07-04md/raid10: fix careless build errorNeilBrown
build error introduced by commit b357f04a67c2aeee8 That function doesn't get extra args until a later patch. Bother. Reported-by: Fengguang Wu <wfg@linux.intel.com> Reported-by: Simon Kirby <sim@hostway.ca> Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03floppy: cancel any pending fd_timeouts before adding a new oneLinus Torvalds
In commit 070ad7e793dc ("floppy: convert to delayed work and single-thread wq") the 'fd_timeout' timer was converted to a delayed work. However, the "del_timer(&fd_timeout)" was lost in the process, and any previous pending timeouts would stay active when we then re-queued the timeout. This resulted in the floppy probe sequence having a (stale) 20s timeout rather than the intended 3s timeout, and thus made booting with the floppy driver (but no actual floppy controller) take much longer than it should. Of course, there's little reason for most people to compile the floppy driver into the kernel at all, which is why most people never noticed. Canceling the delayed work where we used to do the del_timer() fixes the issue, and makes the floppy probing use the proper new timeout instead. The three second timeout is still very wasteful, but better than the 20s one. Reported-and-tested-by: Andi Kleen <ak@linux.intel.com> Reported-and-tested-by: Calvin Walton <calvin.walton@kepstin.ca> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-03Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block bits from Jens Axboe: "As vacation is coming up, thought I'd better get rid of my pending changes in my for-linus branch for this iteration. It contains: - Two patches for mtip32xx. Killing a non-compliant sysfs interface and moving it to debugfs, where it belongs. - A few patches from Asias. Two legit bug fixes, and one killing an interface that is no longer in use. - A patch from Jan, making the annoying partition ioctl warning a bit less annoying, by restricting it to !CAP_SYS_RAWIO only. - Three bug fixes for drbd from Lars Ellenberg. - A fix for an old regression for umem, it hasn't really worked since the plugging scheme was changed in 3.0. - A few fixes from Tejun. - A splice fix from Eric Dumazet, fixing an issue with pipe resizing." * 'for-linus' of git://git.kernel.dk/linux-block: scsi: Silence unnecessary warnings about ioctl to partition block: Drop dead function blk_abort_queue() block: Mitigate lock unbalance caused by lock switching block: Avoid missed wakeup in request waitqueue umem: fix up unplugging splice: fix racy pipe->buffers uses drbd: fix null pointer dereference with on-congestion policy when diskless drbd: fix list corruption by failing but already aborted reads drbd: fix access of unallocated pages and kernel panic xen/blkfront: Add WARN to deal with misbehaving backends. blkcg: drop local variable @q from blkg_destroy() mtip32xx: Create debugfs entries for troubleshooting mtip32xx: Remove 'registers' and 'flags' from sysfs blkcg: fix blkg_alloc() failure path block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED block: fix return value on cfq_init() failure mtip32xx: Remove version.h header file inclusion xen/blkback: Copy id field when doing BLKIF_DISCARD.
2012-07-03clk: fix parent validation in __clk_set_parent()Rajendra Nayak
The below commit introduced a bug in __clk_set_parent() which could cause it to *skip* the parent validation which makes sure the parent passed to the api is a valid one. commit 7975059db572eb47f0fb272a62afeae272a4b209 Author: Rajendra Nayak <rnayak@ti.com> Date: Wed Jun 6 14:41:31 2012 +0530 clk: Allow late cache allocation for clk->parents This was identified by the following compiler warning.. drivers/clk/clk.c: In function '__clk_set_parent': drivers/clk/clk.c:1083:5: warning: 'i' may be used uninitialized in this function [-Wuninitialized] .. as reported by Marc Kleine-Budde. There were various options discussed on how to fix this, one being initing 'i' to clk->num_parents, but the below approach was found to be more appropriate as it also makes the 'parent validation' code simpler to read. Reported-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Rajendra Nayak <rnayak@ti.com> Signed-off-by: Mike Turquette <mturquette@linaro.org> Cc: stable@kernel.org
2012-07-03Merge tag 'dm-3.5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull device-mapper fixes from Alasdair G Kergon: "Four minor thin provisioning fixes and correct and update dm-verity documentation." * tag 'dm-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: dm: verity fix documentation dm persistent data: fix allocation failure in space map checker init dm persistent data: handle space map checker creation failure dm persistent data: fix shadow_info_leak on dm_tm_destroy dm thin: commit metadata before creating metadata snapshot
2012-07-03Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "One regression fix, two radeon fixes (one for an oops), and an i915 fix to unload framebuffers earlier. We originally were going to leave the i915 fix until -next, but grub2 in some situations causes vesafb/efifb to be loaded now, and this causes big slowdowns, and I have reports in rawhide I'd like to have fixed." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/i915: kick any firmware framebuffers before claiming the gtt drm: edid: Don't add inferred modes with higher resolution drm/radeon: fix rare segfault drm/radeon: fix VM page table setup on SI
2012-07-03Merge tag 'md-3.5-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull md fixes from NeilBrown: "md: collection of bug fixes for 3.5 You go away for 2 weeks vacation and what do you get when you come back? Piles of bugs :-) Some found by inspection, some by testing, some during use in the field, and some while developing for the next window..." * tag 'md-3.5-fixes' of git://neil.brown.name/md: md: fix up plugging (again). md: support re-add of recovering devices. md/raid1: fix bug in read_balance introduced by hot-replace raid5: delayed stripe fix md/raid456: When read error cannot be recovered, record bad block md: make 'name' arg to md_register_thread non-optional. md/raid10: fix failure when trying to repair a read error. md/raid5: fix refcount problem when blocked_rdev is set. md:Add blk_plug in sync_thread. md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdev md/raid5: Do not add data_offset before call to is_badblock md/raid5: prefer replacing failed devices over want-replacement devices. md/raid10: Don't try to recovery unmatched (and unused) chunks.
2012-07-03dm persistent data: fix allocation failure in space map checker initMike Snitzer
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and memory is fragmented and a sufficiently-large metadata device is used in a thin pool then the space map checker will fail to allocate the memory it requires. Switch from kmalloc to vmalloc to allow larger virtually contiguous allocations for the space map checker's internal count arrays. Reported-by: Vivek Goyal <vgoyal@redhat.com> Cc: stable@kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03dm persistent data: handle space map checker creation failureMike Snitzer
If CONFIG_DM_DEBUG_SPACE_MAPS is enabled and dm_sm_checker_create() fails, dm_tm_create_internal() would still return success even though it cleaned up all resources it was supposed to have created. This will lead to a kernel crash: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC ... RIP: 0010:[<ffffffff81593659>] [<ffffffff81593659>] dm_bufio_get_block_size+0x9/0x20 Call Trace: [<ffffffff81599bae>] dm_bm_block_size+0xe/0x10 [<ffffffff8159b8b8>] sm_ll_init+0x78/0xd0 [<ffffffff8159c1a6>] sm_ll_new_disk+0x16/0xa0 [<ffffffff8159c98e>] dm_sm_disk_create+0xfe/0x160 [<ffffffff815abf6e>] dm_pool_metadata_open+0x16e/0x6a0 [<ffffffff815aa010>] pool_ctr+0x3f0/0x900 [<ffffffff8158d565>] dm_table_add_target+0x195/0x450 [<ffffffff815904c4>] table_load+0xe4/0x330 [<ffffffff815917ea>] ctl_ioctl+0x15a/0x2c0 [<ffffffff81591963>] dm_ctl_ioctl+0x13/0x20 [<ffffffff8116a4f8>] do_vfs_ioctl+0x98/0x560 [<ffffffff8116aa51>] sys_ioctl+0x91/0xa0 [<ffffffff81869f52>] system_call_fastpath+0x16/0x1b Fix the space map checker code to return an appropriate ERR_PTR and have dm_sm_disk_create() and dm_tm_create_internal() check for it with IS_ERR. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03dm persistent data: fix shadow_info_leak on dm_tm_destroyMike Snitzer
Cleanup the shadow table before destroying the transaction manager. Reference: leak was identified with kmemleak when running test_discard_random_sectors in the thinp-test-suite. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03dm thin: commit metadata before creating metadata snapshotJoe Thornber
Userland sometimes sees a corrupt metadata block if metadata is changing rapidly when a metadata snapshot is reserved for userland, To make the problem go away, commit before we take the metadata snapshot (which is a sensible thing to do anyway). The checksums mean userland spots this corruption immediately so there's no risk of acting on incorrect data. No corruption exists from the kernel's point of view, and thin_check passes after pool shutdown. I believe this is to do with shared blocks at the first level of the {device, mapping} btree. Prior to the metadata-snap support no sharing at this level was possible, so this patch is only required after commit cc8394d86f045b86ff303d3c9e4ce47d97148951 ("dm thin: provide userspace access to pool metadata"). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-03drm/i915: kick any firmware framebuffers before claiming the gttDaniel Vetter
Especially vesafb likes to map everything as uc- (yikes), and if that mapping hangs around still while we try to map the gtt as wc the kernel will downgrade our request to uc-, resulting in abyssal performance. Unfortunately we can't do this as early as readon does (i.e. as the first thing we do when initializing the hw) because our fb/mmio space region moves around on a per-gen basis. So I've had to move it below the gtt initialization, but that seems to work, too. The important thing is that we do this before we set up the gtt wc mapping. Now an altogether different question is why people compile their kernels with vesafb enabled, but I guess making things just work isn't bad per se ... v2: - s/radeondrmfb/inteldrmfb/ - fix up error handling v3: Kill #ifdef X86, this is Intel after all. Noticed by Ben Widawsky. v4: Jani Nikula complained about the pointless bool primary initialization. v5: Don't oops if we can't allocate, noticed by Chris Wilson. v6: Resolve conflicts with agp rework and fixup whitespace. This is commit e188719a2891f01b3100d in drm-next. Backport to 3.5 -fixes queue requested by Dave Airlie - due to grub using vesa on fedora their initrd seems to load vesafb before loading the real kms driver. So tons more people actually experience a dead-slow gpu. Hence also the Cc: stable. Cc: stable@vger.kernel.org Reported-and-tested-by: "Kilarski, Bernard R" <bernard.r.kilarski@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03drm: edid: Don't add inferred modes with higher resolutionTakashi Iwai
When a monitor EDID doesn't give the preferred bit, driver assumes that the mode with the higest resolution and rate is the preferred mode. Meanwhile the recent changes for allowing more modes in the GFT/CVT ranges give actually more modes, and some modes may be over the native size. Thus such a mode would be picked up as the preferred mode although it's no native resolution. For avoiding such a problem, this patch limits the addition of inferred modes by checking not to be greater than other modes. Also, it checks the duplicated mode entry at the same time. Reviewed-by: Adam Jackson <ajax@redhat.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03drm/radeon: fix rare segfaultJerome Glisse
In gem idle/busy ioctl the radeon object was derefenced after drm_gem_object_unreference_unlocked which in case the object have been destroyed lead to use of a possibly free pointer with possibly wrong data. Signed-off-by: Jerome Glisse <jglisse@redhat.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-07-03md: fix up plugging (again).NeilBrown
The value returned by "mddev_check_plug" is only valid until the next 'schedule' as that will unplug things. This could happen at any call to mempool_alloc. So just calling mddev_check_plug at the start doesn't really make sense. So call it just before, or just after, queuing things for the thread. As the action that happens at unplug is to wake the thread, this makes lots of sense. If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we wake thread immediately. RAID5 is a bit different. Requests are queued for the thread and the thread is woken by release_stripe. So we don't need to wake the thread on failure. However the thread doesn't perform certain actions when there is any active plug, so it is important to install a plug before waking the thread. So for RAID5 we install the plug *before* queuing the request and waking the thread. Without this patch it is possible for raid1 or raid10 to queue a request without then waking the thread, resulting in the array locking up. Also change raid10 to only flush_pending_write when there are not active plugs, just like raid1. This patch is suitable for 3.0 or later. I plan to submit it to -stable, but I'll like to let it spend a few weeks in mainline first to be sure it is completely safe. Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md: support re-add of recovering devices.NeilBrown
We currently only allow a device to be re-added if it appear to be in-sync. This is overly restrictive as it may be desirable to re-add a device that is in the middle of recovery. So remove the test for "InSync" - the test on rdev->raid_disk is sufficient to ensure that the re-add will succeed. Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com> Tested-by: Alexander Lyakas <alex.bolshoy@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid1: fix bug in read_balance introduced by hot-replaceNeilBrown
When we added hot_replace we doubled the number of devices that could be in a RAID1 array. So we doubled how far read_balance would search. Unfortunately we didn't double the point at which it looped back to the beginning - so it effectively loops over all non-replacement disks twice. This doesn't cause bad behaviour, but it pointless and means we never read from replacement devices. Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03raid5: delayed stripe fixShaohua Li
There isn't locking setting STRIPE_DELAYED and STRIPE_PREREAD_ACTIVE bits, but the two bits have relationship. A delayed stripe can be moved to hold list only when preread active stripe count is below IO_THRESHOLD. If a stripe has both the bits set, such stripe will be in delayed list and preread count not 0, which will make such stripe never leave delayed list. Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid456: When read error cannot be recovered, record bad blockmajianpeng
We may not be able to fix a bad block if: - the array is degraded - the over-write fails. In these cases we currently eject the device, but we should record a bad block if possible. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md: make 'name' arg to md_register_thread non-optional.NeilBrown
Having the 'name' arg optional and defaulting to the current personality name is no necessary and leads to errors, as when changing the level of an array we can end up using the name of the old level instead of the new one. So make it non-optional and always explicitly pass the name of the level that the array will be. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid10: fix failure when trying to repair a read error.NeilBrown
commit 58c54fcca3bac5bf9290cfed31c76e4c4bfbabaf md/raid10: handle further errors during fix_read_error better. in 3.1 added "r10_sync_page_io" which takes an IO size in sectors. But we were passing the IO size in bytes!!! This resulting in bio_add_page failing, and empty request being sent down, and a consequent BUG_ON in scsi_lib. [fix missing space in error message at same time] This fix is suitable for 3.1.y and later. Cc: stable@vger.kernel.org Reported-by: Christian Balzer <chibi@gol.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid5: fix refcount problem when blocked_rdev is set.NeilBrown
commit 43220aa0f22cd3ce5b30246d50ccd696d119edea md/raid5: fix a hang on device failure. fixed a hang, but introduced a refcounting in-balance so that if the presence of bad-blocks ever caused an rdev to be 'blocked' we would increment the refcount on the rdev and never decrement it. So added the needed rdev_dec_pending when md_wait_for_blocked_rdev is not called. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md:Add blk_plug in sync_thread.majianpeng
Add blk_plug in sync_thread will increase the performance of sync. Because sync_thread did not blk_plug,so when raid sync, the bio merge not well. Testing environment: SATA controller: Intel Corporation 82801JI (ICH10 Family) SATA AHCI Controller. OS:Linux xxx 3.5.0-rc2+ #340 SMP Tue Jun 12 09:00:25 CST 2012 x86_64 x86_64 x86_64 GNU/Linux. RAID5: four ST31000524NS disk. Without blk_plug:recovery speed about 63M/Sec; Add blk_plug:recovery speed about 120M/Sec. Using blktrace: blktrace -d /dev/sdb -w 60 -o -|blkparse -i - without blk_plug: Total (8,16): Reads Queued: 309811, 1239MiB Writes Queued: 0, 0KiB Read Dispatches: 283583, 1189MiB Write Dispatches: 0, 0KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 273351, 1149MiB Writes Completed: 0, 0KiB Read Merges: 23533, 94132KiB Write Merges: 0, 0KiB IO unplugs: 0 Timer unplugs: 0 add blk_plug: Total (8,16): Reads Queued: 428697, 1714MiB Writes Queued: 0, 0KiB Read Dispatches: 3954, 1714MiB Write Dispatches: 0, 0KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 3956, 1715MiB Writes Completed: 0, 0KiB Read Merges: 424743, 1698MiB Write Merges: 0, 0KiB IO unplugs: 0 Timer unplugs: 3384 The ratio of merge will be markedly increased. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid5: In ops_run_io, inc nr_pending before calling md_wait_for_blocked_rdevmajianpeng
In ops_run_io(), the call to md_wait_for_blocked_rdev will decrement nr_pending so we lose the reference we hold on the rdev. So atomic_inc it first to maintain the reference. This bug was introduced by commit 73e92e51b7969ef5477d md/raid5. Don't write to known bad block on doubtful devices. which appeared in 3.0, so patch is suitable for stable kernels since then. Cc: stable@vger.kernel.org Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid5: Do not add data_offset before call to is_badblockmajianpeng
In chunk_aligned_read() we are adding data_offset before calling is_badblock. But is_badblock also adds data_offset, so that is bad. So move the addition of data_offset to after the call to is_badblock. This bug was introduced by commit 31c176ecdf3563140e639 md/raid5: avoid reading from known bad blocks. which first appeared in 3.0. So that patch is suitable for any -stable kernel from 3.0.y onwards. However it will need minor revision for most of those (as the comment didn't appear until recently). Cc: stable@vger.kernel.org Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid5: prefer replacing failed devices over want-replacement devices.NeilBrown
If a RAID5 has both a failed device and a device marked as 'WantReplacement', then we should preferentially replace the failed device. However the current code replaces whichever is found first. So split into 2 loops, check fail failed/missing first, and only check for WantReplacement if nothing is failed or missing. Reported-by: majianpeng <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
2012-07-03md/raid10: Don't try to recovery unmatched (and unused) chunks.NeilBrown
If a RAID10 has an odd number of chunks - as might happen when there are an odd number of devices - the last chunk has no pair and so is not mirrored. We don't store data there, but when recovering the last device in an array we retry to recover that last chunk from a non-existent location. This results in an error, and the recovery aborts. When we get to that last chunk we should just stop - there is nothing more to do anyway. This bug has been present since the introduction of RAID10, so the patch is appropriate for any -stable kernel. Cc: stable@vger.kernel.org Reported-by: Christian Balzer <chibi@gol.com> Tested-by: Christian Balzer <chibi@gol.com> Signed-off-by: NeilBrown <neilb@suse.de>