Age  Commit message  Author
2007-05-02pciehp: Adapt to device driver modelKenji Kaneshige
This patch adapts PCIEHP driver to PCI device driver model. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02pciehp: Event handling reworkKenji Kaneshige
The event handler of the PCIEHP driver is unnecessarily complex. In addition, the current event handler can only handle a fixed number of events at the same time, and some events would be lost if several events happened at once. This patch simplifies the event handler using a work queue, and it also fixes the above-mentioned issue. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02pci: New PCI-E reset APIBrian King
Adds a new API which can be used to issue various types of PCI-E reset, including PCI-E warm reset and PCI-E hot reset. This is needed for an ipr PCI-E adapter which does not properly implement BIST. Running BIST on this adapter results in PCI-E errors. The only reliable reset mechanism that exists on this hardware is PCI Fundamental reset (warm reset). Since driving this type of reset is architecture unique, this provides the necessary hooks for architectures to add this support. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-05-02PCI: Flush MSI-X table writesMitch Williams
This patch fixes a kernel bug which is triggered when using the irqbalance daemon with MSI-X hardware. Because both MSI-X interrupt messages and MSI-X table writes are posted, it's possible for them to cross while in-flight. This results in interrupts being received long after the kernel thinks they're disabled, and in interrupts being sent to stale vectors after rebalancing. This patch performs a read flush after writes to the MSI-X table for mask and unmask operations. Since the SMP affinity is set while the interrupt is masked, and since it's unmasked immediately after, no additional flushes are required in the various affinity setting routines. This patch has been validated with (unreleased) network hardware which uses MSI-X. Revised with input from Eric Biederman. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
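The read flush described above can be sketched in userspace C. This is an illustration only: `fake_msix_entry`, `FAKE_MSIX_MASK_BIT` and `msix_set_mask` are hypothetical names, and plain memory stands in for the device table; in the kernel the accesses would go through MMIO accessors on the mapped MSI-X table.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vector control word of one MSI-X entry. */
struct fake_msix_entry {
	volatile uint32_t vector_ctrl;
};

#define FAKE_MSIX_MASK_BIT 0x1u

/*
 * Set or clear the mask bit, then read the register back.
 * On real hardware the write is posted; the read forces it to reach
 * the device before the function returns. Returns the value read back.
 */
static uint32_t msix_set_mask(struct fake_msix_entry *e, int masked)
{
	uint32_t v = e->vector_ctrl;

	if (masked)
		v |= FAKE_MSIX_MASK_BIT;
	else
		v &= ~FAKE_MSIX_MASK_BIT;
	e->vector_ctrl = v;      /* posted write */
	return e->vector_ctrl;   /* read flush */
}
```

Because the flush sits inside the mask/unmask helper, callers that mask, change affinity, and unmask need no extra flush of their own, which matches the reasoning in the commit message.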
2007-04-30libata: honour host controllers that want just one hostLinus Torvalds
The Marvell IDE interface on my machine would hit a BUG_ON() in lib/iomem.c because it was calling ata_pci_init_one() specifying just a single port on the host, but that would actually end up trying to initialize two ports, the second one with bogus information. This fixes "ata_pci_init_one()" so that it actually passes down the n_ports variable that it got from the low-level driver to the host allocation routine ("ata_host_alloc_pinfo()"), which results in the ATA layer actually having the correct port number information. And in order to make it all work, I also needed to fix a few places that had incorrectly hard-coded the fact that a host always had exactly two ports (both ata_pci_init_bmdma() and ata_request_legacy_irqs() would just always iterate over both ports). Acked-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
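The core of the fix, iterating over the port count supplied by the low-level driver rather than a hard-coded two, might look like the following reduced sketch. `fake_host`, `FAKE_MAX_PORTS` and `fake_host_init` are hypothetical names for illustration and are not the libata structures.

```c
#define FAKE_MAX_PORTS 2

/* Hypothetical, simplified host: only the fields needed here. */
struct fake_host {
	int n_ports;                      /* ports the low-level driver asked for */
	int initialized[FAKE_MAX_PORTS];  /* 1 if the port was set up */
};

/* Initialize exactly n_ports ports instead of always assuming two. */
static int fake_host_init(struct fake_host *host, int n_ports)
{
	int i;

	if (n_ports < 1 || n_ports > FAKE_MAX_PORTS)
		return -1;
	host->n_ports = n_ports;
	for (i = 0; i < FAKE_MAX_PORTS; i++)
		host->initialized[i] = 0;
	for (i = 0; i < host->n_ports; i++)
		host->initialized[i] = 1;
	return 0;
}
```

With `n_ports == 1`, the second port is never touched, so no bogus information is used for it.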
2007-04-30pm: include EIO from errno-base.hDavid Rientjes
For backwards compatibility, call_platform_enable_wakeup() can return 0 instead of -EIO since we aren't guaranteed to have errno defined. Cc: David Brownell <david-b@pacbell.net> Signed-off-by: David Rientjes <rientjes@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30Add kvasprintf()Jeremy Fitzhardinge
Add a kvasprintf() function to complement kasprintf(). No in-tree users yet, but I have some coming up. [akpm@linux-foundation.org: EXPORT it] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Keir Fraser <keir@xensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
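A userspace analogue of the relationship between the two functions can be sketched as below. `my_vasprintf` and `my_asprintf` are hypothetical names, and `malloc()` stands in for the kernel allocator; the kernel versions also take allocation flags, which this sketch leaves out.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated buffer; the caller frees the result. */
static char *my_vasprintf(const char *fmt, va_list ap)
{
	va_list aq;
	int len;
	char *p;

	va_copy(aq, ap);
	len = vsnprintf(NULL, 0, fmt, aq);   /* first pass: measure */
	va_end(aq);
	if (len < 0)
		return NULL;

	p = malloc((size_t)len + 1);
	if (!p)
		return NULL;
	vsnprintf(p, (size_t)len + 1, fmt, ap);   /* second pass: format */
	return p;
}

/* Variadic wrapper built on top of the va_list variant. */
static char *my_asprintf(const char *fmt, ...)
{
	va_list ap;
	char *p;

	va_start(ap, fmt);
	p = my_vasprintf(fmt, ap);
	va_end(ap);
	return p;
}
```

Having the `va_list` variant lets other variadic helpers forward their arguments without duplicating the measure-then-format logic.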
2007-04-30power management: force pm_ops.valid callback to be assignedJohannes Berg
This patch changes the docs and behaviour from "all states valid" to "no states valid" if no .valid callback is assigned. Users of pm_ops that only need mem sleep can assign pm_valid_only_mem without any overhead, others will require more elaborate callbacks. Now that all users of pm_ops have a .valid callback this is a safe thing to do and prevents things from getting messy again as they were before. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl> Cc: <linux-pm@lists.linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
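The "no states valid" default and a mem-only helper can be illustrated with a reduced userspace sketch. The enum values, `fake_pm_ops`, `fake_valid_only_mem` and `fake_state_is_valid` are hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical, reduced set of sleep states for illustration. */
enum fake_suspend_state {
	FAKE_SUSPEND_STANDBY = 1,
	FAKE_SUSPEND_MEM     = 3,
	FAKE_SUSPEND_DISK    = 4
};

struct fake_pm_ops {
	int (*valid)(enum fake_suspend_state state);
};

/* Generic helper in the spirit of pm_valid_only_mem. */
static int fake_valid_only_mem(enum fake_suspend_state state)
{
	return state == FAKE_SUSPEND_MEM;
}

/* With no ops or no .valid callback, no state is treated as valid. */
static int fake_state_is_valid(const struct fake_pm_ops *ops,
			       enum fake_suspend_state state)
{
	if (ops == NULL || ops->valid == NULL)
		return 0;
	return ops->valid(state);
}
```

A platform that only supports suspend to RAM would assign the helper as its `.valid` callback; anything more selective needs its own callback.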
2007-04-30power management: implement pm_ops.valid for everybodyJohannes Berg
Almost all users of pm_ops only support mem sleep, but they neither check the state in .valid nor reject other states in .prepare, so users can be confused when they check /sys/power/state, especially when new states are added (these would then result in s-t-r although they're supposed to be something different). This patch implements a generic pm_valid_only_mem function, exports it for users, and puts it to use in almost all existing pm_ops. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: linux-pm@lists.linux-foundation.org Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30power management: remove firmware disk modeJohannes Berg
This patch removes the firmware disk suspend mode, which is the wrong approach: it is supposed to be used for implementing firmware-based disk suspend but cannot actually be used for that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: David Brownell <david-b@pacbell.net> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30rework pm_ops pm_disk_mode, kill misuseJohannes Berg
This patch series cleans up some misconceptions about pm_ops. Some users of the pm_ops structure attempt to use it to stop the user from entering suspend to disk. This, however, is not possible, since the user can always use "shutdown" in /sys/power/disk and then the pm_ops are never invoked. Also, platforms that don't support suspend to disk simply should not allow configuring SOFTWARE_SUSPEND (read the help text on it: it only selects suspend to disk and nothing else; all the other stuff depends on PM). The pm_ops structure is actually intended to provide a way to enter platform-defined sleep states (currently supported states are "standby" and "mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured) allows a platform to support a platform specific way to enter low-power mode once everything has been saved to disk. This is currently only used by ACPI (S4). This patch: The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really seems to understand what it actually does. This patch clarifies the pm_disk_mode description. It also removes all the arm and sh users that think they can veto suspend to disk via pm_ops. They cannot, since the user can always do echo shutdown > /sys/power/disk; they need to find a better way involving Kconfig or such. ACPI is the only user left with a non-zero pm_disk_mode. The patch also sets the default mode to shutdown again, but when a new pm_ops is registered its pm_disk_mode is selected as default; that way the default stays for ACPI where it is apparently required. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30reiserfs: suppress lockdep warningJeff Mahoney
We're getting lockdep warnings due to a post-2.6.21-rc7 bugfix. The xattr_sem can never be taken in the manner described. Internal inodes are protected by I_PRIVATE. Add the appropriate annotation. Cc: <stable@kernel.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30Extend print_symbol capabilityRobert Peterson
Today's print_symbol function dumps a kernel symbol with printk. This patch extends the functionality of kallsyms.c so that the symbol lookup function may be used without the printk. This is useful for modules that want to dump symbols elsewhere, for example, to debugfs. I intend to use the new function call in the GFS2 file system (which will be a separate patch). [akpm@linux-foundation.org: build fix] [clameter@sgi.com: sprint_symbol should return length of string like sprintf] Signed-off-by: Robert Peterson <rpeterso@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-30[UDP]: Do not allow specific bind when wildcard bind exists.David S. Miller
When allocating local ports, do not allow a bind to a port with a specific local address when a bind to that port with a wildcard local address already exists. Noticed by Linus. Signed-off-by: David S. Miller <davem@davemloft.net>
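The rule can be sketched as a conflict check over the sockets already bound. This is a simplified illustration: `fake_sock`, `FAKE_ADDR_ANY` and `fake_bind_conflict` are hypothetical, and other cases (a new wildcard bind over an existing specific bind, address reuse options) are deliberately left out.

```c
#include <stdint.h>

#define FAKE_ADDR_ANY 0u   /* wildcard local address */

struct fake_sock {
	uint32_t rcv_saddr;   /* bound local address, or FAKE_ADDR_ANY */
	uint16_t port;        /* bound local port */
};

/*
 * Return 1 if binding (addr, port) conflicts with an existing socket:
 * same port, and the existing socket is bound to the wildcard address
 * or to the very same address.
 */
static int fake_bind_conflict(const struct fake_sock *socks, int n,
			      uint32_t addr, uint16_t port)
{
	int i;

	for (i = 0; i < n; i++) {
		if (socks[i].port != port)
			continue;
		if (socks[i].rcv_saddr == FAKE_ADDR_ANY ||
		    socks[i].rcv_saddr == addr)
			return 1;
	}
	return 0;
}
```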
2007-04-30[IPV4] UDP: Fix endianness bugs in hashing changes.David S. Miller
I accidentally applied an earlier version of Eric Dumazet's patch, from March 21st. His version from March 30th didn't have these bugs, so this just interdiffs to the correct patch. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6Linus Torvalds
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (56 commits)
  ieee1394: remove garbage from Kconfig
  ieee1394: more help in Kconfig
  ieee1394: ohci1394: Fix mistake in printk message.
  ieee1394: ohci1394: remove unnecessary rcvPhyPkt bit flipping in LinkControl register
  ieee1394: ohci1394: fix cosmetic problem in error logging
  ieee1394: eth1394: send async streams at S100 on 1394b buses
  ieee1394: eth1394: fix error path in module_init
  ieee1394: eth1394: correct return codes in hard_start_xmit
  ieee1394: eth1394: hard_start_xmit is called in atomic context
  ieee1394: eth1394: some conditions are unlikely
  ieee1394: eth1394: clean up fragment_overlap
  ieee1394: eth1394: don't use alloc_etherdev
  ieee1394: eth1394: omit useless set_mac_address callback
  ieee1394: eth1394: CONFIG_INET is always defined
  ieee1394: eth1394: allow MTU bigger than 1500
  ieee1394: unexport highlevel_host_reset
  ieee1394: eth1394: contain host reset
  ieee1394: eth1394: shorter error messages
  ieee1394: eth1394: correct a memset argument
  ieee1394: eth1394: refactor .probe and .update
  ...
2007-04-30Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hidLinus Torvalds
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid: (21 commits)
  USB HID: don't warn on idVendor == 0
  USB HID: add 'quirks' module parameter
  USB HID: add support for dynamically-created quirks
  USB HID: clarify static quirk handling as squirks
  USB HID: encapsulate quirk handling into hid-quirks.c
  USB HID: EMS USBII device needs HID_QUIRK_MULTI_INPUT
  HID: update copyright and authorship macro
  HID: introduce proper zeroing of unused bits in output reports
  USB HID: add support for WiseGroup MP-8800 Quad Joypad
  USB HID: add FF support for Logitech Force 3D Pro Joystick
  USB HID: numlock quirk for dell W7658 keyboard
  USB HID: Logitech MX3000 keyboard needs report descriptor quirk
  USB HID: extend quirk for Logitech S510 keyboard
  USB HID: usbkbd/usbmouse - handle errors when registering devices
  USB HID: add QUIRK_HIDDEV for Belkin Flip KVM
  HID: enable dead keys on a belkin wireless keyboard
  USB HID: Thustmaster firestorm dual power v1 support
  USB HID: specify explicit size for hid_blacklist.quirks
  USB HID: fix retry & reset logic
  USB HID: consolidate vendor/product ids
  ...
2007-04-30Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
  [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts
  [IPV4] SNMP: Support InMcastPkts and InBcastPkts
  [IPV4] SNMP: Support InTruncatedPkts
  [IPV4] SNMP: Support InNoRoutes
  [SNMP]: Add definitions for {In,Out}BcastPkts
  [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent
  [TCP] FRTO: Delay skb available check until it's mandatory
  [XFRM]: Restrict upper layer information by bundle.
  [TCP]: Catch skb with S+L bugs earlier
  [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo
  [L2TP]: Add the ability to autoload a pppox protocol module.
  [SKB]: Introduce skb_queue_walk_safe()
  [AF_IUCV/IUCV]: smp_call_function deadlock
  [IPV6]: Fix slab corruption running ip6sic
  [TCP]: Update references in two old comments
  [XFRM]: Export SPD info
  [IPV6]: Track device renames in snmp6.
  [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage.
  [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats.
  [NETPOLL]: Remove CONFIG_NETPOLL_RX
  ...
2007-04-30Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
  [PATCH] elevator: elv_list_lock does not need irq disabling
  [BLOCK] Don't pin lots of memory in mempools
  cfq-iosched: speedup cic rb lookup
  ll_rw_blk: add io_context private pointer
  cfq-iosched: get rid of cfqq hash
  cfq-iosched: tighten queue request overlap condition
  cfq-iosched: improve sync vs async workloads
  cfq-iosched: never allow an async queue idling
  cfq-iosched: get rid of ->dispatch_slice
  cfq-iosched: don't pass unused preemption variable around
  cfq-iosched: get rid of ->cur_rr and ->cfq_list
  cfq-iosched: slice offset should take ioprio into account
  [PATCH] cfq-iosched: style cleanups and comments
  cfq-iosched: sort IDLE queues into the rbtree
  cfq-iosched: sort RT queues into the rbtree
  [PATCH] cfq-iosched: speed up rbtree handling
  cfq-iosched: rework the whole round-robin list concept
  cfq-iosched: minor updates
  cfq-iosched: development update
  cfq-iosched: improve preemption for cooperating tasks
2007-04-30Merge branch 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpcLinus Torvalds
* 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (255 commits)
  [POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c
  [POWERPC] remove kernel module option for booke wdt
  [POWERPC] Avoid putting cpu node twice
  [POWERPC] Spinlock initializer cleanup
  [POWERPC] ppc4xx_sgdma needs dma-mapping.h
  [POWERPC] arch/powerpc/sysdev/timer.c build fix
  [POWERPC] get_property cleanups
  [POWERPC] Remove the unused HTDMSOUND driver
  [POWERPC] cell: cbe_cpufreq cleanup and crash fix
  [POWERPC] Declare enable_kernel_spe in a header
  [POWERPC] Add dt_xlate_addr() to bootwrapper
  [POWERPC] bootwrapper: CONFIG_ -> CONFIG_DEVICE_TREE
  [POWERPC] Don't define a custom bd_t for Xilixn Virtex based boards.
  [POWERPC] Add sane defaults for Xilinx EDK generated xparameters files
  [POWERPC] Add uartlite boot console driver for the zImage wrapper
  [POWERPC] Stop using ppc_sys for Xilinx Virtex boards
  [POWERPC] New registration for common Xilinx Virtex ppc405 platform devices
  [POWERPC] Merge common virtex header files
  [POWERPC] Rework Kconfig dependancies for Xilinx Virtex ppc405 platform
  [POWERPC] Clean up cpufreq Kconfig dependencies
  ...
2007-04-30[IPV4] SNMP: Support OutMcastPkts and OutBcastPktsMitsuru Chinen
A transmitted IP multicast datagram should be counted as OutMcastPkts. By the same token, a transmitted IP broadcast datagram should be counted as OutBcastPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] SNMP: Support InMcastPkts and InBcastPktsMitsuru Chinen
A received IP multicast datagram should be counted as InMcastPkts. By the same token, a received IP broadcast datagram should be counted as InBcastPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] SNMP: Support InTruncatedPktsMitsuru Chinen
An IP datagram which is being discarded because the datagram frame didn't carry enough data should be counted as InTruncatedPkts. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[IPV4] SNMP: Support InNoRoutesMitsuru Chinen
An IP datagram which is being discarded because of no routes in the forwarding path should be counted as InNoRoutes. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[SNMP]: Add definitions for {In,Out}BcastPktsMitsuru Chinen
The updated IP-MIB RFC (RFC4293) specifies new objects, InBcastPkts and OutBcastPkts. This adds definitions for them. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[TCP] FRTO: RFC4138 allows Nagle override when new data must be sentIlpo Järvinen
This is a corner case where new data smaller than one MSS is waiting in the send queue. For F-RTO to work correctly, a new data segment must be sent at a certain point, or F-RTO cannot be used at all. RFC4138 allows overriding Nagle at that point. The implementation uses frto_counter states 2 and 3 to distinguish when the Nagle override is needed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[TCP] FRTO: Delay skb available check until it's mandatoryIlpo Järvinen
No new data is needed until the first ACK comes, so no need to check for application limitedness until then. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[XFRM]: Restrict upper layer information by bundle.Masahide NAKAMURA
In MIPv6 usage, XFRM sub policy is enabled. When the main (IPsec) and sub (MIPv6) policy selectors have the same address set but different upper layer information (i.e. protocol number and its ports or type/code), multiple bundles should be created. However, currently the bundle created the first time is reused for all flows covered by this case. It is useful for the bundle to carry the upper layer information so that it can be restructured correctly if it does not match the flow.
1. Bundle was created by two policies: the selector from the other policy is added to xfrm_dst. If the flow does not match the selector, it goes to the slow path to restructure a new bundle by a single policy.
2. Bundle was created by one policy: the flow cache is added to xfrm_dst as the originating one. If the flow does not match the cache, it goes to the slow path to try searching for another policy.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30[TCP]: Catch skb with S+L bugs earlierIlpo Järvinen
SACKED_ACKED and LOST are mutually exclusive with SACK, so having their sum larger than packets_out is a bug with SACK. Eventually these bugs trigger traps in tcp_clean_rtx_queue with SACK, but it's much more informative to do this here. Non-SACK TCP, however, could get more than packets_out duplicate ACKs, each of which increments sacked_out, so it makes sense to do this kind of limiting for non-SACK TCP but not for the SACK-enabled one. Perhaps the author had the opposite in mind but accidentally got the logic the wrong way around? Anyway, the sacked_out incrementer code for non-SACK already deals with this issue before calling sync_left_out, so this trapping can be done unconditionally. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
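The invariant being trapped can be written down as a small check. This is a userspace sketch under assumptions: `fake_tcp_counts` and `fake_sync_left_out` are hypothetical, and an error return stands in for the kernel's trap.

```c
/* Hypothetical subset of the per-connection counters. */
struct fake_tcp_counts {
	unsigned int packets_out;  /* segments sent, not yet cumulatively ACKed */
	unsigned int sacked_out;   /* segments marked SACKED_ACKED */
	unsigned int lost_out;     /* segments marked LOST */
	unsigned int left_out;     /* sacked_out + lost_out */
};

/* Returns 0 on success, -1 if the counters are inconsistent. */
static int fake_sync_left_out(struct fake_tcp_counts *tp)
{
	/* Unconditional check: the sum may never exceed packets_out. */
	if (tp->sacked_out + tp->lost_out > tp->packets_out)
		return -1;
	tp->left_out = tp->sacked_out + tp->lost_out;
	return 0;
}
```

Checking at the point where the sum is computed reports the inconsistency close to its cause rather than later in retransmit queue cleanup.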
2007-04-30[PATCH] INET : IPV4 UDP lookups converted to a 2 pass algoEric Dumazet
Some people want to have many UDP sockets bound to a single port but many different addresses. We currently hash all those sockets into a single chain. Processing of incoming packets is very expensive, because the whole chain must be examined to find the best match. In this patch I chose to hash UDP sockets with a hash function that takes into account both their port number and address. This has a drawback, because we need two lookups: one with a given address, one with a wildcard (null) address. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
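The two-pass idea can be sketched with a small chained hash table. Everything here is hypothetical (`fake_udp_sock`, `fake_hash`, the table size, the hash mix) and only illustrates the structure of the lookup, not the kernel's hash function or matching rules.

```c
#include <stdint.h>
#include <stddef.h>

#define FAKE_HASH_SIZE 128u
#define FAKE_ADDR_ANY  0u

struct fake_udp_sock {
	uint32_t addr;                /* FAKE_ADDR_ANY for a wildcard bind */
	uint16_t port;
	struct fake_udp_sock *next;   /* hash chain */
};

static struct fake_udp_sock *fake_table[FAKE_HASH_SIZE];

/* Hypothetical hash mixing address and port. */
static unsigned int fake_hash(uint32_t addr, uint16_t port)
{
	return ((addr ^ (addr >> 16)) ^ port) & (FAKE_HASH_SIZE - 1);
}

static void fake_insert(struct fake_udp_sock *sk)
{
	unsigned int h = fake_hash(sk->addr, sk->port);

	sk->next = fake_table[h];
	fake_table[h] = sk;
}

/* Search one chain for an exact (addr, port) match. */
static struct fake_udp_sock *fake_find(uint32_t addr, uint16_t port)
{
	struct fake_udp_sock *sk;

	for (sk = fake_table[fake_hash(addr, port)]; sk; sk = sk->next)
		if (sk->addr == addr && sk->port == port)
			return sk;
	return NULL;
}

/* Pass 1: the given address. Pass 2: the wildcard address. */
static struct fake_udp_sock *fake_lookup(uint32_t daddr, uint16_t dport)
{
	struct fake_udp_sock *sk = fake_find(daddr, dport);

	return sk ? sk : fake_find(FAKE_ADDR_ANY, dport);
}
```

The cost is a second chain walk when the first pass misses; the gain is that each chain is short even when many sockets share one port.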
2007-04-30[L2TP]: Add the ability to autoload a pppox protocol module.James Chapman
This patch allows a name "pppox-proto-nnn" to be used in modprobe.conf to autoload a PPPoX protocol nnn. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
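For illustration, an alias of that form could be built from a protocol number as below. `fake_pppox_module_name` is a hypothetical helper; how the kernel itself forms and requests the name is not shown in the commit message and is an assumption here.

```c
#include <stdio.h>

/*
 * Write "pppox-proto-<n>" into buf. Returns the length snprintf()
 * reports, i.e. the length the full string needs.
 */
static int fake_pppox_module_name(char *buf, size_t len, int proto)
{
	return snprintf(buf, len, "pppox-proto-%d", proto);
}
```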
2007-04-30Merge branch 'cfq' into for-linusJens Axboe
2007-04-30[PATCH] elevator: elv_list_lock does not need irq disablingJens Axboe
It's never grabbed from irq context, so just make it plain spin_lock(). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30[BLOCK] Don't pin lots of memory in mempoolsJens Axboe
Currently we scale the mempool sizes depending on memory installed in the machine, except for the bio pool itself which sits at a fixed 256 entry pre-allocation. There's really no point in "optimizing" this OOM path; we just need enough preallocated to make progress. A single unit is enough; let's scale it down to 2 just to be on the safe side. This patch saves ~150kb of pinned kernel memory on a 32-bit box. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30[SKB]: Introduce skb_queue_walk_safe()James Chapman
This patch provides a method for walking skb lists while inserting or removing skbs from the list. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
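The pattern behind a "safe" walk is to fetch the successor before the loop body runs, so the body may unlink the current element. A userspace sketch on a circular doubly linked list follows; `fake_node`, `fake_walk_safe` and the helpers are hypothetical and are not the skb list API.

```c
#include <stddef.h>

/* Hypothetical minimal node; the list head is a sentinel of the same type. */
struct fake_node {
	struct fake_node *next, *prev;
	int value;
};

/* Fetch the successor before the body runs, so the body may unlink pos. */
#define fake_walk_safe(head, pos, tmp)                   \
	for ((pos) = (head)->next, (tmp) = (pos)->next;  \
	     (pos) != (head);                            \
	     (pos) = (tmp), (tmp) = (pos)->next)

static void fake_list_init(struct fake_node *head)
{
	head->next = head->prev = head;
}

static void fake_add_tail(struct fake_node *head, struct fake_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void fake_unlink(struct fake_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

/* Remove every node with an odd value while walking; return how many. */
static int fake_drop_odd(struct fake_node *head)
{
	struct fake_node *pos, *tmp;
	int removed = 0;

	fake_walk_safe(head, pos, tmp) {
		if (pos->value & 1) {
			fake_unlink(pos);
			removed++;
		}
	}
	return removed;
}
```

A plain walk that advances with `pos = pos->next` would dereference the cleared pointers of the node just unlinked; the saved `tmp` avoids that.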
2007-04-30cfq-iosched: speedup cic rb lookupJens Axboe
We often lookup the same queue many times in succession, so cache the last looked up queue to avoid browsing the rbtree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
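A one-entry lookup cache of this kind can be sketched as follows. The types and names (`fake_cic`, `fake_ioc`, `fake_cic_lookup`) are hypothetical, and a linear array stands in for the rbtree so the example stays short.

```c
#include <stddef.h>

struct fake_cic {
	const void *key;   /* identifies the queue this context belongs to */
};

struct fake_ioc {
	struct fake_cic *entries;   /* stand-in for the per-ioc rbtree */
	int n;
	struct fake_cic *last;      /* most recently looked-up entry */
	int slow_lookups;           /* number of times the full search ran */
};

static struct fake_cic *fake_cic_lookup(struct fake_ioc *ioc, const void *key)
{
	int i;

	/* Fast path: same key as the previous lookup. */
	if (ioc->last && ioc->last->key == key)
		return ioc->last;

	ioc->slow_lookups++;
	for (i = 0; i < ioc->n; i++) {
		if (ioc->entries[i].key == key) {
			ioc->last = &ioc->entries[i];   /* refresh the cache */
			return ioc->last;
		}
	}
	return NULL;
}
```

Any code that removes an entry would also have to clear `last` if it points at that entry; the sketch omits removal.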
2007-04-30ll_rw_blk: add io_context private pointerJens Axboe
To be used by as/cfq as they see fit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: get rid of cfqq hashVasily Tarasov
The cfq hash is no longer necessary. We can always get the cfqq from the io context. The cfq_get_io_context_noalloc() function is introduced, because we don't want to allocate a cic on merging and checking may_queue. In order to identify sync queue we've used hash key = CFQ_KEY_ASYNC. Since the hash is eliminated we need to use another criterion: a sync flag for the queue is added. In all places where we dig in the rb_tree we're in current context, so no additional locking is required. Advantages of this patch: no additional memory for the hash, no seeking in the hash, code is cleaner. It is now necessary to seek the cic in the per-ioc rbtree, but that is faster:
- most processes work only with few devices
- most systems have only few block devices
- it is a rb-tree
Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
Changes by me:
- Merge into CFQ devel branch
- Get rid of cfq_get_io_context_noalloc()
- Fix various bugs with dereferencing cic->cfqq[] with offset other than 0 or 1.
- Fix bug in cfqq setup, is_sync condition was reversed.
- Fix bug where only bio_sync() is used, we need to check for a READ too
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: tighten queue request overlap conditionJens Axboe
For tagged devices, allow overlap of requests if the idle window isn't enabled on the current active queue. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: improve sync vs async workloadsJens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: never allow an async queue idlingJens Axboe
We don't enable it by default, don't let it get enabled during runtime. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: get rid of ->dispatch_sliceJens Axboe
We can track it fairly accurately locally, let the slice handling take care of the rest. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: don't pass unused preemption variable aroundJens Axboe
We don't use it anymore in the slice expiry handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: get rid of ->cur_rr and ->cfq_listJens Axboe
It's only used for preemption now that the IDLE and RT queues also use the rbtree. If we pass an 'add_front' variable to cfq_service_tree_add(), we can set ->rb_key to 0 to force insertion at the front of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: slice offset should take ioprio into accountJens Axboe
Use the max_slice-cur_slice as the multiplier for the insertion offset. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30[PATCH] cfq-iosched: style cleanups and commentsJens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: sort IDLE queues into the rbtreeJens Axboe
Same treatment as the RT conversion, just put the sorted idle branch at the end of the tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30cfq-iosched: sort RT queues into the rbtreeJens Axboe
Currently CFQ does a linked insert into the current list for RT queues. We can just factor the class into the rb insertion, and then we don't have to treat RT queues in a special way. It's faster, too. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-04-30[PATCH] cfq-iosched: speed up rbtree handlingJens Axboe
For cases where the rbtree is mainly used for sorting and min retrieval, a nice speedup of the rbtree code is to maintain a cache of the leftmost node in the tree. Also spotted in the CFS CPU scheduler code. Improved by Alan D. Brunelle <Alan.Brunelle@hp.com> by updating the leftmost hint in cfq_rb_first() if it isn't set, instead of only updating it on insert. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
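The leftmost-node cache, including the lazy refresh in the "first" accessor, can be sketched on a plain unbalanced binary search tree. The names (`fake_rb_node`, `fake_rb_root`, `fake_rb_insert`, `fake_rb_first`) are hypothetical and rebalancing is omitted; only the hint handling is the point.

```c
#include <stddef.h>

struct fake_rb_node {
	struct fake_rb_node *left, *right;
	unsigned long key;
};

struct fake_rb_root {
	struct fake_rb_node *node;
	struct fake_rb_node *leftmost;   /* cached minimum, NULL if unknown */
};

static void fake_rb_insert(struct fake_rb_root *root, struct fake_rb_node *n)
{
	struct fake_rb_node **p = &root->node;
	int is_leftmost = 1;

	n->left = n->right = NULL;
	while (*p) {
		if (n->key < (*p)->key) {
			p = &(*p)->left;
		} else {
			p = &(*p)->right;
			is_leftmost = 0;   /* went right at least once */
		}
	}
	*p = n;
	if (is_leftmost)
		root->leftmost = n;   /* new minimum: update the hint */
}

static struct fake_rb_node *fake_rb_first(struct fake_rb_root *root)
{
	struct fake_rb_node *n;

	/* Recompute the hint lazily if it is not set. */
	if (!root->leftmost) {
		for (n = root->node; n && n->left; n = n->left)
			;
		root->leftmost = n;
	}
	return root->leftmost;
}
```

With the hint in place, retrieving the minimum is a pointer read in the common case instead of a walk down the left spine. Code that erases the cached node would need to clear or advance the hint; the sketch leaves erase out.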
2007-04-30cfq-iosched: rework the whole round-robin list conceptJens Axboe
Drawing on some inspiration from the CFS CPU scheduler design, overhaul the pending cfq_queue concept list management. Currently CFQ uses a doubly linked list per priority level for sorting and service uses. Kill those lists and maintain an rbtree of cfq_queue's, sorted by when to service them. This unfortunately means that the ionice levels aren't as strong anymore; will work on improving those later. We only scale the slice time now, not the number of times we service. This means that latency is better (for all priority levels), but that the distinction between the highest and lower levels isn't as big. The diffstat speaks for itself.
cfq-iosched.c | 363 +++++++++++++++++---------------------------------
1 file changed, 125 insertions(+), 238 deletions(-)
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>