summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)Author
2007-05-14[NET_SCHED]: prio qdisc boundary conditionJamal Hadi Salim
This fixes an out-of-boundary condition when the classified band equals q->bands. Caught by Alexey Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NET_SCHED]: Avoid requeue warning on dev_deactivateHerbert Xu
When we relinquish queue_lock in qdisc_restart and then retake it for requeueing, we might race against dev_deactivate and end up requeueing onto noop_qdisc. This causes a warning to be printed. This patch fixes this by checking this before we requeue. As an added bonus, we can remove the same check in __qdisc_run which was added to prevent dev->gso_skb from being requeued when we're shutting down. Even though we've had to add a new conditional in its place, it's better because it only happens on requeues rather than every single time that qdisc_run is called. For this to work we also need to move the clearing of gso_skb up in dev_deactivate as now qdisc_restart can occur even after we wait for __LINK_STATE_QDISC_RUNNING to clear (but it won't do anything as long as the queue and gso_skb is already clear). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NET_SCHED]: Reread dev->qdisc for NETDEV_TX_OKHerbert Xu
Now that we return the queue length after NETDEV_TX_OK we better make sure that we have the right queue. Otherwise we can cause a stall after a really quick dev_deactive/dev_activate. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NET_SCHED]: Rationalise return value of qdisc_restartHerbert Xu
The current return value scheme and associated comment was invented back in the 20th century when we still had that tbusy flag. Things have changed quite a bit since then (even Tony Blair is moving on now, not to mention the new French president). All we need to indicate now is whether the caller should continue processing the queue. Therefore it's sufficient if we return 0 if we want to stop and non-zero otherwise. This is based on a patch by Krishna Kumar. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NET]: Fix dev->qdisc race for NETDEV_TX_LOCKED caseThomas Graf
When transmit fails with NETDEV_TX_LOCKED the skb is requeued to dev->qdisc again. The dev->qdisc pointer is protected by the queue lock which needs to be dropped when attempting to transmit and acquired again before requeing. The problem is that qdisc_restart() fetches the dev->qdisc pointer once and stores it in the `q' variable which is invalidated when dropping the queue_lock, therefore the variable needs to be refreshed before requeueing. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10[NET_SCHED]: teql_enqueue can check limits before skb enqueueKrishna Kumar
Optimize teql_enqueue so that it first checks limits before enqueing. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03[NET]: Rework dev_base via list_head (v3)Pavel Emelianov
Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET]: cleanup extra semicolonsStephen Hemminger
Spring cleaning time... There seems to be a lot of places in the network code that have extra bogus semicolons after conditionals. Most commonly is a bogus semicolon after: switch() { } Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: ingress: switch back to using ingress_lockPatrick McHardy
Switch ingress queueing back to use ingress_lock. qdisc_lock_tree now locks both the ingress and egress qdiscs on the device. All changes to data that might be used on both ingress and egress needs to be protected by using qdisc_lock_tree instead of manually taking dev->queue_lock. Additionally the qdisc stats_lock needs to be initialized to ingress_lock for ingress qdiscs. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: Eliminate qdisc_tree_lockPatrick McHardy
Since we're now holding the rtnl during the entire dump operation, we can remove qdisc_tree_lock, whose only purpose is to protect dump callbacks from concurrent changes to the qdisc tree. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: qdisc: remove unnecessary memory barriersPatrick McHardy
We're holding dev->queue_lock in qdisc_watchdog_schedule and qdisc_watchdog_cancel, no need for the barriers. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: Unline tcf_destroyPatrick McHardy
Uninline tcf_destroy and add a helper function to destroy an entire filter chain. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: turn PSCHED_GET_TIME into inline functionPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: turn PSCHED_TDIFF_SAFE into inline functionPatrick McHardy
Also rename to psched_tdiff_bounded. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: kill PSCHED_TDIFFPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: kill PSCHED_SET_PASTPERFECT/PSCHED_IS_PASTPERFECTPatrick McHardy
Use direct assignment and comparison instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: kill PSCHED_TLESSPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: kill PSCHED_TADD/PSCHED_TADD2Patrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: kill PSCHED_AUDIT_TDIFFPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_netem: fix off-by-one in send time comparisonPatrick McHardy
netem checks PSCHED_TLESS(cb->time_to_send, now) to find out whether it is allowed to send a packet, which is equivalent to cb->time_to_send < now. Use !PSCHED_TLESS(now, cb->time_to_send) instead to properly handle cb->time_to_send == now. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETEM]: spelling errorsStephen Hemminger
Get rid of some of my creative spelling. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED] qdisc: avoid transmit softirq on watchdog wakeupStephen Hemminger
If possible, avoid having to do a transmit softirq when a qdisc watchdog decides to re-enable. The watchdog routine runs off a timer, so it is already in the same effective context as the softirq. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETEM]: avoid excessive requeuesStephen Hemminger
The netem code would call getnstimeofday() and dequeue/requeue after every packet, even if it was waiting. Avoid this overhead by using the throttled flag. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETEM]: Optimize tfifoStephen Hemminger
In most cases, the next packet will be sent after the last one. So optimize that case. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETEM]: use better types for time valuesStephen Hemminger
The random number generator always generates 32 bit values. The time values are limited by psched_tdiff_t Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETEM]: report reorder percent correctly.Stephen Hemminger
If you setup netem to just delay packets; "tc qdisc ls" will report the reordering as 100%. Well it's a lie, reorder isn't used unless gap is set, so just set value to 0 so the output of utility is correct. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[PKT_SCHED] act: Use rtnl registration interfaceThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[PKT_SCHED] cls: Use rtnl registration interfaceThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[PKT_SCHED] qdisc: Use rtnl registration interfaceThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NETLINK]: Use nlmsg_trim() where appropriateArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Convert skb->tail to sk_buff_data_tArnaldo Carvalho de Melo
So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: Fix warningPatrick McHardy
net/sched/sch_api.c: In function 'psched_show': net/sched/sch_api.c:1219: warning: format '%08x' expects type 'unsigned int', but argument 6 has type 's64' Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_cbq: fix watchdog scheduled too latePatrick McHardy
q->now is increased during dequeue and doesn't contain the current time afterwards, resulting in a too large timeout value for the qdisc watchdog. Use "now" instead, which still contains the current time. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: Export real timer resolution in /proc/net/pschedPatrick McHardy
The timer resolution exported in /proc/net/psched is used by userspace to calculate HTB's burst values. Currently it is set to HZ, since we're now using hrtimers, use KTIME_MONOTONIC_RES, which makes HTB use smaller burst values. This patch also affects libnl, which incorrectly uses this value for the SFQ perturbation parameter, which is always in seconds, and some routing cache values, which are in USER_HZ, so both cases are broken anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: kill jiffie conversion macrosPatrick McHardy
Now that all packet schedulers have been converted to hrtimers most users of PSCHED_JIFFIE2US and PSCHED_US2JIFFIE are gone. The remaining users use it to convert external time units to packet scheduler clock ticks, so use PSCHED_TICKS_PER_SEC instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_htb: use hrtimer based watchdogPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_cbq: use hrtimer for delay_timerPatrick McHardy
Switch delay_timer to hrtimer. The class penalty parameter is changed to use psched ticks as units. Since iproute never supported using this and the only existing user (libnl) incorrectly assumes psched ticks as units anyway, this shouldn't break anything. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_cbq: fix cbq_undelay_prio for non-active prioritesPatrick McHardy
cbq_undelay_prio is supposed to return a time delta, but returns the current time for non-active priorities, causing cbq_undelay to mark the priority as active and schedule a timer for twice the current time. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_cbq: use hrtimer based watchdogPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_netem: use hrtimer based watchdogPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_tbf: use hrtimer based watchdogPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: sch_hfsc: use hrtimer based watchdogPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: Add hrtimer based qdisc watchdogPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET_SCHED]: Use ktime as clocksourcePatrick McHardy
Get rid of the manual clock source selection mess and use ktime. Also use a scalar representation, which allows to clean up pkt_sched.h a bit more and results in less ktime_to_ns() calls in most cases. The PSCHED_US2JIFFIE/PSCHED_JIFFIE2US macros are implemented quite inefficient by this patch, following patches will convert all qdiscs to hrtimers and get rid of them entirely. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6hArnaldo Carvalho de Melo
Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iphArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce skb_network_header()Arnaldo Carvalho de Melo
For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce skb_network_offset()Arnaldo Carvalho de Melo
For the quite common 'skb->nh.raw - skb->data' sequence. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[NET] SCHED: Use htons() where appropriate.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-09[NET_SCHED]: cls_tcindex: fix compatibility breakagePatrick McHardy
Userspace uses an integer for TCA_TCINDEX_SHIFT, the kernel was changed to expect and use a u16 value in 2.6.11, which broke compatibility on big endian machines. Change back to use int. Reported by Ole Reinartz <ole.reinartz@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>