summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2015-10-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/asix_common.c net/ipv4/inet_connection_sock.c net/switchdev/switchdev.c In the inet_connection_sock.c case the request socket hashing scheme is completely different in net-next. The other two conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Account for extra headroom in ath9k driver, from Felix Fietkau. 2) Fix OOPS in pppoe driver due to incorrect socket state transition, from Guillaume Nault. 3) Kill memory leak in amd-xgbe debugfx, from Geliang Tang. 4) Power management fixes for iwlwifi, from Johannes Berg. 5) Fix races in reqsk_queue_unlink(), from Eric Dumazet. 6) Fix dst_entry usage in ARP replies, from Jiri Benc. 7) Cure OOPSes with SO_GET_FILTER, from Daniel Borkmann. 8) Missing allocation failure check in amd-xgbe, from Tom Lendacky. 9) Various resource allocation/freeing cures in DSA< from Neil Armstrong. 10) A series of bug fixes in the openvswitch conntrack support, from Joe Stringer. 11) Fix two cases (BPF and act_mirred) where we have to clean the sender cpu stored in the SKB before transmitting. From WANG Cong and Alexei Starovoitov. 12) Disable VLAN filtering in promiscuous mode in mlx5 driver, from Achiad Shochat. 13) Older bnx2x chips cannot do 4-tuple UDP hashing, so prevent this configuration via ethtool. From Yuval Mintz. 14) Don't call rt6_uncached_list_flush_dev() from rt6_ifdown() when 'dev' is NULL, from Eric Biederman. 15) Prevent stalled link synchronization in tipc, from Jon Paul Maloy. 16) kcalloc() gstrings ethtool buffer before having driver fill it in, in order to prevent kernel memory leaking. From Joe Perches. 17) Fix mixxing rt6_info initialization for blackhole routes, from Martin KaFai Lau. 18) Kill VLAN regression in via-rhine, from Andrej Ota. 19) Missing pfmemalloc check in sk_add_backlog(), from Eric Dumazet. 20) Fix spurious MSG_TRUNC signalling in netlink dumps, from Ronen Arad. 21) Scrube SKBs when pushing them between namespaces in openvswitch, from Joe Stringer. 22) bcmgenet enables link interrupts too early, fix from Florian Fainelli. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits) net: bcmgenet: Fix early link interrupt enabling tunnels: Don't require remote endpoint or ID during creation. openvswitch: Scrub skb between namespaces xen-netback: correctly check failed allocation net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter netlink: Trim skb to alloc size to avoid MSG_TRUNC net: add pfmemalloc check in sk_add_backlog() via-rhine: fix VLAN receive handling regression. ipv6: Initialize rt6_info properly in ip6_blackhole_route() ipv6: Move common init code for rt6_info to a new function rt6_info_init() Bluetooth: Fix initializing conn_params in scan phase Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup Bluetooth: Fix remove_device behavior for explicit connects Bluetooth: Fix LE reconnection logic Bluetooth: Fix reference counting for LE-scan based connections Bluetooth: Fix double scan updates mlxsw: core: Fix race condition in __mlxsw_emad_transmit tipc: move fragment importance field to new header position ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings tipc: eliminate risk of stalled link synchronization ...
2015-10-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. Most relevantly, updates for the nfnetlink_log to integrate with conntrack, fixes for cttimeout and improvements for nf_queue core, they are: 1) Remove useless ifdef around static inline function in IPVS, from Eric W. Biederman. 2) Simplify the conntrack support for nfnetlink_queue: Merge nfnetlink_queue_ct.c file into nfnetlink_queue_core.c, then rename it back to nfnetlink_queue.c 3) Use y2038 safe timestamp from nfnetlink_queue. 4) Get rid of dead function definition in nf_conntrack, from Flavio Leitner. 5) Attach conntrack support for nfnetlink_log.c, from Ken-ichirou MATSUZAWA. This adds a new NETFILTER_NETLINK_GLUE_CT Kconfig switch that controls enabling both nfqueue and nflog integration with conntrack. The userspace application can request this via NFULNL_CFG_F_CONNTRACK configuration flag. 6) Remove unused netns variables in IPVS, from Eric W. Biederman and Simon Horman. 7) Don't put back the refcount on the cttimeout object from xt_CT on success. 8) Fix crash on cttimeout policy object removal. We have to flush out the cttimeout extension area of the conntrack not to refer to an unexisting object that was just removed. 9) Make sure rcu_callback completion before removing nfnetlink_cttimeout module removal. 10) Fix compilation warning in br_netfilter when no nf_defrag_ipv4 and nf_defrag_ipv6 are enabled. Patch from Arnd Bergmann. 11) Autoload ctnetlink dependencies when NFULNL_CFG_F_CONNTRACK is requested. Again from Ken-ichirou MATSUZAWA. 12) Don't use pointer to previous hook when reinjecting traffic via nf_queue with NF_REPEAT verdict since it may be already gone. This also avoids a deadloop if the userspace application keeps returning NF_REPEAT. 13) A bunch of cleanups for netfilter IPv4 and IPv6 code from Ian Morris. 14) Consolidate logger instance existence check in nfulnl_recv_config(). 15) Fix broken atomicity when applying configuration updates to logger instances in nfnetlink_log. 16) Get rid of the .owner attribute in our hook object. We don't need this anymore since we're dropping pending packets that have escaped from the kernel when unremoving the hook. Patch from Florian Westphal. 17) Remove unnecessary rcu_read_lock() from nf_reinject code, we always assume RCU read side lock from .call_rcu in nfnetlink. Also from Florian. 18) Use static inline function instead of macros to define NF_HOOK() and NF_HOOK_COND() when no netfilter support in on, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18uapi: add mpls_iptunnel.hstephen hemminger
Add missing rule to export mpls iptunnel header needed by iproute2 Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18tcp: do not set queue_mapping on SYNACKEric Dumazet
At the time of commit fff326990789 ("tcp: reflect SYN queue_mapping into SYNACK packets") we had little ways to cope with SYN floods. We no longer need to reflect incoming skb queue mappings, and instead can pick a TX queue based on cpu cooking the SYNACK, with normal XPS affinities. Note that all SYNACK retransmits were picking TX queue 0, this no longer is a win given that SYNACK rtx are now distributed on all cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-17Merge branch 'master' of ↵Pablo Neira Ayuso
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next This merge resolves conflicts with 75aec9df3a78 ("bridge: Remove br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve netns support in the network stack that reached upstream via David's net-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/bridge/br_netfilter_hooks.c
2015-10-17net: add pfmemalloc check in sk_add_backlog()Eric Dumazet
Greg reported crashes hitting the following check in __sk_backlog_rcv() BUG_ON(!sock_flag(sk, SOCK_MEMALLOC)); The pfmemalloc bit is currently checked in sk_filter(). This works correctly for TCP, because sk_filter() is ran in tcp_v[46]_rcv() before hitting the prequeue or backlog checks. For UDP or other protocols, this does not work, because the sk_filter() is ran from sock_queue_rcv_skb(), which might be called _after_ backlog queuing if socket is owned by user by the time packet is processed by softirq handler. Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Greg Thelen <gthelen@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Nothing too crazy or exciting: - two MAINTAINERS entries that I didn't see the point in delaying. - one drm mst fix to stop sending uninitialised data to monitors - two amdgpu fixes - one radeon mst tiling fix - one vmwgfx regression fix - one virtio warning fix. I have found one locking problem that needs a bit of reorg to fix, but I'm not sure it's worth putting in -fixes as I don't think we've seen it hit in the real world ever, I just found it using the virtio-gpu driver when working on it. I'll possibly send it next week once I've time to discuss with Daniel" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/virtio: use %llu format string form atomic64_t MAINTAINERS: Add myself as maintainer for the gma500 driver MAINTAINERS: add a maintainer for the atmel-hlcdc DRM driver drm/amdgpu: Keep the pflip interrupts always enabled v7 drm/amdgpu: adjust default dispclk (v2) drm/dp/mst: make mst i2c transfer code more robust. drm/radeon: attach tile property to mst connector drm/vmwgfx: Fix kernel NULL pointer dereference on older hardware
2015-10-16netfilter: turn NF_HOOK into an inline functionArnd Bergmann
A recent change to the dst_output handling caused a new warning when the call to NF_HOOK() is the only used of a local variable passed as 'dev', and CONFIG_NETFILTER is disabled: net/ipv6/ip6_output.c: In function 'ip6_output': net/ipv6/ip6_output.c:135:21: warning: unused variable 'dev' [-Wunused-variable] The reason for this is that the NF_HOOK macro in this case does not reference the variable at all, and the call to dev_net(dev) got removed from the ip6_output function. To avoid that warning now and in the future, this changes the macro into an equivalent inline function, which tells the compiler that the variable is passed correctly but still unused. The dn_forward function apparently had the same problem in the past and added a local workaround that no longer works with the inline function. In order to avoid a regression, we have to also remove the #ifdef from decnet in the same patch. Fixes: ede2059dbaf9 ("dst: Pass net into dst->output") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16netfilter: make nf_queue_entry_get_refs return voidFlorian Westphal
We don't care if module is being unloaded anymore since hook unregister handling will destroy queue entries using that hook. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16netfilter: remove hook owner refcountingFlorian Westphal
since commit 8405a8fff3f8 ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook") all pending queued entries are discarded. So we can simply remove all of the owner handling -- when module is removed it also needs to unregister all its hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-16net: introduce pre-change upper device notifierJiri Pirko
This newly introduced netdevice notifier is called before actual change upper happens. That provides a possibility for notifier handlers to know upper change will happen and react to it, including possibility to forbid the change. That is valuable for drivers which can check if the upper device linkage is supported and forbid that in case it is not. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16tcp/dccp: fix race at listener dismantle phaseEric Dumazet
Under stress, a close() on a listener can trigger the WARN_ON(sk->sk_ack_backlog) in inet_csk_listen_stop() We need to test if listener is still active before queueing a child in inet_csk_reqsk_queue_add() Create a common inet_child_forget() helper, and use it from inet_csk_reqsk_queue_add() and inet_csk_listen_stop() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-16tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helperEric Dumazet
Let's reduce the confusion about inet_csk_reqsk_queue_drop() : In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity. Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15switchdev: introduce possibility to defer obj_add/delJiri Pirko
Similar to the attr usecase, the caller knows if he is holding RTNL and is in atomic section. So let the called to decide the correct call variant. This allows drivers to sleep inside their ops and wait for hw to get the operation status. Then the status is propagated into switchdev core. This avoids silent errors in drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15switchdev: remove pointers from switchdev objectsJiri Pirko
When object is used in deferred work, we cannot use pointers in switchdev object structures because the memory they point at may be already used by someone else. So rather do local copy of the value. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15switchdev: allow caller to explicitly request attr_set as deferredJiri Pirko
Caller should know if he can call attr_set directly (when holding RTNL) or if he has to defer the att_set processing for later. This also allows drivers to sleep inside attr_set and report operation status back to switchdev core. Switchdev core then warns if status is not ok, instead of silent errors happening in drivers. Benefit from newly introduced switchdev deferred ops infrastructure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15switchdev: make struct switchdev_attr parameter const for attr_set callsJiri Pirko
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15switchdev: introduce switchdev deferred ops infrastructureJiri Pirko
Introduce infrastructure which will be used internally to defer ops. Note that the deferred ops are queued up and either are processed by scheduled work or explicitly by user calling deferred_process function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14net/mlx4_core: Replace VF zero mac with random mac in mlx4_coreJack Morgenstein
By design, when no default MAC addresses are set in the Hypervisor for VFs, the VFs are passed zero-macs. When such a MAC is received by the VF, it generates a random MAC address and registers that MAC address with the Hypervisor. This random mac generation is currently done in the mlx4_en module. There is a problem, though, if the mlx4_ib module is loaded by a VF before the mlx4_en module. In this case, for RoCE, mlx4_ib will see the un-replaced zero-mac and register that zero-mac as part of QP1 initialization. Having a zero-mac in the port's MAC table creates problems for a Baseboard Management Console. The BMC occasionally sends packets with a zero-mac destination MAC. If there is a zero-mac present in the port's MAC table, the FW will send such BMC packets to the host driver rather than to the wire, and BMC will stop working. To address this problem, we move the replacement of zero-mac addresses with random-mac addresses to procedure mlx4_slave_cap(), which is part of the driver startup for VFs, and is before activation of mlx4_ib and mlx4_en. As a result, zero-mac addresses will never be registered in the port MAC table by the driver. In addition, when mlx4_en does initialize the net device, it needs to set the NET_ADDR_RANDOM flag in the netdev structure if the address was randomly generated. This is done so that udev on the VM does not create a new device name after each VF probe (VM boot and such). To accomplish this, we add a per-port flag in mlx4_dev which gets set whenever mlx4_core replaces a zero-mac with a randomly-generated mac. This flag is examined when mlx4_en initializes the net-device. Fix was suggested by Matan Barak <matanb@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14net/mlx5_core: Wait for FW readiness on startupEli Cohen
On device initialization, wait till firmware indicates that that it is done with initialization before proceeding to initialize the device. Also update initialization segment layout to match driver/firmware interface definitions. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14net/mlx5_core: Add pci error handlers to mlx5_core driverMajd Dibbiny
This patch implement the pci_error_handlers for mlx5_core which allow the driver to recover from PCI error. Once an error is detected in the PCI, the mlx5_pci_err_detected is called and it: 1) Marks the device to be in 'Internal Error' state. 2) Dispatches an event to the mlx5_ib to flush all the outstanding cqes with error. 3) Returns all the on going commands with error. 4) Unloads the driver. Afterwards, the FW is reset and mlx5_pci_slot_reset is called and it enables the device and restore it's pci state. If the later succeeds, mlx5_pci_resume is called, and it loads the SW stack. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14net/mlx5_core: Fix internal error detection conditionsEli Cohen
The detection of a fatal condition has been updated to take into account the state reported by the device or by detecting an all ones read of the firmware version which indicates that the device is not accessible. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14tcp: avoid spurious SYN flood detection at listen() timeEric Dumazet
At listen() time, there is a small window where listener is visible with a zero backlog, triggering a spurious "Possible SYN flooding on port" message. Nothing prevents us from setting the correct backlog. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14Merge tag 'linux-can-next-for-4.4-20151013' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2015-09-17 this is a pull request of 4 patches for net-next/master. Two patches are by Gerhard Bertelsmann, fixing some problems in the sun4i driver. The patch by Arnd Bergmann stops using timeval for the CAN broadcast manager. The last patch by Alexandre Belloni removes the otherwise unused struct at91_can_data from the driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15drm/dp/mst: make mst i2c transfer code more robust.Dave Airlie
This zeroes the msg so no random stack data ends up getting sent, it also limits the function to not accepting > 4 i2c msgs. Cc: stable@vger.kernel.org Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-14Revert "ipv4/icmp: redirect messages can use the ingress daddr as source"Paolo Abeni
Revert the commit e2ca690b657f ("ipv4/icmp: redirect messages can use the ingress daddr as source"), which tried to introduce a more suitable behaviour for ICMP redirect messages generated by VRRP routers. However RFC 5798 section 8.1.1 states: The IPv4 source address of an ICMP redirect should be the address that the end-host used when making its next-hop routing decision. while said commit used the generating packet destination address, which do not match the above and in most cases leads to no redirect packets to be generated. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-13can: at91: remove at91_can_dataAlexandre Belloni
struct at91_can_data was used to pass a callback to the driver, allowing it to switch the transceiver on and off. As all at91 boards are now using DT, this is not used anymore, remove that structure. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-10-13can: avoid using timeval for uapiArnd Bergmann
The can subsystem communicates with user space using a bcm_msg_head header, which contains two timestamps. This is problematic for multiple reasons: a) The structure layout is currently incompatible between 64-bit user space and 32-bit user space, and cannot work in compat mode (other than x32). b) The timeval structure layout will change in 32-bit user space when we fix the y2038 overflow problem by redefining time_t to 64-bit, making new 32-bit user space incompatible with the current kernel interface. Cars last a long time and often use old kernels, so the actual users of this code are the most likely ones to migrate to y2038 safe user space. This tries to work around part of the problem by changing the publicly visible user interface in the header, but not the binary interface. Fortunately, the values passed around in the structure are relative times and do not actually suffer from the y2038 overflow, so 32-bit is enough here. We replace the use of 'struct timeval' with a newly defined 'struct bcm_timeval' that uses the exact same binary layout as before and that still suffers from problem a) but not problem b). The downside of this approach is that any user space program that currently assigns a timeval structure to these members rather than writing the tv_sec/tv_usec portions individually will suffer a compile-time error when built with an updated kernel header. Fixing this error makes it work fine with old and new headers though. We could address problem a) by using '__u32' or 'int' members rather than 'long', but that would have a more significant downside in also breaking support for all existing 64-bit user binaries that might be using this interface, which is likely not acceptable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-can@vger.kernel.org Cc: linux-api@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-10-13net: Add IPv6 support to l3mdevDavid Ahern
Add operations to retrieve cached IPv6 dst entry from l3mdev device and lookup IPv6 source address. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12ipv6: Pass struct net into nf_ct_frag6_gatherEric W. Biederman
The function nf_ct_frag6_gather is called on both the input and the output paths of the networking stack. In particular ipv6_defrag which calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain on input and the LOCAL_OUT chain on output. The addition of a net parameter makes it explicit which network namespace the packets are being reassembled in, and removes the need for nf_ct_frag6_gather to guess. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12ipv4: Pass struct net into ip_defrag and ip_check_defragEric W. Biederman
The function ip_defrag is called on both the input and the output paths of the networking stack. In particular conntrack when it is tracking outbound packets from the local machine calls ip_defrag. So add a struct net parameter and stop making ip_defrag guess which network namespace it needs to defragment packets in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12rtnetlink: fix gcc -Wconversion warningArad, Ronen
RTA_ALIGNTO is currently define as 4. It has to be 4U to prevent warning for RTA_ALIGN and RTA_DATA expansions when -Wconversion gcc option is enabled. This follows NLMSG_ALIGNTO definition in <include/uapi/linux/netlink.h>. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12Merge tag 'wireless-drivers-next-for-davem-2015-10-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== Major changes: iwlwifi * some debugfs improvements * fix signedness in beacon statistics * deinline some functions to reduce size when device tracing is enabled * filter beacons out in AP mode when no stations are associated * deprecate firmwares version -12 * fix a runtime PM vs. legacy suspend race * one-liner fix for a ToF bug * clean-ups in the rx code * small debugging improvement * fix WoWLAN with new firmware versions * more clean-ups towards multiple RX queues; * some rate scaling fixes and improvements; * some time-of-flight fixes; * other generic improvements and clean-ups; brcmfmac * rework code dealing with multiple interfaces * allow logging firmware console using debug level * support for BCM4350, BCM4365, and BCM4366 PCIE devices * fixed for legacy P2P and P2P device handling * correct set and get tx-power ath9k * add support for Outside Context of a BSS (OCB) mode mwifiex * add USB multichannel feature ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12ipv4/icmp: redirect messages can use the ingress daddr as sourcePaolo Abeni
This patch allows configuring how the source address of ICMP redirect messages is selected; by default the old behaviour is retained, while setting icmp_redirects_use_orig_daddr force the usage of the destination address of the packet that caused the redirect. The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the following scenario: Two machines are set up with VRRP to act as routers out of a subnet, they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to x.x.x.254/24. If a host in said subnet needs to get an ICMP redirect from the VRRP router, i.e. to reach a destination behind a different gateway, the source IP in the ICMP redirect is chosen as the primary IP on the interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2. The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2, and will continue to use the wrong next-op. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12tcp: shrink tcp_timewait_sock by 8 bytesEric Dumazet
Reducing tcp_timewait_sock from 280 bytes to 272 bytes allows SLAB to pack 15 objects per page instead of 14 (on x86) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12net: shrink struct sock and request_sock by 8 bytesEric Dumazet
One 32bit hole is following skc_refcnt, use it. skc_incoming_cpu can also be an union for request_sock rcv_wnd. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12net: align sk_refcnt on 128 bytes boundaryEric Dumazet
sk->sk_refcnt is dirtied for every TCP/UDP incoming packet. This is a performance issue if multiple cpus hit a common socket, or multiple sockets are chained due to SO_REUSEPORT. By moving sk_refcnt 8 bytes further, first 128 bytes of sockets are mostly read. As they contain the lookup keys, this has a considerable performance impact, as cpus can cache them. These 8 bytes are not wasted, we use them as a place holder for various fields, depending on the socket type. Tested: SYN flood hitting a 16 RX queues NIC. TCP listener using 16 sockets and SO_REUSEPORT and SO_INCOMING_CPU for proper siloing. Could process 6.0 Mpps SYN instead of 4.2 Mpps Kernel profile looked like : 11.68% [kernel] [k] sha_transform 6.51% [kernel] [k] __inet_lookup_listener 5.07% [kernel] [k] __inet_lookup_established 4.15% [kernel] [k] memcpy_erms 3.46% [kernel] [k] ipt_do_table 2.74% [kernel] [k] fib_table_lookup 2.54% [kernel] [k] tcp_make_synack 2.34% [kernel] [k] tcp_conn_request 2.05% [kernel] [k] __netif_receive_skb_core 2.03% [kernel] [k] kmem_cache_alloc Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12net: SO_INCOMING_CPU setsockopt() supportEric Dumazet
SO_INCOMING_CPU as added in commit 2c8c56e15df3 was a getsockopt() command to fetch incoming cpu handling a particular TCP flow after accept() This commits adds setsockopt() support and extends SO_REUSEPORT selection logic : If a TCP listener or UDP socket has this option set, a packet is delivered to this socket only if CPU handling the packet matches the specified one. This allows to build very efficient TCP servers, using one listener per RX queue, as the associated TCP listener should only accept flows handled in softirq by the same cpu. This provides optimal NUMA behavior and keep cpu caches hot. Note that __inet_lookup_listener() still has to iterate over the list of all listeners. Following patch puts sk_refcnt in a different cache line to let this iteration hit only shared and read mostly cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12sock: support per-packet fwmarkEdward Jee
It's useful to allow users to set fwmark for an individual packet, without changing the socket state. The function this patch adds in sock layer can be used by the protocols that need such a feature. Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12bpf: charge user for creation of BPF maps and programsAlexei Starovoitov
since eBPF programs and maps use kernel memory consider it 'locked' memory from user accounting point of view and charge it against RLIMIT_MEMLOCK limit. This limit is typically set to 64Kbytes by distros, so almost all bpf+tracing programs would need to increase it, since they use maps, but kernel charges maximum map size upfront. For example the hash map of 1024 elements will be charged as 64Kbyte. It's inconvenient for current users and changes current behavior for root, but probably worth doing to be consistent root vs non-root. Similar accounting logic is done by mmap of perf_event. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12bpf: enable non-root eBPF programsAlexei Starovoitov
In order to let unprivileged users load and execute eBPF programs teach verifier to prevent pointer leaks. Verifier will prevent - any arithmetic on pointers (except R10+Imm which is used to compute stack addresses) - comparison of pointers (except if (map_value_ptr == 0) ... ) - passing pointers to helper functions - indirectly passing pointers in stack to helper functions - returning pointer from bpf program - storing pointers into ctx or maps Spill/fill of pointers into stack is allowed, but mangling of pointers stored in the stack or reading them byte by byte is not. Within bpf programs the pointers do exist, since programs need to be able to access maps, pass skb pointer to LD_ABS insns, etc but programs cannot pass such pointer values to the outside or obfuscate them. Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs, so that socket filters (tcpdump), af_packet (quic acceleration) and future kcm can use it. tracing and tc cls/act program types still require root permissions, since tracing actually needs to be able to see all kernel pointers and tc is for root only. For example, the following unprivileged socket filter program is allowed: int bpf_prog1(struct __sk_buff *skb) { u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); u64 *value = bpf_map_lookup_elem(&my_map, &index); if (value) *value += skb->len; return 0; } but the following program is not: int bpf_prog1(struct __sk_buff *skb) { u32 index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); u64 *value = bpf_map_lookup_elem(&my_map, &index); if (value) *value += (u64) skb; return 0; } since it would leak the kernel address into the map. Unprivileged socket filter bpf programs have access to the following helper functions: - map lookup/update/delete (but they cannot store kernel pointers into them) - get_random (it's already exposed to unprivileged user space) - get_smp_processor_id - tail_call into another socket filter program - ktime_get_ns The feature is controlled by sysctl kernel.unprivileged_bpf_disabled. This toggle defaults to off (0), but can be set true (1). Once true, bpf programs and maps cannot be accessed from unprivileged process, and the toggle cannot be set back to false. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12arm64: Fix MINSIGSTKSZ and SIGSTKSZManjeet Pawar
MINSIGSTKSZ and SIGSTKSZ for ARM64 are not correctly set in latest kernel. This patch fixes this issue. This issue is reported in LTP (testcase: sigaltstack02.c). Testcase failed when sigaltstack() called with stack size "MINSIGSTKSZ - 1" Since in Glibc-2.22, MINSIGSTKSZ is set to 5120 but in kernel it is set to 2048 so testcase gets failed. Testcase Output: sigaltstack02 1 TPASS : stgaltstack() fails, Invalid Flag value,errno:22 sigaltstack02 2 TFAIL : sigaltstack() returned 0, expected -1,errno:12 Reported Issue in Glibc Bugzilla: Bugfix in Glibc-2.22: [Bug 16850] https://sourceware.org/bugzilla/show_bug.cgi?id=16850 Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Akhilesh Kumar <akhilesh.k@samsung.com> Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com> Signed-off-by: Rohit Thapliyal <r.thapliyal@samsung.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-12netfilter: conntrack: fix crash on timeout object removalPablo Neira Ayuso
The object and module refcounts are updated for each conntrack template, however, if we delete the iptables rules and we flush the timeout database, we may end up with invalid references to timeout object that are just gone. Resolve this problem by setting the timeout reference to NULL when the custom timeout entry is removed from our base. This patch requires some RCU trickery to ensure safe pointer handling. This handling is similar to what we already do with conntrack helpers, the idea is to avoid bumping the timeout object reference counter from the packet path to avoid the cost of atomic ops. Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-12switchdev: skip over ports returning -EOPNOTSUPP when recursing portsScott Feldman
This allows us to recurse over all the ports, skipping over unsupporting ports. Without the change, the recursion would stop at first unsupported port. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-12switchdev: add bridge ageing_time attributeScott Feldman
Setting the stage to push bridge-level attributes down to port driver so hardware can be programmed accordingly. Bridge-level attribute example is ageing_time. This is a per-bridge attribute, not a per-bridge-port attr. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Three trivial commits: - Fix a kerneldoc regression - Export handle_bad_irq to unbreak a driver in next - Add an accessor for the of_node field so refactoring in next does not depend on merge ordering" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqdomain: Add an accessor for the of_node field genirq: Fix handle_bad_irq kerneldoc comment genirq: Export handle_bad_irq
2015-10-11net: dsa: use switchdev obj in port_fdb_delVivien Didelot
For consistency with the FDB add operation, propagate the switchdev_obj_port_fdb structure in the DSA drivers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11net: dsa: push prepare phase in port_fdb_addVivien Didelot
Now that the prepare phase is pushed down to the DSA drivers, propagate it to the port_fdb_add function. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-11net: dsa: add port_fdb_prepareVivien Didelot
Push the prepare phase for FDB operations down to the DSA drivers, with a new port_fdb_prepare function. Currently only mv88e6xxx is affected. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>