- 21 Jul, 2015 4 commits
-
-
Jon Paul Maloy authored
struct 'tipc_node' currently contains two arrays for link attributes, one for the link pointers, and one for the usable link MTUs. We now group those into a new struct 'tipc_link_entry', and intoduce one single array consisting of such enties. Apart from being a cosmetic improvement, this is a starting point for the strict master-slave relation between node and link that we will introduce in the following commits. Reviewed-by:
Ying Xue <ying.xue@windriver.com> Signed-off-by:
Jon Maloy <jon.maloy@ericsson.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be unique for device and may even be unique for a sub-set of ports within device, so add switchdev helper function to generate unique marks based on port's switch ID and group_ifindex. group_ifindex would typically be the container dev's ifindex, such as the bridge's ifindex. The generator uses a global hash table to store offload_fwd_marks hashed by {switch ID, group_ifindex} key. Signed-off-by:
Scott Feldman <sfeldma@gmail.com> Acked-by:
Jiri Pirko <jiri@resnulli.us> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Signed-off-by:
Scott Feldman <sfeldma@gmail.com> Acked-by:
Jiri Pirko <jiri@resnulli.us> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Just before queuing skb for xmit on port, check if skb has been marked by switchdev port driver as already fordwarded by device. If so, drop skb. A non-zero skb->offload_fwd_mark field is set by the switchdev port driver/device on ingress to indicate the skb has already been forwarded by the device to egress ports with matching dev->skb_mark. The switchdev port driver would assign a non-zero dev->offload_skb_mark for each device port netdev during registration, for example. Signed-off-by:
Scott Feldman <sfeldma@gmail.com> Acked-by:
Jiri Pirko <jiri@resnulli.us> Acked-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by:
Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 20 Jul, 2015 6 commits
-
-
Nikolay Aleksandrov authored
Fix: net/bridge/br_if.c: In function 'br_dev_delete': >> net/bridge/br_if.c:284:2: error: implicit declaration of function >> 'br_multicast_dev_del' [-Werror=implicit-function-declaration] br_multicast_dev_del(br); ^ cc1: some warnings being treated as errors when igmp snooping is not defined. Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Phil Sutter authored
Newly created flows don't have flowi6_oif set (at least if the associated socket is not interface-bound). This leads to a mismatch in __xfrm6_selector_match() for policies which specify an interface in the selector (sel->ifindex != 0). Backtracing shows this happens in code-paths originating from e.g. ip6_datagram_connect(), rawv6_sendmsg() or tcp_v6_connect(). (UDP was not tested for.) In summary, this patch fixes policy matching on outgoing interface for locally generated packets. Signed-off-by:
Phil Sutter <phil@nwl.cc> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Satish Ashok authored
When the bridge (or port) is brought down/up flush only temp entries and leave the perm ones. Flush perm entries only when deleting the bridge device or the associated port. Signed-off-by:
Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Group notifications were not sent when a group expired or was deleted due to bridge/port device being deleted. So add br_mdb_notify() to br_multicast_del_pg(). Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
It would be very useful to retrieve the net_cls's classid from an eBPF program to allow for a more fine-grained classification, it could be directly used or in conjunction with additional policies. I.e. docker, but also tooling such as cgexec, can easily run applications via net_cls cgroups: cgcreate -g net_cls:/foo echo 42 > foo/net_cls.classid cgexec -g net_cls:foo <prog> Thus, their respecitve classid cookie of foo can then be looked up on the egress path to apply further policies. The helper is desigend such that a non-zero value returns the cgroup id. Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Acked-by:
Alexei Starovoitov <ast@plumgrid.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Split out retrieving the cgroups net_cls classid retrieval into its own function, so that it can be reused later on from other parts of the traffic control subsystem. If there's no skb->sk, then the small helper returns 0 as well, which in cls_cgroup terms means 'could not classify'. Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 16 Jul, 2015 5 commits
-
-
YOSHIFUJI Hideaki authored
Signed-off-by:
YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Anuradha Karuppiah authored
Signed-off-by:
Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by:
Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by:
Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Anuradha Karuppiah authored
This patch introduces the proto_down flag that can be used by user space applications to notify switch drivers that errors have been detected on the device. The switch driver can react to protodown notification by doing a phys down on the associated switch port. Signed-off-by:
Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by:
Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by:
Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by:
Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
YOSHIFUJI Hideaki/吉藤英明 authored
Commit 9131f3de ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") did not properly update best source address available. Plus, it introduced possible NULL pointer dereference. Bug was reported by Erik Kline <ek@google.com>. Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>. Fixes: 9131f3de ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") Signed-off-by:
YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by:
Hajime Tazaki <thehajime@gmail.com> Acked-by:
Erik Kline <ek@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrea Parri authored
The member (u32) "num_active_agg" of struct qfq_sched has been unused since its introduction in 462dbc91 "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT) there is no active plan to use it; this removes the member. Signed-off-by:
Andrea Parri <parri.andrea@gmail.com> Acked-by:
Paolo Valente <paolo.valente@unimore.it> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 13 Jul, 2015 2 commits
-
-
Nikolay Aleksandrov authored
Until now all user mdb entries were added in vlan 0, this patch adds support to allow the user to specify the vlan for the entry. About the uapi change a hole in struct br_mdb_entry is used so the size and offsets are kept the same (verified with pahole and tested with older iproute2). Example: $ bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000 dev br0 port eth1 grp 239.0.0.1 permanent vlan 200 dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
This patch makes the default to build IPv6 into the kernel. IPv6 now has significant traction and any remaining vestiges of IPv6 not being provided parity with IPv4 should be swept away. IPv6 is now core to the Internet and kernel. Points on IPv6 adoption: - Per Google statistics, IPv6 usage has reached 7% on the Internet and continues to exhibit an exponential growth rate https://www.google.com/intl/en/ipv6/statistics.html - Just a few days ago ARIN officially depleted its IPv4 pool - IPv6 only data centers are being successfully built (e.g. at Facebook) This patch changes the IPv6 Kconfig for IPV6. Default for CONFIG_IPV6 is set to "y" and the text has been updated to reflect the maturity of IPv6. Impact: Under some circumstances building modules in to kernel might have a performance advantage. In my testing, I did notice a very slight improvement. This will obviously increase the size of the kernel image. In my configuration I see: IPv6 as module: text data bss dec hex filename 9703666 1899288 933888 12536842 bf4c0a vmlinux IPv6 built into kernel text data bss dec hex filename 9436490 1879600 913408 12229498 ba9b7a vmlinux Which increases text size by ~270K (2.8% increase in size for me). If image size is an issue, presumably for a device which does not do IP networking (IMO we should be discouraging IPv4-only devices), IPV6 can be disabled or still built as a module. Acked-by:
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by:
Tom Herbert <tom@herbertland.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 12 Jul, 2015 5 commits
-
-
Oliver Hartkopp authored
Commit 514ac99c "can: fix multiple delivery of a single CAN frame for overlapping CAN filters" requires the skb->tstamp to be set to check for identical CAN skbs. Without timestamping to be required by user space applications this timestamp was not generated which lead to commit 36c01245 "can: fix loss of CAN frames in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs by introducing several __net_timestamp() calls. This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb() to add __net_timestamp() after skbuff creation to prevent the frame loss fixed in mainline Linux. This patch removes the timestamp dependency and uses an atomic counter to create an unique identifier together with the skbuff pointer. Btw: the new skbcnt element introduced in struct can_skb_priv has to be initialized with zero in out-of-tree drivers which are not using alloc_can{,fd}_skb() too. Signed-off-by:
Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by:
Marc Kleine-Budde <mkl@pengutronix.de>
-
Florian Fainelli authored
cd->sw_addr is used as a MDIO bus address, which cannot exceed PHY_MAX_ADDR (32), our check was off-by-one. Fixes: 5e95329b ("dsa: add device tree bindings to register DSA switches") Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Fainelli authored
port_index is used an index into an array, and this information comes from Device Tree, make sure that port_index is not equal to the array size before using it. Move the check against port_index earlier in the loop. Fixes: 5e95329b : ("dsa: add device tree bindings to register DSA switches") Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Vivien Didelot authored
There is no need to abort attribute setting or object addition, if the prepare phase returned operation not supported. Thus, abort these two transactions only if the error is not -EOPNOTSUPP. Signed-off-by:
Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by:
Jiri Pirko <jiri@resnulli.us> Acked-by:
Scott Feldman <sfeldma@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
This reverts commit 3cc49492 . There is nothing wrong with coalescing during defragmentation, it reduces truesize overhead and simplifies things for the receiving socket (no fraglist walk needed). However, it also destroys geometry of the original fragments. While that doesn't cause any breakage (we make sure to not exceed largest original size) ip_do_fragment contains a 'fastpath' that takes advantage of a present frag list and results in fragments that (in most cases) match what was received. In case its needed the coalescing could be done later, when we're sure the skb is not forwarded. But discussion during NFWS resulted in 'lets just remove this for now'. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by:
Florian Westphal <fw@strlen.de> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 11 Jul, 2015 5 commits
-
-
Phil Sutter authored
Reconsidering my commit 20462155 "net: inet_diag: export IPV6_V6ONLY sockopt", I am not happy with the limitations it causes for socket analysing code in userspace. Exporting the value only if it is set makes it hard for userspace to decide whether the option is not set or the kernel does not support exporting the option at all. >From an auditor's perspective, the interesting question for listening AF_INET6 sockets is: "Does it NOT have IPV6_V6ONLY set?" Because it is the unexpected case. This patch allows to answer this question reliably. Signed-off-by:
Phil Sutter <phil@nwl.cc> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
YOSHIFUJI Hideaki/吉藤英明 authored
If outgoing interface is specified and the candidate address is restricted to the outgoing interface, it is enough to iterate over that given interface only. Signed-off-by:
YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by:
Erik Kline <ek@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Satish Ashok authored
Until now when a querier was present static entries couldn't be deleted. Fix this and allow the user to manipulate the mdb with or without a querier. Signed-off-by:
Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Julian Anastasov authored
Incoming packet should be either in backlog queue or in RCU read-side section. Otherwise, the final sequence of flush_backlog() and synchronize_net() may miss packets that can run without device reference: CPU 1 CPU 2 skb->dev: no reference process_backlog:__skb_dequeue process_backlog:local_irq_enable on_each_cpu for flush_backlog => IPI(hardirq): flush_backlog - packet not found in backlog CPU delayed ... synchronize_net - no ongoing RCU read-side sections netdev_run_todo, rcu_barrier: no ongoing callbacks __netif_receive_skb_core:rcu_read_lock - too late free dev process packet for freed dev Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Julian Anastasov authored
commit 381c759d ("ipv4: Avoid crashing in ip_error") fixes a problem where processed packet comes from device with destroyed inetdev (dev->ip_ptr). This is not expected because inetdev_destroy is called in NETDEV_UNREGISTER phase and packets should not be processed after dev_close_many() and synchronize_net(). Above fix is still required because inetdev_destroy can be called for other reasons. But it shows the real problem: backlog can keep packets for long time and they do not hold reference to device. Such packets are then delivered to upper levels at the same time when device is unregistered. Calling flush_backlog after NETDEV_UNREGISTER_FINAL still accounts all packets from backlog but before that some packets continue to be delivered to upper levels long after the synchronize_net call which is supposed to wait the last ones. Also, as Eric pointed out, processed packets, mostly from other devices, can continue to add new packets to backlog. Fix the problem by moving flush_backlog early, after the device driver is stopped and before the synchronize_net() call. Then use netif_running check to make sure we do not add more packets to backlog. We have to do it in enqueue_to_backlog context when the local IRQ is disabled. As result, after the flush_backlog and synchronize_net sequence all packets should be accounted. Thanks to Eric W. Biederman for the test script and his valuable feedback! Reported-by:
Vittorio Gambaletta <linuxbugs@vittgam.net> Fixes: 6e583ce5 ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by:
Julian Anastasov <ja@ssi.bg> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 10 Jul, 2015 5 commits
-
-
Eric Dumazet authored
Commit c29390c6 ("xps: must clear sender_cpu before forwarding") fixed an issue in normal forward path, caused by sender_cpu & napi_id skb fields being an union. Bridge is another point where skb can be forwarded, so we need the same cure. Bug triggers if packet was received on a NIC using skb_mark_napi_id() Fixes: 2bd82484 ("xps: fix xps for stacked devices") Signed-off-by:
Eric Dumazet <edumazet@google.com> Reported-by:
Bob Liu <bob.liu@oracle.com> Tested-by:
Bob Liu <bob.liu@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
After commit 900f65d3 ("tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()"), we no longer need to export tcp_init_xmit_timers() Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Nikolay Aleksandrov authored
Fill also the port group state when sending notifications. Signed-off-by:
Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by:
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Masatake YAMATO authored
flags local variable in __mkroute_input is not used as a variable. Signed-off-by:
Masatake YAMATO <yamato@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Tom Herbert authored
Add support to allow non-local binds similar to how this was done for IPv4. Non-local binds are very useful in emulating the Internet in a box, etc. This add the ip_nonlocal_bind sysctl under ipv6. Testing: Set up nonlocal binding and receive routing on a host, e.g.: ip -6 rule add from ::/0 iif eth0 lookup 200 ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200 sysctl -w net.ipv6.ip_nonlocal_bind=1 Set up routing to 2001:0:0:1::/64 on peer to go to first host ping6 -I 2001:0:0:1::1 peer-address -- to verify Signed-off-by:
Tom Herbert <tom@herbertland.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- 09 Jul, 2015 8 commits
-
-
Eric Dumazet authored
inet_twsk_deschedule() calls are followed by inet_twsk_put(). Only particular case is in inet_twsk_purge() but there is no point to defer the inet_twsk_put() after re-enabling BH. Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put() and move the inet_twsk_put() inside. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
timewait sockets have a complex refcounting logic. Once we realize it should be similar to established and syn_recv sockets, we can use sk_nulls_del_node_init_rcu() and remove inet_twsk_unhash() In particular, deferred inet_twsk_put() added in commit 13475a30 ("tcp: connect() race with timewait reuse") looks unecessary : When removing a timewait socket from ehash or bhash, caller must own a reference on the socket anyway. Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Florian Westphal authored
Hop was always either 0 or sizeof(struct ipv6hdr). Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Oleg Nesterov authored
pktgen_thread_worker() doesn't need to wait for kthread_stop(), it can simply exit. Just pktgen_create_thread() and pg_net_exit() should do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is fine. Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Oleg Nesterov authored
pktgen_thread_worker() is obviously racy, kthread_stop() can come between the kthread_should_stop() check and set_current_state(). Signed-off-by:
Oleg Nesterov <oleg@redhat.com> Reported-by:
Jan Stancek <jstancek@redhat.com> Reported-by:
Marcelo Leitner <mleitner@redhat.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
The congestion state and cwnd can be updated in the wrong order. For example, upon receiving a dubious ACK, we incorrectly raise the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because the state is still Open, then enter recovery state to reduce cwnd. For another example, if the ACK indicates spurious timeout or retransmits, we first revert the cwnd reduction and congestion state back to Open state. But we don't raise the cwnd even though the ACK does not indicate any congestion. To fix this problem we should first call tcp_fastretrans_alert() to process the dubious ACK and update the congestion state, then call tcp_may_raise_cwnd() that raises cwnd based on the current state. Signed-off-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Nandita Dukkipati <nanditad@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
In the original design slow start is only used to raise cwnd when cwnd is stricly below ssthresh. It makes little sense to slow start when cwnd == ssthresh: especially when hystart has set ssthresh in the initial ramp, or after recovery when cwnd resets to ssthresh. Not doing so will also help reduce the buffer bloat slightly. Signed-off-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Nandita Dukkipati <nanditad@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Yuchung Cheng authored
Add a helper to test the slow start condition in various congestion control modules and other places. This is to prepare a slight improvement in policy as to exactly when to slow start. Signed-off-by:
Yuchung Cheng <ycheng@google.com> Signed-off-by:
Neal Cardwell <ncardwell@google.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Nandita Dukkipati <nanditad@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-