* [PATCH net-next v3 3/3] net: ipv4 only populate IP_PKTINFO when needed
From: Shawn Bohrer @ 2013-10-07 16:01 UTC (permalink / raw)
To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer
In-Reply-To: <1381161700-14453-1-git-send-email-shawn.bohrer@gmail.com>
From: Shawn Bohrer <sbohrer@rgmadvisors.com>
Since the removal of the routing cache, computing
fib_compute_spec_dst() requires a fib_table lookup for each UDP multicast
packet received. This has introduced a performance regression for some
UDP workloads.
This change skips populating the packet info for sockets that do not have
IP_PKTINFO set.
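For reference, a minimal userspace sketch of how a socket opts in to the
packet info that this change now computes only on demand. The helper names
enable_pktinfo() and pktinfo_enabled() are made up for illustration; only
setsockopt()/getsockopt() with IPPROTO_IP and IP_PKTINFO are real API.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to attach an IP_PKTINFO control
 * message to every datagram received on fd. Returns 0 or -1. */
int enable_pktinfo(int fd)
{
	int on = 1;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: report whether IP_PKTINFO is enabled on fd.
 * Returns 1 or 0, or -1 if the option could not be read. */
int pktinfo_enabled(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	assert(fd >= 0);
	if (getsockopt(fd, IPPROTO_IP, IP_PKTINFO, &val, &len) < 0)
		return -1;
	return val != 0;
}
```

Sockets that never call something like enable_pktinfo() are the ones that
should no longer pay for the lookup.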
Benchmark results from a netperf UDP_RR test:
Before 89789.68 transactions/s
After 90587.62 transactions/s
Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.63us RTT
After 12.48us RTT
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
include/net/ip.h | 2 +-
net/ipv4/ip_sockglue.c | 5 +++--
net/ipv4/raw.c | 2 +-
net/ipv4/udp.c | 2 +-
4 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index 16078f4..b39ebe5 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -459,7 +459,7 @@ int ip_options_rcv_srr(struct sk_buff *skb);
* Functions provided by ip_sockglue.c
*/
-void ipv4_pktinfo_prepare(struct sk_buff *skb);
+void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb);
void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb);
int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc);
int ip_setsockopt(struct sock *sk, int level, int optname, char __user *optval,
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 56e3445..0626f2c 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1052,11 +1052,12 @@ e_inval:
* destination in skb->cb[] before dst drop.
* This way, receiver doesnt make cache line misses to read rtable.
*/
-void ipv4_pktinfo_prepare(struct sk_buff *skb)
+void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
{
struct in_pktinfo *pktinfo = PKTINFO_SKB_CB(skb);
- if (skb_rtable(skb)) {
+ if ((inet_sk(sk)->cmsg_flags & IP_CMSG_PKTINFO) &&
+ skb_rtable(skb)) {
pktinfo->ipi_ifindex = inet_iif(skb);
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index b2fa14c..41e1d28 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -299,7 +299,7 @@ static int raw_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
/* Charge it to the socket. */
- ipv4_pktinfo_prepare(skb);
+ ipv4_pktinfo_prepare(sk, skb);
if (sock_queue_rcv_skb(sk, skb) < 0) {
kfree_skb(skb);
return NET_RX_DROP;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 262ea39..4226c53 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1544,7 +1544,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
rc = 0;
- ipv4_pktinfo_prepare(skb);
+ ipv4_pktinfo_prepare(sk, skb);
bh_lock_sock(sk);
if (!sock_owned_by_user(sk))
rc = __udp_queue_rcv_skb(sk, skb);
--
1.7.7.6
* [PATCH net-next v3 2/3] udp: ipv4: Add udp early demux
From: Shawn Bohrer @ 2013-10-07 16:01 UTC (permalink / raw)
To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer
In-Reply-To: <1381161700-14453-1-git-send-email-shawn.bohrer@gmail.com>
From: Shawn Bohrer <sbohrer@rgmadvisors.com>
The removal of the routing cache introduced a performance regression for
some UDP workloads since a dst lookup must be done for each packet.
This change caches the dst per socket in a similar manner to what we do
for TCP by implementing early_demux.
For UDP multicast we can only cache the dst if there is only one
receiving socket on the host. Since caching only works when there is
one receiving socket we do the multicast socket lookup using RCU.
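The "exactly one receiver" rule can be sketched in userspace as follows.
This is a simplified model, not the kernel code: struct model_msock and
both functions are hypothetical, and RCU, refcounting and the nulls-list
restart are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bound UDP socket. */
struct model_msock {
	uint16_t port;             /* local port the socket is bound to */
	uint32_t group;            /* joined multicast group, 0 = any */
	struct model_msock *next;  /* next socket in the same hash chain */
};

static int model_mcast_match(const struct model_msock *sk,
			     uint16_t port, uint32_t group)
{
	assert(sk != NULL);
	return sk->port == port && (sk->group == 0 || sk->group == group);
}

/* Walk the whole chain and return the match only if it is unique. */
struct model_msock *model_mcast_demux(struct model_msock *head,
				      uint16_t port, uint32_t group)
{
	struct model_msock *sk, *result = NULL;
	unsigned int count = 0;

	for (sk = head; sk != NULL; sk = sk->next) {
		if (model_mcast_match(sk, port, group)) {
			result = sk;
			count++;
		}
	}
	/* Two or more receivers: fall back to the normal delivery path. */
	return count == 1 ? result : NULL;
}
```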
For UDP unicast we only demux sockets with an exact match in order to
not break forwarding setups. Additionally, since the hash chains may be
long, we only check whether the first socket is a match and do not
waste extra time searching the whole chain when we might not find an
exact match.
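A userspace sketch of the unicast rule, under the assumption that an
"exact match" means the full 4-tuple of a connected socket. The struct and
function names are hypothetical and the real lookup additionally checks
the namespace and interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple key for a connected UDP socket. */
struct model_conn {
	uint32_t laddr, raddr;    /* local and remote IPv4 address */
	uint16_t lport, rport;    /* local and remote port */
	struct model_conn *next;  /* next socket in the hash chain */
};

static bool model_exact_match(const struct model_conn *sk,
			      const struct model_conn *key)
{
	return sk->laddr == key->laddr && sk->raddr == key->raddr &&
	       sk->lport == key->lport && sk->rport == key->rport;
}

/* Only the head of the chain is examined; a match further down the
 * chain is ignored and left to the regular lookup. */
struct model_conn *model_unicast_demux(struct model_conn *head,
				       const struct model_conn *key)
{
	assert(key != NULL);
	if (head != NULL && model_exact_match(head, key))
		return head;
	return NULL;
}
```

Returning NULL here is not a failure. It only means early demux is
skipped for that packet.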
Benchmark results from a netperf UDP_RR test:
Before 87961.22 transactions/s
After 89789.68 transactions/s
Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.97us RTT
After 12.63us RTT
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
---
v3 Changes:
- Use secondary hash for UDP unicast early demux lookup
- Double check socket match after increasing refcount in both unicast
and multicast early demux lookups
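The "take a reference, then check again" pattern from the last item can be
modelled with C11 atomics. This is only a sketch under the assumption that
a slot may be reused for a different key between the lockless check and
the refcount increment. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_ref_sock {
	atomic_int refcnt;  /* 0 means the object is being freed */
	atomic_int port;    /* key that may change if the slot is reused */
};

/* Take a reference unless the count has already dropped to zero. */
static bool model_inc_not_zero(atomic_int *cnt)
{
	int old = atomic_load(cnt);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return true;
	}
	return false;
}

static void model_put(struct model_ref_sock *sk)
{
	atomic_fetch_sub(&sk->refcnt, 1);
}

/* Return sk with a reference held if it still matches port, else NULL. */
struct model_ref_sock *model_get_if_match(struct model_ref_sock *sk, int port)
{
	assert(sk != NULL);
	if (atomic_load(&sk->port) != port)
		return NULL;    /* lockless first check */
	if (!model_inc_not_zero(&sk->refcnt))
		return NULL;    /* lost the race with the final put */
	if (atomic_load(&sk->port) != port) {
		model_put(sk);  /* object was reused for another key */
		return NULL;
	}
	return sk;
}
```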
include/net/sock.h | 2 +-
include/net/udp.h | 1 +
net/ipv4/af_inet.c | 1 +
net/ipv4/udp.c | 202 +++++++++++++++++++++++++++++++++++++++++++++++-----
4 files changed, 187 insertions(+), 19 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index e3bf213..7953254 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -218,7 +218,7 @@ struct cg_proto;
* @sk_lock: synchronizer
* @sk_rcvbuf: size of receive buffer in bytes
* @sk_wq: sock wait queue and async head
- * @sk_rx_dst: receive input route used by early tcp demux
+ * @sk_rx_dst: receive input route used by early demux
* @sk_dst_cache: destination cache
* @sk_dst_lock: destination cache lock
* @sk_policy: flow policy
diff --git a/include/net/udp.h b/include/net/udp.h
index 510b8cb..fe4ba9f 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -175,6 +175,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
unsigned int hash2_nulladdr);
/* net/ipv4/udp.c */
+void udp_v4_early_demux(struct sk_buff *skb);
int udp_get_port(struct sock *sk, unsigned short snum,
int (*saddr_cmp)(const struct sock *,
const struct sock *));
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index cfeb85c..35913fb 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1546,6 +1546,7 @@ static const struct net_protocol tcp_protocol = {
};
static const struct net_protocol udp_protocol = {
+ .early_demux = udp_v4_early_demux,
.handler = udp_rcv,
.err_handler = udp_err,
.no_policy = 1,
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5950e12..262ea39 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -103,6 +103,7 @@
#include <linux/seq_file.h>
#include <net/net_namespace.h>
#include <net/icmp.h>
+#include <net/inet_hashtables.h>
#include <net/route.h>
#include <net/checksum.h>
#include <net/xfrm.h>
@@ -565,6 +566,26 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
}
EXPORT_SYMBOL_GPL(udp4_lib_lookup);
+static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk,
+ __be16 loc_port, __be32 loc_addr,
+ __be16 rmt_port, __be32 rmt_addr,
+ int dif, unsigned short hnum)
+{
+ struct inet_sock *inet = inet_sk(sk);
+
+ if (!net_eq(sock_net(sk), net) ||
+ udp_sk(sk)->udp_port_hash != hnum ||
+ (inet->inet_daddr && inet->inet_daddr != rmt_addr) ||
+ (inet->inet_dport != rmt_port && inet->inet_dport) ||
+ (inet->inet_rcv_saddr && inet->inet_rcv_saddr != loc_addr) ||
+ ipv6_only_sock(sk) ||
+ (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif))
+ return false;
+ if (!ip_mc_sf_allow(sk, loc_addr, rmt_addr, dif))
+ return false;
+ return true;
+}
+
static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk,
__be16 loc_port, __be32 loc_addr,
__be16 rmt_port, __be32 rmt_addr,
@@ -575,20 +596,11 @@ static inline struct sock *udp_v4_mcast_next(struct net *net, struct sock *sk,
unsigned short hnum = ntohs(loc_port);
sk_nulls_for_each_from(s, node) {
- struct inet_sock *inet = inet_sk(s);
-
- if (!net_eq(sock_net(s), net) ||
- udp_sk(s)->udp_port_hash != hnum ||
- (inet->inet_daddr && inet->inet_daddr != rmt_addr) ||
- (inet->inet_dport != rmt_port && inet->inet_dport) ||
- (inet->inet_rcv_saddr &&
- inet->inet_rcv_saddr != loc_addr) ||
- ipv6_only_sock(s) ||
- (s->sk_bound_dev_if && s->sk_bound_dev_if != dif))
- continue;
- if (!ip_mc_sf_allow(s, loc_addr, rmt_addr, dif))
- continue;
- goto found;
+ if (__udp_is_mcast_sock(net, s,
+ loc_port, loc_addr,
+ rmt_port, rmt_addr,
+ dif, hnum))
+ goto found;
}
s = NULL;
found:
@@ -1581,6 +1593,14 @@ static void flush_stack(struct sock **stack, unsigned int count,
kfree_skb(skb1);
}
+static void udp_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
+{
+ struct dst_entry *dst = skb_dst(skb);
+
+ dst_hold(dst);
+ sk->sk_rx_dst = dst;
+}
+
/*
* Multicasts and broadcasts go to each listener.
*
@@ -1709,11 +1729,28 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
if (udp4_csum_init(skb, uh, proto))
goto csum_error;
- if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
- return __udp4_lib_mcast_deliver(net, skb, uh,
- saddr, daddr, udptable);
+ if (skb->sk) {
+ int ret;
+ sk = skb->sk;
- sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
+ if (unlikely(sk->sk_rx_dst == NULL))
+ udp_sk_rx_dst_set(sk, skb);
+
+ ret = udp_queue_rcv_skb(sk, skb);
+
+ /* a return value > 0 means to resubmit the input, but
+ * it wants the return to be -protocol, or 0
+ */
+ if (ret > 0)
+ return -ret;
+ return 0;
+ } else {
+ if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
+ return __udp4_lib_mcast_deliver(net, skb, uh,
+ saddr, daddr, udptable);
+
+ sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
+ }
if (sk != NULL) {
int ret;
@@ -1771,6 +1808,135 @@ drop:
return 0;
}
+/* We can only early demux multicast if there is a single matching socket.
+ * If more than one socket found returns NULL
+ */
+static struct sock *__udp4_lib_mcast_demux_lookup(struct net *net,
+ __be16 loc_port, __be32 loc_addr,
+ __be16 rmt_port, __be32 rmt_addr,
+ int dif)
+{
+ struct sock *sk, *result;
+ struct hlist_nulls_node *node;
+ unsigned short hnum = ntohs(loc_port);
+ unsigned int count, slot = udp_hashfn(net, hnum, udp_table.mask);
+ struct udp_hslot *hslot = &udp_table.hash[slot];
+
+ rcu_read_lock();
+begin:
+ count = 0;
+ result = NULL;
+ sk_nulls_for_each_rcu(sk, node, &hslot->head) {
+ if (__udp_is_mcast_sock(net, sk,
+ loc_port, loc_addr,
+ rmt_port, rmt_addr,
+ dif, hnum)) {
+ result = sk;
+ ++count;
+ }
+ }
+ /*
+ * if the nulls value we got at the end of this lookup is
+ * not the expected one, we must restart lookup.
+ * We probably met an item that was moved to another chain.
+ */
+ if (get_nulls_value(node) != slot)
+ goto begin;
+
+ if (result) {
+ if (count != 1 ||
+ unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+ result = NULL;
+ else if (unlikely(!__udp_is_mcast_sock(net, result,
+ loc_port, loc_addr,
+ rmt_port, rmt_addr,
+ dif, hnum))) {
+ sock_put(result);
+ result = NULL;
+ }
+ }
+ rcu_read_unlock();
+ return result;
+}
+
+/* For unicast we should only early demux connected sockets or we can
+ * break forwarding setups. The chains here can be long so only check
+ * if the first socket is an exact match and if not move on.
+ */
+static struct sock *__udp4_lib_demux_lookup(struct net *net,
+ __be16 loc_port, __be32 loc_addr,
+ __be16 rmt_port, __be32 rmt_addr,
+ int dif)
+{
+ struct sock *sk, *result;
+ struct hlist_nulls_node *node;
+ unsigned short hnum = ntohs(loc_port);
+ unsigned int hash2 = udp4_portaddr_hash(net, loc_addr, hnum);
+ unsigned int slot2 = hash2 & udp_table.mask;
+ struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
+ INET_ADDR_COOKIE(acookie, rmt_addr, loc_addr)
+ const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
+
+ rcu_read_lock();
+ result = NULL;
+ udp_portaddr_for_each_entry_rcu(sk, node, &hslot2->head) {
+ if (INET_MATCH(sk, net, acookie,
+ rmt_addr, loc_addr, ports, dif))
+ result = sk;
+ /* Only check first socket in chain */
+ break;
+ }
+
+ if (result) {
+ if (unlikely(!atomic_inc_not_zero_hint(&result->sk_refcnt, 2)))
+ result = NULL;
+ else if (unlikely(!INET_MATCH(sk, net, acookie,
+ rmt_addr, loc_addr,
+ ports, dif))) {
+ sock_put(result);
+ result = NULL;
+ }
+ }
+ rcu_read_unlock();
+ return result;
+}
+
+void udp_v4_early_demux(struct sk_buff *skb)
+{
+ const struct iphdr *iph = ip_hdr(skb);
+ const struct udphdr *uh = udp_hdr(skb);
+ struct sock *sk;
+ struct dst_entry *dst;
+ struct net *net = dev_net(skb->dev);
+ int dif = skb->dev->ifindex;
+
+ /* validate the packet */
+ if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr)))
+ return;
+
+ if (skb->pkt_type == PACKET_BROADCAST ||
+ skb->pkt_type == PACKET_MULTICAST)
+ sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
+ uh->source, iph->saddr, dif);
+ else if (skb->pkt_type == PACKET_HOST)
+ sk = __udp4_lib_demux_lookup(net, uh->dest, iph->daddr,
+ uh->source, iph->saddr, dif);
+ else
+ return;
+
+ if (!sk)
+ return;
+
+ skb->sk = sk;
+ skb->destructor = sock_edemux;
+ dst = sk->sk_rx_dst;
+
+ if (dst)
+ dst = dst_check(dst, 0);
+ if (dst)
+ skb_dst_set_noref(skb, dst);
+}
+
int udp_rcv(struct sk_buff *skb)
{
return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
--
1.7.7.6
* [PATCH net-next v3 0/3] Improve UDP multicast receive latency
From: Shawn Bohrer @ 2013-10-07 16:01 UTC (permalink / raw)
To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer
From: Shawn Bohrer <sbohrer@rgmadvisors.com>
The removal of the routing cache in 3.6 impacted the latency of our
UDP multicast workload. This patch series brings the latency back down
to what we were seeing with 3.4.
Patch 1 "udp: Only allow busy read/poll on connected sockets" is mostly
done for correctness and because it allows unifying the unicast and
multicast paths when a socket is found in early demux. It can also
improve latency for a connected multicast socket if busy read/poll is
used.
Patches 2 and 3 remove the fib lookups and restore latency for our
workload to pre-3.6 levels.
Benchmark results from a netperf UDP_RR test:
v3.12-rc3-447-g40dc9ab kernel 87961.22 transactions/s
v3.12-rc3-447-g40dc9ab + series 90587.62 transactions/s
Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
v3.12-rc3-447-g40dc9ab kernel 12.97us RTT
v3.12-rc3-447-g40dc9ab + series 12.48us RTT
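To put the numbers above in relative terms, a small helper (the name
relative_change_pct() is made up here) gives roughly +3.0% transactions/s
for netperf UDP_RR and roughly -3.8% RTT for the fio pingpong test.

```c
#include <assert.h>

/* Percentage change from before to after; negative means a decrease. */
double relative_change_pct(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For example, relative_change_pct(87961.22, 90587.62) is the UDP_RR gain
and relative_change_pct(12.97, 12.48) is the RTT change.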
v2 Changes:
- Unicast UDP early demux now requires an exact socket match and only
tests first socket in UDP hash chain.
- ipv4_pktinfo_prepare() now takes a const struct sock*
v3 Changes:
- Use secondary hash for UDP unicast early demux lookup
- Double check socket match after increasing refcount in both unicast
and multicast early demux lookups
Shawn Bohrer (3):
udp: Only allow busy read/poll on connected sockets
udp: ipv4: Add udp early demux
net: ipv4 only populate IP_PKTINFO when needed
include/net/ip.h | 2 +-
include/net/sock.h | 2 +-
include/net/udp.h | 1 +
net/ipv4/af_inet.c | 1 +
net/ipv4/ip_sockglue.c | 5 +-
net/ipv4/raw.c | 2 +-
net/ipv4/udp.c | 209 +++++++++++++++++++++++++++++++++++++++++++-----
net/ipv6/udp.c | 5 +-
8 files changed, 199 insertions(+), 28 deletions(-)
--
1.7.7.6
* [PATCH net-next v3 1/3] udp: Only allow busy read/poll on connected sockets
From: Shawn Bohrer @ 2013-10-07 16:01 UTC (permalink / raw)
To: David Miller; +Cc: netdev, tomk, Eric Dumazet, Shawn Bohrer
In-Reply-To: <1381161700-14453-1-git-send-email-shawn.bohrer@gmail.com>
From: Shawn Bohrer <sbohrer@rgmadvisors.com>
UDP sockets can receive packets from multiple endpoints, and thus packets
may be received on multiple receive queues. Since packets can arrive
on multiple receive queues, we should not mark the napi_id for all
packets. This makes busy read/poll only work for connected UDP sockets.
This additionally enables busy read/poll for UDP multicast packets as
long as the socket is connected by moving the check into
__udp_queue_rcv_skb().
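From the application side, benefiting from this means using a connected
UDP socket and requesting busy polling. A minimal sketch, with
hypothetical helper names; SO_BUSY_POLL is the real socket option, and
raising its value may fail with EPERM for processes without CAP_NET_ADMIN.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket connected to addr:port. Returns the fd or -1. */
int udp_connected_socket(const char *addr, unsigned short port)
{
	struct sockaddr_in peer;
	int fd;

	assert(addr != NULL);
	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(port);
	if (inet_pton(AF_INET, addr, &peer.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Request busy polling for up to usecs microseconds per blocking
 * receive. Returns 0 on success, -1 on error or if unsupported. */
int request_busy_poll(int fd, int usecs)
{
#ifdef SO_BUSY_POLL
	return setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
#else
	(void)fd;
	(void)usecs;
	return -1;
#endif
}
```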
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/udp.c | 5 +++--
net/ipv6/udp.c | 5 +++--
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c41833e..5950e12 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1405,8 +1405,10 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
int rc;
- if (inet_sk(sk)->inet_daddr)
+ if (inet_sk(sk)->inet_daddr) {
sock_rps_save_rxhash(sk, skb);
+ sk_mark_napi_id(sk, skb);
+ }
rc = sock_queue_rcv_skb(sk, skb);
if (rc < 0) {
@@ -1716,7 +1718,6 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
if (sk != NULL) {
int ret;
- sk_mark_napi_id(sk, skb);
ret = udp_queue_rcv_skb(sk, skb);
sock_put(sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8119791..3753247 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -549,8 +549,10 @@ static int __udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
int rc;
- if (!ipv6_addr_any(&inet6_sk(sk)->daddr))
+ if (!ipv6_addr_any(&inet6_sk(sk)->daddr)) {
sock_rps_save_rxhash(sk, skb);
+ sk_mark_napi_id(sk, skb);
+ }
rc = sock_queue_rcv_skb(sk, skb);
if (rc < 0) {
@@ -844,7 +846,6 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
if (sk != NULL) {
int ret;
- sk_mark_napi_id(sk, skb);
ret = udpv6_queue_rcv_skb(sk, skb);
sock_put(sk);
--
1.7.7.6
* Re: [PATCH] static_key: WARN on usage before jump_label_init was called
From: Steven Rostedt @ 2013-10-07 15:51 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: netdev, linux-kernel, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, Jason Baron, Peter Zijlstra, Eric Dumazet,
andi@firstfloor.org, David S. Miller, x86
In-Reply-To: <20131006182919.GD9295@order.stressinduktion.org>
On Sun, 6 Oct 2013 20:29:19 +0200
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> diff --git a/lib/jump_label_initialized.c b/lib/jump_label_initialized.c
> new file mode 100644
> index 0000000..a668a40
> --- /dev/null
> +++ b/lib/jump_label_initialized.c
> @@ -0,0 +1,6 @@
> +#include <linux/types.h>
> +#include <linux/cache.h>
> +
> +bool static_key_initialized __read_mostly = false;
> +EXPORT_SYMBOL_GPL(static_key_initialized);
> +
So far, the only thing I don't like about this patch is the creation of
this file for the sole purpose of adding this variable.
Perhaps it can just be added to init/main.c?
-- Steve
* Re: IPv6 path MTU discovery broken
From: Eric Dumazet @ 2013-10-07 15:49 UTC (permalink / raw)
To: Steinar H. Gunderson; +Cc: netdev, edumazet, fan.du
In-Reply-To: <20131007143404.GA24997@sesse.net>
On Mon, 2013-10-07 at 16:34 +0200, Steinar H. Gunderson wrote:
> On Mon, Oct 07, 2013 at 04:32:06PM +0200, Hannes Frederic Sowa wrote:
> > Strange, I have them in nstat:
> >
> > $ nstat -z | egrep '(TcpExtOutOfWindowIcmps|Icmp6InErrors|TcpExtTCPMinTTLDrop|TcpExtListenDrops)'
>
> -z was the trick. (I've never used nstat before.)
Best might be to use -a (absolute counters).
* [PATCH net-next] net_sched: increment drop counters in qdisc_tree_decrease_qlen()
From: Eric Dumazet @ 2013-10-07 15:32 UTC (permalink / raw)
To: David Miller; +Cc: netdev
From: Eric Dumazet <edumazet@google.com>
qdisc_tree_decrease_qlen() is called when some packets are dropped
on a qdisc, and we want to notify parents of qlen changes.
We can also increment the parent qdiscs' qstats drop counters.
This permits more accurate drop counters up to the root qdisc.
For example, a graft operation typically resets a qdisc
(drops all packets) and calls qdisc_tree_decrease_qlen().
Note that callers are responsible for their drop counters.
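The propagation can be pictured with a small userspace model. It is a
sketch only: struct model_qdisc and model_tree_decrease_qlen() are
hypothetical, parents are plain pointers instead of handles resolved via a
lookup, and the class notification step is omitted.

```c
#include <assert.h>
#include <stddef.h>

struct model_qdisc {
	struct model_qdisc *parent;  /* NULL for the root qdisc */
	unsigned int qlen;           /* packets currently queued below */
	unsigned int drops;          /* drop counter */
};

/* Remove n packets from the accounting of every ancestor of sch and
 * count them as drops there. The caller accounts for sch itself. */
void model_tree_decrease_qlen(struct model_qdisc *sch, unsigned int n)
{
	int drops;

	assert(sch != NULL);
	if (n == 0)
		return;
	/* n may encode a negative adjustment; never count negative drops. */
	drops = (int)n > 0 ? (int)n : 0;
	for (sch = sch->parent; sch != NULL; sch = sch->parent) {
		sch->qlen -= n;
		sch->drops += drops;
	}
}
```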
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_api.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2adda7f..cd81505 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -737,9 +737,11 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
const struct Qdisc_class_ops *cops;
unsigned long cl;
u32 parentid;
+ int drops;
if (n == 0)
return;
+ drops = max_t(int, n, 0);
while ((parentid = sch->parent)) {
if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
return;
@@ -756,6 +758,7 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
cops->put(sch, cl);
}
sch->q.qlen -= n;
+ sch->qstats.drops += drops;
}
}
EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
* Re: IPv6 path MTU discovery broken
From: Steinar H. Gunderson @ 2013-10-07 14:34 UTC (permalink / raw)
To: netdev, edumazet, fan.du
In-Reply-To: <20131007143206.GH9295@order.stressinduktion.org>
On Mon, Oct 07, 2013 at 04:32:06PM +0200, Hannes Frederic Sowa wrote:
> Strange, I have them in nstat:
>
> $ nstat -z | egrep '(TcpExtOutOfWindowIcmps|Icmp6InErrors|TcpExtTCPMinTTLDrop|TcpExtListenDrops)'
-z was the trick. (I've never used nstat before.)
pannekake:~> nstat -z | egrep '(TcpExtOutOfWindowIcmps|Icmp6InErrors|TcpExtTCPMinTTLDrop|TcpExtListenDrops)'
Icmp6InErrors 0 0.0
TcpExtOutOfWindowIcmps 4 0.0
TcpExtListenDrops 0 0.0
TcpExtTCPMinTTLDrop 0 0.0
Next time it triggers, I'll see if TcpExtOutOfWindowIcmps increases, then.
/* Steinar */
--
Homepage: http://www.sesse.net/
* Re: IPv6 path MTU discovery broken
From: Hannes Frederic Sowa @ 2013-10-07 14:32 UTC (permalink / raw)
To: Steinar H. Gunderson; +Cc: netdev, edumazet, fan.du
In-Reply-To: <20131007083228.GB18903@sesse.net>
On Mon, Oct 07, 2013 at 10:32:28AM +0200, Steinar H. Gunderson wrote:
> On Mon, Oct 07, 2013 at 05:09:10AM +0200, Hannes Frederic Sowa wrote:
> > Could you try to check (with e.g. nstat) if any of the following counters
> > change if the icmp messages hit the host?
> >
> > TcpExtOutOfWindowIcmps
> > Icmp6InErrors
> > TcpExtTCPMinTTLDrop
> > TcpExtListenDrops
>
> Icmp6InErrors is 0, so that's not it. How do I find the Tcp* counters?
> None of them show up in nstat, although other Tcp* counters do.
Strange, I have them in nstat:
$ nstat -z | egrep '(TcpExtOutOfWindowIcmps|Icmp6InErrors|TcpExtTCPMinTTLDrop|TcpExtListenDrops)'
Icmp6InErrors 0 0.0
TcpExtOutOfWindowIcmps 0 0.0
TcpExtListenDrops 0 0.0
TcpExtTCPMinTTLDrop 0 0.0
Otherwise you can manually extract them from /proc/net/netstat.
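A rough sketch of such a manual extraction, assuming the usual layout of
/proc/net/netstat (pairs of lines, first the counter names and then the
values, each starting with a prefix such as "TcpExt:"). The function name
netstat_lookup() is made up; nstat's TcpExtOutOfWindowIcmps corresponds to
prefix "TcpExt" and name "OutOfWindowIcmps".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find counter name under prefix in a /proc/net/netstat style stream.
 * Returns 0 and stores the value on success, -1 if it is not found. */
int netstat_lookup(FILE *fp, const char *prefix, const char *name,
		   unsigned long long *value)
{
	char names[8192], values[8192];
	size_t plen = strlen(prefix);

	assert(fp != NULL && value != NULL);
	/* Read one names line and its matching values line at a time. */
	while (fgets(names, sizeof(names), fp) &&
	       fgets(values, sizeof(values), fp)) {
		char *nsave, *vsave, *n, *v;

		if (strncmp(names, prefix, plen) != 0 || names[plen] != ':')
			continue;
		/* The first token on both lines is the "Prefix:" label. */
		n = strtok_r(names, " \n", &nsave);
		v = strtok_r(values, " \n", &vsave);
		while (n != NULL && v != NULL) {
			n = strtok_r(NULL, " \n", &nsave);
			v = strtok_r(NULL, " \n", &vsave);
			if (n != NULL && v != NULL && strcmp(n, name) == 0) {
				*value = strtoull(v, NULL, 10);
				return 0;
			}
		}
	}
	return -1;
}
```

A caller would open /proc/net/netstat with fopen() and pass the stream in.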
Greetings,
Hannes
* RE: [PATCHv1 net] xen-netback: transition to CLOSED when removing a VIF
From: Paul Durrant @ 2013-10-07 14:28 UTC (permalink / raw)
To: David Vrabel, xen-devel@lists.xen.org
Cc: David Vrabel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
netdev@vger.kernel.org, Ian Campbell, Wei Liu
In-Reply-To: <1381150519-14557-1-git-send-email-david.vrabel@citrix.com>
> -----Original Message-----
> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: 07 October 2013 13:55
> To: xen-devel@lists.xen.org
> Cc: David Vrabel; Konrad Rzeszutek Wilk; Boris Ostrovsky;
> netdev@vger.kernel.org; Ian Campbell; Wei Liu; Paul Durrant
> Subject: [PATCHv1 net] xen-netback: transition to CLOSED when removing a VIF
>
> From: David Vrabel <david.vrabel@citrix.com>
>
> If a guest is destroyed without transitioning its frontend to CLOSED,
> the domain becomes a zombie as netback was not grant unmapping the
> shared rings.
>
> When removing a VIF, transition the backend to CLOSED so the VIF is
> disconnected if necessary (which will unmap the shared rings etc).
>
> This fixes a regression introduced by
> 279f438e36c0a70b23b86d2090aeec50155034a9 (xen-netback: Don't destroy
> the netdev until the vif is shut down).
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Paul Durrant <Paul.Durrant@citrix.com>
> ---
> drivers/net/xen-netback/xenbus.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index b45bce2..1b08d87 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -39,11 +39,15 @@ static int connect_rings(struct backend_info *);
> static void connect(struct backend_info *);
> static void backend_create_xenvif(struct backend_info *be);
> static void unregister_hotplug_status_watch(struct backend_info *be);
> +static void set_backend_state(struct backend_info *be,
> + enum xenbus_state state);
>
> static int netback_remove(struct xenbus_device *dev)
> {
> struct backend_info *be = dev_get_drvdata(&dev->dev);
>
> + set_backend_state(be, XenbusStateClosed);
> +
> unregister_hotplug_status_watch(be);
> if (be->vif) {
> kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
> --
> 1.7.2.5
Looks good to me.
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Paul
* Re: [PATCH] can: dev: fix nlmsg size calculation in can_get_size()
From: Marc Kleine-Budde @ 2013-10-07 14:26 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: netdev, linux-can, kernel
In-Reply-To: <705c23ba0332d9658b8760cd5d460e8a@grandegger.com>
[-- Attachment #1: Type: text/plain, Size: 500 bytes --]
On 10/07/2013 04:24 PM, Wolfgang Grandegger wrote:
>> Is this an Acked-by? :)
> Yep, obviously a long time ago that I did something for Linux-CAN :(.
Thx, there are some netlink related patches coming soon :)
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]
* Re: [PATCH] can: dev: fix nlmsg size calculation in can_get_size()
From: Wolfgang Grandegger @ 2013-10-07 14:24 UTC (permalink / raw)
To: Marc Kleine-Budde; +Cc: netdev, linux-can, kernel
In-Reply-To: <5252C1CC.4060809@pengutronix.de>
On Mon, 07 Oct 2013 16:14:36 +0200, Marc Kleine-Budde <mkl@pengutronix.de>
wrote:
> On 10/05/2013 10:50 PM, Wolfgang Grandegger wrote:
>> On 10/05/2013 09:25 PM, Marc Kleine-Budde wrote:
>>> This patch fixes the calculation of the nlmsg size, by adding the
>>> missing nla_total_size().
>>>
>>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>>> ---
>>> Hello,
>>>
>>> this patch touches the rtnl_link_ops get_size() callback:
>>>
>>> static struct rtnl_link_ops can_link_ops __read_mostly = {
>>> ...
>>> .get_size = can_get_size,
>>> ...
>>> };
>>>
>>> By looking at other nlmsg size calculations I think a nla_total_size()
>>> for all contributors is needed. Am I correct?
>>
>> Yes, seems so, nla_put() calls this code:
>>
>> http://lxr.free-electrons.com/source/lib/nlattr.c#L328
>
> Is this an Acked-by? :)
Yep, obviously a long time ago that I did something for Linux-CAN :(.
Wolfgang.
>
> Marc
* Re: [PATCHv1 net] xen-netback: transition to CLOSED when removing a VIF
From: Wei Liu @ 2013-10-07 14:15 UTC (permalink / raw)
To: David Vrabel
Cc: Wei Liu, xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky,
netdev, Ian Campbell, Paul Durrant
In-Reply-To: <5252BDD1.1000301@citrix.com>
On Mon, Oct 07, 2013 at 02:57:37PM +0100, David Vrabel wrote:
> On 07/10/13 14:43, Wei Liu wrote:
> > On Mon, Oct 07, 2013 at 01:55:19PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> If a guest is destroyed without transitioning its frontend to CLOSED,
> >> the domain becomes a zombie as netback was not grant unmapping the
> >> shared rings.
> >>
> >> When removing a VIF, transition the backend to CLOSED so the VIF is
> >> disconnected if necessary (which will unmap the shared rings etc).
> >>
> >> This fixes a regression introduced by
> >> 279f438e36c0a70b23b86d2090aeec50155034a9 (xen-netback: Don't destroy
> >> the netdev until the vif is shut down).
> >>
> >
> > Is this regression solely caused by 279f438e36c or caused by both
> > ea732dff5c and 279f438e36c? I ask because you make use of the new state
> > machine introduced in ea732dff5c. Or are you simply using the new state
> > machine to fix the regression instead of going back to old code?
>
> I bisected it to 279f438. I'm just using the handy new state machine to
> fix it.
>
Thanks for the explanation.
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei.
> David
* Re: [PATCH] can: dev: fix nlmsg size calculation in can_get_size()
From: Marc Kleine-Budde @ 2013-10-07 14:14 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: netdev, linux-can, kernel
In-Reply-To: <52507B7D.6030008@grandegger.com>
[-- Attachment #1: Type: text/plain, Size: 1045 bytes --]
On 10/05/2013 10:50 PM, Wolfgang Grandegger wrote:
> On 10/05/2013 09:25 PM, Marc Kleine-Budde wrote:
>> This patch fixes the calculation of the nlmsg size, by adding the missing
>> nla_total_size().
>>
>> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
>> ---
>> Hello,
>>
>> this patch touches the rtnl_link_ops get_size() callback:
>>
>> static struct rtnl_link_ops can_link_ops __read_mostly = {
>> ...
>> .get_size = can_get_size,
>> ...
>> };
>>
>> By looking at other nlmsg size calculations I think a nla_total_size() for all
>> contributors is needed. Am I correct?
>
> Yes, seems so, nla_put() calls this code:
>
> http://lxr.free-electrons.com/source/lib/nlattr.c#L328
Is this an Acked-by? :)
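For readers following along, the reason each attribute needs
nla_total_size() is that the wire size includes the attribute header and
padding. A userspace model of that arithmetic, with model_ prefixed names
that mirror (but are not) the kernel helpers, assuming the usual 4-byte
attribute header and 4-byte alignment:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NLA_ALIGNTO 4u  /* attributes are padded to 4 bytes */
#define MODEL_NLA_HDRLEN  4u  /* aligned size of the attribute header */

static size_t model_nla_align(size_t len)
{
	return (len + MODEL_NLA_ALIGNTO - 1) & ~(size_t)(MODEL_NLA_ALIGNTO - 1);
}

/* Bytes one attribute with the given payload occupies in a message. */
size_t model_nla_total_size(size_t payload)
{
	return model_nla_align(MODEL_NLA_HDRLEN + payload);
}
```

So a get_size() callback that adds only the raw payload sizes reserves
too little space for every attribute it reports.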
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 259 bytes --]
* Re: [PATCHv1 net] xen-netback: transition to CLOSED when removing a VIF
From: David Vrabel @ 2013-10-07 13:57 UTC (permalink / raw)
To: Wei Liu
Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, netdev,
Ian Campbell, Paul Durrant
In-Reply-To: <20131007134314.GD28411@zion.uk.xensource.com>
On 07/10/13 14:43, Wei Liu wrote:
> On Mon, Oct 07, 2013 at 01:55:19PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> If a guest is destroyed without transitioning its frontend to CLOSED,
>> the domain becomes a zombie as netback was not grant unmapping the
>> shared rings.
>>
>> When removing a VIF, transition the backend to CLOSED so the VIF is
>> disconnected if necessary (which will unmap the shared rings etc).
>>
>> This fixes a regression introduced by
>> 279f438e36c0a70b23b86d2090aeec50155034a9 (xen-netback: Don't destroy
>> the netdev until the vif is shut down).
>>
>
> Is this regression solely caused by 279f438e36c or caused by both
> ea732dff5c and 279f438e36c? I ask because you make use of the new state
> machine introduced in ea732dff5c. Or are you simply using the new state
> machine to fix the regression instead of going back to old code?
I bisected it to 279f438. I'm just using the handy new state machine to
fix it.
David
* Re: [PATCHv1 net] xen-netback: transition to CLOSED when removing a VIF
From: Wei Liu @ 2013-10-07 13:43 UTC (permalink / raw)
To: David Vrabel
Cc: xen-devel, Konrad Rzeszutek Wilk, Boris Ostrovsky, netdev,
Ian Campbell, Wei Liu, Paul Durrant
In-Reply-To: <1381150519-14557-1-git-send-email-david.vrabel@citrix.com>
On Mon, Oct 07, 2013 at 01:55:19PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> If a guest is destroyed without transitioning its frontend to CLOSED,
> the domain becomes a zombie as netback was not grant unmapping the
> shared rings.
>
> When removing a VIF, transition the backend to CLOSED so the VIF is
> disconnected if necessary (which will unmap the shared rings etc).
>
> This fixes a regression introduced by
> 279f438e36c0a70b23b86d2090aeec50155034a9 (xen-netback: Don't destroy
> the netdev until the vif is shut down).
>
Is this regression solely caused by 279f438e36c or caused by both
ea732dff5c and 279f438e36c? I ask because you make use of the new state
machine introduced in ea732dff5c. Or are you simply using the new state
machine to fix the regression instead of going back to old code?
Wei.
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: Paul Durrant <Paul.Durrant@citrix.com>
> ---
> drivers/net/xen-netback/xenbus.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index b45bce2..1b08d87 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -39,11 +39,15 @@ static int connect_rings(struct backend_info *);
> static void connect(struct backend_info *);
> static void backend_create_xenvif(struct backend_info *be);
> static void unregister_hotplug_status_watch(struct backend_info *be);
> +static void set_backend_state(struct backend_info *be,
> + enum xenbus_state state);
>
> static int netback_remove(struct xenbus_device *dev)
> {
> struct backend_info *be = dev_get_drvdata(&dev->dev);
>
> + set_backend_state(be, XenbusStateClosed);
> +
> unregister_hotplug_status_watch(be);
> if (be->vif) {
> kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
> --
> 1.7.2.5
^ permalink raw reply
* Re: [PATCH net V1 2/2] net/mlx4_en: Fix pages never dma unmapped on rx
From: Eric Dumazet @ 2013-10-07 13:47 UTC (permalink / raw)
To: Amir Vadai
Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev,
Eric Dumazet
In-Reply-To: <1381145893-20930-3-git-send-email-amirv@mellanox.com>
On Mon, 2013-10-07 at 13:38 +0200, Amir Vadai wrote:
> This patch fixes a bug introduced by commit 51151a16 (mlx4: allow
> order-0 memory allocations in RX path).
>
> dma_unmap_page is never reached because the condition to detect the last
> fragment in a page is wrong. offset + frag_stride can't be greater than
> size, so we need to make sure no additional frag will fit in the page =>
> compare offset + frag_stride + next_frag_size instead.
> next_frag_size is the same as the current one, since a page is shared
> only with frags of the same size.
>
> CC: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
> After looking at the code again, we need to use 2*frag_stride rather than
> the size of the next frag in the skb. Changed it accordingly.
Yes indeed this looks much better, thanks !
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH net V1 1/2] net/mlx4_en: Rename name of mlx4_en_rx_alloc members
From: Eric Dumazet @ 2013-10-07 13:44 UTC (permalink / raw)
To: Amir Vadai
Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev,
Eric Dumazet
In-Reply-To: <1381145893-20930-2-git-send-email-amirv@mellanox.com>
On Mon, 2013-10-07 at 13:38 +0200, Amir Vadai wrote:
> Add a page_ prefix to the page-related members: rename @size and @offset
> to @page_size and @page_offset.
>
> CC: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 40 ++++++++++++++++------------
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +--
> 2 files changed, 25 insertions(+), 19 deletions(-)
Acked-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply
* Re: [PATCH] usbnet: smsc95xx: Add device tree input for MAC address
From: Dan Murphy @ 2013-10-07 13:30 UTC (permalink / raw)
To: Ming Lei
Cc: Steve Glendinning, Network Development, Linux Kernel Mailing List,
linux-usb, mugunthanvnm-l0cyMroinI0
In-Reply-To: <CACVXFVNAB3VnmdoP_LGJsBjQ17j1aoqKMsBE7LRt65OqROeRUQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 10/07/2013 06:42 AM, Ming Lei wrote:
> On Mon, Oct 7, 2013 at 1:31 AM, Dan Murphy <dmurphy-l0cyMroinI0@public.gmane.org> wrote:
>> On 10/06/2013 10:05 AM, Ming Lei wrote:
>>> On Sat, Oct 5, 2013 at 2:25 AM, Dan Murphy <dmurphy-l0cyMroinI0@public.gmane.org> wrote:
>>>> If the smsc95xx does not have a valid MAC address stored within
>>>> the eeprom then a random number is generated. The MAC can also
>>>> be set by uBoot but the smsc95xx does not have a way to read this.
>>>>
>>>> Create the binding for the smsc95xx so that uBoot can set the MAC
>>>> and the code can retrieve the MAC from the modified DTB file.
>>> Suppose there are two smsc95xx usbnet devices connected to usb bus, and
>>> one is built-in, another is hotplug device, can your patch handle the situation
>>> correctly?
>> Look at this line in the patch below
>>
>> sprintf(of_name, "%s%i", SMSC95XX_OF_NAME, dev->udev->dev.id);
>>
>> I am appending the dev ID of the device to the of_name here. As long as init_mac_address is called and the dev.id matches the
>> uBoot entry, then yes.
> Currently, non-root-hub USB devices are created and added to the USB bus without
> any platform (device tree) knowledge, so you can't expect a match here.
That is correct. The platform/device tree will have no concept of the PnP USB dongle, so there should be no entry in either.
I will need to test this issue with a PnP USB-to-Ethernet dongle.
Dan
> Not to mention that the two smsc95xx devices may be attached to two different
> USB host controllers (buses).
>
> Thanks,
--
------------------
Dan Murphy
^ permalink raw reply
* [PATCHv1 net] xen-netback: transition to CLOSED when removing a VIF
From: David Vrabel @ 2013-10-07 12:55 UTC (permalink / raw)
To: xen-devel
Cc: David Vrabel, Konrad Rzeszutek Wilk, Boris Ostrovsky, netdev,
Ian Campbell, Wei Liu, Paul Durrant
From: David Vrabel <david.vrabel@citrix.com>
If a guest is destroyed without transitioning its frontend to CLOSED,
the domain becomes a zombie as netback was not grant unmapping the
shared rings.
When removing a VIF, transition the backend to CLOSED so the VIF is
disconnected if necessary (which will unmap the shared rings etc).
This fixes a regression introduced by
279f438e36c0a70b23b86d2090aeec50155034a9 (xen-netback: Don't destroy
the netdev until the vif is shut down).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Paul Durrant <Paul.Durrant@citrix.com>
---
drivers/net/xen-netback/xenbus.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index b45bce2..1b08d87 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -39,11 +39,15 @@ static int connect_rings(struct backend_info *);
static void connect(struct backend_info *);
static void backend_create_xenvif(struct backend_info *be);
static void unregister_hotplug_status_watch(struct backend_info *be);
+static void set_backend_state(struct backend_info *be,
+ enum xenbus_state state);
static int netback_remove(struct xenbus_device *dev)
{
struct backend_info *be = dev_get_drvdata(&dev->dev);
+ set_backend_state(be, XenbusStateClosed);
+
unregister_hotplug_status_watch(be);
if (be->vif) {
kobject_uevent(&dev->dev.kobj, KOBJ_OFFLINE);
--
1.7.2.5
^ permalink raw reply related
* Re: [PATCH] usbnet: smsc95xx: Add device tree input for MAC address
From: Ming Lei @ 2013-10-07 11:42 UTC (permalink / raw)
To: Dan Murphy
Cc: Steve Glendinning, Network Development, Linux Kernel Mailing List,
linux-usb, mugunthanvnm
In-Reply-To: <52519E7A.40804@ti.com>
On Mon, Oct 7, 2013 at 1:31 AM, Dan Murphy <dmurphy@ti.com> wrote:
> On 10/06/2013 10:05 AM, Ming Lei wrote:
>> On Sat, Oct 5, 2013 at 2:25 AM, Dan Murphy <dmurphy@ti.com> wrote:
>>> If the smsc95xx does not have a valid MAC address stored within
>>> the eeprom then a random number is generated. The MAC can also
>>> be set by uBoot but the smsc95xx does not have a way to read this.
>>>
>>> Create the binding for the smsc95xx so that uBoot can set the MAC
>>> and the code can retrieve the MAC from the modified DTB file.
>> Suppose there are two smsc95xx usbnet devices connected to usb bus, and
>> one is built-in, another is hotplug device, can your patch handle the situation
>> correctly?
>
> Look at this line in the patch below
>
> sprintf(of_name, "%s%i", SMSC95XX_OF_NAME, dev->udev->dev.id);
>
> I am appending the dev ID of the device to the of_name here. As long as init_mac_address is called and the dev.id matches the
> uBoot entry, then yes.
Currently, non-root-hub USB devices are created and added to the USB bus without
any platform (device tree) knowledge, so you can't expect a match here.
Not to mention that the two smsc95xx devices may be attached to two different
USB host controllers (buses).
Thanks,
--
Ming Lei
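To make the objection above concrete, here is a minimal sketch of the name-matching scheme under discussion. It assumes a base name of "smsc95xx" for SMSC95XX_OF_NAME (the real value is not shown in this thread), and of_name_matches() is a hypothetical helper, not a function from the patch. The quoted sprintf() is replaced by snprintf() only to keep the sketch bounds-safe.

```c
#include <stdio.h>
#include <string.h>

/* Assumed value for illustration; the actual define is not shown in the thread. */
#define SMSC95XX_OF_NAME "smsc95xx"

/* Build "<base><id>" the same way the quoted sprintf() line does. */
int build_of_name(char *buf, size_t len, int dev_id)
{
	return snprintf(buf, len, "%s%i", SMSC95XX_OF_NAME, dev_id);
}

/* Hypothetical helper: does the name derived from dev_id equal a DT entry name? */
int of_name_matches(int dev_id, const char *dt_entry_name)
{
	char of_name[32];

	build_of_name(of_name, sizeof(of_name), dev_id);
	return strcmp(of_name, dt_entry_name) == 0;
}
```

With a device tree entry named "smsc95xx0", of_name_matches(0, "smsc95xx0") succeeds while of_name_matches(1, "smsc95xx0") does not, so the lookup works only if the id assigned at enumeration happens to line up with the entry the bootloader wrote.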
^ permalink raw reply
* [PATCH net V1 2/2] net/mlx4_en: Fix pages never dma unmapped on rx
From: Amir Vadai @ 2013-10-07 11:38 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Amir Vadai, Or Gerlitz, Eugenia Emantayev, Eric Dumazet
In-Reply-To: <1381145893-20930-1-git-send-email-amirv@mellanox.com>
This patch fixes a bug introduced by commit 51151a16 (mlx4: allow
order-0 memory allocations in RX path).
dma_unmap_page is never reached because the condition to detect the last
fragment in a page is wrong. offset + frag_stride can't be greater than
size, so we need to make sure no additional frag will fit in the page =>
compare offset + frag_stride + next_frag_size instead.
next_frag_size is the same as the current one, since a page is shared
only with frags of the same size.
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
After looking at the code again, we need to use 2*frag_stride rather than
the size of the next frag in the skb. Changed it accordingly.
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 066fc27..afe2efa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -135,9 +135,10 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
int i)
{
const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
+ u32 next_frag_end = frags[i].page_offset + 2 * frag_info->frag_stride;
- if (frags[i].page_offset + frag_info->frag_stride >
- frags[i].page_size)
+
+ if (next_frag_end > frags[i].page_size)
dma_unmap_page(priv->ddev, frags[i].dma, frags[i].page_size,
PCI_DMA_FROMDEVICE);
--
1.8.3.4
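The last-fragment check changed by this patch can be illustrated with a small standalone sketch. The function names and the example sizes (a 4096-byte page, 1024-byte stride) are assumptions for illustration, not driver code. The walk in count_unmaps() mirrors the allocator's rule that a fragment is only handed out when offset + frag_stride <= page_size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pre-patch condition: can never be true for a fragment the allocator handed out. */
bool last_frag_old(uint32_t page_offset, uint32_t frag_stride, uint32_t page_size)
{
	return page_offset + frag_stride > page_size;
}

/* Post-patch condition: true when no further fragment fits after this one. */
bool last_frag_new(uint32_t page_offset, uint32_t frag_stride, uint32_t page_size)
{
	return page_offset + 2 * frag_stride > page_size;
}

/* Count the fragments of one page for which the given check would trigger an unmap. */
unsigned int count_unmaps(bool (*is_last)(uint32_t, uint32_t, uint32_t),
			  uint32_t frag_align, uint32_t frag_stride,
			  uint32_t page_size)
{
	unsigned int n = 0;
	uint32_t off;

	/* Only offsets with off + frag_stride <= page_size are ever used. */
	for (off = frag_align; off + frag_stride <= page_size; off += frag_stride)
		if (is_last(off, frag_stride, page_size))
			n++;
	return n;
}
```

For a 4096-byte page with a 1024-byte stride, the fragments sit at offsets 0, 1024, 2048 and 3072. The old check fires for none of them, while the new check fires only for the fragment at 3072, which is the point at which the page should be unmapped.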
^ permalink raw reply related
* [PATCH net V1 0/2] net/mlx4_en: Fix pages never dma unmapped on rx
From: Amir Vadai @ 2013-10-07 11:38 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Amir Vadai, Or Gerlitz, Eugenia Emantayev, Eric Dumazet
This patchset fixes a bug introduced by commit 51151a16 (mlx4: allow order-0
memory allocations in RX path), where dma_unmap_page wasn't called.
Changes from V0:
- Added "Rename name of mlx4_en_rx_alloc members". Old names were confusing.
- The last-frag-in-page calculation was wrong. Since all frags in a page are
of the same size, we need to add this frag_stride to the end of the frag
offset, and not the size of the next frag in the skb.
Amir Vadai (2):
net/mlx4_en: Rename name of mlx4_en_rx_alloc members
net/mlx4_en: Fix pages never dma unmapped on rx
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 41 ++++++++++++++++------------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +--
2 files changed, 26 insertions(+), 19 deletions(-)
--
1.8.3.4
^ permalink raw reply
* [PATCH net V1 1/2] net/mlx4_en: Rename name of mlx4_en_rx_alloc members
From: Amir Vadai @ 2013-10-07 11:38 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Amir Vadai, Or Gerlitz, Eugenia Emantayev, Eric Dumazet
In-Reply-To: <1381145893-20930-1-git-send-email-amirv@mellanox.com>
Add a page_ prefix to the page-related members: rename @size and @offset
to @page_size and @page_offset.
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 40 ++++++++++++++++------------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 4 +--
2 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index dec455c..066fc27 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -70,14 +70,15 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
put_page(page);
return -ENOMEM;
}
- page_alloc->size = PAGE_SIZE << order;
+ page_alloc->page_size = PAGE_SIZE << order;
page_alloc->page = page;
page_alloc->dma = dma;
- page_alloc->offset = frag_info->frag_align;
+ page_alloc->page_offset = frag_info->frag_align;
/* Not doing get_page() for each frag is a big win
* on asymetric workloads.
*/
- atomic_set(&page->_count, page_alloc->size / frag_info->frag_stride);
+ atomic_set(&page->_count,
+ page_alloc->page_size / frag_info->frag_stride);
return 0;
}
@@ -96,16 +97,19 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
for (i = 0; i < priv->num_frags; i++) {
frag_info = &priv->frag_info[i];
page_alloc[i] = ring_alloc[i];
- page_alloc[i].offset += frag_info->frag_stride;
- if (page_alloc[i].offset + frag_info->frag_stride <= ring_alloc[i].size)
+ page_alloc[i].page_offset += frag_info->frag_stride;
+
+ if (page_alloc[i].page_offset + frag_info->frag_stride <=
+ ring_alloc[i].page_size)
continue;
+
if (mlx4_alloc_pages(priv, &page_alloc[i], frag_info, gfp))
goto out;
}
for (i = 0; i < priv->num_frags; i++) {
frags[i] = ring_alloc[i];
- dma = ring_alloc[i].dma + ring_alloc[i].offset;
+ dma = ring_alloc[i].dma + ring_alloc[i].page_offset;
ring_alloc[i] = page_alloc[i];
rx_desc->data[i].addr = cpu_to_be64(dma);
}
@@ -117,7 +121,7 @@ out:
frag_info = &priv->frag_info[i];
if (page_alloc[i].page != ring_alloc[i].page) {
dma_unmap_page(priv->ddev, page_alloc[i].dma,
- page_alloc[i].size, PCI_DMA_FROMDEVICE);
+ page_alloc[i].page_size, PCI_DMA_FROMDEVICE);
page = page_alloc[i].page;
atomic_set(&page->_count, 1);
put_page(page);
@@ -132,9 +136,10 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
{
const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
- if (frags[i].offset + frag_info->frag_stride > frags[i].size)
- dma_unmap_page(priv->ddev, frags[i].dma, frags[i].size,
- PCI_DMA_FROMDEVICE);
+ if (frags[i].page_offset + frag_info->frag_stride >
+ frags[i].page_size)
+ dma_unmap_page(priv->ddev, frags[i].dma, frags[i].page_size,
+ PCI_DMA_FROMDEVICE);
if (frags[i].page)
put_page(frags[i].page);
@@ -161,7 +166,7 @@ out:
page_alloc = &ring->page_alloc[i];
dma_unmap_page(priv->ddev, page_alloc->dma,
- page_alloc->size, PCI_DMA_FROMDEVICE);
+ page_alloc->page_size, PCI_DMA_FROMDEVICE);
page = page_alloc->page;
atomic_set(&page->_count, 1);
put_page(page);
@@ -184,10 +189,11 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
i, page_count(page_alloc->page));
dma_unmap_page(priv->ddev, page_alloc->dma,
- page_alloc->size, PCI_DMA_FROMDEVICE);
- while (page_alloc->offset + frag_info->frag_stride < page_alloc->size) {
+ page_alloc->page_size, PCI_DMA_FROMDEVICE);
+ while (page_alloc->page_offset + frag_info->frag_stride <
+ page_alloc->page_size) {
put_page(page_alloc->page);
- page_alloc->offset += frag_info->frag_stride;
+ page_alloc->page_offset += frag_info->frag_stride;
}
page_alloc->page = NULL;
}
@@ -478,7 +484,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
/* Save page reference in skb */
__skb_frag_set_page(&skb_frags_rx[nr], frags[nr].page);
skb_frag_size_set(&skb_frags_rx[nr], frag_info->frag_size);
- skb_frags_rx[nr].page_offset = frags[nr].offset;
+ skb_frags_rx[nr].page_offset = frags[nr].page_offset;
skb->truesize += frag_info->frag_stride;
frags[nr].page = NULL;
}
@@ -517,7 +523,7 @@ static struct sk_buff *mlx4_en_rx_skb(struct mlx4_en_priv *priv,
/* Get pointer to first fragment so we could copy the headers into the
* (linear part of the) skb */
- va = page_address(frags[0].page) + frags[0].offset;
+ va = page_address(frags[0].page) + frags[0].page_offset;
if (length <= SMALL_PACKET_SIZE) {
/* We are copying all relevant data to the skb - temporarily
@@ -645,7 +651,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
DMA_FROM_DEVICE);
ethh = (struct ethhdr *)(page_address(frags[0].page) +
- frags[0].offset);
+ frags[0].page_offset);
if (is_multicast_ether_addr(ethh->h_dest)) {
struct mlx4_mac_entry *entry;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 5e0aa56..bf06e36 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -237,8 +237,8 @@ struct mlx4_en_tx_desc {
struct mlx4_en_rx_alloc {
struct page *page;
dma_addr_t dma;
- u32 offset;
- u32 size;
+ u32 page_offset;
+ u32 page_size;
};
struct mlx4_en_tx_ring {
--
1.8.3.4
^ permalink raw reply related
* [PATCH mmots] net: static declaration of net_secret_init() follows non-static
From: Sergey Senozhatsky @ 2013-10-07 11:16 UTC (permalink / raw)
To: Andrew Morton, David S. Miller; +Cc: netdev, linux-kernel
Fix secure_seq build error
net/core/secure_seq.c:17:13: error: static declaration of ‘net_secret_init’ follows
non-static declaration
static void net_secret_init(void)
In file included from net/core/secure_seq.c:11:0:
include/net/secure_seq.h:6:6: note: previous declaration of ‘net_secret_init’ was
here
void net_secret_init(void);
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
include/net/secure_seq.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/net/secure_seq.h b/include/net/secure_seq.h
index 52c1a90..f257486 100644
--- a/include/net/secure_seq.h
+++ b/include/net/secure_seq.h
@@ -3,7 +3,6 @@
#include <linux/types.h>
-void net_secret_init(void);
__u32 secure_ip_id(__be32 daddr);
__u32 secure_ipv6_id(const __be32 daddr[4]);
u32 secure_ipv4_port_ephemeral(__be32 saddr, __be32 daddr, __be16 dport);
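The linkage conflict behind this build error can be sketched in a single file. The function name net_secret_init() comes from the patch; the flag and the caller below are hypothetical and exist only so the sketch compiles and does something observable.

```c
/*
 * The header used to carry a non-static prototype:
 *
 *     void net_secret_init(void);
 *
 * A later "static" definition of the same name in the .c file conflicts
 * with that external declaration, which is what GCC reports. Dropping
 * the prototype, as the patch does, leaves only the file-local definition.
 */

static int net_secret_ready;	/* hypothetical flag for this sketch */

/* File-local definition: internal linkage, no external prototype precedes it. */
static void net_secret_init(void)
{
	net_secret_ready = 1;
}

/* Hypothetical caller within the same file. */
int secret_is_ready(void)
{
	net_secret_init();
	return net_secret_ready;
}
```

Re-adding the commented-out prototype above the static definition reproduces the "static declaration follows non-static declaration" error.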
^ permalink raw reply related