From: Eric Dumazet <dada1@cosmosbay.com>
To: "David S. Miller" <davem@davemloft.net>
Cc: Linux Netdev List <netdev@vger.kernel.org>,
Jarek Poplawski <jarkao2@gmail.com>,
Patrick McHardy <kaber@trash.net>
Subject: [PATCH, v2] net: release dst entry in dev_hard_start_xmit()
Date: Tue, 12 May 2009 21:26:35 +0200 [thread overview]
Message-ID: <4A09CD6B.5070604@cosmosbay.com> (raw)
In-Reply-To: <4A093178.9020105@cosmosbay.com>
Eric Dumazet a écrit :
> Eric Dumazet a écrit :
>> One point of contention in high network loads is the dst_release() performed
>> when a transmited skb is freed. This is because NIC tx completion calls
>> dev_kree_skb() long after original call to dev_queue_xmit(skb).
>>
>> CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
>> quite visible if one CPU is 100% handling softirqs for a network device,
>> since dst_clone() is done by other cpus, involving cache line ping pongs.
>>
>> It seems right place to release dst is in dev_hard_start_xmit(), for most
>> devices but ones that are virtual, and some exceptions.
>>
>> David Miller suggested to define a new device flag, set in alloc_netdev_mq()
>> (so that most devices set it at init time), and carefuly unset in devices
>> which dont want a NULL skb->dst in their ndo_start_xmit().
>>
>> List of devices that must clear this flag is :
>>
>> - loopback device, because it calls netif_rx() and quoting Patrick :
>> "ip_route_input() doesn't accept loopback addresses, so loopback packets
>> already need to have a dst_entry attached."
>> - appletalk/ipddp.c : needs skb->dst in its xmit function
>>
>> - And all devices that call again dev_queue_xmit() from their xmit function
>> (as some classifiers need skb->dst) : bonding, macvlan, eql, ifb, hdlc_fr
>>
>> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
>> ---
>> drivers/net/appletalk/ipddp.c | 1 +
>> drivers/net/bonding/bond_main.c | 1 +
>> drivers/net/eql.c | 1 +
>> drivers/net/ifb.c | 1 +
>> drivers/net/loopback.c | 1 +
>> drivers/net/macvlan.c | 1 +
>> drivers/net/wan/hdlc_fr.c | 1 +
>> include/linux/if.h | 3 +++
>> net/core/dev.c | 9 +++++++++
>> 9 files changed, 19 insertions(+)
>>
>> diff --git a/drivers/net/appletalk/ipddp.c b/drivers/net/appletalk/ipddp.c
>> index da64ba8..0268561 100644
>> --- a/drivers/net/appletalk/ipddp.c
>> +++ b/drivers/net/appletalk/ipddp.c
>> @@ -71,6 +71,7 @@ static struct net_device * __init ipddp_init(void)
>> if (!dev)
>> return ERR_PTR(-ENOMEM);
>>
>> + dev->priv_flags &= IFF_XMIT_DST_RELEASE;
>
>
> Oops, I forgot the ~ here, sorry
Here is a second version, including change to vlan code as well.
Thank you
[PATCH] net: release dst entry in dev_hard_start_xmit()
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).
CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.
It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.
David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().
List of devices that must clear this flag is :
- loopback device, because it calls netif_rx() and quoting Patrick :
"ip_route_input() doesn't accept loopback addresses, so loopback packets
already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function
- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
---
drivers/net/appletalk/ipddp.c | 1 +
drivers/net/bonding/bond_main.c | 1 +
drivers/net/eql.c | 1 +
drivers/net/ifb.c | 1 +
drivers/net/loopback.c | 1 +
drivers/net/macvlan.c | 1 +
drivers/net/wan/hdlc_fr.c | 1 +
include/linux/if.h | 3 +++
net/8021q/vlan_dev.c | 1 +
net/core/dev.c | 9 +++++++++
10 files changed, 20 insertions(+)
diff --git a/drivers/net/appletalk/ipddp.c b/drivers/net/appletalk/ipddp.c
index da64ba8..f939e92 100644
--- a/drivers/net/appletalk/ipddp.c
+++ b/drivers/net/appletalk/ipddp.c
@@ -71,6 +71,7 @@ static struct net_device * __init ipddp_init(void)
if (!dev)
return ERR_PTR(-ENOMEM);
+ dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
strcpy(dev->name, "ipddp%d");
if (version_printed++ == 0)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 815191d..a29f421 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -5148,6 +5148,7 @@ int bond_create(char *name, struct bond_params *params)
goto out_rtnl;
}
+ bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
if (!name) {
res = dev_alloc_name(bond_dev, "bond%d");
if (res < 0)
diff --git a/drivers/net/eql.c b/drivers/net/eql.c
index 5210bb1..19b7dd9 100644
--- a/drivers/net/eql.c
+++ b/drivers/net/eql.c
@@ -194,6 +194,7 @@ static void __init eql_setup(struct net_device *dev)
dev->type = ARPHRD_SLIP;
dev->tx_queue_len = 5; /* Hands them off fast */
+ dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
}
static int eql_open(struct net_device *dev)
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 60a2630..96713ef 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -156,6 +156,7 @@ static void ifb_setup(struct net_device *dev)
dev->flags |= IFF_NOARP;
dev->flags &= ~IFF_MULTICAST;
+ dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
random_ether_addr(dev->dev_addr);
}
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 6f71157..da472c6 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -170,6 +170,7 @@ static void loopback_setup(struct net_device *dev)
dev->tx_queue_len = 0;
dev->type = ARPHRD_LOOPBACK; /* 0x0001*/
dev->flags = IFF_LOOPBACK;
+ dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
dev->features = NETIF_F_SG | NETIF_F_FRAGLIST
| NETIF_F_TSO
| NETIF_F_NO_CSUM
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 329cd50..d5334b4 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -414,6 +414,7 @@ static void macvlan_setup(struct net_device *dev)
{
ether_setup(dev);
+ dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
dev->netdev_ops = &macvlan_netdev_ops;
dev->destructor = free_netdev;
dev->header_ops = &macvlan_hard_header_ops,
diff --git a/drivers/net/wan/hdlc_fr.c b/drivers/net/wan/hdlc_fr.c
index 8005301..bfa0161 100644
--- a/drivers/net/wan/hdlc_fr.c
+++ b/drivers/net/wan/hdlc_fr.c
@@ -1054,6 +1054,7 @@ static void pvc_setup(struct net_device *dev)
dev->flags = IFF_POINTOPOINT;
dev->hard_header_len = 10;
dev->addr_len = 2;
+ dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
}
static const struct net_device_ops pvc_ops = {
diff --git a/include/linux/if.h b/include/linux/if.h
index 1108f3e..b9a6229 100644
--- a/include/linux/if.h
+++ b/include/linux/if.h
@@ -67,6 +67,9 @@
#define IFF_ISATAP 0x80 /* ISATAP interface (RFC4214) */
#define IFF_MASTER_ARPMON 0x100 /* bonding master, ARP mon in use */
#define IFF_WAN_HDLC 0x200 /* WAN HDLC device */
+#define IFF_XMIT_DST_RELEASE 0x400 /* dev_hard_start_xmit() is allowed to
+ * release skb->dst
+ */
#define IF_GET_IFACE 0x0001 /* for querying only */
#define IF_GET_PROTO 0x0002
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 25ba41e..faa535f 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -739,6 +739,7 @@ void vlan_setup(struct net_device *dev)
ether_setup(dev);
dev->priv_flags |= IFF_802_1Q_VLAN;
+ dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
dev->tx_queue_len = 0;
dev->netdev_ops = &vlan_netdev_ops;
diff --git a/net/core/dev.c b/net/core/dev.c
index 14dd725..b9aef53 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1688,6 +1688,14 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
goto gso;
}
+ /*
+ * If device doesnt need skb->dst, release it right now while
+ * its hot in this cpu cache
+ */
+ if ((dev->priv_flags & IFF_XMIT_DST_RELEASE) && skb->dst) {
+ dst_release(skb->dst);
+ skb->dst = NULL;
+ }
rc = ops->ndo_start_xmit(skb, dev);
/*
* TODO: if skb_orphan() was called by
@@ -5028,6 +5036,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
netdev_init_queues(dev);
INIT_LIST_HEAD(&dev->napi_list);
+ dev->priv_flags = IFF_XMIT_DST_RELEASE;
setup(dev);
strcpy(dev->name, name);
return dev;
next prev parent reply other threads:[~2009-05-12 20:01 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-20 11:40 [RFC] net: release dst entry in dev_queue_xmit() Eric Dumazet
2009-03-20 14:10 ` Neil Horman
2009-03-25 6:43 ` David Miller
2009-03-25 7:13 ` Eric Dumazet
2009-03-25 7:17 ` David Miller
2009-03-25 18:22 ` Jarek Poplawski
2009-03-25 18:41 ` Eric Dumazet
2009-03-25 19:18 ` Jarek Poplawski
2009-03-25 19:40 ` Eric Dumazet
2009-03-25 19:54 ` Jarek Poplawski
2009-03-25 20:28 ` Eric Dumazet
2009-03-25 21:12 ` Jarek Poplawski
2009-03-25 21:20 ` Patrick McHardy
2009-05-12 8:12 ` [PATCH] net: release dst entry in dev_hard_start_xmit() Eric Dumazet
2009-05-12 8:21 ` Eric Dumazet
2009-05-12 19:26 ` Eric Dumazet [this message]
2009-05-19 5:19 ` [PATCH, v2] " David Miller
2009-05-19 5:44 ` Eric Dumazet
2009-05-19 19:44 ` Eric Dumazet
2009-05-19 21:09 ` Jarek Poplawski
2009-05-19 21:21 ` Eric Dumazet
2009-05-19 21:24 ` David Miller
2009-05-12 19:27 ` [PATCH] " Jarek Poplawski
2009-05-12 19:44 ` Eric Dumazet
2009-05-12 20:05 ` Jarek Poplawski
2009-05-12 20:24 ` Jarek Poplawski
2009-05-12 20:52 ` Eric Dumazet
2009-05-12 20:59 ` Jarek Poplawski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A09CD6B.5070604@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=davem@davemloft.net \
--cc=jarkao2@gmail.com \
--cc=kaber@trash.net \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.