* Re: [PATCH v12 06/17] Use callback to deal with skb_release_data() specially.
From: David Miller @ 2010-10-01 7:14 UTC
To: xiaohui.xin; +Cc: netdev, kvm, linux-kernel, mst, mingo, herbert, jdike
In-Reply-To: <f2bf65a8a7676bbd3e8a749ae93d99e88671d35d.1285853725.git.xiaohui.xin@intel.com>
From: xiaohui.xin@intel.com
Date: Thu, 30 Sep 2010 22:04:23 +0800
> @@ -197,10 +197,11 @@ struct skb_shared_info {
> union skb_shared_tx tx_flags;
> struct sk_buff *frag_list;
> struct skb_shared_hwtstamps hwtstamps;
> - skb_frag_t frags[MAX_SKB_FRAGS];
> /* Intermediate layers must ensure that destructor_arg
> * remains valid until skb destructor */
> void * destructor_arg;
> +
> + skb_frag_t frags[MAX_SKB_FRAGS];
> };
>
> /* The structure is for a skb which pages may point to
Why are you moving frags[] to the end like this?
* [PATCH net-next] net: add a core netdev->rx_dropped counter
From: Eric Dumazet @ 2010-10-01 7:06 UTC
To: Jesse Gross, David Miller; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <1285909831.2705.41.camel@edumazet-laptop>
On Friday 01 October 2010 at 07:10 +0200, Eric Dumazet wrote:
> This seems very reasonable ;)
>
> I'll add a counter, a core generalization of
> commit 8990f468a (net: rx_dropped accounting)
>
> Because we can drop packets _after_ netif_rx() if RPS is in action
> anyway.
>
>
In this patch, I fold the additional dev->rx_dropped counter into the
structure returned by get_stats(). We might choose not to fold it, and
instead provide this counter in a new /proc/net/dev column and a new
rtnetlink attribute (with the appropriate iproute2 change).
What do you think?
[PATCH net-next] net: add a core netdev->rx_dropped counter
In various situations, a device provides a packet to our stack and we
drop it before it enters the protocol stack:
- softnet backlog full (accounted in /proc/net/softnet_stat)
- bad vlan tag (not accounted)
- unknown/unregistered protocol (not accounted)
We can maintain a per-device counter of such dropped frames at the core
level, and automatically add it to the device-provided stats (rx_dropped),
so that standard tools (ifconfig, ip link, cat /proc/net/dev) report it.
This is a generalization of commit 8990f468a (net: rx_dropped
accounting), thus reverting it.
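The accounting scheme described above can be modelled in user space. This is a minimal C11 sketch, not kernel code: `toy_netdev`, `toy_netif_rx()`, `toy_get_stats()` and the two-slot backlog are hypothetical stand-ins for `struct net_device`, the backlog enqueue path and `dev_get_stats()`. It shows drops being counted once by core code and folded into `rx_dropped` after the driver's own statistics, which is why callers no longer need to test the return value of `netif_rx()`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins, not the kernel's structures. */
struct toy_stats64 {
	unsigned long long rx_packets;
	unsigned long long rx_dropped;
};

struct toy_netdev {
	struct toy_stats64 driver_stats; /* what the driver itself reports */
	atomic_long rx_dropped;          /* incremented by core code only */
};

#define TOY_BACKLOG_MAX 2
#define TOY_RX_SUCCESS 0
#define TOY_RX_DROP 1

/* Per-CPU in the kernel; a single global keeps the sketch short. */
static int toy_backlog_len;

/* Core enqueue path: when the backlog is full, drop the packet and
 * charge the drop to the receiving device. */
static int toy_netif_rx(struct toy_netdev *dev)
{
	if (toy_backlog_len >= TOY_BACKLOG_MAX) {
		atomic_fetch_add(&dev->rx_dropped, 1);
		return TOY_RX_DROP;
	}
	toy_backlog_len++;
	return TOY_RX_SUCCESS;
}

/* Same shape as the patched dev_get_stats(): fill storage from the
 * driver, then add the core counter on every path. */
static struct toy_stats64 *toy_get_stats(struct toy_netdev *dev,
					 struct toy_stats64 *storage)
{
	*storage = dev->driver_stats;
	storage->rx_dropped += (unsigned long long)atomic_load(&dev->rx_dropped);
	return storage;
}
```

Queuing five packets into the two-slot backlog drops three, so a device whose driver already reported one drop ends up with `rx_dropped == 4`.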
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
drivers/net/loopback.c | 8 +-------
include/linux/netdevice.h | 3 +++
net/8021q/vlan.h | 2 --
net/8021q/vlan_core.c | 2 ++
net/8021q/vlan_dev.c | 11 ++++-------
net/core/dev.c | 19 +++++++++++--------
net/ipv4/ip_gre.c | 3 +--
net/ipv4/ipip.c | 3 +--
net/ipv6/ip6_tunnel.c | 3 +--
net/ipv6/ip6mr.c | 3 +--
net/ipv6/sit.c | 3 +--
11 files changed, 26 insertions(+), 34 deletions(-)
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 4b0e30b..2d9663a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -64,7 +64,6 @@ struct pcpu_lstats {
u64 packets;
u64 bytes;
struct u64_stats_sync syncp;
- unsigned long drops;
};
/*
@@ -90,8 +89,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
lb_stats->bytes += len;
lb_stats->packets++;
u64_stats_update_end(&lb_stats->syncp);
- } else
- lb_stats->drops++;
+ }
return NETDEV_TX_OK;
}
@@ -101,7 +99,6 @@ static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev,
{
u64 bytes = 0;
u64 packets = 0;
- u64 drops = 0;
int i;
for_each_possible_cpu(i) {
@@ -115,14 +112,11 @@ static struct rtnl_link_stats64 *loopback_get_stats64(struct net_device *dev,
tbytes = lb_stats->bytes;
tpackets = lb_stats->packets;
} while (u64_stats_fetch_retry(&lb_stats->syncp, start));
- drops += lb_stats->drops;
bytes += tbytes;
packets += tpackets;
}
stats->rx_packets = packets;
stats->tx_packets = packets;
- stats->rx_dropped = drops;
- stats->rx_errors = drops;
stats->rx_bytes = bytes;
stats->tx_bytes = bytes;
return stats;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ceed347..444f042 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -884,6 +884,9 @@ struct net_device {
int iflink;
struct net_device_stats stats;
+ atomic_long_t rx_dropped; /* dropped packets by core network
+ * Do not use this in drivers.
+ */
#ifdef CONFIG_WIRELESS_EXT
/* List of functions to handle Wireless Extensions (instead of ioctl).
diff --git a/net/8021q/vlan.h b/net/8021q/vlan.h
index b26ce34..8d9503a 100644
--- a/net/8021q/vlan.h
+++ b/net/8021q/vlan.h
@@ -25,7 +25,6 @@ struct vlan_priority_tci_mapping {
* @rx_multicast: number of received multicast packets
* @syncp: synchronization point for 64bit counters
* @rx_errors: number of errors
- * @rx_dropped: number of dropped packets
*/
struct vlan_rx_stats {
u64 rx_packets;
@@ -33,7 +32,6 @@ struct vlan_rx_stats {
u64 rx_multicast;
struct u64_stats_sync syncp;
unsigned long rx_errors;
- unsigned long rx_dropped;
};
/**
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index 0eb486d..35a04a1 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -30,6 +30,7 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct vlan_group *grp,
return polling ? netif_receive_skb(skb) : netif_rx(skb);
drop:
+ atomic_long_inc(&skb->dev->rx_dropped);
dev_kfree_skb_any(skb);
return NET_RX_DROP;
}
@@ -117,6 +118,7 @@ vlan_gro_common(struct napi_struct *napi, struct vlan_group *grp,
return dev_gro_receive(napi, skb);
drop:
+ atomic_long_inc(&skb->dev->rx_dropped);
return GRO_DROP;
}
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f6fbcc0..f54251e 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -225,16 +225,15 @@ int vlan_skb_recv(struct sk_buff *skb, struct net_device *dev,
}
}
- if (unlikely(netif_rx(skb) == NET_RX_DROP)) {
- if (rx_stats)
- rx_stats->rx_dropped++;
- }
+ netif_rx(skb);
+
rcu_read_unlock();
return NET_RX_SUCCESS;
err_unlock:
rcu_read_unlock();
err_free:
+ atomic_long_inc(&dev->rx_dropped);
kfree_skb(skb);
return NET_RX_DROP;
}
@@ -846,15 +845,13 @@ static struct rtnl_link_stats64 *vlan_dev_get_stats64(struct net_device *dev, st
accum.rx_packets += rxpackets;
accum.rx_bytes += rxbytes;
accum.rx_multicast += rxmulticast;
- /* rx_errors, rx_dropped are ulong, not protected by syncp */
+ /* rx_errors is ulong, not protected by syncp */
accum.rx_errors += p->rx_errors;
- accum.rx_dropped += p->rx_dropped;
}
stats->rx_packets = accum.rx_packets;
stats->rx_bytes = accum.rx_bytes;
stats->rx_errors = accum.rx_errors;
stats->multicast = accum.rx_multicast;
- stats->rx_dropped = accum.rx_dropped;
}
return stats;
}
diff --git a/net/core/dev.c b/net/core/dev.c
index a313bab..5143663 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1483,8 +1483,9 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
skb_orphan(skb);
nf_reset(skb);
- if (!(dev->flags & IFF_UP) ||
- (skb->len > (dev->mtu + dev->hard_header_len))) {
+ if (unlikely(!(dev->flags & IFF_UP) ||
+ (skb->len > (dev->mtu + dev->hard_header_len)))) {
+ atomic_long_inc(&dev->rx_dropped);
kfree_skb(skb);
return NET_RX_DROP;
}
@@ -2548,6 +2549,7 @@ enqueue:
local_irq_restore(flags);
+ atomic_long_inc(&skb->dev->rx_dropped);
kfree_skb(skb);
return NET_RX_DROP;
}
@@ -2996,6 +2998,7 @@ ncls:
if (pt_prev) {
ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
} else {
+ atomic_long_inc(&skb->dev->rx_dropped);
kfree_skb(skb);
/* Jamal, now you will not able to escape explaining
* me how you were going to use this. :-)
@@ -5431,14 +5434,14 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev,
if (ops->ndo_get_stats64) {
memset(storage, 0, sizeof(*storage));
- return ops->ndo_get_stats64(dev, storage);
- }
- if (ops->ndo_get_stats) {
+ ops->ndo_get_stats64(dev, storage);
+ } else if (ops->ndo_get_stats) {
netdev_stats_to_stats64(storage, ops->ndo_get_stats(dev));
- return storage;
+ } else {
+ netdev_stats_to_stats64(storage, &dev->stats);
+ dev_txq_stats_fold(dev, storage);
}
- netdev_stats_to_stats64(storage, &dev->stats);
- dev_txq_stats_fold(dev, storage);
+ storage->rx_dropped += atomic_long_read(&dev->rx_dropped);
return storage;
}
EXPORT_SYMBOL(dev_get_stats);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index fbe2c47..9d421f4 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -679,8 +679,7 @@ static int ipgre_rcv(struct sk_buff *skb)
skb_reset_network_header(skb);
ipgre_ecn_decapsulate(iph, skb);
- if (netif_rx(skb) == NET_RX_DROP)
- tunnel->dev->stats.rx_dropped++;
+ netif_rx(skb);
rcu_read_unlock();
return 0;
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 6ad46c2..e9b816e 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -414,8 +414,7 @@ static int ipip_rcv(struct sk_buff *skb)
ipip_ecn_decapsulate(iph, skb);
- if (netif_rx(skb) == NET_RX_DROP)
- tunnel->dev->stats.rx_dropped++;
+ netif_rx(skb);
rcu_read_unlock();
return 0;
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 8be3c45..c2c0f89 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -768,8 +768,7 @@ static int ip6_tnl_rcv(struct sk_buff *skb, __u16 protocol,
dscp_ecn_decapsulate(t, ipv6h, skb);
- if (netif_rx(skb) == NET_RX_DROP)
- t->dev->stats.rx_dropped++;
+ netif_rx(skb);
rcu_read_unlock();
return 0;
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 2640c9b..6f32ffc 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -666,8 +666,7 @@ static int pim6_rcv(struct sk_buff *skb)
skb_tunnel_rx(skb, reg_dev);
- if (netif_rx(skb) == NET_RX_DROP)
- reg_dev->stats.rx_dropped++;
+ netif_rx(skb);
dev_put(reg_dev);
return 0;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index d770178..367a6cc 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -600,8 +600,7 @@ static int ipip6_rcv(struct sk_buff *skb)
ipip6_ecn_decapsulate(iph, skb);
- if (netif_rx(skb) == NET_RX_DROP)
- tunnel->dev->stats.rx_dropped++;
+ netif_rx(skb);
rcu_read_unlock();
return 0;
* [PATCH 17/18] net: Fix endianess issues in IBM newemac driver
From: Ian Munsie @ 2010-10-01 7:06 UTC
To: linux-kernel, linuxppc-dev, benh
Cc: paulus, Ian Munsie, David S. Miller, Grant Likely, Jiri Pirko,
Sean MacLennan, Tejun Heo, netdev, devicetree-discuss
In-Reply-To: <1285916771-18033-1-git-send-email-imunsie@au1.ibm.com>
From: Ian Munsie <imunsie@au1.ibm.com>
This patch fixes the endianness of all the device tree and ring buffer
accesses in the IBM newemac driver.
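The pattern used throughout this patch (keep descriptor fields and device tree cells in big-endian form and convert at each access) can be modelled portably. The helpers below are hypothetical byte-shuffling stand-ins for the kernel's `cpu_to_be16()`, `be16_to_cpu()` and `be32_to_cpup()`, and `TOY_RX_CTRL_EMPTY` is an assumed flag value, not the real `MAL_RX_CTRL_EMPTY`. The two predicates show the two equivalent checks the patch uses: converting the constant (`ctrl & cpu_to_be16(FLAG)`) or converting the field (`be16_to_cpu(ctrl) & FLAG`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable equivalents of cpu_to_be16()/be16_to_cpu():
 * the returned/accepted uint16_t holds its bytes in big-endian order
 * in memory, whatever the host byte order is. */
static uint16_t toy_cpu_to_be16(uint16_t v)
{
	const uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xff) };
	uint16_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

static uint16_t toy_be16_to_cpu(uint16_t be)
{
	uint8_t b[2];

	memcpy(b, &be, sizeof(b));
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* Hypothetical equivalent of be32_to_cpup(): decode a big-endian
 * 32-bit cell, as found in device tree properties. */
static uint32_t toy_be32_to_cpup(const void *p)
{
	const uint8_t *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Laid out like struct mal_descriptor; every field is stored
 * big-endian, which the kernel expresses with __be16/__be32. */
struct toy_mal_descriptor {
	uint16_t ctrl;
	uint16_t data_len;
	uint32_t data_ptr;
};

#define TOY_RX_CTRL_EMPTY 0x8000 /* assumed value, for illustration */

/* Variant 1: convert the constant, test in device byte order. */
static int toy_rx_empty_be(const struct toy_mal_descriptor *d)
{
	return (d->ctrl & toy_cpu_to_be16(TOY_RX_CTRL_EMPTY)) != 0;
}

/* Variant 2: convert the field to CPU order, then test. */
static int toy_rx_empty_cpu(const struct toy_mal_descriptor *d)
{
	return (toy_be16_to_cpu(d->ctrl) & TOY_RX_CTRL_EMPTY) != 0;
}
```

Converting the constant is the form the patch uses in `emac_peek_rx()`; since the kernel can usually evaluate `cpu_to_be16()` of a constant at compile time, that form avoids a byte swap on every read on a little-endian CPU.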
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
---
drivers/net/ibm_newemac/core.c | 68 ++++++++++++++++++++--------------------
drivers/net/ibm_newemac/mal.c | 6 ++--
drivers/net/ibm_newemac/mal.h | 6 ++--
3 files changed, 40 insertions(+), 40 deletions(-)
diff --git a/drivers/net/ibm_newemac/core.c b/drivers/net/ibm_newemac/core.c
index 3506fd6..67238b8 100644
--- a/drivers/net/ibm_newemac/core.c
+++ b/drivers/net/ibm_newemac/core.c
@@ -981,12 +981,12 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
* to simplify error recovery in the case of allocation failure later.
*/
for (i = 0; i < NUM_RX_BUFF; ++i) {
- if (dev->rx_desc[i].ctrl & MAL_RX_CTRL_FIRST)
+ if (dev->rx_desc[i].ctrl & cpu_to_be16(MAL_RX_CTRL_FIRST))
++dev->estats.rx_dropped_resize;
dev->rx_desc[i].data_len = 0;
- dev->rx_desc[i].ctrl = MAL_RX_CTRL_EMPTY |
- (i == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
+ dev->rx_desc[i].ctrl = cpu_to_be16(MAL_RX_CTRL_EMPTY |
+ (i == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0));
}
/* Reallocate RX ring only if bigger skb buffers are required */
@@ -1005,9 +1005,9 @@ static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu)
dev_kfree_skb(dev->rx_skb[i]);
skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
- dev->rx_desc[i].data_ptr =
+ dev->rx_desc[i].data_ptr = cpu_to_be32(
dma_map_single(&dev->ofdev->dev, skb->data - 2, rx_sync_size,
- DMA_FROM_DEVICE) + 2;
+ DMA_FROM_DEVICE) + 2);
dev->rx_skb[i] = skb;
}
skip:
@@ -1067,7 +1067,7 @@ static void emac_clean_tx_ring(struct emac_instance *dev)
if (dev->tx_skb[i]) {
dev_kfree_skb(dev->tx_skb[i]);
dev->tx_skb[i] = NULL;
- if (dev->tx_desc[i].ctrl & MAL_TX_CTRL_READY)
+ if (dev->tx_desc[i].ctrl & cpu_to_be16(MAL_TX_CTRL_READY))
++dev->estats.tx_dropped;
}
dev->tx_desc[i].ctrl = 0;
@@ -1104,12 +1104,12 @@ static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
dev->rx_desc[slot].data_len = 0;
skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
- dev->rx_desc[slot].data_ptr =
+ dev->rx_desc[slot].data_ptr = cpu_to_be32(
dma_map_single(&dev->ofdev->dev, skb->data - 2, dev->rx_sync_size,
- DMA_FROM_DEVICE) + 2;
+ DMA_FROM_DEVICE) + 2);
wmb();
- dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
- (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
+ dev->rx_desc[slot].ctrl = cpu_to_be16(MAL_RX_CTRL_EMPTY |
+ (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0));
return 0;
}
@@ -1373,12 +1373,12 @@ static int emac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
DBG2(dev, "xmit(%u) %d" NL, len, slot);
dev->tx_skb[slot] = skb;
- dev->tx_desc[slot].data_ptr = dma_map_single(&dev->ofdev->dev,
+ dev->tx_desc[slot].data_ptr = cpu_to_be32(dma_map_single(&dev->ofdev->dev,
skb->data, len,
- DMA_TO_DEVICE);
- dev->tx_desc[slot].data_len = (u16) len;
+ DMA_TO_DEVICE));
+ dev->tx_desc[slot].data_len = cpu_to_be16(len);
wmb();
- dev->tx_desc[slot].ctrl = ctrl;
+ dev->tx_desc[slot].ctrl = cpu_to_be16(ctrl);
return emac_xmit_finish(dev, len);
}
@@ -1399,9 +1399,9 @@ static inline int emac_xmit_split(struct emac_instance *dev, int slot,
ctrl |= MAL_TX_CTRL_WRAP;
dev->tx_skb[slot] = NULL;
- dev->tx_desc[slot].data_ptr = pd;
- dev->tx_desc[slot].data_len = (u16) chunk;
- dev->tx_desc[slot].ctrl = ctrl;
+ dev->tx_desc[slot].data_ptr = cpu_to_be32(pd);
+ dev->tx_desc[slot].data_len = cpu_to_be16(chunk);
+ dev->tx_desc[slot].ctrl = cpu_to_be16(ctrl);
++dev->tx_cnt;
if (!len)
@@ -1442,9 +1442,9 @@ static int emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev)
/* skb data */
dev->tx_skb[slot] = NULL;
chunk = min(len, MAL_MAX_TX_SIZE);
- dev->tx_desc[slot].data_ptr = pd =
- dma_map_single(&dev->ofdev->dev, skb->data, len, DMA_TO_DEVICE);
- dev->tx_desc[slot].data_len = (u16) chunk;
+ dev->tx_desc[slot].data_ptr = cpu_to_be32(pd =
+ dma_map_single(&dev->ofdev->dev, skb->data, len, DMA_TO_DEVICE));
+ dev->tx_desc[slot].data_len = cpu_to_be16(chunk);
len -= chunk;
if (unlikely(len))
slot = emac_xmit_split(dev, slot, pd + chunk, len, !nr_frags,
@@ -1473,7 +1473,7 @@ static int emac_start_xmit_sg(struct sk_buff *skb, struct net_device *ndev)
if (dev->tx_slot == NUM_TX_BUFF - 1)
ctrl |= MAL_TX_CTRL_WRAP;
wmb();
- dev->tx_desc[dev->tx_slot].ctrl = ctrl;
+ dev->tx_desc[dev->tx_slot].ctrl = cpu_to_be16(ctrl);
dev->tx_slot = (slot + 1) % NUM_TX_BUFF;
return emac_xmit_finish(dev, skb->len);
@@ -1541,7 +1541,7 @@ static void emac_poll_tx(void *param)
u16 ctrl;
int slot = dev->ack_slot, n = 0;
again:
- ctrl = dev->tx_desc[slot].ctrl;
+ ctrl = be16_to_cpu(dev->tx_desc[slot].ctrl);
if (!(ctrl & MAL_TX_CTRL_READY)) {
struct sk_buff *skb = dev->tx_skb[slot];
++n;
@@ -1583,8 +1583,8 @@ static inline void emac_recycle_rx_skb(struct emac_instance *dev, int slot,
dev->rx_desc[slot].data_len = 0;
wmb();
- dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
- (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
+ dev->rx_desc[slot].ctrl = cpu_to_be16(MAL_RX_CTRL_EMPTY |
+ (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0));
}
static void emac_parse_rx_error(struct emac_instance *dev, u16 ctrl)
@@ -1628,7 +1628,7 @@ static inline void emac_rx_csum(struct emac_instance *dev,
static inline int emac_rx_sg_append(struct emac_instance *dev, int slot)
{
if (likely(dev->rx_sg_skb != NULL)) {
- int len = dev->rx_desc[slot].data_len;
+ int len = be16_to_cpu(dev->rx_desc[slot].data_len);
int tot_len = dev->rx_sg_skb->len + len;
if (unlikely(tot_len + 2 > dev->rx_skb_size)) {
@@ -1659,14 +1659,14 @@ static int emac_poll_rx(void *param, int budget)
while (budget > 0) {
int len;
struct sk_buff *skb;
- u16 ctrl = dev->rx_desc[slot].ctrl;
+ u16 ctrl = be16_to_cpu(dev->rx_desc[slot].ctrl);
if (ctrl & MAL_RX_CTRL_EMPTY)
break;
skb = dev->rx_skb[slot];
mb();
- len = dev->rx_desc[slot].data_len;
+ len = be16_to_cpu(dev->rx_desc[slot].data_len);
if (unlikely(!MAL_IS_SINGLE_RX(ctrl)))
goto sg;
@@ -1757,7 +1757,7 @@ static int emac_poll_rx(void *param, int budget)
if (unlikely(budget && test_bit(MAL_COMMAC_RX_STOPPED, &dev->commac.flags))) {
mb();
- if (!(dev->rx_desc[slot].ctrl & MAL_RX_CTRL_EMPTY)) {
+ if (!(dev->rx_desc[slot].ctrl & cpu_to_be16(MAL_RX_CTRL_EMPTY))) {
DBG2(dev, "rx restart" NL);
received = 0;
goto again;
@@ -1783,7 +1783,7 @@ static int emac_peek_rx(void *param)
{
struct emac_instance *dev = param;
- return !(dev->rx_desc[dev->rx_slot].ctrl & MAL_RX_CTRL_EMPTY);
+ return !(dev->rx_desc[dev->rx_slot].ctrl & cpu_to_be16(MAL_RX_CTRL_EMPTY));
}
/* NAPI poll context */
@@ -1793,7 +1793,7 @@ static int emac_peek_rx_sg(void *param)
int slot = dev->rx_slot;
while (1) {
- u16 ctrl = dev->rx_desc[slot].ctrl;
+ u16 ctrl = be16_to_cpu(dev->rx_desc[slot].ctrl);
if (ctrl & MAL_RX_CTRL_EMPTY)
return 0;
else if (ctrl & MAL_RX_CTRL_LAST)
@@ -2367,14 +2367,14 @@ static int __devinit emac_read_uint_prop(struct device_node *np, const char *nam
u32 *val, int fatal)
{
int len;
- const u32 *prop = of_get_property(np, name, &len);
+ const __be32 *prop = of_get_property(np, name, &len);
if (prop == NULL || len < sizeof(u32)) {
if (fatal)
printk(KERN_ERR "%s: missing %s property\n",
np->full_name, name);
return -ENODEV;
}
- *val = *prop;
+ *val = be32_to_cpup(prop);
return 0;
}
@@ -3013,7 +3013,7 @@ static void __init emac_make_bootlist(void)
/* Collect EMACs */
while((np = of_find_all_nodes(np)) != NULL) {
- const u32 *idx;
+ const __be32 *idx;
if (of_match_node(emac_match, np) == NULL)
continue;
@@ -3022,7 +3022,7 @@ static void __init emac_make_bootlist(void)
idx = of_get_property(np, "cell-index", NULL);
if (idx == NULL)
continue;
- cell_indices[i] = *idx;
+ cell_indices[i] = be32_to_cpup(idx);
emac_boot_list[i++] = of_node_get(np);
if (i >= EMAC_BOOT_LIST_SIZE) {
of_node_put(np);
diff --git a/drivers/net/ibm_newemac/mal.c b/drivers/net/ibm_newemac/mal.c
index d5717e2..9e4939e 100644
--- a/drivers/net/ibm_newemac/mal.c
+++ b/drivers/net/ibm_newemac/mal.c
@@ -524,7 +524,7 @@ static int __devinit mal_probe(struct platform_device *ofdev,
int err = 0, i, bd_size;
int index = mal_count++;
unsigned int dcr_base;
- const u32 *prop;
+ const __be32 *prop;
u32 cfg;
unsigned long irqflags;
irq_handler_t hdlr_serr, hdlr_txde, hdlr_rxde;
@@ -550,7 +550,7 @@ static int __devinit mal_probe(struct platform_device *ofdev,
err = -ENODEV;
goto fail;
}
- mal->num_tx_chans = prop[0];
+ mal->num_tx_chans = be32_to_cpu(prop[0]);
prop = of_get_property(ofdev->dev.of_node, "num-rx-chans", NULL);
if (prop == NULL) {
@@ -560,7 +560,7 @@ static int __devinit mal_probe(struct platform_device *ofdev,
err = -ENODEV;
goto fail;
}
- mal->num_rx_chans = prop[0];
+ mal->num_rx_chans = be32_to_cpu(prop[0]);
dcr_base = dcr_resource_start(ofdev->dev.of_node, 0);
if (dcr_base == 0) {
diff --git a/drivers/net/ibm_newemac/mal.h b/drivers/net/ibm_newemac/mal.h
index 6608421..b8ee413 100644
--- a/drivers/net/ibm_newemac/mal.h
+++ b/drivers/net/ibm_newemac/mal.h
@@ -147,9 +147,9 @@ static inline int mal_tx_chunks(int len)
/* MAL Buffer Descriptor structure */
struct mal_descriptor {
- u16 ctrl; /* MAL / Commac status control bits */
- u16 data_len; /* Max length is 4K-1 (12 bits) */
- u32 data_ptr; /* pointer to actual data buffer */
+ __be16 ctrl; /* MAL / Commac status control bits */
+ __be16 data_len; /* Max length is 4K-1 (12 bits) */
+ __be32 data_ptr; /* pointer to actual data buffer */
};
/* the following defines are for the MadMAL status and control registers. */
--
1.7.1
* [PATCH V4] fs: allow for more than 2^31 files
From: Eric Dumazet @ 2010-10-01 5:29 UTC
To: Robin Holt
Cc: David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm,
npiggin, xemul, linux-kernel, netdev
In-Reply-To: <1285909434.2705.35.camel@edumazet-laptop>
On Friday 01 October 2010 at 07:03 +0200, Eric Dumazet wrote:
> On Thursday 30 September 2010 at 23:34 -0500, Robin Holt wrote:
>
> > The proc_handler used to be proc_nr_files() which would call
> > get_nr_files() and deposit the result in files_stat.nr_files then cascade
> > to proc_dointvec() which would dump the 3 values. Now it will dump the
> > three values, but not update the middle (nr_files) value first.
> >
>
> Ah I get it now, thanks !
>
> I'll send a V4 shortly.
>
>
In this v4, I call proc_nr_files() again, and proc_nr_files() now calls
proc_doulongvec_minmax() instead of proc_dointvec().
I added the "cat /proc/sys/fs/file-nr" output to the changelog.
Thanks again, Robin.
[PATCH V3] fs: allow for more than 2^31 files
Robin Holt tried to boot a 16TB system and found that af_unix was
overflowing a 32-bit value:
<quote>
We were seeing a failure which prevented boot. The kernel was incapable
of creating either a named pipe or unix domain socket. This comes down
to a common kernel function called unix_create1() which does:
atomic_inc(&unix_nr_socks);
if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
goto out;
The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().
n = (mempages * (PAGE_SIZE / 1024)) / 10;
files_stat.max_files = n;
In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000). That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.
</quote>
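The arithmetic in Robin's report can be checked directly. This sketch recomputes the `files_init()` formula with 64-bit integers, assuming 4 KiB pages, and shows that max_files itself still fits in an `int` while `2 * get_max_files()` does not. The doubling is evaluated in 64 bits on purpose, because actually overflowing a signed `int` is undefined behaviour in C. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* files_init()'s formula, evaluated in 64 bits.
 * ASSUMPTION: callers pass a 4 KiB page size, as on the reported system. */
static int64_t toy_max_files(int64_t mempages, int64_t page_size)
{
	return (mempages * (page_size / 1024)) / 10;
}

/* Would "2 * max_files" fit in a 32-bit signed int, as it had to when
 * get_max_files() returned int? Computed without overflowing. */
static int toy_double_fits_int(int64_t max_files)
{
	return 2 * max_files <= INT_MAX;
}
```

With `mempages = 0xe0000000`, `toy_max_files()` returns 1,503,238,553, which fits in an `int`, but twice that (3,006,477,106) exceeds `INT_MAX` (2,147,483,647), so the limit check in `unix_create1()` compared against an overflowed value.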
The fix is to let /proc/sys/fs/file-nr and /proc/sys/fs/file-max use long
integers, and to change af_unix to use an atomic_long_t instead of an
atomic_t.
get_max_files() is changed to return an unsigned long.
get_nr_files() is changed to return a long.
unix_nr_socks is changed from atomic_t to atomic_long_t, although this is
not strictly needed to address Robin's problem.
Before the patch (on a 64-bit kernel):
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968
After the patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704 0 2147483648
Reported-by: Robin Holt <holt@sgi.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
fs/file_table.c | 17 +++++++----------
include/linux/fs.h | 8 ++++----
kernel/sysctl.c | 6 +++---
net/unix/af_unix.c | 14 +++++++-------
4 files changed, 21 insertions(+), 24 deletions(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index a04bdd8..c3dee38 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -60,7 +60,7 @@ static inline void file_free(struct file *f)
/*
* Return the total number of open files in the system
*/
-static int get_nr_files(void)
+static long get_nr_files(void)
{
return percpu_counter_read_positive(&nr_files);
}
@@ -68,7 +68,7 @@ static int get_nr_files(void)
/*
* Return the maximum number of open files in the system
*/
-int get_max_files(void)
+unsigned long get_max_files(void)
{
return files_stat.max_files;
}
@@ -82,7 +82,7 @@ int proc_nr_files(ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
files_stat.nr_files = get_nr_files();
- return proc_dointvec(table, write, buffer, lenp, ppos);
+ return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
#else
int proc_nr_files(ctl_table *table, int write,
@@ -105,7 +105,7 @@ int proc_nr_files(ctl_table *table, int write,
struct file *get_empty_filp(void)
{
const struct cred *cred = current_cred();
- static int old_max;
+ static long old_max;
struct file * f;
/*
@@ -140,8 +140,7 @@ struct file *get_empty_filp(void)
over:
/* Ran out of filps - report that */
if (get_nr_files() > old_max) {
- printk(KERN_INFO "VFS: file-max limit %d reached\n",
- get_max_files());
+ pr_info("VFS: file-max limit %lu reached\n", get_max_files());
old_max = get_nr_files();
}
goto fail;
@@ -487,7 +486,7 @@ retry:
void __init files_init(unsigned long mempages)
{
- int n;
+ unsigned long n;
filp_cachep = kmem_cache_create("filp", sizeof(struct file), 0,
SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
@@ -498,9 +497,7 @@ void __init files_init(unsigned long mempages)
*/
n = (mempages * (PAGE_SIZE / 1024)) / 10;
- files_stat.max_files = n;
- if (files_stat.max_files < NR_FILE)
- files_stat.max_files = NR_FILE;
+ files_stat.max_files = max_t(unsigned long, n, NR_FILE);
files_defer_init();
lg_lock_init(files_lglock);
percpu_counter_init(&nr_files, 0);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 63d069b..8c06590 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -34,9 +34,9 @@
/* And dynamically-tunable limits and defaults: */
struct files_stat_struct {
- int nr_files; /* read only */
- int nr_free_files; /* read only */
- int max_files; /* tunable */
+ unsigned long nr_files; /* read only */
+ unsigned long nr_free_files; /* read only */
+ unsigned long max_files; /* tunable */
};
struct inodes_stat_t {
@@ -404,7 +404,7 @@ extern void __init inode_init_early(void);
extern void __init files_init(unsigned long);
extern struct files_stat_struct files_stat;
-extern int get_max_files(void);
+extern unsigned long get_max_files(void);
extern int sysctl_nr_open;
extern struct inodes_stat_t inodes_stat;
extern int leases_enable, lease_break_time;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f88552c..f789a0a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = {
{
.procname = "file-nr",
.data = &files_stat,
- .maxlen = 3*sizeof(int),
+ .maxlen = sizeof(files_stat),
.mode = 0444,
.proc_handler = proc_nr_files,
},
{
.procname = "file-max",
.data = &files_stat.max_files,
- .maxlen = sizeof(int),
+ .maxlen = sizeof(files_stat.max_files),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_doulongvec_minmax,
},
{
.procname = "nr_open",
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0b39b24..3e1d7d1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -117,7 +117,7 @@
static struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
static DEFINE_SPINLOCK(unix_table_lock);
-static atomic_t unix_nr_socks = ATOMIC_INIT(0);
+static atomic_long_t unix_nr_socks;
#define unix_sockets_unbound (&unix_socket_table[UNIX_HASH_SIZE])
@@ -360,13 +360,13 @@ static void unix_sock_destructor(struct sock *sk)
if (u->addr)
unix_release_addr(u->addr);
- atomic_dec(&unix_nr_socks);
+ atomic_long_dec(&unix_nr_socks);
local_bh_disable();
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
local_bh_enable();
#ifdef UNIX_REFCNT_DEBUG
- printk(KERN_DEBUG "UNIX %p is destroyed, %d are still alive.\n", sk,
- atomic_read(&unix_nr_socks));
+ printk(KERN_DEBUG "UNIX %p is destroyed, %ld are still alive.\n", sk,
+ atomic_long_read(&unix_nr_socks));
#endif
}
@@ -606,8 +606,8 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
struct sock *sk = NULL;
struct unix_sock *u;
- atomic_inc(&unix_nr_socks);
- if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
+ atomic_long_inc(&unix_nr_socks);
+ if (atomic_long_read(&unix_nr_socks) > 2 * get_max_files())
goto out;
sk = sk_alloc(net, PF_UNIX, GFP_KERNEL, &unix_proto);
@@ -632,7 +632,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
unix_insert_socket(unix_sockets_unbound, sk);
out:
if (sk == NULL)
- atomic_dec(&unix_nr_socks);
+ atomic_long_dec(&unix_nr_socks);
else {
local_bh_disable();
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
* Re: [PATCH 1/2] net-next-2.6: SYN retransmits: Rename threshold variable
From: Damian Lukowski @ 2010-10-01 5:22 UTC
To: David Miller; +Cc: netdev
In-Reply-To: <20100930.172337.220062330.davem@davemloft.net>
On Thursday, 30.09.2010 at 17:23 -0700, David Miller wrote:
> Damian please don't do things like this.
No problem. It was only meant to prevent the merge conflict Stephen ran
into, since I had seen that the parameters had already changed in
net-next-2.6.
Damian
> When we make a change in net-2.6, that change is going to propagate into
> net-next-2.6 the next time I do a merge.
>
> And in this case here, the addition of the "syn_set" boolean argument to
> retransmits_timed_out() will happen at that point.
>
> So if anything, you should build on top of the bug fix we put into
> net-2.6 instead of duplicating the change.
>
> Adding the same change in two different ways to net-2.6 and net-next-2.6
> makes the merge a pain in the neck for me and just makes things look
> real confusing.
>
> I'm not applying these two patches, please ask me to merge net-2.6 into
> net-next-2.6 and this way you can code them relative to that.
>
> Thanks!
* Re: VLAN packets silently dropped in promiscuous mode
From: Eric Dumazet @ 2010-10-01 5:10 UTC
To: Jesse Gross; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <AANLkTi=Vdcn7xzJMPxkugvEVy32N7Bp=KVtir6NESnDF@mail.gmail.com>
On Thursday 30 September 2010 at 19:37 -0700, Jesse Gross wrote:
> That's true. Dropping here seems roughly equivalent to the effects of
> a hardware VLAN filter, which will also not be tracked by a counter,
> so that seems not too bad to me.
>
> The thing that concerns me though is why so many drivers seem to have
> this problem with completely dropping the VLAN header. I know that
> even several of the ones that work now were broken initially and had
> to be fixed. Seeing as the driver drops the VLAN information before
> it gets to the general networking code I don't see a generic fix to
> this as it is currently setup. However, perhaps we could make it so
> that it is harder to get wrong. Something like this:
>
> * Allow vlan_gro_receive() to take a NULL VLAN group and a tag of 0
> (and do the same thing for vlan_hwaccel_rx())
> * Now that the vlan functions can deal with non-VLAN packets, merge
> them into their non-VLAN counterparts.
> * We can now demultiplex between the VLAN/non-VLAN case in core
> networking. This is done anyways, it just prevents every driver from
> needing that code block I copied above and allows us to fix these
> types of problems centrally.
> * Dump the VLAN tag into the SKB and hand off the packet to the
> various consumers: VLAN devices, libpcap, bridge hook (not currently
> done but should be for trunking).
>
> I see a number of advantages of this:
> * Fixes all the problems with cards dropping VLAN headers at once.
> * Avoids having to disable VLAN acceleration when in promiscuous mode
> (good for bridging since it always puts devices in promiscuous mode).
> * Keeps VLAN tag separate until given to ultimate consumer, which
> avoids needing to do header reconstruction as in tg3 unless absolutely
> necessary.
> * Consolidates common driver code in core networking.
This seems very reasonable ;)
I'll add a counter, a core generalization of
commit 8990f468a (net: rx_dropped accounting),
because we can drop packets _after_ netif_rx() anyway if RPS is in
action.
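Jesse's proposal can be illustrated with a toy receive entry point. Everything here is hypothetical (`toy_skb`, `toy_vlan_group`, `toy_receive()`); it only shows the idea of accepting a NULL group and a zero tag, so that one function handles both tagged and untagged frames, keeps the tag with the packet, and makes the demultiplexing decision centrally instead of in every driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types, not the kernel's sk_buff / vlan_group. */
struct toy_skb {
	uint16_t vlan_tci;    /* tag kept with the packet; 0 = untagged here */
	int consumer;         /* where the packet ended up */
};

struct toy_vlan_group {
	unsigned char has_vid[4096]; /* nonzero if a VLAN device exists */
};

enum { TOY_TO_STACK = 1, TOY_TO_VLAN_DEV, TOY_DROPPED };

/* One entry point for every driver: grp may be NULL and tci may be 0.
 * The demultiplexing decision is made here once, not in each driver. */
static int toy_receive(struct toy_skb *skb, const struct toy_vlan_group *grp,
		       uint16_t tci)
{
	uint16_t vid = tci & 0x0fff; /* low 12 bits of the TCI are the VID */

	skb->vlan_tci = tci;
	if (tci == 0)
		skb->consumer = TOY_TO_STACK;
	else if (grp != NULL && grp->has_vid[vid])
		skb->consumer = TOY_TO_VLAN_DEV;
	else
		skb->consumer = TOY_DROPPED; /* core could count this drop */
	return skb->consumer;
}
```

Treating a TCI of 0 as "untagged" follows the wording of the proposal; a real implementation would likely need a separate tag-present flag, since a priority-tagged frame can carry VID 0.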
* Re: [PATCH V3] fs: allow for more than 2^31 files
From: Eric Dumazet @ 2010-10-01 5:03 UTC
To: Robin Holt
Cc: David Miller, dipankar, viro, bcrl, den, mingo, mszeredi, cmm,
npiggin, xemul, linux-kernel, netdev
In-Reply-To: <20101001043413.GN14068@sgi.com>
On Thursday 30 September 2010 at 23:34 -0500, Robin Holt wrote:
> The proc_handler used to be proc_nr_files() which would call
> get_nr_files() and deposit the result in files_stat.nr_files then cascade
> to proc_dointvec() which would dump the 3 values. Now it will dump the
> three values, but not update the middle (nr_files) value first.
>
Ah, I get it now, thanks!
I'll send a V4 shortly.
* Re: [PATCH V3] fs: allow for more than 2^31 files
From: Robin Holt @ 2010-10-01 4:34 UTC
To: Eric Dumazet
Cc: Robin Holt, David Miller, dipankar, viro, bcrl, den, mingo,
mszeredi, cmm, npiggin, xemul, linux-kernel, netdev
In-Reply-To: <1285879545.2705.4.camel@edumazet-laptop>
On Thu, Sep 30, 2010 at 10:45:45PM +0200, Eric Dumazet wrote:
> On Thursday 30 September 2010 at 15:26 -0500, Robin Holt wrote:
> > On Tue, Sep 28, 2010 at 05:46:51AM +0200, Eric Dumazet wrote:
> > > On Monday, 27 September 2010 at 15:36 -0700, David Miller wrote:
> > ...
> >
> > > Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
> > > integers, and change af_unix to use an atomic_long_t instead of
> > > atomic_t.
> > >
> > > get_max_files() is changed to return an unsigned long.
> >
> > I _THINK_ we actually want get_max_files to return a long and have
> > the files_stat_struct definitions be longs. If we do not have it that
> > way, we could theoretically open enough files on a single cpu to make
> > get_nr_files return a negative without overflowing max_files. That,
> > of course, would require an insane amount of memory, but I think it is
> > technically more correct.
> >
>
> The number of open files is technically a positive (or zero) value; I have
> no idea why you want it to be signed.
>
> >
> > > --- a/kernel/sysctl.c
> > > +++ b/kernel/sysctl.c
> > > @@ -1352,16 +1352,16 @@ static struct ctl_table fs_table[] = {
> > > {
> > > .procname = "file-nr",
> > > .data = &files_stat,
> > > - .maxlen = 3*sizeof(int),
> > > + .maxlen = sizeof(files_stat),
> > > .mode = 0444,
> > > - .proc_handler = proc_nr_files,
> > > + .proc_handler = proc_doulongvec_minmax,
> >
> > With this change, don't we lose the current nr_files value? I think
> > you need proc_nr_files to stay as it was. If you disagree, we should
> > probably eliminate the definitions for proc_nr_files as I don't believe
> > they are used anywhere else.
> >
>
> I have no idea why you think I changed something. I only made the value
> use 64bit on 64bit arches, so that we are not anymore limited to 2^31
> files.
The proc_handler used to be proc_nr_files() which would call
get_nr_files() and deposit the result in files_stat.nr_files then cascade
to proc_dointvec() which would dump the 3 values. Now it will dump the
three values, but not update the middle (nr_files) value first.
Robin
* Re: [PATCH net-next 2/2] ipv4: rcu conversion in ip_route_output_slow
From: David Miller @ 2010-10-01 4:17 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285853638.2615.520.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 15:33:58 +0200
> ip_route_output_slow() is enclosed in an rcu_read_lock() protected
> section, so that no references are taken/released on device, thanks to
> __ip_dev_find() & dev_get_by_index_rcu()
>
> Tested with ip route cache disabled, and a stress test :
...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Also applied, thanks!
* Re: PATCH net-next 1/2] ipv4: introduce __ip_dev_find()
From: David Miller @ 2010-10-01 4:17 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285853516.2615.515.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 15:31:56 +0200
> ip_dev_find(net, addr) finds a device given an IPv4 source address and
> takes a reference on it.
>
> Introduce __ip_dev_find(), taking a third argument, to optionally take
> the device reference. Callers not asking the reference to be taken
> should be in an rcu_read_lock() protected section.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied.
* Re: [net-next-2.6 PATCH 3/3] e1000e: 82579 performance improvements
From: David Miller @ 2010-10-01 4:17 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, bruce.w.allan
In-Reply-To: <20100930073934.13378.44230.stgit@localhost.localdomain>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 30 Sep 2010 00:39:37 -0700
> From: Bruce Allan <bruce.w.allan@intel.com>
>
> The initial support for 82579 was tuned poorly for performance. Adjust the
> packet buffer allocation appropriately for both standard and jumbo frames;
> and for jumbo frames increase the receive descriptor pre-fetch, disable
> adaptive interrupt moderation and set the DMA latency tolerance.
>
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Applied.
* Re: [net-next-2.6 PATCH 1/3] e1000e: use hardware writeback batching
From: David Miller @ 2010-10-01 4:16 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, jesse.brandeburg
In-Reply-To: <20100930073814.13378.4212.stgit@localhost.localdomain>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 30 Sep 2010 00:38:49 -0700
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
> Most e1000e parts support batching writebacks. The problem with this is
> that when some of the TADV or TIDV timers are not set, Tx can sit forever.
>
> This is solved in this patch with write flushes using the Flush Partial
> Descriptors (FPD) bit in TIDV and RDTR.
>
> This improves bus utilization and removes partial writes on e1000e,
> particularly from 82571 parts in S5500 chipset based machines.
>
> Only ES2LAN and 82571/2 parts are included in this optimization, to reduce
> testing load.
>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Applied.
* Re: [net-next-2.6 PATCH] ixgbe: fix link issues and panic with shared interrupts for 82598
From: David Miller @ 2010-10-01 4:16 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, gospo, bphilips, emil.s.tantilov
In-Reply-To: <20100930073251.12750.67720.stgit@localhost.localdomain>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Thu, 30 Sep 2010 00:35:23 -0700
> From: Emil Tantilov <emil.s.tantilov@intel.com>
>
> Fix possible panic/hang with shared Legacy interrupts by not enabling
> interrupts when interface is down.
>
> Also fixes an intermittent link by enabling LSC upon exit from ixgbe_intr()
>
> This patch adds flags to ixgbe_irq_enable() to allow for some flexibility
> when enabling interrupts.
>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Tested-by: Stephen Ko <stephen.s.ko@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Applied.
* Re: [PATCH net-next] ipv4: __mkroute_output() speedup
From: David Miller @ 2010-10-01 4:16 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1285797230.5211.173.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 29 Sep 2010 23:53:50 +0200
> While doing stress tests with a disabled IP route cache, I found
> __mkroute_output() was touching three times in_device atomic refcount.
>
> Use RCU to touch it once to reduce cache line ping pongs.
...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Applied.
* Re: linux-next: manual merge of the net tree with the net-current tree
From: Jerry Chu @ 2010-10-01 3:27 UTC (permalink / raw)
To: Stephen Rothwell
Cc: David Miller, netdev, linux-next, linux-kernel, Damian Lukowski
In-Reply-To: <20101001124830.9c35d36f.sfr@canb.auug.org.au>
In tcp_write_timeout():
if (retransmits_timed_out(sk, retry_until,
(1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV) ? 0 :
icsk->icsk_user_timeout, syn_set)) {
should be simplified to
if (retransmits_timed_out(sk, retry_until,
syn_set ? 0 : icsk->icsk_user_timeout, syn_set)) {
Thanks,
Jerry
On Thu, Sep 30, 2010 at 7:48 PM, Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> Hi all,
>
> Today's linux-next merge of the net tree got a conflict in
> net/ipv4/tcp_timer.c between commit
> 4d22f7d372f5769c6c0149e427ed6353e2dcfe61 ("net-2.6: SYN retransmits: Add
> new parameter to retransmits_timed_out()") from the net-current tree and
> commit dca43c75e7e545694a9dd6288553f55c53e2a3a3 ("tcp: Add
> TCP_USER_TIMEOUT socket option") from the net tree.
>
> I fixed it up (see below) and can carry the fix as necessary.
> --
> Cheers,
> Stephen Rothwell sfr@canb.auug.org.au
>
> diff --cc net/ipv4/tcp_timer.c
> index 74c54b3,baea4a1..0000000
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@@ -140,11 -139,9 +140,11 @@@ static void tcp_mtu_probing(struct inet
> */
> static bool retransmits_timed_out(struct sock *sk,
> unsigned int boundary,
> - unsigned int timeout)
> ++ unsigned int timeout,
> + bool syn_set)
> {
> - unsigned int timeout, linear_backoff_thresh;
> - unsigned int start_ts;
> + unsigned int linear_backoff_thresh, start_ts;
> + unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
>
> if (!inet_csk(sk)->icsk_retransmits)
> return false;
> @@@ -154,14 -151,15 +154,16 @@@
> else
> start_ts = tcp_sk(sk)->retrans_stamp;
>
> - linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
> + if (likely(timeout == 0)) {
> - linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
> ++ linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
>
> - if (boundary <= linear_backoff_thresh)
> - timeout = ((2 << boundary) - 1) * rto_base;
> - else
> - timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
> - (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
> + if (boundary <= linear_backoff_thresh)
> - timeout = ((2 << boundary) - 1) * TCP_RTO_MIN;
> ++ timeout = ((2 << boundary) - 1) * rto_base;
> + else
> - timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
> ++ timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
> + (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
> +
> + }
> return (tcp_time_stamp - start_ts) >= timeout;
> }
>
> @@@ -176,9 -174,8 +178,9 @@@ static int tcp_write_timeout(struct soc
> if (icsk->icsk_retransmits)
> dst_negative_advice(sk);
> retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
> + syn_set = 1;
> } else {
> -- if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0)) {
> ++ if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0, 0)) {
> /* Black hole detection */
> tcp_mtu_probing(icsk, sk);
>
> @@@ -191,14 -188,16 +193,16 @@@
>
> retry_until = tcp_orphan_retries(sk, alive);
> do_reset = alive ||
> -- !retransmits_timed_out(sk, retry_until, 0);
> ++ !retransmits_timed_out(sk, retry_until, 0, 0);
>
> if (tcp_out_of_resources(sk, do_reset))
> return 1;
> }
> }
>
> - if (retransmits_timed_out(sk, retry_until, syn_set)) {
> + if (retransmits_timed_out(sk, retry_until,
> + (1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV) ? 0 :
> - icsk->icsk_user_timeout)) {
> ++ icsk->icsk_user_timeout, syn_set)) {
> /* Has it gone just too far? */
> tcp_write_err(sk);
> return 1;
> @@@ -440,7 -439,7 +444,7 @@@ out_reset_timer
> icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
> }
> inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
> -- if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0))
> ++ if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0, 0))
> __sk_dst_reset(sk);
>
> out:;
>
* linux-next: manual merge of the net tree with the net-current tree
From: Stephen Rothwell @ 2010-10-01 2:48 UTC (permalink / raw)
To: David Miller, netdev
Cc: linux-next, linux-kernel, Dmitry Kozlov, Eric Dumazet
Hi all,
Today's linux-next merge of the net tree got a conflict in
net/ipv4/Kconfig between commit 68c1f3a96c32a4fe15ebadae45c8145a5e5a66d2
("ip_gre: Fix dependencies wrt. ipv6") from the net-current tree and
commit 00959ade36acadc00e757f87060bf6e4501d545f ("PPTP: PPP over IPv4
(Point-to-Point Tunneling Protocol)") from the net tree.
Just overlapping additions. I fixed it up (see below) and can carry the
fix as necessary.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
diff --cc net/ipv4/Kconfig
index 72380a3,5462e2d..0000000
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@@ -215,9 -215,15 +215,16 @@@ config NET_IPI
be inserted in and removed from the running kernel whenever you
want). Most people won't need this and can say N.
+ config NET_IPGRE_DEMUX
+ tristate "IP: GRE demultiplexer"
+ help
+ This is helper module to demultiplex GRE packets on GRE version field criteria.
+ Required by ip_gre and pptp modules.
+
config NET_IPGRE
tristate "IP: GRE tunnels over IP"
+ depends on IPV6 || IPV6=n
+ depends on NET_IPGRE_DEMUX
help
Tunneling means encapsulating data of one protocol type within
another protocol and sending it over a channel that understands the
* linux-next: manual merge of the net tree with the net-current tree
From: Stephen Rothwell @ 2010-10-01 2:48 UTC (permalink / raw)
To: David Miller, netdev; +Cc: linux-next, linux-kernel, Damian Lukowski, Jerry Chu
Hi all,
Today's linux-next merge of the net tree got a conflict in
net/ipv4/tcp_timer.c between commit
4d22f7d372f5769c6c0149e427ed6353e2dcfe61 ("net-2.6: SYN retransmits: Add
new parameter to retransmits_timed_out()") from the net-current tree and
commit dca43c75e7e545694a9dd6288553f55c53e2a3a3 ("tcp: Add
TCP_USER_TIMEOUT socket option") from the net tree.
I fixed it up (see below) and can carry the fix as necessary.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
diff --cc net/ipv4/tcp_timer.c
index 74c54b3,baea4a1..0000000
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@@ -140,11 -139,9 +140,11 @@@ static void tcp_mtu_probing(struct inet
*/
static bool retransmits_timed_out(struct sock *sk,
unsigned int boundary,
- unsigned int timeout)
++ unsigned int timeout,
+ bool syn_set)
{
- unsigned int timeout, linear_backoff_thresh;
- unsigned int start_ts;
+ unsigned int linear_backoff_thresh, start_ts;
+ unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
if (!inet_csk(sk)->icsk_retransmits)
return false;
@@@ -154,14 -151,15 +154,16 @@@
else
start_ts = tcp_sk(sk)->retrans_stamp;
- linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
+ if (likely(timeout == 0)) {
- linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
++ linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
- if (boundary <= linear_backoff_thresh)
- timeout = ((2 << boundary) - 1) * rto_base;
- else
- timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
- (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
+ if (boundary <= linear_backoff_thresh)
- timeout = ((2 << boundary) - 1) * TCP_RTO_MIN;
++ timeout = ((2 << boundary) - 1) * rto_base;
+ else
- timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
++ timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
+ (boundary - linear_backoff_thresh) * TCP_RTO_MAX;
+
+ }
return (tcp_time_stamp - start_ts) >= timeout;
}
@@@ -176,9 -174,8 +178,9 @@@ static int tcp_write_timeout(struct soc
if (icsk->icsk_retransmits)
dst_negative_advice(sk);
retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
+ syn_set = 1;
} else {
-- if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0)) {
++ if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0, 0)) {
/* Black hole detection */
tcp_mtu_probing(icsk, sk);
@@@ -191,14 -188,16 +193,16 @@@
retry_until = tcp_orphan_retries(sk, alive);
do_reset = alive ||
-- !retransmits_timed_out(sk, retry_until, 0);
++ !retransmits_timed_out(sk, retry_until, 0, 0);
if (tcp_out_of_resources(sk, do_reset))
return 1;
}
}
- if (retransmits_timed_out(sk, retry_until, syn_set)) {
+ if (retransmits_timed_out(sk, retry_until,
+ (1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV) ? 0 :
- icsk->icsk_user_timeout)) {
++ icsk->icsk_user_timeout, syn_set)) {
/* Has it gone just too far? */
tcp_write_err(sk);
return 1;
@@@ -440,7 -439,7 +444,7 @@@ out_reset_timer
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
}
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
-- if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0))
++ if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0, 0))
__sk_dst_reset(sk);
out:;
* Re: VLAN packets silently dropped in promiscuous mode
From: Jesse Gross @ 2010-10-01 2:37 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <1285884253.2705.25.camel@edumazet-laptop>
On Thu, Sep 30, 2010 at 3:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday, 30 September 2010 at 14:21 -0700, Jesse Gross wrote:
>> On Thu, Sep 30, 2010 at 1:07 AM, Roger Luethi <rl@hellgate.ch> wrote:
>> > On Wed, 29 Sep 2010 10:44:26 -0700, Jesse Gross wrote:
>> >> On Wed, Sep 29, 2010 at 4:37 AM, Roger Luethi <rl@hellgate.ch> wrote:
>> >> > I noticed packets for unknown VLANs getting silently dropped even in
>> >> > promiscuous mode (this is true only for the hardware accelerated path).
>> >> > netif_nit_deliver was introduced specifically to prevent that, but the
>> >> > function gets called only _after_ packets from unknown VLANs have been
>> >> > dropped.
>> >>
>> >> Some drivers are fixing this on a case by case basis by disabling
>> >> hardware accelerated VLAN stripping when in promiscuous mode, i.e.:
>> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f6c01819979afbfec7e0b15fe52371b8eed87e8
>> >>
>> >> However, at this point it is more or less random which drivers do
>> >> this. It would obviously be much better if it were consistent.
>> >
>> > My understanding is this. Hardware VLAN tagging and stripping can always be
>> > enabled. The kernel passes 802.1Q information along with the stripped
>> > header to libpcap which reassembles the original header where necessary.
>> > Works for me.
>>
>> Sorry, I misread your original post as saying that the VLAN header
>> gets dropped, rather than the entire packet. I agree that this is how
>> it should work but not necessarily how it does work (again, depending
>> on the driver). Here's the problem that I was talking about:
>>
>> Most drivers have a snippet of code that looks something like this
>> (taken from ixgbe):
>>
>> if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>> vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
>> else
>> napi_gro_receive(napi, skb);
>>
>> At this point the VLAN has already been stripped in hardware. If
>> there is no VLAN group configured on the device then we hit the second
>> case. The VLAN header was removed from the SKB and the tag variable
>> is unused. It is no longer possible for libpcap to reconstruct the
>> header because the information was thrown away (even the fact that
>> there was a VLAN tag at all).
>>
>> There are a couple ways to fix this:
>>
>> * Turn off VLAN stripping when in promiscuous mode (as done by the ixgbe driver)
>> * Reconstruct the VLAN header when there is no VLAN group (as done by
>> the tg3 driver)
>>
>> A bunch of drivers do neither (bnx2x, for example) and exhibit this
>> problem. It's getting better but it seems like a common issue.
>
> tg3 is not perfect, because it does the reconstruction of VLAN header
> even if device is not in promiscuous mode.
>
> It could drop the frame instead.
>
> I wonder which SNMP counter is incremented in this case.
>
> Apparently, none :(
That's true. Dropping here seems roughly equivalent to the effects of
a hardware VLAN filter, which will also not be tracked by a counter,
so that seems not too bad to me.
The thing that concerns me though is why so many drivers seem to have
this problem with completely dropping the VLAN header. I know that
even several of the ones that work now were broken initially and had
to be fixed. Seeing as the driver drops the VLAN information before
it gets to the general networking code, I don't see a generic fix for
this as it is currently set up. However, perhaps we could make it so
that it is harder to get wrong. Something like this:
* Allow vlan_gro_receive() to take a NULL VLAN group and a tag of 0
(and do the same thing for vlan_hwaccel_rx())
* Now that the vlan functions can deal with non-VLAN packets, merge
them into their non-VLAN counterparts.
* We can now demultiplex between the VLAN/non-VLAN case in core
networking. This is done anyways, it just prevents every driver from
needing that code block I copied above and allows us to fix these
types of problems centrally.
* Dump the VLAN tag into the SKB and hand off the packet to the
various consumers: VLAN devices, libpcap, bridge hook (not currently
done but should be for trunking).
I see a number of advantages of this:
* Fixes all the problems with cards dropping VLAN headers at once.
* Avoids having to disable VLAN acceleration when in promiscuous mode
(good for bridging since it always puts devices in promiscuous mode).
* Keeps VLAN tag separate until given to ultimate consumer, which
avoids needing to do header reconstruction as in tg3 unless absolutely
necessary.
* Consolidates common driver code in core networking.
* Re: pull-request: bluetooth-2.6 2010-09-27
From: Gustavo F. Padovan @ 2010-10-01 1:22 UTC (permalink / raw)
To: David Miller
Cc: linville-2XuSBdqkA4R54TAoqtyWWQ, marcel-kz+m5ild9QBg9hUCZPvPmw,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100930.172657.123994559.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Hi Dave,
* David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> [2010-09-30 17:26:57 -0700]:
> From: "Gustavo F. Padovan" <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org>
> Date: Tue, 28 Sep 2010 19:49:41 -0300
>
> > Actually sk_stream_wait_memory is another point why it's safe to release
> > the lock and block waiting for memory. We've been doing that safely in
> > protocols like TCP, SCTP and DCCP for a long time.
>
> Do you notice what TCP does when sk_stream_wait_memory() returns?
>
> It reloads all volatile state that might have changed in the socket
> while the lock was dropped.
>
> For example, TCP will reload the current MSS that can change
> asynchronously while we don't have the socket lock.
I got your point. What I tried to say in the last e-mail is that
ERTM doesn't have any volatile state that needs to be restored after it
gets the lock back. The other code paths it affects are very simple and
don't have that problem either, so we are safe against asynchronous changes.
We obviously have volatile state, but the code paths where
bt_skb_send_alloc() is used don't rely on that state. I see no
problem in releasing the lock, allocating memory, and taking the lock again.
--
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi
* Re: Regression (ancient), bisected: TCP hangs with certain ESP6 SA.
From: David Miller @ 2010-10-01 1:17 UTC (permalink / raw)
To: nbowler; +Cc: linux-kernel, netdev, herbert, eric.dumazet
In-Reply-To: <20100929142213.GA26031@elliptictech.com>
From: Nick Bowler <nbowler@elliptictech.com>
Date: Wed, 29 Sep 2010 10:22:13 -0400
> b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4 is the first bad commit
> commit b5c15fc004ac83b7ad280acbe0fd4bbed7e2c8d4
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Thu Feb 14 23:49:37 2008 -0800
>
> [IPV6]: Fix reversed local_df test in ip6_fragment
>
> I managed to reverse the local_df test when forward-porting this
> patch so it actually makes things worse by never fragmenting at
> all.
>
> Thanks to David Stevens for testing and reporting this bug.
>
> Bill Fink pointed out that the local_df setting is also the wrong
> way around.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: David S. Miller <davem@davemloft.net>
I suspect that Herbert's change is correct; it's just that PMTU
doesn't work correctly with IPV6 for some reason.
That matches with your observed behavior that ping and UDP stuff
works just fine, and it's just TCP with certain ESP6 transport mode
settings.
* Re: VLAN packets silently dropped in promiscuous mode
From: David Miller @ 2010-10-01 1:04 UTC (permalink / raw)
To: eric.dumazet; +Cc: kaber, rl, jesse, netdev
In-Reply-To: <1285849004.2615.394.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 30 Sep 2010 14:16:44 +0200
> [PATCH] vlan: dont drop packets from unknown vlans in promiscuous mode
>
> Roger Luethi noticed packets for unknown VLANs getting silently dropped
> even in promiscuous mode.
>
> Check for promiscuous mode in __vlan_hwaccel_rx() and vlan_gro_common()
> before drops.
>
> As suggested by Patrick, mark such packets to have skb->pkt_type set to
> PACKET_OTHERHOST to make sure they are dropped by IP stack.
>
> Reported-by: Roger Luethi <rl@hellgate.ch>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Patrick McHardy <kaber@trash.net>
Applied and queued up for -stable, thanks everyone!
* Re: [PATCH] Phonet: restore flow control credits when sending fails
From: David Miller @ 2010-10-01 0:57 UTC (permalink / raw)
To: kumar.sanghvi
Cc: netdev, remi.denis-courmont, eric.dumazet, gulshan.karmani,
linus.walleij
In-Reply-To: <1285835630-930-1-git-send-email-kumar.sanghvi@stericsson.com>
From: Kumar A Sanghvi <kumar.sanghvi@stericsson.com>
Date: Thu, 30 Sep 2010 14:03:50 +0530
> From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
>
> This patch restores the below flow control patch submitted by Rémi
> Denis-Courmont, which accidentally got lost due to the Pipe controller patch
> on Phonet.
>
> commit 1a98214feef2221cd7c24b17cd688a5a9d85b2ea
> Author: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> Date: Mon Aug 30 12:57:03 2010 +0000
>
> Phonet: restore flow control credits when sending fails
>
> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
> Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Applied, thank you.
* Re: [PATCH 0/2] qcusbnet: Cleanups
From: Joe Perches @ 2010-10-01 0:45 UTC (permalink / raw)
To: David Miller; +Cc: ellyjones, netdev, dbrownell, mjg59, jglasgow, msb, olofj
In-Reply-To: <20100930.173338.149831717.davem@davemloft.net>
On Thu, 2010-09-30 at 17:33 -0700, David Miller wrote:
> From: Joe Perches <joe@perches.com>
> Date: Tue, 28 Sep 2010 19:39:56 -0700
> > Perhaps some of these cleanups are in order?
> I don't see this driver in any of my trees, so someone else
> should be taking this in it seems.
These cleanups are meant for Elly Jones on top
of the Qualcomm Gobi 2000 driver she submitted.
http://patchwork.ozlabs.org/patch/66006/
* Re: [PATCH linux-2.6 v2] IPv6: Create temporary address if none exists.
From: David Miller @ 2010-10-01 0:42 UTC (permalink / raw)
To: brian.haley
Cc: gwurster, kuznet, pekkas, jmorris, yoshfuji, kaber, shemminger,
eric.dumazet, herbert, ebiederm, netdev, linux-kernel
In-Reply-To: <4CA35084.8010503@hp.com>
From: Brian Haley <brian.haley@hp.com>
Date: Wed, 29 Sep 2010 10:43:16 -0400
> From what I have found, this is fixing the case where we've changed
> use_tempaddr to 1 on an interface that already has a "stable" IPv6
> prefix. In that case you'll never add a temporary address:
We should have enough information to instantiate the temporary address
when the sysctl value is enabled. So I would prefer if we fixed it
that way.
* Re: [PATCH net-next-2.6] be2net: add multiple RX queue support
From: David Miller @ 2010-10-01 0:39 UTC (permalink / raw)
To: sathya.perla; +Cc: netdev
In-Reply-To: <20100929120113.GA16206@emulex.com>
From: Sathya Perla <sathya.perla@emulex.com>
Date: Wed, 29 Sep 2010 17:31:13 +0530
> @@ -78,6 +78,13 @@ static inline char *nic_name(struct pci_dev *pdev)
> #define MCC_Q_LEN 128 /* total size not to exceed 8 pages */
> #define MCC_CQ_LEN 256
>
> +#ifdef CONFIG_PPC64 /* ppc platforms support only max of */
> +#define NUM_RSS_QS 2 /* 4 msix vectors per pci function */
> +#else
> +#define NUM_RSS_QS 4 /* BE limit is 4 queues/port */
> +#endif
If the first hunk I see in a patch is something like this, it is
not a good sign.
This is something you need to discover dynamically, and the MSI-X
vector enable functions in the kernel allow you to do this just fine.
Look at what other drivers do: they have a specific number of vectors
they try to obtain using pci_enable_msix(), and if that fails they
decrease the vector count until they are able to succeed.