* RE: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
From: Xin, Xiaohui @ 2010-10-27 1:33 UTC (permalink / raw)
To: David Miller
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, mst@redhat.com, mingo@elte.hu,
herbert@gondor.apana.org.au, jdike@linux.intel.com
In-Reply-To: <20101019.082401.28817543.davem@davemloft.net>
>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Tuesday, October 19, 2010 11:24 PM
>To: Xin, Xiaohui
>Cc: netdev@vger.kernel.org; kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
>mst@redhat.com; mingo@elte.hu; herbert@gondor.apana.org.au; jdike@linux.intel.com
>Subject: Re: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
>
>From: xiaohui.xin@intel.com
>Date: Fri, 15 Oct 2010 17:12:11 +0800
>
>> @@ -2891,6 +2922,11 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> ncls:
>> #endif
>>
>> + /* To intercept mediate passthru(zero-copy) packets here */
>> + skb = handle_mpassthru(skb, &pt_prev, &ret, orig_dev);
>> + if (!skb)
>> + goto out;
>> +
>> /* Handle special case of bridge or macvlan */
>> rx_handler = rcu_dereference(skb->dev->rx_handler);
>> if (rx_handler) {
>
>If you consume the packet here, devices in passthru mode cannot
>be use with bonding.
>
>But there is nothing that prevents a bond being created with such
>a device.
>
>So we have to either prevent such configurations (bad) or make
>it work somehow (good) :-)
The big picture may look like this:
To prevent such configurations, we would add checks in both the mp
driver and the bonding driver. If a NIC is in zero-copy mode, it cannot
be added to a bond, and if a NIC is already a bonding slave, the device
cannot be bound for zero-copy.
Supporting such configurations also comes with constraints.
If the bond is created first, we need code to check whether all the slaves
support zero-copy mode. If they do, all the slaves should be assigned the
same page_ctor(), and all received packets should be intercepted at the
master NIC. If they do not, enabling zero-copy fails.
If zero-copy is enabled first, creating a bond with that device should fail.
Supporting this does not look trivial right now. Can we leave it as a
TODO and add it after our current work?
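The two policies above can be sketched in plain C. This is only a model under stated assumptions: struct nic_state and all three functions are hypothetical names invented for illustration, not code from the patch series or from the bonding driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

/* Hypothetical per-NIC state; none of these names exist in the kernel. */
struct nic_state {
	bool zero_copy;          /* device is bound for zero-copy (mp) */
	bool bonding_slave;      /* device is enslaved to a bond */
	bool supports_zero_copy; /* driver can receive into external buffers */
};

/* "Prevent" policy: refuse to enslave a device that is in zero-copy mode. */
static int nic_try_enslave(struct nic_state *nic)
{
	if (nic->zero_copy)
		return -EBUSY;
	nic->bonding_slave = true;
	return 0;
}

/* "Prevent" policy: refuse zero-copy on a device that is already a slave. */
static int nic_try_zero_copy(struct nic_state *nic)
{
	if (nic->bonding_slave)
		return -EBUSY;
	if (!nic->supports_zero_copy)
		return -EOPNOTSUPP;
	nic->zero_copy = true;
	return 0;
}

/*
 * "Support" policy, bond created first: zero-copy is enabled only if
 * every slave supports it; otherwise nothing is changed.
 */
static int bond_try_zero_copy(struct nic_state *slaves, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!slaves[i].supports_zero_copy)
			return -EOPNOTSUPP;

	for (i = 0; i < n; i++)
		slaves[i].zero_copy = true;
	return 0;
}
```

The all-or-nothing check in bond_try_zero_copy() reflects the constraint that every slave would have to share the same page_ctor().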
Thanks
Xiaohui
^ permalink raw reply
* Re: [RFC][net-next-2.6 PATCH v2] 8021q: set hard_header_len when VLAN offload features are toggled
From: Jesse Gross @ 2010-10-27 2:05 UTC (permalink / raw)
To: John Fastabend; +Cc: netdev, bhutchings
In-Reply-To: <20101026215933.2339.45454.stgit@jf-dev1-dcblab>
On Tue, Oct 26, 2010 at 2:59 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> Toggling the vlan tx|rx hw offloads needs to set the hard_header_len
> as well otherwise we end up using LL_RESERVED_SPACE incorrectly.
> This results in pskb_expand_head() being used unnecessarily.
>
> This add a check in vlan_transfer_features to catch the ETH_FLAG_TXVLAN
> flag and set the header length. This requires drivers to add the
> ETH_FLAG_TXVLAN to vlan_features.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
I think this addresses all of the original problems. However, I don't
think that we want to have drivers claim to support vlan offloading as
a feature for vlan packets. That implies some type of QinQ
functionality to me. In addition, if the vlan device claims to
support offloading and a second vlan device is stacked on top of it,
then the two will clobber skb->vlan_tci. It's probably simpler to
just keep track of whether vlan offloading is currently enabled so we
can find out whether it changed.
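A minimal sketch of that idea, assuming a simplified model of the vlan device: struct vlan_dev_model, its tx_offload field, and vlan_sync_tx_offload() are hypothetical names for illustration, not existing kernel code.

```c
#include <stdbool.h>

#define VLAN_HLEN 4 /* length of an 802.1Q tag in bytes */

/* Hypothetical, simplified stand-in for the vlan device state. */
struct vlan_dev_model {
	unsigned int hard_header_len;
	bool tx_offload; /* last known TX VLAN offload state of the lower device */
};

/*
 * Called whenever the lower device's features change. The header length
 * is adjusted only when the offload state actually changed, so repeated
 * notifications cannot apply the adjustment twice.
 * Returns true if hard_header_len was modified.
 */
static bool vlan_sync_tx_offload(struct vlan_dev_model *vlan,
				 bool lower_tx_offload)
{
	if (vlan->tx_offload == lower_tx_offload)
		return false;

	if (lower_tx_offload)
		vlan->hard_header_len -= VLAN_HLEN; /* hardware inserts the tag */
	else
		vlan->hard_header_len += VLAN_HLEN; /* software must make room */

	vlan->tx_offload = lower_tx_offload;
	return true;
}
```

Keeping the state on the vlan device means the lower device does not have to advertise vlan offload in vlan_features just so the change can be detected.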
> ---
>
> net/8021q/vlan.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 05b867e..825011b 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -334,6 +334,16 @@ static void vlan_transfer_features(struct net_device *dev,
> vlandev->features &= ~dev->vlan_features;
> vlandev->features |= dev->features & dev->vlan_features;
> vlandev->gso_max_size = dev->gso_max_size;
> +
> + /* is ETH_FLAGS_TXVLAN being toggled */
> + if ((vlandev->features & ETH_FLAG_TXVLAN) ^
> + (old_features & ETH_FLAG_TXVLAN)) {
> + if (vlandev->features & ETH_FLAG_TXVLAN)
> + vlandev->hard_header_len -= VLAN_HLEN;
> + else
> + vlandev->hard_header_len += VLAN_HLEN;
> + }
The correct flag for dev->features is NETIF_F_HW_VLAN_TX.
ETH_FLAG_TXVLAN is an ethtool construct (that happens to have the
same value).
Thanks.
^ permalink raw reply
* Re: [RFC][net-next-2.6 PATCH 4/4] net: remove check for headroom in vlan_dev_create
From: Jesse Gross @ 2010-10-27 2:07 UTC (permalink / raw)
To: John Fastabend; +Cc: netdev@vger.kernel.org
In-Reply-To: <4CC750B8.7060607@intel.com>
On Tue, Oct 26, 2010 at 3:05 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> On 10/25/2010 3:45 PM, Jesse Gross wrote:
>> On Thu, Oct 21, 2010 at 3:10 PM, John Fastabend
>> <john.r.fastabend@intel.com> wrote:
>>> It is possible for the headroom to be smaller then the
>>> hard_header_len for a short period of time after toggling
>>> the vlan offload setting.
>>>
>>> This is not a hard error and skb_cow_head is called in
>>> __vlan_put_tag() to resolve this.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>
>> How is it possible that the hard_header_len changes on the vlan
>> device? It looks like the header length never gets changed after it
>> is initialized. There's no set_flags method in the vlan device to
>> toggle whether it is using offloading or not, it just rides on top of
>> the underlying device.
>
> Your right and I think this is why my previous patch was broken. If we
> can toggle the underlying offloads we should set the header length as
> well. With the updated patch I just sent this should be true now.
OK, it makes sense in that context.
Acked-by: Jesse Gross <jesse@nicira.com>
Thanks.
^ permalink raw reply
* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Brian Haley @ 2010-10-27 2:31 UTC (permalink / raw)
To: Lorenzo Colitti; +Cc: netdev
In-Reply-To: <AANLkTikC4pv8aOODM2pOg2bKQGL69wivcUU3f9ZziPhe@mail.gmail.com>
Hi Lorenzo,
On 10/25/2010 10:08 PM, Lorenzo Colitti wrote:
> When roaming between different networks (e.g., changing wireless
> SSIDs, or plugging in to different wired networks), IPv6 addresses and
> routes are not cleared. If the two networks have different IPv6
> subnets assigned, the host maintains both the old and new IPv6
> addresses and gateways, but only the new ones works. If the host
> chooses the wrong source address or gateway, or if the new network
> does not have IPv6 but the old one did, IPv6 connections time out,
> leading to long delays when trying to connect to IPv6 hosts.
>
> Fix this by ensuring that autoconfigured IPv6 addresses and routes are
> purged when link is lost, not only when the interface goes down.
>
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
>
> --- a/net/ipv6/addrconf.c 2010-10-20 13:30:22.000000000 -0700
> +++ b/net/ipv6/addrconf.c 2010-10-25 13:55:15.000000000 -0700
> @@ -2524,6 +2524,14 @@
> } else {
> if (!addrconf_qdisc_ok(dev)) {
> /* device is still not ready. */
> + if (idev && (idev->if_flags & IF_READY)) {
> + /* Link lost. Clear addresses and
> + routes, the device might come back
> + on a link where they are no longer
> + valid. */
> + addrconf_ifdown(dev, 0);
> + idev->if_flags &= ~IF_READY;
> + }
Just taking another look at this, you don't need that ~IF_READY line,
addrconf_ifdown() is already doing that when how==0.
Could you give my previous patch a try? I believe it will work the same
way as yours, but also fixes a case where DAD is started twice for some
addresses.
Thanks,
-Brian
^ permalink raw reply
* [PATCH 1/2 v4] xps: Improvements in TX queue selection
From: Tom Herbert @ 2010-10-27 3:38 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet
In dev_pick_tx, don't do work in calculating queue index or setting
the index in the sock unless the device has more than one queue. This
allows the sock to be set only with a queue index of a multi-queue
device, which is desirable if devices are stacked, as in a tunnel.
We also allow the mapping of a socket to a queue to be changed. To
maintain in-order packet transmission, a flag (ooo_okay) has been
added to the sk_buff structure. If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possibility
of creating OOO packets (for instance, there are no packets in flight
for the socket). This patch includes the modification in TCP output
for setting this flag.
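The selection rule described above can be modelled outside the kernel roughly as follows. This is a sketch, not the patch itself: sock_model, skb_model and pick_tx_queue() are hypothetical names, the modulo stands in for skb_tx_hash(), and the dst check and ndo_select_queue path from the real code are left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for struct sock and struct sk_buff. */
struct sock_model {
	int tx_queue; /* cached queue index, -1 if unset */
};

struct skb_model {
	struct sock_model *sk;
	bool ooo_okay;     /* transport says reordering cannot happen */
	unsigned int hash; /* flow hash */
};

static int pick_tx_queue(struct skb_model *skb, int num_queues)
{
	int cached, queue;

	if (num_queues == 1)
		return 0; /* single queue: no hashing, nothing cached */

	cached = skb->sk ? skb->sk->tx_queue : -1;
	if (cached >= 0 && cached < num_queues && !skb->ooo_okay)
		return cached; /* packets may be in flight: keep the queue */

	/* Stand-in for skb_tx_hash(). */
	queue = (int)(skb->hash % (unsigned int)num_queues);
	if (skb->sk && queue != cached)
		skb->sk->tx_queue = queue; /* remember the new mapping */
	return queue;
}
```

The cached queue is reused unless it is unset, out of range, or the transport has marked the packet as safe to move.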
Signed-off-by: Tom Herbert <therbert@google.com>
---
include/linux/skbuff.h | 3 ++-
net/core/dev.c | 18 +++++++++++-------
net/ipv4/tcp_output.c | 5 ++++-
3 files changed, 17 insertions(+), 9 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e6ba898..19f37a6 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -386,9 +386,10 @@ struct sk_buff {
#else
__u8 deliver_no_wcard:1;
#endif
+ __u8 ooo_okay:1;
kmemcheck_bitfield_end(flags2);
- /* 0/14 bit hole */
+ /* 0/13 bit hole */
#ifdef CONFIG_NET_DMA
dma_cookie_t dma_cookie;
diff --git a/net/core/dev.c b/net/core/dev.c
index b2269ac..4df783c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2125,20 +2125,24 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
int queue_index;
const struct net_device_ops *ops = dev->netdev_ops;
- if (ops->ndo_select_queue) {
+ if (dev->real_num_tx_queues == 1)
+ queue_index = 0;
+ else if (ops->ndo_select_queue) {
queue_index = ops->ndo_select_queue(dev, skb);
queue_index = dev_cap_txqueue(dev, queue_index);
} else {
struct sock *sk = skb->sk;
queue_index = sk_tx_queue_get(sk);
- if (queue_index < 0) {
- queue_index = 0;
- if (dev->real_num_tx_queues > 1)
- queue_index = skb_tx_hash(dev, skb);
+ if (queue_index < 0 || skb->ooo_okay ||
+ queue_index >= dev->real_num_tx_queues) {
+ int old_index = queue_index;
- if (sk) {
- struct dst_entry *dst = rcu_dereference_check(sk->sk_dst_cache, 1);
+ queue_index = skb_tx_hash(dev, skb);
+
+ if (queue_index != old_index && sk) {
+ struct dst_entry *dst =
+ rcu_dereference_check(sk->sk_dst_cache, 1);
if (dst && skb_dst(skb) == dst)
sk_tx_queue_set(sk, queue_index);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 05b1ecf..2b6eb36 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -822,8 +822,11 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
&md5);
tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
- if (tcp_packets_in_flight(tp) == 0)
+ if (tcp_packets_in_flight(tp) == 0) {
tcp_ca_event(sk, CA_EVENT_TX_START);
+ skb->ooo_okay = 1;
+ } else
+ skb->ooo_okay = 0;
skb_push(skb, tcp_header_size);
skb_reset_transport_header(skb);
--
1.7.1
^ permalink raw reply related
* [PATCH 2/2 v4] xps: Transmit Packet Steering
From: Tom Herbert @ 2010-10-27 3:38 UTC (permalink / raw)
To: davem, netdev; +Cc: eric.dumazet
This patch implements transmit packet steering (XPS) for multiqueue
devices. XPS selects a transmit queue during packet transmission based
on configuration. This is done by mapping the CPU transmitting the
packet to a queue. This is the transmit-side analogue to RPS: where
RPS selects a CPU based on the receive queue, XPS selects a queue
based on the CPU (previously there was an XPS patch from Eric
Dumazet, but that might more appropriately be called transmit completion
steering).
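As a rough illustration of the CPU-to-queue step, the sketch below picks one queue from the set configured for the current CPU, using the same multiply-and-shift scaling that get_xps_queue() in this patch applies to the hash. The struct and function names are hypothetical, and the fixed-size array is an assumption to keep the example self-contained.

```c
#include <stdint.h>

/* Hypothetical model of one CPU's XPS map: the queues this CPU may use. */
struct cpu_queue_map {
	unsigned int len;
	uint16_t queues[8];
};

/* Scale a 32-bit hash into the range [0, len) without a division. */
static unsigned int scale_hash(uint32_t hash, unsigned int len)
{
	return (unsigned int)(((uint64_t)hash * len) >> 32);
}

/*
 * Returns the selected queue, or -1 if the CPU has no usable mapping, in
 * which case the caller would fall back to hashing over all queues.
 */
static int xps_pick_queue(const struct cpu_queue_map *map, uint32_t hash,
			  unsigned int real_num_tx_queues)
{
	unsigned int queue;

	if (!map || map->len == 0)
		return -1;

	if (map->len == 1)
		queue = map->queues[0];
	else
		queue = map->queues[scale_hash(hash, map->len)];

	if (queue >= real_num_tx_queues)
		return -1; /* stale mapping: queue no longer exists */
	return (int)queue;
}
```

Because the hash is derived from the socket, packets of one flow sent from the same CPU keep landing on the same queue even when that CPU has several queues configured.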
Each transmit queue can be associated with a number of CPUs which will
use the queue to send packets. This is configured as a CPU mask on a
per queue basis in:
/sys/class/net/eth<n>/queues/tx-<n>/xps_cpus
The mappings are stored per device in an inverted data structure that
maps CPUs to queues. In the netdevice structure this is an array of
num_possible_cpus() structures, where each structure holds an array of
queue indexes for the queues which that CPU can use.
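To make the inversion concrete: the user configures a CPU mask per queue, while the transmit path wants a queue list per CPU. A minimal sketch of that conversion, assuming small fixed sizes and hypothetical names (MODEL_CPUS, MODEL_QUEUES, struct cpu_map, build_inverted_maps), could look like this:

```c
#include <stdint.h>

#define MODEL_CPUS   4 /* assumption: small fixed CPU count */
#define MODEL_QUEUES 4 /* assumption: small fixed TX queue count */

/* Per-CPU list of the queues that CPU may transmit on. */
struct cpu_map {
	unsigned int len;
	uint16_t queues[MODEL_QUEUES];
};

/*
 * queue_masks[q] has bit c set when CPU c may use queue q, which is the
 * view exposed through xps_cpus. The result is the inverted view that
 * the transmit path looks up by CPU number.
 */
static void build_inverted_maps(const uint32_t queue_masks[MODEL_QUEUES],
				struct cpu_map maps[MODEL_CPUS])
{
	unsigned int cpu, q;

	for (cpu = 0; cpu < MODEL_CPUS; cpu++) {
		maps[cpu].len = 0;
		for (q = 0; q < MODEL_QUEUES; q++)
			if (queue_masks[q] & (1u << cpu))
				maps[cpu].queues[maps[cpu].len++] = (uint16_t)q;
	}
}
```

With masks 0x3, 0x3, 0xc and 0x0 for queues 0 to 3, CPUs 0 and 1 each get queues {0, 1} and CPUs 2 and 3 each get queue {2}.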
The benefits of XPS are improved locality in the per-queue data
structures. Also, transmit completions are more likely to be done
nearer to the sending thread, so this should promote locality back
to the socket on free (e.g. UDP). The benefits of XPS are dependent on
cache hierarchy, application load, and other factors. XPS would
nominally be configured so that a queue would only be shared by CPUs
which are sharing a cache; the degenerate configuration would be that
each CPU has its own queue.
Below are some benchmark results which show the potential benefit of
this patch. The netperf test has 500 instances of the netperf TCP_RR test
with 1-byte requests and responses.
bnx2x on 16 core AMD
XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU
No XPS (16 queues) 996K at 100% CPU
Signed-off-by: Tom Herbert <therbert@google.com>
---
include/linux/netdevice.h | 27 ++++
net/core/dev.c | 55 +++++++-
net/core/net-sysfs.c | 367 ++++++++++++++++++++++++++++++++++++++++++++-
net/core/net-sysfs.h | 3 +
4 files changed, 446 insertions(+), 6 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fcd3dda..f19b78b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -503,6 +503,13 @@ struct netdev_queue {
struct Qdisc *qdisc;
unsigned long state;
struct Qdisc *qdisc_sleeping;
+#ifdef CONFIG_RPS
+ struct netdev_queue *first;
+ atomic_t count;
+ struct xps_dev_maps *xps_maps;
+ struct kobject kobj;
+#endif
+
/*
* write mostly part
*/
@@ -530,6 +537,26 @@ struct rps_map {
#define RPS_MAP_SIZE(_num) (sizeof(struct rps_map) + (_num * sizeof(u16)))
/*
+ * This structure holds an XPS map which can be of variable length. The
+ * map is an array of queues.
+ */
+struct xps_map {
+ unsigned int len;
+ unsigned int alloc_len;
+ struct rcu_head rcu;
+ u16 queues[0];
+};
+
+/*
+ * This structure holds all XPS maps for device. Maps are indexed by CPU.
+ */
+struct xps_dev_maps {
+ struct rcu_head rcu;
+ struct xps_map *cpu_map[0];
+};
+#define netdev_get_xps_maps(dev) ((dev)->_tx[0].xps_maps)
+
+/*
* The rps_dev_flow structure contains the mapping of a flow to a CPU and the
* tail pointer for that CPU's input queue at the time of last enqueue.
*/
diff --git a/net/core/dev.c b/net/core/dev.c
index 4df783c..12426a6 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2119,6 +2119,44 @@ static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
return queue_index;
}
+static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
+{
+#ifdef CONFIG_RPS
+ struct xps_dev_maps *dev_maps;
+ struct xps_map *map;
+ int queue_index = -1;
+
+ preempt_disable();
+ rcu_read_lock();
+ dev_maps = rcu_dereference(netdev_get_xps_maps(dev));
+ if (dev_maps) {
+ map = rcu_dereference(dev_maps->cpu_map[smp_processor_id()]);
+ if (map) {
+ if (map->len == 1)
+ queue_index = map->queues[0];
+ else {
+ u32 hash;
+ if (skb->sk && skb->sk->sk_hash)
+ hash = skb->sk->sk_hash;
+ else
+ hash = (__force u16) skb->protocol ^
+ skb->rxhash;
+ hash = jhash_1word(hash, hashrnd);
+ queue_index = map->queues[
+ ((u64)hash * map->len) >> 32];
+ }
+ if (unlikely(queue_index >= dev->real_num_tx_queues))
+ queue_index = -1;
+ }
+ }
+ rcu_read_unlock();
+ preempt_enable();
+
+ return queue_index;
+#endif
+ return -1;
+}
+
static struct netdev_queue *dev_pick_tx(struct net_device *dev,
struct sk_buff *skb)
{
@@ -2138,7 +2176,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
queue_index >= dev->real_num_tx_queues) {
int old_index = queue_index;
- queue_index = skb_tx_hash(dev, skb);
+ queue_index = get_xps_queue(dev, skb);
+ if (queue_index < 0)
+ queue_index = skb_tx_hash(dev, skb);
if (queue_index != old_index && sk) {
struct dst_entry *dst =
@@ -5052,6 +5092,17 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
return -ENOMEM;
}
dev->_tx = tx;
+#ifdef CONFIG_RPS
+ /*
+ * Set a pointer to first element in the array which holds the
+ * reference count.
+ */
+ {
+ int i;
+ for (i = 0; i < count; i++)
+ tx[i].first = tx;
+ }
+#endif
return 0;
}
@@ -5616,7 +5667,9 @@ void free_netdev(struct net_device *dev)
release_net(dev_net(dev));
+#ifndef CONFIG_RPS
kfree(dev->_tx);
+#endif
kfree(rcu_dereference_raw(dev->ingress_queue));
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index b143173..e193cf2 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -764,18 +764,375 @@ net_rx_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
return error;
}
-static int rx_queue_register_kobjects(struct net_device *net)
+/*
+ * netdev_queue sysfs structures and functions.
+ */
+struct netdev_queue_attribute {
+ struct attribute attr;
+ ssize_t (*show)(struct netdev_queue *queue,
+ struct netdev_queue_attribute *attr, char *buf);
+ ssize_t (*store)(struct netdev_queue *queue,
+ struct netdev_queue_attribute *attr, const char *buf, size_t len);
+};
+#define to_netdev_queue_attr(_attr) container_of(_attr, \
+ struct netdev_queue_attribute, attr)
+
+#define to_netdev_queue(obj) container_of(obj, struct netdev_queue, kobj)
+
+static ssize_t netdev_queue_attr_show(struct kobject *kobj,
+ struct attribute *attr, char *buf)
+{
+ struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr);
+ struct netdev_queue *queue = to_netdev_queue(kobj);
+
+ if (!attribute->show)
+ return -EIO;
+
+ return attribute->show(queue, attribute, buf);
+}
+
+static ssize_t netdev_queue_attr_store(struct kobject *kobj,
+ struct attribute *attr,
+ const char *buf, size_t count)
+{
+ struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr);
+ struct netdev_queue *queue = to_netdev_queue(kobj);
+
+ if (!attribute->store)
+ return -EIO;
+
+ return attribute->store(queue, attribute, buf, count);
+}
+
+static const struct sysfs_ops netdev_queue_sysfs_ops = {
+ .show = netdev_queue_attr_show,
+ .store = netdev_queue_attr_store,
+};
+
+static inline unsigned int get_netdev_queue_index(struct netdev_queue *queue)
+{
+ struct net_device *dev = queue->dev;
+ int i;
+
+ for (i = 0; i < dev->num_tx_queues; i++)
+ if (queue == &dev->_tx[i])
+ break;
+
+ BUG_ON(i >= dev->num_tx_queues);
+
+ return i;
+}
+
+
+static ssize_t show_xps_map(struct netdev_queue *queue,
+ struct netdev_queue_attribute *attribute, char *buf)
+{
+ struct netdev_queue *first = queue->first;
+ struct xps_dev_maps *dev_maps;
+ cpumask_var_t mask;
+ unsigned long index;
+ size_t len = 0;
+ int i;
+
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ index = get_netdev_queue_index(queue);
+
+ rcu_read_lock();
+ dev_maps = rcu_dereference(first->xps_maps);
+ if (dev_maps) {
+ for (i = 0; i < num_possible_cpus(); i++) {
+ struct xps_map *map =
+ rcu_dereference(dev_maps->cpu_map[i]);
+ if (map) {
+ int j;
+ for (j = 0; j < map->len; j++) {
+ if (map->queues[j] == index) {
+ cpumask_set_cpu(i, mask);
+ break;
+ }
+ }
+ }
+ }
+ }
+ len += cpumask_scnprintf(buf + len, PAGE_SIZE, mask);
+ if (PAGE_SIZE - len < 3) {
+ rcu_read_unlock();
+ free_cpumask_var(mask);
+ return -EINVAL;
+ }
+ rcu_read_unlock();
+
+ free_cpumask_var(mask);
+ len += sprintf(buf + len, "\n");
+ return len;
+}
+
+static void xps_map_release(struct rcu_head *rcu)
+{
+ struct xps_map *map = container_of(rcu, struct xps_map, rcu);
+
+ kfree(map);
+}
+
+static void xps_dev_maps_release(struct rcu_head *rcu)
{
+ struct xps_dev_maps *dev_maps =
+ container_of(rcu, struct xps_dev_maps, rcu);
+
+ kfree(dev_maps);
+}
+
+static DEFINE_MUTEX(xps_map_mutex);
+
+static ssize_t store_xps_map(struct netdev_queue *queue,
+ struct netdev_queue_attribute *attribute,
+ const char *buf, size_t len)
+{
+ struct netdev_queue *first = queue->first;
+ cpumask_var_t mask;
+ int err, i, cpu, pos, map_len, alloc_len, need_set;
+ unsigned long index;
+ struct xps_map *map, *new_map;
+ struct xps_dev_maps *dev_maps, *new_dev_maps;
+ int nonempty = 0;
+
+ if (!capable(CAP_NET_ADMIN))
+ return -EPERM;
+
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ index = get_netdev_queue_index(queue);
+
+ err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits);
+ if (err) {
+ free_cpumask_var(mask);
+ return err;
+ }
+
+ new_dev_maps = kzalloc(sizeof(struct xps_dev_maps) +
+ (num_possible_cpus() * sizeof(struct xps_map *)), GFP_KERNEL);
+ if (!new_dev_maps) {
+ free_cpumask_var(mask);
+ return -ENOMEM;
+ }
+
+ mutex_lock(&xps_map_mutex);
+
+ dev_maps = first->xps_maps;
+
+ for (cpu = 0; cpu < num_possible_cpus(); cpu++) {
+ new_map = map = dev_maps ? dev_maps->cpu_map[cpu] : NULL;
+
+ if (map) {
+ for (pos = 0; pos < map->len; pos++)
+ if (map->queues[pos] == index)
+ break;
+ map_len = map->len;
+ alloc_len = map->alloc_len;
+ } else
+ pos = map_len = alloc_len = 0;
+
+ need_set = cpu_isset(cpu, *mask) && cpu_online(cpu);
+
+ if (need_set && pos >= map_len) {
+ /* Need to add queue to this CPU's map */
+ if (map_len >= alloc_len) {
+ alloc_len = alloc_len ? 2 * alloc_len : 1;
+ new_map = kzalloc(sizeof(struct xps_map) +
+ (alloc_len * sizeof(u16)), GFP_KERNEL);
+ if (!new_map)
+ goto error;
+ new_map->alloc_len = alloc_len;
+ for (i = 0; i < map_len; i++)
+ new_map->queues[i] = map->queues[i];
+ new_map->len = map_len;
+ }
+ new_map->queues[new_map->len++] = index;
+ } else if (!need_set && pos < map_len) {
+ /* Need to remove queue from this CPU's map */
+ if (map_len > 1)
+ new_map->queues[pos] =
+ new_map->queues[--new_map->len];
+ else
+ new_map = NULL;
+ }
+ new_dev_maps->cpu_map[cpu] = new_map;
+ }
+
+ /* Cleanup old maps */
+ for (cpu = 0; cpu < num_possible_cpus(); cpu++) {
+ map = dev_maps ? dev_maps->cpu_map[cpu] : NULL;
+ if (map && new_dev_maps->cpu_map[cpu] != map)
+ call_rcu(&map->rcu, xps_map_release);
+ if (new_dev_maps->cpu_map[cpu])
+ nonempty = 1;
+ }
+
+ if (nonempty)
+ rcu_assign_pointer(first->xps_maps, new_dev_maps);
+ else {
+ kfree(new_dev_maps);
+ rcu_assign_pointer(first->xps_maps, NULL);
+ }
+
+ if (dev_maps)
+ call_rcu(&dev_maps->rcu, xps_dev_maps_release);
+
+ mutex_unlock(&xps_map_mutex);
+
+ free_cpumask_var(mask);
+ return len;
+
+error:
+ mutex_unlock(&xps_map_mutex);
+
+ if (new_dev_maps)
+ for (i = 0; i < num_possible_cpus(); i++)
+ kfree(new_dev_maps->cpu_map[i]);
+
+ kfree(new_dev_maps);
+ free_cpumask_var(mask);
+ return -ENOMEM;
+}
+
+static struct netdev_queue_attribute xps_cpus_attribute =
+ __ATTR(xps_cpus, S_IRUGO | S_IWUSR, show_xps_map, store_xps_map);
+
+static struct attribute *netdev_queue_default_attrs[] = {
+ &xps_cpus_attribute.attr,
+ NULL
+};
+
+static void netdev_queue_release(struct kobject *kobj)
+{
+ struct netdev_queue *queue = to_netdev_queue(kobj);
+ struct netdev_queue *first = queue->first;
+ struct xps_dev_maps *dev_maps;
+ struct xps_map *map;
+ unsigned long index;
+ int i, pos, nonempty = 0;
+
+ index = get_netdev_queue_index(queue);
+
+ mutex_lock(&xps_map_mutex);
+ dev_maps = first->xps_maps;
+
+ for (i = 0; i < num_possible_cpus(); i++) {
+ map = dev_maps ? dev_maps->cpu_map[i] : NULL;
+ if (!map)
+ continue;
+
+ for (pos = 0; pos < map->len; pos++)
+ if (map->queues[pos] == index)
+ break;
+
+ if (pos < map->len) {
+ if (map->len > 1)
+ map->queues[pos] = map->queues[--map->len];
+ else {
+ rcu_assign_pointer(dev_maps->cpu_map[i],
+ NULL);
+ call_rcu(&map->rcu, xps_map_release);
+ map = NULL;
+ }
+ }
+
+ if (map)
+ nonempty = 1;
+ }
+
+ if (!nonempty) {
+ rcu_assign_pointer(first->xps_maps, NULL);
+ call_rcu(&dev_maps->rcu, xps_dev_maps_release);
+ }
+ mutex_unlock(&xps_map_mutex);
+
+ if (atomic_dec_and_test(&first->count))
+ kfree(first);
+}
+
+static struct kobj_type netdev_queue_ktype = {
+ .sysfs_ops = &netdev_queue_sysfs_ops,
+ .release = netdev_queue_release,
+ .default_attrs = netdev_queue_default_attrs,
+};
+
+static int netdev_queue_add_kobject(struct net_device *net, int index)
+{
+ struct netdev_queue *queue = net->_tx + index;
+ struct netdev_queue *first = queue->first;
+ struct kobject *kobj = &queue->kobj;
+ int error = 0;
+
+ kobj->kset = net->queues_kset;
+ error = kobject_init_and_add(kobj, &netdev_queue_ktype, NULL,
+ "tx-%u", index);
+ if (error) {
+ kobject_put(kobj);
+ return error;
+ }
+
+ kobject_uevent(kobj, KOBJ_ADD);
+ atomic_inc(&first->count);
+
+ return error;
+}
+
+int
+netdev_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
+{
+ int i;
+ int error = 0;
+
+ for (i = old_num; i < new_num; i++) {
+ error = netdev_queue_add_kobject(net, i);
+ if (error) {
+ new_num = old_num;
+ break;
+ }
+ }
+
+ while (--i >= new_num)
+ kobject_put(&net->_tx[i].kobj);
+
+ return error;
+}
+
+static int register_queue_kobjects(struct net_device *net)
+{
+ int error = 0, txq = 0, rxq = 0;
+
net->queues_kset = kset_create_and_add("queues",
NULL, &net->dev.kobj);
if (!net->queues_kset)
return -ENOMEM;
- return net_rx_queue_update_kobjects(net, 0, net->real_num_rx_queues);
+
+ error = net_rx_queue_update_kobjects(net, 0, net->real_num_rx_queues);
+ if (error)
+ goto error;
+ rxq = net->real_num_rx_queues;
+
+ error = netdev_queue_update_kobjects(net, 0,
+ net->real_num_tx_queues);
+ if (error)
+ goto error;
+ txq = net->real_num_tx_queues;
+
+ return 0;
+
+error:
+ netdev_queue_update_kobjects(net, txq, 0);
+ net_rx_queue_update_kobjects(net, rxq, 0);
+ return error;
}
-static void rx_queue_remove_kobjects(struct net_device *net)
+static void remove_queue_kobjects(struct net_device *net)
{
net_rx_queue_update_kobjects(net, net->real_num_rx_queues, 0);
+ netdev_queue_update_kobjects(net, net->real_num_tx_queues, 0);
kset_unregister(net->queues_kset);
}
#endif /* CONFIG_RPS */
@@ -878,7 +1235,7 @@ void netdev_unregister_kobject(struct net_device * net)
kobject_get(&dev->kobj);
#ifdef CONFIG_RPS
- rx_queue_remove_kobjects(net);
+ remove_queue_kobjects(net);
#endif
device_del(dev);
@@ -919,7 +1276,7 @@ int netdev_register_kobject(struct net_device *net)
return error;
#ifdef CONFIG_RPS
- error = rx_queue_register_kobjects(net);
+ error = register_queue_kobjects(net);
if (error) {
device_del(dev);
return error;
diff --git a/net/core/net-sysfs.h b/net/core/net-sysfs.h
index 778e157..25ec2ee 100644
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -6,6 +6,9 @@ int netdev_register_kobject(struct net_device *);
void netdev_unregister_kobject(struct net_device *);
#ifdef CONFIG_RPS
int net_rx_queue_update_kobjects(struct net_device *, int old_num, int new_num);
+int netdev_queue_update_kobjects(struct net_device *net,
+ int old_num, int new_num);
+
#endif
#endif
--
1.7.1
^ permalink raw reply related
* [PATCHv2 0/3]qlcnic: bug fixes
From: Amit Kumar Salecha @ 2010-10-27 3:53 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty
Hi
Series v2 of 3 patches for bug fixes. Patches are numbered.
Dropping the "dma address align check" patch, as pci_alloc_consistent is
guaranteed to return a page-aligned DMA address.
-Amit
^ permalink raw reply
* [PATCH 1/3] qlcnic: fix mac learning
From: Amit Kumar Salecha @ 2010-10-27 3:53 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty
In-Reply-To: <1288151589-32431-1-git-send-email-amit.salecha@qlogic.com>
In the failover bonding case, the same MAC address can be programmed on another slave function.
The firmware will then delete the old entry (original function) associated with that MAC address.
The MAC address needs to be reprogrammed if failover happens back to the original function.
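The age check behind this can be sketched as below. Note the swap: the patch compares jiffies with a plain ">", whereas this sketch uses a wraparound-safe comparison in the style of the kernel's time_after(). MODEL_HZ, ticks_after() and filter_needs_readd() are hypothetical names chosen for the example; only the 20-second age comes from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_HZ        100u /* hypothetical tick rate */
#define MODEL_READD_AGE 20u  /* seconds, same value as QLCNIC_READD_AGE */

/* Wraparound-safe "a is later than b" for a free-running tick counter. */
static bool ticks_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * True if a filter that matched the source MAC and VLAN was last
 * refreshed more than MODEL_READD_AGE seconds ago, so it should be
 * programmed into the hardware again.
 */
static bool filter_needs_readd(uint32_t now, uint32_t last_seen)
{
	return ticks_after(now, last_seen + MODEL_READD_AGE * MODEL_HZ);
}
```

The signed-difference form keeps giving the right answer when the tick counter wraps, which a plain greater-than comparison does not.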
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/qlcnic/qlcnic.h | 1 +
drivers/net/qlcnic/qlcnic_main.c | 5 +++++
2 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 26c37d3..a60ff17 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -942,6 +942,7 @@ struct qlcnic_ipaddr {
#define QLCNIC_LOOPBACK_TEST 2
#define QLCNIC_FILTER_AGE 80
+#define QLCNIC_READD_AGE 20
#define QLCNIC_LB_MAX_FILTERS 64
struct qlcnic_filter {
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index f047c7c..5a3ce08 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -1860,6 +1860,11 @@ qlcnic_send_filter(struct qlcnic_adapter *adapter,
hlist_for_each_entry_safe(tmp_fil, tmp_hnode, n, head, fnode) {
if (!memcmp(tmp_fil->faddr, &src_addr, ETH_ALEN) &&
tmp_fil->vlan_id == vlan_id) {
+
+ if (jiffies >
+ (QLCNIC_READD_AGE * HZ + tmp_fil->ftime))
+ qlcnic_change_filter(adapter, src_addr, vlan_id,
+ tx_ring);
tmp_fil->ftime = jiffies;
return;
}
--
1.6.0.2
^ permalink raw reply related
* [PATCH 3/3] qlcnic: define valid vlan id range
From: Amit Kumar Salecha @ 2010-10-27 3:53 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Sony Chacko
In-Reply-To: <1288151589-32431-1-git-send-email-amit.salecha@qlogic.com>
From: Sony Chacko <sony.chacko@qlogic.com>
VLAN ID 4095 is reserved and should not be used.
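A small sketch of the intended range check, written as a function rather than a macro. The lower bound used here is an assumption for illustration (the driver's MIN_VLAN_ID is not shown in this patch); the upper bound 4095 (0xFFF) is the value reserved by IEEE 802.1Q.

```c
#include <stdbool.h>

#define MODEL_MIN_VLAN_ID 1u    /* assumption for this sketch */
#define MODEL_MAX_VLAN_ID 4095u /* 0xFFF, reserved by IEEE 802.1Q */

/* Valid IDs run from the minimum up to, but not including, 4095. */
static bool is_valid_vlan(unsigned int vlan)
{
	return vlan >= MODEL_MIN_VLAN_ID && vlan < MODEL_MAX_VLAN_ID;
}
```

Using a function also avoids the usual macro pitfall of an unparenthesized argument being expanded inside the comparison.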
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/qlcnic/qlcnic.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 6400e6a..8ecc170 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -1134,7 +1134,7 @@ struct qlcnic_eswitch {
#define MAX_RX_QUEUES 4
#define DEFAULT_MAC_LEARN 1
-#define IS_VALID_VLAN(vlan) (vlan >= MIN_VLAN_ID && vlan <= MAX_VLAN_ID)
+#define IS_VALID_VLAN(vlan) (vlan >= MIN_VLAN_ID && vlan < MAX_VLAN_ID)
#define IS_VALID_BW(bw) (bw >= MIN_BW && bw <= MAX_BW)
#define IS_VALID_TX_QUEUES(que) (que > 0 && que <= MAX_TX_QUEUES)
#define IS_VALID_RX_QUEUES(que) (que > 0 && que <= MAX_RX_QUEUES)
--
1.6.0.2
^ permalink raw reply related
* [PATCH 2/3] qlcnic: reduce rx ring size
From: Amit Kumar Salecha @ 2010-10-27 3:53 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Sony Chacko
In-Reply-To: <1288151589-32431-1-git-send-email-amit.salecha@qlogic.com>
From: Sony Chacko <sony.chacko@qlogic.com>
If eswitch is enabled, the rcv ring size can be reduced, as the
physical port is partitioned.
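The sizing rule can be summarised with the sketch below. The descriptor counts are the ones defined in this patch; the enum, struct and function names are hypothetical, and as in the patch the eswitch case is only considered for 10G ports.

```c
#include <stdbool.h>
#include <stdint.h>

enum port_type { PORT_GBE, PORT_XGBE };

struct rx_ring_limits {
	uint16_t num_rxd; /* default number of receive descriptors */
	uint16_t max_rxd; /* upper bound accepted from ethtool */
};

/* Pick default and maximum rx ring sizes for a port. */
static struct rx_ring_limits pick_rx_ring_limits(enum port_type type,
						 bool eswitch_enabled)
{
	struct rx_ring_limits l;

	if (type == PORT_XGBE && eswitch_enabled) {
		l.num_rxd = 1024; /* DEFAULT_RCV_DESCRIPTORS_VF */
		l.max_rxd = 2048; /* MAX_RCV_DESCRIPTORS_VF */
	} else if (type == PORT_XGBE) {
		l.num_rxd = 4096; /* DEFAULT_RCV_DESCRIPTORS_10G */
		l.max_rxd = 8192; /* MAX_RCV_DESCRIPTORS_10G */
	} else {
		l.num_rxd = 2048; /* DEFAULT_RCV_DESCRIPTORS_1G */
		l.max_rxd = 4096; /* MAX_RCV_DESCRIPTORS_1G */
	}
	return l;
}
```

Storing the maximum alongside the default is what lets the ethtool get/set ringparam paths use one per-adapter limit instead of recomputing it from the port type.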
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/qlcnic/qlcnic.h | 4 ++++
drivers/net/qlcnic/qlcnic_ethtool.c | 23 +++++------------------
drivers/net/qlcnic/qlcnic_main.c | 14 ++++++++++++--
3 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index a60ff17..6400e6a 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -146,11 +146,13 @@
#define MAX_CMD_DESCRIPTORS 1024
#define MAX_RCV_DESCRIPTORS_1G 4096
#define MAX_RCV_DESCRIPTORS_10G 8192
+#define MAX_RCV_DESCRIPTORS_VF 2048
#define MAX_JUMBO_RCV_DESCRIPTORS_1G 512
#define MAX_JUMBO_RCV_DESCRIPTORS_10G 1024
#define DEFAULT_RCV_DESCRIPTORS_1G 2048
#define DEFAULT_RCV_DESCRIPTORS_10G 4096
+#define DEFAULT_RCV_DESCRIPTORS_VF 1024
#define MAX_RDS_RINGS 2
#define get_next_index(index, length) \
@@ -971,6 +973,8 @@ struct qlcnic_adapter {
u16 num_txd;
u16 num_rxd;
u16 num_jumbo_rxd;
+ u16 max_rxd;
+ u16 max_jumbo_rxd;
u8 max_rds_rings;
u8 max_sds_rings;
diff --git a/drivers/net/qlcnic/qlcnic_ethtool.c b/drivers/net/qlcnic/qlcnic_ethtool.c
index 25e93a5..ec21d24 100644
--- a/drivers/net/qlcnic/qlcnic_ethtool.c
+++ b/drivers/net/qlcnic/qlcnic_ethtool.c
@@ -437,14 +437,8 @@ qlcnic_get_ringparam(struct net_device *dev,
ring->rx_jumbo_pending = adapter->num_jumbo_rxd;
ring->tx_pending = adapter->num_txd;
- if (adapter->ahw.port_type == QLCNIC_GBE) {
- ring->rx_max_pending = MAX_RCV_DESCRIPTORS_1G;
- ring->rx_jumbo_max_pending = MAX_JUMBO_RCV_DESCRIPTORS_1G;
- } else {
- ring->rx_max_pending = MAX_RCV_DESCRIPTORS_10G;
- ring->rx_jumbo_max_pending = MAX_JUMBO_RCV_DESCRIPTORS_10G;
- }
-
+ ring->rx_max_pending = adapter->max_rxd;
+ ring->rx_jumbo_max_pending = adapter->max_jumbo_rxd;
ring->tx_max_pending = MAX_CMD_DESCRIPTORS;
ring->rx_mini_max_pending = 0;
@@ -472,24 +466,17 @@ qlcnic_set_ringparam(struct net_device *dev,
struct ethtool_ringparam *ring)
{
struct qlcnic_adapter *adapter = netdev_priv(dev);
- u16 max_rcv_desc = MAX_RCV_DESCRIPTORS_10G;
- u16 max_jumbo_desc = MAX_JUMBO_RCV_DESCRIPTORS_10G;
u16 num_rxd, num_jumbo_rxd, num_txd;
-
if (ring->rx_mini_pending)
return -EOPNOTSUPP;
- if (adapter->ahw.port_type == QLCNIC_GBE) {
- max_rcv_desc = MAX_RCV_DESCRIPTORS_1G;
- max_jumbo_desc = MAX_JUMBO_RCV_DESCRIPTORS_10G;
- }
-
num_rxd = qlcnic_validate_ringparam(ring->rx_pending,
- MIN_RCV_DESCRIPTORS, max_rcv_desc, "rx");
+ MIN_RCV_DESCRIPTORS, adapter->max_rxd, "rx");
num_jumbo_rxd = qlcnic_validate_ringparam(ring->rx_jumbo_pending,
- MIN_JUMBO_DESCRIPTORS, max_jumbo_desc, "rx jumbo");
+ MIN_JUMBO_DESCRIPTORS, adapter->max_jumbo_rxd,
+ "rx jumbo");
num_txd = qlcnic_validate_ringparam(ring->tx_pending,
MIN_CMD_DESCRIPTORS, MAX_CMD_DESCRIPTORS, "tx");
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 5a3ce08..7a298cd 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -656,13 +656,23 @@ qlcnic_check_options(struct qlcnic_adapter *adapter)
dev_info(&pdev->dev, "firmware v%d.%d.%d\n",
fw_major, fw_minor, fw_build);
-
if (adapter->ahw.port_type == QLCNIC_XGBE) {
- adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_10G;
+ if (adapter->flags & QLCNIC_ESWITCH_ENABLED) {
+ adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_VF;
+ adapter->max_rxd = MAX_RCV_DESCRIPTORS_VF;
+ } else {
+ adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_10G;
+ adapter->max_rxd = MAX_RCV_DESCRIPTORS_10G;
+ }
+
adapter->num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
+ adapter->max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
+
} else if (adapter->ahw.port_type == QLCNIC_GBE) {
adapter->num_rxd = DEFAULT_RCV_DESCRIPTORS_1G;
adapter->num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
+ adapter->max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
+ adapter->max_rxd = MAX_RCV_DESCRIPTORS_1G;
}
adapter->msix_supported = !!use_msi_x;
--
1.6.0.2
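
The selection made in qlcnic_check_options() can be sketched in user space as follows. This is only an illustration under stated assumptions: the descriptor counts are copied from the qlcnic.h hunk, while the sketch_* names, the enum, and the struct are hypothetical stand-ins for the adapter fields and the QLCNIC_GBE / QLCNIC_XGBE port types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limits copied from the qlcnic.h hunk in this patch. */
#define MAX_RCV_DESCRIPTORS_1G         4096
#define MAX_RCV_DESCRIPTORS_10G        8192
#define MAX_RCV_DESCRIPTORS_VF         2048
#define MAX_JUMBO_RCV_DESCRIPTORS_1G   512
#define MAX_JUMBO_RCV_DESCRIPTORS_10G  1024
#define DEFAULT_RCV_DESCRIPTORS_1G     2048
#define DEFAULT_RCV_DESCRIPTORS_10G    4096
#define DEFAULT_RCV_DESCRIPTORS_VF     1024

/* Hypothetical stand-ins for the driver's port types. */
enum sketch_port_type { SKETCH_GBE, SKETCH_XGBE };

/* Hypothetical stand-in for the four adapter fields the patch fills in. */
struct sketch_ring_limits {
	uint16_t num_rxd;
	uint16_t max_rxd;
	uint16_t num_jumbo_rxd;
	uint16_t max_jumbo_rxd;
};

/* Mirror of the branch structure added to qlcnic_check_options(). */
static struct sketch_ring_limits
sketch_pick_limits(enum sketch_port_type type, bool eswitch_enabled)
{
	struct sketch_ring_limits l = { 0, 0, 0, 0 };

	assert(type == SKETCH_GBE || type == SKETCH_XGBE);

	if (type == SKETCH_XGBE) {
		if (eswitch_enabled) {
			/* Partitioned physical port: smaller rcv ring. */
			l.num_rxd = DEFAULT_RCV_DESCRIPTORS_VF;
			l.max_rxd = MAX_RCV_DESCRIPTORS_VF;
		} else {
			l.num_rxd = DEFAULT_RCV_DESCRIPTORS_10G;
			l.max_rxd = MAX_RCV_DESCRIPTORS_10G;
		}
		l.num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
		l.max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_10G;
	} else {
		/* As in the patch, the 1G path does not look at the eswitch flag. */
		l.num_rxd = DEFAULT_RCV_DESCRIPTORS_1G;
		l.max_rxd = MAX_RCV_DESCRIPTORS_1G;
		l.num_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
		l.max_jumbo_rxd = MAX_JUMBO_RCV_DESCRIPTORS_1G;
	}
	return l;
}
```

Storing the maximum once per adapter is what lets the ethtool get/set ringparam paths drop their own port-type checks.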
^ permalink raw reply related
* can: About Socket CAN with MSI issue
From: Tomoya MORINAGA @ 2010-10-27 4:29 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA,
socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
sameo-VuQAYsv1563Yd54FQh9/CA, 21cnbao-Re5JQEeQqe8AvxtiuMwx3w,
chripell-VaTbYqLCNhc, w.sang-bIcnvbaLZ9MEGnE8C9+IrQ
Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
qi.wang-ral2JQCrhuEAvxtiuMwx3w,
margie.foster-ral2JQCrhuEAvxtiuMwx3w,
yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w, David Miller,
joel.clark-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20101026.105206.15244527.davem@davemloft.net>
We have run into an issue with our CAN driver when MSI is enabled: after installing the driver,
removing it, and installing it again,
the driver's interrupt handler is no longer called.
Do you have any information or suggestions about this issue?
For our latest CAN driver,
please refer to the following patch that I posted:
[PATCH net-next-2.6 v2] can: Topcliff: PCH_CAN driver: Fix build warnings
Thanks, Tomoya (OKI SEMICONDUCTOR CO., LTD.)
^ permalink raw reply
* Re: [PATCH 2/2 v4] xps: Transmit Packet Steering
From: Eric Dumazet @ 2010-10-27 4:32 UTC (permalink / raw)
To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.1.00.1010262033520.2997@pokey.mtv.corp.google.com>
On Tuesday 26 October 2010 at 20:38 -0700, Tom Herbert wrote:
> This patch implements transmit packet steering (XPS) for multiqueue
> devices. XPS selects a transmit queue during packet transmission based
> on configuration. This is done by mapping the CPU transmitting the
> packet to a queue. This is the transmit side analogue to RPS-- where
> RPS is selecting a CPU based on receive queue, XPS selects a queue
> based on the CPU (previously there was an XPS patch from Eric
> Dumazet, but that might more appropriately be called transmit completion
> steering).
>
> Each transmit queue can be associated with a number of CPUs which will
> use the queue to send packets. This is configured as a CPU mask on a
> per queue basis in:
>
> /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus
>
> The mappings are stored per device in an inverted data structure that
> maps CPUs to queues. In the netdevice structure this is an array of
> num_possible_cpu structures where each structure holds an array of
> queue_indexes for queues which that CPU can use.
>
> The benefits of XPS are improved locality in the per queue data
> structures. Also, transmit completions are more likely to be done
> nearer to the sending thread, so this should promote locality back
> to the socket on free (e.g. UDP). The benefits of XPS are dependent on
> cache hierarchy, application load, and other factors. XPS would
> nominally be configured so that a queue would only be shared by CPUs
> which are sharing a cache; the degenerate configuration would be that
> each CPU has its own queue.
>
> Below are some benchmark results which show the potential benefit of
> this patch. The netperf test has 500 instances of netperf TCP_RR test
> with 1 byte req. and resp.
>
> bnx2x on 16 core AMD
> XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU
> No XPS (16 queues) 996K at 100% CPU
>
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
> include/linux/netdevice.h | 27 ++++
> net/core/dev.c | 55 +++++++-
> net/core/net-sysfs.c | 367 ++++++++++++++++++++++++++++++++++++++++++++-
> net/core/net-sysfs.h | 3 +
> 4 files changed, 446 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index fcd3dda..f19b78b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -503,6 +503,13 @@ struct netdev_queue {
> struct Qdisc *qdisc;
> unsigned long state;
> struct Qdisc *qdisc_sleeping;
> +#ifdef CONFIG_RPS
> + struct netdev_queue *first;
> + atomic_t count;
> + struct xps_dev_maps *xps_maps;
> + struct kobject kobj;
> +#endif
> +
> /*
> * write mostly part
> */
> @@ -530,6 +537,26 @@ struct rps_map {
> #define RPS_MAP_SIZE(_num) (sizeof(struct rps_map) + (_num * sizeof(u16)))
>
> /*
> + * This structure holds an XPS map which can be of variable length. The
> + * map is an array of queues.
> + */
> +struct xps_map {
> + unsigned int len;
> + unsigned int alloc_len;
> + struct rcu_head rcu;
> + u16 queues[0];
> +};
OK, so it's a 'small' structure. And we don't want it to share a cache
line with another user in the kernel, or false sharing might happen.
Make sure you allocate big enough ones to fill a full cache line.
alloc_len = (L1_CACHE_BYTES - sizeof(struct xps_map)) / sizeof(u16);
I see you allocate ones with alloc_len = 1. That's not good.
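
A minimal user-space sketch of the sizing formula above. The 64-byte cache line, the rcu_head stand-in, and the sketch_* names are assumptions for illustration; in the kernel the line size would come from L1_CACHE_BYTES and the struct would be the real struct xps_map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; the kernel constant is L1_CACHE_BYTES. */
#define SKETCH_CACHE_LINE_BYTES 64

/* Stand-in for struct rcu_head: two pointers. */
struct sketch_rcu_head {
	void *next;
	void (*func)(void *);
};

/* Same field layout as the struct xps_map quoted above. */
struct sketch_xps_map {
	unsigned int len;
	unsigned int alloc_len;
	struct sketch_rcu_head rcu;
	uint16_t queues[];
};

static_assert(sizeof(struct sketch_xps_map) < SKETCH_CACHE_LINE_BYTES,
	      "header must leave room for at least one queue entry");

/* Number of u16 queue slots that pads the map out to one cache line. */
static size_t sketch_min_alloc_len(void)
{
	return (SKETCH_CACHE_LINE_BYTES - sizeof(struct sketch_xps_map)) /
	       sizeof(uint16_t);
}

/* Total allocation size for a map with alloc_len queue slots. */
static size_t sketch_map_bytes(size_t alloc_len)
{
	return sizeof(struct sketch_xps_map) + alloc_len * sizeof(uint16_t);
}
```

Starting from this minimum instead of 1 means a map never shares its cache line with an unrelated allocation, at the cost of a few unused u16 slots per map.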
> +
> +/*
> + * This structure holds all XPS maps for device. Maps are indexed by CPU.
> + */
> +struct xps_dev_maps {
> + struct rcu_head rcu;
> + struct xps_map *cpu_map[0];
Hmm... per_cpu maybe, instead of an array ?
Also make sure this xps_dev_maps uses a full cache line, to avoid false
sharing.
> +};
> +#define netdev_get_xps_maps(dev) ((dev)->_tx[0].xps_maps)
> +
> +/*
> * The rps_dev_flow structure contains the mapping of a flow to a CPU and the
> * tail pointer for that CPU's input queue at the time of last enqueue.
> */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4df783c..12426a6 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2119,6 +2119,44 @@ static inline u16 dev_cap_txqueue(struct net_device *dev, u16 queue_index)
> return queue_index;
> }
>
> +static inline int get_xps_queue(struct net_device *dev, struct sk_buff *skb)
> +{
> +#ifdef CONFIG_RPS
> + struct xps_dev_maps *dev_maps;
> + struct xps_map *map;
> + int queue_index = -1;
> +
> + preempt_disable();
> + rcu_read_lock();
> + dev_maps = rcu_dereference(netdev_get_xps_maps(dev));
> + if (dev_maps) {
> + map = rcu_dereference(dev_maps->cpu_map[smp_processor_id()]);
Really, I am not sure we need this array and smp_processor_id().
Please consider alloc_percpu().
Then, use __this_cpu_ptr() and avoid the preempt_disable()/enable()
thing. The queue is only a hint anyway, because as soon as we leave
get_xps_queue() we might migrate to another cpu.
> + if (map) {
> + if (map->len == 1)
> + queue_index = map->queues[0];
> + else {
> + u32 hash;
> + if (skb->sk && skb->sk->sk_hash)
> + hash = skb->sk->sk_hash;
> + else
> + hash = (__force u16) skb->protocol ^
> + skb->rxhash;
> + hash = jhash_1word(hash, hashrnd);
> + queue_index = map->queues[
> + ((u64)hash * map->len) >> 32];
> + }
> + if (unlikely(queue_index >= dev->real_num_tx_queues))
> + queue_index = -1;
> + }
> + }
> + rcu_read_unlock();
> + preempt_enable();
> +
> + return queue_index;
> +#endif
> + return -1;
> +}
> +
> static struct netdev_queue *dev_pick_tx(struct net_device *dev,
> struct sk_buff *skb)
> {
> @@ -2138,7 +2176,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
> queue_index >= dev->real_num_tx_queues) {
> int old_index = queue_index;
>
> - queue_index = skb_tx_hash(dev, skb);
> + queue_index = get_xps_queue(dev, skb);
> + if (queue_index < 0)
> + queue_index = skb_tx_hash(dev, skb);
>
> if (queue_index != old_index && sk) {
> struct dst_entry *dst =
> @@ -5052,6 +5092,17 @@ static int netif_alloc_netdev_queues(struct net_device *dev)
> return -ENOMEM;
> }
> dev->_tx = tx;
> +#ifdef CONFIG_RPS
> + /*
> + * Set a pointer to first element in the array which holds the
> + * reference count.
> + */
> + {
> + int i;
> + for (i = 0; i < count; i++)
> + tx[i].first = tx;
> + }
> +#endif
> return 0;
> }
>
> @@ -5616,7 +5667,9 @@ void free_netdev(struct net_device *dev)
>
> release_net(dev_net(dev));
>
> +#ifndef CONFIG_RPS
> kfree(dev->_tx);
> +#endif
>
> kfree(rcu_dereference_raw(dev->ingress_queue));
>
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index b143173..e193cf2 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -764,18 +764,375 @@ net_rx_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
> return error;
> }
>
> -static int rx_queue_register_kobjects(struct net_device *net)
> +/*
> + * netdev_queue sysfs structures and functions.
> + */
> +struct netdev_queue_attribute {
> + struct attribute attr;
> + ssize_t (*show)(struct netdev_queue *queue,
> + struct netdev_queue_attribute *attr, char *buf);
> + ssize_t (*store)(struct netdev_queue *queue,
> + struct netdev_queue_attribute *attr, const char *buf, size_t len);
> +};
> +#define to_netdev_queue_attr(_attr) container_of(_attr, \
> + struct netdev_queue_attribute, attr)
> +
> +#define to_netdev_queue(obj) container_of(obj, struct netdev_queue, kobj)
> +
> +static ssize_t netdev_queue_attr_show(struct kobject *kobj,
> + struct attribute *attr, char *buf)
> +{
> + struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr);
> + struct netdev_queue *queue = to_netdev_queue(kobj);
> +
> + if (!attribute->show)
> + return -EIO;
> +
> + return attribute->show(queue, attribute, buf);
> +}
> +
> +static ssize_t netdev_queue_attr_store(struct kobject *kobj,
> + struct attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct netdev_queue_attribute *attribute = to_netdev_queue_attr(attr);
> + struct netdev_queue *queue = to_netdev_queue(kobj);
> +
> + if (!attribute->store)
> + return -EIO;
> +
> + return attribute->store(queue, attribute, buf, count);
> +}
> +
> +static const struct sysfs_ops netdev_queue_sysfs_ops = {
> + .show = netdev_queue_attr_show,
> + .store = netdev_queue_attr_store,
> +};
> +
> +static inline unsigned int get_netdev_queue_index(struct netdev_queue *queue)
> +{
> + struct net_device *dev = queue->dev;
> + int i;
> +
> + for (i = 0; i < dev->num_tx_queues; i++)
> + if (queue == &dev->_tx[i])
> + break;
> +
> + BUG_ON(i >= dev->num_tx_queues);
> +
> + return i;
> +}
> +
> +
> +static ssize_t show_xps_map(struct netdev_queue *queue,
> + struct netdev_queue_attribute *attribute, char *buf)
> +{
> + struct netdev_queue *first = queue->first;
> + struct xps_dev_maps *dev_maps;
> + cpumask_var_t mask;
> + unsigned long index;
> + size_t len = 0;
> + int i;
> +
> + if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + index = get_netdev_queue_index(queue);
> +
> + rcu_read_lock();
> + dev_maps = rcu_dereference(first->xps_maps);
> + if (dev_maps) {
> + for (i = 0; i < num_possible_cpus(); i++) {
The use of num_possible_cpus() seems wrong to me.
Didn't you mean nr_cpu_ids ?
Some machines have two possible cpus, numbered 0 and 8 :
num_possible_cpus = 2
nr_cpu_ids = 9
Anyway, using a per_cpu var, this loop becomes more friendly :
for_each_possible_cpu(i) {
and you use less ram, and you also get NUMA friendly allocations as
well.
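
The difference between counting possible CPUs and bounding their ids can be shown with a small user-space sketch. The 64-bit mask is a hypothetical stand-in for cpu_possible_mask, and the sketch assumes the id bound is the highest possible CPU id plus one; none of these helpers are kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set means CPU id n is possible (stand-in for cpu_possible_mask). */
typedef uint64_t sketch_cpu_mask_t;

/* Analogue of num_possible_cpus(): how many CPUs are possible. */
static unsigned int sketch_count_possible(sketch_cpu_mask_t mask)
{
	unsigned int n = 0;

	for (; mask; mask >>= 1)
		n += (unsigned int)(mask & 1);
	return n;
}

/* Analogue of an id bound: highest possible CPU id plus one. */
static unsigned int sketch_id_bound(sketch_cpu_mask_t mask)
{
	unsigned int bound = 0;

	for (; mask; mask >>= 1)
		bound++;
	return bound;
}

/*
 * Walk ids 0..bound-1 the way "for (i = 0; i < bound; i++)" would and
 * report how many possible CPUs the loop actually reaches.
 */
static unsigned int sketch_cpus_reached(sketch_cpu_mask_t mask,
					unsigned int bound)
{
	unsigned int i, reached = 0;

	assert(bound <= 64);
	for (i = 0; i < bound; i++)
		if (mask & ((sketch_cpu_mask_t)1 << i))
			reached++;
	return reached;
}
```

With possible ids 0 and 8, a loop bounded by the CPU count stops after ids 0 and 1 and never reaches id 8, which is why iterating over the mask itself is the safer pattern.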
> + struct xps_map *map =
> + rcu_dereference(dev_maps->cpu_map[i]);
> + if (map) {
> + int j;
> + for (j = 0; j < map->len; j++) {
> + if (map->queues[j] == index) {
> + cpumask_set_cpu(i, mask);
> + break;
> + }
> + }
> + }
> + }
> + }
> + len += cpumask_scnprintf(buf + len, PAGE_SIZE, mask);
> + if (PAGE_SIZE - len < 3) {
> + rcu_read_unlock();
> + free_cpumask_var(mask);
> + return -EINVAL;
> + }
> + rcu_read_unlock();
> +
> + free_cpumask_var(mask);
> + len += sprintf(buf + len, "\n");
> + return len;
> +}
> +
> +static void xps_map_release(struct rcu_head *rcu)
> +{
> + struct xps_map *map = container_of(rcu, struct xps_map, rcu);
> +
> + kfree(map);
> +}
> +
> +static void xps_dev_maps_release(struct rcu_head *rcu)
> {
> + struct xps_dev_maps *dev_maps =
> + container_of(rcu, struct xps_dev_maps, rcu);
> +
> + kfree(dev_maps);
> +}
> +
> +static DEFINE_MUTEX(xps_map_mutex);
> +
> +static ssize_t store_xps_map(struct netdev_queue *queue,
> + struct netdev_queue_attribute *attribute,
> + const char *buf, size_t len)
> +{
> + struct netdev_queue *first = queue->first;
> + cpumask_var_t mask;
> + int err, i, cpu, pos, map_len, alloc_len, need_set;
> + unsigned long index;
> + struct xps_map *map, *new_map;
> + struct xps_dev_maps *dev_maps, *new_dev_maps;
> + int nonempty = 0;
> +
> + if (!capable(CAP_NET_ADMIN))
> + return -EPERM;
> +
> + if (!alloc_cpumask_var(&mask, GFP_KERNEL))
> + return -ENOMEM;
> +
> + index = get_netdev_queue_index(queue);
> +
> + err = bitmap_parse(buf, len, cpumask_bits(mask), nr_cpumask_bits);
> + if (err) {
> + free_cpumask_var(mask);
> + return err;
> + }
> +
> + new_dev_maps = kzalloc(sizeof(struct xps_dev_maps) +
> + (num_possible_cpus() * sizeof(struct xps_map *)), GFP_KERNEL);
> + if (!new_dev_maps) {
> + free_cpumask_var(mask);
> + return err;
> + }
> +
> + mutex_lock(&xps_map_mutex);
> +
> + dev_maps = first->xps_maps;
> +
> + for (cpu = 0; cpu < num_possible_cpus(); cpu++) {
> + new_map = map = dev_maps ? dev_maps->cpu_map[cpu] : NULL;
> +
> + if (map) {
> + for (pos = 0; pos < map->len; pos++)
> + if (map->queues[pos] == index)
> + break;
> + map_len = map->len;
> + alloc_len = map->alloc_len;
> + } else
> + pos = map_len = alloc_len = 0;
> +
> + need_set = cpu_isset(cpu, *mask) && cpu_online(cpu);
> +
> + if (need_set && pos >= map_len) {
> + /* Need to add queue to this CPU's map */
> + if (map_len >= alloc_len) {
> + alloc_len = alloc_len ? 2 * alloc_len : 1;
See my first comment: please reserve a full cache line here.
Also please use kzalloc_node() to get better NUMA affinity.
> + new_map = kzalloc(sizeof(struct xps_map) +
> + (alloc_len * sizeof(u16)), GFP_KERNEL);
> + if (!new_map)
> + goto error;
> + new_map->alloc_len = alloc_len;
> + for (i = 0; i < map_len; i++)
> + new_map->queues[i] = map->queues[i];
> + new_map->len = map_len;
> + }
> + new_map->queues[new_map->len++] = index;
> + } else if (!need_set && pos < map_len) {
> + /* Need to remove queue from this CPU's map */
> + if (map_len > 1)
> + new_map->queues[pos] =
> + new_map->queues[--new_map->len];
> + else
> + new_map = NULL;
> + }
> + new_dev_maps->cpu_map[cpu] = new_map;
> + }
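
For reference, the index computation used in the quoted get_xps_queue(), ((u64)hash * map->len) >> 32, maps a 32-bit hash onto [0, len) without a modulo. A minimal user-space sketch, where the function name and the queue table are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a 32-bit hash into the range [0, len) with a multiply and a
 * shift, then use it to pick one entry of the queue table.
 */
static uint16_t sketch_pick_queue(const uint16_t *queues, uint32_t len,
				  uint32_t hash)
{
	assert(queues && len > 0);
	return queues[((uint64_t)hash * len) >> 32];
}
```

Because hash is always below 2^32, the product is below len * 2^32, so the shifted result can never reach len and the table access stays in bounds.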
^ permalink raw reply
* Re: [PATCH 2/2 v4] xps: Transmit Packet Steering
From: Eric Dumazet @ 2010-10-27 4:46 UTC (permalink / raw)
To: Tom Herbert; +Cc: davem, netdev
In-Reply-To: <alpine.DEB.1.00.1010262033520.2997@pokey.mtv.corp.google.com>
On Tuesday 26 October 2010 at 20:38 -0700, Tom Herbert wrote:
> The benefits of XPS are improved locality in the per queue data
> structures. Also, transmit completions are more likely to be done
> nearer to the sending thread, so this should promote locality back
> to the socket on free (e.g. UDP).
I don't understand this part of the changelog:
We now orphan packets early, before giving them to the device
(see skb_orphan_try()).
So at completion time, we don't touch the socket anymore.
However, we free the skb, so this promotes locality on kmem caches, iff tx
completion is run on the same cpu.
^ permalink raw reply
* Re: tbf/htb qdisc limitations
From: Bill Fink @ 2010-10-27 4:51 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Eric Dumazet, Rick Jones, Steven Brudenell, netdev
In-Reply-To: <20101020110612.GA18315@ff.dom.local>
On Wed, 20 Oct 2010, Jarek Poplawski wrote:
> On Tue, Oct 19, 2010 at 03:37:24AM -0400, Bill Fink wrote:
> > On Sun, 17 Oct 2010, Jarek Poplawski wrote:
> >
> > > On Sat, Oct 16, 2010 at 09:24:34PM -0400, Bill Fink wrote:
> > > > On Sat, 16 Oct 2010, Jarek Poplawski wrote:
> > > ...
> > > > > http://code.google.com/p/pspacer/wiki/HTBon10GbE
> > > > >
> > > > > If it doesn't help reconsider hfsc.
> > > >
> > > > Thanks for the link. From his results, it appears you can
> > > > get better accuracy by keeping TSO/GSO enabled and upping
> > > > the tc mtu parameter to 64000. I will have to try that out.
> > >
> > > Sure, but you have to remember that scheduler doesn't know real packet
> > > sizes and rate tables are less accurate especially for smaller packets,
> > > so it depends on conditions.
> >
> > On my testing on the real data path, TSO/GSO enabled did seem
> > to give more accurate results for a single stream. But when
> > I tried multiple 10-GigE paths simultaneously, each with a
> > single stream across it, non-TSO/GSO seemed to fare better
> > overall.
>
> Btw, if you find time I would be interested in checking an opposite
> concept of lower than real mtu (256) to use rate tables different way
> (other tbf parameters without change). The patch below is needed for
> this to work.
Sorry. I'm totally swamped at work currently and won't be able
to investigate that.
-Bill
> diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
> index 641a30d..9ac3460 100644
> --- a/net/sched/sch_tbf.c
> +++ b/net/sched/sch_tbf.c
> @@ -123,9 +123,6 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
> struct tbf_sched_data *q = qdisc_priv(sch);
> int ret;
>
> - if (qdisc_pkt_len(skb) > q->max_size)
> - return qdisc_reshape_fail(skb, sch);
> -
> ret = qdisc_enqueue(skb, q->qdisc);
> if (ret != NET_XMIT_SUCCESS) {
> if (net_xmit_drop_count(ret))
^ permalink raw reply
* [PATCH] ehea: fix use after free
From: Eric Dumazet @ 2010-10-27 5:21 UTC (permalink / raw)
To: leitao; +Cc: davem, netdev
In-Reply-To: <1288118920.2652.4.camel@edumazet-laptop>
On Tuesday 26 October 2010 at 20:48 +0200, Eric Dumazet wrote:
> Note: the driver already uses skb after freeing it, before your patch.
>
> if (vlan_tx_tag_present(skb)) {
> swqe->tx_control |= EHEA_SWQE_VLAN_INSERT;
> swqe->vlan_tag = vlan_tx_tag_get(skb);
> }
>
Could you please test the following patch?
Thanks
[PATCH] ehea: fix use after free
ehea_start_xmit() dereferences skb after it has been freed in ehea_xmit3()
to get the vlan tag.
Move the offending block before the potential ehea_xmit3() call.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
drivers/net/ehea/ehea_main.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index bb7d306..e59d386 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2249,6 +2249,11 @@ static int ehea_start_xmit(struct sk_buff *skb, struct net_device *dev)
memset(swqe, 0, SWQE_HEADER_SIZE);
atomic_dec(&pr->swqe_avail);
+ if (vlan_tx_tag_present(skb)) {
+ swqe->tx_control |= EHEA_SWQE_VLAN_INSERT;
+ swqe->vlan_tag = vlan_tx_tag_get(skb);
+ }
+
if (skb->len <= SWQE3_MAX_IMM) {
u32 sig_iv = port->sig_comp_iv;
u32 swqe_num = pr->swqe_id_counter;
@@ -2279,11 +2284,6 @@ static int ehea_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
pr->swqe_id_counter += 1;
- if (vlan_tx_tag_present(skb)) {
- swqe->tx_control |= EHEA_SWQE_VLAN_INSERT;
- swqe->vlan_tag = vlan_tx_tag_get(skb);
- }
-
if (netif_msg_tx_queued(port)) {
ehea_info("post swqe on QP %d", pr->qp->init_attr.qp_nr);
ehea_dump(swqe, 512, "swqe");
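
The ordering rule behind this fix can be sketched in user space: copy everything needed out of the buffer before calling a function that may free it. All names below are hypothetical stand-ins (sketch_packet for the skb, sketch_descriptor for the swqe, sketch_send_and_free() for the consuming ehea_xmit3() call).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the skb fields the driver reads. */
struct sketch_packet {
	bool has_vlan;
	uint16_t vlan_tag;
};

/* Hypothetical stand-in for the send work queue entry. */
struct sketch_descriptor {
	bool vlan_insert;
	uint16_t vlan_tag;
};

/* Consumes the packet, like a transmit helper that frees its buffer. */
static void sketch_send_and_free(struct sketch_packet *pkt)
{
	free(pkt);
}

static void sketch_fill_and_send(struct sketch_descriptor *d,
				 struct sketch_packet *pkt)
{
	assert(d && pkt);

	/* Read the vlan information while the packet is still valid ... */
	if (pkt->has_vlan) {
		d->vlan_insert = true;
		d->vlan_tag = pkt->vlan_tag;
	}

	/* ... then hand it off; pkt must not be touched after this call. */
	sketch_send_and_free(pkt);
}
```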
^ permalink raw reply related
* Re: can: About Socket CAN with MSI issue
From: Wolfgang Grandegger @ 2010-10-27 7:31 UTC (permalink / raw)
To: Tomoya MORINAGA
Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
sameo-VuQAYsv1563Yd54FQh9/CA,
margie.foster-ral2JQCrhuEAvxtiuMwx3w, David Miller,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w,
joel.clark-ral2JQCrhuEAvxtiuMwx3w,
yong.y.wang-ral2JQCrhuEAvxtiuMwx3w, chripell-VaTbYqLCNhc,
qi.wang-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <002801cb758f$876aaf00$66f8800a-a06+6cuVnkTSQfdrb5gaxUEOCMrvLtNR@public.gmane.org>
On 10/27/2010 06:29 AM, Tomoya MORINAGA wrote:
> We have run into an issue with our CAN driver when MSI is enabled: after installing the driver,
> removing it, and installing it again,
> the driver's interrupt handler is no longer called.
>
> Do you have any information or suggestion about the above issue?
Not really. The remove function looks OK, apart from the fact that
pch_can_reset() is called *after* pci_iounmap().
Wolfgang.
^ permalink raw reply
* Re: can: About Socket CAN with MSI issue
From: Dave Airlie @ 2010-10-27 7:56 UTC (permalink / raw)
To: Tomoya MORINAGA
Cc: linux-kernel, netdev, socketcan-core, sameo, 21cnbao, chripell,
w.sang, wg, David Miller, margie.foster, kok.howg.ewe, joel.clark,
andrew.chih.howe.khor, yong.y.wang, qi.wang
In-Reply-To: <002801cb758f$876aaf00$66f8800a@maildom.okisemi.com>
On Wed, Oct 27, 2010 at 2:29 PM, Tomoya MORINAGA
<tomoya-linux@dsn.okisemi.com> wrote:
> We have run into an issue with our CAN driver when MSI is enabled: after installing the driver,
> removing it, and installing it again,
> the driver's interrupt handler is no longer called.
>
> Do you have any information or suggestion about the above issue?
It's most likely a bug in the PCI layer:
http://amailbox.org/mailarchive/linux-kernel/2010/10/7/4629072
Dave.
^ permalink raw reply
* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Maciej Żenczykowski @ 2010-10-27 8:35 UTC (permalink / raw)
To: Lorenzo Colitti; +Cc: netdev, Brian Haley
In-Reply-To: <4CC78F19.9080504@hp.com>
So how does all this work with privacy addresses, established
connections, and a link flap?
^ permalink raw reply
* Re: VLAN packets silently dropped in promiscuous mode
From: Guillaume Gaudonville @ 2010-10-27 8:32 UTC (permalink / raw)
To: Jesse Gross; +Cc: Roger Luethi, netdev, Patrick McHardy
In-Reply-To: <AANLkTinH1Q4iq1_wg7HU_==khAtKOTZ2gz79s_EfzLYP@mail.gmail.com>
Jesse Gross wrote:
> On Mon, Oct 25, 2010 at 6:48 AM, Guillaume Gaudonville
> <guillaume.gaudonville@6wind.com> wrote:
>
>> Jesse Gross wrote:
>>
>>> On Fri, Oct 15, 2010 at 2:16 AM, Guillaume Gaudonville
>>> <guillaume.gaudonville@6wind.com> wrote:
>>>
>>>
>>>> Jesse Gross wrote:
>>>>
>>>>
>>>>> On Thu, Sep 30, 2010 at 1:07 AM, Roger Luethi <rl@hellgate.ch> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> On Wed, 29 Sep 2010 10:44:26 -0700, Jesse Gross wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Wed, Sep 29, 2010 at 4:37 AM, Roger Luethi <rl@hellgate.ch> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> I noticed packets for unknown VLANs getting silently dropped even in
>>>>>>>> promiscuous mode (this is true only for the hardware accelerated path).
>>>>>>>> netif_nit_deliver was introduced specifically to prevent that, but the
>>>>>>>> function gets called only _after_ packets from unknown VLANs have been
>>>>>>>> dropped.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Some drivers are fixing this on a case by case basis by disabling
>>>>>>> hardware accelerated VLAN stripping when in promiscuous mode, i.e.:
>>>>>>>
>>>>>>>
>>>>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=5f6c01819979afbfec7e0b15fe52371b8eed87e8
>>>>>>>
>>>>>>> However, at this point it is more or less random which drivers do
>>>>>>> this. It would obviously be much better if it were consistent.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> My understanding is this. Hardware VLAN tagging and stripping can always be
>>>>>> enabled. The kernel passes 802.1Q information along with the stripped
>>>>>> header to libpcap which reassembles the original header where necessary.
>>>>>> Works for me.
>>>>>>
>>>>>>
>>>>>>
>>>>> Sorry, I misread your original post as saying that the VLAN header
>>>>> gets dropped, rather than the entire packet. I agree that this is how
>>>>> it should work but not necessarily how it does work (again, depending
>>>>> on the driver). Here's the problem that I was talking about:
>>>>>
>>>>> Most drivers have a snippet of code that looks something like this
>>>>> (taken from ixgbe):
>>>>>
>>>>> if (adapter->vlgrp && is_vlan && (tag & VLAN_VID_MASK))
>>>>> vlan_gro_receive(napi, adapter->vlgrp, tag, skb);
>>>>> else
>>>>> napi_gro_receive(napi, skb);
>>>>>
>>>>> At this point the VLAN has already been stripped in hardware. If
>>>>> there is no VLAN group configured on the device then we hit the second
>>>>> case. The VLAN header was removed from the SKB and the tag variable
>>>>> is unused. It is no longer possible for libpcap to reconstruct the
>>>>> header because the information was thrown away (even the fact that
>>>>> there was a VLAN tag at all).
>>>>>
>>>>> There are a couple ways to fix this:
>>>>>
>>>>> * Turn off VLAN stripping when in promiscuous mode (as done by the ixgbe
>>>>> driver)
>>>>>
>>>>>
>>>>>
>>>> This is not totally true: when the MTU is changed, ixgbe_change_mtu will call:
>>>> ixgbe_reinit_locked --> ixgbe_up --> ixgbe_configure:
>>>> --> ixgbe_set_rx_mode: flag IFF_PROMISC is tested,
>>>> ixgbe_vlan_filter_enable is not called
>>>> --> ixgbe_restore_vlan --> ixgbe_vlan_rx_register: flag IFF_PROMISC
>>>> is not tested, ixgbe_vlan_filter_enable will be called.
>>>>
>>>> In fact it should happen each time we configure something which needs a
>>>> reset of the device. Why not add a test on the promiscuous flag directly
>>>> in ixgbe_vlan_filter_enable? Or do it on each call, if we want to allow
>>>> a device in promiscuous mode to enable this feature.
>>>>
>>>> What do you think?
>>>>
>>>>
>>> I can believe that there are paths that lead to this not working
>>> correctly. That was actually my larger point: this is something that
>>> is commonly not implemented correctly in drivers. Rather than try to
>>> study every driver my goal is to just avoid the problem completely by
>>> handling vlan acceleration centrally in the networking core. I sent
>>> out an RFC patch series a few days ago that should solve this problem:
>>>
>>> http://marc.info/?l=linux-netdev&m=128700022614170&w=3
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
>> Thank you, I'm going to check these patches and try to apply them in our
>> kernel.
>>
>
> An updated set of patches has been merged into net-2.6, so you might
> want to try that instead.
>
Ok I will, thank you.
--
Guillaume Gaudonville
6WIND
Software Engineer
Tel: +33 1 39 30 92 63
Mob: +33 6 47 85 34 33
Fax: +33 1 39 30 92 11
guillaume.gaudonville@6wind.com
www.6wind.com
Join the Multicore Packet Processing Forum: www.multicorepacketprocessing.com
This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and contains information that is confidential and proprietary to 6WIND. All unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
^ permalink raw reply
* [PATCH 0/2] can: pch_can: fix sparse warnings and section mismatch
From: Marc Kleine-Budde @ 2010-10-27 8:38 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA
Hello,
this series fixes two problems found at compile time: a sparse warning
and a section mismatch.
These patches apply to net-2.6/master.
cheers, Marc
---
The following changes since commit 7a876b0efcba3804da3051313445fa7be751cab7:
IPv6: Temp addresses are immediately deleted. (2010-10-26 12:35:13 -0700)
are available in the git repository at:
git://git.pengutronix.de/git/mkl/linux-2.6.git can/pch_can-for-net-2.6
Marc Kleine-Budde (2):
can: pch_can: fix sparse warning
can: pch_can: fix section mismatch warning by using a whitelisted name
drivers/net/can/pch_can.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
^ permalink raw reply
* [PATCH 2/2] can: pch_can: fix section mismatch warning by using a whitelisted name
From: Marc Kleine-Budde @ 2010-10-27 8:38 UTC (permalink / raw)
To: socketcan-core-0fE9KPoRgkgATYTw5x5z8w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Marc Kleine-Budde
In-Reply-To: <1288168706-870-1-git-send-email-mkl-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
This patch fixes the following section mismatch warning:
WARNING: drivers/net/can/pch_can.o(.data+0x18):
Section mismatch in reference from the variable pch_can_pcidev
to the variable .devinit.rodata:pch_pci_tbl
The variable pch_can_pcidev references
the variable __devinitconst pch_pci_tbl
This is actually a false positive. It is fixed by giving the offending
variable a whitelisted name: it is renamed to "pch_can_pci_driver".
This makes sense because the variable is of the type "struct pci_driver".
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
---
drivers/net/can/pch_can.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c
index c63209f..6727182 100644
--- a/drivers/net/can/pch_can.c
+++ b/drivers/net/can/pch_can.c
@@ -1437,7 +1437,7 @@ probe_exit_endev:
return rc;
}
-static struct pci_driver pch_can_pcidev = {
+static struct pci_driver pch_can_pci_driver = {
.name = "pch_can",
.id_table = pch_pci_tbl,
.probe = pch_can_probe,
@@ -1448,13 +1448,13 @@ static struct pci_driver pch_can_pcidev = {
static int __init pch_can_pci_init(void)
{
- return pci_register_driver(&pch_can_pcidev);
+ return pci_register_driver(&pch_can_pci_driver);
}
module_init(pch_can_pci_init);
static void __exit pch_can_pci_exit(void)
{
- pci_unregister_driver(&pch_can_pcidev);
+ pci_unregister_driver(&pch_can_pci_driver);
}
module_exit(pch_can_pci_exit);
--
1.7.2.3
_______________________________________________
Socketcan-core mailing list
Socketcan-core@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/socketcan-core
^ permalink raw reply related
* [PATCH 1/2] can: pch_can: fix sparse warning
From: Marc Kleine-Budde @ 2010-10-27 8:38 UTC (permalink / raw)
To: socketcan-core; +Cc: netdev, Marc Kleine-Budde, Tomoya MORINAGA
In-Reply-To: <1288168706-870-1-git-send-email-mkl@pengutronix.de>
This patch fixes the following sparse warning:
drivers/net/can/pch_can.c:231:26: warning: incorrect type in argument 1 (different address spaces)
drivers/net/can/pch_can.c:231:26: expected unsigned int [usertype] *addr
drivers/net/can/pch_can.c:231:26: got unsigned int [noderef] <asn:2>*<noident>
Make the first parameter of pch_can_bit_{set,clear} a void __iomem pointer.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com>
---
drivers/net/can/pch_can.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/pch_can.c b/drivers/net/can/pch_can.c
index 55ec324..c63209f 100644
--- a/drivers/net/can/pch_can.c
+++ b/drivers/net/can/pch_can.c
@@ -213,12 +213,12 @@ static DEFINE_PCI_DEVICE_TABLE(pch_pci_tbl) = {
};
MODULE_DEVICE_TABLE(pci, pch_pci_tbl);
-static inline void pch_can_bit_set(u32 *addr, u32 mask)
+static inline void pch_can_bit_set(void __iomem *addr, u32 mask)
{
iowrite32(ioread32(addr) | mask, addr);
}
-static inline void pch_can_bit_clear(u32 *addr, u32 mask)
+static inline void pch_can_bit_clear(void __iomem *addr, u32 mask)
{
iowrite32(ioread32(addr) & ~mask, addr);
}
--
1.7.2.3
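
The two helpers touched by this patch are read-modify-write operations on a
32-bit register. A minimal userspace sketch of the same pattern follows. The
fake_ioread32()/fake_iowrite32() functions are hypothetical stand-ins that
access ordinary memory; they are not the kernel accessors, and the __iomem
annotation that sparse checks has no userspace equivalent here.

```c
#include <stdint.h>

/* Userspace stand-ins for ioread32()/iowrite32(); plain memory, not MMIO. */
static uint32_t fake_ioread32(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static void fake_iowrite32(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/* Read the register, OR in the mask bits, write the result back. */
static void bit_set(volatile void *addr, uint32_t mask)
{
	fake_iowrite32(fake_ioread32(addr) | mask, addr);
}

/* Read the register, clear the mask bits, write the result back. */
static void bit_clear(volatile void *addr, uint32_t mask)
{
	fake_iowrite32(fake_ioread32(addr) & ~mask, addr);
}
```

Taking an untyped pointer mirrors the patch: the helpers only pass the
address on to the accessors and never dereference it themselves.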
* [PATCH 1/1] netxen: fix kdump
From: Amit Kumar Salecha @ 2010-10-27 8:51 UTC (permalink / raw)
To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Rajesh Borundia
From: Rajesh Borundia <rajesh.borundia@qlogic.com>
Reset the whole hw instead of freeing hw resources
consumed by each pci function.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
drivers/net/netxen/netxen_nic_ctx.c | 15 ---------------
drivers/net/netxen/netxen_nic_main.c | 7 +++++++
2 files changed, 7 insertions(+), 15 deletions(-)
diff --git a/drivers/net/netxen/netxen_nic_ctx.c b/drivers/net/netxen/netxen_nic_ctx.c
index 1261212..f7d06cb 100644
--- a/drivers/net/netxen/netxen_nic_ctx.c
+++ b/drivers/net/netxen/netxen_nic_ctx.c
@@ -255,19 +255,6 @@ out_free_rq:
}
static void
-nx_fw_cmd_reset_ctx(struct netxen_adapter *adapter)
-{
-
- netxen_issue_cmd(adapter, adapter->ahw.pci_func, NXHAL_VERSION,
- adapter->ahw.pci_func, NX_DESTROY_CTX_RESET, 0,
- NX_CDRP_CMD_DESTROY_RX_CTX);
-
- netxen_issue_cmd(adapter, adapter->ahw.pci_func, NXHAL_VERSION,
- adapter->ahw.pci_func, NX_DESTROY_CTX_RESET, 0,
- NX_CDRP_CMD_DESTROY_TX_CTX);
-}
-
-static void
nx_fw_cmd_destroy_rx_ctx(struct netxen_adapter *adapter)
{
struct netxen_recv_context *recv_ctx = &adapter->recv_ctx;
@@ -698,8 +685,6 @@ int netxen_alloc_hw_resources(struct netxen_adapter *adapter)
if (!NX_IS_REVISION_P2(adapter->ahw.revision_id)) {
if (test_and_set_bit(__NX_FW_ATTACHED, &adapter->state))
goto done;
- if (reset_devices)
- nx_fw_cmd_reset_ctx(adapter);
err = nx_fw_cmd_create_rx_ctx(adapter);
if (err)
goto err_out_free;
diff --git a/drivers/net/netxen/netxen_nic_main.c b/drivers/net/netxen/netxen_nic_main.c
index 50820be..35ae1aa 100644
--- a/drivers/net/netxen/netxen_nic_main.c
+++ b/drivers/net/netxen/netxen_nic_main.c
@@ -1356,6 +1356,13 @@ netxen_nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
break;
}
+ if (reset_devices) {
+ if (adapter->portnum == 0) {
+ NXWR32(adapter, NX_CRB_DEV_REF_COUNT, 0);
+ adapter->need_fw_reset = 1;
+ }
+ }
+
err = netxen_start_firmware(adapter);
if (err)
goto err_out_decr_ref;
--
1.6.0.2
* [PATCH] conntrack: allow nf_ct_alloc_hashtable() to get highmem pages
From: Eric Dumazet @ 2010-10-27 9:09 UTC (permalink / raw)
To: David Miller, Patrick McHardy
Cc: netdev, Netfilter Development Mailinglist, stable
commit ea781f197d6a8 (use SLAB_DESTROY_BY_RCU and get rid of call_rcu())
made a mistake in the __vmalloc() call in nf_ct_alloc_hashtable().
I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
net/netfilter/nf_conntrack_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 1eacf8d..27a5ea6 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1312,7 +1312,8 @@ void *nf_ct_alloc_hashtable(unsigned int *sizep, int *vmalloced, int nulls)
if (!hash) {
*vmalloced = 1;
printk(KERN_WARNING "nf_conntrack: falling back to vmalloc.\n");
- hash = __vmalloc(sz, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
+ hash = __vmalloc(sz, GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
+ PAGE_KERNEL);
}
if (hash && nulls)