* [PATCH] neigh: reorder struct neighbour
From: Eric Dumazet @ 2010-11-11 16:57 UTC (permalink / raw)
To: David Miller; +Cc: netdev
It is important to move nud_state outside of the often modified cache
line (because of refcnt), to reduce false sharing in neigh_event_send().
This is a followup of commit 0ed8ddf4045f (neigh: Protect neigh->ha[]
with a seqlock)
This gives a 7% speedup on routing test with IP route cache disabled.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
David, it appears I forgot to push this patch; I have had it in my tree
for a month. Thanks!
include/net/neighbour.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 55590ab..815b2ce 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -96,16 +96,16 @@ struct neighbour {
struct neigh_parms *parms;
unsigned long confirmed;
unsigned long updated;
- __u8 flags;
- __u8 nud_state;
- __u8 type;
- __u8 dead;
+ rwlock_t lock;
atomic_t refcnt;
struct sk_buff_head arp_queue;
struct timer_list timer;
unsigned long used;
atomic_t probes;
- rwlock_t lock;
+ __u8 flags;
+ __u8 nud_state;
+ __u8 type;
+ __u8 dead;
seqlock_t ha_lock;
unsigned char ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))];
struct hh_cache *hh;
^ permalink raw reply related
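As a rough illustration of the layout change in the patch above, the
following userspace sketch checks whether a read-mostly state byte lands on
the same cache line as a frequently written reference count. The 64-byte line
size, the stand-in field types, the 96-byte placeholder for arp_queue and
timer, and the helper names are assumptions for illustration only; the real
struct neighbour has more fields, and its offsets depend on architecture and
kernel configuration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed line size; real hardware varies */

/* Simplified "before" ordering: state bytes sit right next to refcnt.
 * Both structs are assumed to start on a cache line boundary. */
struct layout_before {
	unsigned long confirmed;
	unsigned long updated;
	uint8_t flags;
	uint8_t nud_state;            /* read on the fast path */
	uint8_t type;
	uint8_t dead;
	int refcnt;                   /* stand-in for atomic_t, written often */
	char queue_and_timer[96];     /* placeholder for arp_queue + timer */
	unsigned long used;
	int probes;
	int lock;                     /* stand-in for rwlock_t */
};

/* Simplified "after" ordering: lock moves up, state bytes move down. */
struct layout_after {
	unsigned long confirmed;
	unsigned long updated;
	int lock;
	int refcnt;
	char queue_and_timer[96];
	unsigned long used;
	int probes;
	uint8_t flags;
	uint8_t nud_state;
	uint8_t type;
	uint8_t dead;
};

/* True when two byte offsets fall within the same cache line. */
static bool same_line(size_t a, size_t b)
{
	return a / CACHE_LINE == b / CACHE_LINE;
}

bool before_shares_line(void)
{
	return same_line(offsetof(struct layout_before, nud_state),
			 offsetof(struct layout_before, refcnt));
}

bool after_shares_line(void)
{
	return same_line(offsetof(struct layout_after, nud_state),
			 offsetof(struct layout_after, refcnt));
}
```

With these stand-in types, before_shares_line() is true and
after_shares_line() is false, which matches the intent stated in the commit
message: readers of nud_state no longer touch the line dirtied by refcnt
updates.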
* Re: [PATCH] net: b43legacy: fix compile error
From: Larry Finger @ 2010-11-11 17:06 UTC (permalink / raw)
To: John W. Linville
Cc: Arnd Hannemann, David S. Miller, netdev, linux-kernel,
linux-wireless, Eric Dumazet
In-Reply-To: <20101111163531.GB2559@tuxdriver.com>
On 11/11/2010 10:35 AM, John W. Linville wrote:
> On Mon, Oct 25, 2010 at 10:13:06PM +0200, Arnd Hannemann wrote:
>> Am 25.10.2010 20:59, schrieb Larry Finger:
>>> On 10/25/2010 01:44 PM, Arnd Hannemann wrote:
>>>> Am 25.10.2010 20:36, schrieb Larry Finger:
>>>>> On 10/25/2010 01:26 PM, Arnd Hannemann wrote:
>>>>>> Am 25.10.2010 17:32, schrieb Larry Finger:
>>>>>>> On 10/25/2010 09:41 AM, Arnd Hannemann wrote:
>>>>>>>> On today's Linus tree the following compile error happened to me:
>>>>>>>>
>>>>>>>> CC [M] drivers/net/wireless/b43legacy/xmit.o
>>>>>>>> In file included from include/net/dst.h:11,
>>>>>>>> from drivers/net/wireless/b43legacy/xmit.c:31:
>>>>>>>> include/net/dst_ops.h:28: error: expected ':', ',', ';', '}' or '__attribute__' before '____cacheline_aligned_in_smp'
>>>>>>>> include/net/dst_ops.h: In function 'dst_entries_get_fast':
>>>>>>>> include/net/dst_ops.h:33: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>>>> include/net/dst_ops.h: In function 'dst_entries_get_slow':
>>>>>>>> include/net/dst_ops.h:41: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>>>> include/net/dst_ops.h: In function 'dst_entries_add':
>>>>>>>> include/net/dst_ops.h:49: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>>>> include/net/dst_ops.h: In function 'dst_entries_init':
>>>>>>>> include/net/dst_ops.h:55: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>>>> include/net/dst_ops.h: In function 'dst_entries_destroy':
>>>>>>>> include/net/dst_ops.h:60: error: 'struct dst_ops' has no member named 'pcpuc_entries'
>>>>>>>> make[4]: *** [drivers/net/wireless/b43legacy/xmit.o] Error 1
>>>>>>>> make[3]: *** [drivers/net/wireless/b43legacy] Error 2
>>>>>>>> make[2]: *** [drivers/net/wireless] Error 2
>>>>>>>> make[1]: *** [drivers/net] Error 2
>>>>>>>> make: *** [drivers] Error 2
>>>>>>>>
>>>>>>>> This patch fixes this issue by adding "linux/cache.h" as an include to
>>>>>>>> "include/net/dst_ops.h".
>>>>>>>
>>>>>>> Strange. Compiling b43legacy from the linux-2.6.git tree (git describe is
>>>>>>> v2.6.36-4464-g229aebb) works fine on x86_64. I wonder what is different.
>>>>>>
>>>>>> Exactly the same git describe here.
>>>>>> Maybe your arch includes cache.h already; in my case it's a compile for ARM (shmobile).
>>>>>
>>>>> That probably makes the difference. Using Eric's fix that removes the #include
>>>>> <linux/dst.h> should be better. Does it work for you?
>>>>>
>>>>> There are probably a lot more of the system includes that may not be needed. If
>>>>> I send you a patch removing them, could you test?
>>>>
>>>> As it turns out my card is not supported by b43legacy, but compilation testing,
>>>> sure I can test that.
>>>
>>> If it is a Broadcom card, it is likely handled by b43.
>>
>> Yes. It seems it should work with b43 (it's an SDIO card) and it almost does...
>>
>>> Attached is a trial removal of a number of include statements. Does it compile?
>>
>> Nope:
>> NSTALL_MOD_PATH=/home/arnd/projekte/renesas-2/nfs modules
>> CHK include/linux/version.h
>> CHK include/generated/utsrelease.h
>> make[1]: `include/generated/mach-types.h' is up to date.
>> CALL scripts/checksyscalls.sh
>> CC [M] drivers/net/wireless/b43legacy/main.o
>> drivers/net/wireless/b43legacy/main.c: In function 'b43legacy_upload_microcode':
>> drivers/net/wireless/b43legacy/main.c:1688: error: implicit declaration of function 'signal_pending'
>> make[4]: *** [drivers/net/wireless/b43legacy/main.o] Error 1
>> make[3]: *** [drivers/net/wireless/b43legacy] Error 2
>> make[2]: *** [drivers/net/wireless] Error 2
>> make[1]: *** [drivers/net] Error 2
>> make: *** [drivers] Error 2
>
> Is this issue resolved? Should I be expecting a b43 patch?
I don't know if a similar patch for b43 is needed. I tried to set up a cross
compiler for ARM. My initial attempt failed and I did not have time to explore
the situation. If anyone has links to a cross-compiler solution for x86_64 on
openSUSE, please let me know.
Larry
^ permalink raw reply
* Re: [PATCH] net: b43legacy: fix compile error
From: John W. Linville @ 2010-11-11 17:07 UTC (permalink / raw)
To: Larry Finger
Cc: Arnd Hannemann, David S. Miller, netdev, linux-kernel,
linux-wireless, Eric Dumazet
In-Reply-To: <4CDC2298.1050803@lwfinger.net>
On Thu, Nov 11, 2010 at 11:06:32AM -0600, Larry Finger wrote:
> On 11/11/2010 10:35 AM, John W. Linville wrote:
> > Is this issue resolved? Should I be expecting a b43 patch?
>
> I don't know if a similar patch for b43 is needed. I tried to set up a cross
> compiler for ARM. My initial attempt failed and I did not have time to explore
> the situation. If anyone has links to a cross-compiler solution for x86_64 on
> openSUSE, please let me know.
Sorry, I mean b43legacy -- will there be a formal patch?
--
John W. Linville Someday the world will need a hero, and you
linville@tuxdriver.com might be all we have. Be ready.
^ permalink raw reply
* [PATCH net-next-2.6] net: get rid of rtable->idev
From: Eric Dumazet @ 2010-11-11 17:14 UTC (permalink / raw)
To: David Miller, Herbert Xu
Cc: netdev, Roland Dreier, Sean Hefty, Hal Rosenstock
It seems the idev field in struct rtable has no special purpose other than
adding extra atomic ops.
We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).
The infiniband case is solved by using dst.dev instead of idev->dev.
Removing this field means that routing without the route cache now uses
shared data and percpu data, and the only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.
About 5% speedup on routing test.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
---
drivers/infiniband/core/addr.c | 8 +++---
include/net/route.h | 2 -
net/ipv4/route.c | 37 +++----------------------------
net/ipv4/xfrm4_policy.c | 24 --------------------
4 files changed, 8 insertions(+), 63 deletions(-)
diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index a5ea1bc..c15fd2e 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -200,7 +200,7 @@ static int addr4_resolve(struct sockaddr_in *src_in,
src_in->sin_family = AF_INET;
src_in->sin_addr.s_addr = rt->rt_src;
- if (rt->idev->dev->flags & IFF_LOOPBACK) {
+ if (rt->dst.dev->flags & IFF_LOOPBACK) {
ret = rdma_translate_ip((struct sockaddr *) dst_in, addr);
if (!ret)
memcpy(addr->dst_dev_addr, addr->src_dev_addr, MAX_ADDR_LEN);
@@ -208,12 +208,12 @@ static int addr4_resolve(struct sockaddr_in *src_in,
}
/* If the device does ARP internally, return 'done' */
- if (rt->idev->dev->flags & IFF_NOARP) {
- rdma_copy_addr(addr, rt->idev->dev, NULL);
+ if (rt->dst.dev->flags & IFF_NOARP) {
+ rdma_copy_addr(addr, rt->dst.dev, NULL);
goto put;
}
- neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, rt->idev->dev);
+ neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, rt->dst.dev);
if (!neigh || !(neigh->nud_state & NUD_VALID)) {
neigh_event_send(rt->dst.neighbour, NULL);
ret = -ENODATA;
diff --git a/include/net/route.h b/include/net/route.h
index 7e5e73b..cea533e 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -55,8 +55,6 @@ struct rtable {
/* Cache lookup keys */
struct flowi fl;
- struct in_device *idev;
-
int rt_genid;
unsigned rt_flags;
__u16 rt_type;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 987bf9a..5955965 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -140,13 +140,15 @@ static unsigned long expires_ljiffies;
static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie);
static void ipv4_dst_destroy(struct dst_entry *dst);
-static void ipv4_dst_ifdown(struct dst_entry *dst,
- struct net_device *dev, int how);
static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst);
static void ipv4_link_failure(struct sk_buff *skb);
static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu);
static int rt_garbage_collect(struct dst_ops *ops);
+static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+ int how)
+{
+}
static struct dst_ops ipv4_dst_ops = {
.family = AF_INET,
@@ -1433,8 +1435,6 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
rt->dst.child = NULL;
if (rt->dst.dev)
dev_hold(rt->dst.dev);
- if (rt->idev)
- in_dev_hold(rt->idev);
rt->dst.obsolete = -1;
rt->dst.lastuse = jiffies;
rt->dst.path = &rt->dst;
@@ -1728,33 +1728,13 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
{
struct rtable *rt = (struct rtable *) dst;
struct inet_peer *peer = rt->peer;
- struct in_device *idev = rt->idev;
if (peer) {
rt->peer = NULL;
inet_putpeer(peer);
}
-
- if (idev) {
- rt->idev = NULL;
- in_dev_put(idev);
- }
}
-static void ipv4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
- int how)
-{
- struct rtable *rt = (struct rtable *) dst;
- struct in_device *idev = rt->idev;
- if (dev != dev_net(dev)->loopback_dev && idev && idev->dev == dev) {
- struct in_device *loopback_idev =
- in_dev_get(dev_net(dev)->loopback_dev);
- if (loopback_idev) {
- rt->idev = loopback_idev;
- in_dev_put(idev);
- }
- }
-}
static void ipv4_link_failure(struct sk_buff *skb)
{
@@ -1910,7 +1890,6 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
rth->fl.iif = dev->ifindex;
rth->dst.dev = init_net.loopback_dev;
dev_hold(rth->dst.dev);
- rth->idev = in_dev_get(rth->dst.dev);
rth->fl.oif = 0;
rth->rt_gateway = daddr;
rth->rt_spec_dst= spec_dst;
@@ -2050,7 +2029,6 @@ static int __mkroute_input(struct sk_buff *skb,
rth->fl.iif = in_dev->dev->ifindex;
rth->dst.dev = (out_dev)->dev;
dev_hold(rth->dst.dev);
- rth->idev = in_dev_get(rth->dst.dev);
rth->fl.oif = 0;
rth->rt_spec_dst= spec_dst;
@@ -2231,7 +2209,6 @@ local_input:
rth->fl.iif = dev->ifindex;
rth->dst.dev = net->loopback_dev;
dev_hold(rth->dst.dev);
- rth->idev = in_dev_get(rth->dst.dev);
rth->rt_gateway = daddr;
rth->rt_spec_dst= spec_dst;
rth->dst.input= ip_local_deliver;
@@ -2417,9 +2394,6 @@ static int __mkroute_output(struct rtable **result,
if (!rth)
return -ENOBUFS;
- in_dev_hold(in_dev);
- rth->idev = in_dev;
-
atomic_set(&rth->dst.__refcnt, 1);
rth->dst.flags= DST_HOST;
if (IN_DEV_CONF_GET(in_dev, NOXFRM))
@@ -2759,9 +2733,6 @@ static int ipv4_dst_blackhole(struct net *net, struct rtable **rp, struct flowi
rt->fl = ort->fl;
- rt->idev = ort->idev;
- if (rt->idev)
- in_dev_hold(rt->idev);
rt->rt_genid = rt_genid(net);
rt->rt_flags = ort->rt_flags;
rt->rt_type = ort->rt_type;
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 4464f3b..dd1fd8c 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -80,10 +80,6 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
xdst->u.dst.dev = dev;
dev_hold(dev);
- xdst->u.rt.idev = in_dev_get(dev);
- if (!xdst->u.rt.idev)
- return -ENODEV;
-
xdst->u.rt.peer = rt->peer;
if (rt->peer)
atomic_inc(&rt->peer->refcnt);
@@ -189,8 +185,6 @@ static void xfrm4_dst_destroy(struct dst_entry *dst)
{
struct xfrm_dst *xdst = (struct xfrm_dst *)dst;
- if (likely(xdst->u.rt.idev))
- in_dev_put(xdst->u.rt.idev);
if (likely(xdst->u.rt.peer))
inet_putpeer(xdst->u.rt.peer);
xfrm_dst_destroy(xdst);
@@ -199,27 +193,9 @@ static void xfrm4_dst_destroy(struct dst_entry *dst)
static void xfrm4_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
int unregister)
{
- struct xfrm_dst *xdst;
-
if (!unregister)
return;
- xdst = (struct xfrm_dst *)dst;
- if (xdst->u.rt.idev->dev == dev) {
- struct in_device *loopback_idev =
- in_dev_get(dev_net(dev)->loopback_dev);
- BUG_ON(!loopback_idev);
-
- do {
- in_dev_put(xdst->u.rt.idev);
- xdst->u.rt.idev = loopback_idev;
- in_dev_hold(loopback_idev);
- xdst = (struct xfrm_dst *)xdst->u.dst.child;
- } while (xdst->u.dst.xfrm);
-
- __in_dev_put(loopback_idev);
- }
-
xfrm_dst_ifdown(dst, dev);
}
^ permalink raw reply related
* Re: [PATCH 01/10] cxgb4vf: minor comment/symbolic name cleanup.
From: Casey Leedom @ 2010-11-11 17:17 UTC (permalink / raw)
To: Joe Perches; +Cc: netdev, davem
In-Reply-To: <1289445578.15905.173.camel@Joe-Laptop>
[[Second attempt: First effort at replying got dropped because of local relay
security insanity ... and would have been dropped in any case because my mailer
was configured for HTML "Rich Text". -- Casey]]
| | From: Joe Perches <joe@perches.com>
| | Date: Wednesday, November 10, 2010 07:19 pm
| |
| | > const struct port_info *pi = netdev_priv(dev);
| | > int qs, msi;
| | >
| | > - for (qs = 0, msi = MSIX_NIQFLINT;
| | > + for (qs = 0, msi = MSIX_IQFLINT;
| | >
| | > qs < pi->nqsets;
| | > qs++, msi++) {
| |
| | This for now fits on a single line.
| |
| | > - struct fw_cmd_hdr *cmd_hdr = (struct fw_cmd_hdr *)rpl;
| | > + const struct fw_cmd_hdr *cmd_hdr = (const struct fw_cmd_hdr *)rpl;
| | > @@ -1265,7 +1265,7 @@ int t4vf_handle_fw_rpl(struct adapter *adapter, const __be64 *rpl)
| | > - const struct fw_port_cmd *port_cmd = (void *)rpl;
| | > + const struct fw_port_cmd *port_cmd = (const void *)rpl;
| |
| | might be better to have a consistent casting style.
| | 1st uses a direct cast, 2nd an implicit one.
|
| Sure. These both look good. May I fix these in a follow up patch or
| should I respin the patches with this last change?
[[To which Joe responded to me because of the aforementioned HTML email botch:]]
| Date: Thu Nov 11 06:55:43 2010
| From: Joe Perches <joe@perches.com>
| To: Casey Leedom <leedom@chelsio.com>
|
| It's just trivial formatting.
| Do what you think appropriate.
|
| fyi: linux-kernel doesn't accept emails with html formatting.
| You'll need to default to text otherwise your emails to
| any kernel related list will be silently rejected.
|
| cheers, Joe
Thanks Joe!
Casey
^ permalink raw reply
* Re: [PATCH] macvlan: lockless tx path
From: Ben Greear @ 2010-11-11 17:20 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1289494618.17691.1498.camel@edumazet-laptop>
On 11/11/2010 08:56 AM, Eric Dumazet wrote:
> Le jeudi 11 novembre 2010 à 08:40 -0800, Ben Greear a écrit :
>
>> So, they assume counters are exactly 32 or 64 bits.
>> Your example of the 36-bit counter would break their
>> assumptions of 32 or 64 bits.
>>
>
> They 'assume'. I am not. How do you handle counters that suddenly go to
> 0 ?
I've shown you my algorithm that requires one to know the counter
width, and in response you offered one that will work with 32 OR 64 bits,
as long as you make some assumptions.
If you have an algorithm that can properly deal with wrapped counters of
arbitrary bits, then post it.
As for counters that go to zero, if the previous value was > 0, then
it was a wrap. There is a reason Dave won't let anyone add a patch to
clear network counters. If the network device was removed and came back,
then you must listen for those events and take proper precautions (set
prev to 0 before sampling any counters, for instance).
>> I agree that you can guess if the counter is 32 or 64, at least with today's
>> hardware and relatively normal poll times, and the requirement that the
>> counters can ONLY be 32 or 64 bits. I still consider it a kludge to
>> return 32 bit counters in stats64, however. Would you consider
>> a patch to have netlink pay attention to whether the stats are 32 or
>> 64 (based on a flag returned from dev_get_stats perhaps)?
>
> So what ? How is it going to help /proc/net/dev users (most apps use it)
At least they will have an opportunity to use a better defined API
if they wish. And, if you only want stats for one interface, and
you have 4000 VLANs in the system, reading /proc/net/dev seems quite
inefficient.
> Could you please adapt your software, rather than adapting Linux to your
> needs? Doesn't your software run on Linux 2.6.32?
Yes, I used to carry a patch that allowed direct access to the netdev_stats,
and I'd fall back to parsing /proc/net/dev on standard kernels. I've now
moved to using netlink API as that seems the preferred method going
forward and allows the granularity (ie, get stats for a single device)
that I prefer.
And, I'll certainly keep trying to improve Linux, because if it helps
my needs, it may help others. If it causes harm to others, then hopefully
someone will notice and reject my patch or help me to see how to make
it better.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
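Ben's own algorithm is not shown in this message. As a minimal sketch of the
general idea (computing a delta when the counter width is known), modular
subtraction masked to the counter width handles a single wrap between two
samples. The function name and the bits parameter are hypothetical, and the
sketch assumes the counter wrapped at most once between samples.

```c
#include <stdint.h>

/*
 * Delta between two samples of a counter that is 'bits' wide (1..64).
 * Unsigned subtraction wraps modulo 2^64; masking the result reduces it
 * modulo 2^bits, which is correct if at most one wrap occurred.
 */
uint64_t counter_delta(uint64_t prev, uint64_t cur, unsigned int bits)
{
	uint64_t mask = (bits >= 64) ? UINT64_MAX
				     : ((UINT64_C(1) << bits) - 1);

	return (cur - prev) & mask;
}
```

For a 32-bit counter that went from 0xFFFFFFF0 to 0x10 this yields 0x20. The
same call with bits set to 36 would cover the 36-bit counter mentioned in the
thread, but only because the caller supplies the width; nothing in the two
samples alone reveals it.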
* Re: [PATCH 00/10] cxgb4vf: a number of bug fixes and minor cleanup
From: David Miller @ 2010-11-11 17:33 UTC (permalink / raw)
To: leedom; +Cc: netdev
In-Reply-To: <450C6E98-D0D9-42A6-9F15-4DB98647DB04@chelsio.com>
From: Casey Leedom <leedom@chelsio.com>
Date: Wed, 10 Nov 2010 18:01:33 -0800
> The following patch set includes a number of bug fixes and some
> minor cleanup for the cxgb4vf network driver. If there are any
> problems with formatting, etc. of the patch set please just reject
> the patch set and kick my butt ― I'm still a novice at putting these
> together!
This is a mix of bug fixes (which would be appropriate for net-2.6)
and also changes like cleanups and minor feature adds which are not
appropriate in net-2.6
Please split them up.
I'd suggest sending out only the real bug fixes so I can quickly get
them into net-2.6, and then later you can submit the cleanups and
feature-adds relative to the net-2.6 stuff.
^ permalink raw reply
* Re: [PATCH 00/10] cxgb4vf: a number of bug fixes and minor cleanup
From: Casey Leedom @ 2010-11-11 17:40 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20101111.093315.183049721.davem@davemloft.net>
| From: David Miller <davem@davemloft.net>
| Date: Thursday, November 11, 2010 09:33 am
|
| From: Casey Leedom <leedom@chelsio.com>
| Date: Wed, 10 Nov 2010 18:01:33 -0800
|
| > The following patch set includes a number of bug fixes and some
| > minor cleanup for the cxgb4vf network driver. If there are any
| > problems with formatting, etc. of the patch set please just reject
| > the patch set and kick my butt ― I'm still a novice at putting these
| > together!
|
| This is a mix of bug fixes (which would be appropriate for net-2.6)
| and also changes like cleanups and minor feature adds which are not
| appropriate in net-2.6
|
| Please split them up.
|
| I'd suggest sending out only the real bug fixes so I can quickly get
| them into net-2.6, and then later you can submit the cleanups and
| feature-adds relative to the net-2.6 stuff.
Okay, sorry, I didn't know what the policies were. It'll take a bit for me to
wrangle "git send-email" into doing this. Two new patch sets coming before the
end of the day!
Casey
^ permalink raw reply
* Re: [PATCH net-next-2.6] vlan: lockless transmit path
From: Jesse Gross @ 2010-11-11 17:40 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy
In-Reply-To: <1289468520.17691.1018.camel@edumazet-laptop>
On Thu, Nov 11, 2010 at 1:42 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> vlan is a stacked device, like tunnels. We should use the lockless
> mechanism we are using in tunnels and loopback.
>
> This patch completely removes locking in TX path.
>
> tx stat counters are added into existing percpu stat structure, renamed
> from vlan_rx_stats to vlan_pcpu_stats.
>
> Note : this partially reverts commit 2e59af3dcbdf (vlan: multiqueue vlan
> device)
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Patrick McHardy <kaber@trash.net>
> ---
> net/8021q/vlan.c | 4 -
> net/8021q/vlan.h | 16 +++++--
> net/8021q/vlan_core.c | 4 -
> net/8021q/vlan_dev.c | 78 +++++++++++++++++++++----------------
> net/8021q/vlan_netlink.c | 20 ---------
> 5 files changed, 59 insertions(+), 63 deletions(-)
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 52077ca..2f54ce8 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -272,13 +272,11 @@ static int register_vlan_device(struct net_device *real_dev, u16 vlan_id)
> snprintf(name, IFNAMSIZ, "vlan%.4i", vlan_id);
> }
>
> - new_dev = alloc_netdev_mq(sizeof(struct vlan_dev_info), name,
> - vlan_setup, real_dev->num_tx_queues);
> + new_dev = alloc_netdev(sizeof(struct vlan_dev_info), name, vlan_setup);
If we're only allocating a single queue then we should also drop
vlan_dev_select_queue() and the netdev_ops that call it. If the
underlying device is multiqueue and has its own select_queue function
then it can pick a queue number that is larger than what the vlan
device has. The problem will be caught by dev_cap_txqueue() but it's
not right and it would also be nice to get rid of half of those
netdev_ops.
^ permalink raw reply
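The mismatch Jesse describes can be modeled in a few lines of userspace C.
This is only a sketch: struct fake_dev, lower_select_queue() and
cap_txqueue() are hypothetical stand-ins, and the cap is modeled loosely on
the fallback-to-queue-0 behavior that dev_cap_txqueue() is said to provide in
the message above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the queue count kept in struct net_device. */
struct fake_dev {
	uint16_t real_num_tx_queues; /* assumed to be >= 1 */
};

/* A multiqueue lower device choosing among its own queues. */
uint16_t lower_select_queue(const struct fake_dev *lower, uint32_t hash)
{
	return (uint16_t)(hash % lower->real_num_tx_queues);
}

/* Out-of-range indices fall back to queue 0 instead of being used. */
uint16_t cap_txqueue(const struct fake_dev *dev, uint16_t queue_index)
{
	if (queue_index >= dev->real_num_tx_queues)
		return 0;
	return queue_index;
}
```

With an 8-queue lower device and a single-queue vlan device, a hash of 13
makes lower_select_queue() return 5, which is out of range for the vlan
device; cap_txqueue() then forces it to 0. The result is safe, but the
selection work was wasted, which is the argument for dropping the
select_queue hook on the single-queue device.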
* Re: [PATCH net-next-2.6] vlan: lockless transmit path
From: Eric Dumazet @ 2010-11-11 17:52 UTC (permalink / raw)
To: Jesse Gross; +Cc: David Miller, netdev, Patrick McHardy
In-Reply-To: <AANLkTikBkUNfrfSZkoVd1xzisD3jgg5kVUUZC45gNpde@mail.gmail.com>
Le jeudi 11 novembre 2010 à 09:40 -0800, Jesse Gross a écrit :
> If we're only allocating a single queue then we should also drop
> vlan_dev_select_queue() and the netdev_ops that call it. If the
> underlying device is multiqueue and has its own select_queue function
> then it can pick a queue number that is larger than what the vlan
> device has. The problem will be caught by dev_cap_txqueue() but it's
> not right and it would also be nice to get rid of half of those
> netdev_ops.
Hmm, you are referring to old kernels, aren't you?
My patch is for net-next.
The plan is that, after Tom Herbert's latest patches, dev_pick_tx() won't
call do_select_queue() on a single-queue device.
http://patchwork.ozlabs.org/patch/70369/
Logically, this is a second cleanup patch, I believe.
^ permalink raw reply
* Re: [PATCH] macvlan: lockless tx path
From: Eric Dumazet @ 2010-11-11 18:02 UTC (permalink / raw)
To: Ben Greear; +Cc: netdev
In-Reply-To: <4CDC25F5.7010501@candelatech.com>
Le jeudi 11 novembre 2010 à 09:20 -0800, Ben Greear a écrit :
> I've shown you my algorithm that requires one to know the counter
> width, and in response you offered one that will work with 32 OR 64 bits,
> as long as you make some assumptions.
>
> If you have an algorithm that can properly deal with wrapped counters of
> arbitrary bits, then post it.
>
It's only a generalization of the RRD algorithm.
No rocket science I am afraid.
You can try all the numbers in `seq 32 64` in this order, to get a
generic algo.
If you can certify all your devices are 32 or 64, then the RRD method is
OK.
^ permalink raw reply
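Eric does not spell the algorithm out, so the following is only one possible
reading of "try all the numbers in `seq 32 64` in this order": when the new
sample is smaller than the previous one, assume the counter wrapped at the
smallest width from 32 to 64 bits that could have held the previous value.
The function name and the bits_out parameter are hypothetical.

```c
#include <stdint.h>

/*
 * Delta between two samples when the counter width is unknown.
 * *bits_out receives the width that was assumed for the wrap, or 0 when
 * no wrap was needed. bits_out must not be NULL.
 */
uint64_t delta_guess_width(uint64_t prev, uint64_t cur, unsigned int *bits_out)
{
	unsigned int bits;

	if (cur >= prev) {
		*bits_out = 0;
		return cur - prev;
	}

	/* Try widths 32..63: pick the first one that prev fits into. */
	for (bits = 32; bits < 64; bits++) {
		uint64_t modulus = UINT64_C(1) << bits;

		if (prev < modulus) {
			*bits_out = bits;
			/* cur < prev < modulus, so this cannot overflow */
			return cur + (modulus - prev);
		}
	}

	/* prev needs all 64 bits: plain unsigned wraparound applies. */
	*bits_out = 64;
	return cur - prev;
}
```

This guess is only as good as its assumptions: a wider counter whose previous
value happened to be below 2^32 is treated as a 32-bit counter, and a counter
narrower than 32 bits, or one that wrapped more than once between samples,
produces a wrong delta.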
* Re: vxge update version 3
From: David Miller @ 2010-11-11 18:12 UTC (permalink / raw)
To: jon.mason; +Cc: netdev, sivakumar.subramani, sreenivasa.honnur
In-Reply-To: <1289485564-8822-1-git-send-email-jon.mason@exar.com>
From: Jon Mason <jon.mason@exar.com>
Date: Thu, 11 Nov 2010 08:25:52 -0600
> This version of the patch series includes Ben Hutchings' comments on the
> firmware flashing patch and resolves the merge issues with the latest
> code.
All applied, thank you.
^ permalink raw reply
* Re: [PATCH] macvlan: lockless tx path
From: Ben Greear @ 2010-11-11 18:13 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1289498572.17691.1599.camel@edumazet-laptop>
On 11/11/2010 10:02 AM, Eric Dumazet wrote:
> Le jeudi 11 novembre 2010 à 09:20 -0800, Ben Greear a écrit :
>
>> I've shown you my algorithm that requires one to know the counter
>> width, and in response you offered one that will work with 32 OR 64 bits,
>> as long as you make some assumptions.
>>
>> If you have an algorithm that can properly deal with wrapped counters of
>> arbitrary bits, then post it.
>>
>
> It's only a generalization of the RRD algorithm.
>
> No rocket science I am afraid.
>
> You can try all the numbers in `seq 32 64` in this order, to get a
> generic algo.
Ok, so then you have to sample soon enough that a 32-bit counter
can't wrap twice..otherwise you couldn't tell a 32 bit from a
33 bit counter, and you basically gain nothing from having "64-bit"
stats.
And that still assumes at least 32-bit stats..not 16 or whatever.
Thankfully, I doubt there are any drivers using < 32 bit stats.
I'll work on a patch for my idea when I get a chance..we'll see if
anyone likes it.
If you are aware of any drivers that return counters of other than 32 or
64bit widths, please let us know and perhaps we can fix them as well.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply
* Regression 2.6.36 - driver rtl8169 crashes kernel, triggered by user app
From: Michael Monnerie @ 2010-11-11 18:33 UTC (permalink / raw)
To: linux-kernel; +Cc: romieu, netdev
Dear list, I've hunted down a bug which does *NOT* occur in kernel
2.6.34.7-0.5-desktop from openSUSE 11.3, but crashes stock kernel 2.6.36
triggered by a user!
It's actually very simple. From my desktop (kernel 2.6.36) I "cd" to an
NFS4 share, where a xz compressed image of Win7-64.iso.xz is located.
# cd /q/iso-images
and then I try to uncompress it there (as user, not root!):
# xz -kv Win7-64.iso.xz
With kernel 2.6.34.7-0.5-desktop, this runs with ~41MiB/s without
problems.
With kernel 2.6.36, it runs at ~26MiB/s, and while doing so, dmesg shows
a lot of noise about r8169 complaining:
http://zmi.at/x/kernel2.6.36-crash86.jpg
Here are 2 pictures of different crashes:
http://zmi.at/x/kernel2.6.36-crash84.jpg
http://zmi.at/x/kernel2.6.36-crash85.jpg
Neither the dmesg messages nor the crash happen with kernel
2.6.34.7-0.5-desktop as delivered by openSUSE 11.3, but 2.6.36 always crashes
completely. I've retried about 10 times, and it *never* finished
uncompressing the ~3GB image. At around 500-1000MB the kernel was gone.
I'm sure someone knows how to fix it. :-)
--
Kind regards,
Michael Monnerie, Ing. BSc
it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31
****** Radio interview on the topic of spam ******
http://www.it-podcast.at/archiv.html#podcast-100716
// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
^ permalink raw reply
* Re: [PATCH] neigh: reorder struct neighbour
From: David Miller @ 2010-11-11 18:41 UTC (permalink / raw)
To: eric.dumazet; +Cc: netdev
In-Reply-To: <1289494639.17691.1499.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 11 Nov 2010 17:57:19 +0100
> It is important to move nud_state outside of the often modified cache
> line (because of refcnt), to reduce false sharing in neigh_event_send()
>
> This is a followup of commit 0ed8ddf4045f (neigh: Protect neigh->ha[]
> with a seqlock)
>
> This gives a 7% speedup on routing test with IP route cache disabled.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> David, it appears I forgot to push this patch, I have it in my tree
> since one month. Thanks !
Applied, thanks.
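
For readers unfamiliar with the false-sharing argument in the quoted commit message, here is a minimal userspace sketch. The struct names, the `other` placeholder array and the 64-byte cache line are all assumptions made for illustration; this is not the real struct neighbour layout. It only shows how moving a read-mostly field away from a frequently written counter can put the two on different cache lines, assuming the struct starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; each struct starts on a line boundary. */
#define DEMO_CACHE_LINE 64

static_assert(DEMO_CACHE_LINE % sizeof(unsigned long) == 0,
	      "cache line size should be a multiple of the word size");

/* Hypothetical layout: read-mostly state sits next to the hot refcount. */
struct demo_neigh_before {
	unsigned long confirmed;
	unsigned long updated;
	unsigned char nud_state;   /* read on the fast path */
	int refcnt;                /* written on the fast path */
	unsigned char other[96];   /* stand-in for the queue, timer, ... */
};

/* Hypothetical layout after reordering: state moved past the hot fields. */
struct demo_neigh_after {
	unsigned long confirmed;
	unsigned long updated;
	int refcnt;
	unsigned char other[96];
	unsigned char nud_state;
};

/* Return nonzero if two member offsets fall into the same cache line. */
static int demo_same_cache_line(size_t off_a, size_t off_b)
{
	return off_a / DEMO_CACHE_LINE == off_b / DEMO_CACHE_LINE;
}

static int demo_before_shares_line(void)
{
	return demo_same_cache_line(offsetof(struct demo_neigh_before, nud_state),
				    offsetof(struct demo_neigh_before, refcnt));
}

static int demo_after_shares_line(void)
{
	return demo_same_cache_line(offsetof(struct demo_neigh_after, nud_state),
				    offsetof(struct demo_neigh_after, refcnt));
}
```

In this model demo_before_shares_line() reports that the two fields share a line, while demo_after_shares_line() reports that they do not. On real hardware the effect depends on the actual cache line size and on how the object is aligned.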
^ permalink raw reply
* Re: [PATCH net-next-2.6] net: get rid of rtable->idev
From: David Miller @ 2010-11-11 18:41 UTC (permalink / raw)
To: eric.dumazet; +Cc: herbert, netdev, rolandd, sean.hefty, hal.rosenstock
In-Reply-To: <1289495647.17691.1536.camel@edumazet-laptop>
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 11 Nov 2010 18:14:07 +0100
> It seems the idev field in struct rtable has no special purpose other
> than adding extra atomic ops.
>
> We hold refcounts on the device itself (using percpu data, so pretty
> cheap in current kernel).
>
> infiniband case is solved using dst.dev instead of idev->dev
>
> Removal of this field means routing without route cache is now using
> shared data, percpu data, and only potential contention is a pair of
> atomic ops on struct neighbour per forwarded packet.
>
> About 5% speedup on routing test.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Roland Dreier <rolandd@cisco.com>
> Cc: Sean Hefty <sean.hefty@intel.com>
> Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Yes, let's remove as much unused crap as possible :-)
Applied, thanks!
^ permalink raw reply
* Re: net-next-2.6 [PATCH 0/3] dccp: Ack Vectors in circular buffer instead of array
From: David Miller @ 2010-11-11 18:45 UTC (permalink / raw)
To: gerrit; +Cc: dccp, netdev
In-Reply-To: <1289455653-5463-1-git-send-email-gerrit@erg.abdn.ac.uk>
From: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Date: Thu, 11 Nov 2010 07:07:30 +0100
> Patch #1: cleans up the old interface to prepare for the improved one.
> Patch #2: also tidies up the old interface, by separating the internals
> of Ack Vectors from the option-parsing code.
> Patch #3: Completes the implementation of a circular Ack Vector buffer.
>
>
> I have also placed this into a fresh (today's) copy of net-next-2.6, on
>
> git://eden-feed.erg.abdn.ac.uk/net-next-2.6 [subtree 'dccp']
>
> The set has been tested for 3 years, and is fully bisectable.
Pulled, thanks!
^ permalink raw reply
* Re: [PATCH] macvlan: lockless tx path
From: Eric Dumazet @ 2010-11-11 18:46 UTC (permalink / raw)
To: Ben Greear; +Cc: netdev
In-Reply-To: <4CDC324B.40206@candelatech.com>
On Thursday 11 November 2010 at 10:13 -0800, Ben Greear wrote:
> If you are aware of any drivers that return counters of other than 32 or
> 64bit widths, please let us know and perhaps we can fix them as well.
I am pretty sure I found some of them, but cannot recall right now.
Some ethtool stats definitely do not have a 32- or 64-bit range.
^ permalink raw reply
* Re: [PATCH net-next-2.6] vlan: lockless transmit path
From: Jesse Gross @ 2010-11-11 18:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy
In-Reply-To: <1289497978.17691.1582.camel@edumazet-laptop>
On Thu, Nov 11, 2010 at 9:52 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thursday 11 November 2010 at 09:40 -0800, Jesse Gross wrote:
>
>> If we're only allocating a single queue then we should also drop
>> vlan_dev_select_queue() and the netdev_ops that call it. If the
>> underlying device is multiqueue and has its own select_queue function
>> then it can pick a queue number that is larger than what the vlan
>> device has. The problem will be caught by dev_cap_txqueue() but it's
>> not right and it would also be nice to get rid of half of those
>> netdev_ops.
>
> Hmm, you refer to old kernels, don't you?
>
> My patch is for net-next
Well, I was referring to checked-in code.
>
> The plan is that after Tom Herbert's latest patches, dev_pick_tx() won't
> call do_select_queue() on a mono-queue device.
>
> http://patchwork.ozlabs.org/patch/70369/
Before Tom's patch, a warning will be generated if a single queue vlan
device is stacked on top of a multiqueue physical device that
implements ndo_select_queue(). After Tom's patch, we avoid the
warning so vlan_dev_select_queue() is merely dead code. Either way,
what's the benefit in keeping it?
>
>
> Logically, this is a second cleanup patch, I believe.
I'm not arguing against your patch, I just think it should go a step further.
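
To make the mismatch described above concrete, here is a simplified userspace model of the capping step. It is a sketch and not the kernel's dev_cap_txqueue() source: the function name demo_cap_txqueue, its parameters and the `warned` flag are hypothetical. The assumption is only what the thread states, namely that a queue index larger than what the stacked device has gets caught and a warning is generated.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model (not kernel source): clamp a queue index picked by the
 * underlying device's select_queue hook to the number of TX queues that the
 * stacked (e.g. vlan) device really has.
 */
static uint16_t demo_cap_txqueue(uint16_t queue_index,
				 uint16_t real_num_tx_queues,
				 int *warned)
{
	assert(real_num_tx_queues > 0);

	if (queue_index >= real_num_tx_queues) {
		/* Per the thread, a warning is generated in this case. */
		*warned = 1;
		return 0;	/* fall back to the first queue */
	}
	return queue_index;
}
```

With a single-queue stacked device, any nonzero index coming from a multiqueue lower device ends up on queue 0 and sets the warning flag. That is the "caught, but not right" situation the reply refers to.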
^ permalink raw reply
* [PATCH net-26 1/6] cxgb4vf: don't implement trivial (and incorrect) ndo_select_queue()
From: Casey Leedom @ 2010-11-11 19:06 UTC (permalink / raw)
To: netdev; +Cc: davem, Casey Leedom
In-Reply-To: <1289502413-9895-1-git-send-email-leedom@chelsio.com>
Don't implement (struct net_device_ops *)->ndo_select_queue() with a simple
call to skb_tx_hash(). This leads to non-persistent TX queue selection in
the Linux dev_pick_tx() routine for TCP connections.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
drivers/net/cxgb4vf/cxgb4vf_main.c | 14 --------------
1 files changed, 0 insertions(+), 14 deletions(-)
diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c b/drivers/net/cxgb4vf/cxgb4vf_main.c
index 6de5e2e..24808ac 100644
--- a/drivers/net/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
@@ -1103,18 +1103,6 @@ static int cxgb4vf_set_mac_addr(struct net_device *dev, void *_addr)
return 0;
}
-/*
- * Return a TX Queue on which to send the specified skb.
- */
-static u16 cxgb4vf_select_queue(struct net_device *dev, struct sk_buff *skb)
-{
- /*
- * XXX For now just use the default hash but we probably want to
- * XXX look at other possibilities ...
- */
- return skb_tx_hash(dev, skb);
-}
-
#ifdef CONFIG_NET_POLL_CONTROLLER
/*
* Poll all of our receive queues. This is called outside of normal interrupt
@@ -2417,7 +2405,6 @@ static const struct net_device_ops cxgb4vf_netdev_ops = {
.ndo_get_stats = cxgb4vf_get_stats,
.ndo_set_rx_mode = cxgb4vf_set_rxmode,
.ndo_set_mac_address = cxgb4vf_set_mac_addr,
- .ndo_select_queue = cxgb4vf_select_queue,
.ndo_validate_addr = eth_validate_addr,
.ndo_do_ioctl = cxgb4vf_do_ioctl,
.ndo_change_mtu = cxgb4vf_change_mtu,
@@ -2624,7 +2611,6 @@ static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
netdev->do_ioctl = cxgb4vf_do_ioctl;
netdev->change_mtu = cxgb4vf_change_mtu;
netdev->set_mac_address = cxgb4vf_set_mac_addr;
- netdev->select_queue = cxgb4vf_select_queue;
#ifdef CONFIG_NET_POLL_CONTROLLER
netdev->poll_controller = cxgb4vf_poll_controller;
#endif
--
1.7.0.4
^ permalink raw reply related
* [PATCH net-26 0/6] cxgb4vf: a number of bug fixes
From: Casey Leedom @ 2010-11-11 19:06 UTC (permalink / raw)
To: netdev; +Cc: davem
The following patch set includes a number of bug fixes for the cxgb4vf
network driver. As always, please toss these in the bin if they're not
right.
drivers/net/cxgb4vf/cxgb4vf_main.c | 42 ++++++++-----
drivers/net/cxgb4vf/sge.c | 122 ++++++++++++++++++++++--------------
drivers/net/cxgb4vf/t4vf_common.h | 1 +
drivers/net/cxgb4vf/t4vf_hw.c | 19 ++++++
4 files changed, 122 insertions(+), 62 deletions(-)
^ permalink raw reply
* [PATCH net-26 2/6] cxgb4vf: fix bug in Generic Receive Offload
From: Casey Leedom @ 2010-11-11 19:06 UTC (permalink / raw)
To: netdev; +Cc: davem, Casey Leedom
In-Reply-To: <1289502413-9895-1-git-send-email-leedom@chelsio.com>
Fix botch in Generic Receive Offload (the Packet Gather List Total length
field wasn't being initialized).
Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
drivers/net/cxgb4vf/sge.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/net/cxgb4vf/sge.c b/drivers/net/cxgb4vf/sge.c
index f10864d..6a6e18b 100644
--- a/drivers/net/cxgb4vf/sge.c
+++ b/drivers/net/cxgb4vf/sge.c
@@ -1679,6 +1679,7 @@ int process_responses(struct sge_rspq *rspq, int budget)
}
len = RSPD_LEN(len);
}
+ gl.tot_len = len;
/*
* Gather packet fragments.
--
1.7.0.4
^ permalink raw reply related
* [PATCH net-26 3/6] cxgb4vf: fix some errors in Gather List to skb conversion
From: Casey Leedom @ 2010-11-11 19:06 UTC (permalink / raw)
To: netdev; +Cc: davem, Casey Leedom
In-Reply-To: <1289502413-9895-1-git-send-email-leedom@chelsio.com>
There were some errors in the way that internal Gather Lists were being
translated into skb's. This also makes the VF Driver look more like the PF
Driver to facilitate easier comparison.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
drivers/net/cxgb4vf/sge.c | 121 +++++++++++++++++++++++++++-----------------
1 files changed, 74 insertions(+), 47 deletions(-)
diff --git a/drivers/net/cxgb4vf/sge.c b/drivers/net/cxgb4vf/sge.c
index 6a6e18b..ecf0770 100644
--- a/drivers/net/cxgb4vf/sge.c
+++ b/drivers/net/cxgb4vf/sge.c
@@ -154,13 +154,14 @@ enum {
*/
RX_COPY_THRES = 256,
RX_PULL_LEN = 128,
-};
-/*
- * Can't define this in the above enum because PKTSHIFT isn't a constant in
- * the VF Driver ...
- */
-#define RX_PKT_PULL_LEN (RX_PULL_LEN + PKTSHIFT)
+ /*
+ * Main body length for sk_buffs used for RX Ethernet packets with
+ * fragments. Should be >= RX_PULL_LEN but possibly bigger to give
+ * pskb_may_pull() some room.
+ */
+ RX_SKB_LEN = 512,
+};
/*
* Software state per TX descriptor.
@@ -1355,6 +1356,67 @@ out_free:
}
/**
+ * t4vf_pktgl_to_skb - build an sk_buff from a packet gather list
+ * @gl: the gather list
+ * @skb_len: size of sk_buff main body if it carries fragments
+ * @pull_len: amount of data to move to the sk_buff's main body
+ *
+ * Builds an sk_buff from the given packet gather list. Returns the
+ * sk_buff or %NULL if sk_buff allocation failed.
+ */
+struct sk_buff *t4vf_pktgl_to_skb(const struct pkt_gl *gl,
+ unsigned int skb_len, unsigned int pull_len)
+{
+ struct sk_buff *skb;
+ struct skb_shared_info *ssi;
+
+ /*
+ * If the ingress packet is small enough, allocate an skb large enough
+ * for all of the data and copy it inline. Otherwise, allocate an skb
+ * with enough room to pull in the header and reference the rest of
+ * the data via the skb fragment list.
+ *
+ * Below we rely on RX_COPY_THRES being less than the smallest Rx
+ * buffer size, which is expected since buffers are at least
+ * PAGE_SIZEd. In this case packets up to RX_COPY_THRES have only one
+ * fragment.
+ */
+ if (gl->tot_len <= RX_COPY_THRES) {
+ /* small packets have only one fragment */
+ skb = alloc_skb(gl->tot_len, GFP_ATOMIC);
+ if (unlikely(!skb))
+ goto out;
+ __skb_put(skb, gl->tot_len);
+ skb_copy_to_linear_data(skb, gl->va, gl->tot_len);
+ } else {
+ skb = alloc_skb(skb_len, GFP_ATOMIC);
+ if (unlikely(!skb))
+ goto out;
+ __skb_put(skb, pull_len);
+ skb_copy_to_linear_data(skb, gl->va, pull_len);
+
+ ssi = skb_shinfo(skb);
+ ssi->frags[0].page = gl->frags[0].page;
+ ssi->frags[0].page_offset = gl->frags[0].page_offset + pull_len;
+ ssi->frags[0].size = gl->frags[0].size - pull_len;
+ if (gl->nfrags > 1)
+ memcpy(&ssi->frags[1], &gl->frags[1],
+ (gl->nfrags-1) * sizeof(skb_frag_t));
+ ssi->nr_frags = gl->nfrags;
+
+ skb->len = gl->tot_len;
+ skb->data_len = skb->len - pull_len;
+ skb->truesize += skb->data_len;
+
+ /* Get a reference for the last page, we don't own it */
+ get_page(gl->frags[gl->nfrags - 1].page);
+ }
+
+out:
+ return skb;
+}
+
+/**
* t4vf_pktgl_free - free a packet gather list
* @gl: the gather list
*
@@ -1463,10 +1525,8 @@ int t4vf_ethrx_handler(struct sge_rspq *rspq, const __be64 *rsp,
{
struct sk_buff *skb;
struct port_info *pi;
- struct skb_shared_info *ssi;
const struct cpl_rx_pkt *pkt = (void *)&rsp[1];
bool csum_ok = pkt->csum_calc && !pkt->err_vec;
- unsigned int len = be16_to_cpu(pkt->len);
struct sge_eth_rxq *rxq = container_of(rspq, struct sge_eth_rxq, rspq);
/*
@@ -1481,42 +1541,14 @@ int t4vf_ethrx_handler(struct sge_rspq *rspq, const __be64 *rsp,
}
/*
- * If the ingress packet is small enough, allocate an skb large enough
- * for all of the data and copy it inline. Otherwise, allocate an skb
- * with enough room to pull in the header and reference the rest of
- * the data via the skb fragment list.
+ * Convert the Packet Gather List into an skb.
*/
- if (len <= RX_COPY_THRES) {
- /* small packets have only one fragment */
- skb = alloc_skb(gl->frags[0].size, GFP_ATOMIC);
- if (!skb)
- goto nomem;
- __skb_put(skb, gl->frags[0].size);
- skb_copy_to_linear_data(skb, gl->va, gl->frags[0].size);
- } else {
- skb = alloc_skb(RX_PKT_PULL_LEN, GFP_ATOMIC);
- if (!skb)
- goto nomem;
- __skb_put(skb, RX_PKT_PULL_LEN);
- skb_copy_to_linear_data(skb, gl->va, RX_PKT_PULL_LEN);
-
- ssi = skb_shinfo(skb);
- ssi->frags[0].page = gl->frags[0].page;
- ssi->frags[0].page_offset = (gl->frags[0].page_offset +
- RX_PKT_PULL_LEN);
- ssi->frags[0].size = gl->frags[0].size - RX_PKT_PULL_LEN;
- if (gl->nfrags > 1)
- memcpy(&ssi->frags[1], &gl->frags[1],
- (gl->nfrags-1) * sizeof(skb_frag_t));
- ssi->nr_frags = gl->nfrags;
- skb->len = len + PKTSHIFT;
- skb->data_len = skb->len - RX_PKT_PULL_LEN;
- skb->truesize += skb->data_len;
-
- /* Get a reference for the last page, we don't own it */
- get_page(gl->frags[gl->nfrags - 1].page);
+ skb = t4vf_pktgl_to_skb(gl, RX_SKB_LEN, RX_PULL_LEN);
+ if (unlikely(!skb)) {
+ t4vf_pktgl_free(gl);
+ rxq->stats.rx_drops++;
+ return 0;
}
-
__skb_pull(skb, PKTSHIFT);
skb->protocol = eth_type_trans(skb, rspq->netdev);
skb_record_rx_queue(skb, rspq->idx);
@@ -1549,11 +1581,6 @@ int t4vf_ethrx_handler(struct sge_rspq *rspq, const __be64 *rsp,
netif_receive_skb(skb);
return 0;
-
-nomem:
- t4vf_pktgl_free(gl);
- rxq->stats.rx_drops++;
- return 0;
}
/**
--
1.7.0.4
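
The length bookkeeping in t4vf_pktgl_to_skb() above can be modelled in userspace as follows. This is a sketch under stated assumptions: struct demo_split and demo_split_packet() are hypothetical names, no sk_buff is allocated, and only the arithmetic of the two branches (copy everything versus pull a header and reference the rest through fragment 0) is reproduced.

```c
#include <assert.h>

/* Hypothetical result type: how a packet's bytes are divided up. */
struct demo_split {
	unsigned int linear_len;	/* bytes copied into the main body */
	unsigned int data_len;		/* bytes left in page fragments */
	unsigned int frag0_offset;	/* adjusted offset of fragment 0 */
	unsigned int frag0_size;	/* adjusted size of fragment 0 */
};

/*
 * Model of the two branches: packets up to copy_thres are copied whole,
 * larger ones get pull_len bytes copied and the rest stays in fragments.
 */
static struct demo_split demo_split_packet(unsigned int tot_len,
					   unsigned int frag0_offset,
					   unsigned int frag0_size,
					   unsigned int copy_thres,
					   unsigned int pull_len)
{
	struct demo_split s = { 0, 0, 0, 0 };

	if (tot_len <= copy_thres) {
		/* Small packet: one fragment, fully copied inline. */
		s.linear_len = tot_len;
		return s;
	}

	/* The pulled header must fit inside the first fragment. */
	assert(pull_len <= frag0_size && pull_len <= tot_len);

	s.linear_len = pull_len;
	s.data_len = tot_len - pull_len;
	s.frag0_offset = frag0_offset + pull_len;
	s.frag0_size = frag0_size - pull_len;
	return s;
}
```

Called with the patch's constants (copy_thres 256 for RX_COPY_THRES, pull_len 128 for RX_PULL_LEN), a 200-byte packet is copied whole, while a larger one keeps all but the first 128 bytes in its fragments.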
^ permalink raw reply related