Netdev List


* [PATCH net-next 2/3] net: ipv6: also allow token to be set when device not ready
From: Daniel Borkmann @ 2013-04-09 13:47 UTC
  To: davem; +Cc: netdev, hannes
In-Reply-To: <1365515236-7154-1-git-send-email-dborkman@redhat.com>

When we set the iftoken in inet6_set_iftoken(), we return -EINVAL
when the device does not have the IF_READY flag set. This is,
however, not necessary and rather an artificial usability barrier:
we can simply set the token regardless, and if the device is
ready, we send out our RS right away; otherwise, ifup et al. will
do this for us anyway.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/ipv6/addrconf.c |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 645bf31..713ebe3 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4296,9 +4296,9 @@ static int inet6_fill_link_af(struct sk_buff *skb, const struct net_device *dev)
 
 static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 {
-	struct in6_addr ll_addr;
 	struct inet6_ifaddr *ifp;
 	struct net_device *dev = idev->dev;
+	bool update_rs = false;
 
 	if (token == NULL)
 		return -EINVAL;
@@ -4306,8 +4306,6 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 		return -EINVAL;
 	if (dev->flags & (IFF_LOOPBACK | IFF_NOARP))
 		return -EINVAL;
-	if (idev->dead || !(idev->if_flags & IF_READY))
-		return -EINVAL;
 	if (!ipv6_accept_ra(idev))
 		return -EINVAL;
 	if (idev->cnf.rtr_solicits <= 0)
@@ -4320,11 +4318,23 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 
 	write_unlock_bh(&idev->lock);
 
-	ipv6_get_lladdr(dev, &ll_addr, IFA_F_TENTATIVE | IFA_F_OPTIMISTIC);
-	ndisc_send_rs(dev, &ll_addr, &in6addr_linklocal_allrouters);
+	if (!idev->dead && (idev->if_flags & IF_READY)) {
+		struct in6_addr ll_addr;
+
+		ipv6_get_lladdr(dev, &ll_addr, IFA_F_TENTATIVE |
+				IFA_F_OPTIMISTIC);
+
+		/* If we're not ready, then normal ifup will take care
+		 * of this. Otherwise, we need to request our rs here.
+		 */
+		ndisc_send_rs(dev, &ll_addr, &in6addr_linklocal_allrouters);
+		update_rs = true;
+	}
 
 	write_lock_bh(&idev->lock);
-	idev->if_flags |= IF_RS_SENT;
+
+	if (update_rs)
+		idev->if_flags |= IF_RS_SENT;
 
 	/* Well, that's kinda nasty ... */
 	list_for_each_entry(ifp, &idev->addr_list, if_list) {
-- 
1.7.1
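A minimal user-space sketch of the control flow this patch gives inet6_set_iftoken(): the token is always stored, while the router solicitation is sent, and IF_RS_SENT set, only when the device is alive and IF_READY. The struct, the SKETCH_* flag values and the counter standing in for ndisc_send_rs() are hypothetical, and the other validation checks of the real function are omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values; the real IF_READY / IF_RS_SENT are defined in
 * the kernel headers and may differ. */
#define SKETCH_IF_READY   0x1u
#define SKETCH_IF_RS_SENT 0x2u

struct sketch_idev {
	bool dead;
	unsigned int if_flags;
	unsigned char token[16];
	int rs_sent_count;      /* counts simulated router solicitations */
};

/* Models the post-patch flow: store the token unconditionally, but only
 * "send" an RS (and set IF_RS_SENT) when the device is alive and ready.
 * Otherwise the normal ifup path is expected to send the RS later. */
static int sketch_set_iftoken(struct sketch_idev *idev,
			      const unsigned char *token)
{
	bool update_rs = false;

	if (token == NULL)
		return -1;      /* stands in for -EINVAL */

	memcpy(idev->token, token, sizeof(idev->token));

	if (!idev->dead && (idev->if_flags & SKETCH_IF_READY)) {
		idev->rs_sent_count++;  /* stands in for ndisc_send_rs() */
		update_rs = true;
	}

	if (update_rs)
		idev->if_flags |= SKETCH_IF_RS_SENT;

	return 0;
}
```

A device that is not ready yet keeps the new token but leaves IF_RS_SENT clear, so sending the RS is left to ifup.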

* [PATCH net-next 3/3] net: ipv6: only invalidate previously tokenized addresses
From: Daniel Borkmann @ 2013-04-09 13:47 UTC
  To: davem; +Cc: netdev, hannes
In-Reply-To: <1365515236-7154-1-git-send-email-dborkman@redhat.com>

Instead of invalidating all IPv6 addresses with global scope
when one decides to use IPv6 tokens, we should only invalidate
previously tokenized addresses and leave the rest intact until
they eventually expire (or forever, if they never do). To
implement this less greedy approach, we add a bool at the end of
the inet6_ifaddr structure rather than a new address flag, for two
reasons: i) the per-inet6_ifaddr flag space is already used up,
and making it wider might not be a good idea, since ii) we do not
necessarily need to export this information to user space.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/net/if_inet6.h |    2 ++
 net/ipv6/addrconf.c    |    7 +++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index f1063d6..100fb8c 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -71,6 +71,8 @@ struct inet6_ifaddr {
 	struct inet6_ifaddr	*ifpub;
 	int			regen_count;
 #endif
+	bool			tokenized;
+
 	struct rcu_head		rcu;
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 713ebe3..28b61e8 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -878,6 +878,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen,
 	ifa->prefix_len = pfxlen;
 	ifa->flags = flags | IFA_F_TENTATIVE;
 	ifa->cstamp = ifa->tstamp = jiffies;
+	ifa->tokenized = false;
 
 	ifa->rt = rt;
 
@@ -2134,6 +2135,7 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
 		struct inet6_ifaddr *ifp;
 		struct in6_addr addr;
 		int create = 0, update_lft = 0;
+		bool tokenized = false;
 
 		if (pinfo->prefix_len == 64) {
 			memcpy(&addr, &pinfo->prefix, 8);
@@ -2143,6 +2145,7 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
 				memcpy(addr.s6_addr + 8,
 				       in6_dev->token.s6_addr + 8, 8);
 				read_unlock_bh(&in6_dev->lock);
+				tokenized = true;
 			} else if (ipv6_generate_eui64(addr.s6_addr + 8, dev) &&
 				   ipv6_inherit_eui64(addr.s6_addr + 8, in6_dev)) {
 				in6_dev_put(in6_dev);
@@ -2185,6 +2188,7 @@ ok:
 
 			update_lft = create = 1;
 			ifp->cstamp = jiffies;
+			ifp->tokenized = tokenized;
 			addrconf_dad_start(ifp);
 		}
 
@@ -4339,8 +4343,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
 	/* Well, that's kinda nasty ... */
 	list_for_each_entry(ifp, &idev->addr_list, if_list) {
 		spin_lock(&ifp->lock);
-		if (ipv6_addr_src_scope(&ifp->addr) ==
-		    IPV6_ADDR_SCOPE_GLOBAL) {
+		if (ifp->tokenized) {
 			ifp->valid_lft = 0;
 			ifp->prefered_lft = 0;
 		}
-- 
1.7.1
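The idea behind this patch can be sketched in user space, assuming a trimmed-down, hypothetical stand-in for struct inet6_ifaddr: an address built from a /64 prefix plus the token's lower 64 bits is marked tokenized, and a later token change expires only such addresses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct inet6_ifaddr. */
struct sketch_ifaddr {
	unsigned char addr[16];
	unsigned int valid_lft;
	unsigned int prefered_lft;  /* spelling follows the kernel field */
	bool tokenized;
};

/* Upper 8 bytes from the prefix, lower 8 bytes from the token, as in the
 * memcpy() calls of addrconf_prefix_rcv(); remember where it came from. */
static void sketch_build_tokenized(struct sketch_ifaddr *ifa,
				   const unsigned char prefix[16],
				   const unsigned char token[16])
{
	memcpy(ifa->addr, prefix, 8);
	memcpy(ifa->addr + 8, token + 8, 8);
	ifa->tokenized = true;
}

/* On a token change, expire only addresses derived from a token; other
 * global addresses keep their lifetimes. Returns how many were expired. */
static size_t sketch_invalidate_tokenized(struct sketch_ifaddr *list,
					  size_t n)
{
	size_t expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (list[i].tokenized) {
			list[i].valid_lft = 0;
			list[i].prefered_lft = 0;
			expired++;
		}
	}
	return expired;
}
```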

* Re: [PATCH 4/7] xen-netfront: frags -> slots in log message
From: Wei Liu @ 2013-04-09 13:47 UTC
  To: Sergei Shtylyov
  Cc: Wei Liu, netdev@vger.kernel.org, xen-devel@lists.xen.org,
	Ian Campbell, David Vrabel, konrad.wilk@oracle.com,
	annie.li@oracle.com, wdauchy@gmail.com
In-Reply-To: <5164190D.70905@cogentembedded.com>

On Tue, Apr 09, 2013 at 02:35:09PM +0100, Sergei Shtylyov wrote:
> Hello.
> 
> On 09-04-2013 15:07, Wei Liu wrote:
> 
> > Also fix a typo in comment.
> 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >   drivers/net/xen-netfront.c |    4 ++--
> >   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index d9097a7..1bb2e20 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> [...]
> > @@ -771,7 +771,7 @@ next:
> >
> >   	if (unlikely(slots > max)) {
> >   		if (net_ratelimit())
> > -			dev_warn(dev, "Too many frags\n");
> > +			dev_warn(dev, "Too many slots\n");
> 
>     Shouldn't you have done this change as a part of patch #2?

Because patch 2 has been applied to David Miller's tree, this is an
incremental patch on top of that.


Wei.

> 
> WBR, Sergei

* Re: Modifying the exponential backoff on new connection SYN packets
From: Eric Dumazet @ 2013-04-09 13:48 UTC
  To: Ed W; +Cc: Linux Networking Developer Mailing List
In-Reply-To: <5163DA09.5070202@wildgooses.com>

On Tue, 2013-04-09 at 10:06 +0100, Ed W wrote:
> Hi, I have an unusual situation in that I would like to cap the 
> retransmit frequency on the initial SYN packets at some fairly short 
> time interval, eg a max of 2-4 seconds, rather than the usual 
> exponentially increasing interval.  I could use some help figuring out 
> the exact point in the kernel to make such a change please?
> 
> The situation is that I am building a firewall which will be used with 
> expensive satellite links (think $10-100/MB range). Some of the links 
> are dialup links which take 20-40 seconds to bring up, and then we have 
> PPP drop the link after 10 seconds of inactivity. However, with the 
> default exponential backoff on new connections we are generally 
> retransmitting with a 16sec or 32 sec interval by the time the dialup 
> link is connected, the timout for inactivity kicks in and drops the link 
> before the retransmit...
> 
> I believe the exponential backoff is intended to prevent amplification 
> attacks? In this particular case we are accounting for traffic per user 
> and the internet costs are extremely substantial, so I think it's not a 
> problem.
> 
> Could someone please help figure out the appropriate place to tweak the 
> exponential backoff? Note this is not retransmit of in flight data, just 
> the backoff for the initial syn (which doesn't seem to be configurable 
> in user space?)
> 
> Note, we have an application proxy here, but I can't see a sensible way 
> to fake it in user space without a lot of extra coding - any suggestions?

You'll have to change inet_csk_reqsk_queue_prune() in
net/ipv4/inet_connection_sock.c

timeo = min(timeout << req->num_timeout, max_rto);
req->expires = now + timeo; 

Good luck !
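To see what capping does to the schedule, the quoted expression can be modeled in user space. This is a sketch only: values are in seconds rather than jiffies, and the 1 s initial timeout and the caps used with it are illustrative assumptions, not kernel defaults taken from this thread.

```c
/* User-space model of: timeo = min(timeout << req->num_timeout, max_rto);
 * Seconds instead of jiffies; num_timeout is assumed small enough that
 * the shift cannot overflow. */
static unsigned int sketch_backoff(unsigned int timeout,
				   unsigned int num_timeout,
				   unsigned int max_rto)
{
	unsigned int timeo = timeout << num_timeout;

	return timeo < max_rto ? timeo : max_rto;
}

/* Total time spent waiting across the first `retries` timeouts. */
static unsigned int sketch_total_wait(unsigned int timeout,
				      unsigned int retries,
				      unsigned int max_rto)
{
	unsigned int total = 0;

	for (unsigned int n = 0; n < retries; n++)
		total += sketch_backoff(timeout, n, max_rto);
	return total;
}
```

With a 1 s initial timeout and a 120 s cap, num_timeout 4 and 5 give 16 s and 32 s, the intervals described above; with a hypothetical 4 s cap, sketch_backoff(1, 5, 4) stays at 4 s, and the first five waits add up to 15 s instead of 31 s.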

* Re: [Xen-devel] [PATCH 6/7] xen-netback: coalesce slots and fix regressions
From: Wei Liu @ 2013-04-09 13:48 UTC
  To: Jan Beulich
  Cc: Wei Liu, David Vrabel, Ian Campbell, wdauchy@gmail.com,
	xen-devel@lists.xen.org, annie.li@oracle.com,
	konrad.wilk@oracle.com, netdev@vger.kernel.org
In-Reply-To: <5164301902000078000CBC46@nat28.tlf.novell.com>

On Tue, Apr 09, 2013 at 02:13:29PM +0100, Jan Beulich wrote:
> >>> On 09.04.13 at 14:48, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Tue, Apr 09, 2013 at 01:13:39PM +0100, Jan Beulich wrote:
> >> >>> On 09.04.13 at 13:07, Wei Liu <wei.liu2@citrix.com> wrote:
> > [...]
> >> > +
> >> > +static struct kernel_param_ops max_skb_slots_param_ops = {
> >> 
> >> __moduleparam_const
> > 
> > TBH I don't see any driver making use of this.
> 
> Sure, because generally you use the simple module_param() or
> module_param_named() macros.
> 

That means other modules using this need to be fixed too. :-)

> > Perhaps a simple "const" would do?
> 
> The purpose of __moduleparam_const is to abstract away the
> need to not have the const for a very limited set of architectures.
> Even if Xen currently doesn't support any of those, I would still
> not want to see architecture incompatibilities introduced if
> avoidable.
> 

Sure.

> >> > @@ -251,7 +291,7 @@ static int max_required_rx_slots(struct xenvif *vif)
> >> >  	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
> >> >  
> >> >  	if (vif->can_sg || vif->gso || vif->gso_prefix)
> >> > -		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
> >> > +		max += XEN_NETIF_NR_SLOTS_MIN + 1; /* extra_info + frags */
> >> >  
> >> >  	return max;
> >> >  }
> >> > @@ -657,7 +697,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
> >> >  		__skb_queue_tail(&rxq, skb);
> >> >  
> >> >  		/* Filled the batch queue? */
> >> > -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> >> > +		if (count + XEN_NETIF_NR_SLOTS_MIN >= XEN_NETIF_RX_RING_SIZE)
> >> >  			break;
> >> >  	}
> >> >  
> >> 
> >> Are the two changes above really correct? You're having an skb as
> >> input here, and hence you want to use all the frags, and nothing
> >> beyond. Another question is whether the frontend can handle
> >> those, but that aspect isn't affected by the code being modified
> >> here.
> >> 
> > 
> > This patch tries to remove dependency on MAX_SKB_FRAGS. Writing the
> > protocol-defined value here is OK IMHO.
> 
> I understand the intentions of the patch, but you shouldn't go
> further with this than you need to. Just think through carefully
> the cases of MAX_SKB_FRAGS being smaller/bigger than
> XEN_NETIF_NR_SLOTS_MIN: In the first instance, you needlessly
> return too big a value when the latter is the bigger one, and in
> the second instance you bail from the loop early in the same case.
> 
> What's worse, in the opposite case I'm having the impression that
> you would continue the loop when you shouldn't (because there's
> not enough room left), and I'd suspect problems for the caller of
> max_required_rx_slots() in that case too.
> 

The frontend and backend only work at the moment because MAX_SKB_FRAGS
has gone down just once. If it goes like 18 -> 17 -> 19, then we are screwed...

For the MAX_SKB_FRAGS < XEN_NETIF_NR_SLOTS_MIN case it is fine: we are
just reserving more room in the ring.

For the MAX_SKB_FRAGS > XEN_NETIF_NR_SLOTS_MIN case, my thought is that
it is not likely to happen in the near future; we could possibly upstream
a mechanism to negotiate the number of slots before MAX_SKB_FRAGS >
XEN_NETIF_NR_SLOTS_MIN ever happens.

But yes, let's leave the RX path alone for the moment; I need to
investigate this more.


Wei.
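Jan's two cases can be checked with a little arithmetic. The sketch below mirrors the shape of max_required_rx_slots() and of the batch-full test from the quoted hunks, with the per-skb fragment allowance passed as a parameter. The numbers used with it (MTU 1500, 4 KiB pages, MAX_SKB_FRAGS = 17, XEN_NETIF_NR_SLOTS_MIN = 18, a 256-slot RX ring) are assumptions for illustration.

```c
#include <stdbool.h>

#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Same shape as max_required_rx_slots() in the quoted hunk, with the
 * per-skb fragment allowance (MAX_SKB_FRAGS or XEN_NETIF_NR_SLOTS_MIN)
 * passed in so the two choices can be compared. */
static int sketch_max_required_rx_slots(int mtu, int page_size,
					bool sg_or_gso, int frag_allowance)
{
	int max = SKETCH_DIV_ROUND_UP(mtu, page_size);

	if (sg_or_gso)
		max += frag_allowance + 1;  /* extra_info + frags */
	return max;
}

/* The "Filled the batch queue?" test from xen_netbk_rx_action(): stop
 * once a further worst-case packet might not fit in the ring. */
static bool sketch_batch_full(int count, int frag_allowance, int ring_size)
{
	return count + frag_allowance >= ring_size;
}
```

With an allowance of 18 instead of 17, the function reserves 20 slots rather than 19, and the batch is already treated as full once 238 slots are queued; this is the "too big a value" and "bail from the loop early" behaviour Jan describes for the case where XEN_NETIF_NR_SLOTS_MIN is the bigger constant.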

* Re: [PATCH 1/3] if.h: add IFF_BRIDGE_RESTRICTED flag
From: Antonio Quartulli @ 2013-04-09 13:51 UTC
  To: Jamal Hadi Salim
  Cc: Stephen Hemminger, David S. Miller,
	bridge@lists.linux-foundation.org, netdev@vger.kernel.org
In-Reply-To: <51641049.3030100@mojatatu.com>

[-- Attachment #1: Type: text/plain, Size: 1997 bytes --]

On Tue, Apr 09, 2013 at 05:57:45 -0700, Jamal Hadi Salim wrote:
> Hi,
> 
> Consider using tc for this.
> You can tag the packet using skb mark on the receiving end point,
> match them on the bridge and execute actions not to forward them.


Does this work at the bridge level? Can a packet entering one port and
leaving through another be affected by tc/mark?


> 
> cheers,
> jamal
> 
> On 13-04-09 03:56 AM, Antonio Quartulli wrote:
> > On Mon, Apr 08, 2013 at 11:58:48 -0700, Stephen Hemminger wrote:
> >> The standard way to do this is to use netfilter. Considering the
> >> additional device flags and skb flag changes, I am not sure that your
> >> method is better.
> >
> > To make it a bit more clear:
> >
> > 1) the skb flag will be used on the "receiving end-point" by batman-adv to mark
> > received packets and so instruct the bridge not to forward them to restricted
> > interfaces.
> >
> > 2) the IFF_ flag is used by batman-adv on the "sending side" to determine
> > whether a packet has been originated by a restricted interface and so instruct
> > the remote endpoint to mark the skb when received.
> >
> > 3) to make the bridge code general enough, I decided to let it mark packets
> > coming from restricted interfaces as well so that it can also apply the policy
> > at 1) locally, without any further setting. The logic described in 1) is
> > therefore applied by the bridge even for local packets (not passing through
> > batman-adv)
> >
> >
> >
> > Point 3) is the only one where netfilter might help. But using two mechanisms to
> > achieve one goal did not look sane to me, and therefore I decided to do it this
> > way. And actually the code allowing point 3 is only:
> >
> > +       skb->bridge_restricted = !!(skb->dev->flags & IFF_BRIDGE_RESTRICTED);
> >
> >
> > I hope this summary did not create further confusion :)
> >
> 
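The division of labour described in points 1) and 3) can be sketched as two small predicates. The structs and the SKETCH_IFF_BRIDGE_RESTRICTED value are hypothetical stand-ins; only the !!(flags & IFF_BRIDGE_RESTRICTED) expression comes from the quoted patch.

```c
#include <stdbool.h>

/* Hypothetical flag value; the real IFF_BRIDGE_RESTRICTED value is
 * whatever the RFC patch assigns in if.h. */
#define SKETCH_IFF_BRIDGE_RESTRICTED 0x1u

struct sketch_dev {
	unsigned int flags;
};

struct sketch_skb {
	const struct sketch_dev *dev;   /* ingress device */
	bool bridge_restricted;
};

/* Point 3): the bridge marks packets entering via a restricted port. */
static void sketch_bridge_mark(struct sketch_skb *skb)
{
	skb->bridge_restricted =
		!!(skb->dev->flags & SKETCH_IFF_BRIDGE_RESTRICTED);
}

/* Point 1): a marked packet must not go out through a restricted port;
 * everything else is forwarded as usual. */
static bool sketch_may_forward(const struct sketch_skb *skb,
			       const struct sketch_dev *out)
{
	return !(skb->bridge_restricted &&
		 (out->flags & SKETCH_IFF_BRIDGE_RESTRICTED));
}
```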

-- 
Antonio Quartulli

..each of us alone is worth nothing..
Ernesto "Che" Guevara

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

* Re: [PATCH 1/3 net-next RFC] selftest: add abstractions for net selftests
From: Daniel Borkmann @ 2013-04-09 13:54 UTC
  To: Sergei Shtylyov
  Cc: Alexandru Copot, netdev, davem, willemb, edumazet, Daniel Baluta
In-Reply-To: <51641A09.7040108@cogentembedded.com>

On 04/09/2013 03:39 PM, Sergei Shtylyov wrote:
> On 09-04-2013 14:30, Alexandru Copot wrote:
>> Signed-off-by: Alexandru Copot <alex.mihai.c@gmail.com>
>> Cc: Daniel Baluta <dbaluta@ixiacom.com>
> [...]
>
>> diff --git a/tools/testing/selftests/net/selftests.c b/tools/testing/selftests/net/selftests.c
>> new file mode 100644
>> index 0000000..cd6e427
>> --- /dev/null
>> +++ b/tools/testing/selftests/net/selftests.c
>> @@ -0,0 +1,30 @@
[...]
>> +    for (i = 0; i < test->testcase_count; i++) {
>> +        rc = test->run(ptr);
>> +        allrc |= rc;
>> +
>> +        if (test->abort_on_fail && rc) {
>> +            printf("Testcase %d failed, aborting\n", i);
>> +        }
>
>     Nit: {} not needed here, at least if you follow the Linux coding style (you seem to).

We already figured out earlier in this thread that a ``break'' was missing. ;-)
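For reference, a sketch of the loop with the missing break added; the braces are then required, which also settles the style nit. struct sketch_test and the sample run() callback are assumptions; only the testcase_count, run and abort_on_fail fields appear in the quoted hunk.

```c
#include <stdio.h>

/* Assumed shape of the test descriptor (hypothetical). */
struct sketch_test {
	int testcase_count;
	int (*run)(void *ptr);
	int abort_on_fail;
};

/* Run each test case, OR the return codes together and, if
 * abort_on_fail is set, stop at the first failing case. */
static int sketch_run_tests(const struct sketch_test *test, void *ptr)
{
	int i, rc, allrc = 0;

	for (i = 0; i < test->testcase_count; i++) {
		rc = test->run(ptr);
		allrc |= rc;

		if (test->abort_on_fail && rc) {
			printf("Testcase %d failed, aborting\n", i);
			break;
		}
	}
	return allrc;
}

/* Example callback: counts calls through ptr, fails from the 2nd call. */
static int sketch_fail_from_second(void *ptr)
{
	int *calls = ptr;

	return ++*calls >= 2;
}
```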

* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Eric Dumazet @ 2013-04-09 14:00 UTC
  To: Paul Moore
  Cc: Casey Schaufler, David Miller, netdev, mvadkert, selinux,
	linux-security-module
In-Reply-To: <28452040.xEi3pLPik0@sifl>

On Tue, 2013-04-09 at 09:19 -0400, Paul Moore wrote:

> As Casey already mentioned, if this isn't acceptable please help me understand 
> why.
> 

What you see is not the reality. If you do such an analysis, you had
better do it properly, because any change you are going to submit will
be double-checked by people who really care.

sizeof(sk_buff) is not 280 (which would be rounded up to 320, since
cache lines are 64 bytes).

Tell me what you get when running:

ls -l /sys/kernel/slab/skbuff_head_cache

Do you really see:

$ ls -l /sys/kernel/slab/skbuff_head_cache
lrwxrwxrwx 1 root root 0 Apr  9 06:54 /sys/kernel/slab/skbuff_head_cache -> :t-0000320

Here I get:

$ ls -l /sys/kernel/slab/skbuff_head_cache
lrwxrwxrwx 1 root root 0 Apr  9 06:54 /sys/kernel/slab/skbuff_head_cache -> :t-0000256

because sizeof(sk_buff) <= 256
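The 280 -> 320 arithmetic is a round-up to the next multiple of the 64-byte cache line. A minimal sketch, assuming the cache object size is simply sizeof(struct sk_buff) rounded to the cache line (real slab sizing can involve more inputs):

```c
#include <stddef.h>

/* Round size up to a multiple of align, which must be a power of two,
 * in the style of the kernel's ALIGN() macro. */
static size_t sketch_align(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}
```

sketch_align(280, 64) gives 320 and sketch_align(256, 64) gives 256, matching the two slab cache names above.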

* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Ben Hutchings @ 2013-04-09 14:05 UTC
  To: Paul Moore
  Cc: Casey Schaufler, Eric Dumazet, David Miller, netdev, mvadkert,
	selinux, linux-security-module
In-Reply-To: <28452040.xEi3pLPik0@sifl>

On Tue, 2013-04-09 at 09:19 -0400, Paul Moore wrote:
> On Monday, April 08, 2013 06:24:59 PM Casey Schaufler wrote:
> > On 4/8/2013 6:09 PM, Eric Dumazet wrote:
> > > On Mon, 2013-04-08 at 17:59 -0700, Casey Schaufler wrote:
> > >> I don't see that with adding 4 bytes. Again, I'm willing to be
> > >> educated if I'm wrong.
> > > 
> > > Feel free to add 4 bytes without having the 'align to 8 bytes' problem
> > > on 64 bit arches. Show us your patch.
> > 
> > Recall that it's replacing an existing 4 byte value with an 8 byte value.
> > My compiler days were quite short and long ago, but it would seem that
> > an 8 byte value ought not have an 'align to 8 bytes' problem.
> > 
> > Again, I'm willing to be educated.
> 
> Armed with a cup of coffee I took a look at the sk_buff structure this morning 
> with the pahole tool and using the current sk_buff if we turn on all the 
> #ifdefs here is what I see on x86_64:
> 
> struct sk_buff {
[...]
>         sk_buff_data_t             inner_transport_header; /*   200     8 */
>         sk_buff_data_t             inner_network_header; /*   208     8 */
>         sk_buff_data_t             transport_header;     /*   216     8 */
>         sk_buff_data_t             network_header;       /*   224     8 */
>         sk_buff_data_t             mac_header;           /*   232     8 */
>         sk_buff_data_t             tail;                 /*   240     8 */
>         sk_buff_data_t             end;                  /*   248     8 */
[...]

This is wrong; sk_buff_data_t is always 32-bit:

#if BITS_PER_LONG > 32
#define NET_SKBUFF_DATA_USES_OFFSET 1
#endif

#ifdef NET_SKBUFF_DATA_USES_OFFSET
typedef unsigned int sk_buff_data_t;
#else
typedef unsigned char *sk_buff_data_t;
#endif
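The typedef can be re-created in user space to check this. The sketch assumes a GCC- or Clang-compatible compiler, since the kernel's BITS_PER_LONG is replaced by the predefined __SIZEOF_LONG__; on both ILP32 and LP64 targets the resulting type is 4 bytes, either as a 32-bit offset or as a 32-bit pointer.

```c
/* BITS_PER_LONG is a kernel macro; derive it from the compiler's
 * predefined __SIZEOF_LONG__ instead (GCC/Clang assumption). */
#define SKETCH_BITS_PER_LONG (__SIZEOF_LONG__ * 8)

#if SKETCH_BITS_PER_LONG > 32
#define SKETCH_SKBUFF_DATA_USES_OFFSET 1
#endif

#ifdef SKETCH_SKBUFF_DATA_USES_OFFSET
typedef unsigned int sketch_sk_buff_data_t;   /* 32-bit offset from head */
#else
typedef unsigned char *sketch_sk_buff_data_t; /* pointer, 32 bits here */
#endif
```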

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Paul Moore @ 2013-04-09 14:10 UTC
  To: Ben Hutchings, Casey Schaufler, Eric Dumazet
  Cc: David Miller, netdev, mvadkert, selinux, linux-security-module
In-Reply-To: <1365516342.2623.1.camel@bwh-desktop.uk.solarflarecom.com>

On Tuesday, April 09, 2013 03:05:42 PM Ben Hutchings wrote:
> On Tue, 2013-04-09 at 09:19 -0400, Paul Moore wrote:
> > On Monday, April 08, 2013 06:24:59 PM Casey Schaufler wrote:
> > > On 4/8/2013 6:09 PM, Eric Dumazet wrote:
> > > > On Mon, 2013-04-08 at 17:59 -0700, Casey Schaufler wrote:
> > > >> I don't see that with adding 4 bytes. Again, I'm willing to be
> > > >> educated if I'm wrong.
> > > > 
> > > > Feel free to add 4 bytes without having the 'align to 8 bytes' problem
> > > > on 64 bit arches. Show us your patch.
> > > 
> > > Recall that it's replacing an existing 4 byte value with an 8 byte
> > > value.
> > > My compiler days were quite short and long ago, but it would seem that
> > > an 8 byte value ought not have an 'align to 8 bytes' problem.
> > > 
> > > Again, I'm willing to be educated.
> > 
> > Armed with a cup of coffee I took a look at the sk_buff structure this
> > morning with the pahole tool and using the current sk_buff if we turn on
> > all the #ifdefs here is what I see on x86_64:
> > 
> > struct sk_buff {
> 
> [...]
> 
> >         sk_buff_data_t             inner_transport_header; /*   200     8 */
> >         sk_buff_data_t             inner_network_header; /*   208     8 */
> >         sk_buff_data_t             transport_header;     /*   216     8 */
> >         sk_buff_data_t             network_header;       /*   224     8 */
> >         sk_buff_data_t             mac_header;           /*   232     8 */
> >         sk_buff_data_t             tail;                 /*   240     8 */
> >         sk_buff_data_t             end;                  /*   248     8 */
> 
> [...]
> 
> This is wrong; sk_buff_data_t is always 32-bit:

Yep.  My mistake.

While looking at this a bit more after my original email I noticed the same 
thing. Ultimately it doesn't change the size or the cachelines, as the 
sk_buff_data_t fields are at the bottom of sk_buff, but here are the 
corrected breakdowns:

ORIG:
struct sk_buff {
        struct sk_buff *           next;                 /*     0     8 */
        struct sk_buff *           prev;                 /*     8     8 */
        ktime_t                    tstamp;               /*    16     8 */
        struct sock *              sk;                   /*    24     8 */
        struct net_device *        dev;                  /*    32     8 */
        char                       cb[48];               /*    40    48 */
        /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
        long unsigned int          _skb_refdst;          /*    88     8 */
        struct sec_path *          sp;                   /*    96     8 */
        unsigned int               len;                  /*   104     4 */
        unsigned int               data_len;             /*   108     4 */
        __u16                      mac_len;              /*   112     2 */
        __u16                      hdr_len;              /*   114     2 */
        union {
                __wsum             csum;                 /*           4 */
                struct {
                        __u16      csum_start;           /*   116     2 */
                        __u16      csum_offset;          /*   118     2 */
                };                                       /*           4 */
        };                                               /*   116     4 */
        __u32                      priority;             /*   120     4 */
        int                        flags1_begin[0];      /*   124     0 */
        __u8                       local_df:1;           /*   124: 7  1 */
        __u8                       cloned:1;             /*   124: 6  1 */
        __u8                       ip_summed:2;          /*   124: 4  1 */
        __u8                       nohdr:1;              /*   124: 3  1 */
        __u8                       nfctinfo:3;           /*   124: 0  1 */
        __u8                       pkt_type:3;           /*   125: 5  1 */
        __u8                       fclone:2;             /*   125: 3  1 */
        __u8                       ipvs_property:1;      /*   125: 2  1 */
        __u8                       peeked:1;             /*   125: 1  1 */
        __u8                       nf_trace:1;           /*   125: 0  1 */

        /* XXX 2 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        int                        flags1_end[0];        /*   128     0 */
        __be16                     protocol;             /*   128     2 */

        /* XXX 6 bytes hole, try to pack */

        void                       (*destructor)(struct sk_buff *); /*   136     8 */
        struct nf_conntrack *      nfct;                 /*   144     8 */
        struct sk_buff *           nfct_reasm;           /*   152     8 */
        struct nf_bridge_info *    nf_bridge;            /*   160     8 */
        int                        skb_iif;              /*   168     4 */
        __u32                      rxhash;               /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */
        __u16                      tc_index;             /*   178     2 */
        __u16                      tc_verd;              /*   180     2 */
        __u16                      queue_mapping;        /*   182     2 */
        int                        flags2_begin[0];      /*   184     0 */
        __u8                       ndisc_nodetype:2;     /*   184: 6  1 */
        __u8                       pfmemalloc:1;         /*   184: 5  1 */
        __u8                       ooo_okay:1;           /*   184: 4  1 */
        __u8                       l4_rxhash:1;          /*   184: 3  1 */
        __u8                       wifi_acked_valid:1;   /*   184: 2  1 */
        __u8                       wifi_acked:1;         /*   184: 1  1 */
        __u8                       no_fcs:1;             /*   184: 0  1 */
        __u8                       head_frag:1;          /*   185: 7  1 */
        __u8                       encapsulation:1;      /*   185: 6  1 */

        /* XXX 6 bits hole, try to pack */
        /* XXX 2 bytes hole, try to pack */

        int                        flags2_end[0];        /*   188     0 */
        dma_cookie_t               dma_cookie;           /*   188     4 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        __u32                      secmark;              /*   192     4 */
        union {
                __u32              mark;                 /*           4 */
                __u32              dropcount;            /*           4 */
                __u32              reserved_tailroom;    /*           4 */
        };                                               /*   196     4 */
        sk_buff_data_t             inner_transport_header; /*   200     4 */
        sk_buff_data_t             inner_network_header; /*   204     4 */
        sk_buff_data_t             transport_header;     /*   208     4 */
        sk_buff_data_t             network_header;       /*   212     4 */
        sk_buff_data_t             mac_header;           /*   216     4 */
        sk_buff_data_t             tail;                 /*   220     4 */
        sk_buff_data_t             end;                  /*   224     4 */

        /* XXX 4 bytes hole, try to pack */

        unsigned char *            head;                 /*   232     8 */
        unsigned char *            data;                 /*   240     8 */
        unsigned int               truesize;             /*   248     4 */
        atomic_t                   users;                /*   252     4 */
        /* --- cacheline 4 boundary (256 bytes) --- */

        /* size: 256, cachelines: 4, members: 62 */
        /* sum members: 242, holes: 4, sum holes: 14 */
        /* bit holes: 1, sum bit holes: 6 bits */
};

W/BLOB:
struct sk_buff_test {
        struct sk_buff *           next;                 /*     0     8 */
        struct sk_buff *           prev;                 /*     8     8 */
        ktime_t                    tstamp;               /*    16     8 */
        struct sock *              sk;                   /*    24     8 */
        struct net_device *        dev;                  /*    32     8 */
        char                       cb[48];               /*    40    48 */
        /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
        long unsigned int          _skb_refdst;          /*    88     8 */
        struct sec_path *          sp;                   /*    96     8 */
        unsigned int               len;                  /*   104     4 */
        unsigned int               data_len;             /*   108     4 */
        __u16                      mac_len;              /*   112     2 */
        __u16                      hdr_len;              /*   114     2 */
        union {
                __wsum             csum;                 /*           4 */
                struct {
                        __u16      csum_start;           /*   116     2 */
                        __u16      csum_offset;          /*   118     2 */
                };                                       /*           4 */
        };                                               /*   116     4 */
        __u32                      priority;             /*   120     4 */
        int                        flags1_begin[0];      /*   124     0 */
        __u8                       local_df:1;           /*   124: 7  1 */
        __u8                       cloned:1;             /*   124: 6  1 */
        __u8                       ip_summed:2;          /*   124: 4  1 */
        __u8                       nohdr:1;              /*   124: 3  1 */
        __u8                       nfctinfo:3;           /*   124: 0  1 */
        __u8                       pkt_type:3;           /*   125: 5  1 */
        __u8                       fclone:2;             /*   125: 3  1 */
        __u8                       ipvs_property:1;      /*   125: 2  1 */
        __u8                       peeked:1;             /*   125: 1  1 */
        __u8                       nf_trace:1;           /*   125: 0  1 */

        /* XXX 2 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        int                        flags1_end[0];        /*   128     0 */
        void *                     security;             /*   128     8 */
        void                       (*destructor)(struct sk_buff *); /*   136     8 */
        struct nf_conntrack *      nfct;                 /*   144     8 */
        struct sk_buff *           nfct_reasm;           /*   152     8 */
        struct nf_bridge_info *    nf_bridge;            /*   160     8 */
        int                        skb_iif;              /*   168     4 */
        __u32                      rxhash;               /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */
        __u16                      tc_index;             /*   178     2 */
        __u16                      tc_verd;              /*   180     2 */
        __u16                      queue_mapping;        /*   182     2 */
        int                        flags2_begin[0];      /*   184     0 */
        __u8                       ndisc_nodetype:2;     /*   184: 6  1 */
        __u8                       pfmemalloc:1;         /*   184: 5  1 */
        __u8                       ooo_okay:1;           /*   184: 4  1 */
        __u8                       l4_rxhash:1;          /*   184: 3  1 */
        __u8                       wifi_acked_valid:1;   /*   184: 2  1 */
        __u8                       wifi_acked:1;         /*   184: 1  1 */
        __u8                       no_fcs:1;             /*   184: 0  1 */
        __u8                       head_frag:1;          /*   185: 7  1 */
        __u8                       encapsulation:1;      /*   185: 6  1 */

        /* XXX 6 bits hole, try to pack */
        /* XXX 2 bytes hole, try to pack */

        int                        flags2_end[0];        /*   188     0 */
        __be16                     protocol;             /*   188     2 */

        /* XXX 2 bytes hole, try to pack */

        /* --- cacheline 3 boundary (192 bytes) --- */
        dma_cookie_t               dma_cookie;           /*   192     4 */
        union {
                __u32              mark;                 /*           4 */
                __u32              dropcount;            /*           4 */
                __u32              reserved_tailroom;    /*           4 */
        };                                               /*   196     4 */
        sk_buff_data_t             inner_transport_header; /*   200     4 */
        sk_buff_data_t             inner_network_header; /*   204     4 */
        sk_buff_data_t             transport_header;     /*   208     4 */
        sk_buff_data_t             network_header;       /*   212     4 */
        sk_buff_data_t             mac_header;           /*   216     4 */
        sk_buff_data_t             tail;                 /*   220     4 */
        sk_buff_data_t             end;                  /*   224     4 */

        /* XXX 4 bytes hole, try to pack */

        unsigned char *            head;                 /*   232     8 */
        unsigned char *            data;                 /*   240     8 */
        unsigned int               truesize;             /*   248     4 */
        atomic_t                   users;                /*   252     4 */
        /* --- cacheline 4 boundary (256 bytes) --- */

        /* size: 256, cachelines: 4, members: 62 */
        /* sum members: 246, holes: 4, sum holes: 10 */
        /* bit holes: 1, sum bit holes: 6 bits */
};

-- 
paul moore
security and virtualization @ redhat



* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Paul Moore @ 2013-04-09 14:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Casey Schaufler, David Miller, netdev, mvadkert, selinux,
	linux-security-module
In-Reply-To: <1365516022.3887.131.camel@edumazet-glaptop>

On Tuesday, April 09, 2013 07:00:22 AM Eric Dumazet wrote:
> On Tue, 2013-04-09 at 09:19 -0400, Paul Moore wrote:
> > As Casey already mentioned, if this isn't acceptable please help me
> > understand why.
> 
> You see something which is not the reality. If you do such analysis,
> better do it properly, because any change you are going to submit will
> be doubly checked by people who really care.

I am attempting to do it properly, I simply made a mistake.  Ben also pointed 
it out.  As you wrote yesterday, "Lets go forward".

After fixing the BITS_PER_LONG problem I looked at it again and it appears 
that by simply replacing the "secmark" field with a blob we retain the size of 
the sk_buff as well as the cacheline positions of all the fields, e.g. 
dma_cookie no longer moves cachelines.  Thoughts?

struct sk_buff_test {
        struct sk_buff *           next;                 /*     0     8 */
        struct sk_buff *           prev;                 /*     8     8 */
        ktime_t                    tstamp;               /*    16     8 */
        struct sock *              sk;                   /*    24     8 */
        struct net_device *        dev;                  /*    32     8 */
        char                       cb[48];               /*    40    48 */
        /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
        long unsigned int          _skb_refdst;          /*    88     8 */
        struct sec_path *          sp;                   /*    96     8 */
        unsigned int               len;                  /*   104     4 */
        unsigned int               data_len;             /*   108     4 */
        __u16                      mac_len;              /*   112     2 */
        __u16                      hdr_len;              /*   114     2 */
        union {
                __wsum             csum;                 /*           4 */
                struct {
                        __u16      csum_start;           /*   116     2 */
                        __u16      csum_offset;          /*   118     2 */
                };                                       /*           4 */
        };                                               /*   116     4 */
        __u32                      priority;             /*   120     4 */
        int                        flags1_begin[0];      /*   124     0 */
        __u8                       local_df:1;           /*   124: 7  1 */
        __u8                       cloned:1;             /*   124: 6  1 */
        __u8                       ip_summed:2;          /*   124: 4  1 */
        __u8                       nohdr:1;              /*   124: 3  1 */
        __u8                       nfctinfo:3;           /*   124: 0  1 */
        __u8                       pkt_type:3;           /*   125: 5  1 */
        __u8                       fclone:2;             /*   125: 3  1 */
        __u8                       ipvs_property:1;      /*   125: 2  1 */
        __u8                       peeked:1;             /*   125: 1  1 */
        __u8                       nf_trace:1;           /*   125: 0  1 */

        /* XXX 2 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        int                        flags1_end[0];        /*   128     0 */
        __be16                     protocol;             /*   128     2 */

        /* XXX 6 bytes hole, try to pack */

        void                       (*destructor)(struct sk_buff *); /*   136     8 */
        struct nf_conntrack *      nfct;                 /*   144     8 */
        struct sk_buff *           nfct_reasm;           /*   152     8 */
        struct nf_bridge_info *    nf_bridge;            /*   160     8 */
        int                        skb_iif;              /*   168     4 */
        __u32                      rxhash;               /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */
        __u16                      tc_index;             /*   178     2 */
        __u16                      tc_verd;              /*   180     2 */
        __u16                      queue_mapping;        /*   182     2 */
        int                        flags2_begin[0];      /*   184     0 */
        __u8                       ndisc_nodetype:2;     /*   184: 6  1 */
        __u8                       pfmemalloc:1;         /*   184: 5  1 */
        __u8                       ooo_okay:1;           /*   184: 4  1 */
        __u8                       l4_rxhash:1;          /*   184: 3  1 */
        __u8                       wifi_acked_valid:1;   /*   184: 2  1 */
        __u8                       wifi_acked:1;         /*   184: 1  1 */
        __u8                       no_fcs:1;             /*   184: 0  1 */
        __u8                       head_frag:1;          /*   185: 7  1 */
        __u8                       encapsulation:1;      /*   185: 6  1 */

        /* XXX 6 bits hole, try to pack */
        /* XXX 2 bytes hole, try to pack */

        int                        flags2_end[0];        /*   188     0 */
        dma_cookie_t               dma_cookie;           /*   188     4 */
        /* --- cacheline 3 boundary (192 bytes) --- */
        void *                     security;             /*   192     8 */
        union {
                __u32              mark;                 /*           4 */
                __u32              dropcount;            /*           4 */
                __u32              reserved_tailroom;    /*           4 */
        };                                               /*   200     4 */
        sk_buff_data_t             inner_transport_header; /*   204     4 */
        sk_buff_data_t             inner_network_header; /*   208     4 */
        sk_buff_data_t             transport_header;     /*   212     4 */
        sk_buff_data_t             network_header;       /*   216     4 */
        sk_buff_data_t             mac_header;           /*   220     4 */
        sk_buff_data_t             tail;                 /*   224     4 */
        sk_buff_data_t             end;                  /*   228     4 */
        unsigned char *            head;                 /*   232     8 */
        unsigned char *            data;                 /*   240     8 */
        unsigned int               truesize;             /*   248     4 */
        atomic_t                   users;                /*   252     4 */
        /* --- cacheline 4 boundary (256 bytes) --- */

        /* size: 256, cachelines: 4, members: 62 */
        /* sum members: 246, holes: 3, sum holes: 10 */
        /* bit holes: 1, sum bit holes: 6 bits */
};
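A minimal C sketch of how cacheline positions like the ones in this dump can be checked mechanically, assuming 64-byte cachelines and an LP64 target such as x86-64. The struct skb_tail_toy below is hypothetical and is not the real sk_buff: it only reproduces the tail offsets shown above (dma_cookie at 188, security at 192). Cacheline indices are zero-based, so the "cacheline 3 boundary (192 bytes)" marker corresponds to index 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines on an LP64 target such as x86-64. */
#define CACHELINE_BYTES 64u

/*
 * Hypothetical toy struct, NOT the real sk_buff: it only mimics the tail
 * of the dump above, where dma_cookie sits at offset 188 and the 8-byte
 * security pointer starts exactly on the 192-byte boundary.
 */
struct skb_tail_toy {
	unsigned char head_fields[188];	/* stand-in for everything before */
	uint32_t dma_cookie;		/* offset 188 */
	void *security;			/* offset 192, 8-byte aligned */
	uint32_t mark;			/* offset 200 */
};

/* Zero-based index of the cacheline holding byte offset @off. */
static size_t cacheline_of(size_t off)
{
	return off / CACHELINE_BYTES;
}

/* Nonzero if the field [off, off + len) lies in one cacheline; len >= 1. */
static int fits_one_cacheline(size_t off, size_t len)
{
	return cacheline_of(off) == cacheline_of(off + len - 1);
}

/* Build-time guards, playing the role of the kernel's BUILD_BUG_ON(). */
static_assert(offsetof(struct skb_tail_toy, dma_cookie) == 188,
	      "dma_cookie moved");
static_assert(offsetof(struct skb_tail_toy, security) % CACHELINE_BYTES == 0,
	      "security no longer starts a cacheline");
```

Here cacheline_of(offsetof(struct skb_tail_toy, dma_cookie)) is 2 and the same call for security is 3. Against a real kernel tree, pahole or a BUILD_BUG_ON() on the actual struct is the more reliable check; the static_asserts only stand in for that on the toy.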

-- 
paul moore
security and virtualization @ redhat



* Re: be2net: GRO for non-inet protocols
From: Erik Hugne @ 2013-04-09 14:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: sathya.perla, subbu.seetharaman, ajit.khaparde, netdev
In-Reply-To: <1365515075.3887.122.camel@edumazet-glaptop>

On Tue, Apr 09, 2013 at 06:44:35AM -0700, Eric Dumazet wrote:
> I suggested you try GRE tunnels on emulex NIC, to make sure the problem
> is not coming from your changes. Nevermind ...
>
I understood that, but since I could not load anything newer than 3.0.61 on them,
GRO support for GRE tunnels was not included. Anyway...
I got hold of a new set of machines that I could load 3.8 on and perform the GRE
test on; these also use Emulex NICs.
What I found was that all packets received on the GRE device are MTU-sized
and are not GRO'd together at all. The same thing happens in a qemu environment,
so maybe there's another issue with GRO for GRE.

//E


* Re: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
From: Ian Campbell @ 2013-04-09 14:28 UTC (permalink / raw)
  To: Wei Liu
  Cc: Paul Durrant, annie li, netdev@vger.kernel.org,
	konrad.wilk@oracle.com, xen-devel@lists.xen.org
In-Reply-To: <1363706805.3088.6.camel@zion.uk.xensource.com>

(apologies for the late reply, I've been away)

On Tue, 2013-03-19 at 15:26 +0000, Wei Liu wrote:
> I think Ian's (and my) idea of redundant is that this 'extra' variable
> is never used in the code now and causes confusion. It can be removed
> now and add back in the future if necessary.

Right, the "extra" I was questioning at the top was a local variable in
the Linux code not the XEN_NETIF_EXTRA_FLAG_MORE thing. Although the
variable was related to the handling of that flag it currently was
written and then never read...

Ian.


* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Ian Campbell @ 2013-04-09 14:30 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Wei Liu, netdev@vger.kernel.org, xen-devel@lists.xen.org,
	konrad.wilk@oracle.com, annie.li@oracle.com
In-Reply-To: <1363728480.31336.10.camel@deadeye.wl.decadent.org.uk>

On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
> On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
> > On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> > > On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> > > > On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > > > > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > > > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > > > > 65535 will cause overflow.
> > > > > > > > 
> > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > > > > >  1 file changed, 12 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > > > > index 5527663..8c3d065 100644
> > > > > > > > --- a/drivers/net/xen-netfront.c
> > > > > > > > +++ b/drivers/net/xen-netfront.c
> > > > > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > > > >  	unsigned int len = skb_headlen(skb);
> > > > > > > >  	unsigned long flags;
> > > > > > > >  
> > > > > > > > +	/*
> > > > > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > > > > +	 * uint16_t.
> > > > > > > 
> > > > > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > > > > would stop this from happening?
> > > > > > > 
> > > > > > 
> > > > > > struct ethernet_device? I could not find it.
> > > > > > 
> > > > > > And for struct net_device,
> > > > > 
> > > > > I meant struct net_device.
> > > > > 
> > > > > >  there is no field for this AFAICT.
> > > > > 
> > > > > Interesting. Are hardware devices expected to cope with arbitrary sized
> > > > > GSO skbs then I wonder.
> > > > > 
> > > > 
> > > > No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> > > > net_device. :-)
> > > 
> > > But aren't we seeing skb's bigger than that?
> > > 
> > > Maybe this is just a historical bug in some older guests?
> > 
> > GSO_MAX_SIZE is the maximum payload length, not the maximum total length
> > of an skb.
> 
> ...and it's actually just the default value assigned to
> dev->gso_max_size.  You'll want to change it to your actual maximum
> (65535 - maximum length of headers) before registering your net devices.

Thanks. 

"maximum length of headers" might be a bit tricky to determine
generically :-(.
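To make the overflow in the quoted patch concrete, a small C sketch of what happens when an skb length above 65535 is stored in a 16-bit size field, together with the kind of length guard the patch adds. struct tx_request_toy and both helper functions are hypothetical stand-ins; the real wire format is xen_netif_tx_request, of which only the uint16_t size field is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the wire-format request; only the 16-bit
 * size field discussed in the patch is modelled.
 */
struct tx_request_toy {
	uint16_t size;
};

/* What ends up in the request if the length is stored unchecked. */
static uint16_t stored_size(unsigned int len)
{
	struct tx_request_toy req;

	req.size = (uint16_t)len;	/* keeps only the low 16 bits */
	return req.size;
}

/* The kind of guard the patch adds: reject anything over 65535 bytes. */
static bool fits_wire_format(unsigned int len)
{
	return len <= UINT16_MAX;
}
```

stored_size(70000) yields 4464, i.e. 70000 modulo 65536, so the backend would see a much shorter frame than the one actually queued.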

Ian.


* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Eric Dumazet @ 2013-04-09 14:31 UTC (permalink / raw)
  To: Paul Moore
  Cc: Casey Schaufler, David Miller, netdev, mvadkert, selinux,
	linux-security-module
In-Reply-To: <2238729.HES6agzVX2@sifl>

On Tue, 2013-04-09 at 10:19 -0400, Paul Moore wrote:
> On Tuesday, April 09, 2013 07:00:22 AM Eric Dumazet wrote:
> > On Tue, 2013-04-09 at 09:19 -0400, Paul Moore wrote:
> > > As Casey already mentioned, if this isn't acceptable please help me
> > > understand why.
> > 
> > You see something which is not the reality. If you do such analysis,
> > better do it properly, because any change you are going to submit will
> > be doubly checked by people who really care.
> 
> I am attempting to do it properly, I simply made a mistake.  Ben also pointed 
> it out.  As you wrote yesterday, "Lets go forward".
> 
> After fixing the BITS_PER_LONG problem I looked at it again and it appears 
> that by simply replacing the "secmark" field with a blob we retain the size of 
> the sk_buff as well as the cacheline positions of all the fields, e.g. 
> dma_cookie no longer moves cachelines.  Thoughts?

If you take a look at recent history of changes on sk_buff, you can see
we added very recently fields for encapsulation support. These were
absolutely wanted for modern operations at datacenter level.

This effort might still need new room, so I prefer not filling sk_buff
right now.

Take a look at the cloned sk_buff. We need an extra atomic_t at the end,
so if we make sk_buff bigger than 0xf8 bytes, fclone_cache will use an
extra cache line as well. Not a big deal, but RPC workloads like netperf
-t TCP_RR will probably show a regression.

ls -l /sys/kernel/slab/skbuff_fclone_cache
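Back-of-the-envelope arithmetic for the fclone point, as a small C sketch. It assumes, per the message, that one fclone object is two sk_buffs followed by one atomic_t, that atomic_t is 4 bytes, and that cachelines are 64 bytes; it ignores any rounding the slab allocator applies on top. The ls command above inspects the live cache, which remains the authoritative answer; the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_BYTES 64u	/* assumption: 64-byte cachelines */
#define ATOMIC_T_BYTES  4u	/* atomic_t is a 4-byte int on common arches */

/*
 * Cachelines touched by one fclone object, assuming it holds two sk_buffs
 * followed by one atomic_t reference count. Slab rounding is ignored.
 */
static size_t fclone_cachelines(size_t skb_size)
{
	size_t bytes = 2 * skb_size + ATOMIC_T_BYTES;

	return (bytes + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}
```

Under these assumptions fclone_cachelines(0xf8) is 8 (500 bytes) while fclone_cachelines(0x100) is 9 (516 bytes), which is the extra cacheline being described.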





* RE: be2net: GRO for non-inet protocols
From: Bandi,Sarveshwar @ 2013-04-09 14:31 UTC (permalink / raw)
  To: Erik Hugne, Eric Dumazet
  Cc: Perla, Sathya, Seetharaman, Subramanian, Khaparde, Ajit,
	netdev@vger.kernel.org
In-Reply-To: <20130408152417.GD19951@eerihug-hybrid.ki.sw.ericsson.se>

Erik,
   Checked the driver code. With the change that Eric proposed the driver does nothing more than call eth_type_trans to parse the ether protocol type and set up skb variables appropriately.  Please verify that all packets are only taking the be_rx_compl_process path which calls napi_gro_receive. 

   Apart from this I can't see anything in the driver that can cause corruption. 

Thanks,
Sarvesh

-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Erik Hugne
Sent: Monday, April 08, 2013 8:54 PM
To: Eric Dumazet
Cc: Perla, Sathya; Seetharaman, Subramanian; Khaparde, Ajit; netdev@vger.kernel.org
Subject: Re: be2net: GRO for non-inet protocols

On Mon, Apr 08, 2013 at 08:40:10AM +0200, Erik Hugne wrote:
> Thanks Eric, it works as expected after applying this.

So, on to the next problem: now I'm getting corrupted packets from the driver instead. It would be great to get some comments from the Emulex guys regarding this.

Attaching a printk trace where I log the MAC header and packet data of all 0x88CA (TIPC) packets in the gro_receive routine that have an erroneous TIPC header. This happens immediately when I register myself with the device.


kernel: [ 3455.608572] tipc: Activated (version 2.0.0)
kernel: [ 3455.609545] NET: Registered protocol family 30
kernel: [ 3455.609547] tipc: Started in single node mode
kernel: [ 3458.837149] tipc: Started in network mode
kernel: [ 3458.837153] tipc: Own node address <1.1.11>, network identity 4711
kernel: [ 3458.837244] tipc: Enabled bearer <eth:eth1>, discovery domain <1.1.0>, priority 10
kernel: [ 3458.837916] tipc: Garbage packet received
kernel: [ 3458.837919] tipc: packet length=56 data_len=56
kernel: [ 3458.837925] pmachdr: e4 11 5b db 24 a4 e4 11 5b d7 36 9c 88 ca                                                        ..[.$...[.6...
kernel: [ 3458.837929] pdata: 10 0b 00 00 00 01 9e dd 00 a0 01 00 10 0b 01 00 10 0a 00 00 00 00 01 77 05 dc 65 74 68 31 00 00  .......................w..eth1..
kernel: [ 3458.837933] pdata: 00 00 00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 00 50 e5 74 64 5c e6                          ..................P.td\.
kernel: [ 3458.837942] tipc: Established link <1.1.11:eth1-1.1.10:eth1> on network plane A
kernel: [ 3458.838225] tipc: Garbage packet received
kernel: [ 3458.838228] tipc: packet length=56 data_len=56
kernel: [ 3458.838232] pmachdr: e4 11 5b db 24 a4 e4 11 5b d7 36 9c 88 ca                                                        ..[.$...[.6...
kernel: [ 3458.838236] pdata: 10 0b 00 00 00 01 9e dd 00 a1 01 00 10 0b 01 00 10 0a 00 00 00 00 00 00 00 00 65 74 68 31 00 00  ..........................eth1..
kernel: [ 3458.838239] pdata: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                          ........................
kernel: [ 3458.838244] tipc: Garbage packet received
kernel: [ 3458.838246] tipc: packet length=56 data_len=56
kernel: [ 3458.838249] pmachdr: e4 11 5b db 24 a4 e4 11 5b d7 36 9c 88 ca                                                        ..[.$...[.6...
kernel: [ 3458.838254] pdata: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00  ................................
kernel: [ 3458.838258] pdata: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00                          ........................
kernel: [ 3458.838262] tipc: Garbage packet received
kernel: [ 3458.838263] tipc: packet length=60 data_len=60
kernel: [ 3458.838268] pmachdr: e4 11 5b db 24 a4 e4 11 5b d7 36 9c 88 ca                                                        ..[.$...[.6...
kernel: [ 3458.838272] pdata: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 72 65 65 64 65 73  ..........................reedes
kernel: [ 3458.838276] pdata: 6b 74 6f 70 2f 48 61 6c 2f 64 65 76 69 63 65 73 2f 6e 65 74 5f 65 34 5f 31 31 5f 35              ktop/Hal/devices/net_e4_11_5
kernel: [ 3459.976074] tipc: Garbage packet received
kernel: [ 3459.976077] tipc: packet length=56 data_len=56
kernel: [ 3459.976081] pmachdr: e4 11 5b db 24 a4 e4 11 5b d7 36 9c 88 ca                                                        ..[.$...[.6...
kernel: [ 3459.976085] pdata: 10 0b 00 00 00 03 9e dd 00 a1 01 00 10 0b 01 00 10 0a 00 00 00 00 00 00 00 00 65 74 68 31 00 00  ..........................eth1..
kernel: [ 3459.976089] pdata: 00 00 00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 00 50 e5 74 64 5c e6                          ..................P.td\.
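A userspace C sketch of reading the EtherType out of the 14-byte MAC header dumped as pmachdr above. It handles only the plain Ethernet II case; VLAN tags and 802.3 length fields, which the kernel's eth_type_trans() also deals with, are ignored. The helper name ether_type() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14	/* dest MAC (6) + source MAC (6) + EtherType (2) */

/*
 * Read the EtherType from a raw Ethernet II header. Network byte order is
 * big-endian, so the high byte comes first. VLAN tags and 802.3 length
 * fields are not handled.
 */
static uint16_t ether_type(const uint8_t *hdr, size_t len)
{
	if (len < ETH_HDR_LEN)
		return 0;	/* too short to carry an EtherType */
	return (uint16_t)((hdr[12] << 8) | hdr[13]);
}

/* The pmachdr bytes from the log: destination, source, then 88 ca. */
static const uint8_t logged_machdr[ETH_HDR_LEN] = {
	0xe4, 0x11, 0x5b, 0xdb, 0x24, 0xa4,	/* destination */
	0xe4, 0x11, 0x5b, 0xd7, 0x36, 0x9c,	/* source */
	0x88, 0xca,				/* EtherType: TIPC */
};
```

Decoding the logged header yields 0x88CA (TIPC), consistent with the trace: the MAC headers look intact even where the payload bytes do not.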


* Re: be2net: GRO for non-inet protocols
From: Eric Dumazet @ 2013-04-09 14:32 UTC (permalink / raw)
  To: Erik Hugne; +Cc: sathya.perla, subbu.seetharaman, ajit.khaparde, netdev
In-Reply-To: <20130409142239.GB27063@eerihug-hybrid.ki.sw.ericsson.se>

On Tue, 2013-04-09 at 16:22 +0200, Erik Hugne wrote:
> On Tue, Apr 09, 2013 at 06:44:35AM -0700, Eric Dumazet wrote:
> > I suggested you try GRE tunnels on emulex NIC, to make sure the problem
> > is not coming from your changes. Nevermind ...
> >
> I understood that, but since I could not load anything newer than 3.0.61 on them,
> GRO support for GRE tunnels was not included. Anyway...
> I got hold of a new set of machines that I could load 3.8 on and perform the GRE
> test on; these also use Emulex NICs.
> What I found was that all packets received on the GRE device are MTU-sized
> and are not GRO'd together at all. The same thing happens in a qemu environment,
> so maybe there's another issue with GRO for GRE.

But you included my patch ?


* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Ben Hutchings @ 2013-04-09 14:45 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Wei Liu, netdev@vger.kernel.org, xen-devel@lists.xen.org,
	konrad.wilk@oracle.com, annie.li@oracle.com
In-Reply-To: <1365517818.10725.44.camel@zakaz.uk.xensource.com>

On Tue, 2013-04-09 at 15:30 +0100, Ian Campbell wrote:
> On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
> > On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
> > > On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> > > > On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> > > > > On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > > > > > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > > > > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > > > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > > > > > 65535 will cause overflow.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > > > > > >  1 file changed, 12 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > > > > > index 5527663..8c3d065 100644
> > > > > > > > > --- a/drivers/net/xen-netfront.c
> > > > > > > > > +++ b/drivers/net/xen-netfront.c
> > > > > > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > > > > >  	unsigned int len = skb_headlen(skb);
> > > > > > > > >  	unsigned long flags;
> > > > > > > > >  
> > > > > > > > > +	/*
> > > > > > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > > > > > +	 * uint16_t.
> > > > > > > > 
> > > > > > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > > > > > would stop this from happening?
> > > > > > > > 
> > > > > > > 
> > > > > > > struct ethernet_device? I could not find it.
> > > > > > > 
> > > > > > > And for struct net_device,
> > > > > > 
> > > > > > I meant struct net_device.
> > > > > > 
> > > > > > >  there is no field for this AFAICT.
> > > > > > 
> > > > > > Interesting. Are hardware devices expected to cope with arbitrary sized
> > > > > > GSO skbs then I wonder.
> > > > > > 
> > > > > 
> > > > > No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> > > > > net_device. :-)
> > > > 
> > > > But aren't we seeing skb's bigger than that?
> > > > 
> > > > Maybe this is just a historical bug in some older guests?
> > > 
> > > GSO_MAX_SIZE is the maximum payload length, not the maximum total length
> > > of an skb.
> > 
> > ...and it's actually just the default value assigned to
> > dev->gso_max_size.  You'll want to change it to your actual maximum
> > (65535 - maximum length of headers) before registering your net devices.
> 
> Thanks. 
> 
> "maximum length of headers" might be a bit tricky to determine
> generically :-(.

Well you don't need to be generic, you need to know the maximum length
of headers that might appear in a TSO skb.

Ethernet + VLAN tag + IPv6 + TCP + timestamp option = 90 bytes, but I'm
not sure whether there can be other IP or TCP options in a TSO skb.  I'd
really like to get the TSO requirements clearly documented somewhere.
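The header budget above, worked through as a C sketch. The byte counts are the standard fixed sizes (14-byte Ethernet II header, 4-byte 802.1Q tag, 40-byte IPv6 header without extension headers, 20-byte TCP header, and a 10-byte timestamp option padded to 12 with two NOPs); the constant and function names are hypothetical. Like the estimate in the message, it does not account for other IP or TCP options or for encapsulation.

```c
#include <assert.h>

/* Header sizes in bytes for the worst case listed above. */
enum {
	ETH_HLEN_BYTES   = 14,	/* Ethernet II header */
	VLAN_TAG_BYTES   = 4,	/* one 802.1Q tag */
	IPV6_HDR_BYTES   = 40,	/* fixed IPv6 header, no extension headers */
	TCP_HDR_BYTES    = 20,	/* TCP header without options */
	TCP_TS_OPT_BYTES = 12,	/* 10-byte timestamp option + 2 NOP pad */
};

/* Largest frame the 16-bit Xen size field can describe. */
#define WIRE_MAX_LEN 65535u

static unsigned int max_tso_header_len(void)
{
	return ETH_HLEN_BYTES + VLAN_TAG_BYTES + IPV6_HDR_BYTES +
	       TCP_HDR_BYTES + TCP_TS_OPT_BYTES;
}

/* Candidate value for dev->gso_max_size under these assumptions. */
static unsigned int gso_max_size_for_wire(void)
{
	return WIRE_MAX_LEN - max_tso_header_len();
}
```

Under these assumptions the result is 65535 - 90 = 65445, which would be assigned to dev->gso_max_size before the device is registered.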

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Paul Moore @ 2013-04-09 14:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Casey Schaufler, David Miller, netdev, mvadkert, selinux,
	linux-security-module
In-Reply-To: <1365517864.3887.137.camel@edumazet-glaptop>

On Tuesday, April 09, 2013 07:31:04 AM Eric Dumazet wrote:
> On Tue, 2013-04-09 at 10:19 -0400, Paul Moore wrote:
> > On Tuesday, April 09, 2013 07:00:22 AM Eric Dumazet wrote:
> > > On Tue, 2013-04-09 at 09:19 -0400, Paul Moore wrote:
> > > > As Casey already mentioned, if this isn't acceptable please help me
> > > > understand why.
> > > 
> > > You see something which is not the reality. If you do such analysis,
> > > better do it properly, because any change you are going to submit will
> > > be doubly checked by people who really care.
> > 
> > I am attempting to do it properly, I simply made a mistake.  Ben also
> > pointed it out.  As you wrote yesterday, "Lets go forward".
> > 
> > After fixing the BITS_PER_LONG problem I looked at it again and it appears
> > that by simply replacing the "secmark" field with a blob we retain the
> > size of the sk_buff as well as the cacheline positions of all the fields,
> > e.g. dma_cookie no longer moves cachelines.  Thoughts?
> 
> If you take a look at recent history of changes on sk_buff, you can see
> we added very recently fields for encapsulation support. These were
> absolutely wanted for modern operations at datacenter level.
> 
> This effort might still need new room, so I prefer not filling sk_buff
> right now.

Has anyone proposed any additional encapsulation patches which need additional 
fields in the sk_buff?  Are you aware of any additional encapsulation patches 
which are in progress?  When would you consider it "safe"?

> Take a look at the cloned sk_buff. We need an extra atomic_t at the end,
> so if we make sk_buff bigger than 0xf8 bytes, fclone_cache will use an
> extra cache line as well. Not a big deal, but RPC workloads like netperf
> -t TCP_RR will probably show a regression.
> 
> ls -l /sys/kernel/slab/skbuff_fclone_cache

Perhaps I'm misunderstanding, but these comments above only apply if we were 
to increase the size of the sk_buff struct, yes?  What I proposed, replacing 
"secmark" with a blob, does not currently change the size of the sk_buff 
struct so the performance and memory usage should remain unchanged as well.

-- 
paul moore
security and virtualization @ redhat


* Re: be2net: GRO for non-inet protocols
From: Erik Hugne @ 2013-04-09 14:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: sathya.perla, subbu.seetharaman, ajit.khaparde, netdev
In-Reply-To: <1365517951.3887.138.camel@edumazet-glaptop>

On Tue, Apr 09, 2013 at 07:32:31AM -0700, Eric Dumazet wrote:
> But you included my patch ?
Yes :)


* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Christoph Egger @ 2013-04-09 14:53 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Ian Campbell, netdev@vger.kernel.org, annie.li@oracle.com,
	konrad.wilk@oracle.com, Wei Liu, xen-devel@lists.xen.org
In-Reply-To: <1365518703.2623.6.camel@bwh-desktop.uk.solarflarecom.com>

On 09.04.13 16:45, Ben Hutchings wrote:
> On Tue, 2013-04-09 at 15:30 +0100, Ian Campbell wrote:
>> On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
>>> On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
>>>> On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
>>>>> On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
>>>>>> On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
>>>>>>> On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
>>>>>>>> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
>>>>>>>>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
>>>>>>>>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
>>>>>>>>>> 65535 will cause overflow.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>>>>>>> ---
>>>>>>>>>>   drivers/net/xen-netfront.c |   12 ++++++++++++
>>>>>>>>>>   1 file changed, 12 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>>>>>>>>>> index 5527663..8c3d065 100644
>>>>>>>>>> --- a/drivers/net/xen-netfront.c
>>>>>>>>>> +++ b/drivers/net/xen-netfront.c
>>>>>>>>>> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>>>>>   	unsigned int len = skb_headlen(skb);
>>>>>>>>>>   	unsigned long flags;
>>>>>>>>>>
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * wire format of xen_netif_tx_request only supports skb->len
>>>>>>>>>> +	 * < 64K, because size field in xen_netif_tx_request is
>>>>>>>>>> +	 * uint16_t.
>>>>>>>>>
>>>>>>>>> Is there some field we can set e.g. in struct ethernet_device which
>>>>>>>>> would stop this from happening?
>>>>>>>>>
>>>>>>>>
>>>>>>>> struct ethernet_device? I could not find it.
>>>>>>>>
>>>>>>>> And for struct net_device,
>>>>>>>
>>>>>>> I meant struct net_device.
>>>>>>>
>>>>>>>>   there is no field for this AFAICT.
>>>>>>>
>>>>>>> Interesting. Are hardware devices expected to cope with arbitrary sized
>>>>>>> GSO skbs then I wonder.
>>>>>>>
>>>>>>
>>>>>> No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
>>>>>> net_device. :-)
>>>>>
>>>>> But aren't we seeing skb's bigger than that?
>>>>>
>>>>> Maybe this is just a historical bug in some older guests?
>>>>
>>>> GSO_MAX_SIZE is the maximum payload length, not the maximum total length
>>>> of an skb.
>>>
>>> ...and it's actually just the default value assigned to
>>> dev->gso_max_size.  You'll want to change it to your actual maximum
>>> (65535 - maximum length of headers) before registering your net devices.
>>
>> Thanks.
>>
>> "maximum length of headers" might be a bit tricky to determine
>> generically :-(.
>
> Well you don't need to be generic, you need to know the maximum length
> of headers that might appear in a TSO skb.
>
> Ethernet + VLAN tag + IPv6 + TCP + timestamp option = 90 bytes, but I'm
> not sure whether there can be other IP or TCP options in a TSO skb.  I'd
> really like to get the TSO requirements clearly documented somewhere.

What about encapsulated IPsec, IP-in-IP tunnels, etc.?

Christoph


* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Ben Hutchings @ 2013-04-09 14:59 UTC (permalink / raw)
  To: Christoph Egger
  Cc: Ian Campbell, netdev@vger.kernel.org, annie.li@oracle.com,
	konrad.wilk@oracle.com, Wei Liu, xen-devel@lists.xen.org
In-Reply-To: <51642B59.4080407@amazon.de>

On Tue, 2013-04-09 at 16:53 +0200, Christoph Egger wrote:
> On 09.04.13 16:45, Ben Hutchings wrote:
> > On Tue, 2013-04-09 at 15:30 +0100, Ian Campbell wrote:
> >> On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
> >>> On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
> >>>> On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> >>>>> On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> >>>>>> On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> >>>>>>> On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> >>>>>>>> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> >>>>>>>>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> >>>>>>>>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
> >>>>>>>>>> 65535 will cause overflow.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> >>>>>>>>>> ---
> >>>>>>>>>>   drivers/net/xen-netfront.c |   12 ++++++++++++
> >>>>>>>>>>   1 file changed, 12 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> >>>>>>>>>> index 5527663..8c3d065 100644
> >>>>>>>>>> --- a/drivers/net/xen-netfront.c
> >>>>>>>>>> +++ b/drivers/net/xen-netfront.c
> >>>>>>>>>> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>>>>>>>   	unsigned int len = skb_headlen(skb);
> >>>>>>>>>>   	unsigned long flags;
> >>>>>>>>>>
> >>>>>>>>>> +	/*
> >>>>>>>>>> +	 * wire format of xen_netif_tx_request only supports skb->len
> >>>>>>>>>> +	 * < 64K, because size field in xen_netif_tx_request is
> >>>>>>>>>> +	 * uint16_t.
> >>>>>>>>>
> >>>>>>>>> Is there some field we can set e.g. in struct ethernet_device which
> >>>>>>>>> would stop this from happening?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> struct ethernet_device? I could not find it.
> >>>>>>>>
> >>>>>>>> And for struct net_device,
> >>>>>>>
> >>>>>>> I meant struct net_device.
> >>>>>>>
> >>>>>>>>   there is no field for this AFAICT.
> >>>>>>>
> >>>>>>> Interesting. Are hardware devices expected to cope with arbitrary sized
> >>>>>>> GSO skbs then I wonder.
> >>>>>>>
> >>>>>>
> >>>>>> No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> >>>>>> net_device. :-)
> >>>>>
> >>>>> But aren't we seeing skb's bigger than that?
> >>>>>
> >>>>> Maybe this is just a historical bug in some older guests?
> >>>>
> >>>> GSO_MAX_SIZE is the maximum payload length, not the maximum total length
> >>>> of an skb.
> >>>
> >>> ...and it's actually just the default value assigned to
> >>> dev->gso_max_size.  You'll want to change it to your actual maximum
> >>> (65535 - maximum length of headers) before registering your net devices.
> >>
> >> Thanks.
> >>
> >> "maximum length of headers" might be a bit tricky to determine
> >> generically :-(.
> >
> > Well you don't need to be generic, you need to know the maximum length
> > of headers that might appear in a TSO skb.
> >
> > Ethernet + VLAN tag + IPv6 + TCP + timestamp option = 90 bytes, but I'm
> > not sure whether there can be other IP or TCP options in a TSO skb.  I'd
> > really like to get the TSO requirements clearly documented somewhere.
> 
> What about encapsulated IPSEC, IP-in-IP-tunnels, etc. ?

xen-netfront doesn't offload GSO for those, unless I'm much mistaken.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
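
Wei's patch rests on the fact that a length stored in a 16-bit field wraps modulo 2^16, so a 65536-byte skb would be described to the backend as 0 bytes. A minimal sketch of that failure and of the "drop when skb->len > 65535" check from the patch subject follows. struct tx_request_sketch is a hypothetical stand-in that models only the `size` field of xen_netif_tx_request, not the real wire structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: only the 16-bit `size` field of a Xen tx request. */
struct tx_request_sketch {
	uint16_t size;
};

#define SKETCH_XEN_MAX_LEN 65535u  /* UINT16_MAX */

/* The check the patch adds: refuse lengths the 16-bit field cannot hold. */
static bool fits_wire_size(unsigned int skb_len)
{
	return skb_len <= SKETCH_XEN_MAX_LEN;
}

/* What happens without the check: conversion to uint16_t keeps only the
 * low 16 bits, so the backend sees a silently truncated length. */
static uint16_t truncated_size(unsigned int skb_len)
{
	struct tx_request_sketch req = { .size = (uint16_t)skb_len };
	return req.size;
}
```

For example, truncated_size(65540) yields 4, which is why dropping the skb is safer than sending it.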

* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Paul Moore @ 2013-04-09 15:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Casey Schaufler, David Miller, netdev, mvadkert, selinux,
	linux-security-module
In-Reply-To: <7718638.lBZi8geXkP@sifl>

On Tuesday, April 09, 2013 10:52:17 AM Paul Moore wrote:
> On Tuesday, April 09, 2013 07:31:04 AM Eric Dumazet wrote:
> > On Tue, 2013-04-09 at 10:19 -0400, Paul Moore wrote:
> > > On Tuesday, April 09, 2013 07:00:22 AM Eric Dumazet wrote:
> > > > On Tue, 2013-04-09 at 09:19 -0400, Paul Moore wrote:
> > > > > As Casey already mentioned, if this isn't acceptable please help me
> > > > > understand why.
> > > > 
> > > > You see something which is not the reality. If you do such analysis,
> > > > better do it properly, because any change you are going to submit will
> > > > be doubly checked by people who really care.
> > > 
> > > I am attempting to do it properly, I simply made a mistake.  Ben also
> > > pointed it out.  As you wrote yesterday, "Lets go forward".
> > > 
> > > After fixing the BITS_PER_LONG problem I looked at it again and it
> > > appears that by simply replacing the "secmark" field with a blob we
> > > retain the size of the sk_buff as well as the cacheline positions of
> > > all the fields, e.g. dma_cookie no longer moves cachelines.  Thoughts?
> > 
> > If you take a look at recent history of changes on sk_buff, you can see
> > we added very recently fields for encapsulation support. These were
> > absolutely wanted for modern operations at datacenter level.
> > 
> > This effort might still need new room, so I prefer not filling sk_buff
> > right now.
> 
> Has anyone proposed any additional encapsulation patches which need
> additional fields in the sk_buff?  Are you aware of any additional
> encapsulation patches which are in progress?  When would you consider it
> "safe"?

Another thought.

If we move the "protocol" field after the flags2 bitfield, remove "secmark", 
and add the blob before/after "destructor" then we preserve the existing
four-byte hole in cacheline #3 for potential use by the encapsulation folks and 
move the dma_cookie into cacheline #3 as well which may actually be a good 
thing (corrections welcome on this comment).  The overall size remains the 
same at 256 bytes.

struct sk_buff_test {
        struct sk_buff *           next;                 /*     0     8 */
        struct sk_buff *           prev;                 /*     8     8 */
        ktime_t                    tstamp;               /*    16     8 */
        struct sock *              sk;                   /*    24     8 */
        struct net_device *        dev;                  /*    32     8 */
        char                       cb[48];               /*    40    48 */
        /* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
        long unsigned int          _skb_refdst;          /*    88     8 */
        struct sec_path *          sp;                   /*    96     8 */
        unsigned int               len;                  /*   104     4 */
        unsigned int               data_len;             /*   108     4 */
        __u16                      mac_len;              /*   112     2 */
        __u16                      hdr_len;              /*   114     2 */
        union {
                __wsum             csum;                 /*           4 */
                struct {
                        __u16      csum_start;           /*   116     2 */
                        __u16      csum_offset;          /*   118     2 */
                };                                       /*           4 */
        };                                               /*   116     4 */
        __u32                      priority;             /*   120     4 */
        int                        flags1_begin[0];      /*   124     0 */
        __u8                       local_df:1;           /*   124: 7  1 */
        __u8                       cloned:1;             /*   124: 6  1 */
        __u8                       ip_summed:2;          /*   124: 4  1 */
        __u8                       nohdr:1;              /*   124: 3  1 */
        __u8                       nfctinfo:3;           /*   124: 0  1 */
        __u8                       pkt_type:3;           /*   125: 5  1 */
        __u8                       fclone:2;             /*   125: 3  1 */
        __u8                       ipvs_property:1;      /*   125: 2  1 */
        __u8                       peeked:1;             /*   125: 1  1 */
        __u8                       nf_trace:1;           /*   125: 0  1 */

        /* XXX 2 bytes hole, try to pack */

        /* --- cacheline 2 boundary (128 bytes) --- */
        int                        flags1_end[0];        /*   128     0 */
        void *                     security;             /*   128     8 */
        void                       (*destructor)(struct sk_buff *); /*   136     8 */
        struct nf_conntrack *      nfct;                 /*   144     8 */
        struct sk_buff *           nfct_reasm;           /*   152     8 */
        struct nf_bridge_info *    nf_bridge;            /*   160     8 */
        int                        skb_iif;              /*   168     4 */
        __u32                      rxhash;               /*   172     4 */
        __u16                      vlan_tci;             /*   176     2 */
        __u16                      tc_index;             /*   178     2 */
        __u16                      tc_verd;              /*   180     2 */
        __u16                      queue_mapping;        /*   182     2 */
        int                        flags2_begin[0];      /*   184     0 */
        __u8                       ndisc_nodetype:2;     /*   184: 6  1 */
        __u8                       pfmemalloc:1;         /*   184: 5  1 */
        __u8                       ooo_okay:1;           /*   184: 4  1 */
        __u8                       l4_rxhash:1;          /*   184: 3  1 */
        __u8                       wifi_acked_valid:1;   /*   184: 2  1 */
        __u8                       wifi_acked:1;         /*   184: 1  1 */
        __u8                       no_fcs:1;             /*   184: 0  1 */
        __u8                       head_frag:1;          /*   185: 7  1 */
        __u8                       encapsulation:1;      /*   185: 6  1 */

        /* XXX 6 bits hole, try to pack */
        /* XXX 2 bytes hole, try to pack */

        int                        flags2_end[0];        /*   188     0 */
        __be16                     protocol;             /*   188     2 */

        /* XXX 2 bytes hole, try to pack */

        /* --- cacheline 3 boundary (192 bytes) --- */
        dma_cookie_t               dma_cookie;           /*   192     4 */
        union {
                __u32              mark;                 /*           4 */
                __u32              dropcount;            /*           4 */
                __u32              reserved_tailroom;    /*           4 */
        };                                               /*   196     4 */
        sk_buff_data_t             inner_transport_header; /*   200     4 */
        sk_buff_data_t             inner_network_header; /*   204     4 */
        sk_buff_data_t             transport_header;     /*   208     4 */
        sk_buff_data_t             network_header;       /*   212     4 */
        sk_buff_data_t             mac_header;           /*   216     4 */
        sk_buff_data_t             tail;                 /*   220     4 */
        sk_buff_data_t             end;                  /*   224     4 */

        /* XXX 4 bytes hole, try to pack */

        unsigned char *            head;                 /*   232     8 */
        unsigned char *            data;                 /*   240     8 */
        unsigned int               truesize;             /*   248     4 */
        atomic_t                   users;                /*   252     4 */
        /* --- cacheline 4 boundary (256 bytes) --- */

        /* size: 256, cachelines: 4, members: 62 */
        /* sum members: 246, holes: 4, sum holes: 10 */
        /* bit holes: 1, sum bit holes: 6 bits */
};


-- 
paul moore
security and virtualization @ redhat
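
Whether a layout proposal like the one above keeps sk_buff at 256 bytes and keeps dma_cookie at the start of cacheline #3 can be guarded at compile time with offsetof() and C11 _Static_assert. The sketch below is a hypothetical, simplified model of the second half of the proposed sk_buff_test layout only: it assumes an LP64 ABI (8-byte pointers, as on x86_64), collapses the first 128 bytes into an opaque array, and replaces the flags2 bitfields and the seven sk_buff_data_t header offsets with plain integers.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64

/* Hypothetical model of bytes 128..255 of the proposed layout (LP64). */
struct sk_buff_sketch {
	unsigned char first_128[128];  /* next .. nf_trace: not modelled */
	void *security;                /* 128: the proposed LSM blob */
	void (*destructor)(void *);    /* 136 */
	void *nfct;                    /* 144 */
	void *nfct_reasm;              /* 152 */
	void *nf_bridge;               /* 160 */
	int skb_iif;                   /* 168 */
	uint32_t rxhash;               /* 172 */
	uint16_t vlan_tci;             /* 176 */
	uint16_t tc_index;             /* 178 */
	uint16_t tc_verd;              /* 180 */
	uint16_t queue_mapping;        /* 182 */
	uint32_t flags2;               /* 184: flags2 bits plus their 2-byte hole */
	uint16_t protocol;             /* 188, followed by a 2-byte hole */
	uint32_t dma_cookie;           /* 192: start of cacheline 3 */
	uint32_t mark;                 /* 196 */
	uint32_t headers[7];           /* 200: inner_transport_header .. end */
	/* 4-byte hole at 228: head needs 8-byte alignment */
	unsigned char *head;           /* 232 */
	unsigned char *data;           /* 240 */
	unsigned int truesize;         /* 248 */
	int users;                     /* 252 */
};

/* These fail the build if an edit grows the struct, moves dma_cookie off
 * the cacheline boundary, or consumes the 4-byte hole before head. */
_Static_assert(sizeof(struct sk_buff_sketch) == 4 * CACHELINE_BYTES,
	       "layout must stay at four 64-byte cachelines");
_Static_assert(offsetof(struct sk_buff_sketch, dma_cookie) == 3 * CACHELINE_BYTES,
	       "dma_cookie should start cacheline 3");
_Static_assert(offsetof(struct sk_buff_sketch, head) -
	       (offsetof(struct sk_buff_sketch, headers) + sizeof(uint32_t[7])) == 4,
	       "the 4-byte hole before head should be preserved");
```

pahole on the real structure remains the authoritative check; the asserts only catch regressions in a model like this one.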

* Re: [PATCH] tcp: assign the sock correctly to an outgoing SYNACK packet
From: Eric Dumazet @ 2013-04-09 15:07 UTC (permalink / raw)
  To: Paul Moore
  Cc: Casey Schaufler, David Miller, netdev, mvadkert, selinux,
	linux-security-module
In-Reply-To: <7718638.lBZi8geXkP@sifl>

On Tue, 2013-04-09 at 10:52 -0400, Paul Moore wrote:

> 
> Perhaps I'm misunderstanding, but these comments above only apply if we were 
> to increase the size of the sk_buff struct, yes?  What I proposed, replacing 
> "secmark" with a blob, does not currently change the size of the sk_buff 
> struct so the performance and memory usage should remain unchanged as well.
> 

If the blob is 4 bytes, that's fine.

If not, please read my mail again.

* Re: [Xen-devel] [PATCH 5/6] xen-netback: coalesce slots before copying
From: Ian Campbell @ 2013-04-09 15:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Wei Liu, xen-devel@lists.xen.org, netdev@vger.kernel.org,
	annie.li@oracle.com, David Vrabel
In-Reply-To: <20130325165746.GB25740@phenom.dumpdata.com>

On Mon, 2013-03-25 at 16:57 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 11:08:21AM +0000, Wei Liu wrote:
> > This patch tries to coalesce tx requests when constructing grant copy
> > structures. It enables netback to deal with situation when frontend's
> > MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
> >
> > It defines max_skb_slots, which is a estimation of the maximum number of slots
> > a guest can send, anything bigger than that is considered malicious. Now it is
> > set to 20, which should be enough to accommodate Linux (16 to 19).
> >
> > Also change variable name from "frags" to "slots" in netbk_count_requests.
> >
> 
> This should probably also CC stable@vger.kernel.org

DaveM prefers that net patches not CC stable; he forwards them to stable
himself once he is happy with them (i.e. after they have been in his or
Linus' tree for a while).

Ian
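
The quoted patch description treats any request using more than max_skb_slots (20) slots as malicious and coalesces the remaining slots into fewer backend frags. A minimal sketch of those two ideas follows, assuming 4 KiB pages and that one slot's data may be split across two destination pages; the helper names are hypothetical and this is not netback's actual grant-copy code.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE     4096u  /* assumption: 4 KiB pages */
#define SKETCH_MAX_SKB_SLOTS   20u  /* value quoted in the patch description */

/* Hypothetical helper: a request using more slots than this is rejected. */
static bool slots_acceptable(size_t nr_slots)
{
	return nr_slots <= SKETCH_MAX_SKB_SLOTS;
}

/* Hypothetical helper: copy consecutive slots back to back into backend
 * pages.  Because a slot may straddle a page boundary, the number of pages
 * (frags) needed is the total byte count rounded up to whole pages. */
static size_t coalesced_pages(const unsigned int *slot_sizes, size_t nr_slots)
{
	size_t total = 0;

	for (size_t i = 0; i < nr_slots; i++)
		total += slot_sizes[i];
	return (total + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Under these assumptions, three small slots of 100, 200 and 300 bytes collapse into a single backend frag, which is how a frontend with a larger MAX_SKB_FRAGS can still fit the backend's limit.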
