Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] sysfs: add entry to indicate network interfaces with random MAC address
From: Stefan Assmann @ 2010-07-23  8:08 UTC (permalink / raw)
  To: Casey Leedom
  Cc: Rose, Gregory V, David Miller, shemminger, andy, harald,
	bhutchings, netdev, linux-kernel, gospo, Duyck, Alexander H
In-Reply-To: <2408F875-69BE-45D3-8605-4AE5CDA4601C@chelsio.com>

On 23.07.2010 02:26, Casey Leedom wrote:
> 
>  Or you simply don't have the VF Driver loaded in the "Domain 0" Control OS.  When we install the cxgb4 PF Driver with "num_vf=..." this enables the PCI-E SR-IOV Capabilities within the various PFs and the corresponding VF PCI Devices are instantiated and discovered by the Domain 0 Linux OS.  But without a cxgb4vf VF Driver loaded, those devices just sit there \x13 available for "Device Assignment" to VMs.
> 

Just out of curiosity, how do you prevent the VF driver from getting
loaded in the host? Except from blacklisting it.

  Stefan
--
Stefan Assmann         | Red Hat GmbH
Software Engineer      | Otto-Hahn-Strasse 20, 85609 Dornach
                       | HR: Amtsgericht Muenchen HRB 153243
                       | GF: Brendan Lane, Charlie Peters,
sassmann at redhat.com |     Michael Cunningham, Charles Cachera

^ permalink raw reply

* Re: macvtap: Limit packet queue length
From: Herbert Xu @ 2010-07-23  7:58 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: David Miller, mashirle, netdev, mwagner
In-Reply-To: <201007230928.09064.arnd@arndb.de>

On Fri, Jul 23, 2010 at 09:28:08AM +0200, Arnd Bergmann wrote:
>
> But this is the receive path, not transmit, so a packet coming from an
> external NIC never goes through dev_hard_start_xmit.

In that case skb->sk is probably coming from tuntap, which means
that orphaning the skb is not necessarily a good thing.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] macvlan: Fix rx counters update in macvlan_handle_frame()
From: David Miller @ 2010-07-23  7:31 UTC (permalink / raw)
  To: herbert; +Cc: sri, kaber, netdev
In-Reply-To: <20100723072338.GA4267@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 23 Jul 2010 15:23:39 +0800

> I don't think we should count packet drops when the interface is
> down.  This would be inconsistent with other devices such as
> IPIP.

Agreed.

^ permalink raw reply

* Re: [PATCH] LSM: Add post recvmsg() hook.
From: David Miller @ 2010-07-23  7:29 UTC (permalink / raw)
  To: penguin-kernel
  Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore, netdev,
	linux-security-module
In-Reply-To: <201007230721.o6N7Lc2L048186@www262.sakura.ne.jp>

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 23 Jul 2010 16:21:38 +0900

> Users calling recvmsg() after select() said "read operation will return with
> available data without blocking as long as you pass correct parameters" will
> get error code even if they passed correct parameters.

This is nonsense, because not all "parameters" are not visible to poll().

These "parameters" I speak of are virtual, they are my way of saying
that the errors returned by the existing LSM hooks are more like -EFAULT
or -EACCESS than -EINTR/-EAGAIN or blocking, the latter of which are
what are illegal for blocking socket invoking recvmsg() after poll
indicates there is data to read.

^ permalink raw reply

* Re: macvtap: Limit packet queue length
From: Arnd Bergmann @ 2010-07-23  7:28 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, mashirle, netdev, mwagner
In-Reply-To: <20100722.125808.232541418.davem@davemloft.net>

On Thursday 22 July 2010, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Fri, 23 Jul 2010 00:07:31 +0800
> 
> > On Thu, Jul 22, 2010 at 08:59:58AM -0700, Shirley Ma wrote:
> >> On Thu, 2010-07-22 at 14:41 +0800, Herbert Xu wrote:
> >> >  {
> >> >         struct macvtap_queue *q = macvtap_get_queue(dev, skb);
> >> >         if (!q)
> >> > -               return -ENOLINK;
> >> > +               goto drop;
> >> > +
> >> > +       if (skb_queue_len(&q->sk.sk_receive_queue) >=
> >> > dev->tx_queue_len)
> >> > +               goto drop;
> >> > 
> >> 
> >> Do we need to orphan skb here, just like tun?
> > 
> > We could, but that is orthogonal to the problem at hand so feel
> > free to do that in another patch.
> 
> These days, the stack pre-orphans all packets sent to ->ndo_start_xmit()
> in dev_hard_start_xmit() as long as socket based TX timestamping is not
> active for the packet.

But this is the receive path, not transmit, so a packet coming from an
external NIC never goes through dev_hard_start_xmit.

	Arnd

^ permalink raw reply

* Re: [PATCH] macvlan: Fix rx counters update in macvlan_handle_frame()
From: Herbert Xu @ 2010-07-23  7:23 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: David Miller, kaber, netdev
In-Reply-To: <1279839867.3274.25.camel@w-sridhar.beaverton.ibm.com>

On Thu, Jul 22, 2010 at 04:04:27PM -0700, Sridhar Samudrala wrote:
> Fix macvlan_handle_frame() to update the rx counters based
> on the return value of the vlan->receive call.
> 
> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
> 
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index 87e8d4c..2732df1 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -152,7 +152,8 @@ static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
>  	const struct macvlan_dev *vlan;
>  	const struct macvlan_dev *src;
>  	struct net_device *dev;
> -	unsigned int len;
> +	unsigned int len = 0;
> +	int ret = NET_RX_DROP;
>  
>  	if (is_multicast_ether_addr(eth->h_dest)) {
>  		src = macvlan_hash_lookup(port, eth->h_source);
> @@ -184,18 +185,20 @@ static struct sk_buff *macvlan_handle_frame(struct macvlan_port *port,
>  	dev = vlan->dev;
>  	if (unlikely(!(dev->flags & IFF_UP))) {
>  		kfree_skb(skb);
> -		return NULL;
> +		goto out;

I don't think we should count packet drops when the interface is
down.  This would be inconsistent with other devices such as
IPIP.

Otherthis your patch looks good to me.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] LSM: Add post recvmsg() hook.
From: Tetsuo Handa @ 2010-07-23  7:21 UTC (permalink / raw)
  To: davem
  Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore, netdev,
	linux-security-module
In-Reply-To: <20100722.224450.04662931.davem@davemloft.net>

David Miller wrote:
> From: Tetsuo Handa
> Date: Fri, 23 Jul 2010 09:22:20 +0900
> 
> > David Miller wrote:
> >> The fact is going to remain that you will be unable to return data
> >> from recvmsg() to a blocking socket when ->poll() returns true even
> >> though data is in fact there in the socket receive queue.
> >> 
> >> This is something that the existing LSM hooks do not do.
> > 
> > This is something that the existing security_socket_recvmsg() hook does do.
> > SELinux is unable to return data from recvmsg() to a blocking socket when
> > ->poll() returns true even though data is in fact there in the socket receive
> > queue.
> > We agreed below situation, didn't we?
> 
> Existing LSM hook returns an error early, as if the user passed in
> incorrect parameters or similar.
> 
> It's completely stateless and dependent upon purely the labels
> associated with state visible on recvmsg() entry, and independent
> of other things such as attributes in the packets contained in
> the socket's receive queue.

Users calling recvmsg() after select() said "read operation will return with
available data without blocking as long as you pass correct parameters" will
get error code even if they passed correct parameters.

If they passed incorrect parameters, they must accept the error code.
But they passed correct parameters, and they expected that recvmsg() returns
success with available data. From the point of view of select()->recvmsg()
users, they don't care whether internal implementation is stateful and/or
dependent of other things or not, they care what the specification says.

My understanding was that the specification requires

  If select() said "read operation will return with available data without
  blocking", recvmsg() must return available data. No access control mechanism
  in recvmsg() path is allowed to reject the request.

and security_socket_recvmsg() conflicts with this understanding.
I'd like to see the specification which states disclaimers like below.

  (i) Even if select() said "read operation will return with available data
      without blocking", recvmsg() is allowed to return error code rather than
      available data when an access control mechanism in recvmsg() path
      rejected the request.

  (ii) The access control mechanism in recvmsg() path may use attribute of
       the socket.

  (iii) The access control mechanism in recvmsg() path must not use the
        packets contained in the socket's receive queue.

We agreed on (i) and I don't mind (ii).
Would you please show me URLs to the specification which states (iii)?

^ permalink raw reply

* Re: linux-next: manual merge of the staging-next tree with the net tree
From: Sven Eckelmann @ 2010-07-23  7:09 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Greg KH, linux-next, linux-kernel, Eric Dumazet, David Miller,
	netdev
In-Reply-To: <20100723150323.daa04d5f.sfr@canb.auug.org.au>

[-- Attachment #1: Type: Text/Plain, Size: 1264 bytes --]

Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the staging-next tree got a conflict in
> drivers/staging/batman-adv/hard-interface.c between commit
> 28172739f0a276eb8d6ca917b3974c2edb036da3 ("net: fix 64 bit counters on 32
> bit arches") from the net tree and commit
> a1a38cad4c71bc08403b204fbe0ba98b4447f8bf ("Staging: batman-adv: Don't
> increment stats of foreign device") from the staging-next tree.
> 
> The latter just removes the code fixed by the former, so I did that.

That is absolutely correct. The netdev guys noticed that the commits around 
"net: fix 64 bit counters on 32 bit arches" made those lines a real bug 
(casting stuff around, writing on memory it should only read, probably writing 
on unrelated memory). We reviewed those lines again and noticed that they make 
no sense at all and removed them to fix that bug and another minor bug.

I am sorry that this created a merge conflict for you, but only those commits 
by the netdev guys made us aware that there is something going wrong in the 
batman-adv receive code.

Thanks a lot for your linux-next repository and your work on it. It is a big 
help to see what is going on elsewhere and how it may affect our code.

Best regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH] iptables: use skb->len for accounting
From: Eric Dumazet @ 2010-07-23  6:58 UTC (permalink / raw)
  To: Changli Gao; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <AANLkTikFp9tjtYz5c0GTOfcS1DD8v6fLvBUVVeBVbsns@mail.gmail.com>

Le vendredi 23 juillet 2010 à 14:47 +0800, Changli Gao a écrit :
> On Fri, Jul 23, 2010 at 2:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Le vendredi 23 juillet 2010 à 11:34 +0800, Changli Gao a écrit :
> >> iptables: use skb->len for accounting
> >>
> >> use skb->len for accounting as xt_quota does.
> >>
> >
> > Why ?
> >
> > This is a gratuitous change, unless you have very strong arguments.
> >
> > xt_quota is an exception, dont change all others because of it !
> 
> exception ? Why ?

Because it handles all protocols...

So skb->len is a shortcut, an approximation if you want.

At IPV4 level, ip->tot_len is an exact value.




^ permalink raw reply

* Re: [PATCH] iptables: use skb->len for accounting
From: Changli Gao @ 2010-07-23  6:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279866541.2482.79.camel@edumazet-laptop>

On Fri, Jul 23, 2010 at 2:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 23 juillet 2010 à 11:34 +0800, Changli Gao a écrit :
>> iptables: use skb->len for accounting
>>
>> use skb->len for accounting as xt_quota does.
>>
>
> Why ?
>
> This is a gratuitous change, unless you have very strong arguments.
>
> xt_quota is an exception, dont change all others because of it !

exception ? Why ?

>
> It is about actual data on wire, including overhead (excess bytes after
> IP frame if any).
>
> But IP tables accounting is about IP only.
>

Is the assumption that: the packets in netfilter have the padding
bytes removed, and skb->data points to the network header? Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] ip6tables: use skb->len for accounting
From: Changli Gao @ 2010-07-23  6:38 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	netfilter-devel, netdev
In-Reply-To: <alpine.LSU.2.01.1007230813300.815@obet.zrqbmnf.qr>

On Fri, Jul 23, 2010 at 2:16 PM, Jan Engelhardt <jengelh@medozas.de> wrote:
>
>
> I wonder how this fares with trailing padding or data, like, when
> you have a standard v4/v6 packet created in a raw socket, and append
> a bunch of \0s to it.
>
>

For the packets received, ip_rcv, ipv6_rcv and bridge all call
pskb_trim_rcsum before feeding them to netfilter. The raw packets are
sent via dev_queue_xmit(), and they don't pass through the output path
of netfilter. One case, maybe the queued packets mangled "wrongly" in
userspace are reinjected, however, we can't prevent a user from
changing the payload_len wrongly.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] iptables: use skb->len for accounting
From: Eric Dumazet @ 2010-07-23  6:29 UTC (permalink / raw)
  To: Changli Gao; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279856088-9004-1-git-send-email-xiaosuo@gmail.com>

Le vendredi 23 juillet 2010 à 11:34 +0800, Changli Gao a écrit :
> iptables: use skb->len for accounting
> 
> use skb->len for accounting as xt_quota does.
> 

Why ?

This is a gratuitous change, unless you have very strong arguments.

xt_quota is an exception, dont change all others because of it !

It is about actual data on wire, including overhead (excess bytes after
IP frame if any).

But IP tables accounting is about IP only.

> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/ipv4/netfilter/ip_tables.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
> index b38c118..3c584a6 100644
> --- a/net/ipv4/netfilter/ip_tables.c
> +++ b/net/ipv4/netfilter/ip_tables.c
> @@ -364,7 +364,7 @@ ipt_do_table(struct sk_buff *skb,
>  				goto no_match;
>  		}
>  
> -		ADD_COUNTER(e->counters, ntohs(ip->tot_len), 1);
> +		ADD_COUNTER(e->counters, skb->len, 1);
>  
>  		t = ipt_get_target(e);
>  		IP_NF_ASSERT(t->u.kernel.target);
> --



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] xt_quota: don't copy quota back to userspace
From: Eric Dumazet @ 2010-07-23  6:28 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Changli Gao, Patrick McHardy, David S. Miller, netfilter-devel,
	netdev
In-Reply-To: <alpine.LSU.2.01.1007230818260.815@obet.zrqbmnf.qr>

Le vendredi 23 juillet 2010 à 08:20 +0200, Jan Engelhardt a écrit :
> On Friday 2010-07-23 06:54, Changli Gao wrote:
> 
> >This patch should be applied after my another patch:
> >http://patchwork.ozlabs.org/patch/59729/
> >
> >xt_quota: don't copy quota back to userspace
> >
> >In nowadays, table entries are per-cpu variables, so it don't make any 
> >sense to copy quota back to one of the variable instances. To keep 
> >things simple, this patch undo the copy.
> 
> I object. This line is on purpose, to give at least a chance of 
> reporting back a more-or-less believable value. Without copying
> the value back, users have moaned about the counter not decreasing
> _at all_.

Maybe, but current situation is buggy.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] xt_quota: don't copy quota back to userspace
From: Jan Engelhardt @ 2010-07-23  6:20 UTC (permalink / raw)
  To: Changli Gao; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279860845-7177-1-git-send-email-xiaosuo@gmail.com>

On Friday 2010-07-23 06:54, Changli Gao wrote:

>This patch should be applied after my another patch:
>http://patchwork.ozlabs.org/patch/59729/
>
>xt_quota: don't copy quota back to userspace
>
>In nowadays, table entries are per-cpu variables, so it don't make any 
>sense to copy quota back to one of the variable instances. To keep 
>things simple, this patch undo the copy.

I object. This line is on purpose, to give at least a chance of 
reporting back a more-or-less believable value. Without copying
the value back, users have moaned about the counter not decreasing
_at all_.

^ permalink raw reply

* Re: [PATCH] ip6tables: use skb->len for accounting
From: Jan Engelhardt @ 2010-07-23  6:16 UTC (permalink / raw)
  To: Changli Gao
  Cc: Patrick McHardy, David S. Miller, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	netfilter-devel, netdev
In-Reply-To: <1279855877-8945-1-git-send-email-xiaosuo@gmail.com>


On Friday 2010-07-23 05:31, Changli Gao wrote:

>ip6tables: use skb->len for accounting
>
>ipv6_hdr(skb)->payload_len is ZERO and can't be used for accounting, if the
>payload is a Jumbo Payload specified in RFC2675.
>
>Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>----
> net/ipv6/netfilter/ip6_tables.c |    4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
>index dc41d6d..33113c1 100644
>--- a/net/ipv6/netfilter/ip6_tables.c
>+++ b/net/ipv6/netfilter/ip6_tables.c
>@@ -387,9 +387,7 @@ ip6t_do_table(struct sk_buff *skb,
> 				goto no_match;
> 		}
> 
>-		ADD_COUNTER(e->counters,
>-			    ntohs(ipv6_hdr(skb)->payload_len) +
>-			    sizeof(struct ipv6hdr), 1);
>+		ADD_COUNTER(e->counters, skb->len, 1);
> 
> 		t = ip6t_get_target_c(e);
> 		IP_NF_ASSERT(t->u.kernel.target);
>--

I wonder how this fares with trailing padding or data, like, when
you have a standard v4/v6 packet created in a raw socket, and append
a bunch of \0s to it.


^ permalink raw reply

* Re: [PATCH] xt_quota: don't copy quota back to userspace
From: Changli Gao @ 2010-07-23  5:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279863633.2482.22.camel@edumazet-laptop>

On Fri, Jul 23, 2010 at 1:40 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Reading again your patch, I understand only the Changelog is wrong.
>
> We want to copy quota back to userspace, as specified when rule was
> setup (so that iptables-save works)
>
> The real thing you are doing is that we dont change the initial quota
> during packet processing, only the private quota, shared by all cpus.
>
> Before the patch , iptables -nvL could report an old and not accurate
> quota value.
>
> After the patch, iptables -nvL reports the initial quota value, not the
> actual value.
>

Yes. This module was expected to report the current quota value, when
iptables -nvL or iptables-save  was executed. It seems my patch
changes its ABI. However, report the initial value is also meaningful,
and it keeps the same behavior as the other modules do.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: IPv6 Anycast?
From: Mikael Abrahamsson @ 2010-07-23  5:51 UTC (permalink / raw)
  To: Stuart Sheldon; +Cc: netdev
In-Reply-To: <4C491AD4.9070103@actusa.net>

On Thu, 22 Jul 2010, Stuart Sheldon wrote:

> Yea, I'm sure...
>
> We use Linux for routers as well as servers and workstations. Since I
> sent this I've discovered that by default, when a Linux system has IPv6
> forwarding turned on, it adds the <network>::/64 anycast router
> addresses on all the interfaces (as per rfc 2526).
>
> What I'm actually looking to do is (change / add) other addresses the to
> anycast6 list to work in an existing configuration that does not use the
> rfc 2526 anycast router address.
>
> Is there a command line method of setting up these anycast addresses?

In routers, this is done by adding the IP address to a loopback interface 
and announcing the address using a routing protocol, I don't see why this 
can't be done on a linux box?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply

* Re: [PATCH] LSM: Add post recvmsg() hook.
From: David Miller @ 2010-07-23  5:44 UTC (permalink / raw)
  To: penguin-kernel
  Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, paul.moore, netdev,
	linux-security-module
In-Reply-To: <201007230022.o6N0MKLt053955@www262.sakura.ne.jp>

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 23 Jul 2010 09:22:20 +0900

> David Miller wrote:
>> The fact is going to remain that you will be unable to return data
>> from recvmsg() to a blocking socket when ->poll() returns true even
>> though data is in fact there in the socket receive queue.
>> 
>> This is something that the existing LSM hooks do not do.
> 
> This is something that the existing security_socket_recvmsg() hook does do.
> SELinux is unable to return data from recvmsg() to a blocking socket when
> ->poll() returns true even though data is in fact there in the socket receive
> queue.
> We agreed below situation, didn't we?

Existing LSM hook returns an error early, as if the user passed in
incorrect parameters or similar.

It's completely stateless and dependent upon purely the labels
associated with state visible on recvmsg() entry, and independent
of other things such as attributes in the packets contained in
the socket's receive queue.

^ permalink raw reply

* Re: [PATCH] xt_quota: don't copy quota back to userspace
From: Changli Gao @ 2010-07-23  5:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279862843.2482.16.camel@edumazet-laptop>

On Fri, Jul 23, 2010 at 1:27 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 23 juillet 2010 à 12:54 +0800, Changli Gao a écrit :
>> This patch should be applied after my another patch:
>> http://patchwork.ozlabs.org/patch/59729/
>>
>> xt_quota: don't copy quota back to userspace
>>
>> In nowadays, table entries are per-cpu variables, so it don't make any sense to
>> copy quota back to one of the variable instances. To keep things simple, this
>> patch undo the copy.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>
> This looks the wrong way to fix this problem.
>
> Also Changli, could you please _not_ include the title of your patches
> inside the Changelog ? This is useless.
>

Thanks. I remembered  some document mentioned a summary line was
needed. I'll follow your advice.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] xt_quota: don't copy quota back to userspace
From: Eric Dumazet @ 2010-07-23  5:40 UTC (permalink / raw)
  To: Changli Gao; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279862843.2482.16.camel@edumazet-laptop>

Le vendredi 23 juillet 2010 à 07:27 +0200, Eric Dumazet a écrit :
> Le vendredi 23 juillet 2010 à 12:54 +0800, Changli Gao a écrit :
> > This patch should be applied after my another patch:
> > http://patchwork.ozlabs.org/patch/59729/
> > 
> > xt_quota: don't copy quota back to userspace
> > 
> > In nowadays, table entries are per-cpu variables, so it don't make any sense to
> > copy quota back to one of the variable instances. To keep things simple, this
> > patch undo the copy.
> > 
> > Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> 
> This looks the wrong way to fix this problem.
> 
> Also Changli, could you please _not_ include the title of your patches
> inside the Changelog ? This is useless.
> 

Reading again your patch, I understand only the Changelog is wrong.

We want to copy quota back to userspace, as specified when rule was
setup (so that iptables-save works)

The real thing you are doing is that we dont change the initial quota
during packet processing, only the private quota, shared by all cpus.

Before the patch , iptables -nvL could report an old and not accurate
quota value.

After the patch, iptables -nvL reports the initial quota value, not the
actual value.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH v3 1/5] irq: add tracepoint to softirq_raise
From: KOSAKI Motohiro @ 2010-07-23  5:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: kosaki.motohiro, Koki Sanagi, netdev, linux-kernel, davem,
	kaneshige.kenji, izumi.taku, laijs, scott.a.mcmillan, rostedt,
	eric.dumazet, fweisbec, mathieu.desnoyers
In-Reply-To: <20100721135654.GE21259@hmsreliant.think-freely.org>

> On Wed, Jul 21, 2010 at 10:01:34PM +0900, KOSAKI Motohiro wrote:
> > > > >>  #endif /*  _TRACE_IRQ_H */
> > > > >> diff --git a/kernel/softirq.c b/kernel/softirq.c
> > > > >> index 825e112..6790599 100644
> > > > >> --- a/kernel/softirq.c
> > > > >> +++ b/kernel/softirq.c
> > > > >> @@ -215,9 +215,9 @@ restart:
> > > > >>  			int prev_count = preempt_count();
> > > > >>  			kstat_incr_softirqs_this_cpu(h - softirq_vec);
> > > > >>  
> > > > >> -			trace_softirq_entry(h, softirq_vec);
> > > > >> +			trace_softirq_entry(h - softirq_vec);
> > > > >>  			h->action(h);
> > > > >> -			trace_softirq_exit(h, softirq_vec);
> > > > >> +			trace_softirq_exit(h - softirq_vec);
> > > > > 
> > > > > You're loosing information here by reducing the numbers of parameters in this
> > > > > tracepoint.  How many other tracepoint scripts rely on having both pointers
> > > > > handy?  Why not just do the pointer math inside your tracehook instead?
> > > > 
> > > > In __raise_softirq_irqoff macro there is no method to refer softirq_vec, so it
> > > > can't use softirq DECLARE_EVENT_CLASS as is.
> > > > Currently,  there is no script using softirq_entry or softirq_exit.
> > > > 
> > > That shouldn't matter, just pass in NULL for softirq_vec in
> > > __raise_softirq_irqoff as the second argument to the trace function.  You may
> > > need to fix up the class definition so that the assignment or printk doesn't try
> > > to dereference that pointer when its NULL, but thats easy enough, and it avoids
> > > breaking any other perf scripts floating out there.
> > 
> > please see 5 lines above. we already have 'h - softirq_vec' calculation in
> > this function. so, Sanagi-san's change don't makes any overhead.
> > 
> > So, if the overhead is zero, I'd prefer simplest tracepoint definition :)
> > 
> I never complained about performance here, I complained about information loss.
> You have a tracepoint that provides two arguments here, and you're eliminating
> one of them.  That will potentially break other users of this tracepoint.  I
> understand we don't normally care about that with tracepoints as much, but if we
> can avoid it, why don't we?

I see. I have no objection.

Thanks.

^ permalink raw reply

* Re: [PATCH] xt_quota: don't copy quota back to userspace
From: Eric Dumazet @ 2010-07-23  5:27 UTC (permalink / raw)
  To: Changli Gao; +Cc: Patrick McHardy, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1279860845-7177-1-git-send-email-xiaosuo@gmail.com>

Le vendredi 23 juillet 2010 à 12:54 +0800, Changli Gao a écrit :
> This patch should be applied after my another patch:
> http://patchwork.ozlabs.org/patch/59729/
> 
> xt_quota: don't copy quota back to userspace
> 
> In nowadays, table entries are per-cpu variables, so it don't make any sense to
> copy quota back to one of the variable instances. To keep things simple, this
> patch undo the copy.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

This looks the wrong way to fix this problem.

Also Changli, could you please _not_ include the title of your patches
inside the Changelog ? This is useless.


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next-2.6] net: pskb_expand_head() optimization
From: Eric Dumazet @ 2010-07-23  5:09 UTC (permalink / raw)
  To: Andrea Shepard, David Miller; +Cc: netdev
In-Reply-To: <20100722191234.GA832@cronus.persephoneslair.org>

Le jeudi 22 juillet 2010 à 12:12 -0700, Andrea Shepard a écrit :
> Make pskb_expand_head() check ip_summed to make sure csum_start is really
> csum_start and not csum before adjusting it.
> 
> This fixes a bug I encountered using a Sun Quad-Fast Ethernet card and VLANs.
> On my configuration, the sunhme driver produces skbs with differing amounts
> of headroom on receive depending on the packet size.  See line 2030 of
> drivers/net/sunhme.c; packets smaller than RX_COPY_THRESHOLD have 52 bytes
> of headroom but packets larger than that cutoff have only 20 bytes.
> 
> When these packets reach the VLAN driver, vlan_check_reorder_header()
> calls skb_cow(), which, if the packet has less than NET_SKB_PAD (== 32) bytes
> of headroom, uses pskb_expand_head() to make more.
> 
> Then, pskb_expand_head() needs to adjust a lot of offsets into the skb,
> including csum_start.  Since csum_start is a union with csum, if the packet
> has a valid csum value this will corrupt it, which was the effect I observed.
> The sunhme hardware computes receive checksums, so the skbs would be created
> by the driver with ip_summed == CHECKSUM_COMPLETE and a valid csum field, and
> then pskb_expand_head() would corrupt the csum field, leading to an "hw csum
> error" message later on, for example in icmp_rcv() for pings larger than the
> sunhme RX_COPY_THRESHOLD.
> 
> On the basis of the comment at the beginning of include/linux/skbuff.h,
> I believe that the csum_start skb field is only meaningful if ip_csummed is
> CSUM_PARTIAL, so this patch makes pskb_expand_head() adjust it only in that
> case to avoid corrupting a valid csum value.
> 
> Please see my more in-depth disucssion of tracking down this bug for
> more details if you like:
> 
> http://puellavulnerata.livejournal.com/112186.html
> http://puellavulnerata.livejournal.com/112567.html
> http://puellavulnerata.livejournal.com/112891.html
> http://puellavulnerata.livejournal.com/113096.html
> http://puellavulnerata.livejournal.com/113591.html
> 
> I am not subscribed to this list, so please CC me on replies.
> 

Excellent Changelog Andrea, thanks !

Reviewing pskb_expand_head(), I see it copy the whole struct
skb_shared_info, even if source contains garbage (uninitialized data).

I wonder why it is not triggering kmemcheck alarms

[PATCH net-next-2.6] net: pskb_expand_head() optimization

Move frags[] at the end of struct skb_shared_info, and make
pskb_expand_head() copy only the used part of it instead of whole array.

This should avoid kmemcheck warnings and speedup pskb_expand_head() as
well, avoiding a lot of cache misses.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/skbuff.h |    3 ++-
 net/core/skbuff.c      |    2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f5aa87e..d89876b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -202,10 +202,11 @@ struct skb_shared_info {
 	 */
 	atomic_t	dataref;
 
-	skb_frag_t	frags[MAX_SKB_FRAGS];
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
 	void *		destructor_arg;
+	/* must be last field, see pskb_expand_head() */
+	skb_frag_t	frags[MAX_SKB_FRAGS];
 };
 
 /* We divide dataref into two halves.  The higher 16 bits hold references
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 76d33ca..7da58a2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -817,7 +817,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	memcpy(data + nhead, skb->head, skb->tail - skb->head);
 #endif
 	memcpy(data + size, skb_end_pointer(skb),
-	       sizeof(struct skb_shared_info));
+	       offsetof(struct skb_shared_info, frags[skb_shinfo(skb)->nr_frags]));
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
 		get_page(skb_shinfo(skb)->frags[i].page);



^ permalink raw reply related

* linux-next: manual merge of the staging-next tree with the net tree
From: Stephen Rothwell @ 2010-07-23  5:03 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-next, linux-kernel, Eric Dumazet, David Miller, netdev,
	Sven Eckelmann

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

Hi Greg,

Today's linux-next merge of the staging-next tree got a conflict in
drivers/staging/batman-adv/hard-interface.c between commit
28172739f0a276eb8d6ca917b3974c2edb036da3 ("net: fix 64 bit counters on 32
bit arches") from the net tree and commit
a1a38cad4c71bc08403b204fbe0ba98b4447f8bf ("Staging: batman-adv: Don't
increment stats of foreign device") from the staging-next tree.

The latter just removes the code fixed by the former, so I did that.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* [PATCH] xt_quota: don't copy quota back to userspace
From: Changli Gao @ 2010-07-23  4:54 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, netfilter-devel, netdev, Changli Gao

This patch should be applied after my another patch:
http://patchwork.ozlabs.org/patch/59729/

xt_quota: don't copy quota back to userspace

In nowadays, table entries are per-cpu variables, so it don't make any sense to
copy quota back to one of the variable instances. To keep things simple, this
patch undo the copy.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/linux/netfilter/xt_quota.h |    2 +-
 net/netfilter/xt_quota.c           |    2 --
 2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/include/linux/netfilter/xt_quota.h b/include/linux/netfilter/xt_quota.h
index 8dc89df..b0d28c6 100644
--- a/include/linux/netfilter/xt_quota.h
+++ b/include/linux/netfilter/xt_quota.h
@@ -11,9 +11,9 @@ struct xt_quota_priv;
 struct xt_quota_info {
 	u_int32_t		flags;
 	u_int32_t		pad;
+	aligned_u64		quota;
 
 	/* Used internally by the kernel */
-	aligned_u64		quota;
 	struct xt_quota_priv	*master;
 };
 
diff --git a/net/netfilter/xt_quota.c b/net/netfilter/xt_quota.c
index 304b1fd..70eb2b4 100644
--- a/net/netfilter/xt_quota.c
+++ b/net/netfilter/xt_quota.c
@@ -36,8 +36,6 @@ quota_mt(const struct sk_buff *skb, struct xt_action_param *par)
 		/* we do not allow even small packets from now on */
 		priv->quota = 0;
 	}
-	/* Copy quota back to matchinfo so that iptables can display it */
-	q->quota = priv->quota;
 	spin_unlock_bh(&priv->lock);
 
 	return ret;

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox