Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next] r8169: use unlimited DMA burst for TX
From: Francois Romieu @ 2012-09-14  5:19 UTC (permalink / raw)
  To: Michal Schmidt; +Cc: netdev, Hayes Wang, Ivan Vecera
In-Reply-To: <1347234926-5263-1-git-send-email-mschmidt@redhat.com>

Michal Schmidt <mschmidt@redhat.com> :
[...]
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Acked-by: Francois Romieu <romieu@fr.zoreil.com>

-- 
Ueimor

^ permalink raw reply

* linux-next: manual merge of the workqueues tree with the net tree
From: Stephen Rothwell @ 2012-09-14  5:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-next, linux-kernel, David Miller, netdev, Karsten Keil

[-- Attachment #1: Type: text/plain, Size: 523 bytes --]

Hi Tejun,

Today's linux-next merge of the workqueues tree got a conflict in
drivers/isdn/mISDN/hwchannel.c between commit 4b921eda5336 ("mISDN: Fix
wrong usage of flush_work_sync while holding locks") from the  tree and
commit 43829731dd37 ("workqueue: deprecate flush[_delayed]_work_sync()")
from the workqueues tree.

The former supercedes the latter (I think) so I used that and can carry
the fix as necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH] bnx2x: fix rx checksum validation for IPv6
From: Eilon Greenstein @ 2012-09-14  5:20 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: Eric Dumazet, netdev, Eric Dumazet, Yaniv Rosner, Yuval Mintz,
	Merav Sicron, Robert Evans, Tom Herbert, Willem de Bruijn,
	David Miller, Havard Skinnemoen
In-Reply-To: <1347578079.8555.141.camel@edumazet-glaptop>

On Fri, 2012-09-14 at 01:14 +0200, Eric Dumazet wrote:
> On Fri, 2012-09-14 at 00:59 +0200, Michal Schmidt wrote:
> > Commit d6cb3e41 "bnx2x: fix checksum validation" caused a performance
> > regression for IPv6. Rx checksum offload does not work. IPv6 packets
> > are passed to the stack with CHECKSUM_NONE.
> > 
> > The hardware obviously cannot perform IP checksum validation for IPv6,
> > because there is no checksum in the IPv6 header. This should not prevent
> > us from setting CHECKSUM_UNNECESSARY.
> > 
> > Tested on BCM57711.
> > 
> > Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> > ---
> >  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> > index af20c6e..e8e97a7 100644
> > --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> > +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> > @@ -662,14 +662,16 @@ void bnx2x_csum_validate(struct sk_buff *skb, union eth_rx_cqe *cqe,
> >  				 struct bnx2x_fastpath *fp,
> >  				 struct bnx2x_eth_q_stats *qstats)
> >  {
> > -	/* Do nothing if no IP/L4 csum validation was done */
> > -
> > +	/* Do nothing if no L4 csum validation was done.
> > +	 * We do not check whether IP csum was validated. For IPv4 we assume
> > +	 * that if the card got as far as validating the L4 csum, it also
> > +	 * validated the IP csum. IPv6 has no IP csum.
> > +	 */
> >  	if (cqe->fast_path_cqe.status_flags &
> > -	    (ETH_FAST_PATH_RX_CQE_IP_XSUM_NO_VALIDATION_FLG |
> > -	     ETH_FAST_PATH_RX_CQE_L4_XSUM_NO_VALIDATION_FLG))
> > +	    ETH_FAST_PATH_RX_CQE_L4_XSUM_NO_VALIDATION_FLG)
> >  		return;
> >  
> > -	/* If both IP/L4 validation were done, check if an error was found. */
> > +	/* If L4 validation was done, check if an error was found. */
> >  
> >  	if (cqe->fast_path_cqe.type_error_flags &
> >  	    (ETH_FAST_PATH_RX_CQE_IP_BAD_XSUM_FLG |
> 
> Thanks for fixing this bug !
> 
> Acked-by: Eric Dumazet <edumazet@google.com>

Indeed - thanks Michal!

Acked-by: Eilon Greenstein <eilong@broadcom.com>

^ permalink raw reply

* Re: [PATCH] netfilter: Allow xt_nat.c and x_tables.c to compiled in
From: Cong Wang @ 2012-09-14  3:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev
In-Reply-To: <87627hfi69.fsf@xmission.com>

On Fri, 14 Sep 2012 at 02:32 GMT, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> xt_init in x_tables.c must be called before xt_nat_init in xt_nat.c
> Reorder the makefile so that x_tables.o comes before xt_nat.o in
> netfilter.o.
>
> This allows me to built a kernel with both of these modules compiled in.
>

There is a patch to fix the same issue:
http://1984.lsi.us.es/git/nf-next/commit/?id=00545bec9412d130c77f72a08d6c8b6ad21d4a1


^ permalink raw reply

* [PATCH] netfilter: Allow xt_nat.c and x_tables.c to compiled in
From: Eric W. Biederman @ 2012-09-14  2:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, netfilter-devel, Patrick McHardy


xt_init in x_tables.c must be called before xt_nat_init in xt_nat.c
Reorder the makefile so that x_tables.o comes before xt_nat.o in
netfilter.o.

This allows me to built a kernel with both of these modules compiled in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 net/netfilter/Makefile |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 98244d4..1f652b6 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -43,6 +43,9 @@ obj-$(CONFIG_NF_CONNTRACK_SANE) += nf_conntrack_sane.o
 obj-$(CONFIG_NF_CONNTRACK_SIP) += nf_conntrack_sip.o
 obj-$(CONFIG_NF_CONNTRACK_TFTP) += nf_conntrack_tftp.o
 
+# generic X tables 
+obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
+
 nf_nat-y	:= nf_nat_core.o nf_nat_proto_unknown.o nf_nat_proto_common.o \
 		   nf_nat_proto_udp.o nf_nat_proto_tcp.o nf_nat_helper.o
 
@@ -64,9 +67,6 @@ obj-$(CONFIG_NF_NAT_TFTP) += nf_nat_tftp.o
 # transparent proxy support
 obj-$(CONFIG_NETFILTER_TPROXY) += nf_tproxy_core.o
 
-# generic X tables 
-obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
-
 # combos
 obj-$(CONFIG_NETFILTER_XT_MARK) += xt_mark.o
 obj-$(CONFIG_NETFILTER_XT_CONNMARK) += xt_connmark.o
-- 
1.7.5.4


^ permalink raw reply related

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2012-09-14  1:18 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Eric Dumazet, Eric W. Biederman

[-- Attachment #1: Type: text/plain, Size: 1713 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in
net/netfilter/nfnetlink_log.c between commit 0626af313957 ("netfilter:
take care of timewait sockets") from the  tree and commit 9eea9515cb5f
("userns: nfnetlink_log: Report socket uids in the log sockets user
namespace") from the net-next tree.

Just context changes. I fixed it up (see below) and can carry the fix as
necessary (no action is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc net/netfilter/nfnetlink_log.c
index 5cfb5be,8cb67c4..0000000
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@@ -500,14 -501,16 +502,17 @@@ __build_packet_message(struct nfulnl_in
  	}
  
  	/* UID */
 -	if (skb->sk) {
 -		read_lock_bh(&skb->sk->sk_callback_lock);
 -		if (skb->sk->sk_socket && skb->sk->sk_socket->file) {
 -			struct file *file = skb->sk->sk_socket->file;
 +	sk = skb->sk;
 +	if (sk && sk->sk_state != TCP_TIME_WAIT) {
 +		read_lock_bh(&sk->sk_callback_lock);
 +		if (sk->sk_socket && sk->sk_socket->file) {
 +			struct file *file = sk->sk_socket->file;
- 			__be32 uid = htonl(file->f_cred->fsuid);
- 			__be32 gid = htonl(file->f_cred->fsgid);
+ 			__be32 uid = htonl(from_kuid_munged(inst->peer_user_ns,
+ 							    file->f_cred->fsuid));
+ 			__be32 gid = htonl(from_kgid_munged(inst->peer_user_ns,
+ 							    file->f_cred->fsgid));
+ 			/* need to unlock here since NLA_PUT may goto */
 -			read_unlock_bh(&skb->sk->sk_callback_lock);
 +			read_unlock_bh(&sk->sk_callback_lock);
  			if (nla_put_be32(inst->skb, NFULA_UID, uid) ||
  			    nla_put_be32(inst->skb, NFULA_GID, gid))
  				goto nla_put_failure;

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2012-09-14  1:17 UTC (permalink / raw)
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Eric W. Biederman, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]

Hi all,

Today's linux-next merge of the net-next tree got a conflict in
net/netfilter/xt_LOG.c between commit 0626af313957 ("netfilter: take care
of timewait sockets") from the net tree and commit 8c6e2a941ae7 ("userns:
Convert xt_LOG to print socket kuids and kgids as uids and gids") from
the net-next tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc net/netfilter/xt_LOG.c
index 91e9af4,02a2bf4..0000000
--- a/net/netfilter/xt_LOG.c
+++ b/net/netfilter/xt_LOG.c
@@@ -145,19 -145,6 +145,21 @@@ static int dump_tcp_header(struct sbuf
  	return 0;
  }
  
 +static void dump_sk_uid_gid(struct sbuff *m, struct sock *sk)
 +{
 +	if (!sk || sk->sk_state == TCP_TIME_WAIT)
 +		return;
 +
 +	read_lock_bh(&sk->sk_callback_lock);
 +	if (sk->sk_socket && sk->sk_socket->file) {
++		const struct cred *cred = sk->sk_socket->file->f_cred;
 +		sb_add(m, "UID=%u GID=%u ",
- 			sk->sk_socket->file->f_cred->fsuid,
- 			sk->sk_socket->file->f_cred->fsgid);
++			from_kuid_munged(&init_user_ns, cred->fsuid),
++			from_kgid_munged(&init_user_ns, cred->fsgid));
++	}
 +	read_unlock_bh(&sk->sk_callback_lock);
 +}
 +
  /* One level of recursion won't kill us */
  static void dump_ipv4_packet(struct sbuff *m,
  			const struct nf_loginfo *info,

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH net-next] r8169: use unlimited DMA burst for TX
From: Michal Schmidt @ 2012-09-13 23:27 UTC (permalink / raw)
  To: 'Francois Romieu'
  Cc: hayeswang, 'David Miller', netdev, ivecera
In-Reply-To: <CAF7E57619E34A17A56F6D13097342EA@realtek.com.tw>

On 09/11/2012 10:09 AM, hayeswang wrote:
> [Francois Romieu wrote:]
>> Hayes, should we:
>> - mimic Realtek's 8168, 8169 and 810x drivers ?
>> - always set TX_DMA_BURST at the max value ?
>> - do something different (per chipset) ?
>
> Our hw engineer suggets to set unlimited for both TX_DMA_BURST and RX_DMA_BURST
> for all chipsets.

Francois,
as this is exactly what the patch does, would you give an ACK?

Michal

^ permalink raw reply

* Re: [PATCH] bnx2x: fix rx checksum validation for IPv6
From: Eric Dumazet @ 2012-09-13 23:14 UTC (permalink / raw)
  To: Michal Schmidt
  Cc: netdev, Eilon Greenstein, Eric Dumazet, Yaniv Rosner, Yuval Mintz,
	Merav Sicron, Robert Evans, Tom Herbert, Willem de Bruijn,
	David Miller, Havard Skinnemoen
In-Reply-To: <1347577184-8417-1-git-send-email-mschmidt@redhat.com>

On Fri, 2012-09-14 at 00:59 +0200, Michal Schmidt wrote:
> Commit d6cb3e41 "bnx2x: fix checksum validation" caused a performance
> regression for IPv6. Rx checksum offload does not work. IPv6 packets
> are passed to the stack with CHECKSUM_NONE.
> 
> The hardware obviously cannot perform IP checksum validation for IPv6,
> because there is no checksum in the IPv6 header. This should not prevent
> us from setting CHECKSUM_UNNECESSARY.
> 
> Tested on BCM57711.
> 
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> index af20c6e..e8e97a7 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
> @@ -662,14 +662,16 @@ void bnx2x_csum_validate(struct sk_buff *skb, union eth_rx_cqe *cqe,
>  				 struct bnx2x_fastpath *fp,
>  				 struct bnx2x_eth_q_stats *qstats)
>  {
> -	/* Do nothing if no IP/L4 csum validation was done */
> -
> +	/* Do nothing if no L4 csum validation was done.
> +	 * We do not check whether IP csum was validated. For IPv4 we assume
> +	 * that if the card got as far as validating the L4 csum, it also
> +	 * validated the IP csum. IPv6 has no IP csum.
> +	 */
>  	if (cqe->fast_path_cqe.status_flags &
> -	    (ETH_FAST_PATH_RX_CQE_IP_XSUM_NO_VALIDATION_FLG |
> -	     ETH_FAST_PATH_RX_CQE_L4_XSUM_NO_VALIDATION_FLG))
> +	    ETH_FAST_PATH_RX_CQE_L4_XSUM_NO_VALIDATION_FLG)
>  		return;
>  
> -	/* If both IP/L4 validation were done, check if an error was found. */
> +	/* If L4 validation was done, check if an error was found. */
>  
>  	if (cqe->fast_path_cqe.type_error_flags &
>  	    (ETH_FAST_PATH_RX_CQE_IP_BAD_XSUM_FLG |

Thanks for fixing this bug !

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [net-next.git 3/8 (V2)] stmmac: add the initial tx coalesce schema
From: Ben Hutchings @ 2012-09-13 23:11 UTC (permalink / raw)
  To: David Miller; +Cc: peppe.cavallaro, netdev
In-Reply-To: <20120913.173727.314155374058895289.davem@davemloft.net>

On Thu, 2012-09-13 at 17:37 -0400, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Thu, 13 Sep 2012 22:10:50 +0100
> 
> > On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote:
> >> From: Ben Hutchings <bhutchings@solarflare.com>
> >> Date: Thu, 13 Sep 2012 21:42:51 +0100
> >> 
> >> Well written NAPI drivers never need to disable hardware interrupts
> >> in their ->poll() method and it's callers, neither should you.
> > 
> > Perhaps you should get round to reviewing netpoll, because it does
> > exactly this.
> 
> Then I don't understand the point you're trying to make.
> 
> Hardware interrupt disabling has absolutely no place in the
> NAPI polling fast paths.
> 
> If NAPI drivers can't be implemented without hardware interrupt
> toggling in ->poll(), we've failed.

Right.

The problem being that NAPI poll functions *are* sometimes called in
hardware interrupt context.  Thus, any spinlock that may be taken by a
NAPI handler, may well need to be taken with spinlock_irq or
spinlock_irqsave elsewhere.  (This is horrible and I think it's well
past time that we ripped the NAPI polling out of netpoll.)

I think you're right that stmmac_tx() (completion handler?) doesn't need
to disable hardware interrupts, but sadly stmmac_xmit() does right now
unless Giuseppe can work out how to make their interaction lockless.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [PATCH] bnx2x: fix rx checksum validation for IPv6
From: Michal Schmidt @ 2012-09-13 22:59 UTC (permalink / raw)
  To: netdev
  Cc: Eilon Greenstein, Eric Dumazet, Yaniv Rosner, Yuval Mintz,
	Merav Sicron, Robert Evans, Tom Herbert, Willem de Bruijn,
	David Miller

Commit d6cb3e41 "bnx2x: fix checksum validation" caused a performance
regression for IPv6. Rx checksum offload does not work. IPv6 packets
are passed to the stack with CHECKSUM_NONE.

The hardware obviously cannot perform IP checksum validation for IPv6,
because there is no checksum in the IPv6 header. This should not prevent
us from setting CHECKSUM_UNNECESSARY.

Tested on BCM57711.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index af20c6e..e8e97a7 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -662,14 +662,16 @@ void bnx2x_csum_validate(struct sk_buff *skb, union eth_rx_cqe *cqe,
 				 struct bnx2x_fastpath *fp,
 				 struct bnx2x_eth_q_stats *qstats)
 {
-	/* Do nothing if no IP/L4 csum validation was done */
-
+	/* Do nothing if no L4 csum validation was done.
+	 * We do not check whether IP csum was validated. For IPv4 we assume
+	 * that if the card got as far as validating the L4 csum, it also
+	 * validated the IP csum. IPv6 has no IP csum.
+	 */
 	if (cqe->fast_path_cqe.status_flags &
-	    (ETH_FAST_PATH_RX_CQE_IP_XSUM_NO_VALIDATION_FLG |
-	     ETH_FAST_PATH_RX_CQE_L4_XSUM_NO_VALIDATION_FLG))
+	    ETH_FAST_PATH_RX_CQE_L4_XSUM_NO_VALIDATION_FLG)
 		return;
 
-	/* If both IP/L4 validation were done, check if an error was found. */
+	/* If L4 validation was done, check if an error was found. */
 
 	if (cqe->fast_path_cqe.type_error_flags &
 	    (ETH_FAST_PATH_RX_CQE_IP_BAD_XSUM_FLG |
-- 
1.7.11.4

^ permalink raw reply related

* [PATCH] ipconfig: Inform user if carrier is not ready
From: Erwan Velu @ 2012-09-13 21:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120913.164525.1171098883605242394.davem@davemloft.net>

From: Erwan Velu <erwanaliasr1@gmail.com>

While using the ip= option at the cmdline, the kernel can hold the boot
process for 2 minutes (CONF_CARRIER_TIMEOUT) if the carrier is not
present.

While waiting the carrier, user is not informed about this situation and
so could think the kernel is frozen.

This patch is just adding a simple message every second telling we are
waiting the carrier to come up.
---
  net/ipv4/ipconfig.c |    8 ++++++++
  1 file changed, 8 insertions(+)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 67e8a6b..d9f34b7 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -205,6 +205,7 @@ static int __init ic_open_devs(void)
      struct net_device *dev;
      unsigned short oflags;
      unsigned long start;
+    unsigned int loops=0;

      last = &ic_first_dev;
      rtnl_lock();
@@ -266,6 +267,13 @@ static int __init ic_open_devs(void)
              if (ic_is_init_dev(dev) && netif_carrier_ok(dev))
                  goto have_carrier;

+        loops++;
+        /* This loop is blocking the boot process until we get the 
carrier or reach the timeout.
+         * We have to inform the user about the situation as it could 
look like a kernel freeze.
+         * Every second, we display a short message indicating we wait 
the carrier */
+        if ((loops % 1000) == 0) {
+            pr_info("IP-Config: Waiting Carrier (%d/%d):\n",loops / 
1000, CONF_CARRIER_TIMEOUT / 1000);
+        }
          msleep(1);
      }
  have_carrier:
-- 
1.7.10

^ permalink raw reply related

* Re: [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes
From: Eric Dumazet @ 2012-09-13 21:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, lorenzo, maze, therbert, willemb
In-Reply-To: <20120913.171305.713716058425991240.davem@davemloft.net>

On Thu, 2012-09-13 at 17:13 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 13 Sep 2012 05:15:58 +0200
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > We have special handling of SIT devices in addrconf_prefix_route()
> > to avoid allocating a neighbour for each destination.
> > 
> > If routing entry is :
> > 
> > ip -6 route add 2001:db8::/64 dev sit1
> > 
> > Then the kernel will create a new route and neighbour for every new
> > address under 2001:db8::/64 that we send a packet to 
> > (potentially, 2^64 routes and neighbours).
> > 
> > Under load, we immediately get the infamous "Neighbour table overflow"
> > message and machine eventually crash.
> > 
> > This does not happen if we specify a next-hop explicitly, like so:
> > 
> > ip -6 route add 2001:db8::/64 via fe80:: dev sit1
> > 
> > Same problem happens if we use routes to loopback.
> > 
> > Idea of this patch is to move existing SIT related code from
> > addrconf_prefix_route() to a more generic one in ip6_route_add(). 
> > 
> > This permits ip6_pol_route() to clone route instead of calling
> > rt6_alloc_cow() and allocate a neighbour.
> > 
> > Many thanks to Lorenzo for his help and suggestions.
> > 
> > Reported-by: Lorenzo Colitti <lorenzo@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> This patch lacks the desired effect without your clone-caching-removal
> patch, which I will not apply.
> 
> Therefore it doesn't make any sense to apply this either, as it won't
> fix the stated problem.
> 
> Doing a proper conversion of ipv6 to ref-count-less neigh's will solve
> this problem and allow all of the clone/cow caching code to be elided
> for the majority of cases and is the correct approach to these problems.


This seems quite different patches to me. Addressing two problems.

This patch is about not allocating a neighbour for a given route, but
reusing one existing neighbour. What could be possibly wrong with this,
since all neighbours are exactly the sames ?

I understand we can make it better with big surgery later, but right now
it seems quite reasonable.

This is already done (in part) for SIT devices, which are a subclass of
PtP device.

For the other patch, it seems problem was introduced in 2.6.38 when
CLONE_OFFLINK_ROUTE was removed in commit d80bc0fd26.

^ permalink raw reply

* [PATCH] xfrm_user: return error pointer instead of NULL
From: Mathias Krause @ 2012-09-13 21:41 UTC (permalink / raw)
  To: David S. Miller
  Cc: Steffen Klassert, netdev, linux-kernel, Mathias Krause, stable

When dump_one_state() returns an error, e.g. because of a too small
buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL
instead of an error pointer. But its callers expect an error pointer
and therefore continue to operate on a NULL skbuff.

This could lead to a privilege escalation (execution of user code in
kernel context) if the attacker has CAP_NET_ADMIN and is able to map
address 0.

Cc: stable@vger.kernel.org
Signed-off-by: Mathias Krause <minipli@googlemail.com>
---

A test case can be provided on request.

 net/xfrm/xfrm_user.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index e75d8e4..dac08e2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -878,6 +878,7 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
 {
 	struct xfrm_dump_info info;
 	struct sk_buff *skb;
+	int err;
 
 	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
 	if (!skb)
@@ -888,9 +889,10 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
 	info.nlmsg_seq = seq;
 	info.nlmsg_flags = 0;
 
-	if (dump_one_state(x, 0, &info)) {
+	err = dump_one_state(x, 0, &info);
+	if (err) {
 		kfree_skb(skb);
-		return NULL;
+		return ERR_PTR(err);
 	}
 
 	return skb;
-- 
1.7.10.4

^ permalink raw reply related

* Re: [net-next.git 3/8 (V2)] stmmac: add the initial tx coalesce schema
From: David Miller @ 2012-09-13 21:37 UTC (permalink / raw)
  To: bhutchings; +Cc: peppe.cavallaro, netdev
In-Reply-To: <1347570650.13258.40.camel@deadeye.wl.decadent.org.uk>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 13 Sep 2012 22:10:50 +0100

> On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote:
>> From: Ben Hutchings <bhutchings@solarflare.com>
>> Date: Thu, 13 Sep 2012 21:42:51 +0100
>> 
>> Well written NAPI drivers never need to disable hardware interrupts
>> in their ->poll() method and it's callers, neither should you.
> 
> Perhaps you should get round to reviewing netpoll, because it does
> exactly this.

Then I don't understand the point you're trying to make.

Hardware interrupt disabling has absolutely no place in the
NAPI polling fast paths.

If NAPI drivers can't be implemented without hardware interrupt
toggling in ->poll(), we've failed.

^ permalink raw reply

* [PATCH] tcp: restore rcv_wscale in a repair mode
From: Andrew Vagin @ 2012-09-13 21:28 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Pavel Emelyanov, Andrew Vagin, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy

rcv_wscale is a symetric parameter with snd_wscale.

Both this parameters are set on a connection handshake.

Without this value a remote window size can not be interpreted correctly,
because a value from a packet should be shifted on rcv_wscale.

And one more thing is that wscale_ok should be set too.

This patch doesn't break a backward compatibility.
If someone uses it in a old scheme, a rcv window
will be restored with the same bug (rcv_wscale = 0).

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
---
 net/ipv4/tcp.c |   20 ++++++++++++++++----
 1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index df83d74..ed22cd7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2348,10 +2348,22 @@ static int tcp_repair_options_est(struct tcp_sock *tp,
 			tp->rx_opt.mss_clamp = opt.opt_val;
 			break;
 		case TCPOPT_WINDOW:
-			if (opt.opt_val > 14)
-				return -EFBIG;
-
-			tp->rx_opt.snd_wscale = opt.opt_val;
+			{
+				union {
+					struct {
+						u16 snd_wscale;
+						u16 rcv_wscale;
+					};
+					u32 raw;
+				} val = { .raw = opt.opt_val };
+
+				if (val.snd_wscale > 14 || val.rcv_wscale > 14)
+					return -EFBIG;
+
+				tp->rx_opt.snd_wscale = val.snd_wscale;
+				tp->rx_opt.rcv_wscale = val.rcv_wscale;
+				tp->rx_opt.wscale_ok = 1;
+			}
 			break;
 		case TCPOPT_SACK_PERM:
 			if (opt.opt_val != 0)
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH v2 net-next] ipv6: prevent useless neigh alloc on PTP or lo routes
From: David Miller @ 2012-09-13 21:13 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, lorenzo, maze, therbert, willemb
In-Reply-To: <1347506158.13103.1365.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 13 Sep 2012 05:15:58 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> We have special handling of SIT devices in addrconf_prefix_route()
> to avoid allocating a neighbour for each destination.
> 
> If routing entry is :
> 
> ip -6 route add 2001:db8::/64 dev sit1
> 
> Then the kernel will create a new route and neighbour for every new
> address under 2001:db8::/64 that we send a packet to 
> (potentially, 2^64 routes and neighbours).
> 
> Under load, we immediately get the infamous "Neighbour table overflow"
> message and machine eventually crash.
> 
> This does not happen if we specify a next-hop explicitly, like so:
> 
> ip -6 route add 2001:db8::/64 via fe80:: dev sit1
> 
> Same problem happens if we use routes to loopback.
> 
> Idea of this patch is to move existing SIT related code from
> addrconf_prefix_route() to a more generic one in ip6_route_add(). 
> 
> This permits ip6_pol_route() to clone route instead of calling
> rt6_alloc_cow() and allocate a neighbour.
> 
> Many thanks to Lorenzo for his help and suggestions.
> 
> Reported-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

This patch lacks the desired effect without your clone-caching-removal
patch, which I will not apply.

Therefore it doesn't make any sense to apply this either, as it won't
fix the stated problem.

Doing a proper conversion of ipv6 to ref-count-less neigh's will solve
this problem and allow all of the clone/cow caching code to be elided
for the majority of cases and is the correct approach to these problems.

^ permalink raw reply

* Re: [net-next.git 3/8 (V2)] stmmac: add the initial tx coalesce schema
From: Ben Hutchings @ 2012-09-13 21:10 UTC (permalink / raw)
  To: David Miller; +Cc: peppe.cavallaro, netdev
In-Reply-To: <20120913.164645.1268091771093604148.davem@davemloft.net>

On Thu, 2012-09-13 at 16:46 -0400, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Thu, 13 Sep 2012 21:42:51 +0100
> 
> > On Thu, 2012-09-13 at 16:23 -0400, David Miller wrote:
> >> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> >> Date: Tue, 11 Sep 2012 08:55:09 +0200
> >> 
> >> > +	unsigned long flags;
> >> > +
> >> > +	spin_lock_irqsave(&priv->tx_lock, flags);
> >> >  
> >> > -	spin_lock(&priv->tx_lock);
> >> > +	priv->xstats.tx_clean++;
> >> 
> >> You are changing the locking here for the sake of the new timer.
> >> 
> >> But timers run in software interrupt context, so this change is
> >> completely unnecessary since NAPI runs in software interrupt context
> >> as well, and neither timers nor NAPI run in hardware interrupts
> >> context.
> > 
> > NAPI pollers can be called by netpoll in hardware interrupt context, and
> > this driver supports netpoll.
> 
> And the netpoll handler need to make amends to make sure that
> hardware the environment expected by the software interrupt
> code is preserved.
>
> Well written NAPI drivers never need to disable hardware interrupts
> in their ->poll() method and it's callers, neither should you.

Perhaps you should get round to reviewing netpoll, because it does
exactly this.

> I'm not considering your patches until you implement this properly.

These aren't my patches.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] pktgen: fix crash with vlan and packet size less than 46
From: David Miller @ 2012-09-13 21:10 UTC (permalink / raw)
  To: nistrive; +Cc: netdev, benve
In-Reply-To: <1347492769-32409-1-git-send-email-nistrive@cisco.com>

From: Nishank Trivedi <nistrive@cisco.com>
Date: Wed, 12 Sep 2012 16:32:49 -0700

> If vlan option is being specified in the pktgen and packet size
> being requested is less than 46 bytes, despite being illogical
> request, pktgen should not crash the kernel.
> 
> BUG: unable to handle kernel paging request at ffff88021fb82000
> Process kpktgend_0 (pid: 1184, threadinfo ffff880215f1a000, task ffff880218544530)
> Call Trace:
> [<ffffffffa0637cd2>] ? pktgen_finalize_skb+0x222/0x300 [pktgen]
> [<ffffffff814f0084>] ? build_skb+0x34/0x1c0
> [<ffffffffa0639b11>] pktgen_thread_worker+0x5d1/0x1790 [pktgen]
> [<ffffffffa03ffb10>] ? igb_xmit_frame_ring+0xa30/0xa30 [igb]
> [<ffffffff8107ba20>] ? wake_up_bit+0x40/0x40
> [<ffffffff8107ba20>] ? wake_up_bit+0x40/0x40
> [<ffffffffa0639540>] ? spin+0x240/0x240 [pktgen]
> [<ffffffff8107b4e3>] kthread+0x93/0xa0
> [<ffffffff81615de4>] kernel_thread_helper+0x4/0x10
> [<ffffffff8107b450>] ? flush_kthread_worker+0x80/0x80
> [<ffffffff81615de0>] ? gs_change+0x13/0x13
> 
> The root cause of why pktgen is not able to handle this case is due
> to comparison of signed (datalen) and unsigned data (sizeof), which
> eventually passes a huge number to skb_put().
> 
> Signed-off-by: Nishank Trivedi <nistrive@cisco.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] drivers/net: Enable IOMMU pass through for be2net
From: David Miller @ 2012-09-13 21:09 UTC (permalink / raw)
  To: craig.hada
  Cc: netdev, sathya.perla, subbu.seetharaman, ajit.khaparde,
	linux-kernel
In-Reply-To: <505212A3.1030307@hp.com>

From: Craig Hada <craig.hada@hp.com>
Date: Thu, 13 Sep 2012 10:06:43 -0700

> On 9/13/2012 9:27 AM, Hada, Craig M wrote:
>> This patch sets the coherent DMA mask to 64-bit after the be2net
>> driver has been acknowledged that the system is 64-bit DMA
>> capable. The coherent DMA mask is examined by the Intel IOMMU driver
>> to determine whether to allow pass through context mapping for all
>> devices. With this patch, the be2net driver combined with be2net
>> compatible hardware provides comparable performance to the case where
>> vt-d is disabled. The main use case for this change is to decrease the
>> time necessary to copy virtual machine memory during KVM live
>> migration instantiations.
>>
>> This patch was tested on a system that enables the IOMMU in
>> non-coherent mode. Two DMA remapper issues were encountered and both
>> are in the Intel IOMMU driver with the following patches submitted
>> upstream but not yet commited.
>>
>> Patch 1 - DMAR:[fault reason 02] Present bit in context entry is clear
>> https://lkml.org/lkml/2012/6/15/20
> 
> My apologies for posting a truncated link for the above. The correct
> link is https://lkml.org/lkml/2012/6/15/204

First of all you've made this email reply in such a way it didn't
get logged in the patch in patchwork, perhaps because either the
Message-Id got changed or flat-out removed, I can't say for sure.

Secondly, it is not appropriate to install a change that will
knowingly break usage of the device until the IOMMU reaper fixes
actually exist upstream.  This patch is absolutely dependent upon
those fixes, and therefore must only get applied to trees that
have the IOMMU fixes installed.

You must therefore wait for upstream to acquire the fixes, upstream to
get sync'd into the networking trees, and then you apply a patch with
these requirements.

^ permalink raw reply

* Re: [net-next 0/6][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2012-09-13 21:05 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: netdev@vger.kernel.org, gospo@redhat.com, sassmann@redhat.com,
	Vick, Matthew, Ben Hutchings, Keller, Jacob E, Richard Cochran
In-Reply-To: <1346888106-25638-1-git-send-email-jeffrey.t.kirsher@intel.com>

[-- Attachment #1: Type: text/plain, Size: 1636 bytes --]

On Wed, 2012-09-05 at 16:35 -0700, Kirsher, Jeffrey T wrote:
> This series contains updates to igb (specifically PTP code).
> 
> The following are changes since commit f6fe569fe056388166575af1cfaed0bcbc688305:
>   Revert "usbnet: drop unneeded check for NULL"
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
> 
> Matthew Vick (6):
>   igb: Tidy up wrapping for CONFIG_IGB_PTP.
>   igb: Update PTP function names/variables and locations.
>   igb: Correct PTP support query from ethtool.
>   igb: Store the MAC address in the name in the PTP struct.
>   igb: Prevent dropped Tx timestamps via work items and interrupts.
>   igb: Add 1588 support to I210/I211.
> 
>  drivers/net/ethernet/intel/igb/e1000_defines.h |   8 +
>  drivers/net/ethernet/intel/igb/e1000_regs.h    |   2 +
>  drivers/net/ethernet/intel/igb/igb.h           |  29 +-
>  drivers/net/ethernet/intel/igb/igb_ethtool.c   |  84 +--
>  drivers/net/ethernet/intel/igb/igb_main.c      | 329 +++---------
>  drivers/net/ethernet/intel/igb/igb_ptp.c       | 676 ++++++++++++++++++++-----
>  6 files changed, 708 insertions(+), 420 deletions(-)
> 
> --

Dave-
I see you have set this series to "Changes Requested" in patchworks, and
I am assuming that is from the discussion that occurred on patch 04 of
the series.  That discussion came to the conclusion that changes should
happen in the PTP core, and that the patch is fine as is currently.

If there was something else you want changed, let me/Matthew know so
that we can make the necessary changes.

Cheers,
Jeff

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [PATCH 9/9] drivers/isdn/gigaset/common.c: Remove useless kfree
From: David Miller @ 2012-09-13 21:05 UTC (permalink / raw)
  To: tilman
  Cc: peter.senna, hjlipp, kernel-janitors, isdn, gigaset307x-common,
	netdev, linux-kernel
In-Reply-To: <5051ACDB.5030407@imap.cc>

From: Tilman Schmidt <tilman@imap.cc>
Date: Thu, 13 Sep 2012 11:52:27 +0200

> Am 12.09.2012 17:06, schrieb Peter Senna Tschudin:
>> From: Peter Senna Tschudin <peter.senna@gmail.com>
>> 
>> Remove useless kfree() and clean up code related to the removal.
>> 
>> The semantic patch that finds this problem is as follows:
> [...]
>> 
>> Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
> 
> Acked-by: Tilman Schmidt <tilman@imap.cc>

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net-next 2/2] ipv6: dont cache cloned routes
From: David Miller @ 2012-09-13 21:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, lorenzo, maze, therbert
In-Reply-To: <1347451307.13103.885.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 12 Sep 2012 14:01:47 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> We can now destroy cloned routes immediately from dst_release() instead
> of depending on garbage collection.
> 
> Set DST_NOCACHE in rt6_alloc_clone() so that :
> 
> 1) we avoid calling ip6_ins_rt() on such routes
> 
> 2) dst_release() can call destroy when refcount becomes 0
> 
> This allows machines to resist to DDOS.
> 
> Reported-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Maciej Żenczykowski <maze@google.com>
> Cc: Tom Herbert <therbert@google.com>

This current behavior is very much intentional and cannot be removed
so trivially.  The scope of this change is much wider than some DDOS
test.

This change is the moral equivalent of the ipv4 routing cache removal,
but we have not done anything to compensate for the resulting ipv6
performance loss as the routing cache removal changes did.

The insertion of ipv6 route clones into the tree is how the ipv6 code
caches routes.

The only legitimate way to make this change is to revamp ipv6 route
handling properly like we did for ipv4.

This means making it such that, when legitimate, prefixed routes found
directly into the route tree are used directly.

To achieve this you need to:

1) Convert ipv6 to do ref-count-less neighbour handling and not cache
   neighbours in the ipv6 routes, instead doing the lookup on demand
   in ip6_output as we do on the ipv4 side.

2) Stop caching inetpeers in the ipv6 routes.

3) Make ipv6 in-route metrics read-only, again as we already do in
   ipv4.

And so on and so forth, until direct use of prefixed ipv6 routes is
possible.

I really can't even remotely entertain applying this patch, sorry.

^ permalink raw reply

* Re: [PATCH v2] ipv6: replace write lock with read lock when get route info
From: David Miller @ 2012-09-13 20:54 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev
In-Reply-To: <1347436741-344-1-git-send-email-roy.qing.li@gmail.com>

From: roy.qing.li@gmail.com
Date: Wed, 12 Sep 2012 15:59:01 +0800

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> geting route info does not write rt->rt6i_table, so replace
> write lock with read lock
> 
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

I'm surprised it's been like this for so long, but I can't find
any problems with your patch :-)

Applied to net-next, thanks.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: route templates can be const
From: David Miller @ 2012-09-13 20:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1347436071.13103.695.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 12 Sep 2012 09:47:51 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> We kmemdup() templates, so they can be const.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox