Netdev List


* [PATCH 0/6] sctp: Auto-ASCONF patch series
From: Michio Honda @ 2011-04-27  8:27 UTC (permalink / raw)
  To: netdev; +Cc: Honda Michio, YOSHIFUJI Hideaki, Wei Yongjun

>From 9ede9db0ec4b03d3061a5bfed78328cb5528b908 Mon Sep 17 00:00:00 2001
From: Michio Honda <micchie@sfc.wide.ad.jp>
Date: Wed, 27 Apr 2011 17:16:21 +0900
Subject: [PATCH 0/6] sctp: Auto-ASCONF patch series

This series of 6 patches adds auto_asconf support and the related functionality that auto_asconf relies on.

Michio Honda (5):
  sctp: Add ADD/DEL ASCONF handling at the receiver.
  sctp: Add Auto-ASCONF support (core).
  sctp: Add sysctl support for Auto-ASCONF.
  sctp: Add socket option operation for Auto-ASCONF.
  sctp: Add ASCONF operation on the single-homed host

YOSHIFUJI Hideaki (1):
  sctp: Allow regular C expression in 4th argument for
    SCTP_DEBUG_PRINTK_IPADDR macro.

 include/net/sctp/sctp.h    |   11 ++-
 include/net/sctp/structs.h |   17 +++++
 include/net/sctp/user.h    |    1 +
 net/sctp/associola.c       |    6 ++
 net/sctp/bind_addr.c       |   15 ++++
 net/sctp/ipv6.c            |    9 +++
 net/sctp/outqueue.c        |   13 ++++
 net/sctp/protocol.c        |  151 ++++++++++++++++++++++++++++++++++++++++++-
 net/sctp/sm_make_chunk.c   |   40 +++++++++++-
 net/sctp/socket.c          |  156 +++++++++++++++++++++++++++++++++++++++++---
 net/sctp/sysctl.c          |    7 ++
 11 files changed, 411 insertions(+), 15 deletions(-)

-- 
1.7.3.2



^ permalink raw reply

* Re: [PATCH] Applying inappropriate ioctl operation on socket should return ENOTTY
From: Lifeng Sun @ 2011-04-27  8:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev
In-Reply-To: <1303887457.2699.60.camel@edumazet-laptop>

On 08:57 Wed 04/27/11 Apr, Eric Dumazet wrote:
> You quote manpage for a library call, not a system call.

Okay, let me quote the one for the ioctl system call:

man 2 ioctl

int ioctl(int d, int request, ...);

ERRORS
   EBADF  d is not a valid descriptor.
   EFAULT argp references an inaccessible memory area.
   EINVAL Request or argp is not valid.
   ENOTTY d is not associated with a character special device.
   ENOTTY The specified request does not apply to the kind of object
     that the descriptor d references.

We see that ENOTTY and EFAULT refine EINVAL, so ioctl should return
ENOTTY or EFAULT whenever possible rather than EINVAL; otherwise we
could always return EBADF or EINVAL.
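
To make the distinction concrete, here is a minimal userspace sketch. Both
function names are hypothetical and not from this thread. The first maps an
errno value to the corresponding ERRORS entry quoted above; the second issues
a terminal-only request (TCGETS) on a UDP socket and reports the errno it
gets. Which value a given kernel returns for that case is exactly what is
being debated here, so the sketch does not assume one.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: map an ioctl() errno to the ERRORS entry it
 * corresponds to in the ioctl(2) text quoted above. */
const char *ioctl_errno_meaning(int err)
{
	switch (err) {
	case EBADF:
		return "d is not a valid descriptor";
	case EFAULT:
		return "argp references an inaccessible memory area";
	case ENOTTY:
		return "request does not apply to this kind of object";
	case EINVAL:
		return "request or argp is not valid";
	default:
		return "other error";
	}
}

/* Hypothetical probe: issue a tty-only request on a socket and return
 * the resulting errno (0 if the call unexpectedly succeeds). */
int probe_socket_ioctl(void)
{
	struct termios t;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int err;

	if (fd < 0)
		return errno;
	err = ioctl(fd, TCGETS, &t) < 0 ? errno : 0;
	close(fd);
	return err;
}
```

Printing ioctl_errno_meaning(probe_socket_ioctl()) shows which of the two
readings a running kernel implements.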

Regarding isatty, well, it's only a library call, isn't it? :-) If
you insist on the significance of the isatty manpage, there are also
a lot of ioctl operations that return ENOTTY for an inappropriate
command, no fewer than those that return EINVAL, and they would then
violate the ERRORS section of that manpage. Certainly we could
complain to the C library maintainers.

> If you feel your glibc doesnt implement well this, please complain to
> glibc maintainer.

-- 

^ permalink raw reply

* [PATCH net-next-2.6 5/5 v2] sctp: clean up route lookup calls
From: Wei Yongjun @ 2011-04-27  7:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, lksctp
In-Reply-To: <4DB7C73D.3000406@cn.fujitsu.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>

Change the call to take the transport parameter and set the
cached 'dst' appropriately inside the get_dst() function calls.

This will allow us to clean up source address storage
in the future as well.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
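
A userspace sketch of the calling convention this patch moves to. All mock_*
names and the MTU values are hypothetical stand-ins, not kernel definitions.
The lookup callback returns void and stores its result in the transport; the
caller then reads the cached pointer instead of a return value.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct mock_dst {
	int mtu;
	int obsolete;
};

struct mock_transport {
	struct mock_dst *dst;	/* cached route, NULL if none */
	int pathmtu;
};

/* Placeholder default used only by this sketch. */
#define MOCK_DEFAULT_MAXSEGMENT 1500

static struct mock_dst mock_route = { .mtu = 1400, .obsolete = 0 };

/* New-style callback: returns void and caches the result in t->dst. */
void mock_get_dst(struct mock_transport *t, int route_exists)
{
	t->dst = route_exists ? &mock_route : NULL;
}

/* Same shape as sctp_transport_pmtu() after the patch: look a route up
 * only if none is cached or the cached one is stale, then read the cache. */
void mock_transport_pmtu(struct mock_transport *t, int route_exists)
{
	if (!t->dst || t->dst->obsolete > 1)
		mock_get_dst(t, route_exists);

	if (t->dst)
		t->pathmtu = t->dst->mtu;
	else
		t->pathmtu = MOCK_DEFAULT_MAXSEGMENT;
}
```

The sketch leaves out reference counting (dst_release() in the real code),
which the kernel version has to handle when it replaces a stale entry.
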
 include/net/sctp/structs.h |    3 +--
 net/sctp/ipv6.c            |   17 ++++++++---------
 net/sctp/protocol.c        |   12 +++++-------
 net/sctp/transport.c       |   23 ++++++++++-------------
 4 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index bb2f43b..ff3e8cc 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -564,8 +564,7 @@ struct sctp_af {
 					 int optname,
 					 char __user *optval,
 					 int __user *optlen);
-	struct dst_entry *(*get_dst)	(struct sctp_association *asoc,
-					 union sctp_addr *daddr,
+	void		(*get_dst)	(struct sctp_transport *t,
 					 union sctp_addr *saddr,
 					 struct flowi *fl,
 					 struct sock *sk);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index a1913a4..500875f 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -247,17 +247,16 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
 /* Returns the dst cache entry for the given source and destination ip
  * addresses.
  */
-static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
-					 union sctp_addr *daddr,
-					 union sctp_addr *saddr,
-					 struct flowi *fl,
-					 struct sock *sk)
+static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
+			    struct flowi *fl, struct sock *sk)
 {
+	struct sctp_association *asoc = t->asoc;
 	struct dst_entry *dst = NULL;
 	struct flowi6 *fl6 = &fl->u.ip6;
 	struct sctp_bind_addr *bp;
 	struct sctp_sockaddr_entry *laddr;
 	union sctp_addr *baddr = NULL;
+	union sctp_addr *daddr = &t->ipaddr;
 	union sctp_addr dst_saddr;
 	__u8 matchlen = 0;
 	__u8 bmatchlen;
@@ -270,7 +269,6 @@ static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
 	if (ipv6_addr_type(&daddr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
 		fl6->flowi6_oif = daddr->v6.sin6_scope_id;
 
-
 	SCTP_DEBUG_PRINTK("%s: DST=%pI6 ", __func__, &fl6->daddr);
 
 	if (asoc)
@@ -343,12 +341,13 @@ out:
 	if (!IS_ERR(dst)) {
 		struct rt6_info *rt;
 		rt = (struct rt6_info *)dst;
+		t->dst = dst;
 		SCTP_DEBUG_PRINTK("rt6_dst:%pI6 rt6_src:%pI6\n",
 			&rt->rt6i_dst.addr, &fl6->saddr);
-		return dst;
+	} else {
+		t->dst = NULL;
+		SCTP_DEBUG_PRINTK("NO ROUTE\n");
 	}
-	SCTP_DEBUG_PRINTK("NO ROUTE\n");
-	return NULL;
 }
 
 /* Returns the number of consecutive initial bits that match in the 2 ipv6
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 68b4c43..9d3f159 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -463,17 +463,16 @@ static sctp_scope_t sctp_v4_scope(union sctp_addr *addr)
  * addresses. If an association is passed, trys to get a dst entry with a
  * source address that matches an address in the bind address list.
  */
-static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
-					 union sctp_addr *daddr,
-					 union sctp_addr *saddr,
-					 struct flowi *fl,
-					 struct sock *sk)
+static void sctp_v4_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
+				struct flowi *fl, struct sock *sk)
 {
+	struct sctp_association *asoc = t->asoc;
 	struct rtable *rt;
 	struct flowi4 *fl4 = &fl->u.ip4;
 	struct sctp_bind_addr *bp;
 	struct sctp_sockaddr_entry *laddr;
 	struct dst_entry *dst = NULL;
+	union sctp_addr *daddr = &t->ipaddr;
 	union sctp_addr dst_saddr;
 
 	memset(fl4, 0x0, sizeof(struct flowi4));
@@ -548,13 +547,12 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 out_unlock:
 	rcu_read_unlock();
 out:
+	t->dst = dst;
 	if (dst)
 		SCTP_DEBUG_PRINTK("rt_dst:%pI4, rt_src:%pI4\n",
 				  &rt->rt_dst, &rt->rt_src);
 	else
 		SCTP_DEBUG_PRINTK("NO ROUTE\n");
-
-	return dst;
 }
 
 /* For v4, the source address is cached in the route entry(dst). So no need
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 1fbb920..d8595dd 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -213,17 +213,17 @@ void sctp_transport_set_owner(struct sctp_transport *transport,
 /* Initialize the pmtu of a transport. */
 void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
 {
-	struct dst_entry *dst;
 	struct flowi fl;
 
-	dst = transport->af_specific->get_dst(transport->asoc,
-					      &transport->ipaddr,
-					      &transport->saddr,
+	/* If we don't have a fresh route, look one up */
+	if (!transport->dst || transport->dst->obsolete > 1) {
+		dst_release(transport->dst);
+		transport->af_specific->get_dst(transport, &transport->saddr,
 					      &fl, sk);
+	}
 
-	if (dst) {
-		transport->pathmtu = dst_mtu(dst);
-		dst_release(dst);
+	if (transport->dst) {
+		transport->pathmtu = dst_mtu(transport->dst);
 	} else
 		transport->pathmtu = SCTP_DEFAULT_MAXSEGMENT;
 }
@@ -274,12 +274,9 @@ void sctp_transport_route(struct sctp_transport *transport,
 {
 	struct sctp_association *asoc = transport->asoc;
 	struct sctp_af *af = transport->af_specific;
-	union sctp_addr *daddr = &transport->ipaddr;
-	struct dst_entry *dst;
 	struct flowi fl;
 
-	dst = af->get_dst(asoc, daddr, saddr, &fl, sctp_opt2sk(opt));
-	transport->dst = dst;
+	af->get_dst(transport, saddr, &fl, sctp_opt2sk(opt));
 
 	if (saddr)
 		memcpy(&transport->saddr, saddr, sizeof(union sctp_addr));
@@ -289,8 +286,8 @@ void sctp_transport_route(struct sctp_transport *transport,
 	if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
 		return;
 	}
-	if (dst) {
-		transport->pathmtu = dst_mtu(dst);
+	if (transport->dst) {
+		transport->pathmtu = dst_mtu(transport->dst);
 
 		/* Initialize sk->sk_rcv_saddr, if the transport is the
 		 * association's active path for getsockname().
-- 
1.6.5.2



^ permalink raw reply related

* [PATCH net-next-2.6 4/5 v2] sctp: remove useless arguments from get_saddr() call
From: Wei Yongjun @ 2011-04-27  7:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, lksctp
In-Reply-To: <4DB7C73D.3000406@cn.fujitsu.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>

There is no point in passing a destination address to
a get_saddr() call.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
 include/net/sctp/structs.h |    1 -
 net/sctp/ipv6.c            |    5 +----
 net/sctp/protocol.c        |    1 -
 net/sctp/transport.c       |    2 +-
 4 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 1d465d6..bb2f43b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -571,7 +571,6 @@ struct sctp_af {
 					 struct sock *sk);
 	void		(*get_saddr)	(struct sctp_sock *sk,
 					 struct sctp_transport *t,
-					 union sctp_addr *daddr,
 					 struct flowi *fl);
 	void		(*copy_addrlist) (struct list_head *,
 					  struct net_device *);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 593c801..a1913a4 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -365,15 +365,12 @@ static inline int sctp_v6_addr_match_len(union sctp_addr *s1,
  */
 static void sctp_v6_get_saddr(struct sctp_sock *sk,
 			      struct sctp_transport *t,
-			      union sctp_addr *daddr,
 			      struct flowi *fl)
 {
 	struct flowi6 *fl6 = &fl->u.ip6;
 	union sctp_addr *saddr = &t->saddr;
 
-	SCTP_DEBUG_PRINTK("%s: asoc:%p dst:%p daddr:%pI6 ",
-			  __func__, t->asoc, t->dst, &daddr->v6.sin6_addr);
-
+	SCTP_DEBUG_PRINTK("%s: asoc:%p dst:%p\n", __func__, t->asoc, t->dst);
 
 	if (t->dst) {
 		saddr->v6.sin6_family = AF_INET6;
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 3421645..68b4c43 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -562,7 +562,6 @@ out:
  */
 static void sctp_v4_get_saddr(struct sctp_sock *sk,
 			      struct sctp_transport *t,
-			      union sctp_addr *daddr,
 			      struct flowi *fl)
 {
 	union sctp_addr *saddr = &t->saddr;
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 2544b9b..1fbb920 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -284,7 +284,7 @@ void sctp_transport_route(struct sctp_transport *transport,
 	if (saddr)
 		memcpy(&transport->saddr, saddr, sizeof(union sctp_addr));
 	else
-		af->get_saddr(opt, transport, daddr, &fl);
+		af->get_saddr(opt, transport, &fl);
 
 	if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
 		return;
-- 
1.6.5.2



^ permalink raw reply related

* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Tomas Carnecky @ 2011-04-27  7:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Bandan Das, David Miller, netdev, akpm
In-Reply-To: <1303853047.2699.15.camel@edumazet-laptop>

On 4/26/11 11:24 PM, Eric Dumazet wrote:
> Le mardi 26 avril 2011 à 17:19 -0400, Bandan Das a écrit :
>> Yeah, I just rechecked and this is already in Linus' tree. So, Tomas you can
>> either try pulling in those changes or you can apply this patch and see
>> if it makes any difference. Thanks!
> Better pull Linus tree because there is another patch involved.
>
> (commits c65353daf137dd41f3ede3baf62d561fca076228
> ip: ip_options_compile() resilient to NULL skb route

Still getting that error (on rc4-00245-g4175242, which includes that 
commit).


^ permalink raw reply

* [PATCH net-next-2.6 3/5] sctp: make sctp over IPv6 work with IPsec
From: Wei Yongjun @ 2011-04-27  7:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, lksctp
In-Reply-To: <4DB7C73D.3000406@cn.fujitsu.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>

SCTP never called xfrm_output after its v6 route lookups, so
it never really worked with IPsec.  Additionally, we never
passed port numbers and protocol in the flowi, so port-based
policies were never applied either.  Now that we have fixed
the ipv6 routing lookup code, use ip6_dst_lookup_flow() and
pass port numbers.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
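
The diff below replaces an "int err plus output pointer" lookup with one
whose return value carries either the route or the error, checked with
IS_ERR(). A userspace sketch of that convention follows. The three helpers
re-create what the kernel provides in <linux/err.h> for illustration only,
and mock_lookup_flow() and struct mock_route are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace re-creation of the kernel's error-pointer helpers. The top
 * MAX_ERRNO values of the address space are reserved for error codes. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct mock_route {
	int ifindex;
};

static struct mock_route mock_rt = { .ifindex = 2 };

/* Hypothetical lookup: a single return value is either a valid route
 * or an encoded negative errno. */
struct mock_route *mock_lookup_flow(int reachable)
{
	if (!reachable)
		return ERR_PTR(-ENETUNREACH);
	return &mock_rt;
}
```

Note that NULL is not an error pointer under this convention, so callers
that may also see NULL have to test for it separately.
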
 net/sctp/ipv6.c |   16 +++++++++++-----
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 51c048d..593c801 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -262,22 +262,27 @@ static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
 	__u8 matchlen = 0;
 	__u8 bmatchlen;
 	sctp_scope_t scope;
-	int err = 0;
 
 	memset(fl6, 0, sizeof(struct flowi6));
 	ipv6_addr_copy(&fl6->daddr, &daddr->v6.sin6_addr);
+	fl6->fl6_dport = daddr->v6.sin6_port;
+	fl6->flowi6_proto = IPPROTO_SCTP;
 	if (ipv6_addr_type(&daddr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
 		fl6->flowi6_oif = daddr->v6.sin6_scope_id;
 
 
 	SCTP_DEBUG_PRINTK("%s: DST=%pI6 ", __func__, &fl6->daddr);
 
+	if (asoc)
+		fl6->fl6_sport = htons(asoc->base.bind_addr.port);
+
 	if (saddr) {
 		ipv6_addr_copy(&fl6->saddr, &saddr->v6.sin6_addr);
+		fl6->fl6_sport = saddr->v6.sin6_port;
 		SCTP_DEBUG_PRINTK("SRC=%pI6 - ", &fl6->saddr);
 	}
 
-	err = ip6_dst_lookup(sk, &dst, fl6);
+	dst = ip6_dst_lookup_flow(sk, fl6, NULL, false);
 	if (!asoc || saddr)
 		goto out;
 
@@ -286,7 +291,7 @@ static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
 	/* ip6_dst_lookup has filled in the fl6->saddr for us.  Check
 	 * to see if we can use it.
 	 */
-	if (!err) {
+	if (!IS_ERR(dst)) {
 		/* Walk through the bind address list and look for a bind
 		 * address that matches the source address of the returned dst.
 		 */
@@ -330,11 +335,12 @@ static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
 	rcu_read_unlock();
 	if (baddr) {
 		ipv6_addr_copy(&fl6->saddr, &baddr->v6.sin6_addr);
-		err = ip6_dst_lookup(sk, &dst, fl6);
+		fl6->fl6_sport = baddr->v6.sin6_port;
+		dst = ip6_dst_lookup_flow(sk, fl6, NULL, false);
 	}
 
 out:
-	if (!err) {
+	if (!IS_ERR(dst)) {
 		struct rt6_info *rt;
 		rt = (struct rt6_info *)dst;
 		SCTP_DEBUG_PRINTK("rt6_dst:%pI6 rt6_src:%pI6\n",
-- 
1.6.5.2



^ permalink raw reply related

* [PATCH net-next-2.6 2/5 v2] sctp: cache the ipv6 source after route lookup
From: Wei Yongjun @ 2011-04-27  7:51 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, lksctp
In-Reply-To: <4DB7C73D.3000406@cn.fujitsu.com>

From: Vlad Yasevich <vladislav.yasevich@hp.com>

The ipv6 routing lookup does give us a source address,
but instead of filling it into the dst, it's stored in
the flowi.  We can use that instead of going through the
entire source address selection again.
The useless ->dst_saddr member of sctp_af is also removed,
and sctp_v6_dst_saddr() is replaced by sctp_v6_to_addr(),
which can be reused to clean up some duplicated code.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
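
For readers unfamiliar with the "best source address" walk that this patch
moves into sctp_v6_get_dst(), here is a userspace sketch of the idea: prefer
the bound address sharing the longest leading-bit prefix with the
destination. The function names are hypothetical, and the bit counting is
written from scratch rather than copied from sctp_v6_addr_match_len(). The
scope and address-state checks of the real loop are omitted.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading bits two IPv6 addresses have in common (0..128). */
int v6_match_len(const struct in6_addr *a, const struct in6_addr *b)
{
	int i, bits = 0;

	for (i = 0; i < 16; i++) {
		uint8_t diff = a->s6_addr[i] ^ b->s6_addr[i];

		if (diff == 0) {
			bits += 8;
			continue;
		}
		/* Count matching high-order bits in the first differing byte. */
		while (!(diff & 0x80)) {
			bits++;
			diff <<= 1;
		}
		break;
	}
	return bits;
}

/* Pick the candidate sharing the longest prefix with daddr, or NULL if
 * the list is empty. Earlier entries win ties. */
const struct in6_addr *v6_best_source(const struct in6_addr *daddr,
				      const struct in6_addr *cands, size_t n)
{
	const struct in6_addr *best = NULL;
	int best_len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int len = v6_match_len(daddr, &cands[i]);

		if (!best || best_len < len) {
			best = &cands[i];
			best_len = len;
		}
	}
	return best;
}
```
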
 include/net/sctp/structs.h |   14 ++--
 net/sctp/ipv6.c            |  161 ++++++++++++++++++++------------------------
 net/sctp/protocol.c        |   47 ++++++-------
 net/sctp/socket.c          |    2 +-
 net/sctp/transport.c       |   15 +++--
 5 files changed, 112 insertions(+), 127 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 5c9bada..1d465d6 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -566,17 +566,15 @@ struct sctp_af {
 					 int __user *optlen);
 	struct dst_entry *(*get_dst)	(struct sctp_association *asoc,
 					 union sctp_addr *daddr,
-					 union sctp_addr *saddr);
+					 union sctp_addr *saddr,
+					 struct flowi *fl,
+					 struct sock *sk);
 	void		(*get_saddr)	(struct sctp_sock *sk,
-					 struct sctp_association *asoc,
-					 struct dst_entry *dst,
+					 struct sctp_transport *t,
 					 union sctp_addr *daddr,
-					 union sctp_addr *saddr);
+					 struct flowi *fl);
 	void		(*copy_addrlist) (struct list_head *,
 					  struct net_device *);
-	void		(*dst_saddr)	(union sctp_addr *saddr,
-					 struct dst_entry *dst,
-					 __be16 port);
 	int		(*cmp_addr)	(const union sctp_addr *addr1,
 					 const union sctp_addr *addr2);
 	void		(*addr_copy)	(union sctp_addr *dst,
@@ -1061,7 +1059,7 @@ void sctp_transport_set_owner(struct sctp_transport *,
 			      struct sctp_association *);
 void sctp_transport_route(struct sctp_transport *, union sctp_addr *,
 			  struct sctp_sock *);
-void sctp_transport_pmtu(struct sctp_transport *);
+void sctp_transport_pmtu(struct sctp_transport *, struct sock *sk);
 void sctp_transport_free(struct sctp_transport *);
 void sctp_transport_reset_timers(struct sctp_transport *);
 void sctp_transport_hold(struct sctp_transport *);
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 3a571d6..51c048d 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -82,6 +82,10 @@
 
 static inline int sctp_v6_addr_match_len(union sctp_addr *s1,
 					 union sctp_addr *s2);
+static void sctp_v6_to_addr(union sctp_addr *addr, struct in6_addr *saddr,
+			      __be16 port);
+static int sctp_v6_cmp_addr(const union sctp_addr *addr1,
+			    const union sctp_addr *addr2);
 
 /* Event handler for inet6 address addition/deletion events.
  * The sctp_local_addr_list needs to be protocted by a spin lock since
@@ -245,73 +249,99 @@ static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
  */
 static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
 					 union sctp_addr *daddr,
-					 union sctp_addr *saddr)
+					 union sctp_addr *saddr,
+					 struct flowi *fl,
+					 struct sock *sk)
 {
 	struct dst_entry *dst = NULL;
-	struct flowi6 fl6;
+	struct flowi6 *fl6 = &fl->u.ip6;
 	struct sctp_bind_addr *bp;
 	struct sctp_sockaddr_entry *laddr;
 	union sctp_addr *baddr = NULL;
+	union sctp_addr dst_saddr;
 	__u8 matchlen = 0;
 	__u8 bmatchlen;
 	sctp_scope_t scope;
+	int err = 0;
 
-	memset(&fl6, 0, sizeof(fl6));
-	ipv6_addr_copy(&fl6.daddr, &daddr->v6.sin6_addr);
+	memset(fl6, 0, sizeof(struct flowi6));
+	ipv6_addr_copy(&fl6->daddr, &daddr->v6.sin6_addr);
 	if (ipv6_addr_type(&daddr->v6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
-		fl6.flowi6_oif = daddr->v6.sin6_scope_id;
+		fl6->flowi6_oif = daddr->v6.sin6_scope_id;
 
 
-	SCTP_DEBUG_PRINTK("%s: DST=%pI6 ", __func__, &fl6.daddr);
+	SCTP_DEBUG_PRINTK("%s: DST=%pI6 ", __func__, &fl6->daddr);
 
 	if (saddr) {
-		ipv6_addr_copy(&fl6.saddr, &saddr->v6.sin6_addr);
-		SCTP_DEBUG_PRINTK("SRC=%pI6 - ", &fl6.saddr);
+		ipv6_addr_copy(&fl6->saddr, &saddr->v6.sin6_addr);
+		SCTP_DEBUG_PRINTK("SRC=%pI6 - ", &fl6->saddr);
 	}
 
-	dst = ip6_route_output(&init_net, NULL, &fl6);
+	err = ip6_dst_lookup(sk, &dst, fl6);
 	if (!asoc || saddr)
 		goto out;
 
-	if (dst->error) {
-		dst_release(dst);
-		dst = NULL;
-		bp = &asoc->base.bind_addr;
-		scope = sctp_scope(daddr);
-		/* Walk through the bind address list and try to get a dst that
-		 * matches a bind address as the source address.
+	bp = &asoc->base.bind_addr;
+	scope = sctp_scope(daddr);
+	/* ip6_dst_lookup has filled in the fl6->saddr for us.  Check
+	 * to see if we can use it.
+	 */
+	if (!err) {
+		/* Walk through the bind address list and look for a bind
+		 * address that matches the source address of the returned dst.
 		 */
+		sctp_v6_to_addr(&dst_saddr, &fl6->saddr, htons(bp->port));
 		rcu_read_lock();
 		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
-			if (!laddr->valid)
+			if (!laddr->valid || (laddr->state != SCTP_ADDR_SRC))
 				continue;
-			if ((laddr->state == SCTP_ADDR_SRC) &&
-			    (laddr->a.sa.sa_family == AF_INET6) &&
-			    (scope <= sctp_scope(&laddr->a))) {
-				bmatchlen = sctp_v6_addr_match_len(daddr,
-								   &laddr->a);
-				if (!baddr || (matchlen < bmatchlen)) {
-					baddr = &laddr->a;
-					matchlen = bmatchlen;
-				}
+
+			/* Do not compare against v4 addrs */
+			if ((laddr->a.sa.sa_family == AF_INET6) &&
+			    (sctp_v6_cmp_addr(&dst_saddr, &laddr->a))) {
+				rcu_read_unlock();
+				goto out;
 			}
 		}
 		rcu_read_unlock();
-		if (baddr) {
-			ipv6_addr_copy(&fl6.saddr, &baddr->v6.sin6_addr);
-			dst = ip6_route_output(&init_net, NULL, &fl6);
+		/* None of the bound addresses match the source address of the
+		 * dst. So release it.
+		 */
+		dst_release(dst);
+		dst = NULL;
+	}
+
+	/* Walk through the bind address list and try to get the
+	 * best source address for a given destination.
+	 */
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+		if (!laddr->valid && laddr->state != SCTP_ADDR_SRC)
+			continue;
+		if ((laddr->a.sa.sa_family == AF_INET6) &&
+		    (scope <= sctp_scope(&laddr->a))) {
+			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
+			if (!baddr || (matchlen < bmatchlen)) {
+				baddr = &laddr->a;
+				matchlen = bmatchlen;
+			}
 		}
 	}
+	rcu_read_unlock();
+	if (baddr) {
+		ipv6_addr_copy(&fl6->saddr, &baddr->v6.sin6_addr);
+		err = ip6_dst_lookup(sk, &dst, fl6);
+	}
+
 out:
-	if (!dst->error) {
+	if (!err) {
 		struct rt6_info *rt;
 		rt = (struct rt6_info *)dst;
 		SCTP_DEBUG_PRINTK("rt6_dst:%pI6 rt6_src:%pI6\n",
-			&rt->rt6i_dst.addr, &rt->rt6i_src.addr);
+			&rt->rt6i_dst.addr, &fl6->saddr);
 		return dst;
 	}
 	SCTP_DEBUG_PRINTK("NO ROUTE\n");
-	dst_release(dst);
 	return NULL;
 }
 
@@ -328,64 +358,21 @@ static inline int sctp_v6_addr_match_len(union sctp_addr *s1,
  * and asoc's bind address list.
  */
 static void sctp_v6_get_saddr(struct sctp_sock *sk,
-			      struct sctp_association *asoc,
-			      struct dst_entry *dst,
+			      struct sctp_transport *t,
 			      union sctp_addr *daddr,
-			      union sctp_addr *saddr)
+			      struct flowi *fl)
 {
-	struct sctp_bind_addr *bp;
-	struct sctp_sockaddr_entry *laddr;
-	sctp_scope_t scope;
-	union sctp_addr *baddr = NULL;
-	__u8 matchlen = 0;
-	__u8 bmatchlen;
+	struct flowi6 *fl6 = &fl->u.ip6;
+	union sctp_addr *saddr = &t->saddr;
 
 	SCTP_DEBUG_PRINTK("%s: asoc:%p dst:%p daddr:%pI6 ",
-			  __func__, asoc, dst, &daddr->v6.sin6_addr);
-
-	if (!asoc) {
-		ipv6_dev_get_saddr(sock_net(sctp_opt2sk(sk)),
-				   dst ? ip6_dst_idev(dst)->dev : NULL,
-				   &daddr->v6.sin6_addr,
-				   inet6_sk(&sk->inet.sk)->srcprefs,
-				   &saddr->v6.sin6_addr);
-		SCTP_DEBUG_PRINTK("saddr from ipv6_get_saddr: %pI6\n",
-				  &saddr->v6.sin6_addr);
-		return;
-	}
-
-	scope = sctp_scope(daddr);
+			  __func__, t->asoc, t->dst, &daddr->v6.sin6_addr);
 
-	bp = &asoc->base.bind_addr;
-
-	/* Go through the bind address list and find the best source address
-	 * that matches the scope of the destination address.
-	 */
-	rcu_read_lock();
-	list_for_each_entry_rcu(laddr, &bp->address_list, list) {
-		if (!laddr->valid)
-			continue;
-		if ((laddr->state == SCTP_ADDR_SRC) &&
-		    (laddr->a.sa.sa_family == AF_INET6) &&
-		    (scope <= sctp_scope(&laddr->a))) {
-			bmatchlen = sctp_v6_addr_match_len(daddr, &laddr->a);
-			if (!baddr || (matchlen < bmatchlen)) {
-				baddr = &laddr->a;
-				matchlen = bmatchlen;
-			}
-		}
-	}
 
-	if (baddr) {
-		memcpy(saddr, baddr, sizeof(union sctp_addr));
-		SCTP_DEBUG_PRINTK("saddr: %pI6\n", &saddr->v6.sin6_addr);
-	} else {
-		pr_err("%s: asoc:%p Could not find a valid source "
-		       "address for the dest:%pI6\n",
-		       __func__, asoc, &daddr->v6.sin6_addr);
+	if (t->dst) {
+		saddr->v6.sin6_family = AF_INET6;
+		ipv6_addr_copy(&saddr->v6.sin6_addr, &fl6->saddr);
 	}
-
-	rcu_read_unlock();
 }
 
 /* Make a copy of all potential local addresses. */
@@ -507,14 +494,13 @@ static int sctp_v6_to_addr_param(const union sctp_addr *addr,
 	return length;
 }
 
-/* Initialize a sctp_addr from a dst_entry. */
-static void sctp_v6_dst_saddr(union sctp_addr *addr, struct dst_entry *dst,
+/* Initialize a sctp_addr from struct in6_addr. */
+static void sctp_v6_to_addr(union sctp_addr *addr, struct in6_addr *saddr,
 			      __be16 port)
 {
-	struct rt6_info *rt = (struct rt6_info *)dst;
 	addr->sa.sa_family = AF_INET6;
 	addr->v6.sin6_port = port;
-	ipv6_addr_copy(&addr->v6.sin6_addr, &rt->rt6i_src.addr);
+	ipv6_addr_copy(&addr->v6.sin6_addr, saddr);
 }
 
 /* Compare addresses exactly.
@@ -1001,7 +987,6 @@ static struct sctp_af sctp_af_inet6 = {
 	.to_sk_daddr	   = sctp_v6_to_sk_daddr,
 	.from_addr_param   = sctp_v6_from_addr_param,
 	.to_addr_param	   = sctp_v6_to_addr_param,
-	.dst_saddr	   = sctp_v6_dst_saddr,
 	.cmp_addr	   = sctp_v6_cmp_addr,
 	.scope		   = sctp_v6_scope,
 	.addr_valid	   = sctp_v6_addr_valid,
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index d5bf91d..3421645 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -465,33 +465,35 @@ static sctp_scope_t sctp_v4_scope(union sctp_addr *addr)
  */
 static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 					 union sctp_addr *daddr,
-					 union sctp_addr *saddr)
+					 union sctp_addr *saddr,
+					 struct flowi *fl,
+					 struct sock *sk)
 {
 	struct rtable *rt;
-	struct flowi4 fl4;
+	struct flowi4 *fl4 = &fl->u.ip4;
 	struct sctp_bind_addr *bp;
 	struct sctp_sockaddr_entry *laddr;
 	struct dst_entry *dst = NULL;
 	union sctp_addr dst_saddr;
 
-	memset(&fl4, 0x0, sizeof(struct flowi4));
-	fl4.daddr  = daddr->v4.sin_addr.s_addr;
-	fl4.fl4_dport = daddr->v4.sin_port;
-	fl4.flowi4_proto = IPPROTO_SCTP;
+	memset(fl4, 0x0, sizeof(struct flowi4));
+	fl4->daddr  = daddr->v4.sin_addr.s_addr;
+	fl4->fl4_dport = daddr->v4.sin_port;
+	fl4->flowi4_proto = IPPROTO_SCTP;
 	if (asoc) {
-		fl4.flowi4_tos = RT_CONN_FLAGS(asoc->base.sk);
-		fl4.flowi4_oif = asoc->base.sk->sk_bound_dev_if;
-		fl4.fl4_sport = htons(asoc->base.bind_addr.port);
+		fl4->flowi4_tos = RT_CONN_FLAGS(asoc->base.sk);
+		fl4->flowi4_oif = asoc->base.sk->sk_bound_dev_if;
+		fl4->fl4_sport = htons(asoc->base.bind_addr.port);
 	}
 	if (saddr) {
-		fl4.saddr = saddr->v4.sin_addr.s_addr;
-		fl4.fl4_sport = saddr->v4.sin_port;
+		fl4->saddr = saddr->v4.sin_addr.s_addr;
+		fl4->fl4_sport = saddr->v4.sin_port;
 	}
 
 	SCTP_DEBUG_PRINTK("%s: DST:%pI4, SRC:%pI4 - ",
-			  __func__, &fl4.daddr, &fl4.saddr);
+			  __func__, &fl4->daddr, &fl4->saddr);
 
-	rt = ip_route_output_key(&init_net, &fl4);
+	rt = ip_route_output_key(&init_net, fl4);
 	if (!IS_ERR(rt))
 		dst = &rt->dst;
 
@@ -533,9 +535,9 @@ static struct dst_entry *sctp_v4_get_dst(struct sctp_association *asoc,
 			continue;
 		if ((laddr->state == SCTP_ADDR_SRC) &&
 		    (AF_INET == laddr->a.sa.sa_family)) {
-			fl4.saddr = laddr->a.v4.sin_addr.s_addr;
-			fl4.fl4_sport = laddr->a.v4.sin_port;
-			rt = ip_route_output_key(&init_net, &fl4);
+			fl4->saddr = laddr->a.v4.sin_addr.s_addr;
+			fl4->fl4_sport = laddr->a.v4.sin_port;
+			rt = ip_route_output_key(&init_net, fl4);
 			if (!IS_ERR(rt)) {
 				dst = &rt->dst;
 				goto out_unlock;
@@ -559,19 +561,15 @@ out:
  * to cache it separately and hence this is an empty routine.
  */
 static void sctp_v4_get_saddr(struct sctp_sock *sk,
-			      struct sctp_association *asoc,
-			      struct dst_entry *dst,
+			      struct sctp_transport *t,
 			      union sctp_addr *daddr,
-			      union sctp_addr *saddr)
+			      struct flowi *fl)
 {
-	struct rtable *rt = (struct rtable *)dst;
-
-	if (!asoc)
-		return;
+	union sctp_addr *saddr = &t->saddr;
+	struct rtable *rt = (struct rtable *)t->dst;
 
 	if (rt) {
 		saddr->v4.sin_family = AF_INET;
-		saddr->v4.sin_port = htons(asoc->base.bind_addr.port);
 		saddr->v4.sin_addr.s_addr = rt->rt_src;
 	}
 }
@@ -950,7 +948,6 @@ static struct sctp_af sctp_af_inet = {
 	.to_sk_daddr	   = sctp_v4_to_sk_daddr,
 	.from_addr_param   = sctp_v4_from_addr_param,
 	.to_addr_param	   = sctp_v4_to_addr_param,
-	.dst_saddr	   = sctp_v4_dst_saddr,
 	.cmp_addr	   = sctp_v4_cmp_addr,
 	.addr_valid	   = sctp_v4_addr_valid,
 	.inaddr_any	   = sctp_v4_inaddr_any,
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index f694ee1..33d9ee6 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2287,7 +2287,7 @@ static int sctp_apply_peer_addr_params(struct sctp_paddrparams *params,
 			trans->param_flags =
 				(trans->param_flags & ~SPP_PMTUD) | pmtud_change;
 			if (update) {
-				sctp_transport_pmtu(trans);
+				sctp_transport_pmtu(trans, sctp_opt2sk(sp));
 				sctp_assoc_sync_pmtu(asoc);
 			}
 		} else if (asoc) {
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index d3ae493..2544b9b 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -211,11 +211,15 @@ void sctp_transport_set_owner(struct sctp_transport *transport,
 }
 
 /* Initialize the pmtu of a transport. */
-void sctp_transport_pmtu(struct sctp_transport *transport)
+void sctp_transport_pmtu(struct sctp_transport *transport, struct sock *sk)
 {
 	struct dst_entry *dst;
+	struct flowi fl;
 
-	dst = transport->af_specific->get_dst(NULL, &transport->ipaddr, NULL);
+	dst = transport->af_specific->get_dst(transport->asoc,
+					      &transport->ipaddr,
+					      &transport->saddr,
+					      &fl, sk);
 
 	if (dst) {
 		transport->pathmtu = dst_mtu(dst);
@@ -272,15 +276,16 @@ void sctp_transport_route(struct sctp_transport *transport,
 	struct sctp_af *af = transport->af_specific;
 	union sctp_addr *daddr = &transport->ipaddr;
 	struct dst_entry *dst;
+	struct flowi fl;
 
-	dst = af->get_dst(asoc, daddr, saddr);
+	dst = af->get_dst(asoc, daddr, saddr, &fl, sctp_opt2sk(opt));
+	transport->dst = dst;
 
 	if (saddr)
 		memcpy(&transport->saddr, saddr, sizeof(union sctp_addr));
 	else
-		af->get_saddr(opt, asoc, dst, daddr, &transport->saddr);
+		af->get_saddr(opt, transport, daddr, &fl);
 
-	transport->dst = dst;
 	if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
 		return;
 	}
-- 
1.6.5.2



^ permalink raw reply related

* Re: [Bugme-new] [Bug 33842] New: NULL pointer dereference in ip_fragment
From: Eric Dumazet @ 2011-04-27  7:41 UTC (permalink / raw)
  To: Tomas Carnecky; +Cc: Bandan Das, David Miller, netdev, akpm
In-Reply-To: <4DB7C44D.60306@dbservice.com>

Le mercredi 27 avril 2011 à 09:22 +0200, Tomas Carnecky a écrit :
> On 4/26/11 11:24 PM, Eric Dumazet wrote:
> > Le mardi 26 avril 2011 à 17:19 -0400, Bandan Das a écrit :
> >> Yeah, I just rechecked and this is already in Linus' tree. So, Tomas you can
> >> either try pulling in those changes or you can apply this patch and see
> >> if it makes any difference. Thanks!
> > Better pull Linus tree because there is another patch involved.
> >
> > (commits c65353daf137dd41f3ede3baf62d561fca076228
> > ip: ip_options_compile() resilient to NULL skb route
> 
> Still getting that error (on rc4-00245-g4175242, which includes that 
> commit).
> 

Could you send us a complete trace?

One way to get one is to use netconsole (provided you have another
machine):

grep NETCONSOLE .config
CONFIG_NETCONSOLE=y

Add to your boot command line:

netconsole=4444@192.168.20.108/eth0,4444@192.168.20.112/00:1e:0b:ec:c3:e4

messages sent to host 192.168.20.112 udp port 4444, mac addr 00:1e:0b:ec:c3:e4, on eth0

then on 192.168.20.112 start a netcat listening on udp port 4444 to get
a copy of messages.

netcat -l -u -p 4444

Complete doc on Documentation/networking/netconsole.txt

Thanks !
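
The field layout of the netconsole= value used above can be illustrated
with a small userspace parser. This is only a sketch under assumptions:
struct netconsole_cfg and parse_netconsole() are hypothetical names, not
kernel code, and the parser accepts only the fully specified form shown
in this mail (the kernel syntax also allows some fields to be omitted).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the six fields of a fully specified
 * netconsole= value. This is not a kernel structure. */
struct netconsole_cfg {
	unsigned short src_port;
	char src_ip[16];	/* dotted-quad IPv4 plus NUL */
	char dev[16];		/* local interface name */
	unsigned short tgt_port;
	char tgt_ip[16];
	char tgt_mac[18];	/* "xx:xx:xx:xx:xx:xx" plus NUL */
};

/* Parse "<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>".
 * Returns 0 on success, -1 if any of the six fields is missing. */
int parse_netconsole(const char *s, struct netconsole_cfg *c)
{
	int n;

	memset(c, 0, sizeof(*c));
	n = sscanf(s, "%hu@%15[^/]/%15[^,],%hu@%15[^/]/%17s",
		   &c->src_port, c->src_ip, c->dev,
		   &c->tgt_port, c->tgt_ip, c->tgt_mac);
	return n == 6 ? 0 : -1;
}
```

Feeding it the string from the boot command above yields eth0 as the
local device and 192.168.20.112 port 4444 as the log target.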



^ permalink raw reply

* [PATCH net-next-2.6 1/5 v2] sctp: fix sctp to work with ipv6 source address routing
From: Wei Yongjun @ 2011-04-27  7:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, lksctp
In-Reply-To: <4DB7C73D.3000406@cn.fujitsu.com>

From: Weixing Shi <Weixing.Shi@windriver.com>

In the test case below, which uses source address routing,
sctp does not work.
Node-A
1)ifconfig eth0 inet6 add 2001:1::1/64
2)ip -6 rule add from 2001:1::1 table 100 pref 100
3)ip -6 route add 2001:2::1 dev eth0 table 100
4)sctp_darn -H 2001:1::1 -P 250 -l &
Node-B
1)ifconfig eth0 inet6 add 2001:2::1/64
2)ip -6 rule add from 2001:2::1 table 100 pref 100
3)ip -6 route add 2001:1::1 dev eth0 table 100
4)sctp_darn -H 2001:2::1 -P 250 -h 2001:1::1 -p 250 -s

root cause:
Node-A and Node-B use source address routing. At the
beginning the source address is NULL, so sctp searches
the routing table by destination address only. Because
the matching route lives in the source address routing
table, that lookup does not yield a usable dst_entry.

solution:
walk through the bind address list to get a source
address, then look up the routing table again to get
the correct dst_entry.

Signed-off-by: Weixing Shi <Weixing.Shi@windriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
---
 net/sctp/ipv6.c |   44 +++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 43 insertions(+), 1 deletions(-)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 321f175..3a571d6 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -80,6 +80,9 @@
 
 #include <asm/uaccess.h>
 
+static inline int sctp_v6_addr_match_len(union sctp_addr *s1,
+					 union sctp_addr *s2);
+
 /* Event handler for inet6 address addition/deletion events.
  * The sctp_local_addr_list needs to be protocted by a spin lock since
  * multiple notifiers (say IPv4 and IPv6) may be running at the same
@@ -244,8 +247,14 @@ static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
 					 union sctp_addr *daddr,
 					 union sctp_addr *saddr)
 {
-	struct dst_entry *dst;
+	struct dst_entry *dst = NULL;
 	struct flowi6 fl6;
+	struct sctp_bind_addr *bp;
+	struct sctp_sockaddr_entry *laddr;
+	union sctp_addr *baddr = NULL;
+	__u8 matchlen = 0;
+	__u8 bmatchlen;
+	sctp_scope_t scope;
 
 	memset(&fl6, 0, sizeof(fl6));
 	ipv6_addr_copy(&fl6.daddr, &daddr->v6.sin6_addr);
@@ -261,6 +270,39 @@ static struct dst_entry *sctp_v6_get_dst(struct sctp_association *asoc,
 	}
 
 	dst = ip6_route_output(&init_net, NULL, &fl6);
+	if (!asoc || saddr)
+		goto out;
+
+	if (dst->error) {
+		dst_release(dst);
+		dst = NULL;
+		bp = &asoc->base.bind_addr;
+		scope = sctp_scope(daddr);
+		/* Walk through the bind address list and try to get a dst that
+		 * matches a bind address as the source address.
+		 */
+		rcu_read_lock();
+		list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+			if (!laddr->valid)
+				continue;
+			if ((laddr->state == SCTP_ADDR_SRC) &&
+			    (laddr->a.sa.sa_family == AF_INET6) &&
+			    (scope <= sctp_scope(&laddr->a))) {
+				bmatchlen = sctp_v6_addr_match_len(daddr,
+								   &laddr->a);
+				if (!baddr || (matchlen < bmatchlen)) {
+					baddr = &laddr->a;
+					matchlen = bmatchlen;
+				}
+			}
+		}
+		rcu_read_unlock();
+		if (baddr) {
+			ipv6_addr_copy(&fl6.saddr, &baddr->v6.sin6_addr);
+			dst = ip6_route_output(&init_net, NULL, &fl6);
+		}
+	}
+out:
 	if (!dst->error) {
 		struct rt6_info *rt;
 		rt = (struct rt6_info *)dst;
-- 
1.6.5.2
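
The selection step in the patch above (pick the bind address sharing the
longest prefix with the destination) can be sketched in userspace C.
This is an illustration under assumptions, not the kernel code:
v6_match_len() and pick_source() are hypothetical names, and the sketch
leaves out the valid/state/family/scope filtering, the RCU locking, and
the second route lookup that the patch performs.

```c
#include <arpa/inet.h>	/* inet_pton(), for callers building addresses */
#include <netinet/in.h>
#include <stddef.h>

/* Count the leading bits two IPv6 addresses have in common (0..128). */
int v6_match_len(const struct in6_addr *a, const struct in6_addr *b)
{
	int i, bits;

	for (i = 0; i < 16; i++) {
		unsigned char x = a->s6_addr[i] ^ b->s6_addr[i];

		if (!x)
			continue;
		/* First differing byte: count its equal high-order bits. */
		for (bits = 0; !(x & 0x80); bits++)
			x <<= 1;
		return i * 8 + bits;
	}
	return 128;
}

/* Return the index of the candidate sharing the longest prefix with
 * daddr, or -1 if the list is empty. On a tie the earliest entry wins,
 * like the "!baddr || matchlen < bmatchlen" test in the patch. */
int pick_source(const struct in6_addr *daddr,
		const struct in6_addr *cand, size_t n)
{
	int best = -1, best_len = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int len = v6_match_len(daddr, &cand[i]);

		if (best < 0 || best_len < len) {
			best = (int)i;
			best_len = len;
		}
	}
	return best;
}
```

With destination 2001:2::1 and candidates fe80::1, 2001:1::1 and
2001:2::2, the last candidate is chosen because it shares the most
leading bits with the destination.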



^ permalink raw reply related

* [PATCH net-next-2.6 0/5 v2] SCTP updates for net-next-2.6
From: Wei Yongjun @ 2011-04-27  7:35 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, lksctp

Hi David

Here is a set of SCTP patches for net-next-2.6, the last part
from Vlad's lksctp-dev tree. They update SCTP IPv6 routing and fix
IPsec issues. Please apply.

Changelog:
  - redo the intermediate build tests and the functional tests.
  - remove the useless ->dst_saddr member of sctp_pf
  - merge some fixes into the original patches

Vlad Yasevich (4):
      sctp: cache the ipv6 source after route lookup
      sctp: make sctp over IPv6 work with IPsec
      sctp: remove useless arguments from get_saddr() call
      sctp: clean up route lookup calls

Weixing Shi (1):
      sctp: fix sctp to work with ipv6 source address routing

 include/net/sctp/structs.h |   18 ++---
 net/sctp/ipv6.c            |  183 +++++++++++++++++++++++++------------------
 net/sctp/protocol.c        |   54 ++++++-------
 net/sctp/socket.c          |    2 +-
 net/sctp/transport.c       |   28 ++++---
 5 files changed, 153 insertions(+), 132 deletions(-)



^ permalink raw reply

* Re: [PATCH] Applying inappropriate ioctl operation on socket should return ENOTTY
From: Eric Dumazet @ 2011-04-27  6:57 UTC (permalink / raw)
  To: Lifeng Sun; +Cc: linux-kernel, netdev
In-Reply-To: <20110427063730.GA20313@md5.ntu.edu.sg>

On Wednesday, 27 April 2011 at 14:37 +0800, Lifeng Sun wrote:
> On 07:58 Wed 04/27/11 Apr, Eric Dumazet wrote:
> > Really ?
> > 
> > EINVAL is ok too : Request or argp is not valid.
> 
> I'm afraid not. SUSv4 specifies, say,
> 
>   int tcsetattr(int fildes, int optional_actions,
>          const struct termios *termios_p);
> 
>  ERROR:
>   [EINVAL]
>       The optional_actions argument is not a supported value, or an
>       attempt was made to change an attribute represented in the
>       termios structure to an unsupported value.
> 
>   [ENOTTY]
>       The file associated with fildes is not a terminal.
> 
> which means when we apply tcsetattr (implemented by ioctl) to _any_
> non-terminal file descriptor, it should set errno to ENOTTY rather
> than EINVAL.
> 

You quote manpage for a library call, not a system call.

If you feel your glibc doesn't implement this well, please complain to
the glibc maintainers.

^ permalink raw reply

* Re: [PATCH] Applying inappropriate ioctl operation on socket should return ENOTTY
From: Eric Dumazet @ 2011-04-27  6:55 UTC (permalink / raw)
  To: Lifeng Sun; +Cc: linux-kernel, netdev
In-Reply-To: <20110427063730.GA20313@md5.ntu.edu.sg>

On Wednesday, 27 April 2011 at 14:37 +0800, Lifeng Sun wrote:
> On 07:58 Wed 04/27/11 Apr, Eric Dumazet wrote:
> > Really ?
> > 
> > EINVAL is ok too : Request or argp is not valid.
> 
> I'm afraid not. SUSv4 specifies, say,
> 
>   int tcsetattr(int fildes, int optional_actions,
>          const struct termios *termios_p);
> 
>  ERROR:
>   [EINVAL]
>       The optional_actions argument is not a supported value, or an
>       attempt was made to change an attribute represented in the
>       termios structure to an unsupported value.
> 
>   [ENOTTY]
>       The file associated with fildes is not a terminal.
> 
> which means when we apply tcsetattr (implemented by ioctl) to _any_
> non-terminal file descriptor, it should set errno to ENOTTY rather
> than EINVAL.

It's not that simple. This is a known and documented artifact.

In the old days, ioctl() was meaningful (mostly) for TTYs.

man isatty

ERRORS
       EBADF  fd is not a valid file descriptor.

       EINVAL fd refers to a file other than a terminal.  POSIX.1-2001 specifies the error ENOTTY for this case.

Just because POSIX changed the rules does not mean we must change the
kernel and break applications.

Conformant applications use isatty(fd) and test whether the result is 1 or not.

This way, they work with linux 1.0, 2.0, 2.2, 2.4, .... and other OSes as well.
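
A minimal sketch of the portable check described above, assuming a
hypothetical helper named fd_is_terminal(): the terminal test relies
only on the return value of isatty(), so it does not matter whether the
kernel reports EINVAL or ENOTTY for a non-terminal. errno is consulted
only to single out EBADF, which both man page entries quoted above
agree on.

```c
#include <errno.h>
#include <unistd.h>

/* Return 1 if fd is a terminal, 0 if it is an open descriptor that is
 * not a terminal, and -1 if fd is not a valid descriptor at all.
 * The EINVAL vs. ENOTTY distinction is deliberately never examined. */
int fd_is_terminal(int fd)
{
	errno = 0;
	if (isatty(fd) == 1)
		return 1;
	return errno == EBADF ? -1 : 0;
}
```

For example, both ends of a pipe are reported as 0 (not a terminal),
and a closed or negative descriptor is reported as -1.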

^ permalink raw reply

* Re: [PATCH] Applying inappropriate ioctl operation on socket should return ENOTTY
From: Lifeng Sun @ 2011-04-27  6:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, netdev
In-Reply-To: <1303883910.2699.53.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

On 07:58 Wed 04/27/11 Apr, Eric Dumazet wrote:
> Really ?
> 
> EINVAL is ok too : Request or argp is not valid.

I'm afraid not. SUSv4 specifies, say,

  int tcsetattr(int fildes, int optional_actions,
         const struct termios *termios_p);

 ERROR:
  [EINVAL]
      The optional_actions argument is not a supported value, or an
      attempt was made to change an attribute represented in the
      termios structure to an unsupported value.

  [ENOTTY]
      The file associated with fildes is not a terminal.

which means when we apply tcsetattr (implemented by ioctl) to _any_
non-terminal file descriptor, it should set errno to ENOTTY rather
than EINVAL.

> I would say, its not a bug as you claim. 
> 
> Its really too late to make such change and risk regressions.
> 
> isatty(fd) performs well. Please use it instead.
> 
> Also, networking patches should be sent to netdev@vger.kernel.org and
> David Miller, as mentioned in MAINTAINERS file.

Thank you.

-- 

[-- Attachment #2: GnuPG digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply

* Re: [PATCH] net: tun: convert to hw_features
From: Rusty Russell @ 2011-04-27  4:59 UTC (permalink / raw)
  To: David Miller, mirq-linux; +Cc: netdev
In-Reply-To: <20110420.013216.28818575.davem@davemloft.net>

On Wed, 20 Apr 2011 01:32:16 -0700 (PDT), David Miller <davem@davemloft.net> wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Tue, 19 Apr 2011 18:13:10 +0200 (CEST)
> 
> > This changes offload setting behaviour to what I think is correct:
> >  - offloads set via ethtool mean what admin wants to use (by default
> >    he wants 'em all)
> >  - offloads set via ioctl() mean what userspace is expecting to get
> >    (this limits which admin wishes are granted)
> >  - TUN_NOCHECKSUM is ignored, as it might cause broken packets when
> >    forwarded (ip_summed == CHECKSUM_UNNECESSARY means that checksum
> >    was verified, not that it can be ignored)
> > 
> > If TUN_NOCHECKSUM is implemented, it should set skb->csum_* and
> > skb->ip_summed (= CHECKSUM_PARTIAL) for known protocols and let others
> > be verified by kernel when necessary.
> > 
> > TUN_NOCHECKSUM handling was introduced by commit
> > f43798c27684ab925adde7d8acc34c78c6e50df8:
> > 
> >     tun: Allow GSO using virtio_net_hdr
> >     
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> 
> Applied.

Dave, you just removed a feature that has been in Linux since before
git.  It *probably* just means we go slower in cases we don't really
care about.  But does removing it break qemu?  Has anyone tested?

Thanks,
Rusty.


^ permalink raw reply

* Re: [PATCH] Applying inappropriate ioctl operation on socket should return ENOTTY
From: Eric Dumazet @ 2011-04-27  5:58 UTC (permalink / raw)
  To: Lifeng Sun; +Cc: linux-kernel, netdev
In-Reply-To: <1303882625-28115-1-git-send-email-lifongsun@gmail.com>

On Wednesday, 27 April 2011 at 13:37 +0800, Lifeng Sun wrote:
> ioctl() calls against a socket with an inappropriate ioctl operation
> are incorrectly returning EINVAL rather than ENOTTY:
> 
>   [ENOTTY]
>       Inappropriate I/O control operation.
> 
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=33992
> 
> This bug is not limited to socket, it also occurs in a lot of, maybe
> some hundred, other ioctl operations, while in the patch I only fixed
> about a dozen of additional ones in pipe, fifo and character device
> drivers.

Really ?

EINVAL is ok too : Request or argp is not valid.

I would say, its not a bug as you claim. 

Its really too late to make such change and risk regressions.

isatty(fd) performs well. Please use it instead.

Also, networking patches should be sent to netdev@vger.kernel.org and
David Miller, as mentioned in MAINTAINERS file.




^ permalink raw reply

* Re: [PATCH 3/4] ipv4: Remove erroneous check in igmpv3_newpack() and igmp_send_report().
From: Eric Dumazet @ 2011-04-27  4:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110426.151204.70183859.davem@davemloft.net>

On Tuesday, 26 April 2011 at 15:12 -0700, David Miller wrote:
> Output route resolution never returns a route with rt_src set to zero
> (which is INADDR_ANY).
> 
> Even if the flow key for the output route lookup specifies INADDR_ANY
> for the source address, the output route resolution chooses a real
> source address to use in the final route.
> 
> This test has existed forever in igmp_send_report() and David Stevens
> simply copied over the erroneous test when implementing support for
> IGMPv3.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply

* linux-next: ibmveth runtime errors
From: Stephen Rothwell @ 2011-04-27  4:49 UTC (permalink / raw)
  To: ppc-dev
  Cc: "Michał Mirosław", David Miller, netdev,
	Santiago Leon, linux-next, LKML

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

Hi all,

For the last couple of days, booting linux-next on a few of our Power
partitions (but not all) has produced this error (over and over):

ibmveth 3000000b: eth0: tx: h_send_logical_lan failed with rc=-4

Linus' tree seems to boot fine on these partitions.  The only commit
directly affecting ibmveth in linux-next is b9367bf3ee6d ("net: ibmveth:
convert to hw_features") which first appeared in next-20110421 which is
also the first one that failed.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 2/4] ipv4: Sanitize and simplify ip_route_{connect,newports}()
From: Eric Dumazet @ 2011-04-27  4:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110426.151202.245387083.davem@davemloft.net>

On Tuesday, 26 April 2011 at 15:12 -0700, David Miller wrote:
> These functions are used together as a unit for route resolution
> during connect().  They address the chicken-and-egg problem that
> exists when ports need to be allocated during connect() processing,
> yet such port allocations require addressing information from the
> routing code.
> 
> It's currently more heavy handed than it needs to be, and in
> particular we allocate and initialize a flow object twice.
> 
> Let the callers provide the on-stack flow object.  That way we only
> need to initialize it once in the ip_route_connect() call.
> 
> Later, if ip_route_newports() needs to do anything, it re-uses that
> flow object as-is except for the ports which it updates before the
> route re-lookup.
> 
> Also, describe why this set of facilities are needed and how it works
> in a big comment.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply

* Re: Kernel crash after using new Intel NIC (igb)
From: Eric Dumazet @ 2011-04-27  4:32 UTC (permalink / raw)
  To: Maximilian Engelhardt; +Cc: linux-kernel, netdev, StuStaNet Vorstand
In-Reply-To: <1303878240.2699.41.camel@edumazet-laptop>

On Wednesday, 27 April 2011 at 06:24 +0200, Eric Dumazet wrote:

> We had similar reports in the past that disappeared when adding
> "slab_nomerge" to boot parameters. We suspect a memory corruption from
> another part of kernel on 64bytes kmemcache objects.
> 
> In 2.6.37, inetpeer code uses 64bytes objects. Using slab_nomerge and
> SLUB allocator (as you already do), makes sure inetpeer kmemcache wont
> be shared by other 64bytes objects in kernel.
> 

Of course, the right option name is slub_nomerge

vi +2293 Documentation/kernel-parameters.txt

        slub_nomerge    [MM, SLUB]
                        Disable merging of slabs with similar size. May be
                        necessary if there is some reason to distinguish
                        allocs to different slabs. Debug options disable
                        merging on their own.
                        For more information see Documentation/vm/slub.txt.

^ permalink raw reply

* Re: Kernel crash after using new Intel NIC (igb)
From: Eric Dumazet @ 2011-04-27  4:24 UTC (permalink / raw)
  To: Maximilian Engelhardt; +Cc: linux-kernel, netdev, StuStaNet Vorstand
In-Reply-To: <201104250033.03401.maxi@daemonizer.de>

On Monday, 25 April 2011 at 00:32 +0200, Maximilian Engelhardt wrote:
> Hello,
> 
> some time ago we switched some of our servers to a new networking card that 
> uses the Intel igb driver. Since that time we see regular kernel crashes.
> The crashes happen at very irregular intervals, sometimes after a week uptime, 
> sometimes after a month or even more. They seem to be independent of the 
> server load as they also happen in the night when there is low traffic.
> 
> The affected server is used as a NAT device with some iptables rules and serves 
> about 2000 people.
> 
> Attached are two logs of the crashes as well as the output of dmesg, lspci, 
> and /proc/interrupts as well as the used kernel config.
> 
> I have no idea what might be wrong but I think it is a kernel bug. Perhaps 
> someone with more knowledge has a clue.
> 
> If needed I can provide additional information or build different kernels.
> 
> Greetings,
> Maxi

Hello Maximilian

We had similar reports in the past that disappeared when adding
"slab_nomerge" to boot parameters. We suspect a memory corruption from
another part of kernel on 64bytes kmemcache objects.

In 2.6.37, inetpeer code uses 64bytes objects. Using slab_nomerge and
SLUB allocator (as you already do), makes sure inetpeer kmemcache wont
be shared by other 64bytes objects in kernel.

In 2.6.38 and up, inetpeer objects are now larger, so you also could try
latest linux-2.6 tree, just to make sure inetpeer code is not faulty.

Thanks

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffff8145ea9f>] cleanup_once+0x3f/0xa0
PGD 12d82a067 PUD 12ea49067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
last sysfs file: /sys/devices/virtual/vc/vcsa5/uevent
CPU 0
Pid: 0, comm: swapper Not tainted 2.6.37.1 #1 Supermicro X7SB4/E/X7SB4/E
RIP: 0010:[<ffffffff8145ea9f>]  [<ffffffff8145ea9f>] cleanup_once+0x3f/0xa0
RSP: 0018:ffff8800cfc03e40  EFLAGS: 00010202
RAX: ffff880128167798 RBX: ffff880128167780 RCX: 0000000000000000
RDX: c398112e00026cf7 RSI: 00000000000001a2 RDI: ffffffff8166ce10
RBP: 0000000000024702 R08: 00000000003d0900 R09: 00040ea8ea5b7700
R10: ffffffff814f312d R11: 0000000000000010 R12: ffffffff8161ffd8
R13: 0000000000000102 R14: ffffffff8174b4e0 R15: ffffffff8161ffd8
FS:  0000000000000000(0000) GS:ffff8800cfc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 000000012fe67000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff8161e000, task ffffffff81638020)
Stack:
 ffff8800cfc11f00 0000000111034f87 0000000000024702 ffffffff8145ed68
 ffffffff8174a4c0 ffffffff8174a4c0 ffff8800cfc03eb0 ffffffff81044cb8
 ffffffff81034079 ffffffff8145ed30 0000000000000000 ffffffff8174b8e0
Call Trace:
 <IRQ>
 [<ffffffff8145ed68>] ? peer_check_expire+0x38/0x110
 [<ffffffff81044cb8>] ? run_timer_softirq+0x138/0x250
 [<ffffffff81034079>] ? scheduler_tick+0xd9/0x2e0
 [<ffffffff8145ed30>] ? peer_check_expire+0x0/0x110
 [<ffffffff8103eb0d>] ? __do_softirq+0x9d/0x130
 [<ffffffff8100320c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100531d>] ? do_softirq+0x4d/0x80
 [<ffffffff8103e9cd>] ? irq_exit+0x8d/0x90
 [<ffffffff8101d5ea>] ? smp_apic_timer_interrupt+0x6a/0xa0
 [<ffffffff81002cd3>] ? apic_timer_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8100a93a>] ? mwait_idle+0x6a/0x80
 [<ffffffff81001528>] ? cpu_idle+0x58/0xb0
 [<ffffffff81698dd3>] ? start_kernel+0x334/0x33f
 [<ffffffff8169840d>] ? x86_64_start_kernel+0xf3/0xf7
Code: 00 48 8b 05 84 e3 20 00 48 3d 00 ce 66 81 74 5c 48 8d 58 e8 48 8b 15 31 5e 22 00 2b 53 28 48 39 ea 72 49 48 8b 4b 18 48 8b 53 20 <48> 89 51 08 48 89 0a 48 89 43 18 48 89 43 20 f0 ff 40 14 48 c7
RIP  [<ffffffff8145ea9f>] cleanup_once+0x3f/0xa0
 RSP <ffff8800cfc03e40>
CR2: 0000000000000008
---[ end trace 904f16191de0663c ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G      D     2.6.37.1 #1
Call Trace:
 <IRQ>  [<ffffffff814e4152>] ? panic+0xa1/0x19e
 [<ffffffff810068eb>] ? oops_end+0x9b/0xa0
 [<ffffffff81024523>] ? no_context+0x103/0x270
 [<ffffffff81024d10>] ? do_page_fault+0x290/0x430
 [<ffffffff813eabd2>] ? __alloc_skb+0x72/0x160
 [<ffffffff81262f40>] ? swiotlb_dma_mapping_error+0x10/0x20
 [<ffffffff8133e168>] ? igb_alloc_rx_buffers_adv+0x208/0x3a0
 [<ffffffff814e780f>] ? page_fault+0x1f/0x30
 [<ffffffff8145ea9f>] ? cleanup_once+0x3f/0xa0
 [<ffffffff8145ed68>] ? peer_check_expire+0x38/0x110
 [<ffffffff81044cb8>] ? run_timer_softirq+0x138/0x250
 [<ffffffff81034079>] ? scheduler_tick+0xd9/0x2e0
 [<ffffffff8145ed30>] ? peer_check_expire+0x0/0x110
 [<ffffffff8103eb0d>] ? __do_softirq+0x9d/0x130
 [<ffffffff8100320c>] ? call_softirq+0x1c/0x30
 [<ffffffff8100531d>] ? do_softirq+0x4d/0x80
 [<ffffffff8103e9cd>] ? irq_exit+0x8d/0x90
 [<ffffffff8101d5ea>] ? smp_apic_timer_interrupt+0x6a/0xa0
 [<ffffffff81002cd3>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8100a93a>] ? mwait_idle+0x6a/0x80
 [<ffffffff81001528>] ? cpu_idle+0x58/0xb0
 [<ffffffff81698dd3>] ? start_kernel+0x334/0x33f
 [<ffffffff8169840d>] ? x86_64_start_kernel+0xf3/0xf7

^ permalink raw reply

* Re: [PATCH] netfilter/IPv6: initialize TOS field in REJECT target module
From: Fernando Luis Vazquez Cao @ 2011-04-27  4:21 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: David Miller, eric.dumazet, netfilter-devel, netdev, yoshfuji,
	jengelh, Patrick McHardy
In-Reply-To: <4DB6E647.5060608@netfilter.org>

On Tue, 2011-04-26 at 17:35 +0200, Pablo Neira Ayuso wrote:
> On 26/04/11 17:34, Pablo Neira Ayuso wrote:
> > On 26/04/11 07:25, Fernando Luis Vazquez Cao wrote:
> >> Pablo, could you pull in the two patches below? They have already been
> >> acked by Eric. It would be great it we could get them merged for the
> >> next -rc and stable releases.
> >>
> >> [PATCH] netfilter/IPv6: fix DSCP mangle code
> >> [PATCH] netfilter/IPv6: initialize TOS field in REJECT target module
> > 
> > Patrick is the primary link to take patches, I'm including him in this
> > CC. If he experiences any problem, I'll make sure that these hit -rc, so
> > never mind.
>   ^^^^^^^^^^
> 
> Sorry, I meant to say, "don't worry" :-)

Thank you, Pablo. I really appreciate it.

- Fernando


^ permalink raw reply

* Re: Strange igb bug, out-of-tree driver seems to work fine.
From: Ben Greear @ 2011-04-27  4:18 UTC (permalink / raw)
  To: Wyborny, Carolyn; +Cc: netdev
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F016B41256F@orsmsx508.amr.corp.intel.com>

On 04/26/2011 04:23 PM, Wyborny, Carolyn wrote:

> Hello,
>
> I'm sorry for the delay in responding.  I'm really scratching my head on this one as we don't do much in the driver that affects what we get on receive.  I've seen situations where some switches end up transmitting more of these and then we record more of them, but I'm guessing you're testing with the same equipment, just a different driver version.  Let me know if I'm mistaken there.
>
> So, to answer your question, I believe my patches are there, but I did review them again and I'm not sure they will make any difference.  My latest batch of patches was to add features to the i350 device specifically.
>
> Give it try though and let me know if you see any difference with 2.6.39-rc4+.

We reproduced this with stock 2.6.38.4 today, but I didn't get a chance to really
dig into it.

We only seem to have problems when the nics are associated with a kernel bridge
(some ports are connected to a pair of veth devices through a user-space bridge
that uses packet sockets to bridge the packets, and one of the veth interfaces
is in the kernel bridge).

We did run the same igb system to itself sending layer-3 traffic and it ran
fine, so it appears to be a fairly tricky bug.  It *almost* looks like issues
with the bridge or how we set things up, but we can reliably reproduce it
on in-kernel igb driver systems, and e1000e systems never see the problem.

I'll try to get some better debug info tomorrow, and if time allows,
we'll try the stock Linus top-of-tree kernel as well.  If top-of-tree
does work, I should be able to bisect the problem since we have a reliable
test case. It would be interesting to see where the issue lies.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: r8169 :  always copying the rx buffer to new skb
From: Eric Dumazet @ 2011-04-27  3:57 UTC (permalink / raw)
  To: John Lumby; +Cc: Francois Romieu, netdev, Ben Hutchings, nic_swsd
In-Reply-To: <4DB77D03.9070507@hotmail.com>

On Tuesday, 26 April 2011 at 22:18 -0400, John Lumby wrote:
> Anyone have any further thoughts on the proposal to avoid memcpy'ing?  
> (see earlier post)
> 
> I also have a question concerning NAPI.     I've found that much of the 
> CPU saved from not memcpy'ing is burned in extra rx_interrupt'ing,  and 
> much of that seems to be wasted (no new packets).    So the actual 
> benefit is rather less than I think should be possible.
> 
> I've tried some tinkering with the napi weight but can't find any 
> setting which really improves the ratio of rx packets to hard interrupts 
> significantly.    The problem seems to be that each successive 
> rtl8169_poll() is driven too soon after the last one   (in this 
> particular workload).     The napi weight doesn't directly influence that.
> 
> So  -  question :
> is there any way,   when returning from rtl8169_poll,  to tell napi 
> something like :
>     "   finish this interrupt context and let something else run on this 
> CPU  (always CPU0 on my machine) BUT reschedule another napi poll on 
> this same device at some time after that "
> the point being that rtl8169_poll will,  for this case,  NOT re-enable 
> the NIC's napi interrupts,  in the hope that maybe some user work can be 
> dispatched,    so something else will have to schedule the next napi 
> poll for it.    Conceptually,    if rtl8169_poll finds no rx work done 
> on this call,   it wants to cause a yield() and then try again.     
> Except it can't from within the interrupt.
> 
> I appreciate this could lead to delays in handling new work so might be 
> dangerous,    but it seems to me to be in line with NAPI objectives so I 
> wanted to try it .   But don't know how.     Any hints or thoughts 
> appreciated.

The answer is no. There is no such facility in the NAPI infrastructure.

What you want is to introduce timer-based polling. Some old pre-NAPI
drivers did that. It's OK when you have one device to handle, but it can
be a nightmare when you mix several devices.




^ permalink raw reply

* Re: r8169 :  always copying the rx buffer to new skb
From: John Lumby @ 2011-04-27  2:18 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, Ben Hutchings, nic_swsd
In-Reply-To: <4DAFA9F9.5080909@hotmail.com>

Anyone have any further thoughts on the proposal to avoid memcpy'ing?  
(see earlier post)

I also have a question concerning NAPI.     I've found that much of the 
CPU saved from not memcpy'ing is burned in extra rx_interrupt'ing,  and 
much of that seems to be wasted (no new packets).    So the actual 
benefit is rather less than I think should be possible.

I've tried some tinkering with the napi weight but can't find any 
setting which really improves the ratio of rx packets to hard interrupts 
significantly.    The problem seems to be that each successive 
rtl8169_poll() is driven too soon after the last one   (in this 
particular workload).     The napi weight doesn't directly influence that.

So  -  question :
is there any way,   when returning from rtl8169_poll,  to tell napi 
something like :
    "   finish this interrupt context and let something else run on this 
CPU  (always CPU0 on my machine) BUT reschedule another napi poll on 
this same device at some time after that "
the point being that rtl8169_poll will,  for this case,  NOT re-enable 
the NIC's napi interrupts,  in the hope that maybe some user work can be 
dispatched,    so something else will have to schedule the next napi 
poll for it.    Conceptually,    if rtl8169_poll finds no rx work done 
on this call,   it wants to cause a yield() and then try again.     
Except it can't from within the interrupt.

I appreciate this could lead to delays in handling new work so might be 
dangerous,    but it seems to me to be in line with NAPI objectives so I 
wanted to try it .   But don't know how.     Any hints or thoughts 
appreciated.

John

^ permalink raw reply

* Re: [RFC PATCH 1/1] bna: Generic Netlink Interface to collect FW trace
From: David Miller @ 2011-04-27  2:17 UTC (permalink / raw)
  To: debdut; +Cc: shemminger, rmody, netdev, huangj, amathur, ddutt
In-Reply-To: <BANLkTinUsJ2foWKBVb_9nVFA2V_vKb11rA@mail.gmail.com>

From: Debashis Dutt <debdut@gmail.com>
Date: Tue, 26 Apr 2011 19:16:29 -0700

> However, since the generic netlink is a more generic interface, we could use
> this infrastructure in the driver for commands which are not part of
> other standard tools.

You aren't the only device in the world that might want to provide
a facility to fetch firmware traces.

This isn't really a niche thing specific to your device at all.

^ permalink raw reply
