Netdev List
* Re: [PATCH v3 04/12] l2tp: Add ppp device name to L2TP ppp session data
From: David Miller @ 2010-03-31  8:46 UTC (permalink / raw)
  To: jchapman; +Cc: shemminger, netdev
In-Reply-To: <4BB2FD0D.1080105@katalix.com>

From: James Chapman <jchapman@katalix.com>
Date: Wed, 31 Mar 2010 08:43:09 +0100

> Stephen Hemminger wrote:
>> On Tue, 30 Mar 2010 17:17:46 +0100
>> James Chapman <jchapman@katalix.com> wrote:
>> 
>>> When dumping L2TP PPP sessions using /proc/net/l2tp, get
>>> the assigned PPP device name from PPP using ppp_dev_name().
>>>
>>> Signed-off-by: James Chapman <jchapman@katalix.com>
>>> Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
>>>
>> 
>> Why is this a necessary API?
>> Why not put it in debugfs if just a debugging tool?
> 
> With the original driver (merged in 2.6.23), some people use horrible
> hacks in scripts to derive info about their L2TP connections from /proc.
> So I was reluctant to move it to debugfs in the new driver. If it is ok
> to move an existing /proc file to debugfs, I'm happy to do so. People
> should obtain such info from their L2TP userspace daemon, or through
> netlink anyway.

Existing stuff you shouldn't move around, people do depend on it
and thus it has to be retained.

But for new stuff, we can try to think about better ways to export the
information if possible.

^ permalink raw reply

* [PATCH][V2] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31  8:46 UTC (permalink / raw)
  To: Haavard Skinnemoen
  Cc: David S. Miller, Jiri Pirko, Erik Waling, Patrick McHardy,
	Anders Darander, Grant Likely, netdev, linux-kernel

From: Anders Darander <ad@datarespons.se>

Add the possibility to set the phy address. This is needed if an integrated
switch is connected to the MAC, as it is often the case that the highest port
is the one connected to the MAC of the MCU.

E.g. in the case of the Micrel KSZ8873, port 3 is the one to connect to the
MCU, thus, the MAC needs to connect to phy address 0x03, instead of the first
phy found.

Signed-off-by: Anders Darander <ad@datarespons.se>
---

Changes from V1:
	* Made the variable type consistent with the module parameter
	  declaration. (I.e. made it unsigned short).

 drivers/net/macb.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/net/macb.c b/drivers/net/macb.c
index c8a18a6..af7b61b 100644
--- a/drivers/net/macb.c
+++ b/drivers/net/macb.c
@@ -53,6 +53,14 @@
 #define MACB_RX_INT_FLAGS	(MACB_BIT(RCOMP) | MACB_BIT(RXUBR)	\
 				 | MACB_BIT(ISR_ROVR))
 
+/*
+ * Set up PHY probing
+ */
+
+static unsigned short phy_addr = PHY_MAX_ADDR;
+module_param(phy_addr, ushort, 0);
+MODULE_PARM_DESC(phy_addr, "PHY address connected to the MACB");
+
 static void __macb_set_hwaddr(struct macb *bp)
 {
 	u32 bottom;
@@ -193,7 +201,11 @@ static int macb_mii_probe(struct net_device *dev)
 	struct eth_platform_data *pdata;
 	int ret;
 
-	phydev = phy_find_first(bp->mii_bus);
+	if (phy_addr >= PHY_MAX_ADDR)
+		phydev = phy_find_first(bp->mii_bus);
+	else
+		phydev = bp->mii_bus->phy_map[phy_addr];
+
 	if (!phydev) {
 		printk (KERN_ERR "%s: no PHY found\n", dev->name);
 		return -1;
-- 
1.7.0.3
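
The selection rule in this patch can be sketched in user space as follows. This is only an illustration: `struct phy`, `find_first()` and `select_phy()` are hypothetical stand-ins for the kernel's PHY device, `phy_find_first()` and the `phy_map[]` lookup, and `PHY_MAX_ADDR` is redefined locally (assumed to be 32, as in the kernel headers) so the block stands alone.

```c
#include <stddef.h>

#define PHY_MAX_ADDR 32	/* assumed to match the kernel's value */

/* Hypothetical stand-in for struct phy_device. */
struct phy {
	int addr;
};

/* Hypothetical stand-in for phy_find_first(): first populated slot. */
static struct phy *find_first(struct phy *map[PHY_MAX_ADDR])
{
	unsigned int i;

	for (i = 0; i < PHY_MAX_ADDR; i++)
		if (map[i])
			return map[i];
	return NULL;
}

/*
 * An out-of-range address (the default) means "use the first PHY found";
 * any in-range address selects exactly that slot, which may be empty.
 * Because phy_addr is unsigned, a single upper-bound check is enough to
 * keep the array index valid.
 */
static struct phy *select_phy(struct phy *map[PHY_MAX_ADDR],
			      unsigned short phy_addr)
{
	if (phy_addr >= PHY_MAX_ADDR)
		return find_first(map);
	return map[phy_addr];
}
```

With PHYs present at addresses 1 and 3, the default picks address 1, while passing 3 picks the switch port wired to the MAC; an in-range address with no PHY yields NULL, which the driver reports as "no PHY found".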

^ permalink raw reply related

* Re: [PATCH] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31  8:50 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, linux-kernel
In-Reply-To: <4BB307CA.9080305@trash.net>

* Patrick McHardy <kaber@trash.net> [100331 10:29]:
> Anders Darander wrote:
> > +	if (phy_addr >= PHY_MAX_ADDRESS)
> > +		phydev = phy_find_first(bp->mii_bus);
> > +	else
> > +		phydev = bp->mii_bus->phy_map[phy_addr];

> This looks like you need to use an unsigned to avoid negative
> indices.

Thanks for spotting this. That made me find that I'd also been inconsistent with the variable and module parameter declarations.

A new version, V2, of the patch has been sent.

Best regards,
Anders Darander

^ permalink raw reply

* Re: [PATCH v3 09/12] l2tp: Add netlink control API for L2TP
From: Patrick McHardy @ 2010-03-31  8:59 UTC (permalink / raw)
  To: James Chapman; +Cc: netdev
In-Reply-To: <20100330161814.9628.43239.stgit@bert.katalix.com>

James Chapman wrote:
> +static struct nla_policy l2tp_nl_policy[L2TP_ATTR_MAX + 1] = {
> +	[L2TP_ATTR_NONE]		= { .type = NLA_UNSPEC, },
> +	[L2TP_ATTR_PW_TYPE]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_ENCAP_TYPE]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_OFFSET]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_DATA_SEQ]		= { .type = NLA_U8, },
> +	[L2TP_ATTR_L2SPEC_TYPE]		= { .type = NLA_U8, },
> +	[L2TP_ATTR_L2SPEC_LEN]		= { .type = NLA_U8, },
> +	[L2TP_ATTR_PROTO_VERSION]	= { .type = NLA_U8, },
> +	[L2TP_ATTR_CONN_ID]		= { .type = NLA_U32, },
> +	[L2TP_ATTR_PEER_CONN_ID]	= { .type = NLA_U32, },
> +	[L2TP_ATTR_SESSION_ID]		= { .type = NLA_U32, },
> +	[L2TP_ATTR_PEER_SESSION_ID]	= { .type = NLA_U32, },
> +	[L2TP_ATTR_UDP_CSUM]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_VLAN_ID]		= { .type = NLA_U16, },
> +	[L2TP_ATTR_DEBUG]		= { .type = NLA_U32, },
> +	[L2TP_ATTR_RECV_SEQ]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_SEND_SEQ]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_LNS_MODE]		= { .type = NLA_FLAG, },
> +	[L2TP_ATTR_USING_IPSEC]		= { .type = NLA_FLAG, },

Please don't use NLA_FLAG. It diverges from the usual netlink
attribute semantics: if you check for the presence of the attribute,
the flag can never be unset (the attribute can only encode "flag
set"); otherwise you have to send the current value in every message,
even if you don't intend to change it.
In this case it can't be unset.

A better way is to use something like this:

struct l2tp_flags {
	__u32	value;
	__u32	mask;
};

and set mask to the bits you intend to change.
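
A minimal user-space sketch of how a receiver might apply such a value/mask pair. The flag bit names and `l2tp_flags_apply()` are hypothetical, and `uint32_t` stands in for `__u32` so the block compiles outside the kernel.

```c
#include <stdint.h>

/* Same layout as the struct suggested above, with uint32_t for __u32. */
struct l2tp_flags {
	uint32_t value;
	uint32_t mask;
};

/* Hypothetical bit assignments, for illustration only. */
#define L2TP_F_RECV_SEQ	(1u << 0)
#define L2TP_F_SEND_SEQ	(1u << 1)
#define L2TP_F_LNS_MODE	(1u << 2)

/*
 * Only bits selected by mask are taken from value; all other bits keep
 * their current state.  A bit can therefore be set, cleared, or left
 * alone in a single message.
 */
static uint32_t l2tp_flags_apply(uint32_t current,
				 const struct l2tp_flags *req)
{
	return (current & ~req->mask) | (req->value & req->mask);
}
```

For example, a request with value = L2TP_F_RECV_SEQ and mask = L2TP_F_RECV_SEQ | L2TP_F_LNS_MODE sets RECV_SEQ, clears LNS_MODE, and leaves SEND_SEQ untouched.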

^ permalink raw reply

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Richard Hartmann @ 2010-03-31  9:01 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-ppp
In-Reply-To: <2d460de71003260850x7f90d04cy79ac853464108182@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6366 bytes --]

Hi all,

this is our attempt at a cleaner version. It is still far from being
perfect and packets seem to gain one to two bytes in size sometimes
which means that you will run into "normal" fragmentation if you set
your MTU to the possible maximum (as you are then over said max) but it
does what it should, can be changed at run-time and is not panicking the
kernel.

Feedback appreciated, code even more so.


Thanks,
Richard

PS: The main patch is inline, both the main and the debug patch are
attached.
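
For readers who want the idea without the kernel context, here is a simplified user-space sketch of round-robin channel selection. It is not the code from the patch: `rr_pick()` and the `usable[]` array are hypothetical, and unlike the patch it takes the counter modulo the number of usable channels, so a packet always lands on some usable channel.

```c
#include <stddef.h>

/*
 * Pick the next channel in round-robin order.
 * usable[i] is nonzero if channel i can currently transmit.
 * Returns the chosen channel index, or -1 if no channel is usable.
 */
static int rr_pick(unsigned int *counter, const int usable[], size_t n)
{
	size_t i, nusable = 0, target;

	for (i = 0; i < n; i++)
		if (usable[i])
			nusable++;
	if (nusable == 0)
		return -1;

	/* Advance the scheduler and map it onto the usable channels. */
	target = ++(*counter) % nusable;
	for (i = 0; i < n; i++) {
		if (!usable[i])
			continue;
		if (target == 0)
			return (int)i;
		target--;
	}
	return -1;	/* not reached */
}
```

With three channels of which the middle one is down, successive calls alternate between channel 2 and channel 0.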


--- drivers/net/ppp_generic.c.orig      2010-03-25 16:56:05.000000000 +0100
+++ drivers/net/ppp_generic.c   2010-03-30 19:56:15.000000000 +0200
@@ -129,6 +129,7 @@
        u32             nextseq;        /* MP: seq no of next packet */
        u32             minseq;         /* MP: min of most recent seqnos */
        struct sk_buff_head mrq;        /* MP: receive reconstruction queue */
+       int     rrsched;    /* round robin scheduler for packet distribution */
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
        struct sock_filter *pass_filter;        /* filter for packets to pass */
@@ -227,6 +228,17 @@
 #define B      0x80            /* this fragment begins a packet */
 #define E      0x40            /* this fragment ends a packet */

+#ifdef CONFIG_PPP_MULTILINK
+/* alternate fragmentation algorithm and multilink behaviour options added by uli.staerk@globalways.net */
+static int ppp_ml_noexplode = 0;
+module_param(ppp_ml_noexplode, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noexplode, "Set this to any other values than zero to avoid fragmentation over connected channels");
+
+static int ppp_ml_noheader = 0;
+module_param(ppp_ml_noheader, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noheader, "Set this to any other value than zero to remove the ppp-multilink protocol header (pppoes.protocol not 0x003d) which enables fragmentation and reordering");
+#endif /* CONFIG_PPP_MULTILINK */
+
 /* Compare multilink sequence numbers (assumed to be 32 bits wide) */
 #define seq_before(a, b)	((s32)((a) - (b)) < 0)
 #define seq_after(a, b)		((s32)((a) - (b)) > 0)
@@ -250,6 +262,7 @@
 static void ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb);
 static struct sk_buff *ppp_mp_reconstruct(struct ppp *ppp);
 static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb);
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb);
 #endif /* CONFIG_PPP_MULTILINK */
 static int ppp_set_compress(struct ppp *ppp, unsigned long arg);
 static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
@@ -1292,10 +1305,18 @@
 	}

 #ifdef CONFIG_PPP_MULTILINK
-	/* Multilink: fragment the packet over as many links
-	   as can take the packet at the moment. */
-	if (!ppp_mp_explode(ppp, skb))
+	/* send packet without multilink header */
+	if(ppp_ml_noheader) {
+		ppp_mp_roundrobin(ppp, skb);
 		return;
+	}
+	else {
+		/* Multilink: fragment the packet over as many links
+		   as can take the packet at the moment. */
+		if (!ppp_mp_explode(ppp, skb)) {
+			return;
+		}
+	}
 #endif /* CONFIG_PPP_MULTILINK */

 	ppp->xmit_pending = NULL;
@@ -1304,6 +1325,38 @@

 #ifdef CONFIG_PPP_MULTILINK
 /*
+ * Send packet through the next channel (round robin)
+ */
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb)
+{
+	int i;
+	struct channel *pch;
+
+	ppp->rrsched++;
+	i = 0;
+	list_for_each_entry(pch, &ppp->channels, clist)      {
+		if(pch->chan == NULL) continue;
+
+		if (ppp->rrsched % ppp->n_channels == i) {
+			spin_lock_bh(&pch->downl);
+			if (pch->chan) {
+				if (pch->chan->ops->start_xmit(pch->chan, skb)) {
+					ppp->xmit_pending = NULL;
+				}
+			} else {
+				/* channel got unregistered */
+				kfree_skb(skb);
+				ppp->xmit_pending = NULL;
+			}
+			spin_unlock_bh(&pch->downl);
+			return;
+		}
+		i++;
+	}
+	return;
+}
+
+/*
  * Divide a packet to be transmitted into fragments and
  * send them out the individual links.
  */
@@ -1352,13 +1405,21 @@
 		}
 		++i;
 	}
-	/*
-	 * Don't start sending this	packet unless at least half	of
-	 * the channels	are	free.  This	gives much better TCP
-	 * performance if we have a	lot	of channels.
-	 */
-	if (nfree == 0 || nfree	< navail / 2)
-		return 0; /* can't take now, leave it in xmit_pending	*/
+
+
+	if(ppp_ml_noexplode) {
+	}
+	else {
+		/*
+		 * Don't start sending this	packet unless at least half	of
+		 * the channels	are	free.  This	gives much better TCP
+		 * performance if we have a	lot	of channels.
+		 */
+		if (nfree == 0 || nfree	< navail / 2) {
+			return 0; /* can't take now, leave it in xmit_pending	*/
+
+		}
+	}

 	/* Do protocol field compression (XXX this should be optional) */
 	p =	skb->data;
@@ -1371,6 +1432,7 @@
 	totlen = len;
 	nbigger	= len %	nfree;

+
 	/* skip	to the channel after the one we	last used
 	   and start at	that one */
 	list = &ppp->channels;
@@ -1432,33 +1494,40 @@
 		*of the channel we are going to transmit on
 		*/
 		flen = len;
-		if (nfree > 0) {
-			if (pch->speed == 0) {
-				flen = totlen/nfree	;
-				if (nbigger > 0) {
-					flen++;
-					nbigger--;
-				}
-			} else {
-				flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
-					((totspeed*totfree)/pch->speed)) - hdrlen;
-				if (nbigger > 0) {
-					flen += ((totfree - nzero)*pch->speed)/totspeed;
-					nbigger -= ((totfree - nzero)*pch->speed)/
-							totspeed;
+
+		if(ppp_ml_noexplode) {
+			nfree--;
+		}
+		else {
+			if (nfree > 0) {
+				if (pch->speed == 0) {
+					flen = totlen/nfree	;
+					if (nbigger > 0) {
+						flen++;
+						nbigger--;
+					}
+				} else {
+					flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
+						((totspeed*totfree)/pch->speed)) - hdrlen;
+					if (nbigger > 0) {
+						flen += ((totfree - nzero)*pch->speed)/totspeed;
+						nbigger -= ((totfree - nzero)*pch->speed)/
+								totspeed;
+					}
 				}
+				nfree--;
 			}
-			nfree--;
+			/*
+			 *check	if we are on the last channel or
+			 *we exceded the lenght	of the data	to
+			 *fragment
+			 */
+			if ((nfree <= 0) || (flen > len))
+				flen = len;
+
 		}

 		/*
-		 *check	if we are on the last channel or
-		 *we exceded the lenght	of the data	to
-		 *fragment
-		 */
-		if ((nfree <= 0) || (flen > len))
-			flen = len;
-		/*
 		 *it is not worth to tx on slow channels:
 		 *in that case from the resulting flen according to the
 		 *above formula will be equal or less than zero.
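
The send gate that the ppp_ml_noexplode option bypasses can be sketched on its own as below. `mp_may_send()` is a hypothetical helper, not a function in ppp_generic.c. One thing the sketch makes visible: with the option set, the gate no longer rejects nfree == 0, so the later `len % nfree` in ppp_mp_explode() appears to become reachable with a zero divisor.

```c
/*
 * Decide whether ppp_mp_explode() would start sending now.
 *   nfree     - channels that can take a fragment right now
 *   navail    - channels that are up
 *   noexplode - value of the ppp_ml_noexplode module parameter
 * Returns 1 to send, 0 to leave the packet in xmit_pending.
 */
static int mp_may_send(int nfree, int navail, int noexplode)
{
	if (noexplode)
		return 1;	/* gate skipped, even when nfree == 0 */

	/* Original rule: wait unless at least half the channels are free. */
	if (nfree == 0 || nfree < navail / 2)
		return 0;
	return 1;
}
```

With four channels up, the original rule waits while fewer than two are free; with the option set the packet is always passed on.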

[-- Attachment #2: ppp_ml_noexplode.patch --]
[-- Type: text/x-diff, Size: 5747 bytes --]

--- drivers/net/ppp_generic.c.orig	2010-03-25 16:56:05.000000000 +0100
+++ drivers/net/ppp_generic.c	2010-03-30 19:56:15.000000000 +0200
@@ -129,6 +129,7 @@
 	u32		nextseq;	/* MP: seq no of next packet */
 	u32		minseq;		/* MP: min of most recent seqnos */
 	struct sk_buff_head mrq;	/* MP: receive reconstruction queue */
+	int     rrsched;    /* round robin scheduler for packet distribution */
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
 	struct sock_filter *pass_filter;	/* filter for packets to pass */
@@ -227,6 +228,17 @@
 #define B	0x80		/* this fragment begins a packet */
 #define E	0x40		/* this fragment ends a packet */
 
+#ifdef CONFIG_PPP_MULTILINK
+/* alternate fragmentation algorithm and multilink behaviour options added by uli.staerk@globalways.net */
+static int ppp_ml_noexplode = 0;
+module_param(ppp_ml_noexplode, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noexplode, "Set this to any other values than zero to avoid fragmentation over connected channels");
+
+static int ppp_ml_noheader = 0;
+module_param(ppp_ml_noheader, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noheader, "Set this to any other value than zero to remove the ppp-multilink protocol header (pppoes.protocol not 0x003d) which enables fragmentation and reordering");
+#endif /* CONFIG_PPP_MULTILINK */
+
 /* Compare multilink sequence numbers (assumed to be 32 bits wide) */
 #define seq_before(a, b)	((s32)((a) - (b)) < 0)
 #define seq_after(a, b)		((s32)((a) - (b)) > 0)
@@ -250,6 +262,7 @@
 static void ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb);
 static struct sk_buff *ppp_mp_reconstruct(struct ppp *ppp);
 static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb);
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb);
 #endif /* CONFIG_PPP_MULTILINK */
 static int ppp_set_compress(struct ppp *ppp, unsigned long arg);
 static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
@@ -1292,10 +1305,18 @@
 	}
 
 #ifdef CONFIG_PPP_MULTILINK
-	/* Multilink: fragment the packet over as many links
-	   as can take the packet at the moment. */
-	if (!ppp_mp_explode(ppp, skb))
+	/* send packet without multilink header */
+	if(ppp_ml_noheader) {
+		ppp_mp_roundrobin(ppp, skb);
 		return;
+	}
+	else {
+		/* Multilink: fragment the packet over as many links
+		   as can take the packet at the moment. */
+		if (!ppp_mp_explode(ppp, skb)) {
+			return;
+		}
+	}
 #endif /* CONFIG_PPP_MULTILINK */
 
 	ppp->xmit_pending = NULL;
@@ -1304,6 +1325,38 @@
 
 #ifdef CONFIG_PPP_MULTILINK
 /*
+ * Send packet through the next channel (round robin)
+ */
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb)
+{
+	int i;
+	struct channel *pch;
+
+	ppp->rrsched++;
+	i = 0;
+	list_for_each_entry(pch, &ppp->channels, clist)      {
+		if(pch->chan == NULL) continue;
+
+		if (ppp->rrsched % ppp->n_channels == i) {
+			spin_lock_bh(&pch->downl);
+			if (pch->chan) {
+				if (pch->chan->ops->start_xmit(pch->chan, skb)) {
+					ppp->xmit_pending = NULL;
+				}
+			} else {
+				/* channel got unregistered */
+				kfree_skb(skb);
+				ppp->xmit_pending = NULL;
+			}
+			spin_unlock_bh(&pch->downl);
+			return;
+		}
+		i++;
+	}
+	return;
+}
+
+/*
  * Divide a packet to be transmitted into fragments and
  * send them out the individual links.
  */
@@ -1352,13 +1405,21 @@
 		}
 		++i;
 	}
-	/*
-	 * Don't start sending this	packet unless at least half	of
-	 * the channels	are	free.  This	gives much better TCP
-	 * performance if we have a	lot	of channels.
-	 */
-	if (nfree == 0 || nfree	< navail / 2)
-		return 0; /* can't take now, leave it in xmit_pending	*/
+
+
+	if(ppp_ml_noexplode) {
+	}
+	else {
+		/*
+		 * Don't start sending this	packet unless at least half	of
+		 * the channels	are	free.  This	gives much better TCP
+		 * performance if we have a	lot	of channels.
+		 */
+		if (nfree == 0 || nfree	< navail / 2) {
+			return 0; /* can't take now, leave it in xmit_pending	*/
+
+		}
+	}
 
 	/* Do protocol field compression (XXX this should be optional) */
 	p =	skb->data;
@@ -1371,6 +1432,7 @@
 	totlen = len;
 	nbigger	= len %	nfree;
 
+
 	/* skip	to the channel after the one we	last used
 	   and start at	that one */
 	list = &ppp->channels;
@@ -1432,33 +1494,40 @@
 		*of the channel we are going to transmit on
 		*/
 		flen = len;
-		if (nfree > 0) {
-			if (pch->speed == 0) {
-				flen = totlen/nfree	;
-				if (nbigger > 0) {
-					flen++;
-					nbigger--;
-				}
-			} else {
-				flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
-					((totspeed*totfree)/pch->speed)) - hdrlen;
-				if (nbigger > 0) {
-					flen += ((totfree - nzero)*pch->speed)/totspeed;
-					nbigger -= ((totfree - nzero)*pch->speed)/
-							totspeed;
+
+		if(ppp_ml_noexplode) {
+			nfree--;
+		}
+		else {
+			if (nfree > 0) {
+				if (pch->speed == 0) {
+					flen = totlen/nfree	;
+					if (nbigger > 0) {
+						flen++;
+						nbigger--;
+					}
+				} else {
+					flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
+						((totspeed*totfree)/pch->speed)) - hdrlen;
+					if (nbigger > 0) {
+						flen += ((totfree - nzero)*pch->speed)/totspeed;
+						nbigger -= ((totfree - nzero)*pch->speed)/
+								totspeed;
+					}
 				}
+				nfree--;
 			}
-			nfree--;
+			/*
+			 *check	if we are on the last channel or
+			 *we exceded the lenght	of the data	to
+			 *fragment
+			 */
+			if ((nfree <= 0) || (flen > len))
+				flen = len;
+
 		}
 
 		/*
-		 *check	if we are on the last channel or
-		 *we exceded the lenght	of the data	to
-		 *fragment
-		 */
-		if ((nfree <= 0) || (flen > len))
-			flen = len;
-		/*
 		 *it is not worth to tx on slow channels:
 		 *in that case from the resulting flen according to the
 		 *above formula will be equal or less than zero.

[-- Attachment #3: ppp_ml_noexplode-with_debug.patch --]
[-- Type: text/x-diff, Size: 7913 bytes --]

--- drivers/net/ppp_generic.c.orig	2010-03-25 16:56:05.000000000 +0100
+++ drivers/net/ppp_generic.c.patched_with_debug	2010-03-30 20:03:31.000000000 +0200
@@ -129,6 +129,7 @@
 	u32		nextseq;	/* MP: seq no of next packet */
 	u32		minseq;		/* MP: min of most recent seqnos */
 	struct sk_buff_head mrq;	/* MP: receive reconstruction queue */
+	int     rrsched;    /* round robin scheduler for packet distribution */
 #endif /* CONFIG_PPP_MULTILINK */
 #ifdef CONFIG_PPP_FILTER
 	struct sock_filter *pass_filter;	/* filter for packets to pass */
@@ -227,6 +228,17 @@
 #define B	0x80		/* this fragment begins a packet */
 #define E	0x40		/* this fragment ends a packet */
 
+#ifdef CONFIG_PPP_MULTILINK
+/* alternate fragmentation algorithm and multilink behaviour options added by uli.staerk@globalways.net */
+static int ppp_ml_noexplode = 0;
+module_param(ppp_ml_noexplode, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noexplode, "Set this to any other values than zero to avoid fragmentation over connected channels");
+
+static int ppp_ml_noheader = 0;
+module_param(ppp_ml_noheader, int, 0600);
+MODULE_PARM_DESC(ppp_ml_noheader, "Set this to any other value than zero to remove the ppp-multilink protocol header (pppoes.protocol not 0x003d) which enables fragmentation and reordering");
+#endif /* CONFIG_PPP_MULTILINK */
+
 /* Compare multilink sequence numbers (assumed to be 32 bits wide) */
 #define seq_before(a, b)	((s32)((a) - (b)) < 0)
 #define seq_after(a, b)		((s32)((a) - (b)) > 0)
@@ -250,6 +262,7 @@
 static void ppp_mp_insert(struct ppp *ppp, struct sk_buff *skb);
 static struct sk_buff *ppp_mp_reconstruct(struct ppp *ppp);
 static int ppp_mp_explode(struct ppp *ppp, struct sk_buff *skb);
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb);
 #endif /* CONFIG_PPP_MULTILINK */
 static int ppp_set_compress(struct ppp *ppp, unsigned long arg);
 static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
@@ -1273,6 +1286,7 @@
 		return;
 	}
 
+printk(KERN_ERR "send packet\n");
 	if ((ppp->flags & SC_MULTILINK) == 0) {
 		/* not doing multilink: send it down the first channel */
 		list = list->next;
@@ -1292,11 +1306,24 @@
 	}
 
 #ifdef CONFIG_PPP_MULTILINK
-	/* Multilink: fragment the packet over as many links
-	   as can take the packet at the moment. */
-	if (!ppp_mp_explode(ppp, skb))
+	/* send packet without multilink header */
+	if(ppp_ml_noheader) {
+printk(KERN_ERR "BEGIN SEND RR\n");
+		ppp_mp_roundrobin(ppp, skb);
+printk(KERN_ERR "END SEND RR\n");
 		return;
+	}
+	else {
+printk(KERN_ERR "BEGIN SEND MULTILINK\n");
+		/* Multilink: fragment the packet over as many links
+		   as can take the packet at the moment. */
+		if (!ppp_mp_explode(ppp, skb)) {
+printk(KERN_ERR "END SEND MULTILINK\n");
+			return;
+		}
+	}
 #endif /* CONFIG_PPP_MULTILINK */
+printk(KERN_ERR "END SEND DROP PACKET\n");
 
 	ppp->xmit_pending = NULL;
 	kfree_skb(skb);
@@ -1304,6 +1331,41 @@
 
 #ifdef CONFIG_PPP_MULTILINK
 /*
+ * Send packet through the next channel (round robin)
+ */
+static void ppp_mp_roundrobin(struct ppp *ppp, struct sk_buff *skb)
+{
+	int i;
+	struct channel *pch;
+
+	ppp->rrsched++;
+printk(KERN_ERR "  RR counter=%d, len=%d, devmtu=%d\n", ppp->rrsched, skb->len, ppp->dev->mtu);
+	i = 0;
+	list_for_each_entry(pch, &ppp->channels, clist)      {
+		if(pch->chan == NULL) continue;
+
+		if (ppp->rrsched % ppp->n_channels == i) {
+printk(KERN_ERR "  RR send via %d, chmtu=%d\n", i, pch->chan->mtu);
+			spin_lock_bh(&pch->downl);
+			if (pch->chan) {
+				if (pch->chan->ops->start_xmit(pch->chan, skb)) {
+					ppp->xmit_pending = NULL;
+				}
+			} else {
+printk(KERN_ERR "  RR dropped at %d\n", i);
+				/* channel got unregistered */
+				kfree_skb(skb);
+				ppp->xmit_pending = NULL;
+			}
+			spin_unlock_bh(&pch->downl);
+			return;
+		}
+		i++;
+	}
+	return;
+}
+
+/*
  * Divide a packet to be transmitted into fragments and
  * send them out the individual links.
  */
@@ -1352,13 +1414,25 @@
 		}
 		++i;
 	}
-	/*
-	 * Don't start sending this	packet unless at least half	of
-	 * the channels	are	free.  This	gives much better TCP
-	 * performance if we have a	lot	of channels.
-	 */
-	if (nfree == 0 || nfree	< navail / 2)
-		return 0; /* can't take now, leave it in xmit_pending	*/
+
+printk(KERN_ERR "  ML nfree=%d, navail=%d, nzero=%d, totfree=%d, totspeed=%d\n", nfree,navail,nzero,totfree,totspeed);
+
+	if(ppp_ml_noexplode) {
+printk(KERN_ERR "  ML no explode A\n");
+	}
+	else {
+printk(KERN_ERR "  ML explode A\n");
+		/*
+		 * Don't start sending this	packet unless at least half	of
+		 * the channels	are	free.  This	gives much better TCP
+		 * performance if we have a	lot	of channels.
+		 */
+		if (nfree == 0 || nfree	< navail / 2) {
+printk(KERN_ERR "  ML wait A\n");
+			return 0; /* can't take now, leave it in xmit_pending	*/
+
+		}
+	}
 
 	/* Do protocol field compression (XXX this should be optional) */
 	p =	skb->data;
@@ -1371,6 +1445,8 @@
 	totlen = len;
 	nbigger	= len %	nfree;
 
+printk(KERN_ERR "  ML len=%d, totlen=%d, nbigger=%d\n", len, totlen, nbigger);
+
 	/* skip	to the channel after the one we	last used
 	   and start at	that one */
 	list = &ppp->channels;
@@ -1381,10 +1457,12 @@
 			break;
 		}
 	}
+printk(KERN_ERR "  ML skip to channel=%d\n", i);
 
 	/* create a	fragment for each channel */
 	bits = B;
 	while (len	> 0) {
+printk(KERN_ERR "  ML while len=%d, i=%d\n", len, i);
 		list = list->next;
 		if (list ==	&ppp->channels)	{
 			i =	0;
@@ -1432,45 +1510,58 @@
 		*of the channel we are going to transmit on
 		*/
 		flen = len;
-		if (nfree > 0) {
-			if (pch->speed == 0) {
-				flen = totlen/nfree	;
-				if (nbigger > 0) {
-					flen++;
-					nbigger--;
-				}
-			} else {
-				flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
-					((totspeed*totfree)/pch->speed)) - hdrlen;
-				if (nbigger > 0) {
-					flen += ((totfree - nzero)*pch->speed)/totspeed;
-					nbigger -= ((totfree - nzero)*pch->speed)/
-							totspeed;
+printk(KERN_ERR "  ML fragmentlen=%d\n", flen);
+
+		if(ppp_ml_noexplode) {
+printk(KERN_ERR "  ML no explode B \n");
+			nfree--;
+		}
+		else {
+printk(KERN_ERR "  ML explode B \n");
+			if (nfree > 0) {
+				if (pch->speed == 0) {
+					flen = totlen/nfree	;
+					if (nbigger > 0) {
+						flen++;
+						nbigger--;
+					}
+				} else {
+					flen = (((totfree - nzero)*(totlen + hdrlen*totfree)) /
+						((totspeed*totfree)/pch->speed)) - hdrlen;
+					if (nbigger > 0) {
+						flen += ((totfree - nzero)*pch->speed)/totspeed;
+						nbigger -= ((totfree - nzero)*pch->speed)/
+								totspeed;
+					}
 				}
+				nfree--;
 			}
-			nfree--;
+			/*
+			 *check	if we are on the last channel or
+			 *we exceded the lenght	of the data	to
+			 *fragment
+			 */
+			if ((nfree <= 0) || (flen > len))
+				flen = len;
+
+printk(KERN_ERR "  ML new fragmentlen=%d\n", flen);
 		}
 
 		/*
-		 *check	if we are on the last channel or
-		 *we exceded the lenght	of the data	to
-		 *fragment
-		 */
-		if ((nfree <= 0) || (flen > len))
-			flen = len;
-		/*
 		 *it is not worth to tx on slow channels:
 		 *in that case from the resulting flen according to the
 		 *above formula will be equal or less than zero.
 		 *Skip the channel in this case
 		 */
 		if (flen <=	0) {
+printk(KERN_ERR "  ML fragmentlen is zero\n");
 			pch->avail = 2;
 			spin_unlock_bh(&pch->downl);
 			continue;
 		}
 
 		mtu	= pch->chan->mtu - hdrlen;
+printk(KERN_ERR "  ML mtu=%d (chan-mtu=%d)\n", mtu, pch->chan->mtu);
 		if (mtu	< 4)
 			mtu	= 4;
 		if (flen > mtu)
@@ -1502,6 +1593,7 @@
 		if (!skb_queue_empty(&pch->file.xq)	||
 			!chan->ops->start_xmit(chan, frag))
 			skb_queue_tail(&pch->file.xq, frag);
+printk(KERN_ERR "  ML sent packet with seq: %d\n", ppp->nxseq);
 		pch->had_frag =	1;
 		p += flen;
 		len	-= flen;
@@ -1510,6 +1602,7 @@
 		spin_unlock_bh(&pch->downl);
 	}
 	ppp->nxchan	= i;
+printk(KERN_ERR "  ML END SEND\n");
 
 	return 1;
 

^ permalink raw reply

* Re: [net-2.6 PATCH] bonding: fix broken multicast with round-robin mode
From: Eric Dumazet @ 2010-03-31  9:08 UTC (permalink / raw)
  To: Andy Gospodarek, David Miller; +Cc: netdev, lhh, fubar, bonding-devel
In-Reply-To: <20100325214033.GA28741@gospo.rdu.redhat.com>

On Thursday, 25 March 2010 at 17:40 -0400, Andy Gospodarek wrote:
> Round-robin (mode 0) does nothing to ensure that any multicast traffic
> originally destined for the host will continue to arrive at the host when
> the link that sent the IGMP join or membership report goes down.  One of
> the benefits of absolute round-robin transmit.
> 
> Keeping track of subscribed multicast groups for each slave did not seem
> like a good use of resources, so I decided to simply send on the
> curr_active slave of the bond (typically the first enslaved device that
> is up).  This makes failover management simple as IGMP membership
> reports only need to be sent when the curr_active_slave changes.  I
> tested this patch and it appears to work as expected.
> 
> Originally reported by Lon Hohberger <lhh@redhat.com>.
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> CC: Lon Hohberger <lhh@redhat.com>
> CC: Jay Vosburgh <fubar@us.ibm.com>
> 
> ---
>  drivers/net/bonding/bond_main.c |   34 ++++++++++++++++++++++++++--------
>  1 files changed, 26 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 430c022..0b38455 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1235,6 +1235,11 @@ void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
>  			write_lock_bh(&bond->curr_slave_lock);
>  		}
>  	}
> +
> +	/* resend IGMP joins since all were sent on curr_active_slave */
> +	if (bond->params.mode == BOND_MODE_ROUNDROBIN) {
> +		bond_resend_igmp_join_requests(bond);
> +	}
>  }
>  
>  /**
> @@ -4138,22 +4143,35 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
>  	struct bonding *bond = netdev_priv(bond_dev);
>  	struct slave *slave, *start_at;
>  	int i, slave_no, res = 1;
> +	struct iphdr *iph = ip_hdr(skb);
>  
>  	read_lock(&bond->lock);
>  
>  	if (!BOND_IS_OK(bond))
>  		goto out;
> -
>  	/*
> -	 * Concurrent TX may collide on rr_tx_counter; we accept that
> -	 * as being rare enough not to justify using an atomic op here
> +	 * Start with the curr_active_slave that joined the bond as the
> +	 * default for sending IGMP traffic.  For failover purposes one
> +	 * needs to maintain some consistency for the interface that will
> +	 * send the join/membership reports.  The curr_active_slave found
> +	 * will send all of this type of traffic.
>  	 */
> -	slave_no = bond->rr_tx_counter++ % bond->slave_cnt;
> +	if ((skb->protocol == htons(ETH_P_IP)) &&
> +	    (iph->protocol == htons(IPPROTO_IGMP))) {

Hmm...

iph->protocol is a u8, how can htons(IPPROTO_IGMP) be equal to
iph->protocol ?

[PATCH] bonding: bond_xmit_roundrobin() fix

Commit a2fd940f (bonding: fix broken multicast with round-robin mode)
introduced a problem on little endian machines.

drivers/net/bonding/bond_main.c:4159: warning: comparison is always
false due to limited range of data type

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5b92fbf..5972a52 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4156,7 +4156,7 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
 	 * send the join/membership reports.  The curr_active_slave found
 	 * will send all of this type of traffic.
 	 */
-	if ((iph->protocol == htons(IPPROTO_IGMP)) &&
+	if ((iph->protocol == IPPROTO_IGMP) &&
 	    (skb->protocol == htons(ETH_P_IP))) {
 
 		read_lock(&bond->curr_slave_lock);
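
The effect of the one-line fix above can be reproduced in user space. This is a sketch under the assumption of a POSIX system; `igmp_match_buggy()` and `igmp_match_fixed()` are hypothetical names, and the `uint8_t` argument stands in for `iph->protocol`.

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htons() */
#include <netinet/in.h>	/* IPPROTO_IGMP */

/*
 * Old test: a one-byte field compared against a byte-swapped 16-bit
 * constant.  On little endian, htons(IPPROTO_IGMP) is 0x0200, which
 * no 8-bit value can equal, so the comparison is always false.
 */
static int igmp_match_buggy(uint8_t protocol)
{
	return protocol == htons(IPPROTO_IGMP);
}

/* Fixed test: a single byte has no byte order, so compare directly. */
static int igmp_match_fixed(uint8_t protocol)
{
	return protocol == IPPROTO_IGMP;
}
```

On a big endian machine htons() is the identity, so both versions behave the same there; that is why the problem only shows up on little endian.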




^ permalink raw reply related

* iproute u32 filter - server hang
From: Paweł Staszewski @ 2010-03-31  9:10 UTC (permalink / raw)
  To: Linux Network Development list

I found a problem with iproute2 and u32 filters.

To reproduce the problem (it requires one mistake in the filter parent
declaration, 1:101):

tc qdisc add dev eth0 root handle 1: hfsc default 63
tc class add dev eth0 parent 1: classid 1:1 hfsc sc rate 100mbit ul rate 100mbit
tc class add dev eth0 parent 1:1 classid 1:2 hfsc sc rate 1mbit ul rate 1mbit
tc class add dev eth0 parent 1:1 classid 1:63 hfsc sc rate 99mbit ul rate 99mbit
tc class add dev eth0 parent 1:1 classid 1:101 hfsc sc rate 8kbit ul rate 1mbit
tc class add dev eth0 parent 1:101 classid 1:102 hfsc sc rate 8kbit ul rate 1mbit
tc filter add dev eth0 protocol ip parent 1: u32 match ip dst 212.77.100.101 flowid 1:101
tc filter add dev eth0 protocol ip parent 1:101 u32 match ip protocol 1 0xff flowid 1:101

ping 212.77.100.101
After this, the server stops responding to anything, without any
error (it hangs).



With slightly different rules:
tc qdisc add dev eth0 root handle 1: hfsc default 63
tc class add dev eth0 parent 1: classid 1:1 hfsc sc rate 100mbit ul rate 100mbit
tc class add dev eth0 parent 1:1 classid 1:2 hfsc sc rate 1mbit ul rate 1mbit
tc class add dev eth0 parent 1:1 classid 1:63 hfsc sc rate 99mbit ul rate 99mbit
tc class add dev eth0 parent 1:1 classid 1:101 hfsc sc rate 8kbit ul rate 1mbit
tc class add dev eth0 parent 1:101 classid 1:102 hfsc sc rate 8kbit ul rate 1mbit
tc filter add dev eth0 protocol ip parent 1: u32 match ip dst 212.77.100.101 flowid 1:101
tc filter add dev eth0 protocol ip parent 1: u32 match ip protocol 1 0xff flowid 1:101

ping 212.77.100.101
All is ok.

I checked this with kernels 2.6.30.1, 2.6.33 and 2.6.33.1.
iproute tc utility version: iproute2-ss090324



Best Regards
Paweł Staszewski

^ permalink raw reply

* Re: [PATCH][V2] MACB: Set PHY address in kernel parameters
From: Marc Kleine-Budde @ 2010-03-31  9:17 UTC (permalink / raw)
  To: Anders Darander
  Cc: Haavard Skinnemoen, David S. Miller, Jiri Pirko, Erik Waling,
	Patrick McHardy, Anders Darander, Grant Likely, netdev,
	linux-kernel
In-Reply-To: <1270025218-7245-1-git-send-email-anders.darander@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

Anders Darander wrote:
> From: Anders Darander <ad@datarespons.se>
> 
> Add the possibility to set the phy address. This is needed if an integrated
> switch is connected to the MAC, as it is often the case that the highest port
> is the one connected to the MAC of the MCU.
> 
> E.g. in the case of the Micrel KSZ8873, port 3 is the one to connect to the
> MCU, thus, the MAC needs to connect to phy address 0x03, instead of the first
> phy found.

We're using phy_mask in one of our projects, it's still using 2.6.29,
though. I think it's worth testing if it's still working.

static struct at91_eth_data __initdata p298_macb_data = {
       .is_rmii        = 0,
       .phy_mask       = ~(1 << 8),
};

at91_add_device_eth(&p298_macb_data);
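
For reference, a minimal userspace sketch of how such a mask reads. It assumes the convention of the kernel's MDIO bus code, where a set bit in phy_mask means that address is skipped while probing. The helper names and SKETCH_PHY_MAX_ADDR are made up for illustration and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PHY_MAX_ADDR 32	/* hypothetical: 32 addresses on one MDIO bus */

/* Assumed convention: a set bit means "skip this address when probing". */
static int phy_addr_is_probed(uint32_t phy_mask, unsigned int addr)
{
	assert(addr < SKETCH_PHY_MAX_ADDR);
	return (phy_mask & (UINT32_C(1) << addr)) == 0;
}

/* Return the lowest address that would be probed, or -1 if none. */
static int first_probed_addr(uint32_t phy_mask)
{
	unsigned int addr;

	for (addr = 0; addr < SKETCH_PHY_MAX_ADDR; addr++)
		if (phy_addr_is_probed(phy_mask, addr))
			return (int)addr;
	return -1;
}
```

Under that assumption, ~(1 << 8) leaves only address 8 to probe, and the KSZ8873 case from the patch description would presumably use ~(1 << 3).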

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply

* Re: [PATCH next-next-2.6 v2] virtio_net: missing sg_init_table
From: Michael S. Tsirkin @ 2010-03-31  9:20 UTC (permalink / raw)
  To: Shirley Ma; +Cc: David Miller, Thomas Müller, netdev, linux-kernel
In-Reply-To: <1269911955.14013.4.camel@localhost.localdomain>

On Mon, Mar 29, 2010 at 06:19:15PM -0700, Shirley Ma wrote:
> Add missing sg_init_table for sg_set_buf in virtio_net which
> induced in defer skb patch.
> 
> Reported-by: Thomas Müller <thomas@mathtm.de>
> Tested-by: Thomas Müller <thomas@mathtm.de>
> Signed-off-by: Shirley Ma <xma@us.ibm.com>

I'm concerned that the 'big' path might cause a performance regression.
Let's move sg into virtnet_info so that this needs to be only called
once?
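
To make the suggestion concrete, here is a rough userspace sketch of the idea, not driver code. The sketch_* types stand in for struct scatterlist and struct virtnet_info, SKETCH_MAX_FRAGS is an assumed value, and the init helper only imitates what sg_init_table() does (clear the table, mark the last entry).

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_FRAGS 18	/* stand-in for MAX_SKB_FRAGS, value assumed */

struct sketch_sg {		/* stand-in for struct scatterlist */
	void *buf;
	unsigned int len;
	int is_last;
};

struct sketch_vi {		/* stand-in for struct virtnet_info */
	struct sketch_sg rx_sg[SKETCH_MAX_FRAGS + 2];
	unsigned int init_calls;	/* counts table initialisations */
};

/* Rough analogue of sg_init_table(): clear the table, mark the last entry. */
static void sketch_sg_init_table(struct sketch_vi *vi)
{
	memset(vi->rx_sg, 0, sizeof(vi->rx_sg));
	vi->rx_sg[SKETCH_MAX_FRAGS + 1].is_last = 1;
	vi->init_calls++;
}

/* Done once, for example at probe time. */
static void sketch_probe(struct sketch_vi *vi)
{
	vi->init_calls = 0;
	sketch_sg_init_table(vi);
}

/* Per-buffer path: only fills entries, never re-initialises the table. */
static void sketch_add_recvbuf(struct sketch_vi *vi, void *buf, unsigned int len)
{
	vi->rx_sg[0].buf = buf;
	vi->rx_sg[0].len = len;
}
```

The table then costs one initialisation per device instead of one per receive buffer. A shared table does assume the callers that fill it are serialised, which this sketch does not model.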

> ---
>  drivers/net/virtio_net.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 25dc77c..3f5be35 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -326,6 +326,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, gfp_t gfp)
>  	struct scatterlist sg[2];
>  	int err;
>  
> +	sg_init_table(sg, 2);
>  	skb = netdev_alloc_skb_ip_align(vi->dev, MAX_PACKET_LEN);
>  	if (unlikely(!skb))
>  		return -ENOMEM;
> @@ -351,6 +352,7 @@ static int add_recvbuf_big(struct virtnet_info *vi, gfp_t gfp)
>  	char *p;
>  	int i, err, offset;
>  
> +	sg_init_table(sg, MAX_SKB_FRAGS + 2);
>  	/* page in sg[MAX_SKB_FRAGS + 1] is list tail */
>  	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>  		first = get_a_page(vi, gfp);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] virtio_net: avoid BUG_ON() with large packets when CONFIG_DEBUG_SG=y
From: Michael S. Tsirkin @ 2010-03-31  9:17 UTC (permalink / raw)
  To: Rusty Russell; +Cc: netdev
In-Reply-To: <201003301526.35720.rusty@rustcorp.com.au>

On Tue, Mar 30, 2010 at 03:26:35PM +1030, Rusty Russell wrote:
> AFAICT only weird kvm setups and lguest traverse this code path now.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

vhost still lacks support for mergeable buffers, so it uses this path.

> ---
>  drivers/net/virtio_net.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -351,6 +351,9 @@ static int add_recvbuf_big(struct virtne
>  	char *p;
>  	int i, err, offset;
>  
> +	/* This is a waste of cycles, but satisfies CONFIG_DEBUG_SG. */
> +	sg_init_table(sg, ARRAY_SIZE(sg));
> +

How about moving sg into virtnet_info? This way we could call
this only once, right?

>  	/* page in sg[MAX_SKB_FRAGS + 1] is list tail */
>  	for (i = MAX_SKB_FRAGS + 1; i > 1; --i) {
>  		first = get_a_page(vi, gfp);

^ permalink raw reply

* Re: iproute u32 filter - server hang
From: Patrick McHardy @ 2010-03-31  9:34 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Network Development list
In-Reply-To: <4BB3116F.803@itcare.pl>

[-- Attachment #1: Type: text/plain, Size: 889 bytes --]

Paweł Staszewski wrote:
> I find some problem with iproute2 and u32 filters
> 
> To reproduce the problem (need to make one mistake in filter parent
> declaration 1:101):
> 
> ...
> tc filter add dev eth0 protocol ip parent 1:101 u32 match ip protocol 1
> 0xff flowid 1:101
> 
> ping 212.77.100.101
> And after this server will stop responding to anything - without any
> error (hang).

This is caused by hfsc_classify() looping endlessly since the filter
points to the originating class. hfsc_bind_tcf() is actually supposed
to prevent this, but it only prevents resolving the filter immediately
and we still run into the loop at runtime.

This patch (based on how CBQ handles this) should abort classification
and fall back to the default class. It would be better to simply catch
this at configuration time, but that looks a bit more involved. I'll try
to look into it this weekend.
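
Roughly, the idea of the check as a standalone userspace sketch (this is not the patch itself; the struct and function names are invented, and each class is reduced to a level plus the single class its filter selects):

```c
#include <assert.h>
#include <stddef.h>

struct sketch_class {
	int level;	/* 0 for a leaf, larger towards the root */
	const struct sketch_class *filter_target;	/* class the filter selects, or NULL */
};

/*
 * Follow filter targets starting at root.  Returns the leaf reached, or
 * NULL when classification is aborted and the caller should fall back
 * to the default class.
 */
static const struct sketch_class *
sketch_classify(const struct sketch_class *root)
{
	const struct sketch_class *head = root;
	const struct sketch_class *cl;

	while ((cl = head->filter_target) != NULL) {
		if (cl->level >= head->level)
			return NULL;	/* filter may only point downwards */
		if (cl->level == 0)
			return cl;	/* reached a leaf */
		head = cl;		/* apply the inner filter chain */
	}
	return NULL;			/* no filter matched */
}
```

Since the level has to drop strictly on every step, the walk is bounded by the level of the root, so a filter that selects its own class can no longer keep the loop spinning.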



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1295 bytes --]

diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index b38b39c..a435cf1 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1155,7 +1155,7 @@ static struct hfsc_class *
 hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 {
 	struct hfsc_sched *q = qdisc_priv(sch);
-	struct hfsc_class *cl;
+	struct hfsc_class *head, *cl;
 	struct tcf_result res;
 	struct tcf_proto *tcf;
 	int result;
@@ -1166,6 +1166,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 			return cl;
 
 	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	head = &q->root;
 	tcf = q->root.filter_list;
 	while (tcf && (result = tc_classify(skb, tcf, &res)) >= 0) {
 #ifdef CONFIG_NET_CLS_ACT
@@ -1180,6 +1181,8 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 		if ((cl = (struct hfsc_class *)res.class) == NULL) {
 			if ((cl = hfsc_find_class(res.classid, sch)) == NULL)
 				break; /* filter selected invalid classid */
+			if (cl->level >= head->level)
+				break; /* filter may only point downwards */
 		}
 
 		if (cl->level == 0)
@@ -1187,6 +1190,7 @@ hfsc_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr)
 
 		/* apply inner filter chain */
 		tcf = cl->filter_list;
+		head = cl;
 	}
 
 	/* classification failed, try default class */

^ permalink raw reply related

* Re: [PATCH v3 10/12] l2tp: Add L2TP ethernet pseudowire support
From: James Chapman @ 2010-03-31  9:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20100330093252.60d9cbee@nehalam>

Stephen Hemminger wrote:
> On Tue, 30 Mar 2010 17:18:19 +0100
> James Chapman <jchapman@katalix.com> wrote:
> 
>> +struct l2tp_eth_net {
>> +	struct list_head l2tp_eth_dev_list;
>> +	rwlock_t l2tp_eth_lock;
>> +};
> 
> Reader/write locks are discouraged because they are slower than
> spin locks.  If you have lots of readers use RCU, if reading
> is infrequent just use a spin lock.

Ok. In doing the conversion of the rwlocks in l2tp_core.c, I'm finding
that some list access primitives don't have rcu equivalents, namely
list_is_last(), list_for_each_entry_safe(). Is this intentional? Should
I add the missing ones in a separate patch?


-- 
James Chapman
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development


^ permalink raw reply

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
From: Eric W. Biederman @ 2010-03-31  9:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery
In-Reply-To: <4BB30644.9090809@gmail.com>

Tejun Heo <htejun@gmail.com> writes:

> Just wanna add a bit more.
>
> On 03/31/2010 05:17 PM, Tejun Heo wrote:
>> If you think all those callbacks are absolute necessities, can you
>> please at least add boatload of comments around them explaning what
>> they're meant to do and how they're gonna be used?  It's probably
>> because I don't have any experience with namespaces but I really can't
>> wrap my head around it as it currently stands.
>
> The reason why I talked about proper layering is the same reason.
> It's very difficult to review your code because I have no idea how
> those callbacks are meant to be used and gonna behave and that lowers
> maintainability significantly in the long run.  If at all possible,
> please make it implement a discrete function which is used to
> implement something higher up.  If it's already done like that and I'm
> just being stupid, please feel free to enlighten me.

Apologies.   There is a fine line between sending enough patches
to give context and completely overwhelming people with patches,
and of course by this time I am so accustomed to this code I am
practically blind to it.

Let me try a happy medium between overwhelming and too little
information by giving you some excerpts, and a bit of overview.

(Ugh, after having written this I will certainly agree that we
 have so many layers in the device model that they become
 obfuscating abstractions).

Looking through my code there are 3 types of callbacks.
- Callbacks to find the namespace type of a child.
  .child_ns_type
- Callbacks to find the namespace of a kobject.
  .namespace
- Callbacks on a namespace type to find the namespace
  of a particular context.
  .current_ns
  .initial_ns  (not used in my patchset)
  .netlink_ns  (not used in my patchset)


In a world of weird explicitness I expect .child_ns_type and
.namespace could be made to go away by pushing through explicit
ns_type, and namespace parameters everywhere. But that seems
like an awful lot of unnecessary code churn and bloat with
the only real advantage being that we have an abstraction
stored explicitly at each layer.

I use child_ns_type to see if a directory should be tagged
and to figure out the type of the tags on a sysfs directory.

I use current_ns to capture the namespace (of ns_type) of the
current process when sysfs is mounted so I know what to show
userspace.

I use ktype->namespace to figure out which namespace a given
kobject's name is in.

There are intermediate steps on those methods but that is
just what appears to be the necessary boilerplate to get
from a class down to a kobject.
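
As a rough illustration of how the tag ends up being used (a userspace sketch, not the sysfs code; all names here are made up): every directory entry carries the namespace its kobject reports, the mount carries the namespace captured at mount time, and a lookup only matches entries whose tag equals the mount's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_dirent {
	const char *name;
	const void *ns;		/* tag: what ktype->namespace would report */
};

/* Find "name" as seen from a mount whose captured namespace is mount_ns. */
static const struct sketch_dirent *
sketch_lookup(const struct sketch_dirent *ents, size_t n,
	      const void *mount_ns, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ents[i].ns == mount_ns && strcmp(ents[i].name, name) == 0)
			return &ents[i];
	return NULL;
}
```

Two devices named eth0 in different network namespaces can then sit in the same directory without conflict, because no single mount ever sees both.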

The nstype callbacks initial_ns and netlink_ns are not used in this
patchset.  Instead they play a role in the filtering of events sent to
userspace.

netlink_ns is used to find the namespace of a netlink socket
to see if it is ok to send an event over a netlink socket.

static int kobj_bcast_filter(struct sock *dest_sk, struct sk_buff *skb, void *data)
{
	struct kobject *kobj = data;
	const struct kobj_ns_type_operations *ops;

	ops = kobj_ns_ops(kobj);
	if (ops) {
		const void *sock_ns, *ns;
		ns = kobj->ktype->namespace(kobj);
		sock_ns = ops->netlink_ns(dest_sk);
		return sock_ns != ns;
	}

	return 0;
}

initial_ns is used to figure out what the initial/default
namespace is for a class of namespaces.  We only report events
with /sbin/hotplug in the initial network namespace, at least
for now.

static int kobj_usermode_filter(struct kobject *kobj)
{
	const struct kobj_ns_type_operations *ops;

	ops = kobj_ns_ops(kobj);
	if (ops) {
		const void *init_ns, *ns;
		ns = kobj->ktype->namespace(kobj);
		init_ns = ops->initial_ns();
		return ns != init_ns;
	}

	return 0;
}

This is my change that adds support for the network namespace.
The only namespace I expect to add support for in the short term.

I hope this helps,

Eric


commit fdc0adeaa8bfab9a179e1eb349cab400ddb70403
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu Jul 3 16:13:11 2008 -0600

    netns: Teach network device kobjects which namespace they are in.

    The problem.  Network devices show up in sysfs and with the network
    namespace active multiple devices with the same name can show up in
    the same directory, ouch!

    To avoid that problem and allow existing applications in network namespaces
    to see the same interface that is currently presented in sysfs, this
    patch enables the tagging directory support in sysfs.

    By using the network namespace pointers as tags to separate out
    the sysfs directory entries we ensure that we don't have conflicts
    in the directories and applications only see a limited set of
    the network devices.

    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index d9456f6..9452e39 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -138,6 +138,7 @@ extern const struct sysfs_ops kobj_sysfs_ops;

 enum kobj_ns_type {
 	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPE_NET,
 	KOBJ_NS_TYPES
 };

diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..265e33b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES

 menu "Networking options"

+config NET_NS
+	bool "Network namespace support"
+	default n
+	depends on EXPERIMENTAL && NAMESPACES
+	help
+	  Allow user space to create what appear to be multiple instances
+	  of the network stack.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 099c753..1b98e36 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -13,7 +13,9 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
+#include <linux/nsproxy.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <net/wext.h>
@@ -466,6 +468,37 @@ static struct attribute_group wireless_group = {
 };
 #endif

+static const void *net_current_ns(void)
+{
+	return current->nsproxy->net_ns;
+}
+
+static const void *net_initial_ns(void)
+{
+	return &init_net;
+}
+
+static const void *net_netlink_ns(struct sock *sk)
+{
+	return sock_net(sk);
+}
+
+static struct kobj_ns_type_operations net_ns_type_operations = {
+	.type = KOBJ_NS_TYPE_NET,
+	.current_ns = net_current_ns,
+	.netlink_ns = net_netlink_ns,
+	.initial_ns = net_initial_ns,
+};
+
+static void net_kobj_ns_exit(struct net *net)
+{
+	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
+}
+
+static struct pernet_operations sysfs_net_ops = {
+	.exit = net_kobj_ns_exit,
+};
+
 #endif /* CONFIG_SYSFS */

 #ifdef CONFIG_HOTPLUG
@@ -506,6 +539,13 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }

+static const void *net_namespace(struct device *d)
+{
+	struct net_device *dev;
+	dev = container_of(d, struct net_device, dev);
+	return dev_net(dev);
+}
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -515,6 +555,8 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+	.ns_type = &net_ns_type_operations,
+	.namespace = net_namespace,
 };

 /* Delete sysfs entries but hold kobject reference until after all
@@ -587,5 +629,9 @@ void netdev_initialize_kobject(struct net_device *net)

 int netdev_kobject_init(void)
 {
+	kobj_ns_type_register(&net_ns_type_operations);
+#ifdef CONFIG_SYSFS
+	register_pernet_subsys(&sysfs_net_ops);
+#endif
 	return class_register(&net_class);
 }

^ permalink raw reply related

* Re: [PATCH][V2] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31  9:39 UTC (permalink / raw)
  To: Marc Kleine-Budde
  Cc: Haavard Skinnemoen, David S. Miller, Jiri Pirko, Erik Waling,
	Patrick McHardy, Grant Likely, netdev, linux-kernel
In-Reply-To: <4BB31345.5050101@pengutronix.de>

* Marc Kleine-Budde <mkl@pengutronix.de> [100331 11:18]:
> We're using phy_mask in one of our projects, it's still using 2.6.29,
> though. I think it's worth testing if it's still working.

Well, that was obviously something I overlooked / misinterpreted when I
tried to solve our problem!

As I understand it, phy_mask is a bitfield for setting which PHYs should
be enabled... Then this should have worked OK for us!

> static struct at91_eth_data __initdata p298_macb_data = {
>        .is_rmii        = 0,
>        .phy_mask       = ~(1 << 8),
> };

> at91_add_device_eth(&p298_macb_data);

I've not tested it, but it should probably have worked.

Regards,
Anders

^ permalink raw reply

* Re: iproute u32 filter - server hang
From: Paweł Staszewski @ 2010-03-31  9:42 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Linux Network Development list
In-Reply-To: <4BB31731.9060104@trash.net>

On 2010-03-31 11:34, Patrick McHardy wrote:
> Paweł Staszewski wrote:
>    
>> I find some problem with iproute2 and u32 filters
>>
>> To reproduce the problem (need to make one mistake in filter parent
>> declaration 1:101):
>>
>> ...
>> tc filter add dev eth0 protocol ip parent 1:101 u32 match ip protocol 1
>> 0xff flowid 1:101
>>
>> ping 212.77.100.101
>> And after this server will stop responding to anything - without any
>> error (hang).
>>      
> This is caused by hfsc_classify() looping endlessly since the filter
> points to the originating class. hfsc_bind_tcf() is actually supposed
> to prevent this, but it only prevents resolving the filter immediately
> and we still run into the loop at runtime.
>
> This patch (based on how CBQ handles this) should abort classification
> and fall back to the default class. It would be better to simply catch
> this at configuration time, but that looks a bit more involved. I'll try
> to look into it this weekend.
>
>
>    
I also checked this with htb and it has the same problem as hfsc.
These rules also hang my server.

tc qdisc del dev eth4 root
tc qdisc add dev eth4 root handle 1: htb default 63
tc class add dev eth4 parent 1: classid 1:1 htb rate 100mbit ceil 100mbit
tc class add dev eth4 parent 1:1 classid 1:2 htb rate 1mbit ceil 1mbit
tc class add dev eth4 parent 1:1 classid 1:63 htb rate 99mbit ceil 99mbit
tc class add dev eth4 parent 1:1 classid 1:101 htb rate 8kbit ceil 1mbit
tc class add dev eth4 parent 1:101 classid 1:102 htb rate 8kbit ceil 1mbit
tc filter add dev eth4 protocol ip parent 1: u32 match ip dst 212.77.100.101 flowid 1:101
tc filter add dev eth4 protocol ip parent 1:101 u32 match ip protocol 1 0xff flowid 1:101





^ permalink raw reply

* Re: iproute u32 filter - server hang
From: Patrick McHardy @ 2010-03-31  9:46 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Network Development list
In-Reply-To: <4BB31908.5090601@itcare.pl>

Paweł Staszewski wrote:
> On 2010-03-31 11:34, Patrick McHardy wrote:
>> Paweł Staszewski wrote:
>>   
>>> I find some problem with iproute2 and u32 filters
>>>
>>> To reproduce the problem (need to make one mistake in filter parent
>>> declaration 1:101):
>>>
>>> ...
>>> tc filter add dev eth0 protocol ip parent 1:101 u32 match ip protocol 1
>>> 0xff flowid 1:101
>>>
>>> ping 212.77.100.101
>>> And after this server will stop responding to anything - without any
>>> error (hang).
>>>      
>> This is caused by hfsc_classify() looping endlessly since the filter
>> points to the originating class. hfsc_bind_tcf() is actually supposed
>> to prevent this, but it only prevents resolving the filter immediately
>> and we still run into the loop at runtime.
>>
>> This patch (based on how CBQ handles this) should abort classification
>> and fall back to the default class. It would be better to simply catch
>> this at configuration time, but that looks a bit more involved. I'll try
>> to look into it this weekend.
>>
>>
>>    
> I check this also with htb and the same problem like with hfsc.
> This rules also hang my server.

Yes, HTB doesn't even catch loops when binding filters. As I said,
it's a larger piece of work, for now please just try the patch I
sent.

^ permalink raw reply

* Re: [PATCH][V2] MACB: Set PHY address in kernel parameters
From: Marc Kleine-Budde @ 2010-03-31  9:48 UTC (permalink / raw)
  To: Anders Darander
  Cc: Haavard Skinnemoen, David S. Miller, Jiri Pirko, Erik Waling,
	Patrick McHardy, Grant Likely, netdev, linux-kernel
In-Reply-To: <20100331093956.GI4837@datarespons.se>

[-- Attachment #1: Type: text/plain, Size: 1066 bytes --]

Anders Darander wrote:
> * Marc Kleine-Budde <mkl@pengutronix.de> [100331 11:18]:
>> We're using phy_mask in one of our projects, it's still using 2.6.29,
>> though. I think it's worth testing if it's still working.
> 
> Well, that was obviously something I overlooked / misinterpreted when I
> tried to solve our problem!
> 
> As I understand it, phy_mask is a bitfield for setting which PHYs should
> be enabled... Then this should have worked OK for us!

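Yes, it's a bitmask that controls which PHY addresses get probed: a set
bit masks that address out, which is why the example inverts the single
bit for address 8.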

>> static struct at91_eth_data __initdata p298_macb_data = {
>>        .is_rmii        = 0,
>>        .phy_mask       = ~(1 << 8),
>> };
> 
>> at91_add_device_eth(&p298_macb_data);
> 
> I've not tested it, but it should probably have worked.

Cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 260 bytes --]

^ permalink raw reply

* Re: iproute u32 filter - server hang
From: Paweł Staszewski @ 2010-03-31  9:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Linux Network Development list
In-Reply-To: <4BB31A08.90809@trash.net>

On 2010-03-31 11:46, Patrick McHardy wrote:
> Paweł Staszewski wrote:
>    
>> On 2010-03-31 11:34, Patrick McHardy wrote:
>>      
>>> Paweł Staszewski wrote:
>>>
>>>        
>>>> I find some problem with iproute2 and u32 filters
>>>>
>>>> To reproduce the problem (need to make one mistake in filter parent
>>>> declaration 1:101):
>>>>
>>>> ...
>>>> tc filter add dev eth0 protocol ip parent 1:101 u32 match ip protocol 1
>>>> 0xff flowid 1:101
>>>>
>>>> ping 212.77.100.101
>>>> And after this server will stop responding to anything - without any
>>>> error (hang).
>>>>
>>>>          
>>> This is caused by hfsc_classify() looping endlessly since the filter
>>> points to the originating class. hfsc_bind_tcf() is actually supposed
>>> to prevent this, but it only prevents resolving the filter immediately
>>> and we still run into the loop at runtime.
>>>
>>> This patch (based on how CBQ handles this) should abort classification
>>> and fall back to the default class. It would be better to simply catch
>>> this at configuration time, but that looks a bit more involved. I'll try
>>> to look into it this weekend.
>>>
>>>
>>>
>>>        
>> I check this also with htb and the same problem like with hfsc.
>> This rules also hang my server.
>>      
> Yes, HTB doesn't even catch loops when binding filters. As I said,
> its a larger piece of work, for now please just try the patch I
> sent.
>    

Yes.
Your patch fixes this problem.


Thanks
Paweł



^ permalink raw reply

* Re: iproute u32 filter - server hang
From: Patrick McHardy @ 2010-03-31 10:01 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Network Development list
In-Reply-To: <4BB31D15.3000500@itcare.pl>

Paweł Staszewski wrote:
> On 2010-03-31 11:46, Patrick McHardy wrote:
>>>>> tc filter add dev eth0 protocol ip parent 1:101 u32 match ip
>>>>> protocol 1
>>>>> 0xff flowid 1:101
>>>>>
>>>>> ping 212.77.100.101
>>>>> And after this server will stop responding to anything - without any
>>>>> error (hang).
>>>>>
>>>>>          
>>>> This is caused by hfsc_classify() looping endlessly since the filter
>>>> points to the originating class. hfsc_bind_tcf() is actually supposed
>>>> to prevent this, but it only prevents resolving the filter immediately
>>>> and we still run into the loop at runtime.
>>>>
>>>> This patch (based on how CBQ handles this) should abort classification
>>>> and fall back to the default class. It would be better to simply catch
>>>> this at configuration time, but that looks a bit more involved. I'll
>>>> try
>>>> to look into it this weekend.
>>>>
>>>>
>>>>
>>>>        
>>> I check this also with htb and the same problem like with hfsc.
>>> This rules also hang my server.
>>>      
>> Yes, HTB doesn't even catch loops when binding filters. As I said,
>> its a larger piece of work, for now please just try the patch I
>> sent.
>>    
> 
> Yes.
> Your patch fix this problem.

Thanks for testing, I'll let you know once I have a complete
patch for this problem.

^ permalink raw reply

* Re: [PATCH] MACB: Set PHY address in kernel parameters
From: Jiri Pirko @ 2010-03-31 10:03 UTC (permalink / raw)
  To: Anders Darander
  Cc: Haavard Skinnemoen, David S. Miller, Erik Waling, Patrick McHardy,
	Anders Darander, Grant Likely, netdev, linux-kernel
In-Reply-To: <1270021902-6556-1-git-send-email-anders.darander@gmail.com>

Wed, Mar 31, 2010 at 09:51:42AM CEST, anders.darander@gmail.com wrote:
>From: Anders Darander <ad@datarespons.se>
>
>Add the possibility to set the phy address. This is needed if an integrated
>switch is connected to the MAC, as it is often the case that the highest port
>is the one connected to the MAC of the MCU.
>
>E.g. in the case of the Micrel KSZ8873, port 3 is the one to connect to the
>MCU, thus, the MAC needs to connect to phy address 0x03, instead of the first
>phy found.
>
>Signed-off-by: Anders Darander <ad@datarespons.se>
>---
> drivers/net/macb.c |   14 +++++++++++++-
> 1 files changed, 13 insertions(+), 1 deletions(-)
>
>diff --git a/drivers/net/macb.c b/drivers/net/macb.c
>index c8a18a6..9b4e301 100644
>--- a/drivers/net/macb.c
>+++ b/drivers/net/macb.c
>@@ -53,6 +53,14 @@
> #define MACB_RX_INT_FLAGS	(MACB_BIT(RCOMP) | MACB_BIT(RXUBR)	\
> 				 | MACB_BIT(ISR_ROVR))
> 
>+/*
>+ * Setup PHY probing
>+ */
>+
>+static unsigned short phy_addr = PHY_MAX_ADDR;
>+module_param(phy_addr, ushort, 0);
>+MODULE_PARM_DESC(phy_addr, "PHY address connected to the MACB");
>+
> static void __macb_set_hwaddr(struct macb *bp)
> {
> 	u32 bottom;
>@@ -193,7 +201,11 @@ static int macb_mii_probe(struct net_device *dev)
> 	struct eth_platform_data *pdata;
> 	int ret;
> 
>-	phydev = phy_find_first(bp->mii_bus);
>+	if (phy_addr >= PHY_MAX_ADDR)
>+		phydev = phy_find_first(bp->mii_bus);
>+	else
>+		phydev = bp->mii_bus->phy_map[phy_addr];
>+
> 	if (!phydev) {
> 		printk (KERN_ERR "%s: no PHY found\n", dev->name);
> 		return -1;
>-- 
>1.7.0.3
>

This is really ugly :( Should be done differently, more general. I've been
thinking about this for a while. Maybe the solution is to integrate the switch
into DSA subsystem. See net/dsa. Not sure though...

Jirka

^ permalink raw reply

* Re: [Patch] fix packet loss and massive ping spikes with PPP multi-link
From: Ben McKeegan @ 2010-03-31 10:03 UTC (permalink / raw)
  To: netdev, linux-ppp
  Cc: Alan Cox, Alexander E. Patrakov, Richard Hartmann, linux-kernel
In-Reply-To: <20100326170428.6c1ad66c@lxorguk.ukuu.org.uk>

[-- Attachment #1: Type: text/plain, Size: 2775 bytes --]

>>> Making it runtime per link selectable would be nicer but thats a bit more
>>> work.
>> Doesn't it work already via echoing values to 
>> /sys/module/ppp/generic/parameters/ml_explode in the above code?
> 
> Thats runtime (and why I set 0600 in the permissions for the example) but
> not per link.
> 

I needed to do something similar a while back and I took a very 
different approach, which I think is more flexible.   Rather than 
implement a new round-robin scheduler I simply introduced a target 
minimum fragment size into the fragment size calculation, as a per 
bundle parameter that can be configured via a new ioctl.  This modifies 
the algorithm so that it tries to limit the number of fragments such 
that each fragment is at least the minimum size.  If the minimum size is 
greater than the packet size it will not be fragmented at all but will 
instead just get sent down the next available channel.

A pppd plugin generates the ioctl call allowing this to be tweaked per 
connection.  It is more flexible in that you can still have the larger 
packets fragmented if you wish.
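
The fragment-count calculation can be sketched in isolation as follows. This is a userspace restatement of the arithmetic in the attached patch under my reading of it; the struct and function names are invented for the example and nothing here touches the channel list or the actual transmit loop.

```c
#include <assert.h>

struct sketch_frag_plan {
	int nfrag;	/* number of fragments to create */
	int fragsize;	/* size of the larger fragments */
	int nbigger;	/* how many fragments get fragsize; 0 means all do */
};

/*
 * len: payload length, nfree: free channels (at least 1),
 * minfragsize: target minimum fragment size, 0 disables the limit.
 */
static struct sketch_frag_plan
sketch_plan_fragments(int len, int nfree, int minfragsize)
{
	struct sketch_frag_plan plan;

	assert(len >= 0 && nfree >= 1 && minfragsize >= 0);

	if (minfragsize > 0) {
		plan.nfrag = len / minfragsize;
		if (plan.nfrag < 1)
			plan.nfrag = 1;		/* too small to split: one channel */
		else if (plan.nfrag > nfree)
			plan.nfrag = nfree;	/* cannot use more than is free */
	} else {
		plan.nfrag = nfree;		/* old behaviour: use every free channel */
	}

	plan.fragsize = (len + plan.nfrag - 1) / plan.nfrag;	/* round up */
	plan.nbigger = len % plan.nfrag;
	return plan;
}
```

For example, a 1500 byte packet over 4 free channels is split four ways with no minimum, two ways with a 600 byte minimum, and a 200 byte packet with a 256 byte minimum goes out whole on a single channel.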

We've used a variant of this patch on our ADSL LNS pool for a few years 
now with varying results.  We originally did it to save bandwidth as we 
have a per packet overhead and fragmenting tiny packets such as VoIP 
across a bundle of 4 lines made no sense at all.  We've experimented 
with higher minimum settings up to and above the link MTU, thus 
achieving the equivalent of Richard's patch.

In some cases this has improved performance, others it makes it worse. 
It depends a lot on the lines and traffic patterns, and it is certainly 
not a change we would wish to have on by default.  Any solution going 
into mainline kernel would need to be tunable per connection.  One of 
the issues seems to be with poor recovery from packet loss on low 
volume, highly delay sensitive traffic on large bundles of lines.  With 
Linux at both ends you are relying on received sequence numbers to 
detect loss.  When packets are being fragmented across all channels and 
a fragment is lost, the receiving system is able to spot the lost 
fragment fairly quickly.  Once you start sending some multilink frames 
down individual channels, it takes a lot longer for the receiver to 
notice the packet loss on an individual channel.  Until another fragment 
is successfully received on the lossy channel, the fragments of the 
incomplete frame sit in the queue clogging up the other channels (the 
receiver is attempting to preserve the original packet order and is 
still waiting for the lost fragment).

Original patch attached.   This almost certainly needs updating to take 
account of other more recent changes in the multilink algorithm but it 
may provide some inspiration.

Regards,
Ben.


[-- Attachment #2: mppp-min-frag-size.patch --]
[-- Type: text/x-diff, Size: 4255 bytes --]

diff -ubdr linux-2.6.16.16-l2tp/drivers/net/ppp_generic.c linux-2.6.16.16-l2tp-mppp/drivers/net/ppp_generic.c
--- linux-2.6.16.16-l2tp/drivers/net/ppp_generic.c	2006-05-11 02:56:24.000000000 +0100
+++ linux-2.6.16.16-l2tp-mppp/drivers/net/ppp_generic.c	2007-07-03 18:23:35.000000000 +0100
@@ -64,7 +64,7 @@
 
 #define MPHDRLEN	6	/* multilink protocol header length */
 #define MPHDRLEN_SSN	4	/* ditto with short sequence numbers */
-#define MIN_FRAG_SIZE	64
+#define MIN_FRAG_SIZE	256
 
 /*
  * An instance of /dev/ppp can be associated with either a ppp
@@ -120,6 +120,7 @@
 	unsigned long	last_recv;	/* jiffies when last pkt rcvd a0 */
 	struct net_device *dev;		/* network interface device a4 */
 #ifdef CONFIG_PPP_MULTILINK
+        int             minfragsize;    /* minimum size for a fragment */
 	int		nxchan;		/* next channel to send something on */
 	u32		nxseq;		/* next sequence number to send */
 	int		mrru;		/* MP: max reconst. receive unit */
@@ -767,6 +768,15 @@
 		ppp_recv_unlock(ppp);
 		err = 0;
 		break;
+
+	case PPPIOCSMINFRAG:
+	        if (get_user(val, p))
+	                break;
+	        ppp_recv_lock(ppp);
+	        ppp->minfragsize = val < 0 ? 0 : val;
+	        ppp_recv_unlock(ppp);
+	        err = 0;
+	        break;
 #endif /* CONFIG_PPP_MULTILINK */
 
 	default:
@@ -1254,7 +1264,7 @@
 	int len, fragsize;
 	int i, bits, hdrlen, mtu;
 	int flen;
-	int navail, nfree;
+	int navail, nfree, nfrag;
 	int nbigger;
 	unsigned char *p, *q;
 	struct list_head *list;
@@ -1285,7 +1295,7 @@
 	 * the channels are free.  This gives much better TCP
 	 * performance if we have a lot of channels.
 	 */
-	if (nfree == 0 || nfree < navail / 2)
+	if (nfree == 0 || (nfree < navail / 2 && ppp->minfragsize == 0))
 		return 0;	/* can't take now, leave it in xmit_pending */
 
 	/* Do protocol field compression (XXX this should be optional) */
@@ -1302,13 +1312,24 @@
 	 * how small they are (i.e. even 0 length) in order to minimize
 	 * the time that it will take to detect when a channel drops
 	 * a fragment.
+         * However, if ppp->minfragsize > 0 we try to avoid creating
+         * fragments smaller than ppp->minfragsize and thus do not
+         * always use all free channels
 	 */
+	if (ppp->minfragsize > 0) {
+	  nfrag= len / ppp->minfragsize;
+	  if (nfrag < 1)
+	        nfrag = 1;
+	  else if (nfrag > nfree)
+	        nfrag = nfree;
+	} else
+	        nfrag = nfree;
 	fragsize = len;
-	if (nfree > 1)
-		fragsize = DIV_ROUND_UP(fragsize, nfree);
+	if (nfrag > 1)
+	        fragsize = DIV_ROUND_UP(fragsize, nfrag);
 	/* nbigger channels get fragsize bytes, the rest get fragsize-1,
 	   except if nbigger==0, then they all get fragsize. */
-	nbigger = len % nfree;
+	nbigger = len % nfrag;
 
 	/* skip to the channel after the one we last used
 	   and start at that one */
@@ -1323,7 +1344,7 @@
 
 	/* create a fragment for each channel */
 	bits = B;
-	while (nfree > 0 || len > 0) {
+	while (len > 0 || (nfree > 0 && ppp->minfragsize == 0)) {
 		list = list->next;
 		if (list == &ppp->channels) {
 			i = 0;
@@ -1371,7 +1392,7 @@
 			mtu = 4;
 		if (flen > mtu)
 			flen = mtu;
-		if (flen == len && nfree == 0)
+		if (flen == len && (nfree == 0 || ppp->minfragsize != 0))
 			bits |= E;
 		frag = alloc_skb(flen + hdrlen + (flen == 0), GFP_ATOMIC);
 		if (frag == 0)
@@ -2435,6 +2456,7 @@
 	spin_lock_init(&ppp->rlock);
 	spin_lock_init(&ppp->wlock);
 #ifdef CONFIG_PPP_MULTILINK
+	ppp->minfragsize = MIN_FRAG_SIZE;
 	ppp->minseq = -1;
 	skb_queue_head_init(&ppp->mrq);
 #endif /* CONFIG_PPP_MULTILINK */
diff -ubdr linux-2.6.16.16-l2tp/include/linux/if_ppp.h linux-2.6.16.16-l2tp-mppp/include/linux/if_ppp.h
--- linux-2.6.16.16-l2tp/include/linux/if_ppp.h	2006-05-12 13:45:00.000000000 +0100
+++ linux-2.6.16.16-l2tp-mppp/include/linux/if_ppp.h	2007-07-03 18:15:27.000000000 +0100
@@ -162,6 +162,7 @@
 #define PPPIOCATTCHAN	_IOW('t', 56, int)	/* attach to ppp channel */
 #define PPPIOCGCHAN	_IOR('t', 55, int)	/* get ppp channel number */
 #define PPPIOCGL2TPSTATS _IOR('t', 54, struct pppol2tp_ioc_stats)
+#define PPPIOCSMINFRAG  _IOW('t', 53, int)  /* minimum fragment size */
 
 #define SIOCGPPPSTATS   (SIOCDEVPRIVATE + 0)
 #define SIOCGPPPVER     (SIOCDEVPRIVATE + 1)	/* NEVER change this!! */

^ permalink raw reply

* Re: [PATCH] MACB: Set PHY address in kernel parameters
From: Anders Darander @ 2010-03-31 10:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Haavard Skinnemoen, David S. Miller, Erik Waling, Patrick McHardy,
	Grant Likely, netdev, linux-kernel
In-Reply-To: <20100331100313.GA4053@psychotron.lab.eng.brq.redhat.com>

* Jiri Pirko <jpirko@redhat.com> [100331 12:03]:
> Wed, Mar 31, 2010 at 09:51:42AM CEST, anders.darander@gmail.com wrote:
> > 
> >-	phydev = phy_find_first(bp->mii_bus);
> >+	if (phy_addr >= PHY_MAX_ADDRESS)
> >+		phydev = phy_find_first(bp->mii_bus);
> >+	else
> >+		phydev = bp->mii_bus->phy_map[phy_addr];
> >+
> > 	if (!phydev) {
> > 		printk (KERN_ERR "%s: no PHY found\n", dev->name);
> > 		return -1;
> 
> This is really ugly :( Should be done differently, more general. I've been
> thinking about this for a while. Maybe the solution is to integrate the switch
> into DSA subsystem. See net/dsa. Not sure though...

A more general solution is always welcome...

However, although I've not tested it, I think that Marc Kleine-Budde
came up with the correct solution: use phy_mask, and set it up in the
*eth_data structure in the board setup code.

Thus, it seems that the needed functionality is already available.
(Working in the embedded field, I've no problem with defining HW-related
things in the board setup code, as opposed to having them dynamically
defined).
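
To make the phy_mask idea concrete, here is a small userspace sketch. It
assumes the usual MDIO bus convention that a set bit in phy_mask means
"skip this address when probing"; the helper names (phy_mask_only,
first_unmasked_phy), the SKETCH_PHY_ADDRS constant and the "present"
bitmap are invented for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PHY_ADDRS 32	/* MDIO addresses 0..31 (sketch constant) */

/*
 * Build a mask that makes the bus probe skip every address except
 * 'addr'. A set bit means "ignore this address when probing".
 */
static uint32_t phy_mask_only(unsigned int addr)
{
	assert(addr < SKETCH_PHY_ADDRS);
	return ~(UINT32_C(1) << addr);
}

/*
 * Userspace stand-in for a "first PHY found" scan: return the lowest
 * address that is both present and not masked out, or -1 if none.
 * 'present' is a hypothetical bitmap of addresses that answer on the bus.
 */
static int first_unmasked_phy(uint32_t present, uint32_t phy_mask)
{
	for (unsigned int addr = 0; addr < SKETCH_PHY_ADDRS; addr++) {
		if (phy_mask & (UINT32_C(1) << addr))
			continue;
		if (present & (UINT32_C(1) << addr))
			return (int)addr;
	}
	return -1;
}
```

With switch ports answering at addresses 1, 2 and 3, an empty mask would
pick address 1, whereas a board file that sets the mask to
phy_mask_only(3) ends up with address 3 without any new module parameter.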

Regards,
Anders


^ permalink raw reply

* Re: [PATCH next-next-2.6 v2] virtio_net: missing sg_init_table
From: David Miller @ 2010-03-31 10:16 UTC (permalink / raw)
  To: mst; +Cc: mashirle, thomas, netdev, linux-kernel
In-Reply-To: <20100331092022.GA31911@redhat.com>

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Wed, 31 Mar 2010 12:20:22 +0300

> On Mon, Mar 29, 2010 at 06:19:15PM -0700, Shirley Ma wrote:
>> Add missing sg_init_table for sg_set_buf in virtio_net which
>> induced in defer skb patch.
>> 
>> Reported-by: Thomas Müller <thomas@mathtm.de>
>> Tested-by: Thomas Müller <thomas@mathtm.de>
>> Signed-off-by: Shirley Ma <xma@us.ibm.com>
> 
> I'm concerned that the 'big' path might cause a performance regression.
> Let's move sg into virtnet_info so that this needs to be only called
> once?

Yeah that might improve things.

Shirley's change is already in net-next-2.6 so anything implementing
this would need to be submitted relative to that.

^ permalink raw reply

* [PATCH 0/4] xfrm fixes and flow structurization
From: Timo Teras @ 2010-03-31 10:17 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, Timo Teras

These are fixes and cleanups which should be good for merging in.
Patches 1 and 2 are new. Patches 3 and 4 were previously discussed
with Herbert.

Please review and consider committing these.

Thanks.

Timo Teras (4):
  xfrm: increment genid before bumping state genids
  xfrm_user: verify policy direction at XFRM_MSG_POLEXPIRE handler
  xfrm: remove policy lock when accessing policy->walk.dead
  flow: structurize flow cache

 net/core/flow.c        |  223 +++++++++++++++++++++++++----------------------
 net/xfrm/xfrm_policy.c |   31 ++-----
 net/xfrm/xfrm_state.c  |    3 +-
 net/xfrm/xfrm_user.c   |   10 +-
 4 files changed, 135 insertions(+), 132 deletions(-)


^ permalink raw reply

* [PATCH 0/4] xfrm fixes and flow structurization
From: Timo Teras @ 2010-03-31 10:17 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, Timo Teras
In-Reply-To: <1270030626-16687-1-git-send-email-timo.teras@iki.fi>

These are fixes and cleanups which should be good for merging in.
Patches 1 and 2 are new. Patches 3 and 4 were previously discussed
with Herbert.

Please review and consider committing.

Thanks.

Timo Teras (4):
  xfrm: increment genid before bumping state genids
  xfrm_user: verify policy direction at XFRM_MSG_POLEXPIRE handler
  xfrm: remove policy lock when accessing policy->walk.dead
  flow: structurize flow cache

 net/core/flow.c        |  223 +++++++++++++++++++++++++----------------------
 net/xfrm/xfrm_policy.c |   31 ++-----
 net/xfrm/xfrm_state.c  |    3 +-
 net/xfrm/xfrm_user.c   |   10 +-
 4 files changed, 135 insertions(+), 132 deletions(-)


^ permalink raw reply

