Netdev List

Netdev List
 help / color / mirror / Atom feed

* CONTACT WESTERN UNION FOR YOUR PAYMENT MTCN
From: Dr. James William @ 2012-05-11 14:50 UTC (permalink / raw)




-- 
Attention.

 I’ve sent you the first payment of $5.000.00 today from your winning
funds total amount of $4.7MillionUSD,Therefore you need to contact western
union agent for him to give you the transfer payment information and MTCN,

Agent: Mr.Adam Smith
Phone: +229-9813-6203
Email: wu_34live@superposta.com

Please try to contact him today for him to forward the payment information
to you, remember to indicate the registration code of EB-2520 to him when
e-mailing or calling he also he will be sending you $5,000.00 daily as per
my discussion with him.

Do get in touch with me once you have received the transfer.
Sincerely,
Mr. James Williams.

^ permalink raw reply

* Re: [PATCH v2 5/5] net: sh_eth: use NAPI
From: Ben Hutchings @ 2012-05-11 15:35 UTC (permalink / raw)
  To: Shimoda, Yoshihiro; +Cc: netdev, SH-Linux
In-Reply-To: <4FACD007.1070308@renesas.com>

On Fri, 2012-05-11 at 17:38 +0900, Shimoda, Yoshihiro wrote:
> This patch modifies the driver to use NAPI.
[...]
> +static int sh_eth_poll(struct napi_struct *napi, int budget)
> +{
> +	struct sh_eth_private *mdp = container_of(napi, struct sh_eth_private,
> +						  napi);
> +	struct net_device *ndev = mdp->ndev;
> +	struct sh_eth_cpu_data *cd = mdp->cd;
> +	int work_done = 0, txfree_num;
> +	u32 intr_status = sh_eth_read(ndev, EESR);
> +
> +	/* Clear interrupt flags */
> +	sh_eth_write(ndev, intr_status, EESR);
> +
> +	/* check txdesc */
> +	txfree_num = sh_eth_txfree(ndev);
> +	if (txfree_num)
>  		netif_wake_queue(ndev);
> -	}
> 
> +	/* check rxdesc */
> +	sh_eth_rx(ndev, &work_done, budget);
> +
> +	/* check error flags */
>  	if (intr_status & cd->eesr_err_check)
>  		sh_eth_error(ndev, intr_status);
> 
> -other_irq:
> -	spin_unlock(&mdp->lock);
> +	/* get current interrupt flags */
> +	intr_status = sh_eth_read(ndev, EESR);
> 
> -	return ret;
> +	/* check whether the controller doesn't have any events */
> +	if (!txfree_num && !(intr_status & cd->eesr_err_check) &&
> +	    work_done < budget) {
> +		napi_complete(napi);

If and only if you return a value less than the budget then you *must*
call napi_complete().  You can't add these extra conditions.

> +		/* Enable all interrupts */
> +		sh_eth_write(ndev, cd->eesipr_value, EESIPR);
> +	}
> +
> +	return work_done;
>  }
> 
>  /* PHY state control function */
> @@ -1565,6 +1590,8 @@ static int sh_eth_open(struct net_device *ndev)
> 
>  	pm_runtime_get_sync(&mdp->pdev->dev);
> 
> +	napi_enable(&mdp->napi);
> +
>  	ret = request_irq(ndev->irq, sh_eth_interrupt,
>  #if defined(CONFIG_CPU_SUBTYPE_SH7763) || \
>  	defined(CONFIG_CPU_SUBTYPE_SH7764) || \
> @@ -1705,6 +1732,8 @@ static int sh_eth_close(struct net_device *ndev)
>  		phy_disconnect(mdp->phydev);
>  	}
> 
> +	napi_disable(&mdp->napi);
> +
[...]

You will also need to call napi_disable() and napi_enable() in the
set_ringparam implementation.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH v2 4/5] net: sh_eth: add support for set_ringparam/get_ringparam
From: Ben Hutchings @ 2012-05-11 15:27 UTC (permalink / raw)
  To: Shimoda, Yoshihiro; +Cc: netdev, SH-Linux
In-Reply-To: <4FACD005.3010800@renesas.com>

On Fri, 2012-05-11 at 17:38 +0900, Shimoda, Yoshihiro wrote:
> This patch supports the ethtool's set_ringparam() and get_ringparam().
[...]
> +static int sh_eth_set_ringparam(struct net_device *ndev,
> +				struct ethtool_ringparam *ring)
> +{
> +	struct sh_eth_private *mdp = netdev_priv(ndev);
> +	int ringsize;
> +
> +	if (ring->tx_pending > TX_RING_MAX ||
> +	    ring->rx_pending > RX_RING_MAX ||
> +	    ring->tx_pending < TX_RING_MIN ||
> +	    ring->rx_pending < RX_RING_MIN)
> +		return -EINVAL;
> +	if (ring->rx_mini_pending || ring->rx_jumbo_pending)
> +		return -EINVAL;
> +
> +	if (netif_running(ndev)) {
> +		/* Disable interrupts by clearing the interrupt mask. */
> +		sh_eth_write(ndev, 0x0000, EESIPR);
> +		/* Stop the chip's Tx and Rx processes. */
> +		sh_eth_write(ndev, 0, EDTRR);
> +		sh_eth_write(ndev, 0, EDRRR);

You'll also need to call netif_tx_disable() and synchronize_irq().

> +	}
> +
> +	/* Free all the skbuffs in the Rx queue. */
> +	sh_eth_ring_free(ndev);
> +	/* Free DMA buffer */
> +	ringsize = sizeof(struct sh_eth_rxdesc) * mdp->num_rx_ring;
> +	dma_free_coherent(NULL, ringsize, mdp->rx_ring, mdp->rx_desc_dma);
> +	ringsize = sizeof(struct sh_eth_txdesc) * mdp->num_tx_ring;
> +	dma_free_coherent(NULL, ringsize, mdp->tx_ring, mdp->tx_desc_dma);
> +
> +	/* Set new parameters */
> +	mdp->num_rx_ring = ring->rx_pending;
> +	mdp->num_tx_ring = ring->tx_pending;
> +
> +	sh_eth_ring_init(ndev);
> +	sh_eth_dev_init(ndev);

And what if either of these fails?

> +	if (netif_running(ndev)) {
> +		sh_eth_write(ndev, mdp->cd->eesipr_value, EESIPR);
> +		/* Setting the Rx mode will start the Rx process. */
> +		sh_eth_write(ndev, EDRRR_R, EDRRR);

You'll need to call netif_wake_queue().

> +	}
> +
> +	return 0;
> +}
[...]

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Eric Dumazet @ 2012-05-11 15:23 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, netdev, Dave Taht, Kathleen Nichols, Van Jacobson,
	Tom Herbert, Matt Mathis, Yuchung Cheng, Stephen Hemminger
In-Reply-To: <CABa6K_Fk07CUR1+hOx06cwJ5MuNYi7vpeLYFOc_1Qwmu5ubYhQ@mail.gmail.com>

On Fri, 2012-05-11 at 23:03 +0800, Changli Gao wrote:
> On Fri, May 11, 2012 at 9:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Fair Queue Codel implementation.
> >
> > Principles :
> >
> > - Packets are classified (internal classifier or external) on flows.
> > - This is a Stochastic model (as we use a hash, several flows might
> >                              be hashed on same slot)
> > - Each flow has a CoDel managed queue.
> > - Flows are linked onto two (Round Robin) lists,
> >  so that new flows have priority on old ones.
> 
> I don't think it is a good idea, as the old ones may be starved. It isn't
> fair. Why not use the conventional DRR?
> 

Hey, its DRR, but with 64 bytes per flow instead of more than 256.
One cache line per flow, that was my goal, sharing the codel_params and
stats for all flows.

A 'struct fq_codel_flow' can be in three states :

- Detached state
- In new flow list
- In old flow list

And its the dequeue() that can put a flow in detached state, only if
coming from old flow list.

Its possible I missed something, because in my first coding I had 3
lists.

Anyway I'll send a V2 because I left .change method to NULL, while the
intent was to permit a change on fq_codel.

> > +
> > +       /* Queue is full! Find the fat flow and drop packet from it.
> > +        * This might sound expensive, but with 1024 flows, we scan
> > +        * 4KB of memory, and we dont need to handle a complex tree
> > +        * in fast path (packet queue/enqueue) with many cache misses.
> > +        */
> 
> How about the tricks used by SFQ?

They are too expensive in term of cache misses and limits.
Code is complex and difficult to maintain.
That was a nice compromise 20 years ago when memory was expensive.
Now, memory is cheap but still slow.

Also adding the 'priority to new flows' is too difficult with SFQ.

^ permalink raw reply

* Re: [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Mel Gorman @ 2012-05-11 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, akpm, linux-mm, netdev, linux-kernel, neilb,
	michaelc, emunson
In-Reply-To: <1336747350.1017.22.camel@twins>

On Fri, May 11, 2012 at 04:42:30PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-05-11 at 15:32 +0100, Mel Gorman wrote:
> > > > +extern atomic_t memalloc_socks;
> > > > +static inline int sk_memalloc_socks(void)
> > > > +{
> > > > +   return atomic_read(&memalloc_socks);
> > > > +}
> > > 
> > > Please change this to be a static branch.
> > > 
> > 
> > Will do. I renamed memalloc_socks to sk_memalloc_socks, made it a int as
> > atomics are unnecessary and I check it directly in a branch instead of a
> > static inline. It should be relatively easy for the branch predictor. 
> 
> David means you to use include/linux/jump_label.h.
> 

Ah, that makes a whole lot more sense. Thanks for the clarification.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Changli Gao @ 2012-05-11 15:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Dave Taht, Kathleen Nichols, Van Jacobson,
	Tom Herbert, Matt Mathis, Yuchung Cheng, Stephen Hemminger
In-Reply-To: <1336744796.31653.164.camel@edumazet-glaptop>

On Fri, May 11, 2012 at 9:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Fair Queue Codel implementation.
>
> Principles :
>
> - Packets are classified (internal classifier or external) on flows.
> - This is a Stochastic model (as we use a hash, several flows might
>                              be hashed on same slot)
> - Each flow has a CoDel managed queue.
> - Flows are linked onto two (Round Robin) lists,
>  so that new flows have priority on old ones.

I don't think it is a good idea, as the old ones may be starved. It isn't
fair. Why not use the conventional DRR?

> +
> +       /* Queue is full! Find the fat flow and drop packet from it.
> +        * This might sound expensive, but with 1024 flows, we scan
> +        * 4KB of memory, and we dont need to handle a complex tree
> +        * in fast path (packet queue/enqueue) with many cache misses.
> +        */

How about the tricks used by SFQ?

Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH 1/2 net v2] 6lowpan: rework data fetching from skb
From: Alexander Smirnov @ 2012-05-11 14:58 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet, Alexander Smirnov
In-Reply-To: <4fabc166.9208cc0a.4de8.ffff9211@mx.google.com>

This patch reworks functions responsible for fetching data from skb. Now they
work more accurately and can notify if something went wrong.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 net/ieee802154/6lowpan.c |   75 ++++++++++++++++++++++++++-------------------
 1 files changed, 43 insertions(+), 32 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 32eb417..c2bbf01 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -291,25 +291,31 @@ lowpan_compress_udp_header(u8 **hc06_ptr, struct sk_buff *skb)
 	*hc06_ptr += 2;
 }
 
-static u8 lowpan_fetch_skb_u8(struct sk_buff *skb)
+static inline int lowpan_fetch_skb_u8(struct sk_buff *skb, u8 *val)
 {
-	u8 ret;
+	if (WARN_ON_ONCE(!pskb_may_pull(skb, 1))) {
+		/*
+		 * Uhhh, something went terribly wrong.
+		 * Check the bottom layers code
+		 */
+		return -EINVAL;
+	}
 
-	ret = skb->data[0];
+	*val = skb->data[0];
 	skb_pull(skb, 1);
 
-	return ret;
+	return 0;
 }
 
-static u16 lowpan_fetch_skb_u16(struct sk_buff *skb)
+static inline int lowpan_fetch_skb_u16(struct sk_buff *skb, u16 *val)
 {
-	u16 ret;
-
-	BUG_ON(!pskb_may_pull(skb, 2));
+	if (WARN_ON_ONCE(!pskb_may_pull(skb, 2)))
+		return -EINVAL;
 
-	ret = skb->data[0] | (skb->data[1] << 8);
+	*val = skb->data[0] | (skb->data[1] << 8);
 	skb_pull(skb, 2);
-	return ret;
+
+	return 0;
 }
 
 static int
@@ -318,7 +324,8 @@ lowpan_uncompress_udp_header(struct sk_buff *skb)
 	struct udphdr *uh = udp_hdr(skb);
 	u8 tmp;
 
-	tmp = lowpan_fetch_skb_u8(skb);
+	if (lowpan_fetch_skb_u8(skb, &tmp))
+		goto err;
 
 	if ((tmp & LOWPAN_NHC_UDP_MASK) == LOWPAN_NHC_UDP_ID) {
 		pr_debug("(%s): UDP header uncompression\n", __func__);
@@ -710,7 +717,9 @@ lowpan_process_data(struct sk_buff *skb)
 	/* at least two bytes will be used for the encoding */
 	if (skb->len < 2)
 		goto drop;
-	iphc0 = lowpan_fetch_skb_u8(skb);
+
+	if (lowpan_fetch_skb_u8(skb, &iphc0))
+		goto drop;
 
 	/* fragments assembling */
 	switch (iphc0 & LOWPAN_DISPATCH_MASK) {
@@ -722,8 +731,9 @@ lowpan_process_data(struct sk_buff *skb)
 		u16 tag;
 		bool found = false;
 
-		len = lowpan_fetch_skb_u8(skb); /* frame length */
-		tag = lowpan_fetch_skb_u16(skb);
+		if (lowpan_fetch_skb_u8(skb, &len) || /* frame length */
+		    lowpan_fetch_skb_u16(skb, &tag))  /* fragment tag */
+			goto drop;
 
 		/*
 		 * check if frame assembling with the same tag is
@@ -747,7 +757,8 @@ lowpan_process_data(struct sk_buff *skb)
 		if ((iphc0 & LOWPAN_DISPATCH_MASK) == LOWPAN_DISPATCH_FRAG1)
 			goto unlock_and_drop;
 
-		offset = lowpan_fetch_skb_u8(skb); /* fetch offset */
+		if (lowpan_fetch_skb_u8(skb, &offset)) /* fetch offset */
+			goto unlock_and_drop;
 
 		/* if payload fits buffer, copy it */
 		if (likely((offset * 8 + skb->len) <= frame->length))
@@ -769,7 +780,10 @@ lowpan_process_data(struct sk_buff *skb)
 			dev_kfree_skb(skb);
 			skb = frame->skb;
 			kfree(frame);
-			iphc0 = lowpan_fetch_skb_u8(skb);
+
+			if (lowpan_fetch_skb_u8(skb, &iphc0))
+				goto unlock_and_drop;
+
 			break;
 		}
 		spin_unlock(&flist_lock);
@@ -780,7 +794,8 @@ lowpan_process_data(struct sk_buff *skb)
 		break;
 	}
 
-	iphc1 = lowpan_fetch_skb_u8(skb);
+	if (lowpan_fetch_skb_u8(skb, &iphc1))
+		goto drop;
 
 	_saddr = mac_cb(skb)->sa.hwaddr;
 	_daddr = mac_cb(skb)->da.hwaddr;
@@ -791,9 +806,8 @@ lowpan_process_data(struct sk_buff *skb)
 	if (iphc1 & LOWPAN_IPHC_CID) {
 		pr_debug("(%s): CID flag is set, increase header with one\n",
 								__func__);
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &num_context))
 			goto drop;
-		num_context = lowpan_fetch_skb_u8(skb);
 	}
 
 	hdr.version = 6;
@@ -805,9 +819,9 @@ lowpan_process_data(struct sk_buff *skb)
 	 * ECN + DSCP + 4-bit Pad + Flow Label (4 bytes)
 	 */
 	case 0: /* 00b */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &tmp))
 			goto drop;
-		tmp = lowpan_fetch_skb_u8(skb);
+
 		memcpy(&hdr.flow_lbl, &skb->data[0], 3);
 		skb_pull(skb, 3);
 		hdr.priority = ((tmp >> 2) & 0x0f);
@@ -819,9 +833,9 @@ lowpan_process_data(struct sk_buff *skb)
 	 * ECN + DSCP (1 byte), Flow Label is elided
 	 */
 	case 1: /* 10b */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &tmp))
 			goto drop;
-		tmp = lowpan_fetch_skb_u8(skb);
+
 		hdr.priority = ((tmp >> 2) & 0x0f);
 		hdr.flow_lbl[0] = ((tmp << 6) & 0xC0) | ((tmp >> 2) & 0x30);
 		hdr.flow_lbl[1] = 0;
@@ -832,9 +846,9 @@ lowpan_process_data(struct sk_buff *skb)
 	 * ECN + 2-bit Pad + Flow Label (3 bytes), DSCP is elided
 	 */
 	case 2: /* 01b */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &tmp))
 			goto drop;
-		tmp = lowpan_fetch_skb_u8(skb);
+
 		hdr.flow_lbl[0] = (skb->data[0] & 0x0F) | ((tmp >> 2) & 0x30);
 		memcpy(&hdr.flow_lbl[1], &skb->data[0], 2);
 		skb_pull(skb, 2);
@@ -853,9 +867,9 @@ lowpan_process_data(struct sk_buff *skb)
 	/* Next Header */
 	if ((iphc0 & LOWPAN_IPHC_NH_C) == 0) {
 		/* Next header is carried inline */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &(hdr.nexthdr)))
 			goto drop;
-		hdr.nexthdr = lowpan_fetch_skb_u8(skb);
+
 		pr_debug("(%s): NH flag is set, next header is carried "
 			 "inline: %02x\n", __func__, hdr.nexthdr);
 	}
@@ -864,9 +878,8 @@ lowpan_process_data(struct sk_buff *skb)
 	if ((iphc0 & 0x03) != LOWPAN_IPHC_TTL_I)
 		hdr.hop_limit = lowpan_ttl_values[iphc0 & 0x03];
 	else {
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &(hdr.hop_limit)))
 			goto drop;
-		hdr.hop_limit = lowpan_fetch_skb_u8(skb);
 	}
 
 	/* Extract SAM to the tmp variable */
@@ -894,10 +907,8 @@ lowpan_process_data(struct sk_buff *skb)
 			pr_debug("(%s): destination address non-context-based"
 				 " multicast compression\n", __func__);
 			if (0 < tmp && tmp < 3) {
-				if (!skb->len)
+				if (lowpan_fetch_skb_u8(skb, &prefix[1]))
 					goto drop;
-				else
-					prefix[1] = lowpan_fetch_skb_u8(skb);
 			}
 
 			err = lowpan_uncompress_addr(skb, &hdr.daddr, prefix,
-- 
1.7.2.3

^ permalink raw reply related

* Re: [PATCH 12/17] netvm: Propagate page->pfmemalloc from netdev_alloc_page to skb
From: Mel Gorman @ 2012-05-11 14:46 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.010109.1698578316660207883.davem@davemloft.net>

On Fri, May 11, 2012 at 01:01:09AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:45:05 +0100
> 
> > +/**
> > + *	propagate_pfmemalloc_skb - Propagate pfmemalloc if skb is allocated after RX page
> > + *	@page: The page that was allocated from netdev_alloc_page
> > + *	@skb: The skb that may need pfmemalloc set
> > + */
> > +static inline void propagate_pfmemalloc_skb(struct page *page,
> > +						struct sk_buff *skb)
> 
> Please use consistent prefixes in the names for new interfaces.
> 

Understood.

> This one should probably be named "skb_propagate_pfmemalloc()" and
> go into skbuff.h since it needs no knowledge of netdevices.
> 

I used a netdev prefix and placed it in skbuff.h which was stupid. The
screw-up was because I was partially reverting a patch that deleted
netdev_alloc_page but I didn't need any device information so the naming
was poor. I renamed netdev_alloc_page to skb_alloc_page and will fix up
the documentation appropriately.

Thanks.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Peter Zijlstra @ 2012-05-11 14:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Miller, akpm, linux-mm, netdev, linux-kernel, neilb,
	michaelc, emunson
In-Reply-To: <20120511143218.GS11435@suse.de>

On Fri, 2012-05-11 at 15:32 +0100, Mel Gorman wrote:
> > > +extern atomic_t memalloc_socks;
> > > +static inline int sk_memalloc_socks(void)
> > > +{
> > > +   return atomic_read(&memalloc_socks);
> > > +}
> > 
> > Please change this to be a static branch.
> > 
> 
> Will do. I renamed memalloc_socks to sk_memalloc_socks, made it a int as
> atomics are unnecessary and I check it directly in a branch instead of a
> static inline. It should be relatively easy for the branch predictor. 

David means you to use include/linux/jump_label.h.

static struct static_key sk_memalloc_socks = STATIC_KEY_INIT_FALSE;

and have your function read:

static inline bool sk_memalloc_socks(void)
{
	return static_key_false(&sk_memalloc_socks);
}

which can be modified using:

  static_key_slow_inc(&sk_memalloc_socks);

or

  static_key_slow_dec(&sk_memalloc_socks);

This magic goo turns the branch into self-modifying code such that the
branch is an unconditional jump at runtime.

  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Mel Gorman @ 2012-05-11 14:32 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.005740.210437168371869566.davem@davemloft.net>

On Fri, May 11, 2012 at 12:57:40AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:45:03 +0100
> 
> > +/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
> > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
> 
> I know this gets added in an earlier patch, but it seems slightly
> overkill to have a function call just for a simply bit test.
> 

It's not that simple. gfp_pfmemalloc_allowed calls gfp_to_alloc_flags()
which is quite involved and probably should not be duplicated. In the slab
case, it's called from slow paths where we are already under memory pressure
and swapping to network so it's not a major problem. In the network case,
it is called when kmalloc() has already failed and also a slow path.

> > +extern atomic_t memalloc_socks;
> > +static inline int sk_memalloc_socks(void)
> > +{
> > +	return atomic_read(&memalloc_socks);
> > +}
> 
> Please change this to be a static branch.
> 

Will do. I renamed memalloc_socks to sk_memalloc_socks, made it a int as
atomics are unnecessary and I check it directly in a branch instead of a
static inline. It should be relatively easy for the branch predictor.

> > +	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
> > +						SKB_ALLOC_RX, NUMA_NO_NODE);
> 
> Please fix the argument indentation.
> 

Done.

> > +	data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> > +		       gfp_mask, NUMA_NO_NODE, NULL);
> 
> Likewise.

Done

> 
> > -	struct sk_buff *n = alloc_skb(newheadroom + skb->len + newtailroom,
> > -				      gfp_mask);
> > +	struct sk_buff *n = __alloc_skb(newheadroom + skb->len + newtailroom,
> > +				      gfp_mask, skb_alloc_rx_flag(skb),
> > +				      NUMA_NO_NODE);
> 
> Likewise.
> 

Done.

> > -			nskb = alloc_skb(hsize + doffset + headroom,
> > -					 GFP_ATOMIC);
> > +			nskb = __alloc_skb(hsize + doffset + headroom,
> > +					 GFP_ATOMIC, skb_alloc_rx_flag(skb),
> > +					 NUMA_NO_NODE);
> 
> Likewise.

Done.

Thanks.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH net-next v2] be2net: Fix to allow get/set of debug levels in the firmware.
From: Somnath Kotur @ 2012-05-11 14:20 UTC (permalink / raw)
  To: netdev, bhutchings; +Cc: Somnath Kotur, Suresh Reddy

Incorporated review comments by Ben Hutchings.

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be.h         |    3 +
 drivers/net/ethernet/emulex/benet/be_cmds.c    |   56 +++++++++++++++++
 drivers/net/ethernet/emulex/benet/be_cmds.h    |   57 +++++++++++++++++
 drivers/net/ethernet/emulex/benet/be_ethtool.c |   77 ++++++++++++++++++++++++
 drivers/net/ethernet/emulex/benet/be_main.c    |   37 +++++++++++
 5 files changed, 230 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index c3ee910..9817fed 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -415,6 +415,7 @@ struct be_adapter {
 	bool wol;
 	u32 max_pmac_cnt;	/* Max secondary UC MACs programmable */
 	u32 uc_macs;		/* Count of secondary UC MAC programmed */
+	u32 msg_enable;
 };
 
 #define be_physfn(adapter) (!adapter->is_virtfn)
@@ -603,4 +604,6 @@ extern void be_parse_stats(struct be_adapter *adapter);
 extern int be_load_fw(struct be_adapter *adapter, u8 *func);
 extern bool be_is_wol_supported(struct be_adapter *adapter);
 extern bool be_pause_supported(struct be_adapter *adapter);
+extern u32 be_get_fw_log_level(struct be_adapter *adapter);
+
 #endif				/* BE_H */
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 43167e8..b24623c 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -2589,4 +2589,60 @@ err:
 	mutex_unlock(&adapter->mbox_lock);
 	pci_free_consistent(adapter->pdev, cmd.size, cmd.va, cmd.dma);
 	return status;
+
+}
+int be_cmd_get_ext_fat_capabilites(struct be_adapter *adapter,
+				   struct be_dma_mem *cmd)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_get_ext_fat_caps *req;
+	int status;
+
+	if (mutex_lock_interruptible(&adapter->mbox_lock))
+		return -1;
+
+	wrb = wrb_from_mbox(adapter);
+	if (!wrb) {
+		status = -EBUSY;
+		goto err;
+	}
+
+	req = cmd->va;
+	be_wrb_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+			       OPCODE_COMMON_GET_EXT_FAT_CAPABILITES,
+			       cmd->size, wrb, cmd);
+	req->parameter_type = cpu_to_le32(1);
+
+	status = be_mbox_notify_wait(adapter);
+err:
+	mutex_unlock(&adapter->mbox_lock);
+	return status;
+}
+
+int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
+				   struct be_dma_mem *cmd,
+				   struct be_fat_conf_params *configs)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_set_ext_fat_caps *req;
+	int status;
+
+	spin_lock_bh(&adapter->mcc_lock);
+
+	wrb = wrb_from_mccq(adapter);
+	if (!wrb) {
+		status = -EBUSY;
+		goto err;
+	}
+
+	req = cmd->va;
+	memcpy(&req->set_params, configs, sizeof(struct be_fat_conf_params));
+	be_wrb_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+			       OPCODE_COMMON_SET_EXT_FAT_CAPABILITES,
+			       cmd->size, wrb, cmd);
+
+	status = be_mcc_notify_wait(adapter);
+err:
+	spin_unlock_bh(&adapter->mcc_lock);
+	return status;
 }
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index 944f031..0b1029b 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -189,6 +189,8 @@ struct be_mcc_mailbox {
 #define OPCODE_COMMON_GET_PHY_DETAILS			102
 #define OPCODE_COMMON_SET_DRIVER_FUNCTION_CAP		103
 #define OPCODE_COMMON_GET_CNTL_ADDITIONAL_ATTRIBUTES	121
+#define OPCODE_COMMON_GET_EXT_FAT_CAPABILITES		125
+#define OPCODE_COMMON_SET_EXT_FAT_CAPABILITES		126
 #define OPCODE_COMMON_GET_MAC_LIST			147
 #define OPCODE_COMMON_SET_MAC_LIST			148
 #define OPCODE_COMMON_GET_HSW_CONFIG			152
@@ -1602,6 +1604,56 @@ static inline void *be_erx_stats_from_cmd(struct be_adapter *adapter)
 	}
 }
 
+
+/************** get fat capabilites *******************/
+#define MAX_MODULES 27
+#define MAX_MODES 4
+#define MODE_UART 0
+#define FW_LOG_LEVEL_DEFAULT 48
+#define FW_LOG_LEVEL_FATAL 64
+
+struct ext_fat_mode {
+	u8 mode;
+	u8 rsvd0;
+	u16 port_mask;
+	u32 dbg_lvl;
+	u64 fun_mask;
+} __packed;
+
+struct ext_fat_modules {
+	u8 modules_str[32];
+	u32 modules_id;
+	u32 num_modes;
+	struct ext_fat_mode trace_lvl[MAX_MODES];
+} __packed;
+
+struct be_fat_conf_params {
+	u32 max_log_entries;
+	u32 log_entry_size;
+	u8 log_type;
+	u8 max_log_funs;
+	u8 max_log_ports;
+	u8 rsvd0;
+	u32 supp_modes;
+	u32 num_modules;
+	struct ext_fat_modules module[MAX_MODULES];
+} __packed;
+
+struct be_cmd_req_get_ext_fat_caps {
+	struct be_cmd_req_hdr hdr;
+	u32 parameter_type;
+};
+
+struct be_cmd_resp_get_ext_fat_caps {
+	struct be_cmd_resp_hdr hdr;
+	struct be_fat_conf_params get_params;
+};
+
+struct be_cmd_req_set_ext_fat_caps {
+	struct be_cmd_req_hdr hdr;
+	struct be_fat_conf_params set_params;
+};
+
 extern int be_pci_fnum_get(struct be_adapter *adapter);
 extern int be_cmd_POST(struct be_adapter *adapter);
 extern int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
@@ -1707,4 +1759,9 @@ extern int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid,
 extern int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid,
 			u32 domain, u16 intf_id);
 extern int be_cmd_get_acpi_wol_cap(struct be_adapter *adapter);
+extern int be_cmd_get_ext_fat_capabilites(struct be_adapter *adapter,
+					  struct be_dma_mem *cmd);
+extern int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
+					  struct be_dma_mem *cmd,
+					  struct be_fat_conf_params *cfgs);
 
diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c
index 747f68f..c40147d 100644
--- a/drivers/net/ethernet/emulex/benet/be_ethtool.c
+++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c
@@ -878,6 +878,81 @@ be_read_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom,
 	return status;
 }
 
+static u32 be_get_msg_level(struct net_device *netdev)
+{
+	struct be_adapter *adapter = netdev_priv(netdev);
+
+	if (lancer_chip(adapter)) {
+		dev_err(&adapter->pdev->dev, "Operation not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	return adapter->msg_enable;
+}
+
+static void be_set_fw_log_level(struct be_adapter *adapter, u32 level)
+{
+	struct be_dma_mem extfat_cmd;
+	struct be_fat_conf_params *cfgs;
+	int status;
+	int i, j;
+
+	memset(&extfat_cmd, 0, sizeof(struct be_dma_mem));
+	extfat_cmd.size = sizeof(struct be_cmd_resp_get_ext_fat_caps);
+	extfat_cmd.va = pci_alloc_consistent(adapter->pdev, extfat_cmd.size,
+					     &extfat_cmd.dma);
+	if (!extfat_cmd.va) {
+		dev_err(&adapter->pdev->dev, "%s: Memory allocation failure\n",
+			__func__);
+		goto err;
+	}
+	status = be_cmd_get_ext_fat_capabilites(adapter, &extfat_cmd);
+	if (!status) {
+		cfgs = (struct be_fat_conf_params *)(extfat_cmd.va +
+					sizeof(struct be_cmd_resp_hdr));
+		for (i = 0; i < cfgs->num_modules; i++) {
+			for (j = 0; j < cfgs->module[i].num_modes; j++) {
+				if (cfgs->module[i].trace_lvl[j].mode ==
+								MODE_UART)
+					cfgs->module[i].trace_lvl[j].dbg_lvl =
+							cpu_to_le32(level);
+			}
+		}
+		status = be_cmd_set_ext_fat_capabilites(adapter, &extfat_cmd,
+							cfgs);
+		if (status)
+			dev_err(&adapter->pdev->dev,
+				"Message level set failed\n");
+	} else {
+		dev_err(&adapter->pdev->dev, "Message level get failed\n");
+	}
+
+	pci_free_consistent(adapter->pdev, extfat_cmd.size, extfat_cmd.va,
+			    extfat_cmd.dma);
+err:
+	return;
+}
+
+static void be_set_msg_level(struct net_device *netdev, u32 level)
+{
+	struct be_adapter *adapter = netdev_priv(netdev);
+
+	if (lancer_chip(adapter)) {
+		dev_err(&adapter->pdev->dev, "Operation not supported\n");
+		return;
+	}
+
+	if (adapter->msg_enable == level)
+		return;
+
+	if (level & NETIF_MSG_HW != adapter->msg_enable & NETIF_MSG_HW) {
+		be_set_fw_log_level(adapter, level & NETIF_MSG_HW ?
+			       	    FW_LOG_LEVEL_DEFAULT : FW_LOG_LEVEL_FATAL);
+	adapter->msg_enable = level;
+
+	return;
+}
+
 const struct ethtool_ops be_ethtool_ops = {
 	.get_settings = be_get_settings,
 	.get_drvinfo = be_get_drvinfo,
@@ -893,6 +968,8 @@ const struct ethtool_ops be_ethtool_ops = {
 	.set_pauseparam = be_set_pauseparam,
 	.get_strings = be_get_stat_strings,
 	.set_phys_id = be_set_phys_id,
+	.get_msglevel = be_get_msg_level,
+	.set_msglevel = be_set_msg_level,
 	.get_sset_count = be_get_sset_count,
 	.get_ethtool_stats = be_get_ethtool_stats,
 	.get_regs_len = be_get_reg_len,
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 6d5d30b..c2286a2 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3367,9 +3367,43 @@ bool be_is_wol_supported(struct be_adapter *adapter)
 		!be_is_wol_excluded(adapter)) ? true : false;
 }
 
+u32 be_get_fw_log_level(struct be_adapter *adapter)
+{
+	struct be_dma_mem extfat_cmd;
+	struct be_fat_conf_params *cfgs;
+	int status;
+	u32 level = 0;
+	int j;
+
+	memset(&extfat_cmd, 0, sizeof(struct be_dma_mem));
+	extfat_cmd.size = sizeof(struct be_cmd_resp_get_ext_fat_caps);
+	extfat_cmd.va = pci_alloc_consistent(adapter->pdev, extfat_cmd.size,
+					     &extfat_cmd.dma);
+	if (!extfat_cmd.va) {
+		dev_err(&adapter->pdev->dev, "%s: Memory allocation failure\n",
+			__func__);
+		goto err;
+	}
+
+	status = be_cmd_get_ext_fat_capabilites(adapter, &extfat_cmd);
+	if (!status) {
+		cfgs = (struct be_fat_conf_params *)(extfat_cmd.va +
+						sizeof(struct be_cmd_resp_hdr));
+		for (j = 0; j < cfgs->module[0].num_modes; j++) {
+			if (cfgs->module[0].trace_lvl[j].mode == MODE_UART)
+				level = cfgs->module[0].trace_lvl[j].dbg_lvl;
+		}
+	}
+	pci_free_consistent(adapter->pdev, extfat_cmd.size, extfat_cmd.va,
+			    extfat_cmd.dma);
+err:
+	return level;
+}
+
 static int be_get_config(struct be_adapter *adapter)
 {
 	int status;
+	u32 level;
 
 	status = be_cmd_query_fw_cfg(adapter, &adapter->port_num,
 			&adapter->function_mode, &adapter->function_caps);
@@ -3407,6 +3441,9 @@ static int be_get_config(struct be_adapter *adapter)
 	if (be_is_wol_supported(adapter))
 		adapter->wol = true;
 
+	level = be_get_fw_log_level(adapter);
+	adapter->msg_enable = level <= FW_LOG_LEVEL_DEFAULT ? NETIF_MSG_HW : 0;
+
 	return 0;
 }
 
-- 
1.5.6.1

^ permalink raw reply related

* Re: [PATCH 08/17] net: Introduce sk_allocation() to allow addition of GFP flags depending on the individual socket
From: Mel Gorman @ 2012-05-11 14:12 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.004949.655300373402132371.davem@davemloft.net>

On Fri, May 11, 2012 at 12:49:49AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:45:01 +0100
> 
> > Introduce sk_allocation(), this function allows to inject sock specific
> > flags to each sock related allocation. It is only used on allocation
> > paths that may be required for writing pages back to network storage.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> 
> This is still a little bit more than it needs to be.
> 
> You are trying to propagate a single bit from sk->sk_allocation into
> all of the annotated socket memory allocation sites.
> 
> But many of them use sk->sk_allocation already.  In fact all of them
> that use a variable rather than a constant GFP_* satisfy this
> invariant.
> 
> All of those annotations are therefore spurious, and probably end up
> generating unnecessary |'s in of that special bit in at least some
> cases.
> 

Yes, you're completely correct here.

> What you really, therefore, care about are the GFP_FOO cases.  And in
> fact those are all GFP_ATOMIC.  So make something that says what it
> is that you want, a GFP_ATOMIC with some socket specified bits |'d
> in.
> 
> Something like this:
> 
> static inline gfp_t sk_gfp_atomic(struct sock *sk)
> {
> 	return GFP_ATOMIC | (sk->sk_allocation & __GFP_MEMALLOC);
> }
> 

I went with this.

> You'll also have to make your networking patches conform to the
> networking subsystem coding style.
> 
> For example:
> 
> > -	skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1, GFP_ATOMIC);
> > +	skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1,
> > +					sk_allocation(sk, GFP_ATOMIC));
> 
> The sk_allocation() argument has to line up with the first column
> after the openning parenthesis of the function call.  You can't just
> use all TAB characters.  And this all TABs thing looks extremely ugly
> to boot.
> 

I was not aware of the networking subsystem coding style. I'll fix it
up.

> > -		newnp->pktoptions = skb_clone(treq->pktopts, GFP_ATOMIC);
> > +		newnp->pktoptions = skb_clone(treq->pktopts,
> > +						sk_allocation(sk, GFP_ATOMIC));
> 
> Same here.
> 
> What's really funny to me is that in several cases elsewhere in this
> pach you get it right.

Whether I got it right or not would be effectively random. I tried
myself to see what pattern I was using thinking it would be "always"
tab but nope, no pattern :)

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Eric Dumazet @ 2012-05-11 13:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Dave Taht, Kathleen Nichols, Van Jacobson, Tom Herbert,
	Matt Mathis, Yuchung Cheng, Stephen Hemminger

From: Eric Dumazet <edumazet@google.com>

Fair Queue Codel implementation.

Principles :

- Packets are classified (internal classifier or external) on flows.
- This is a Stochastic model (as we use a hash, several flows might
                              be hashed on same slot)
- Each flow has a CoDel managed queue.
- Flows are linked onto two (Round Robin) lists,
  so that new flows have priority on old ones.

- For a given flow, packets are not reordered (CoDel uses a FIFO)
- head drops only.
- ECN capability is on by default.
- Very low memory footprint (64 bytes per flow)

tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                      [ target TIME ] [ interval TIME ] [ noecn ]

defaults : 1024 flows, 10240 packets limit

Impressive results on load :

# tc -s -d class show dev eth9

class htb 1:1 root leaf 10: prio 0 quantum 1514 rate 200000Kbit ceil 200000Kbit burst 1475b/8 mpu 0b overhead 0b cburst 1475b/8 mpu 0b overhead 0b level 0 
 Sent 1267974946 bytes 837585 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 202298Kbit 16702pps backlog 0b 103p requeues 0 
 lended: 837482 borrowed: 0 giants: 0
 tokens: -912 ctokens: -912

class fq_codel 10:a7 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 18168b 12p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 7.0ms
class fq_codel 10:10b parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:146 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 13626b 9p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 5.2ms
class fq_codel 10:1c0 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 12112b 8p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 2.8ms
class fq_codel 10:2ba parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 13626b 9p requeues 0 
  deficit 926 count 1 lastcount 1 ldelay 5.2ms
class fq_codel 10:31d parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 0 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:32c parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 15140b 10p requeues 0 
  deficit 80 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:342 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:3ab parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 18168b 12p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 7.0ms
class fq_codel 10:3c2 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 15140b 10p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms

# tc -s -d qdisc show dev eth9

qdisc htb 1: root refcnt 6 r2q 10 default 1 direct_packets_stat 0 ver 3.17
 Sent 1267878050 bytes 837521 pkt (dropped 0, overlimits 1666567 requeues 1) 
 rate 202305Kbit 16703pps backlog 0b 104p requeues 1 
qdisc fq_codel 10: parent 1:1 limit 10240p target 5.0ms interval 100.0ms ecn 
 Sent 1267878050 bytes 837521 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 202305Kbit 16703pps backlog 157456b 104p requeues 0 
  maxpacket 1514 drop_overlimit 0 new_flow_count 87 ecn_mark 4071
  new_flows_len 0 old_flows_len 10

# ping -c 10 172.30.42.18
PING 172.30.42.18 (172.30.42.18) 56(84) bytes of data.
64 bytes from 172.30.42.18: icmp_req=1 ttl=64 time=0.227 ms
64 bytes from 172.30.42.18: icmp_req=2 ttl=64 time=0.165 ms
64 bytes from 172.30.42.18: icmp_req=3 ttl=64 time=0.166 ms
64 bytes from 172.30.42.18: icmp_req=4 ttl=64 time=0.151 ms
64 bytes from 172.30.42.18: icmp_req=5 ttl=64 time=0.164 ms
64 bytes from 172.30.42.18: icmp_req=6 ttl=64 time=0.172 ms
64 bytes from 172.30.42.18: icmp_req=7 ttl=64 time=0.175 ms
64 bytes from 172.30.42.18: icmp_req=8 ttl=64 time=0.183 ms
64 bytes from 172.30.42.18: icmp_req=9 ttl=64 time=0.158 ms
64 bytes from 172.30.42.18: icmp_req=10 ttl=64 time=0.200 ms

--- 172.30.42.18 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.151/0.176/0.227/0.022 ms

Much better than SFQ because of priority given to new flows

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
---
 include/linux/pkt_sched.h |   45 ++
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_fq_codel.c  |  595 ++++++++++++++++++++++++++++++++++++
 4 files changed, 652 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index cde56c2..3ffdaec 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -681,4 +681,49 @@ struct tc_codel_xstats {
 	__u32	dropping;  /* are we in dropping state ? */
 };
 
+/* FQ_CODEL */
+
+enum {
+	TCA_FQ_CODEL_UNSPEC,
+	TCA_FQ_CODEL_TARGET,
+	TCA_FQ_CODEL_LIMIT,
+	TCA_FQ_CODEL_INTERVAL,
+	TCA_FQ_CODEL_ECN,
+	TCA_FQ_CODEL_FLOWS,
+	__TCA_FQ_CODEL_MAX
+};
+
+#define TCA_FQ_CODEL_MAX	(__TCA_FQ_CODEL_MAX - 1)
+
+enum {
+	TCA_FQ_CODEL_XSTATS_QDISC,
+	TCA_FQ_CODEL_XSTATS_CLASS,
+};
+
+struct tc_fq_codel_xstats {
+	__u32	type;
+	union {
+		struct {
+			__u32	maxpacket; /* largest packet we've seen so far */
+			__u32	drop_overlimit; /* number of time max qdisc packet limit was hit */
+			__u32	ecn_mark;  /* number of packets we ECN marked
+					    * instead of dropped
+					    */
+			__u32	new_flow_count; /* number of time packets created a 'new flow' */
+			__u32	new_flows_len;	/* count of flows in new list */
+			__u32	old_flows_len;	/* count of flows in old list */ 
+		} qdisc_stats;
+		struct {
+			__s32	deficit;
+			__u32	ldelay; /* in-queue delay seen by most recently
+					 * dequeued packet
+					 */
+			__u32	count;
+			__u32	lastcount;
+			__u32	dropping;
+			__s32	drop_next;
+		} class_stats;
+	};
+};
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index fadd252..e7a8976 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -261,6 +261,17 @@ config NET_SCH_CODEL
 
 	  If unsure, say N.
 
+config NET_SCH_FQ_CODEL
+	tristate "Fair Queue Controlled Delay AQM (FQ_CODEL)"
+	help
+	  Say Y here if you want to use the FQ Controlled Delay (FQ_CODEL)
+	  packet scheduling algorithm.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called sch_fq_codel.
+
+	  If unsure, say N.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 30fab03..5940a19 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_NET_SCH_MQPRIO)	+= sch_mqprio.o
 obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_SCH_QFQ)	+= sch_qfq.o
 obj-$(CONFIG_NET_SCH_CODEL)	+= sch_codel.o
+obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
new file mode 100644
index 0000000..8675ff8
--- /dev/null
+++ b/net/sched/sch_fq_codel.c
@@ -0,0 +1,595 @@
+/*
+ * Fair Queue CoDel discipline
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *  Copyright (C) 2012 Eric Dumazet <edumazet@google.com>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/jiffies.h>
+#include <linux/string.h>
+#include <linux/in.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/flow_keys.h>
+#include <net/codel.h>
+
+/*	Fair Queue CoDel.
+ *
+ * Principles :
+ * Packets are classified (internal classifier or external) on flows.
+ * This is a Stochastic model (as we use a hash, several flows
+ *			       might be hashed on same slot)
+ * Each flow has a CoDel managed queue.
+ * Flows are linked onto two (Round Robin) lists,
+ * so that new flows have priority on old ones.
+ *
+ * For a given flow, packets are not reordered (CoDel uses a FIFO)
+ * head drops only.
+ * ECN capability is on by default.
+ * Low memory footprint (64 bytes per flow)
+ */
+
+struct fq_codel_flow {
+	struct sk_buff	  *head;
+	struct sk_buff	  *tail;
+	struct list_head  flowchain;
+	int		  deficit;
+	struct codel_vars cvars;
+};
+
+struct fq_codel_sched_data {
+	struct tcf_proto *filter_list;	/* external classifier */
+	struct fq_codel_flow *flows;	/* Flows table [flows_cnt] */
+	u32		*backlogs;	/* backlog table [flows_cnt] */
+	u32		flows_cnt;	/* number of flows */
+	u32		perturbation;	/* hash perturbation */
+	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
+	struct codel_params cparams;
+	struct codel_stats cstats;
+	u32		drop_overlimit;
+	u32		new_flow_count;
+
+	struct list_head new_flows;	/* list of new flows */
+	struct list_head old_flows;	/* list of old flows */
+};
+
+static unsigned int fq_codel_hash(const struct fq_codel_sched_data *q,
+				  const struct sk_buff *skb)
+{
+	struct flow_keys keys;
+	unsigned int hash;
+
+	skb_flow_dissect(skb, &keys);
+	hash = jhash_3words((__force u32)keys.dst,
+			    (__force u32)keys.src ^ keys.ip_proto,
+			    (__force u32)keys.ports, q->perturbation);
+	return ((u64)hash * q->flows_cnt) >> 32;
+}
+
+static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
+				      int *qerr)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0 &&
+	    TC_H_MIN(skb->priority) <= q->flows_cnt)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return fq_codel_hash(q, skb) + 1;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		if (TC_H_MIN(res.classid) <= q->flows_cnt)
+			return TC_H_MIN(res.classid);
+	}
+	return 0;
+}
+
+/* helper functions : might be changed when/if skb use a standard list_head */
+
+/* remove one skb from head of slot queue */
+static inline struct sk_buff *dequeue_head(struct fq_codel_flow *flow)
+{
+	struct sk_buff *skb = flow->head;
+
+	flow->head = skb->next;
+	skb->next = NULL;
+	return skb;
+}
+
+/* add skb to flow queue (tail add) */
+static inline void flow_queue_add(struct fq_codel_flow *flow,
+				  struct sk_buff *skb)
+{
+	if (flow->head == NULL)
+		flow->head = skb;
+	else
+		flow->tail->next = skb;
+	flow->tail = skb;
+	skb->next = NULL;
+}
+
+static unsigned int fq_codel_drop(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	unsigned int maxbacklog = 0, idx = 0, i, len;
+	struct fq_codel_flow *flow;
+
+	/* Queue is full! Find the fat flow and drop packet from it.
+	 * This might sound expensive, but with 1024 flows, we scan
+	 * 4KB of memory, and we dont need to handle a complex tree
+	 * in fast path (packet queue/enqueue) with many cache misses.
+	 */
+	for (i = 0; i < q->flows_cnt; i++) {
+		if (q->backlogs[i] > maxbacklog) {
+			maxbacklog = q->backlogs[i];
+			idx = i;
+		}
+	}
+	flow = &q->flows[idx];
+	skb = dequeue_head(flow);
+	len = qdisc_pkt_len(skb);
+	q->backlogs[idx] -= len;
+	kfree_skb(skb);
+	sch->q.qlen--;
+	sch->qstats.drops++;
+	sch->qstats.backlog -= len;
+	return idx;
+}
+
+static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	unsigned int idx;
+	struct fq_codel_flow *flow;
+	int uninitialized_var(ret);
+
+	idx = fq_codel_classify(skb, sch, &ret);
+	if (idx == 0) {
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+	idx--;
+
+	codel_set_enqueue_time(skb);
+	flow = &q->flows[idx];
+	flow_queue_add(flow, skb);
+	q->backlogs[idx] += qdisc_pkt_len(skb);
+	sch->qstats.backlog += qdisc_pkt_len(skb);
+
+	if (list_empty(&flow->flowchain)) {
+		list_add_tail(&flow->flowchain, &q->new_flows);
+		codel_vars_init(&flow->cvars);
+		q->new_flow_count++;
+		flow->deficit = q->quantum;
+	}
+	if (++sch->q.qlen < sch->limit)
+		return NET_XMIT_SUCCESS;
+
+	q->drop_overlimit++;
+	/* Return Congestion Notification only if we dropped a packet
+	 * from this flow.
+	 */
+	if (fq_codel_drop(sch) == idx)
+		return NET_XMIT_CN;
+
+	/* As we dropped a packet, better let upper stack know this */
+	qdisc_tree_decrease_qlen(sch, 1);
+	return NET_XMIT_SUCCESS;
+}
+
+/* This is the specific function called from codel_dequeue()
+ * to dequeue a packet from queue. Note: backlog is handled in
+ * codel, we dont need to reduce it here.
+ */
+static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
+{
+	struct fq_codel_flow *flow;
+	struct sk_buff *skb = NULL;
+
+	flow = container_of(vars, struct fq_codel_flow, cvars);
+	if (flow->head) {
+		skb = dequeue_head(flow);
+		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->q.qlen--;
+	}
+	return skb;
+}
+
+static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	struct fq_codel_flow *flow;
+
+begin:
+	if (!list_empty(&q->new_flows))
+		flow = list_first_entry(&q->new_flows,
+					struct fq_codel_flow,
+					flowchain);
+	else if (!list_empty(&q->old_flows))
+		flow = list_first_entry(&q->old_flows,
+					struct fq_codel_flow,
+					flowchain);
+	else
+		return NULL;
+
+	if (flow->deficit <= 0) {
+		flow->deficit += q->quantum;
+		list_move_tail(&flow->flowchain, &q->old_flows);
+		goto begin;
+	}
+	skb = codel_dequeue(sch, &q->cparams, &flow->cvars, &q->cstats,
+			    dequeue, &q->backlogs[flow - q->flows]);
+	if (!skb) {
+		list_del_init(&flow->flowchain);
+		goto begin;
+	}
+	qdisc_bstats_update(sch, skb);
+	flow->deficit -= qdisc_pkt_len(skb);
+	return skb;
+}
+
+static void fq_codel_reset(struct Qdisc *sch)
+{
+	struct sk_buff *skb;
+
+	while ((skb = fq_codel_dequeue(sch)) != NULL)
+		kfree_skb(skb);
+}
+
+static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
+	[TCA_FQ_CODEL_TARGET]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_LIMIT]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_INTERVAL]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_ECN]	= { .type = NLA_U32 },
+};
+
+static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_FQ_CODEL_MAX + 1];
+	int err;
+
+	if (!opt)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_FQ_CODEL_MAX, opt, fq_codel_policy);
+	if (err < 0)
+		return err;
+	if (tb[TCA_FQ_CODEL_FLOWS]) {
+		if (q->flows)
+			return -EINVAL;
+		q->flows_cnt = nla_get_u32(tb[TCA_FQ_CODEL_FLOWS]);
+		if (!q->flows_cnt ||
+		    q->flows_cnt > 65536)
+			return -EINVAL;
+	}
+	sch_tree_lock(sch);
+
+	if (tb[TCA_FQ_CODEL_TARGET]) {
+		u64 target = nla_get_u32(tb[TCA_FQ_CODEL_TARGET]);
+
+		q->cparams.target = (target * NSEC_PER_USEC) >> CODEL_SHIFT;
+	}
+
+	if (tb[TCA_FQ_CODEL_INTERVAL]) {
+		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
+
+		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+	}
+
+	if (tb[TCA_FQ_CODEL_LIMIT])
+		sch->limit = nla_get_u32(tb[TCA_FQ_CODEL_LIMIT]);
+
+	if (tb[TCA_FQ_CODEL_ECN])
+		q->cparams.ecn = !!nla_get_u32(tb[TCA_FQ_CODEL_ECN]);
+
+	while (sch->q.qlen > sch->limit) {
+		struct sk_buff *skb = fq_codel_dequeue(sch);
+
+		kfree_skb(skb);
+		q->cstats.drop_count++;
+	}
+	qdisc_tree_decrease_qlen(sch, q->cstats.drop_count);
+	q->cstats.drop_count = 0;
+
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static void *fq_codel_zalloc(size_t sz)
+{
+	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
+
+	if (!ptr)
+		ptr = vzalloc(sz);
+	return ptr;
+}
+
+static void fq_codel_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static void fq_codel_destroy(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	fq_codel_free(q->backlogs);
+	fq_codel_free(q->flows);
+}
+
+static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	int i;
+
+	sch->limit = 10*1024;
+	q->flows_cnt = 1024;
+	q->quantum = psched_mtu(qdisc_dev(sch));
+	q->perturbation = net_random();
+	INIT_LIST_HEAD(&q->new_flows);
+	INIT_LIST_HEAD(&q->old_flows);
+	codel_params_init(&q->cparams);
+	codel_stats_init(&q->cstats);
+	q->cparams.ecn = true;
+
+	if (opt) {
+		int err = fq_codel_change(sch, opt);
+		if (err)
+			return err;
+	}
+
+	if (!q->flows) {
+		q->flows = fq_codel_zalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow));
+		if (!q->flows)
+			return -ENOMEM;
+		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		if (!q->backlogs) {
+			fq_codel_free(q->flows);
+			return -ENOMEM;
+		}
+		for (i = 0; i < q->flows_cnt; i++) {
+			struct fq_codel_flow *flow = q->flows + i;
+
+			INIT_LIST_HEAD(&flow->flowchain);
+		}
+	}
+	if (sch->limit >= 1)
+		sch->flags |= TCQ_F_CAN_BYPASS;
+	else
+		sch->flags &= ~TCQ_F_CAN_BYPASS;
+	return 0;
+}
+
+static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts;
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_FQ_CODEL_TARGET,
+			codel_time_to_us(q->cparams.target)) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
+			sch->limit) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
+			codel_time_to_us(q->cparams.interval)) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
+			q->cparams.ecn) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_FLOWS,
+			q->flows_cnt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -1;
+}
+
+static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tc_fq_codel_xstats st = {
+		.type				= TCA_FQ_CODEL_XSTATS_QDISC,
+		.qdisc_stats.maxpacket		= q->cstats.maxpacket,
+		.qdisc_stats.drop_overlimit	= q->drop_overlimit,
+		.qdisc_stats.ecn_mark		= q->cstats.ecn_mark,
+		.qdisc_stats.new_flow_count	= q->new_flow_count,
+	};
+	struct list_head *pos;
+
+	list_for_each(pos, &q->new_flows)
+		st.qdisc_stats.new_flows_len++;
+
+	list_for_each(pos, &q->old_flows)
+		st.qdisc_stats.old_flows_len++;
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static struct Qdisc *fq_codel_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long fq_codel_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static unsigned long fq_codel_bind(struct Qdisc *sch, unsigned long parent,
+			      u32 classid)
+{
+	/* we cannot bypass queue discipline anymore */
+	sch->flags &= ~TCQ_F_CAN_BYPASS;
+	return 0;
+}
+
+static void fq_codel_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static struct tcf_proto **fq_codel_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int fq_codel_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static int fq_codel_dump_class_stats(struct Qdisc *sch, unsigned long cl,
+				     struct gnet_dump *d)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	u32 idx = cl - 1;
+	struct gnet_stats_queue qs = { 0 };
+	struct tc_fq_codel_xstats xstats;
+
+	if (idx < q->flows_cnt) {
+		const struct fq_codel_flow *flow = &q->flows[idx];
+		const struct sk_buff *skb = flow->head;
+
+		memset(&xstats, 0, sizeof(xstats));
+		xstats.type = TCA_FQ_CODEL_XSTATS_CLASS;
+		xstats.class_stats.deficit = flow->deficit;
+		xstats.class_stats.ldelay =
+			codel_time_to_us(flow->cvars.ldelay);
+		xstats.class_stats.count = flow->cvars.count;
+		xstats.class_stats.lastcount = flow->cvars.lastcount;
+		xstats.class_stats.dropping = flow->cvars.dropping;
+		if (flow->cvars.dropping) {
+			codel_tdiff_t delta = flow->cvars.drop_next -
+					      codel_get_time();
+
+			xstats.class_stats.drop_next = (delta >= 0) ?
+				codel_time_to_us(delta) :
+				-codel_time_to_us(-delta);
+		}
+		while (skb) {
+			qs.qlen++;
+			skb = skb->next;
+		}
+		qs.backlog = q->backlogs[idx];
+	}
+	if (gnet_stats_copy_queue(d, &qs) < 0)
+		return -1;
+	if (idx < q->flows_cnt)
+		return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
+	return 0;
+}
+
+static void fq_codel_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	unsigned int i;
+
+	if (arg->stop)
+		return;
+
+	for (i = 0; i < q->flows_cnt; i++) {
+		if (list_empty(&q->flows[i].flowchain) ||
+		    arg->count < arg->skip) {
+			arg->count++;
+			continue;
+		}
+		if (arg->fn(sch, i + 1, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops fq_codel_class_ops = {
+	.leaf		=	fq_codel_leaf,
+	.get		=	fq_codel_get,
+	.put		=	fq_codel_put,
+	.tcf_chain	=	fq_codel_find_tcf,
+	.bind_tcf	=	fq_codel_bind,
+	.unbind_tcf	=	fq_codel_put,
+	.dump		=	fq_codel_dump_class,
+	.dump_stats	=	fq_codel_dump_class_stats,
+	.walk		=	fq_codel_walk,
+};
+
+static struct Qdisc_ops fq_codel_qdisc_ops __read_mostly = {
+	.cl_ops		=	&fq_codel_class_ops,
+	.id		=	"fq_codel",
+	.priv_size	=	sizeof(struct fq_codel_sched_data),
+	.enqueue	=	fq_codel_enqueue,
+	.dequeue	=	fq_codel_dequeue,
+	.peek		=	qdisc_peek_dequeued,
+	.drop		=	fq_codel_drop,
+	.init		=	fq_codel_init,
+	.reset		=	fq_codel_reset,
+	.destroy	=	fq_codel_destroy,
+	.change		=	NULL,
+	.dump		=	fq_codel_dump,
+	.dump_stats =	fq_codel_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init fq_codel_module_init(void)
+{
+	return register_qdisc(&fq_codel_qdisc_ops);
+}
+
+static void __exit fq_codel_module_exit(void)
+{
+	unregister_qdisc(&fq_codel_qdisc_ops);
+}
+
+module_init(fq_codel_module_init)
+module_exit(fq_codel_module_exit)
+MODULE_AUTHOR("Eric Dumazet");
+MODULE_LICENSE("GPL");

^ permalink raw reply related

* [PATCH v4] bonding: don't increase rx_dropped after processing LACPDUs
From: Jiri Bohac @ 2012-05-11 13:59 UTC (permalink / raw)
  To: David Miller; +Cc: andy, netdev, fubar

Since commit 3aba891d, bonding processes LACP frames (802.3ad
mode) with bond_handle_frame(). Currently a copy of the skb is
made and the original is left to be processed by other
rx_handlers and the rest of the network stack by returning
RX_HANDLER_ANOTHER.  As there is no protocol handler for
PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
increased.

Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
if bonding has processed the LACP frame.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2173,9 +2173,10 @@ re_arm:
  * received frames (loopback). Since only the payload is given to this
  * function, it check for loopback.
  */
-static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u16 length)
+static int bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u16 length)
 {
 	struct port *port;
+	int ret = RX_HANDLER_ANOTHER;
 
 	if (length >= sizeof(struct lacpdu)) {
 
@@ -2184,11 +2185,12 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 		if (!port->slave) {
 			pr_warning("%s: Warning: port of slave %s is uninitialized\n",
 				   slave->dev->name, slave->dev->master->name);
-			return;
+			return ret;
 		}
 
 		switch (lacpdu->subtype) {
 		case AD_TYPE_LACPDU:
+			ret = RX_HANDLER_CONSUMED;
 			pr_debug("Received LACPDU on port %d\n",
 				 port->actor_port_number);
 			/* Protect against concurrent state machines */
@@ -2198,6 +2200,7 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 			break;
 
 		case AD_TYPE_MARKER:
+			ret = RX_HANDLER_CONSUMED;
 			// No need to convert fields to Little Endian since we don't use the marker's fields.
 
 			switch (((struct bond_marker *)lacpdu)->tlv_type) {
@@ -2219,6 +2222,7 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 			}
 		}
 	}
+	return ret;
 }
 
 /**
@@ -2456,18 +2460,20 @@ out:
 	return NETDEV_TX_OK;
 }
 
-void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 			  struct slave *slave)
 {
+	int ret = RX_HANDLER_ANOTHER;
 	if (skb->protocol != PKT_TYPE_LACPDU)
-		return;
+		return ret;
 
 	if (!pskb_may_pull(skb, sizeof(struct lacpdu)))
-		return;
+		return ret;
 
 	read_lock(&bond->lock);
-	bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
+	ret = bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
 	read_unlock(&bond->lock);
+	return ret;
 }
 
 /*
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 235b2cc..5ee7e3c 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -274,7 +274,7 @@ void bond_3ad_adapter_duplex_changed(struct slave *slave);
 void bond_3ad_handle_link_change(struct slave *slave, char link);
 int  bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info);
 int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev);
-void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 			  struct slave *slave);
 int bond_3ad_set_carrier(struct bonding *bond);
 void bond_3ad_update_lacp_rate(struct bonding *bond);
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 9abfde4..2e1f806 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -342,26 +342,26 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)
 	_unlock_rx_hashtbl_bh(bond);
 }
 
-static void rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
+static int rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
 			 struct slave *slave)
 {
 	struct arp_pkt *arp;
 
 	if (skb->protocol != cpu_to_be16(ETH_P_ARP))
-		return;
+		goto out;
 
 	arp = (struct arp_pkt *) skb->data;
 	if (!arp) {
 		pr_debug("Packet has no ARP data\n");
-		return;
+		goto out;
 	}
 
 	if (!pskb_may_pull(skb, arp_hdr_len(bond->dev)))
-		return;
+		goto out;
 
 	if (skb->len < sizeof(struct arp_pkt)) {
 		pr_debug("Packet is too small to be an ARP\n");
-		return;
+		goto out;
 	}
 
 	if (arp->op_code == htons(ARPOP_REPLY)) {
@@ -369,6 +369,8 @@ static void rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
 		rlb_update_entry_from_arp(bond, arp);
 		pr_debug("Server received an ARP Reply from client\n");
 	}
+out:
+	return RX_HANDLER_ANOTHER;
 }
 
 /* Caller must hold bond lock for read */
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 62d2409..bc13b3d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1444,8 +1444,9 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 	struct sk_buff *skb = *pskb;
 	struct slave *slave;
 	struct bonding *bond;
-	void (*recv_probe)(struct sk_buff *, struct bonding *,
+	int (*recv_probe)(struct sk_buff *, struct bonding *,
 				struct slave *);
+	int ret = RX_HANDLER_ANOTHER;
 
 	skb = skb_share_check(skb, GFP_ATOMIC);
 	if (unlikely(!skb))
@@ -1464,8 +1465,12 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
 		if (likely(nskb)) {
-			recv_probe(nskb, bond, slave);
+			ret = recv_probe(nskb, bond, slave);
 			dev_kfree_skb(nskb);
+			if (ret == RX_HANDLER_CONSUMED) {
+				consume_skb(skb);
+				return ret;
+			}
 		}
 	}
 
@@ -1487,7 +1492,7 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 		memcpy(eth_hdr(skb)->h_dest, bond->dev->dev_addr, ETH_ALEN);
 	}
 
-	return RX_HANDLER_ANOTHER;
+	return ret;
 }
 
 /* enslave device <slave> to bond device <master> */
@@ -2723,7 +2728,7 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 	}
 }
 
-static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
+static int bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 			 struct slave *slave)
 {
 	struct arphdr *arp;
@@ -2731,7 +2736,7 @@ static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 	__be32 sip, tip;
 
 	if (skb->protocol != __cpu_to_be16(ETH_P_ARP))
-		return;
+		return RX_HANDLER_ANOTHER;
 
 	read_lock(&bond->lock);
 
@@ -2776,6 +2781,7 @@ static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 
 out_unlock:
 	read_unlock(&bond->lock);
+	return RX_HANDLER_ANOTHER;
 }
 
 /*
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 9f2bae66..4581aa5 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -218,7 +218,7 @@ struct bonding {
 	struct   slave *primary_slave;
 	bool     force_primary;
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
-	void     (*recv_probe)(struct sk_buff *, struct bonding *,
+	int     (*recv_probe)(struct sk_buff *, struct bonding *,
 			       struct slave *);
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ

^ permalink raw reply related

* Re: [PATCH v3][resend] bonding: don't increase rx_dropped after processing LACPDUs
From: Jiri Bohac @ 2012-05-11 13:54 UTC (permalink / raw)
  To: David Miller; +Cc: jbohac, andy, netdev, fubar
In-Reply-To: <20120510.233134.1390996845692589691.davem@davemloft.net>

On Thu, May 10, 2012 at 11:31:34PM -0400, David Miller wrote:
> > Since commit 3aba891d, bonding processes LACP frames (802.3ad
> > mode) with bond_handle_frame(). Currently a copy of the skb is
> > made and the original is left to be processed by other
> > rx_handlers and the rest of the network stack by returning
> > RX_HANDLER_ANOTHER.  As there is no protocol handler for
> > PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
> > increased.
> > 
> > Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
> > if bonding has processed the LACP frame.
> > 
> > Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> > 
> 
> Don't send me garbage you didn't even check the compile of:
> 
> drivers/net/bonding/bond_main.c: In function ‘bond_handle_frame’:
> drivers/net/bonding/bond_main.c:1463:13: warning: assignment from incompatible pointer type [enabled by default]
> drivers/net/bonding/bond_main.c: In function ‘bond_open’:
> drivers/net/bonding/bond_main.c:3441:21: warning: assignment from incompatible pointer type [enabled by default]
> drivers/net/bonding/bond_main.c:3448:20: warning: assignment from incompatible pointer type [enabled by default]

sorry, I overlooked these warnings. The patch actually broke the
rlb mode which I did not test. 

I'll send a fixed patch (v4).

Thanks!

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ

^ permalink raw reply

* Re: [PATCH v5] tilegx network driver: initial support
From: Ben Hutchings @ 2012-05-11 13:54 UTC (permalink / raw)
  To: Chris Metcalf; +Cc: David Miller, arnd, linux-kernel, netdev
In-Reply-To: <201205091452.q49EqcO7005975@farm-0027.internal.tilera.com>

Here's another very incomplete review for you.

On Wed, 2012-05-09 at 06:42 -0400, Chris Metcalf wrote:
> This change adds support for the tilegx network driver based on the
> GXIO IORPC support in the tilegx software stack, using the on-chip
> mPIPE packet processing engine.
[...]
> --- /dev/null
> +++ b/drivers/net/ethernet/tile/tilegx.c
[...]
> +/* Define to support GSO. */
> +#undef TILE_NET_GSO

GSO is always enabled by the networking core.

> +/* Define to support TSO. */
> +#define TILE_NET_TSO

No, put NETIF_F_TSO in hw_features so it can be switched at run-time.

(Currently that won't work if you don't set dev->ethtool_ops, but that's
a bug that can be fixed.)

> +/* Use 3000 to enable the Linux Traffic Control (QoS) layer, else 0. */
> +#define TILE_NET_TX_QUEUE_LEN 0

This can be changed through sysfs, so there is no need for a compile-
time option.

> +/* Define to dump packets (prints out the whole packet on tx and rx). */
> +#undef TILE_NET_DUMP_PACKETS

Should really be controlled through a 'debug' module parameter (see
netif_msg_init(), netif_msg_pktdata(), etc.)

[...]
> +/* Total header bytes per equeue slot.  Must be big enough for 2 bytes
> + * of NET_IP_ALIGN alignment, plus 14 bytes (?) of L2 header, plus up to
> + * 60 bytes of actual TCP header.  We round up to align to cache lines.
> + */
> +#define HEADER_BYTES 128
> +
> +/* Maximum completions per cpu per device (must be a power of two).
> + * ISSUE: What is the right number here?
> + */
> +#define TILE_NET_MAX_COMPS 64
> +
> +#define MAX_FRAGS (65536 / PAGE_SIZE + 2 + 1)

Should be MAX_SKB_FRAGS + 1.

[...]
> +/* Help the kernel transmit a packet. */
> +static int tile_net_tx(struct sk_buff *skb, struct net_device *dev)
> +{
> +	struct tile_net_priv *priv = netdev_priv(dev);
> +
> +	struct tile_net_info *info = &__get_cpu_var(per_cpu_info);
> +
> +	struct tile_net_egress *egress = &egress_for_echannel[priv->echannel];
> +	gxio_mpipe_equeue_t *equeue = egress->equeue;
> +
> +	struct tile_net_comps *comps =
> +		info->comps_for_echannel[priv->echannel];
> +
> +	struct skb_shared_info *sh = skb_shinfo(skb);
> +
> +	unsigned int len = skb->len;
> +	unsigned char *data = skb->data;
> +
> +	unsigned int num_frags;
> +	struct frag frags[MAX_FRAGS];
> +	gxio_mpipe_edesc_t edescs[MAX_FRAGS];
> +
> +	unsigned int i;
> +
> +	int cid;
> +
> +	s64 slot;
> +
> +	unsigned long irqflags;

Please, no blank lines between your declarations.

[...]
> +	/* Reserve slots, or return NETDEV_TX_BUSY if "full". */
> +	slot = gxio_mpipe_equeue_try_reserve(equeue, num_frags);
> +	if (slot < 0) {
> +		local_irq_restore(irqflags);
> +		/* ISSUE: "Virtual device xxx asks to queue packet". */
> +		return NETDEV_TX_BUSY;
> +	}

You're supposed to stop queues when they're full.  And since that state
appears to be per-CPU, I think this device needs to be multiqueue with
one TX queue per CPU and ndo_select_queue defined accordingly.

> +	for (i = 0; i < num_frags; i++)
> +		gxio_mpipe_equeue_put_at(equeue, edescs[i], slot + i);
> +
> +	/* Wait for a free completion entry.
> +	 * ISSUE: Is this the best logic?
> +	 * ISSUE: Can this cause undesirable "blocking"?
> +	 */
> +	while (comps->comp_next - comps->comp_last >= TILE_NET_MAX_COMPS - 1)
> +		tile_net_free_comps(equeue, comps, 32, false);

I'm not convinced you should be processing completions here at all.  But
certainly you should have stopped the queue earlier rather than having
to wait here.

> +	/* Update the completions array. */
> +	cid = comps->comp_next % TILE_NET_MAX_COMPS;
> +	comps->comp_queue[cid].when = slot + num_frags;
> +	comps->comp_queue[cid].skb = skb;
> +	comps->comp_next++;
> +
> +	/* HACK: Track "expanded" size for short packets (e.g. 42 < 60). */
> +	atomic_add(1, (atomic_t *)&priv->stats.tx_packets);
> +	atomic_add((len >= ETH_ZLEN) ? len : ETH_ZLEN,
> +		   (atomic_t *)&priv->stats.tx_bytes);

You mustn't treat random fields to atomic_t.  For one thing, atomic_t
contains an int while stats are unsigned long...

Also, you're adding cache contention between all your CPUs here.  You
should maintain these stats per-CPU and then sum them in
tile_net_get_stats().  Then you can just use ordinary additions.

[...]
> +/* Ioctl commands. */
> +static int tile_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
> +{
> +	return -EOPNOTSUPP;
> +}

So why define it at all?

[...]
> +static void tile_net_dev_init(const char *name, const uint8_t *mac)
> +{
[...]
> +	/* Register the network device. */
> +	ret = register_netdev(dev);
> +	if (ret) {
> +		netdev_err(dev, "register_netdev failed %d\n", ret);
> +		free_netdev(dev);
> +		return;
> +	}
> +
> +	/* Get the MAC address and set it in the device struct; this must
> +	 * be done before the device is opened.
[...]

So you had better do this before calling register_netdev(), as the
device can be opened immediately after that...

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: Information leakage from RDS protocol
From: Venkat Venkatsubra @ 2012-05-11 12:57 UTC (permalink / raw)
  To: Jay Fenlason
  Cc: Linus Torvalds, security, eugene, pmatouse, Netdev, David Miller,
	rds-devel
In-Reply-To: <4FABE0F8.6070806@oracle.com>

On 5/10/2012 10:38 AM, Venkat Venkatsubra wrote:
> On 5/9/2012 10:57 AM, Jay Fenlason wrote:
>> On Wed, May 09, 2012 at 10:17:57AM -0500, Venkat Venkatsubra wrote:
>>> On 5/8/2012 1:22 PM, Jay Fenlason wrote:
>>>>> On Tue, May 8, 2012 at 9:10 AM, Jay 
>>>>> Fenlason<fenlason@redhat.com>   wrote:
>>>>>> recvfrom() on an RDS socket can return the contents of random(?)
>>>>>> kernel memory to userspace if it was called with a address
>>>>>> length larger than sizeof(struct sockaddr_in). ?rds_recvmsg() also
>>>>>> fails to set the addr_len paramater properly before returning, but
>>>>>> that's just a bug.
>>>>>>
>>>>>> There are also a number of cases wher recvfrom() can return an 
>>>>>> entirely
>>>>>> bogus address. ?Anything in rds_recvmsg() that returns a
>>>>>> non-negative value but does not go through the
>>>>>> ? "sin = (struct sockaddr_in *)msg->msg_name;"
>>>>>> code path at the end of the while(1) loop will return up to 128
>>>>>> bytes of kernel memory to userspace.
>>>>>>
>>>>>> Also, on a receive race, the message that was copied to userspace 
>>>>>> but
>>>>>> received by someone else is not zeroed, meaning that if the next
>>>>>> message it receives is smaller, the tail of the raced message is
>>>>>> leaked. ?I'm not sure how serious this is, but unexpectedly 
>>>>>> scribbling
>>>>>> on userspace memory (even if it is part of a buffer that userspace
>>>>>> asked us to write to) should be avoided.
>>>>>>
>>>> On Tue, May 08, 2012 at 11:04:01AM -0700, Linus Torvalds wrote:
>>>>> Please cc David Miller too on these things, and make sure he knows
>>>>> there's no embargo or anything (he won't touch it if there is). Maybe
>>>>> you don't want public mailing lists, but in general, the more open we
>>>>> can be, the better.
>>>> Added.  Nobody has said anything about any embargo to me, either
>>>> that they want one or that there shouldn't be one.  Personally, I
>>>> don't see any reason to embargo this, but I'm not on any
>>>> security-response teams.
>>>>
>>>>> This seems unfortunate, but at least the address thing is limited to
>>>>> sizeof(sockaddr_storage) and is kernel stack - which in turn means
>>>>> that while it potentially leaks kernel addresses (bad!), it almost
>>>>> certainly won't leak anything fundamentally interesting (ie you can't
>>>>> read arbitrary kernel memory and find plaintext passwords etc).
>>>>>
>>>>> I assume the fix is a trivial
>>>>>
>>>>>    msg->msg_namelen = sizeof(*sin);
>>>>>
>>>>> in rds_recvmsg() where it sets up the address?
>>>> That fixes the case where it actually sets up the address, but won't
>>>> fix the cases where it doesn't even do that.  I don't think anyone
>>>> ever thought about what the source address should be for a message
>>>> that was generated internally by the kernel.  I think the obvious
>>>> possibilities are msg_namelen = 0 (no address) and 127.0.0.1
>>>>
>>>>> I do wonder if maybe recvmsg() should initialize msg_namelen to 0
>>>>> instead of the size of the buffer before calling the low-level 
>>>>> recvmsg
>>>>> function - so that protocols would have to explicitly set the size to
>>>>> the right value. But that would need much more validation.
>>>> That would require checking/fixing all of the low-level functions,
>>>> which will then have to know that the buffer pointed to by msg is at
>>>> most sizeof(struct sockaddr_storage) bytes.  I think it's better to
>>>> keep the size of the address buffer there, so the low-level functions
>>>> can confirm that the address data they're about to stuff in there
>>>> won't overflow the buffer.  (That way if we ever change the size of
>>>> the buffer, only one place has to change.)
>>>>
>>>> And the whole rds recieve subsystem needs a bit of a rewrite to close
>>>> the information-leaking receive race.  Keeping the semantics correct
>>>> in regards to MSG_PEEK and multiple threads reading the socket at the
>>>> same time may be tricky.
>>>>
>>>>         -- JF
>>>>
>>> How about adding the suggested "msg->msg_namelen = sizeof(*sin);"
>>> line at the top of rds_recvmsg ?
>>> And "msg->msg_namelen = 0;" in the below "break;" cases ? I am
>>> assuming the apps wouldn't need to look at msg_name in these cases.
>>>                  if (!list_empty(&rs->rs_notify_queue)) {
>>>                          ret = rds_notify_queue_get(rs, msg);
>>>                          break;
>>>                  }
>>>
>>>                  if (rs->rs_cong_notify) {
>>>                          ret = rds_notify_cong(rs, msg);
>>>                          break;
>>>                  }
>> Wouldn't it be better to set msg->msg_namelen = 0 at the top of the
>> function, and only set it to sizeof(*sin) after msg->msg_name is
>> filled in?  That'll prevent accidental disclosure of kernel memory via
>> unanticipated code paths.
>>
>>> And, shouldn't an error be returned for the case below ? Currently
>>> zero is returned.
>>>
>>>          if (msg_flags&  MSG_OOB)
>>>                  goto out;
>>> An error such as EOPNOTSUPP ?
>> I don't know.  I'm not a networking expert.  From what I've found
>> googling, EINVAL would be more correct that ENOTSUPP.
>>
>> This only leaves the datagram contents leak to userspace when multiple
>> threads race on receiving a datagram and the subsequent datagram is
>> smaller.  That one will be hard to fix, most notably because the
>> obvious fixes I've looked at involve losing a datagram if either of
>>    inc->i_conn->c_trans->inc_copy_to_user()
>> or
>>    rds_cmsg_recv()
>> fail.  I don't know how likely either of those are, but losing
>> datagrams seems like an inappropriate behavior for a reliable datagram
>> subsystem.
>>
> Moving the discussion to netdev.
>
> Venkat
Forgot to include rds-devel.

Venkat

^ permalink raw reply

* [PATCH 10/15] batman-adv: rename sysfs macros to reflect the soft-interface dependency
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_sysfs.c |   57 ++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 28 deletions(-)

diff --git a/net/batman-adv/bat_sysfs.c b/net/batman-adv/bat_sysfs.c
index 2c81688..913299d 100644
--- a/net/batman-adv/bat_sysfs.c
+++ b/net/batman-adv/bat_sysfs.c
@@ -63,7 +63,7 @@ struct bat_attribute bat_attr_##_name = {	\
 	.store  = _store,			\
 };
 
-#define BAT_ATTR_STORE_BOOL(_name, _post_func)				\
+#define BAT_ATTR_SIF_STORE_BOOL(_name, _post_func)			\
 ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
 		      char *buff, size_t count)				\
 {									\
@@ -73,9 +73,9 @@ ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
 				 &bat_priv->_name, net_dev);		\
 }
 
-#define BAT_ATTR_SHOW_BOOL(_name)					\
-ssize_t show_##_name(struct kobject *kobj, struct attribute *attr,	\
-			    char *buff)					\
+#define BAT_ATTR_SIF_SHOW_BOOL(_name)					\
+ssize_t show_##_name(struct kobject *kobj,				\
+		     struct attribute *attr, char *buff)		\
 {									\
 	struct bat_priv *bat_priv = kobj_to_batpriv(kobj);		\
 	return sprintf(buff, "%s\n",					\
@@ -83,16 +83,17 @@ ssize_t show_##_name(struct kobject *kobj, struct attribute *attr,	\
 		       "disabled" : "enabled");				\
 }									\
 
-/* Use this, if you are going to turn a [name] in bat_priv on or off */
-#define BAT_ATTR_BOOL(_name, _mode, _post_func)				\
-	static BAT_ATTR_STORE_BOOL(_name, _post_func)			\
-	static BAT_ATTR_SHOW_BOOL(_name)				\
+/* Use this, if you are going to turn a [name] in the soft-interface
+ * (bat_priv) on or off */
+#define BAT_ATTR_SIF_BOOL(_name, _mode, _post_func)			\
+	static BAT_ATTR_SIF_STORE_BOOL(_name, _post_func)		\
+	static BAT_ATTR_SIF_SHOW_BOOL(_name)				\
 	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
 
 
-#define BAT_ATTR_STORE_UINT(_name, _min, _max, _post_func)		\
+#define BAT_ATTR_SIF_STORE_UINT(_name, _min, _max, _post_func)		\
 ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
-			     char *buff, size_t count)			\
+		      char *buff, size_t count)				\
 {									\
 	struct net_device *net_dev = kobj_to_netdev(kobj);		\
 	struct bat_priv *bat_priv = netdev_priv(net_dev);		\
@@ -100,19 +101,19 @@ ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
 				 attr, &bat_priv->_name, net_dev);	\
 }
 
-#define BAT_ATTR_SHOW_UINT(_name)					\
-ssize_t show_##_name(struct kobject *kobj, struct attribute *attr,	\
-			    char *buff)					\
+#define BAT_ATTR_SIF_SHOW_UINT(_name)					\
+ssize_t show_##_name(struct kobject *kobj,				\
+		     struct attribute *attr, char *buff)		\
 {									\
 	struct bat_priv *bat_priv = kobj_to_batpriv(kobj);		\
 	return sprintf(buff, "%i\n", atomic_read(&bat_priv->_name));	\
 }									\
 
-/* Use this, if you are going to set [name] in bat_priv to unsigned integer
- * values only */
-#define BAT_ATTR_UINT(_name, _mode, _min, _max, _post_func)		\
-	static BAT_ATTR_STORE_UINT(_name, _min, _max, _post_func)	\
-	static BAT_ATTR_SHOW_UINT(_name)				\
+/* Use this, if you are going to set [name] in the soft-interface
+ * (bat_priv) to an unsigned integer value */
+#define BAT_ATTR_SIF_UINT(_name, _mode, _min, _max, _post_func)		\
+	static BAT_ATTR_SIF_STORE_UINT(_name, _min, _max, _post_func)	\
+	static BAT_ATTR_SIF_SHOW_UINT(_name)				\
 	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
 
 
@@ -384,24 +385,24 @@ static ssize_t store_gw_bwidth(struct kobject *kobj, struct attribute *attr,
 	return gw_bandwidth_set(net_dev, buff, count);
 }
 
-BAT_ATTR_BOOL(aggregated_ogms, S_IRUGO | S_IWUSR, NULL);
-BAT_ATTR_BOOL(bonding, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(aggregated_ogms, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(bonding, S_IRUGO | S_IWUSR, NULL);
 #ifdef CONFIG_BATMAN_ADV_BLA
-BAT_ATTR_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(bridge_loop_avoidance, S_IRUGO | S_IWUSR, NULL);
 #endif
-BAT_ATTR_BOOL(fragmentation, S_IRUGO | S_IWUSR, update_min_mtu);
-BAT_ATTR_BOOL(ap_isolation, S_IRUGO | S_IWUSR, NULL);
+BAT_ATTR_SIF_BOOL(fragmentation, S_IRUGO | S_IWUSR, update_min_mtu);
+BAT_ATTR_SIF_BOOL(ap_isolation, S_IRUGO | S_IWUSR, NULL);
 static BAT_ATTR(vis_mode, S_IRUGO | S_IWUSR, show_vis_mode, store_vis_mode);
 static BAT_ATTR(routing_algo, S_IRUGO, show_bat_algo, NULL);
 static BAT_ATTR(gw_mode, S_IRUGO | S_IWUSR, show_gw_mode, store_gw_mode);
-BAT_ATTR_UINT(orig_interval, S_IRUGO | S_IWUSR, 2 * JITTER, INT_MAX, NULL);
-BAT_ATTR_UINT(hop_penalty, S_IRUGO | S_IWUSR, 0, TQ_MAX_VALUE, NULL);
-BAT_ATTR_UINT(gw_sel_class, S_IRUGO | S_IWUSR, 1, TQ_MAX_VALUE,
-	      post_gw_deselect);
+BAT_ATTR_SIF_UINT(orig_interval, S_IRUGO | S_IWUSR, 2 * JITTER, INT_MAX, NULL);
+BAT_ATTR_SIF_UINT(hop_penalty, S_IRUGO | S_IWUSR, 0, TQ_MAX_VALUE, NULL);
+BAT_ATTR_SIF_UINT(gw_sel_class, S_IRUGO | S_IWUSR, 1, TQ_MAX_VALUE,
+		  post_gw_deselect);
 static BAT_ATTR(gw_bandwidth, S_IRUGO | S_IWUSR, show_gw_bwidth,
 		store_gw_bwidth);
 #ifdef CONFIG_BATMAN_ADV_DEBUG
-BAT_ATTR_UINT(log_level, S_IRUGO | S_IWUSR, 0, 15, NULL);
+BAT_ATTR_SIF_UINT(log_level, S_IRUGO | S_IWUSR, 0, 15, NULL);
 #endif
 
 static struct bat_attribute *mesh_attrs[] = {
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 11/15] batman-adv: Adding hard_iface specific sysfs wrapper macros for UINT
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Linus Luessing, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Linus Luessing <linus.luessing@web.de>

This allows us to easily add a sysfs parameter for an unsigned int
later, which is not for a batman mesh interface (e.g. bat0), but for a
common interface instead. It allows reading and writing an atomic_t in
hard_iface (instead of bat_priv compared to the mesh variant).

Developed by Linus during a 6 months trainee study period in Ascom
(Switzerland) AG.

Signed-off-by: Linus Luessing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_sysfs.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/net/batman-adv/bat_sysfs.c b/net/batman-adv/bat_sysfs.c
index 913299d..5bc7b66 100644
--- a/net/batman-adv/bat_sysfs.c
+++ b/net/batman-adv/bat_sysfs.c
@@ -117,6 +117,49 @@ ssize_t show_##_name(struct kobject *kobj,				\
 	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
 
 
+#define BAT_ATTR_HIF_STORE_UINT(_name, _min, _max, _post_func)		\
+ssize_t store_##_name(struct kobject *kobj, struct attribute *attr,	\
+		      char *buff, size_t count)				\
+{									\
+	struct net_device *net_dev = kobj_to_netdev(kobj);		\
+	struct hard_iface *hard_iface = hardif_get_by_netdev(net_dev);	\
+	ssize_t length;							\
+									\
+	if (!hard_iface)						\
+		return 0;						\
+									\
+	length = __store_uint_attr(buff, count, _min, _max, _post_func,	\
+				   attr, &hard_iface->_name, net_dev);	\
+									\
+	hardif_free_ref(hard_iface);					\
+	return length;							\
+}
+
+#define BAT_ATTR_HIF_SHOW_UINT(_name)					\
+ssize_t show_##_name(struct kobject *kobj,				\
+		     struct attribute *attr, char *buff)		\
+{									\
+	struct net_device *net_dev = kobj_to_netdev(kobj);		\
+	struct hard_iface *hard_iface = hardif_get_by_netdev(net_dev);	\
+	ssize_t length;							\
+									\
+	if (!hard_iface)						\
+		return 0;						\
+									\
+	length = sprintf(buff, "%i\n", atomic_read(&hard_iface->_name));\
+									\
+	hardif_free_ref(hard_iface);					\
+	return length;							\
+}
+
+/* Use this, if you are going to set [name] in hard_iface to an
+ * unsigned integer value*/
+#define BAT_ATTR_HIF_UINT(_name, _mode, _min, _max, _post_func)		\
+	static BAT_ATTR_HIF_STORE_UINT(_name, _min, _max, _post_func)	\
+	static BAT_ATTR_HIF_SHOW_UINT(_name)				\
+	static BAT_ATTR(_name, _mode, show_##_name, store_##_name)
+
+
 static int store_bool_attr(char *buff, size_t count,
 			   struct net_device *net_dev,
 			   const char *attr_name, atomic_t *attr)
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 15/15] batman-adv: add contributor name
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

translation_table.{c,h} have been heavily modified by another contributor and
for legal purposes it is better to include his name into the contributor list

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/translation-table.c |    2 +-
 net/batman-adv/translation-table.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index a38d315..2cb46f0 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -1,7 +1,7 @@
 /*
  * Copyright (C) 2007-2012 B.A.T.M.A.N. contributors:
  *
- * Marek Lindner, Simon Wunderlich
+ * Marek Lindner, Simon Wunderlich, Antonio Quartulli
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
diff --git a/net/batman-adv/translation-table.h b/net/batman-adv/translation-table.h
index bfebe26..593d1b3 100644
--- a/net/batman-adv/translation-table.h
+++ b/net/batman-adv/translation-table.h
@@ -1,7 +1,7 @@
 /*
  * Copyright (C) 2007-2012 B.A.T.M.A.N. contributors:
  *
- * Marek Lindner, Simon Wunderlich
+ * Marek Lindner, Simon Wunderlich, Antonio Quartulli
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 13/15] batman-adv: fix checkpatch string complaint
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Regression introduced by: f76d019194e0a88c57371df169ecc979690a04c2

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index bafb473..abd10c4 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1061,8 +1061,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 
 	if (batman_ogm_packet->flags & NOT_BEST_NEXT_HOP) {
 		bat_dbg(DBG_BATMAN, bat_priv,
-			"Drop packet: ignoring all packets not forwarded from "
-			"the best next hop (sender: %pM)\n", ethhdr->h_source);
+			"Drop packet: ignoring all packets not forwarded from the best next hop (sender: %pM)\n",
+			ethhdr->h_source);
 		return;
 	}
 
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 12/15] batman-adv: avoid temporary routing loops by being strict on forwarded OGMs
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

batman-adv would forward OGMs from non-besthops while replacing the the TQ
and TTL values with the values from the best hop. In certain corner cases
this leads to a temporary routing loop.
This patch changes this behavior: Only packets from best next hops are
forwarded - TQ and TTL values won't be replaced anymore. However, the protocol
needs to rebroadcast OGMs from single hop neighbors regardless of whether or
not they are the best hop. To handle this case a new flag is introduced to
alert neighboring nodes about the forwarded OGM that is not from my best
next hop. It is to be discarded by all nodes except for the one originating
the OGM.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Daniele Furlan <daniele.furlan@gmail.com>
Tested-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |   60 ++++++++++++++++++++++---------------------
 net/batman-adv/packet.h     |    1 +
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 9074e74..bafb473 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -507,11 +507,10 @@ static void bat_iv_ogm_forward(struct orig_node *orig_node,
 			       const struct ethhdr *ethhdr,
 			       struct batman_ogm_packet *batman_ogm_packet,
 			       bool is_single_hop_neigh,
+			       bool is_from_best_next_hop,
 			       struct hard_iface *if_incoming)
 {
 	struct bat_priv *bat_priv = netdev_priv(if_incoming->soft_iface);
-	struct neigh_node *router;
-	uint8_t in_tq, in_ttl, tq_avg = 0;
 	uint8_t tt_num_changes;
 
 	if (batman_ogm_packet->header.ttl <= 1) {
@@ -519,41 +518,30 @@ static void bat_iv_ogm_forward(struct orig_node *orig_node,
 		return;
 	}
 
-	router = orig_node_get_router(orig_node);
+	if (!is_from_best_next_hop) {
+		/* Mark the forwarded packet when it is not coming from our
+		 * best next hop. We still need to forward the packet for our
+		 * neighbor link quality detection to work in case the packet
+		 * originated from a single hop neighbor. Otherwise we can
+		 * simply drop the ogm.
+		 */
+		if (is_single_hop_neigh)
+			batman_ogm_packet->flags |= NOT_BEST_NEXT_HOP;
+		else
+			return;
+	}
 
-	in_tq = batman_ogm_packet->tq;
-	in_ttl = batman_ogm_packet->header.ttl;
 	tt_num_changes = batman_ogm_packet->tt_num_changes;
 
 	batman_ogm_packet->header.ttl--;
 	memcpy(batman_ogm_packet->prev_sender, ethhdr->h_source, ETH_ALEN);
 
-	/* rebroadcast tq of our best ranking neighbor to ensure the rebroadcast
-	 * of our best tq value */
-	if (router && router->tq_avg != 0) {
-
-		/* rebroadcast ogm of best ranking neighbor as is */
-		if (!compare_eth(router->addr, ethhdr->h_source)) {
-			batman_ogm_packet->tq = router->tq_avg;
-
-			if (router->last_ttl)
-				batman_ogm_packet->header.ttl =
-					router->last_ttl - 1;
-		}
-
-		tq_avg = router->tq_avg;
-	}
-
-	if (router)
-		neigh_node_free_ref(router);
-
 	/* apply hop penalty */
 	batman_ogm_packet->tq = hop_penalty(batman_ogm_packet->tq, bat_priv);
 
 	bat_dbg(DBG_BATMAN, bat_priv,
-		"Forwarding packet: tq_orig: %i, tq_avg: %i, tq_forw: %i, ttl_orig: %i, ttl_forw: %i\n",
-		in_tq, tq_avg, batman_ogm_packet->tq, in_ttl - 1,
-		batman_ogm_packet->header.ttl);
+		"Forwarding packet: tq: %i, ttl: %i\n",
+		batman_ogm_packet->tq, batman_ogm_packet->header.ttl);
 
 	batman_ogm_packet->seqno = htonl(batman_ogm_packet->seqno);
 	batman_ogm_packet->tt_crc = htons(batman_ogm_packet->tt_crc);
@@ -949,6 +937,7 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	int is_my_addr = 0, is_my_orig = 0, is_my_oldorig = 0;
 	int is_broadcast = 0, is_bidirectional;
 	bool is_single_hop_neigh = false;
+	bool is_from_best_next_hop = false;
 	int is_duplicate;
 	uint32_t if_incoming_seqno;
 
@@ -1070,6 +1059,13 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 		return;
 	}
 
+	if (batman_ogm_packet->flags & NOT_BEST_NEXT_HOP) {
+		bat_dbg(DBG_BATMAN, bat_priv,
+			"Drop packet: ignoring all packets not forwarded from "
+			"the best next hop (sender: %pM)\n", ethhdr->h_source);
+		return;
+	}
+
 	orig_node = get_orig_node(bat_priv, batman_ogm_packet->orig);
 	if (!orig_node)
 		return;
@@ -1094,6 +1090,10 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	if (router)
 		router_router = orig_node_get_router(router->orig_node);
 
+	if ((router && router->tq_avg != 0) &&
+	    (compare_eth(router->addr, ethhdr->h_source)))
+		is_from_best_next_hop = true;
+
 	/* avoid temporary routing loops */
 	if (router && router_router &&
 	    (compare_eth(router->addr, batman_ogm_packet->prev_sender)) &&
@@ -1144,7 +1144,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 
 		/* mark direct link on incoming interface */
 		bat_iv_ogm_forward(orig_node, ethhdr, batman_ogm_packet,
-				   is_single_hop_neigh, if_incoming);
+				   is_single_hop_neigh, is_from_best_next_hop,
+				   if_incoming);
 
 		bat_dbg(DBG_BATMAN, bat_priv,
 			"Forwarding packet: rebroadcast neighbor packet with direct link flag\n");
@@ -1167,7 +1168,8 @@ static void bat_iv_ogm_process(const struct ethhdr *ethhdr,
 	bat_dbg(DBG_BATMAN, bat_priv,
 		"Forwarding packet: rebroadcast originator packet\n");
 	bat_iv_ogm_forward(orig_node, ethhdr, batman_ogm_packet,
-			   is_single_hop_neigh, if_incoming);
+			   is_single_hop_neigh, is_from_best_next_hop,
+			   if_incoming);
 
 out_neigh:
 	if ((orig_neigh_node) && (!is_single_hop_neigh))
diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h
index f54969c..0ee1af7 100644
--- a/net/batman-adv/packet.h
+++ b/net/batman-adv/packet.h
@@ -39,6 +39,7 @@ enum bat_packettype {
 #define COMPAT_VERSION 14
 
 enum batman_iv_flags {
+	NOT_BEST_NEXT_HOP   = 1 << 3,
 	PRIMARIES_FIRST_HOP = 1 << 4,
 	VIS_SERVER	    = 1 << 5,
 	DIRECTLINK	    = 1 << 6
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 14/15] batman-adv: update copyright years
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

update copyright years in order to include 2012

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bridge_loop_avoidance.c |    2 +-
 net/batman-adv/bridge_loop_avoidance.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c
index ad394c6..8bf9751 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2011 B.A.T.M.A.N. contributors:
+ * Copyright (C) 2011-2012 B.A.T.M.A.N. contributors:
  *
  * Simon Wunderlich
  *
diff --git a/net/batman-adv/bridge_loop_avoidance.h b/net/batman-adv/bridge_loop_avoidance.h
index 4a8e4fc..e39f93a 100644
--- a/net/batman-adv/bridge_loop_avoidance.h
+++ b/net/batman-adv/bridge_loop_avoidance.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2011 B.A.T.M.A.N. contributors:
+ * Copyright (C) 2011-2012 B.A.T.M.A.N. contributors:
  *
  * Simon Wunderlich
  *
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 09/15] batman-adv: refactoring API: find generalized name for bat_ogm_update_mac callback
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c     |   24 ++++++++++++------------
 net/batman-adv/hard-interface.c |    4 ++--
 net/batman-adv/main.c           |    2 +-
 net/batman-adv/types.h          |    6 ++++--
 4 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index 37e368d..9074e74 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -93,6 +93,17 @@ static void bat_iv_ogm_iface_disable(struct hard_iface *hard_iface)
 	hard_iface->packet_buff = NULL;
 }
 
+static void bat_iv_ogm_iface_update_mac(struct hard_iface *hard_iface)
+{
+	struct batman_ogm_packet *batman_ogm_packet;
+
+	batman_ogm_packet = (struct batman_ogm_packet *)hard_iface->packet_buff;
+	memcpy(batman_ogm_packet->orig,
+	       hard_iface->net_dev->dev_addr, ETH_ALEN);
+	memcpy(batman_ogm_packet->prev_sender,
+	       hard_iface->net_dev->dev_addr, ETH_ALEN);
+}
+
 static void bat_iv_ogm_primary_iface_set(struct hard_iface *hard_iface)
 {
 	struct batman_ogm_packet *batman_ogm_packet;
@@ -102,17 +113,6 @@ static void bat_iv_ogm_primary_iface_set(struct hard_iface *hard_iface)
 	batman_ogm_packet->header.ttl = TTL;
 }
 
-static void bat_iv_ogm_update_mac(struct hard_iface *hard_iface)
-{
-	struct batman_ogm_packet *batman_ogm_packet;
-
-	batman_ogm_packet = (struct batman_ogm_packet *)hard_iface->packet_buff;
-	memcpy(batman_ogm_packet->orig,
-	       hard_iface->net_dev->dev_addr, ETH_ALEN);
-	memcpy(batman_ogm_packet->prev_sender,
-	       hard_iface->net_dev->dev_addr, ETH_ALEN);
-}
-
 /* when do we schedule our own ogm to be sent */
 static unsigned long bat_iv_ogm_emit_send_time(const struct bat_priv *bat_priv)
 {
@@ -1236,8 +1236,8 @@ static struct bat_algo_ops batman_iv __read_mostly = {
 	.name = "BATMAN IV",
 	.bat_iface_enable = bat_iv_ogm_iface_enable,
 	.bat_iface_disable = bat_iv_ogm_iface_disable,
+	.bat_iface_update_mac = bat_iv_ogm_iface_update_mac,
 	.bat_primary_iface_set = bat_iv_ogm_primary_iface_set,
-	.bat_ogm_update_mac = bat_iv_ogm_update_mac,
 	.bat_ogm_schedule = bat_iv_ogm_schedule,
 	.bat_ogm_emit = bat_iv_ogm_emit,
 };
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index 95f869c..0b84bb1 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -228,7 +228,7 @@ static void hardif_activate_interface(struct hard_iface *hard_iface)
 
 	bat_priv = netdev_priv(hard_iface->soft_iface);
 
-	bat_priv->bat_algo_ops->bat_ogm_update_mac(hard_iface);
+	bat_priv->bat_algo_ops->bat_iface_update_mac(hard_iface);
 	hard_iface->if_status = IF_TO_BE_ACTIVATED;
 
 	/**
@@ -524,7 +524,7 @@ static int hard_if_event(struct notifier_block *this,
 		check_known_mac_addr(hard_iface->net_dev);
 
 		bat_priv = netdev_priv(hard_iface->soft_iface);
-		bat_priv->bat_algo_ops->bat_ogm_update_mac(hard_iface);
+		bat_priv->bat_algo_ops->bat_iface_update_mac(hard_iface);
 
 		primary_if = primary_if_get_selected(bat_priv);
 		if (!primary_if)
diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index f80c447..083a299 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -329,8 +329,8 @@ int bat_algo_register(struct bat_algo_ops *bat_algo_ops)
 	/* all algorithms must implement all ops (for now) */
 	if (!bat_algo_ops->bat_iface_enable ||
 	    !bat_algo_ops->bat_iface_disable ||
+	    !bat_algo_ops->bat_iface_update_mac ||
 	    !bat_algo_ops->bat_primary_iface_set ||
-	    !bat_algo_ops->bat_ogm_update_mac ||
 	    !bat_algo_ops->bat_ogm_schedule ||
 	    !bat_algo_ops->bat_ogm_emit) {
 		pr_info("Routing algo '%s' does not implement required ops\n",
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index 9fa8b73..66a3750 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -381,10 +381,12 @@ struct bat_algo_ops {
 	int (*bat_iface_enable)(struct hard_iface *hard_iface);
 	/* de-init routing info when hard-interface is disabled */
 	void (*bat_iface_disable)(struct hard_iface *hard_iface);
+	/* (re-)init mac addresses of the protocol information
+	 * belonging to this hard-interface
+	 */
+	void (*bat_iface_update_mac)(struct hard_iface *hard_iface);
 	/* called when primary interface is selected / changed */
 	void (*bat_primary_iface_set)(struct hard_iface *hard_iface);
-	/* init mac addresses of the OGM belonging to this hard-interface */
-	void (*bat_ogm_update_mac)(struct hard_iface *hard_iface);
 	/* prepare a new outgoing OGM for the send queue */
 	void (*bat_ogm_schedule)(struct hard_iface *hard_iface,
 				 int tt_num_changes);
-- 
1.7.9.4

^ permalink raw reply related

* [PATCH 08/15] batman-adv: ignore protocol packets if the interface did not enable this protocol
From: Antonio Quartulli @ 2012-05-11 12:21 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n, Marek Lindner, Antonio Quartulli
In-Reply-To: <1336738892-7401-1-git-send-email-ordex@autistici.org>

From: Marek Lindner <lindner_marek@yahoo.de>

Reported-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
---
 net/batman-adv/bat_iv_ogm.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c
index ae0a08c..37e368d 100644
--- a/net/batman-adv/bat_iv_ogm.c
+++ b/net/batman-adv/bat_iv_ogm.c
@@ -1186,6 +1186,7 @@ out:
 static int bat_iv_ogm_receive(struct sk_buff *skb,
 			      struct hard_iface *if_incoming)
 {
+	struct bat_priv *bat_priv = netdev_priv(if_incoming->soft_iface);
 	struct batman_ogm_packet *batman_ogm_packet;
 	struct ethhdr *ethhdr;
 	int buff_pos = 0, packet_len;
@@ -1196,6 +1197,12 @@ static int bat_iv_ogm_receive(struct sk_buff *skb,
 	if (!ret)
 		return NET_RX_DROP;
 
+	/* did we receive a B.A.T.M.A.N. IV OGM packet on an interface
+	 * that does not have B.A.T.M.A.N. IV enabled ?
+	 */
+	if (bat_priv->bat_algo_ops->bat_ogm_emit != bat_iv_ogm_emit)
+		return NET_RX_DROP;
+
 	packet_len = skb_headlen(skb);
 	ethhdr = (struct ethhdr *)skb_mac_header(skb);
 	packet_buff = skb->data;
-- 
1.7.9.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox