Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH] vxge: Remember to release firmware after upgrading firmware {nodisc}
From: Ramkrishna Vepa @ 2011-01-13 20:35 UTC (permalink / raw)
  To: Jesper Juhl, netdev@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, Jon Mason, Sivakumar Subramani,
	Sreenivasa Honnur
In-Reply-To: <alpine.LNX.2.00.1101132119080.11347@swampdragon.chaosbits.net>

> Regardless of whether the firmware update being performed by
> vxge_fw_upgrade() is a success or not we must still remember to always
> release_firmware() before returning.
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
> ---
>  vxge-main.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
> index 1ac9b56..c81a651 100644
> --- a/drivers/net/vxge/vxge-main.c
> +++ b/drivers/net/vxge/vxge-main.c
> @@ -4120,6 +4120,7 @@ int vxge_fw_upgrade(struct vxgedev *vdev, char
> *fw_name, int override)
>  	       "hotplug event.\n");
> 
>  out:
> +	release_firmware(fw);
>  	return ret;
>  }
Thanks!

Acked-by: Ram Vepa <ram.vepa@exar.com>

^ permalink raw reply

* Re: [PATCH V9 02/13] ntp: add ADJ_SETOFFSET mode bit
From: Kuwahara,T. @ 2011-01-13 20:39 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	John Stultz, Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <60566a54842bcf5974d55ed39f387c32ff9cf5cb.1294917348.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>

On Thu, Jan 13, 2011 at 8:32 PM, Richard Cochran
<richardcochran-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> This patch adds a new mode bit into the timex structure. When set, the bit
> instructs the kernel to add the given time value to the current time.

The unix time is a nonlinear timescale and there's no way to predict
how many leap seconds will be inserted/deleted during the given time
interval.  So it is impossible to convert relative time to absolute
time and thus your patch is broken. (More specifically, the
timekeeping_inject_offset function is broken.)

My proposal: Limit the adjustable range of the offset so that leap
seconds will never occur more than once. (2147.5 seconds would be the
best choice. :-)

^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.7)
From: Eric Dumazet @ 2011-01-13 20:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20110113092706.154748c2@s6510>

Le jeudi 13 janvier 2011 à 09:27 -0800, Stephen Hemminger a écrit :
> +	/* Admit new packet */
> +	if (likely(choke_len(q) < q->limit)) {
> +
> +		q->tab[q->tail] = skb;
> +		q->tail = (q->tail + 1) & q->tab_mask;
> +
> +		sch->qstats.backlog += qdisc_pkt_len(skb);
> +		qdisc_update_bstats(sch, skb);
> +		sch->q.qlen = choke_len(q) - q->holes;
> +		return NET_XMIT_SUCCESS;
> +	}
> +

You now must use qdisc_bstats_update() ;)




^ permalink raw reply

* Re: [PATCH 2.6.36] vlan: Avoid hwaccel vlan packets when vid not used
From: Matt Carlson @ 2011-01-13 20:50 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Matthew Carlson, Michael Leun, Michael Chan, Eric Dumazet,
	David Miller, Ben Greear, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <AANLkTimz3Mw5itChtTgHTeGT4seJauy3+FVrpEJn1iez@mail.gmail.com>

On Thu, Jan 13, 2011 at 07:06:22AM -0800, Jesse Gross wrote:
> On Wed, Jan 12, 2011 at 8:21 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> > On Thu, Jan 06, 2011 at 08:36:27PM -0800, Jesse Gross wrote:
> >> On Thu, Jan 6, 2011 at 10:24 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> >> > On Sat, Dec 18, 2010 at 07:38:00PM -0800, Jesse Gross wrote:
> >> >> On Tue, Dec 14, 2010 at 11:16 PM, Michael Leun
> >> >> <lkml20101129@newton.leun.net> wrote:
> >> >> > OK - all tests done on that DL320G5:
> >> >> >
> >> >> > For completeness, 2.6.37-rc5 unpatched:
> >> >> >
> >> >> > eth0, no vlan configured: totally broken - see double tagged vlans
> >> >> > without tag, single or untagged packets missing at all
> >> >>
> >> >> Random behavior? ?This one is somewhat hard to explain - maybe there
> >> >> are some other factors. ?eth0 has ASF on, so it always strips tags. ?I
> >> >> would expect it to behave like the vlan configured case.
> >> >>
> >> >> >
> >> >> > eth0, vlan configured: see packets without vlan tag (see double tagged
> >> >> > packets with one vlan tag)
> >> >>
> >> >> Both ASF and vlan group configured cause tag stripping to be enabled.
> >> >> Missing tag.
> >> >>
> >> >> >
> >> >> > eth1 same as originally reported:
> >> >> > without vlan configured see vlan tags (single and double tagged as
> >> >> > expected)
> >> >>
> >> >> No ASF and no vlan group means tag stripping is disabled. ?Have tag.
> >> >>
> >> >> > with vlan configured: see packets without vlan tag (see double tagged
> >> >> > packets with one vlan tag)
> >> >>
> >> >> Configuring vlan group causes stripping to be enabled. ?Missing tag.
> >> >>
> >> >> >
> >> >> >
> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch:
> >> >> >
> >> >> > eth0, no vlan configured: ?see packets without vlan tag (see double
> >> >> > tagged packets with one vlan tag)
> >> >>
> >> >> ASF enables tag stripping. ?Missing tag.
> >> >>
> >> >> > eth1, no vlan configured: see vlan tags (single and double tagged as
> >> >> > expected)
> >> >>
> >> >> No ASF, no vlan group means no stripping. ?Have tag.
> >> >>
> >> >> >
> >> >> >
> >> >> > eth0, vlan configured: as without vlan
> >> >>
> >> >> ASF enables stripping. ?Missing tag.
> >> >>
> >> >> > eth1, vlan configured: as without vlan
> >> >>
> >> >> With this patch vlan stripping is only enabled when ASF is on, so no
> >> >> stripping. ?Have tag.
> >> >>
> >> >> >
> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch with test patch ontop
> >> >> >
> >> >> > eth1 no vlan configured: see packets without vlan tag (see double tagged
> >> >> > packets with one vlan tag)
> >> >>
> >> >> With the second patch, vlan stripping is always enabled. ?Missing tag.
> >> >>
> >> >> > eth1 with vlan: the same
> >> >>
> >> >> Stripping still always enabled. ?Missing tag.
> >> >>
> >> >> The bottom line is whenever vlan stripping is enabled we're missing
> >> >> the outer tag. ?It might be worth adding some debugging in the area
> >> >> before napi_gro_receive/vlan_gro_receive (depending on version). ?My
> >> >> guess is that (desc->type_flags & RXD_FLAG_VLAN) is false even for
> >> >> vlan packets on this NIC.
> >> >>
> >> >> You said that everything works on the 5752? ?Matt, is it possible that
> >> >> the 5714 either has a problem with vlan stripping or a different way
> >> >> of reporting it?
> >> >
> >> > I don't think this is a 5714 specific issue. ?I think the problem is
> >> > rooted in the fact that the VLAN tag stripping is enabled.
> >>
> >> It's definitely related to vlan stripping being enabled. ?Other cards
> >> using tg3 seem to work fine with stripping though, which is why I
> >> thought it might be specific to the 5714.
> >
> > I just tested this on a 5714S, using a net-next-2.6 snapshot obtained
> > today. ?It does the right thing in both cases (2nd tg3 patch ommited /
> > applied). ?The tag is always visible in the packet stream as seen from
> > tcpdump.
> >
> >> > Your RXD_FLAG_VLAN idea sounds unlikely to me, but it's worth a check.
> >> >
> >> > The patch here is using __vlan_hwaccel_put_tag(), which informs the
> >> > stack a VLAN tag is present. ?If this is indeed a reporting problem, I'm
> >> > not sure what else the driver should be doing.
> >>
> >> The code to hand off the tag to the stack looks OK to me. ?Michael was
> >> seeing this on older versions of the kernel as well with this NIC,
> >> which predates both this patch and the larger vlan changes so it
> >> doesn't seem like a problem with passing the tag to the network stack.
> >> ?It's hard to know exactly what is going on though without seeing what
> >> the hardware is reporting.
> >
> > When RX_MODE_KEEP_VLAN_TAG is set, the RXD_FLAG_VLAN flag will not be set
> > when receiving a packet. ?The driver skips the __vlan_hwaccel_put_tag()
> > call.
> >
> > When RX_MODE_KEEP_VLAN_TAG is unset, the RXD_FLAG_VLAN flag is set, and
> > __vlan_hwaccel_put_tag() is called to reinject the packet.
> 
> OK, thanks for testing it out.  I'm not sure that there's anything
> more we can do without hearing from Michael.

In the meantime, I think what we have should go upstream.  Just to be
absolutely clear though, your position is that VLAN tags should always
be stripped?


^ permalink raw reply

* [PATCH] Even Batman should not dereference NULL pointers
From: Jesper Juhl @ 2011-01-13 20:53 UTC (permalink / raw)
  To: b.a.t.m.a.n
  Cc: netdev, linux-kernel, Marek Lindner, Simon Wunderlich,
	Sven Eckelmann, David S. Miller

There's a problem in net/batman-adv/unicast.c::frag_send_skb().
dev_alloc_skb() allocates memory and may fail, thus returning NULL. If 
this happens we'll pass a NULL pointer on to skb_split() which in turn 
hands it to skb_split_inside_header() from where it gets passed to 
skb_put() that lets skb_tail_pointer() play with it and that function 
dereferences it. And thus the bat dies.

While I was at it I also moved the call to dev_alloc_skb() above the 
assignment to 'unicast_packet' since there's no reason to do that 
assignment if the memory allocation fails.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 unicast.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/unicast.c b/net/batman-adv/unicast.c
index dc2e28b..ee41fef 100644
--- a/net/batman-adv/unicast.c
+++ b/net/batman-adv/unicast.c
@@ -229,10 +229,12 @@ int frag_send_skb(struct sk_buff *skb, struct bat_priv *bat_priv,
 	if (!bat_priv->primary_if)
 		goto dropped;
 
-	unicast_packet = (struct unicast_packet *) skb->data;
+	frag_skb = dev_alloc_skb(data_len - (data_len / 2) + ucf_hdr_len);
+	if (!frag_skb)
+		goto dropped;
 
+	unicast_packet = (struct unicast_packet *) skb->data;
 	memcpy(&tmp_uc, unicast_packet, uc_hdr_len);
-	frag_skb = dev_alloc_skb(data_len - (data_len / 2) + ucf_hdr_len);
 	skb_split(skb, frag_skb, data_len / 2);
 
 	if (my_skb_head_push(skb, ucf_hdr_len - uc_hdr_len) < 0 ||


-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.


^ permalink raw reply related

* Re: [PATCH V9 02/13] ntp: add ADJ_SETOFFSET mode bit
From: john stultz @ 2011-01-13 20:57 UTC (permalink / raw)
  To: Kuwahara,T.
  Cc: Richard Cochran, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <AANLkTimHP1OsWauj6O566WgnxVTjMbNg2PK644QcT9Lq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, 2011-01-14 at 05:39 +0900, Kuwahara,T. wrote:
> My proposal: Limit the adjustable range of the offset so that leap
> seconds will never occur more than once. (2147.5 seconds would be the
> best choice. :-)

2147.5? That's ~36 minutes. 

While I think a limit could be a sensible compromise here. Leap seconds
are limited to every six months. So surely a limit of 86400 (one day),
or 2592000 (30 days) would be more reasonable.

thanks
-john

^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (iproute)
From: Eric Dumazet @ 2011-01-13 21:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20110113094847.565add51@s6510>

Le jeudi 13 janvier 2011 à 09:48 -0800, Stephen Hemminger a écrit :
> Preliminary interface for CHOKe scheduler in iproute
> 
> +		return -1;
> +	fprintf(f, "limit %s min %s max %s ",
> +		sprint_size(qopt->limit, b1),
> +		sprint_size(qopt->qth_min, b2),
> +		sprint_size(qopt->qth_max, b3));
> +

sprint_size() adds 'b', not 'p' (packets)

fprintf(f, "limit %up min %up max %up ", qopt->limit, ...





^ permalink raw reply

* Re: [PATCH v4 05/10] net/fec: add dual fec support for mx28
From: Uwe Kleine-König @ 2011-01-13 21:06 UTC (permalink / raw)
  To: Shawn Guo
  Cc: davem, gerg, baruch, eric, bryan.wu, r64343, B32542, lw, w.sang,
	s.hauer, jamie, jamie, netdev, linux-arm-kernel
In-Reply-To: <1294297998-26930-6-git-send-email-shawn.guo@freescale.com>

Hello again,

On Thu, Jan 06, 2011 at 03:13:13PM +0800, Shawn Guo wrote:
>  static netdev_tx_t
>  fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct fec_enet_private *fep = netdev_priv(dev);
> +	const struct platform_device_id *id_entry =
> +				platform_get_device_id(fep->pdev);
>  	struct bufdesc *bdp;
>  	void *bufaddr;
>  	unsigned short	status;
> @@ -256,6 +288,14 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		bufaddr = fep->tx_bounce[index];
>  	}
>  
> +	/*
> +	 * Some design made an incorrect assumption on endian mode of
> +	 * the system that it's running on. As the result, driver has to
> +	 * swap every frame going to and coming from the controller.
> +	 */
> +	if (id_entry->driver_data & FEC_QUIRK_SWAP_FRAME)
> +		swap_buffer(bufaddr, skb->len);
> +
Is that save here?  bufaddr either points to a bounce buffer (which
should be OK definitely) or skb->data.  Or asked differently:  Is the
skb here owned by the driver such that it is allowed to write to it?
Does the driver eventually need to restore the original data?

Just before this if, there is some bounce buffer handling.  If it is not
OK to modify skb->data, the call to swap_buffer can easily be moved in
there.

>  	/* Save skb pointer */
>  	fep->tx_skbuff[fep->skb_cur] = skb;
>  
> @@ -424,6 +464,8 @@ static void
>  fec_enet_rx(struct net_device *dev)
>  {
>  	struct	fec_enet_private *fep = netdev_priv(dev);
> +	const struct platform_device_id *id_entry =
> +				platform_get_device_id(fep->pdev);
>  	struct bufdesc *bdp;
>  	unsigned short status;
>  	struct	sk_buff	*skb;
> @@ -487,6 +529,9 @@ fec_enet_rx(struct net_device *dev)
>  	        dma_unmap_single(NULL, bdp->cbd_bufaddr, bdp->cbd_datlen,
>          			DMA_FROM_DEVICE);
>  
> +		if (id_entry->driver_data & FEC_QUIRK_SWAP_FRAME)
> +			swap_buffer(data, pkt_len);
> +
Here I guess it's OK, the hardware just wrote to the buffer, so the skb
cannot be shared to anything else and the write is all right.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply

* Re: [PATCH] Even Batman should not dereference NULL pointers
From: Sven Eckelmann @ 2011-01-13 21:13 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: b.a.t.m.a.n, netdev, linux-kernel, Marek Lindner,
	Simon Wunderlich, David S. Miller
In-Reply-To: <alpine.LNX.2.00.1101132146400.11347@swampdragon.chaosbits.net>

[-- Attachment #1: Type: Text/Plain, Size: 678 bytes --]

On Thursday 13 January 2011 21:53:38 Jesper Juhl wrote:
> There's a problem in net/batman-adv/unicast.c::frag_send_skb().
> dev_alloc_skb() allocates memory and may fail, thus returning NULL. If
> this happens we'll pass a NULL pointer on to skb_split() which in turn
> hands it to skb_split_inside_header() from where it gets passed to
> skb_put() that lets skb_tail_pointer() play with it and that function
> dereferences it. And thus the bat dies.
> 
> While I was at it I also moved the call to dev_alloc_skb() above the
> assignment to 'unicast_packet' since there's no reason to do that
> assignment if the memory allocation fails.

Applied

Thanks,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH] USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable.
From: Jesper Juhl @ 2011-01-13 21:40 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Oliver Neukum, Greg Kroah-Hartman,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alexey Orishko, Hans Petter Selasky

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1831 bytes --]

skb_clone() dynamically allocates memory and may fail. If it does it 
returns NULL. This means we'll dereference a NULL pointer in 
drivers/net/usb/cdc_ncm.c::cdc_ncm_rx_fixup().
As far as I can tell, the proper way to deal with this is simply to goto 
the error label.

Furthermore gcc complains that 'skb' may be used uninitialized:
  drivers/net/usb/cdc_ncm.c: In function ‘cdc_ncm_rx_fixup’:
  drivers/net/usb/cdc_ncm.c:922:18: warning: ‘skb’ may be used uninitialized in this function
and I believe it is right. On the line where we
  pr_debug("invalid frame detected (ignored)" ...
we are using the local variable 'skb' but nothing has ever been assigned 
to that variable yet. I believe the correct fix for that is to use 
'skb_in' instead.

Signed-off-by: Jesper Juhl <jj-IYz4IdjRLj0sV2N9l4h3zg@public.gmane.org>
---
 cdc_ncm.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

  compile tested only.

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 593c104..d776c4a 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -1021,13 +1021,15 @@ static int cdc_ncm_rx_fixup(struct usbnet *dev, struct sk_buff *skb_in)
 		    (temp > CDC_NCM_MAX_DATAGRAM_SIZE) || (temp < ETH_HLEN)) {
 			pr_debug("invalid frame detected (ignored)"
 				"offset[%u]=%u, length=%u, skb=%p\n",
-							x, offset, temp, skb);
+							x, offset, temp, skb_in);
 			if (!x)
 				goto error;
 			break;
 
 		} else {
 			skb = skb_clone(skb_in, GFP_ATOMIC);
+			if (!skb)
+				goto error;
 			skb->len = temp;
 			skb->data = ((u8 *)skb_in->data) + offset;
 			skb_set_tail_pointer(skb, temp);



-- 
Jesper Juhl <jj-IYz4IdjRLj0sV2N9l4h3zg@public.gmane.org>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.

^ permalink raw reply related

* [PATCH] ipsec: update MAX_AH_AUTH_LEN to support sha512
From: Nicolas Dichtel @ 2011-01-13 21:51 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 198 bytes --]

Hi,

here is a second patch after "ah: update maximum truncated ICV length". Sorry 
for not being more watchful the first time.
512 is the maximum I found in net/xfrm/xfrm_algo.c


Regards,
Nicolas

[-- Attachment #2: 0001-ipsec-update-MAX_AH_AUTH_LEN-to-support-sha512.patch --]
[-- Type: text/x-patch, Size: 772 bytes --]

>From e330817aa2b33e9d1f44071072fdc4778acf8d76 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 13 Jan 2011 14:20:19 -0500
Subject: [PATCH] ipsec: update MAX_AH_AUTH_LEN to support sha512

icv_truncbits is set to 256 for sha512, so update
MAX_AH_AUTH_LEN to 64.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/ah.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/net/ah.h b/include/net/ah.h
index be7798d..ca95b98 100644
--- a/include/net/ah.h
+++ b/include/net/ah.h
@@ -4,7 +4,7 @@
 #include <linux/skbuff.h>
 
 /* This is the maximum truncated ICV length that we know of. */
-#define MAX_AH_AUTH_LEN	16
+#define MAX_AH_AUTH_LEN	64
 
 struct crypto_ahash;
 
-- 
1.5.6.5


^ permalink raw reply related

* Re: [PATCH V9 02/13] ntp: add ADJ_SETOFFSET mode bit
From: john stultz @ 2011-01-13 21:52 UTC (permalink / raw)
  To: Kuwahara,T.
  Cc: Richard Cochran, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <1294952228.5617.20.camel@work-vm>

On Thu, 2011-01-13 at 12:57 -0800, john stultz wrote:
> On Fri, 2011-01-14 at 05:39 +0900, Kuwahara,T. wrote:
> > My proposal: Limit the adjustable range of the offset so that leap
> > seconds will never occur more than once. (2147.5 seconds would be the
> > best choice. :-)
> 
> 2147.5? That's ~36 minutes. 
> 
> While I think a limit could be a sensible compromise here. Leap seconds
> are limited to every six months. So surely a limit of 86400 (one day),
> or 2592000 (30 days) would be more reasonable.

Actually. Thinking about this some more (and having to try to
rationalize it for Thomas), this doesn't really seem reasonable. If an
application wants to adjust the time, we can leave the responsibility to
userland to handle compensation for expected or historical leapseconds
if they care.

Since if you're adjusting the time, especially by a large amount, you're
not starting from the correct time value. So expecting the kernel to
incorperate historical or planned future leapseconds (outside of the
TIME_INS notification from NTP) is silly. 

Really, in Linux leapseconds do not really exist except for the moment
they occur. If a leapsecond occurs, then you set your time back to
before the leapsecond, it won't happen a second time. 

So I don't think we need to enforce a limit. The interface suggested
seems totally reasonable to me. I'm happy to hear arguments to the
contrary, but they really must be more convincing.

thanks
-john

^ permalink raw reply

* Re: [PATCH 2.6.36] vlan: Avoid hwaccel vlan packets when vid not used
From: Jesse Gross @ 2011-01-13 21:58 UTC (permalink / raw)
  To: Matt Carlson
  Cc: Michael Leun, Michael Chan, Eric Dumazet, David Miller,
	Ben Greear, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20110113205056.GA29254@mcarlson.broadcom.com>

On Thu, Jan 13, 2011 at 3:50 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> On Thu, Jan 13, 2011 at 07:06:22AM -0800, Jesse Gross wrote:
>> On Wed, Jan 12, 2011 at 8:21 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
>> > On Thu, Jan 06, 2011 at 08:36:27PM -0800, Jesse Gross wrote:
>> >> On Thu, Jan 6, 2011 at 10:24 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
>> >> > On Sat, Dec 18, 2010 at 07:38:00PM -0800, Jesse Gross wrote:
>> >> >> On Tue, Dec 14, 2010 at 11:16 PM, Michael Leun
>> >> >> <lkml20101129@newton.leun.net> wrote:
>> >> >> > OK - all tests done on that DL320G5:
>> >> >> >
>> >> >> > For completeness, 2.6.37-rc5 unpatched:
>> >> >> >
>> >> >> > eth0, no vlan configured: totally broken - see double tagged vlans
>> >> >> > without tag, single or untagged packets missing at all
>> >> >>
>> >> >> Random behavior? ?This one is somewhat hard to explain - maybe there
>> >> >> are some other factors. ?eth0 has ASF on, so it always strips tags. ?I
>> >> >> would expect it to behave like the vlan configured case.
>> >> >>
>> >> >> >
>> >> >> > eth0, vlan configured: see packets without vlan tag (see double tagged
>> >> >> > packets with one vlan tag)
>> >> >>
>> >> >> Both ASF and vlan group configured cause tag stripping to be enabled.
>> >> >> Missing tag.
>> >> >>
>> >> >> >
>> >> >> > eth1 same as originally reported:
>> >> >> > without vlan configured see vlan tags (single and double tagged as
>> >> >> > expected)
>> >> >>
>> >> >> No ASF and no vlan group means tag stripping is disabled. ?Have tag.
>> >> >>
>> >> >> > with vlan configured: see packets without vlan tag (see double tagged
>> >> >> > packets with one vlan tag)
>> >> >>
>> >> >> Configuring vlan group causes stripping to be enabled. ?Missing tag.
>> >> >>
>> >> >> >
>> >> >> >
>> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch:
>> >> >> >
>> >> >> > eth0, no vlan configured: ?see packets without vlan tag (see double
>> >> >> > tagged packets with one vlan tag)
>> >> >>
>> >> >> ASF enables tag stripping. ?Missing tag.
>> >> >>
>> >> >> > eth1, no vlan configured: see vlan tags (single and double tagged as
>> >> >> > expected)
>> >> >>
>> >> >> No ASF, no vlan group means no stripping. ?Have tag.
>> >> >>
>> >> >> >
>> >> >> >
>> >> >> > eth0, vlan configured: as without vlan
>> >> >>
>> >> >> ASF enables stripping. ?Missing tag.
>> >> >>
>> >> >> > eth1, vlan configured: as without vlan
>> >> >>
>> >> >> With this patch vlan stripping is only enabled when ASF is on, so no
>> >> >> stripping. ?Have tag.
>> >> >>
>> >> >> >
>> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch with test patch ontop
>> >> >> >
>> >> >> > eth1 no vlan configured: see packets without vlan tag (see double tagged
>> >> >> > packets with one vlan tag)
>> >> >>
>> >> >> With the second patch, vlan stripping is always enabled. ?Missing tag.
>> >> >>
>> >> >> > eth1 with vlan: the same
>> >> >>
>> >> >> Stripping still always enabled. ?Missing tag.
>> >> >>
>> >> >> The bottom line is whenever vlan stripping is enabled we're missing
>> >> >> the outer tag. ?It might be worth adding some debugging in the area
>> >> >> before napi_gro_receive/vlan_gro_receive (depending on version). ?My
>> >> >> guess is that (desc->type_flags & RXD_FLAG_VLAN) is false even for
>> >> >> vlan packets on this NIC.
>> >> >>
>> >> >> You said that everything works on the 5752? ?Matt, is it possible that
>> >> >> the 5714 either has a problem with vlan stripping or a different way
>> >> >> of reporting it?
>> >> >
>> >> > I don't think this is a 5714 specific issue. ?I think the problem is
>> >> > rooted in the fact that the VLAN tag stripping is enabled.
>> >>
>> >> It's definitely related to vlan stripping being enabled. ?Other cards
>> >> using tg3 seem to work fine with stripping though, which is why I
>> >> thought it might be specific to the 5714.
>> >
>> > I just tested this on a 5714S, using a net-next-2.6 snapshot obtained
>> > today. ?It does the right thing in both cases (2nd tg3 patch ommited /
>> > applied). ?The tag is always visible in the packet stream as seen from
>> > tcpdump.
>> >
>> >> > Your RXD_FLAG_VLAN idea sounds unlikely to me, but it's worth a check.
>> >> >
>> >> > The patch here is using __vlan_hwaccel_put_tag(), which informs the
>> >> > stack a VLAN tag is present. ?If this is indeed a reporting problem, I'm
>> >> > not sure what else the driver should be doing.
>> >>
>> >> The code to hand off the tag to the stack looks OK to me. ?Michael was
>> >> seeing this on older versions of the kernel as well with this NIC,
>> >> which predates both this patch and the larger vlan changes so it
>> >> doesn't seem like a problem with passing the tag to the network stack.
>> >> ?It's hard to know exactly what is going on though without seeing what
>> >> the hardware is reporting.
>> >
>> > When RX_MODE_KEEP_VLAN_TAG is set, the RXD_FLAG_VLAN flag will not be set
>> > when receiving a packet. ?The driver skips the __vlan_hwaccel_put_tag()
>> > call.
>> >
>> > When RX_MODE_KEEP_VLAN_TAG is unset, the RXD_FLAG_VLAN flag is set, and
>> > __vlan_hwaccel_put_tag() is called to reinject the packet.
>>
>> OK, thanks for testing it out.  I'm not sure that there's anything
>> more we can do without hearing from Michael.
>
> In the meantime, I think what we have should go upstream.  Just to be
> absolutely clear though, your position is that VLAN tags should always
> be stripped?

That's what the other converted drivers do by default (though most of
them also provide an ethtool set_flags() method to change this).  It's
generally the most efficient and is now safe to do in all cases.  It's
also the consistent with what was happening before, since stripping
was enabled when a vlan device was configured.  So, yes, normally I
think stripping should be enabled.

I assumed that disabling stripping in most situations was just an
oversight.  Was there a reason why you feel it is better not to use
it?

^ permalink raw reply

* Tagged/untagged and gretap bridging question.
From: Jonathan Thibault @ 2011-01-13 22:07 UTC (permalink / raw)
  To: netdev

Greetings list,

Assuming the following network setup of three locations linked by two
ethernet over gre (gretap) tunnels.

I am assuming that a broadcast on the local network, if it comes
untagged to eth0 will reach both network1 and network2 untagged.

My main question is about a broadcast happening in the tagged portion of
(local network).  Is there a chance for an ethernet broadcast in vlan 1
on the local network to reach remote network 2?  I'm thinking not, but
if I tcpdump an interface that has vlans enabled, I will see the tagged
packets on eth0.  As such I wonder if they will travel through br0 to
the remote locations as well, something I would rather avoid.

(local network)
|                                              (remote network 1)
| eth0.1 <--br1--> gre1.1                                       |
+-eth0   <--br0--> gre1-- (l3_to_host1) -- gre0 <--br0--> eth0-+
            |
            +----> gre2 -- (l3-to_host2) -- gre0 <--br0--> eth0-+
  eth0.2 <--br2--> gre2.2                                       |
                                               (remote network 2)

Of interest too is knowing if the tags will survive all the way to
remote networks or if I need to enable vlans on the remote gretap and
ethernet interfaces as well for them to work.

Alternatively, the setup would look like this:

(local network)
|                                              (remote network 1)
| eth0.1 <--br1--> gre1.1                                       |
| eth0.3 <--br0--> gre1-- (l3_to_host1) -- gre0 <--br0--> eth0-+
+-eth0
  eth0.4 <--br3-->gre2 -- (l3-to_host2) -- gre0 <--br0--> eth0-+
  eth0.2 <--br2--> gre2.2                                       |
                                               (remote network 2)

The goal being not to see any untagged frames coming out on the local
network from remote locations and instead having them appear in specific
local vlans.

So at the core of my questions really is this:  Will bridging the
untagged portion of an interface that has vlans enabled (eth0 when
eth0.x exists) let tagged frames go through to other members of the bridge?

Thanks for your collective wisdom,

Jonathan

P.S.:  Please include me in the CC, I am not currently a member of the list.

^ permalink raw reply

* Re: [net-next-2.6 PATCH v7 1/2] net: implement mechanism for HW based QOS
From: Ben Hutchings @ 2011-01-13 22:30 UTC (permalink / raw)
  To: John Fastabend
  Cc: davem, jarkao2, hadi, eric.dumazet, shemminger, tgraf, nhorman,
	netdev
In-Reply-To: <20110107224543.19830.74009.stgit@jf-dev1-dcblab>

I'm actually having a go at implementing this, as a result of which I
have some more questions:

On Fri, 2011-01-07 at 14:45 -0800, John Fastabend wrote:
[...]
> With the mechanism in this patch users can set skb priority using
> expected methods ie setsockopt() or the stack can set the priority
> directly. Then the skb will be steered to the correct tx queues
> aligned with hardware QOS traffic classes. In the normal case with
> a single traffic class and all queues in this class everything
> works as is until the LLD enables multiple tcs.
> 
> To steer the skb we mask out the lower 4 bits of the priority
> and allow the hardware to configure upto 15 distinct classes
> of traffic. This is expected to be sufficient for most applications
> at any rate it is more then the 8021Q spec designates and is
> equal to the number of prio bands currently implemented in
> the default qdisc.

What is the meaning of a class number?  Is it simply higher number =>
higher priority?  If there are exactly 8 classes, can they be assumed to
match the 802.1q priority classes?

> This in conjunction with a userspace application such as
> lldpad can be used to implement 8021Q transmission selection
> algorithms one of these algorithms being the extended transmission
> selection algorithm currently being used for DCB.

Should an Ethernet driver/hardware insert a 802.1q priority tag if it
implements this, or should that be left to higher levels?

> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> 
>  include/linux/netdevice.h |   65 +++++++++++++++++++++++++++++++++++++++++++++
>  net/core/dev.c            |   61 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 125 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0f6b1c9..b1dbbed 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
[...]
> @@ -756,6 +764,7 @@ struct xps_dev_maps {
>   * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
>   *			  struct nlattr *port[]);
>   * int (*ndo_get_vf_port)(struct net_device *dev, int vf, struct sk_buff *skb);
> + * void (*ndo_setup_tc)(struct net_device *dev, u8 tc, unsigned int txq)
[...]

This is not documentation, it is just repetition!

Please specify what the parameters mean.  In particular, what is the
purpose of the txq parameter; can it be different from
dev->real_num_tx_queues?

Is this operation allowed to change the number of TX queues?  I was
looking at scaling the number of queues according to the number of
classes.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [patch] bridge-utils: show selected bridge
From: Stephen Hemminger @ 2011-01-13 22:48 UTC (permalink / raw)
  To: Anton Danilov; +Cc: netdev
In-Reply-To: <AANLkTi=2TGTSnFLAV0ucST7oWeaxHfchhxPd3y=F-dai@mail.gmail.com>

Applied, but please add commit message next time.

^ permalink raw reply

* [PATCH] r8169: keep firmware in memory.
From: Francois Romieu @ 2011-01-13 23:07 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Jarek Kamiński, Hayes, Ben Hutchings, Linus Torvalds

The firmware agent is not available during resume. Loading the firmware
during open() (see eee3a96c6368f47df8df5bd4ed1843600652b337) is not
enough.

close() is run during resume through rtl8169_reset_task(), whence the
mildly natural release of firmware in the driver removal method instead.

It will help with http://bugs.debian.org/609538. It will not avoid
the 60 seconds delay when:
- there is no firmware
- the driver is loaded and the device is not up before a suspend/resume

Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Jarek Kamiński <jarek@vilo.eu.org>
Cc: Hayes <hayeswang@realtek.com>
Cc: Ben Hutchings <benh@debian.org>
---
 drivers/net/r8169.c |   43 +++++++++++++++++++++++++++++++------------
 1 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index bb8645a..bde7d61 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -554,6 +554,8 @@ struct rtl8169_private {
 	struct mii_if_info mii;
 	struct rtl8169_counters counters;
 	u32 saved_wolopts;
+
+	const struct firmware *fw;
 };
 
 MODULE_AUTHOR("Realtek and the Linux r8169 crew <netdev@vger.kernel.org>");
@@ -1766,6 +1768,29 @@ rtl_phy_write_fw(struct rtl8169_private *tp, const struct firmware *fw)
 	}
 }
 
+static void rtl_release_firmware(struct rtl8169_private *tp)
+{
+	release_firmware(tp->fw);
+	tp->fw = NULL;
+}
+
+static int rtl_apply_firmware(struct rtl8169_private *tp, const char *fw_name)
+{
+	const struct firmware **fw = &tp->fw;
+	int rc = !*fw;
+
+	if (rc) {
+		rc = request_firmware(fw, fw_name, &tp->pci_dev->dev);
+		if (rc < 0)
+			goto out;
+	}
+
+	/* TODO: release firmware once rtl_phy_write_fw signals failures. */
+	rtl_phy_write_fw(tp, *fw);
+out:
+	return rc;
+}
+
 static void rtl8169s_hw_phy_config(struct rtl8169_private *tp)
 {
 	static const struct phy_reg phy_reg_init[] = {
@@ -2139,7 +2164,6 @@ static void rtl8168d_1_hw_phy_config(struct rtl8169_private *tp)
 		{ 0x0d, 0xf880 }
 	};
 	void __iomem *ioaddr = tp->mmio_addr;
-	const struct firmware *fw;
 
 	rtl_writephy_batch(tp, phy_reg_init_0, ARRAY_SIZE(phy_reg_init_0));
 
@@ -2203,11 +2227,8 @@ static void rtl8168d_1_hw_phy_config(struct rtl8169_private *tp)
 
 	rtl_writephy(tp, 0x1f, 0x0005);
 	rtl_writephy(tp, 0x05, 0x001b);
-	if (rtl_readphy(tp, 0x06) == 0xbf00 &&
-	    request_firmware(&fw, FIRMWARE_8168D_1, &tp->pci_dev->dev) == 0) {
-		rtl_phy_write_fw(tp, fw);
-		release_firmware(fw);
-	} else {
+	if ((rtl_readphy(tp, 0x06) != 0xbf00) ||
+	    (rtl_apply_firmware(tp, FIRMWARE_8168D_1) < 0)) {
 		netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
 	}
 
@@ -2257,7 +2278,6 @@ static void rtl8168d_2_hw_phy_config(struct rtl8169_private *tp)
 		{ 0x0d, 0xf880 }
 	};
 	void __iomem *ioaddr = tp->mmio_addr;
-	const struct firmware *fw;
 
 	rtl_writephy_batch(tp, phy_reg_init_0, ARRAY_SIZE(phy_reg_init_0));
 
@@ -2312,11 +2332,8 @@ static void rtl8168d_2_hw_phy_config(struct rtl8169_private *tp)
 
 	rtl_writephy(tp, 0x1f, 0x0005);
 	rtl_writephy(tp, 0x05, 0x001b);
-	if (rtl_readphy(tp, 0x06) == 0xb300 &&
-	    request_firmware(&fw, FIRMWARE_8168D_2, &tp->pci_dev->dev) == 0) {
-		rtl_phy_write_fw(tp, fw);
-		release_firmware(fw);
-	} else {
+	if ((rtl_readphy(tp, 0x06) != 0xb300) ||
+	    (rtl_apply_firmware(tp, FIRMWARE_8168D_2) < 0)) {
 		netif_warn(tp, probe, tp->dev, "unable to apply firmware patch\n");
 	}
 
@@ -3200,6 +3217,8 @@ static void __devexit rtl8169_remove_one(struct pci_dev *pdev)
 
 	cancel_delayed_work_sync(&tp->task);
 
+	rtl_release_firmware(tp);
+
 	unregister_netdev(dev);
 
 	if (pci_dev_run_wake(pdev))
-- 
1.7.3.4


^ permalink raw reply related

* [PATCH] bluetooth: Fix failure to release lock in read_index_list() when mem alloc fails.
From: Jesper Juhl @ 2011-01-13 23:18 UTC (permalink / raw)
  To: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, David S. Miller,
	Gustavo F. Padovan, Marcel Holtmann

If alloc_skb() fails in read_index_list() we'll return -ENOMEM without 
releasing 'hci_dev_list_lock'.

Signed-off-by: Jesper Juhl <jj-IYz4IdjRLj0sV2N9l4h3zg@public.gmane.org>
---
 mgmt.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index f827fd9..ace8726 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -111,8 +111,10 @@ static int read_index_list(struct sock *sk)
 
 	body_len = sizeof(*ev) + sizeof(*rp) + (2 * count);
 	skb = alloc_skb(sizeof(*hdr) + body_len, GFP_ATOMIC);
-	if (!skb)
+	if (!skb) {
+		read_unlock(&hci_dev_list_lock);
 		return -ENOMEM;
+	}
 
 	hdr = (void *) skb_put(skb, sizeof(*hdr));
 	hdr->opcode = cpu_to_le16(MGMT_EV_CMD_COMPLETE);


-- 
Jesper Juhl <jj-IYz4IdjRLj0sV2N9l4h3zg@public.gmane.org>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.

^ permalink raw reply related

* [PATCH] CHOKe flow scheduler (0.8)
From: Stephen Hemminger @ 2011-01-13 23:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1294951069.3403.11.camel@edumazet-laptop>

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Exponential Drop (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability P (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation", 
  Proceeding of INFOCOM'2000, March 2000.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
0.8 change queue length and holes account.
    keep sch->q.qlen updated, and holes counter not needed.

---
 net/sched/Kconfig     |   11 +
 net/sched/Makefile    |    1 
 net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 548 insertions(+)

--- a/net/sched/Kconfig	2011-01-13 15:19:41.542022830 -0800
+++ b/net/sched/Kconfig	2011-01-13 15:20:53.586380066 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which trys to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-13 15:19:41.578022995 -0800
+++ b/net/sched/Makefile	2011-01-13 15:20:53.586380066 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-13 15:26:18.771992614 -0800
@@ -0,0 +1,552 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+
+/*	CHOKe stateless AQM for fair bandwidth allocation
+        =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+/* deliver a random number between 0 and N - 1 */
+static inline u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* Select a packet at random from the queue in O(1) and handle holes */
+static struct sk_buff *choke_peek_random(struct Qdisc *sch,
+					 unsigned int *pidx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(sch->q.qlen)) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	/* queue is has lots of holes use the head which is known to exist */
+	return q->tab[*pidx = q->head];
+}
+
+/* Is ECN parameter configured */
+static inline int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static inline int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	while (q->tab[q->head] == NULL) {
+		q->head = (q->head + 1) & q->tab_mask;
+
+		BUG_ON(q->head == q->tail);
+	}
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	while (q->tab[q->tail - 1] == NULL) {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		BUG_ON(q->head == q->tail);
+	}
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
+{
+	q->tab[idx] = NULL;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+}
+
+/* Classify flow using either:
+   1. pre-existing classification result in skb
+   2. fast internal classification
+   3. use TC filter based classification
+*/
+static inline unsigned int choke_classify(struct sk_buff *skb,
+					  struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return skb_get_rxhash(skb);
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		return TC_H_MIN(res.classid);
+	}
+
+	return 0;
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	unsigned int hash;
+	int uninitialized_var(ret);
+
+	hash = choke_classify(skb, sch, &ret);
+	if (!hash) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Maybe add hash as field in struct qdisc_skb_cb? */
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, sch->q.qlen);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		struct sk_buff *oskb;
+		unsigned int idx;
+
+		/* Draw a packet at random from queue */
+		oskb = choke_peek_random(sch, &idx);
+
+		/* Both packets from same flow ? */
+		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
+			/* Drop both packets */
+			q->stats.matched++;
+			choke_drop_by_idx(q, idx);
+			sch->qstats.backlog -= qdisc_pkt_len(skb);
+			--sch->q.qlen;
+			qdisc_drop(oskb, sch);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (sch->q.qlen < q->limit) {
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+		++sch->q.qlen;
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		qdisc_bstats_update(sch, skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL; /* not really needed */
+	q->head = (q->head + 1) & q->tab_mask;
+	choke_zap_head_holes(q);
+	--sch->q.qlen;
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc* sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = 256 },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int tail = 0;
+
+			while (q->head != q->tail) {
+				ntab[tail++] = q->tab[q->head];
+				q->head = (q->head + 1) & q->tab_mask;
+			}
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc* sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-13 15:19:41.726023725 -0800
+++ b/include/linux/pkt_sched.h	2011-01-13 15:20:53.590380094 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-13 23:41 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Eric Dumazet, Rusty Russell, virtualization, dev, virtualization,
	netdev, kvm, Michael S. Tsirkin
In-Reply-To: <AANLkTimO=5HmTJO1kmHGAWa-HTac+3d0TbrmJX5W4hVu@mail.gmail.com>

On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
> On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman <horms@verge.net.au> wrote:
> > On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> >> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> >> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> >> >
> >> > [ snip ]
> >> > >
> >> > > I know that everyone likes a nice netperf result but I agree with
> >> > > Michael that this probably isn't the right question to be asking.  I
> >> > > don't think that socket buffers are a real solution to the flow
> >> > > control problem: they happen to provide that functionality but it's
> >> > > more of a side effect than anything.  It's just that the amount of
> >> > > memory consumed by packets in the queue(s) doesn't really have any
> >> > > implicit meaning for flow control (think multiple physical adapters,
> >> > > all with the same speed instead of a virtual device and a physical
> >> > > device with wildly different speeds).  The analog in the physical
> >> > > world that you're looking for would be Ethernet flow control.
> >> > > Obviously, if the question is limiting CPU or memory consumption then
> >> > > that's a different story.
> >> >
> >> > Point taken. I will see if I can control CPU (and thus memory) consumption
> >> > using cgroups and/or tc.
> >>
> >> I have found that I can successfully control the throughput using
> >> the following techniques
> >>
> >> 1) Place a tc egress filter on dummy0
> >>
> >> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
> >>    this is effectively the same as one of my hacks to the datapath
> >>    that I mentioned in an earlier mail. The result is that eth1
> >>    "paces" the connection.
> >
> > Further to this, I wonder if there is any interest in providing
> > a method to switch the action order - using ovs-ofctl is a hack imho -
> > and/or switching the default action order for mirroring.
> 
> I'm not sure that there is a way to do this that is correct in the
> generic case.  It's possible that the destination could be a VM while
> packets are being mirrored to a physical device or we could be
> multicasting or some other arbitrarily complex scenario.  Just think
> of what a physical switch would do if it has ports with two different
> speeds.

Yes, I have considered that case. And I agree that perhaps there
is no sensible default. But perhaps we could make it configurable somehow?

^ permalink raw reply

* [PATCH] ethtool : Add option -L | --set-common to set common flags.
From: Mahesh Bandewar @ 2011-01-14  0:11 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: David Miller, Tom Herbert, Laurent Chavey, netdev,
	Mahesh Bandewar

This patch adds -L | --set-common option to add / remove common flags which
includes loopback flag. The -l | --show-common displays the current values
for these common flags.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 ethtool-copy.h |    1 +
 ethtool.8      |   16 ++++++++++
 ethtool.c      |   90 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/ethtool-copy.h b/ethtool-copy.h
index 75c3ae7..5fd18c7 100644
--- a/ethtool-copy.h
+++ b/ethtool-copy.h
@@ -309,6 +309,7 @@ struct ethtool_perm_addr {
  * flag differs from the read-only value.
  */
 enum ethtool_flags {
+	ETH_FLAG_LOOPBACK	= (1 << 2),	/* Loopback enable / disable */
 	ETH_FLAG_TXVLAN		= (1 << 7),	/* TX VLAN offload enabled */
 	ETH_FLAG_RXVLAN		= (1 << 8),	/* RX VLAN offload enabled */
 	ETH_FLAG_LRO		= (1 << 15),	/* LRO is enabled */
diff --git a/ethtool.8 b/ethtool.8
index 1760924..cf7128f 100644
--- a/ethtool.8
+++ b/ethtool.8
@@ -174,6 +174,13 @@ ethtool \- Display or change ethernet card settings
 .B2 txvlan on off
 .B2 rxhash on off
 
+.B ethtool \-l|\-\-show\-common
+.I ethX
+
+.B ethtool \-L|\-\-set\-common
+.I ethX
+.B2 loopback on off
+
 .B ethtool \-p|\-\-identify
 .I ethX
 .RI [ N ]
@@ -406,6 +413,15 @@ Specifies whether TX VLAN acceleration should be enabled
 .A2 rxhash on off
 Specifies whether receive hashing offload should be enabled
 .TP
+.B \-l \-\-show\-common
+Queries the specified ethernet device for common flag settings.
+.TP
+.B \-L \-\-set\-common
+Changes the common parameters of the specified ethernet device.
+.TP
+.A2 loopback on off
+Specifies whether loopback should be enabled.
+.TP
 .B \-p \-\-identify
 Initiates adapter-specific action intended to enable an operator to
 easily identify the adapter by sight.  Typically this involves
diff --git a/ethtool.c b/ethtool.c
index 63e0ead..1a0c10c 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -97,6 +97,8 @@ static int do_gcoalesce(int fd, struct ifreq *ifr);
 static int do_scoalesce(int fd, struct ifreq *ifr);
 static int do_goffload(int fd, struct ifreq *ifr);
 static int do_soffload(int fd, struct ifreq *ifr);
+static int do_gcommon(int fd, struct ifreq *ifr);
+static int do_scommon(int fd, struct ifreq *ifr);
 static int do_gstats(int fd, struct ifreq *ifr);
 static int rxflow_str_to_type(const char *str);
 static int parse_rxfhashopts(char *optstr, u32 *data);
@@ -142,6 +144,8 @@ static enum {
 	MODE_GNTUPLE,
 	MODE_FLASHDEV,
 	MODE_PERMADDR,
+	MODE_GCOMMON,
+	MODE_SCOMMON,
 } mode = MODE_GSET;
 
 static struct option {
@@ -211,6 +215,10 @@ static struct option {
 		"		[ ntuple on|off ]\n"
 		"		[ rxhash on|off ]\n"
     },
+    { "-l", "--show-common", MODE_GCOMMON, "Get common flags information" },
+    { "-L", "--set-common", MODE_SCOMMON, "Set common flags",
+		"               [ loopback on|off ]\n"
+    },
     { "-i", "--driver", MODE_GDRV, "Show driver information" },
     { "-d", "--register-dump", MODE_GREGS, "Do a register dump",
 		"		[ raw on|off ]\n"
@@ -309,6 +317,10 @@ static u32 off_flags_wanted = 0;
 static u32 off_flags_mask = 0;
 static int off_gro_wanted = -1;
 
+static int gcommon_changed = 0;
+static u32 common_flags_wanted = 0;
+static u32 common_flags_mask = 0;
+
 static struct ethtool_pauseparam epause;
 static int gpause_changed = 0;
 static int pause_autoneg_wanted = -1;
@@ -482,6 +494,11 @@ static struct cmdline_info cmdline_offload[] = {
 	  ETH_FLAG_RXHASH, &off_flags_mask },
 };
 
+static struct cmdline_info cmdline_commonflags[] = {
+	{ "loopback", CMDL_FLAG, &common_flags_wanted, NULL,
+	  ETH_FLAG_LOOPBACK, &common_flags_mask },
+};
+
 static struct cmdline_info cmdline_pause[] = {
 	{ "autoneg", CMDL_BOOL, &pause_autoneg_wanted, &epause.autoneg },
 	{ "rx", CMDL_BOOL, &pause_rx_wanted, &epause.rx_pause },
@@ -829,6 +846,8 @@ static void parse_cmdline(int argc, char **argp)
 			    (mode == MODE_SRING) ||
 			    (mode == MODE_GOFFLOAD) ||
 			    (mode == MODE_SOFFLOAD) ||
+			    (mode == MODE_GCOMMON) ||
+			    (mode == MODE_SCOMMON) ||
 			    (mode == MODE_GSTATS) ||
 			    (mode == MODE_GNFC) ||
 			    (mode == MODE_SNFC) ||
@@ -919,6 +938,14 @@ static void parse_cmdline(int argc, char **argp)
 				i = argc;
 				break;
 			}
+			if (mode == MODE_SCOMMON) {
+				parse_generic_cmdline(argc, argp, i,
+					&gcommon_changed,
+			      		cmdline_commonflags,
+			      		ARRAY_SIZE(cmdline_offload));
+				i = argc;
+				break;
+			}
 			if (mode == MODE_SNTUPLE) {
 				if (!strcmp(argp[i], "flow-type")) {
 					i += 1;
@@ -1905,6 +1932,13 @@ static int dump_offload(int rx, int tx, int sg, int tso, int ufo, int gso,
 	return 0;
 }
 
+static int dump_common_flags(int loopback)
+{
+	fprintf(stdout, "loopback: %s\n", loopback ? "on" : "off");
+
+	return 0;
+}
+
 static int dump_rxfhash(int fhash, u64 val)
 {
 	switch (fhash) {
@@ -1998,6 +2032,10 @@ static int doit(void)
 		return do_goffload(fd, &ifr);
 	} else if (mode == MODE_SOFFLOAD) {
 		return do_soffload(fd, &ifr);
+	} else if (mode == MODE_GCOMMON) {
+		return do_gcommon(fd, &ifr);
+	} else if (mode == MODE_SCOMMON) {
+		return do_scommon(fd, &ifr);
 	} else if (mode == MODE_GSTATS) {
 		return do_gstats(fd, &ifr);
 	} else if (mode == MODE_GNFC) {
@@ -2219,6 +2257,53 @@ static int do_scoalesce(int fd, struct ifreq *ifr)
 	return 0;
 }
 
+static int do_gcommon(int fd, struct ifreq *ifr)
+{
+	struct ethtool_value eval;
+	int loopback = 0;
+
+	fprintf(stdout, "Common flags for %s:\n", devname);
+
+	eval.cmd = ETHTOOL_GFLAGS;
+	ifr->ifr_data = (caddr_t)&eval;
+	if (ioctl(fd, SIOCETHTOOL, ifr)) {
+		perror("Cannot get device flags");
+	} else {
+		loopback = (eval.data & ETH_FLAG_LOOPBACK) != 0;
+	}
+
+	return dump_common_flags(loopback);
+}
+
+static int do_scommon(int fd, struct ifreq *ifr)
+{
+	struct ethtool_value eval;
+
+	if (common_flags_mask) {
+		eval.cmd = ETHTOOL_GFLAGS;
+		eval.data = 0;
+		ifr->ifr_data = (caddr_t)&eval;
+		if (ioctl(fd, SIOCETHTOOL, ifr)) {
+			perror("Cannot get device common flags");
+			return 1;
+		}
+
+		eval.cmd = ETHTOOL_SFLAGS;
+		eval.data =
+		    ((eval.data & ~(common_flags_mask | off_flags_mask)) |
+		     (common_flags_wanted | off_flags_wanted));
+
+		if (ioctl(fd, SIOCETHTOOL, ifr)) {
+			perror("Cannot set device common flags");
+			return 1;
+		}
+	} else {
+		fprintf(stdout, "No common settings changed\n");
+	}
+
+	return 0;
+}
+
 static int do_goffload(int fd, struct ifreq *ifr)
 {
 	struct ethtool_value eval;
@@ -2407,8 +2492,9 @@ static int do_soffload(int fd, struct ifreq *ifr)
 		}
 
 		eval.cmd = ETHTOOL_SFLAGS;
-		eval.data = ((eval.data & ~off_flags_mask) |
-			     off_flags_wanted);
+		eval.data =
+		    ((eval.data & ~(off_flags_mask | common_flags_mask)) |
+		     (off_flags_wanted | common_flags_wanted));
 
 		err = ioctl(fd, SIOCETHTOOL, ifr);
 		if (err) {
-- 
1.7.3.1


^ permalink raw reply related

* Re: sch_sfb
From: Juliusz Chroboczek @ 2011-01-14  0:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Jesper Dangaard Brouer, David Miller
In-Reply-To: <4D2F4B83.6060001@trash.net>

> Since to my knowledge you've never attempted an upstream merge

You're kidding, I hope.

  http://thread.gmane.org/gmane.linux.network/90225
  http://thread.gmane.org/gmane.linux.network/90375

It was reviewed in particular by one Patrick McHardy.

                                        Juliusz

^ permalink raw reply

* Re: [PATCH 2.6.36] vlan: Avoid hwaccel vlan packets when vid not used
From: Matt Carlson @ 2011-01-14  1:15 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Matthew Carlson, Michael Leun, Michael Chan, Eric Dumazet,
	David Miller, Ben Greear, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <AANLkTimQ1T0erugwK17o0zuSwxVh-Ro4yCsBpOqCykyr@mail.gmail.com>

On Thu, Jan 13, 2011 at 01:58:40PM -0800, Jesse Gross wrote:
> On Thu, Jan 13, 2011 at 3:50 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> > On Thu, Jan 13, 2011 at 07:06:22AM -0800, Jesse Gross wrote:
> >> On Wed, Jan 12, 2011 at 8:21 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> >> > On Thu, Jan 06, 2011 at 08:36:27PM -0800, Jesse Gross wrote:
> >> >> On Thu, Jan 6, 2011 at 10:24 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> >> >> > On Sat, Dec 18, 2010 at 07:38:00PM -0800, Jesse Gross wrote:
> >> >> >> On Tue, Dec 14, 2010 at 11:16 PM, Michael Leun
> >> >> >> <lkml20101129@newton.leun.net> wrote:
> >> >> >> > OK - all tests done on that DL320G5:
> >> >> >> >
> >> >> >> > For completeness, 2.6.37-rc5 unpatched:
> >> >> >> >
> >> >> >> > eth0, no vlan configured: totally broken - see double tagged vlans
> >> >> >> > without tag, single or untagged packets missing at all
> >> >> >>
> >> >> >> Random behavior? ?This one is somewhat hard to explain - maybe there
> >> >> >> are some other factors. ?eth0 has ASF on, so it always strips tags. ?I
> >> >> >> would expect it to behave like the vlan configured case.
> >> >> >>
> >> >> >> >
> >> >> >> > eth0, vlan configured: see packets without vlan tag (see double tagged
> >> >> >> > packets with one vlan tag)
> >> >> >>
> >> >> >> Both ASF and vlan group configured cause tag stripping to be enabled.
> >> >> >> Missing tag.
> >> >> >>
> >> >> >> >
> >> >> >> > eth1 same as originally reported:
> >> >> >> > without vlan configured see vlan tags (single and double tagged as
> >> >> >> > expected)
> >> >> >>
> >> >> >> No ASF and no vlan group means tag stripping is disabled. ?Have tag.
> >> >> >>
> >> >> >> > with vlan configured: see packets without vlan tag (see double tagged
> >> >> >> > packets with one vlan tag)
> >> >> >>
> >> >> >> Configuring vlan group causes stripping to be enabled. ?Missing tag.
> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch:
> >> >> >> >
> >> >> >> > eth0, no vlan configured: ?see packets without vlan tag (see double
> >> >> >> > tagged packets with one vlan tag)
> >> >> >>
> >> >> >> ASF enables tag stripping. ?Missing tag.
> >> >> >>
> >> >> >> > eth1, no vlan configured: see vlan tags (single and double tagged as
> >> >> >> > expected)
> >> >> >>
> >> >> >> No ASF, no vlan group means no stripping. ?Have tag.
> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> > eth0, vlan configured: as without vlan
> >> >> >>
> >> >> >> ASF enables stripping. ?Missing tag.
> >> >> >>
> >> >> >> > eth1, vlan configured: as without vlan
> >> >> >>
> >> >> >> With this patch vlan stripping is only enabled when ASF is on, so no
> >> >> >> stripping. ?Have tag.
> >> >> >>
> >> >> >> >
> >> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch with test patch ontop
> >> >> >> >
> >> >> >> > eth1 no vlan configured: see packets without vlan tag (see double tagged
> >> >> >> > packets with one vlan tag)
> >> >> >>
> >> >> >> With the second patch, vlan stripping is always enabled. ?Missing tag.
> >> >> >>
> >> >> >> > eth1 with vlan: the same
> >> >> >>
> >> >> >> Stripping still always enabled. ?Missing tag.
> >> >> >>
> >> >> >> The bottom line is whenever vlan stripping is enabled we're missing
> >> >> >> the outer tag. ?It might be worth adding some debugging in the area
> >> >> >> before napi_gro_receive/vlan_gro_receive (depending on version). ?My
> >> >> >> guess is that (desc->type_flags & RXD_FLAG_VLAN) is false even for
> >> >> >> vlan packets on this NIC.
> >> >> >>
> >> >> >> You said that everything works on the 5752? ?Matt, is it possible that
> >> >> >> the 5714 either has a problem with vlan stripping or a different way
> >> >> >> of reporting it?
> >> >> >
> >> >> > I don't think this is a 5714 specific issue. ?I think the problem is
> >> >> > rooted in the fact that the VLAN tag stripping is enabled.
> >> >>
> >> >> It's definitely related to vlan stripping being enabled. ?Other cards
> >> >> using tg3 seem to work fine with stripping though, which is why I
> >> >> thought it might be specific to the 5714.
> >> >
> >> > I just tested this on a 5714S, using a net-next-2.6 snapshot obtained
> >> > today. ?It does the right thing in both cases (2nd tg3 patch ommited /
> >> > applied). ?The tag is always visible in the packet stream as seen from
> >> > tcpdump.
> >> >
> >> >> > Your RXD_FLAG_VLAN idea sounds unlikely to me, but it's worth a check.
> >> >> >
> >> >> > The patch here is using __vlan_hwaccel_put_tag(), which informs the
> >> >> > stack a VLAN tag is present. ?If this is indeed a reporting problem, I'm
> >> >> > not sure what else the driver should be doing.
> >> >>
> >> >> The code to hand off the tag to the stack looks OK to me. ?Michael was
> >> >> seeing this on older versions of the kernel as well with this NIC,
> >> >> which predates both this patch and the larger vlan changes so it
> >> >> doesn't seem like a problem with passing the tag to the network stack.
> >> >> ?It's hard to know exactly what is going on though without seeing what
> >> >> the hardware is reporting.
> >> >
> >> > When RX_MODE_KEEP_VLAN_TAG is set, the RXD_FLAG_VLAN flag will not be set
> >> > when receiving a packet. ?The driver skips the __vlan_hwaccel_put_tag()
> >> > call.
> >> >
> >> > When RX_MODE_KEEP_VLAN_TAG is unset, the RXD_FLAG_VLAN flag is set, and
> >> > __vlan_hwaccel_put_tag() is called to reinject the packet.
> >>
> >> OK, thanks for testing it out. ?I'm not sure that there's anything
> >> more we can do without hearing from Michael.
> >
> > In the meantime, I think what we have should go upstream. ?Just to be
> > absolutely clear though, your position is that VLAN tags should always
> > be stripped?
> 
> That's what the other converted drivers do by default (though most of
> them also provide an ethtool set_flags() method to change this).  It's
> generally the most efficient and is now safe to do in all cases.  It's
> also the consistent with what was happening before, since stripping
> was enabled when a vlan device was configured.  So, yes, normally I
> think stripping should be enabled.
> 
> I assumed that disabling stripping in most situations was just an
> oversight.  Was there a reason why you feel it is better not to use
> it?

Actually, the tg3 driver was trying to disable VLAN tag stripping
when possible.  I believe this was primarily to support the raw packet
interface.

^ permalink raw reply

* Re: sch_sfb
From: Patrick McHardy @ 2011-01-14  1:39 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev, Jesper Dangaard Brouer, David Miller
In-Reply-To: <7ir5cgs3k1.fsf@lanthane.pps.jussieu.fr>

On 14.01.2011 01:59, Juliusz Chroboczek wrote:
>> Since to my knowledge you've never attempted an upstream merge
> 
> You're kidding, I hope.
> 
>   http://thread.gmane.org/gmane.linux.network/90225
>   http://thread.gmane.org/gmane.linux.network/90375
> 
> It was reviewed in particular by one Patrick McHardy.

Well, sorry, I don't remember every patch I've ever reviewed.

Just to state it again, I've actually started implementing this
years ago without every finishing it and was quite happy about
noticing your implementation for inspiration. There's no reason
to be pissed, I don't really mind which version is merged if
any, reimplementing just forced me to think from scratch, with
existing code at hand this makes it easier to avoid or fix
mistakes.

^ permalink raw reply

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: William Allen Simpson @ 2011-01-14  3:00 UTC (permalink / raw)
  To: Arnaud Lacombe
  Cc: Stephen Hemminger, Linux Kernel Developers,
	Linux Kernel Network Developers, David Miller, Andrew Morton
In-Reply-To: <AANLkTinhnaF3-scKnTOC5rF68+5otVW2NG3v1y80L0k6@mail.gmail.com>

On 1/13/11 12:53 PM, Arnaud Lacombe wrote:
> On Thu, Jan 13, 2011 at 12:32 PM, William Allen Simpson
> <william.allen.simpson@gmail.com>  wrote:
>> Even though I'm not paid to work on Linux, I'm doing my best to give you
>> folks a quick heads up and provide code to rectify the very recent changes
>> that can be propagated back through the stable tree (to 2.6.33).
>>
>> As always, what you actually do with my code is up to you....
>>
> FWIW, what is the basis of this hunk ? The RFC text[0] seems to use
> the TCP_COOKIE_* naming, not TCPCT_.
>
> Thanks,
>   - Arnaud
>
> [0]: http://www.rfc-editor.org/authors/rfc6013.txt
>
Is this supposed to be humorous?  Maybe folks here find it amusing that
somebody thinks they know more than the *author* about the contents of the
document?  Did you note the words above?  That is, "very recent changes"?

Perhaps you are viewing an older cached version.  Please check for the
current month on every page: "January 2011".

We discussed -- and ultimately decided -- these changes in private email
during the independent review process before making them available to the
general public.  That's how the RFC publication procedure works.

I tried to be helpful to the Linux community in advance of publication, so
you would be prepared.  I'm sorry that the community here is so lacking in
appreciation for my efforts on your behalf.

As always, what you actually do with my code is up to you....

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox