Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 1/3] RFC tcp: early retransmit: tcp_enter_recovery()
From: Neal Cardwell @ 2012-05-01 17:54 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: davem, ilpo.jarvinen, nanditad, edumazet, netdev
In-Reply-To: <1335638781-960-1-git-send-email-ycheng@google.com>

On Sat, Apr 28, 2012 at 2:46 PM, Yuchung Cheng <ycheng@google.com> wrote:
> This a prepartion patch that refactors the code to enter recovery
> into a new function tcp_enter_recovery(). It's needed to implement
> the delayed fast retransmit in ER.
>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> ---
> ChangeLog in v2:
>  Fix sysctl_tcp_early_retrans default to 2
>  Re-arrange the patch parts so they are individually compilable and functional.
>
>  net/ipv4/tcp_input.c |   61 +++++++++++++++++++++++++++----------------------
>  1 files changed, 34 insertions(+), 27 deletions(-)

Acked-by: Neal Cardwell <ncardwell@google.com>

BTW, I think this is a nice clean-up, independent of early retransmit.

neal

^ permalink raw reply

* Re: [PATCH v2 2/3] RFC tcp: early retransmit
From: Neal Cardwell @ 2012-05-01 17:53 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: davem, ilpo.jarvinen, nanditad, edumazet, netdev
In-Reply-To: <1335638781-960-2-git-send-email-ycheng@google.com>

On Sat, Apr 28, 2012 at 2:46 PM, Yuchung Cheng <ycheng@google.com> wrote:
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -202,6 +202,20 @@ tcp_ecn - INTEGER
>                  not support ECN, behavior is like with ECN disabled.
>        Default: 2
>
> +tcp_early_retrans - INTEGER
> +       Enable Early Retransmit (ER), per RFC 5827. ER lowers the threshold

I get a trailing-whitespace error in the documentation when I apply this patch:

> git am /tmp/er2.patch
Applying: RFC tcp: early retransmit
.../.git/rebase-apply/patch:20:
trailing whitespace.
       Enable Early Retransmit (ER), per RFC 5827. ER lowers the threshold
warning: 1 line adds whitespace errors.

Also, to keep the sysctls alphabetical, tcp_early_retrans should probably go
before tcp_ecn.

> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -365,12 +365,13 @@ struct tcp_sock {
>
>        u32     frto_highmark;  /* snd_nxt when RTO occurred */
>        u16     advmss;         /* Advertised MSS                       */
> -       u8      frto_counter;   /* Number of new acks after RTO */
> -       u8      nonagle     : 4,/* Disable Nagle algorithm?             */
> +       u16     nonagle     : 4,/* Disable Nagle algorithm?             */
>                thin_lto    : 1,/* Use linear timeouts for thin streams */
>                thin_dupack : 1,/* Fast retransmit on first dupack      */
>                repair      : 1,
> -               unused      : 1;
> +               do_early_retrans: 1;/* Enable RFC5827 early-retransmit  */
> +
> +       u8      frto_counter;   /* Number of new acks after RTO */
>        u8      repair_queue;

To keep the change minimal and reduce the risk of mysterious
performance regressions from cache effects, I'd suggest keeping the
frto_counter and nonagle u8 bytes as u8 bytes in their current
location, and add a new u8 for the two ER bits. Same amount of space
as the scheme in the patch, just less shuffling.

> @@ -987,6 +989,7 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
>                       tp->undo_marker ? tp->undo_retrans : 0);
>  #endif
>                tcp_disable_fack(tp);
> +               tcp_disable_early_retrans(tp);
>        }
>  }

I think we should stick with the behavior where we disable early
retransmit any time tcp_update_reordering() is called with a non-zero
reordering metric. This is what we've tested and measured, and my
sense is that we could risk a significant number of spurious ER
firings if instead we relax this so that only reordering >3 causes us
to disable ER. I know the delayed ER should help avoid spurious ER
firings when there is a small degree of reordering, but my guess would
be that the max(RTT/4, 2ms) is perhaps not big enough if we're
allowing delayed ER for connections that have already witnessed small
degrees of reordering. So until we have more experimental data, I'd
recommend sticking with:

	if (metric > 0)
		tcp_disable_early_retrans(tp);

Otherwise, looks good to me.

neal

^ permalink raw reply

* Re: [PATCH] bridge: make brctl showstp display port id
From: Joakim Tjernlund @ 2012-05-01 17:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20120501103955.1553a9cb@s6510.linuxnetplumber.net>

Stephen Hemminger <shemminger@vyatta.com> wrote on 2012/05/01 19:39:55:
>
> > Stephen Hemminger <shemminger@vyatta.com> wrote on 2012/04/30 18:07:47:
> > >
> > > On Mon, 30 Apr 2012 10:39:08 +0200
> > > Joakim Tjernlund <Joakim.Tjernlund@transmode.se> wrote:
> > >
> > > > My brctl showstp br0 always shows a 0 port id:
> > > > eth2 (1)
> > > >  port id      0         state             disabled
> > > >  designated root   8000.00069c00b2fb   path cost       100
> > > >
> > > > because port id is printed as a hex number in sys fs. Change the
> > > > two hex occurrences(port no and port id) to decimal, just like all
> > > > the other numbers in this area.
> > > >
> > > > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > > > ---
> > > >  net/bridge/br_sysfs_if.c |    4 ++--
> > > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> > > > index fd5799c..9c4c2eb 100644
> > > > --- a/net/bridge/br_sysfs_if.c
> > > > +++ b/net/bridge/br_sysfs_if.c
> > > > @@ -86,13 +86,13 @@ static BRPORT_ATTR(designated_cost, S_IRUGO, show_designated_cost, NULL);
> > > >
> > > >  static ssize_t show_port_id(struct net_bridge_port *p, char *buf)
> > > >  {
> > > > -   return sprintf(buf, "0x%x\n", p->port_id);
> > > > +   return sprintf(buf, "%d\n", p->port_id);
> > > >  }
> > > >  static BRPORT_ATTR(port_id, S_IRUGO, show_port_id, NULL);
> > > >
> > > >  static ssize_t show_port_no(struct net_bridge_port *p, char *buf)
> > > >  {
> > > > -   return sprintf(buf, "0x%x\n", p->port_no);
> > > > +   return sprintf(buf, "%d\n", p->port_no);
> > > >  }
> > > >
> > > >  static BRPORT_ATTR(port_no, S_IRUGO, show_port_no, NULL);
> > >
> > > No. This would be a visible change to applications.
> > > The bridge utilities should be fixed instead.
> >
> > Well, my reasoning was that if not the native tools got this right, what are the
> > chances that other apps got it right? I don't know any other apps than
> > brctl using this info, do you? We could check one or two apps to see what they do.
> >
> >  Jocke
> >
>
> Current bridge-utils does not have the issue. It has been fixed
> for 5 years. The problem was fixed by:
>
> commit 0397c6aa48769505e6aa4481f0786c8f4eaaf890
> Author: Jeremy Jackson <jerj@coplanar.net>
> Date:   Tue May 8 11:10:34 2007 -0700
>
>     I've noticed for a while that
>
>     output is showing 0 for port_no and  port_id
>
>     It seems that somewhere in 2.6 sysfs land the following items got
>     printed in hexadecimal, and brctl code was parsing for decimal only
>
>     doug:/sys/class/net/eth0/brport# cat port_id
>     0x8001
>     doug:/sys/class/net/eth0/brport# cat port_no
>     0x1


Yeah, just noticed that too. On our embedded board we have the release just before that fix :(

 Jocke

^ permalink raw reply

* Re: [PATCH] netem: fix possible skb leak
From: David Miller @ 2012-05-01 17:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, shemminger
In-Reply-To: <1335726502.3897.8.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 29 Apr 2012 21:08:22 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> skb_checksum_help(skb) can return an error, we must free skb in this
> case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if
> skb_unshare() failed), so lets use this generic helper.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] bridge: make brctl showstp display port id
From: Stephen Hemminger @ 2012-05-01 17:39 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev@vger.kernel.org
In-Reply-To: <OF4DD70AEF.9EE04586-ONC12579F1.00502FF3-C12579F1.00506F9C@transmode.se>

On Tue, 1 May 2012 16:38:34 +0200
Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:

> Stephen Hemminger <shemminger@vyatta.com> wrote on 2012/04/30 18:07:47:
> >
> > On Mon, 30 Apr 2012 10:39:08 +0200
> > Joakim Tjernlund <Joakim.Tjernlund@transmode.se> wrote:
> >
> > > My brctl showstp br0 always shows a 0 port id:
> > > eth2 (1)
> > >  port id      0         state             disabled
> > >  designated root   8000.00069c00b2fb   path cost       100
> > >
> > > because port id is printed as a hex number in sys fs. Change the
> > > two hex occurrences(port no and port id) to decimal, just like all
> > > the other numbers in this area.
> > >
> > > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > > ---
> > >  net/bridge/br_sysfs_if.c |    4 ++--
> > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> > > index fd5799c..9c4c2eb 100644
> > > --- a/net/bridge/br_sysfs_if.c
> > > +++ b/net/bridge/br_sysfs_if.c
> > > @@ -86,13 +86,13 @@ static BRPORT_ATTR(designated_cost, S_IRUGO, show_designated_cost, NULL);
> > >
> > >  static ssize_t show_port_id(struct net_bridge_port *p, char *buf)
> > >  {
> > > -   return sprintf(buf, "0x%x\n", p->port_id);
> > > +   return sprintf(buf, "%d\n", p->port_id);
> > >  }
> > >  static BRPORT_ATTR(port_id, S_IRUGO, show_port_id, NULL);
> > >
> > >  static ssize_t show_port_no(struct net_bridge_port *p, char *buf)
> > >  {
> > > -   return sprintf(buf, "0x%x\n", p->port_no);
> > > +   return sprintf(buf, "%d\n", p->port_no);
> > >  }
> > >
> > >  static BRPORT_ATTR(port_no, S_IRUGO, show_port_no, NULL);
> >
> > No. This would be a visible change to applications.
> > The bridge utilities should be fixed instead.
> 
> Well, my reasoning was that if not the native tools got this right, what are the
> chances that other apps got it right? I don't know any other apps than
> brctl using this info, do you? We could check one or two apps to see what they do.
> 
>  Jocke
> 

Current bridge-utils does not have the issue. It has been fixed
for 5 years. The problem was fixed by:

commit 0397c6aa48769505e6aa4481f0786c8f4eaaf890
Author: Jeremy Jackson <jerj@coplanar.net>
Date:   Tue May 8 11:10:34 2007 -0700

    I've noticed for a while that
    
    output is showing 0 for port_no and  port_id
    
    It seems that somewhere in 2.6 sysfs land the following items got
    printed in hexadecimal, and brctl code was parsing for decimal only
    
    doug:/sys/class/net/eth0/brport# cat port_id
    0x8001
    doug:/sys/class/net/eth0/brport# cat port_no
    0x1

^ permalink raw reply

* Re: [PATCH 0/2] sky2 patches
From: David Miller @ 2012-05-01 17:39 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20120430154944.890709139@vyatta.com>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Mon, 30 Apr 2012 08:49:44 -0700

> These are both patches against net-next but should be
> applied to -net and stable as well.

Both applied to 'net' and queued up for -stable, thanks.

^ permalink raw reply

* Re: 3.4-rc: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
From: Francois Romieu @ 2012-05-01 17:27 UTC (permalink / raw)
  To: Stefan Richter; +Cc: netdev
In-Reply-To: <20120501182405.3dda59b9@stein>

Stefan Richter <stefanr@s5r6.in-berlin.de> :
[...]
> I switched from 3.3 to 3.4-rc5 yesterday and am getting the following warning
> some time after boot (e.g. this time about two hours after boot).  What's up
> with that?

The device stopped being able to transmit anything at some point (netdev
watchdog message). The warnings always look the same. If the device does
not recover, we have a problem. Do we ?

[...]
> There are no noticeable issues, but I don't run anything demanding on the
> interface right now which would make any downtime obvious.

I don't understand: is it a low traffic interface or a no traffic one ?

> It is an RTL8168(?) on an Asus M3A78-EM motherboard.

dmesg | grep XID should identify the chipset. You may consider sending
an 'ethtool eth0' and 'ethtool -i eth0'.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH 3/8] Sometimes the ISDN chip only controls the D-channel
From: David Miller @ 2012-05-01 17:30 UTC (permalink / raw)
  To: kkeil; +Cc: netdev
In-Reply-To: <1335613404-10187-4-git-send-email-kkeil@linux-pingi.de>

From: Karsten Keil <kkeil@linux-pingi.de>
Date: Sat, 28 Apr 2012 13:43:19 +0200

> The B-channels are only accessed via the PCM backplane.
> Add infrastruckture for this special mode.
> 
> Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>

I dread reviewing these ISDN patch sets because they are so
full of problems, and it's so damn obvious how little care is
put into preparing them.

What I see is that you put the minimum amount of work necessary
into splitting up your huge ISDN patch set submission into more
managable pieces, and as a result you are introducing problems.

> diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
> index abe2d69..502bcf1 100644
> --- a/drivers/isdn/mISDN/socket.c
> +++ b/drivers/isdn/mISDN/socket.c
> @@ -270,6 +270,7 @@ data_sock_release(struct socket *sock)
>  		break;
>  	case ISDN_P_LAPD_TE:
>  	case ISDN_P_LAPD_NT:
> +	case ISDN_P_B_PCM:
>  	case ISDN_P_B_RAW:
>  	case ISDN_P_B_HDLC:
>  	case ISDN_P_B_X75SLP:

Ok, that's fine.

> @@ -148,6 +149,8 @@ struct bchannel {
>  	u_int			state;
>  	void			*hw;
>  	int			slot;	/* multiport card channel slot */
> +	int			pcm_tx;	/* PCM tx slot nr */
> +	int			pcm_rx;	/* PCM rx slot nr */
>  	struct timer_list	timer;
>  	/* receive data */
>  	struct sk_buff		*rx_skb;

But what the hell is this?  These structure members are unused by
this patch, and in fact no patch in your entire series uses them.

> @@ -360,8 +360,8 @@ clear_channelmap(u_int nr, u_char *map)
>  #define MISDN_CTRL_LOOP			0x0001
>  #define MISDN_CTRL_CONNECT		0x0002
>  #define MISDN_CTRL_DISCONNECT		0x0004
> -#define MISDN_CTRL_PCMCONNECT		0x0010
> -#define MISDN_CTRL_PCMDISCONNECT	0x0020
> +#define MISDN_CTRL_GET_PCM_SLOTS	0x0010
> +#define MISDN_CTRL_SET_PCM_SLOTS	0x0020
>  #define MISDN_CTRL_SETPEER		0x0040
>  #define MISDN_CTRL_UNSETPEER		0x0080
>  #define MISDN_CTRL_RX_OFF		0x0100

Another completely unrelated change, nothing in this patch uses
these new defines.

> @@ -381,6 +381,10 @@ clear_channelmap(u_int nr, u_char *map)
>  #define MISDN_CTRL_HFC_WD_INIT		0x4009
>  #define MISDN_CTRL_HFC_WD_RESET		0x400A
>  
> +/* special PCM slot numbers */
> +#define MISDN_PCM_SLOT_DISABLE	-1	/* PCM disabled */
> +#define MISDN_PCM_SLOT_IGNORE	-2	/* PCM setting will be not changed */
> +
>  /* socket options */
>  #define MISDN_TIME_STAMP		0x0001
>  

Same thing.

> @@ -389,6 +393,7 @@ struct mISDN_ctrl_req {
>  	int		channel;
>  	int		p1;
>  	int		p2;
> +	int		p3;
>  };
>  
>  /* muxer options */

And again, same problem.

You really need to get your act in gear and prepare your patches
properly, so that they don't have unrelated changes in them.

This is not amateur hour.

^ permalink raw reply

* Re: [PATCH net-next 0/5] be2net fixes
From: David Miller @ 2012-05-01 17:22 UTC (permalink / raw)
  To: somnath.kotur; +Cc: netdev
In-Reply-To: <ef51171e-d498-4201-b836-db6fbdb2ee4f@exht1.ad.emulex.com>

From: Somnath Kotur <somnath.kotur@emulex.com>
Date: Mon, 30 Apr 2012 11:33:47 +0530

> Re-posting patches after incorporating Ben Hutchings's 
> review comments in patches 2 and one of the comments for patch 5

Ben still has feedback, please address it and resubmit this series.

^ permalink raw reply

* Re: [PATCH net-next] netem: add ECN capability
From: David Miller @ 2012-05-01 17:17 UTC (permalink / raw)
  To: shemminger; +Cc: eric.dumazet, netdev, therbert, ncardwell, hagen
In-Reply-To: <20120501095938.7214e4a5@s6510.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 1 May 2012 09:59:38 -0700

> On Tue, 01 May 2012 09:40:51 -0400 (EDT)
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Tue, 01 May 2012 11:11:05 +0200
>> 
>> > From: Eric Dumazet <edumazet@google.com>
>> > 
>> > Add ECN (Explicit Congestion Notification) marking capability to netem
>> > 
>> > tc qdisc add dev eth0 root netem drop 0.5 ecn
>> > 
>> > Instead of dropping packets, try to ECN mark them.
>> > 
>> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>> 
>> Applied.
> 
> At least give a day of review, rather than premature acceptance

Yeah, because Eric Dumazet is going to fall off the face of the
planet and not fix whatever problems you have with his work.

Get real Stephen.

^ permalink raw reply

* Re: [PATCH net-next] netem: add ECN capability
From: Eric Dumazet @ 2012-05-01 17:10 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, netdev, Tom Herbert, Neal Cardwell,
	Hagen Paul Pfeifer
In-Reply-To: <20120501095914.0e60ff6a@s6510.linuxnetplumber.net>

On Tue, 2012-05-01 at 09:59 -0700, Stephen Hemminger wrote:

> The concept is fine, but a couple of questions.
>  1. Why a whole u32 for boolean?

a boolean in this structure wont save space, and this file is full of
u32. Why bother ?

IMHO boolean are fine for function arguments, but in a structure, not
very helpful.

>  2. The logic in this part of netem is setup to handle case of random duplication
>     combined with random loss. With ecn option set, will this code correctly
>     handled a duplication combined with a loss and send one packet?
>     It looks like the new code would change that behaviour.

Hmm, I'll take a look.

^ permalink raw reply

* skb_under_panic bug
From: Pieper, Jeffrey E @ 2012-05-01 17:08 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Pieper, Jeffrey E

I've seen the following stack trace several times recently, and was wondering if anyone else has ran into this. This occurs during bi-directional TCP/UDP traffic (netperf) on multiple platforms/devices. Whatever is occurring also seems to invoke hdparm.  I'm running 3.4.0-rc4 (recent net-next pull) with CONFIG_PREEMPT=y:

skb_under_panic: text:ffffffff8130d538 len:120 put:14 head:ffff880128c89800 data:ffff880128c897f4 tail:0x6c end:0xc0 dev:eth0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:145!
invalid opcode: 0000 [#1] PREEMPT SMP 
CPU 4 
Modules linked in: nfsd lockd exportfs sunrpc e1000e [last unloaded: scsi_wait_scan]

Pid: 5030, comm: netperf Not tainted 3.4.0-rc2-net-next-e1000e-queue_20120423 #2                  /DQ57TM
RIP: 0010:[<ffffffff812ef1b4>]  [<ffffffff812ef1b4>] skb_push+0x72/0x7b
RSP: 0018:ffff8801283459d8  EFLAGS: 00010292
RAX: 0000000000000084 RBX: ffff880127ffc000 RCX: 00000000fffffff3
RDX: 00000000000000d6 RSI: 0000000000000046 RDI: ffffffff8162375a
RBP: ffff8801283459f8 R08: 0000000000008d0a R09: ffff88012f019000
R10: 0000000000000001 R11: 0000000000000078 R12: ffff880128ff6ca0
R13: 0000000000000800 R14: 0000000000000212 R15: ffff880128ff6c98
FS:  00007fc4c3feb700(0000) GS:ffff88012fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 0000000127e81000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process netperf (pid: 5030, threadinfo ffff880128344000, task ffff8801279859d0)
Stack:
000000000000006c 00000000000000c0 ffff880127ffc000 ffff880128ff6ca0
ffff880128345a38 ffffffff8130d538 ffff88010000006a 0000000000000000
ffffffff81300cd5 ffff88011dfad9c0 ffff880128ff6c00 ffff880127ffc000
Call Trace:
[<ffffffff8130d538>] eth_header+0x28/0xb4
[<ffffffff81300cd5>] ? neigh_resolve_output+0x14d/0x19a
[<ffffffff81300cd7>] neigh_resolve_output+0x14f/0x19a
[<ffffffff8131c35a>] ip_finish_output+0x22f/0x271
[<ffffffff8131c3d9>] ip_output+0x3d/0x3f
[<ffffffff81319dd4>] ip_local_out+0x62/0x64
[<ffffffff81319ddf>] ip_send_skb+0x9/0x2c
[<ffffffff81337a85>] udp_send_skb+0x250/0x2aa
[<ffffffff81338fa2>] udp_sendmsg+0x4e3/0x6f8
[<ffffffff8131b345>] ? ip_append_page+0x4b2/0x4b2
[<ffffffff813a442b>] ? preempt_schedule_irq+0x3c/0x51
[<ffffffff81049998>] ? __dequeue_entity+0x2e/0x33
[<ffffffff8133f654>] inet_sendmsg+0x93/0x9c
[<ffffffff812e741d>] sock_sendmsg+0xbb/0xd4
[<ffffffff81049998>] ? __dequeue_entity+0x2e/0x33
[<ffffffff81049998>] ? __dequeue_entity+0x2e/0x33
[<ffffffff813a442b>] ? preempt_schedule_irq+0x3c/0x51
[<ffffffff813a5516>] ? retint_kernel+0x26/0x30
[<ffffffff812e748f>] ? sockfd_lookup_light+0x1b/0x54
[<ffffffff812e7b3f>] sys_sendto+0xfa/0x122
[<ffffffff813a442b>] ? preempt_schedule_irq+0x3c/0x51
[<ffffffff813a97a2>] system_call_fastpath+0x16/0x1b
Code: 8b 57 68 48 89 44 24 10 8b 87 b0 00 00 00 48 89 44 24 08 31 c0 8b bf ac 00 00 00 48 89 3c 24 48 c7 c7 b7 6d 4c 81 e8 26 37 0b 00 <0f> 0b eb fe 4c 89 c8 c9 c3 55 89 f1 48 89 e5 48 83 ec 20 4c 8b 
RIP  [<ffffffff812ef1b4>] skb_push+0x72/0x7b
RSP <ffff8801283459d8>
---[ end trace 35a690c4aebb4bd0 ]---
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!
hdparm: sending ioctl 330 to a partition!

Thanks in advance,

Jeff Pieper

^ permalink raw reply

* Re: [PATCH 3/4 v2 net-next] net: make GRO aware of skb->head_frag
From: Eric Dumazet @ 2012-05-01 17:04 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Duyck, David Miller, netdev, Neal Cardwell, Tom Herbert,
	Jeff Kirsher, Michael Chan, Matt Carlson, Herbert Xu,
	Ben Hutchings, Ilpo Järvinen, Maciej Żenczykowski
In-Reply-To: <4FA00C9F.8080409@intel.com>

On Tue, 2012-05-01 at 09:17 -0700, Alexander Duyck wrote:
> On 04/30/2012 11:39 PM, Eric Dumazet wrote:
> > On Mon, 2012-04-30 at 22:33 -0700, Alexander Duyck wrote:
> >
> >> The question I had was more specific to GRO.  As long as we have
> >> skb->users == 1 and the skb isn't cloned we should be fine.   It just
> >> hadn't occurred to me before that napi_gro_receive had the extra
> >> requirement that the skb couldn't be cloned.
> >>
> > OK
> >
> > By the way, even if skb was cloned, we would be allowed to steal
> > skb->head.
> >
> > When we clone an oskb we :
> >
> > 1) allocate a struct nskb sk_buff (or use the shadow in case of TCP)
> > 2) increment dataref
> The problem I have is with this piece right here.  So you increment
> dataref.  Now you have an skb that is still pointing to the shared info
> on this page and dataref is 2.  What about the side that is stealing the
> head?  Is it going to be tracking the dataref as well and decrementing
> it before put_page or does it just assume that dataref is 1 and call
> put_page directly?  I am guessing the latter since I didn't see anything
> that allowed for tracking the dataref of stolen heads.

The only changed thing is the kfree() replaced by put_page()

This kfree() was done when last reference to dataref was released.

If we had a problem before, we have same problem after my patch.

Truth is : In TCP (coalesce and splice()) and GRO, we owns skbs.

(See the various __kfree_skb(skb) calls in net/ipv4/tcp_input.c
There is one exception in ipv6 / treq->pktopts ) but its for SYN packet
and this wont be merged with a previous packet.

^ permalink raw reply

* Re: [PATCH net-next] netem: add ECN capability
From: Stephen Hemminger @ 2012-05-01 16:59 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, therbert, ncardwell, hagen
In-Reply-To: <20120501.094051.655617860554790911.davem@davemloft.net>

On Tue, 01 May 2012 09:40:51 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 01 May 2012 11:11:05 +0200
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > Add ECN (Explicit Congestion Notification) marking capability to netem
> > 
> > tc qdisc add dev eth0 root netem drop 0.5 ecn
> > 
> > Instead of dropping packets, try to ECN mark them.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Applied.

At least give a day of review, rather than premature acceptance

^ permalink raw reply

* Re: [PATCH net-next] netem: add ECN capability
From: Stephen Hemminger @ 2012-05-01 16:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Tom Herbert, Neal Cardwell,
	Hagen Paul Pfeifer
In-Reply-To: <1335863465.11396.45.camel@edumazet-glaptop>

On Tue, 01 May 2012 11:11:05 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Eric Dumazet <edumazet@google.com>
> 
> Add ECN (Explicit Congestion Notification) marking capability to netem
> 
> tc qdisc add dev eth0 root netem drop 0.5 ecn
> 
> Instead of dropping packets, try to ECN mark them.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Tom Herbert <therbert@google.com>
> Cc: Hagen Paul Pfeifer <hagen@jauu.net>
> Cc: Stephen Hemminger <shemminger@vyatta.com>
> ---
>  include/linux/pkt_sched.h |    1 +
>  net/sched/sch_netem.c     |   18 +++++++++++++++---
>  2 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
> index 410b33d..ffe975c 100644
> --- a/include/linux/pkt_sched.h
> +++ b/include/linux/pkt_sched.h
> @@ -509,6 +509,7 @@ enum {
>  	TCA_NETEM_CORRUPT,
>  	TCA_NETEM_LOSS,
>  	TCA_NETEM_RATE,
> +	TCA_NETEM_ECN,
>  	__TCA_NETEM_MAX,
>  };
>  
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 1109731..231cd11 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -26,6 +26,7 @@
>  
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
>  
>  #define VERSION "1.3"
>  
> @@ -78,6 +79,7 @@ struct netem_sched_data {
>  	psched_tdiff_t jitter;
>  
>  	u32 loss;
> +	u32 ecn;
>  	u32 limit;
>  	u32 counter;
>  	u32 gap;
> @@ -374,9 +376,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		++count;
>  
>  	/* Drop packet? */
> -	if (loss_event(q))
> -		--count;
> -
> +	if (loss_event(q)) {
> +		if (q->ecn && INET_ECN_set_ce(skb))
> +			sch->qstats.drops++; /* mark packet */
> +		else
> +			--count;
> +	}
>  	if (count == 0) {
>  		sch->qstats.drops++;
>  		kfree_skb(skb);
> @@ -706,6 +711,7 @@ static const struct nla_policy netem_policy[TCA_NETEM_MAX + 1] = {
>  	[TCA_NETEM_CORRUPT]	= { .len = sizeof(struct tc_netem_corrupt) },
>  	[TCA_NETEM_RATE]	= { .len = sizeof(struct tc_netem_rate) },
>  	[TCA_NETEM_LOSS]	= { .type = NLA_NESTED },
> +	[TCA_NETEM_ECN]		= { .type = NLA_U32 },
>  };
>  
>  static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
> @@ -776,6 +782,9 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
>  	if (tb[TCA_NETEM_RATE])
>  		get_rate(sch, tb[TCA_NETEM_RATE]);
>  
> +	if (tb[TCA_NETEM_ECN])
> +		q->ecn = nla_get_u32(tb[TCA_NETEM_ECN]);
> +
>  	q->loss_model = CLG_RANDOM;
>  	if (tb[TCA_NETEM_LOSS])
>  		ret = get_loss_clg(sch, tb[TCA_NETEM_LOSS]);
> @@ -902,6 +911,9 @@ static int netem_dump(struct Qdisc *sch, struct sk_buff *skb)
>  	if (nla_put(skb, TCA_NETEM_RATE, sizeof(rate), &rate))
>  		goto nla_put_failure;
>  
> +	if (q->ecn && nla_put_u32(skb, TCA_NETEM_ECN, q->ecn))
> +		goto nla_put_failure;
> +
>  	if (dump_loss_model(q, skb) != 0)
>  		goto nla_put_failure;

The concept is fine, but a couple of questions.
 1. Why a whole u32 for boolean?
 2. The logic in this part of netem is setup to handle case of random duplication
    combined with random loss. With ecn option set, will this code correctly
    handled a duplication combined with a loss and send one packet?
    It looks like the new code would change that behaviour.

^ permalink raw reply

* Re: [PATCH 1/1] sky2: fix YUKON FE+ rev. A0, ingress VLAN tagged frame problem
From: Stephen Hemminger @ 2012-05-01 16:54 UTC (permalink / raw)
  To: Michele Cucchi; +Cc: Mirko Lindner, netdev
In-Reply-To: <1335887340-24284-1-git-send-email-cucchimichele@tiscali.it>

On Tue,  1 May 2012 17:49:00 +0200
Michele Cucchi <cucchimichele@tiscali.it> wrote:

> The cards based on chip YUKON FE+ revision A0, fails to correctly receive VLAN tagged frames. The ingress frames stays on raw physical interface and don't reach VLANS allocated interfaces.
> The problem is the double call of __vlan_hwaccel_put_tag function, one in the sky2_skb_rx function and the other in the vlans global function vlan_untag.
> Adding a check of card version and revision in the sky2_skb_rx function allow to skip the first time call of __vlan_hwaccel_put_tag and to fix the problem.
> 
> This fix is tested on 3.4-rc5.
> 
> Signed-off-by: Michele Cucchi <cucchimichele@tiscali.it>
> ---
>  drivers/net/ethernet/marvell/sky2.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
> index c9b504e..2be0351 100644
> --- a/drivers/net/ethernet/marvell/sky2.c
> +++ b/drivers/net/ethernet/marvell/sky2.c
> @@ -2649,7 +2649,10 @@ static inline void sky2_tx_done(struct net_device *dev, u16 last)
>  static inline void sky2_skb_rx(const struct sky2_port *sky2,
>  			       u32 status, struct sk_buff *skb)
>  {
> -	if (status & GMR_FS_VLAN)
> +	/* Workaround for YUKON FE+ rev. A0, vlan untag problem */
> +	if ((status & GMR_FS_VLAN) &&
> +	(!(sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
> +	sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0)))
>  		__vlan_hwaccel_put_tag(skb, be16_to_cpu(sky2->rx_tag));
>  
>  	if (skb->ip_summed == CHECKSUM_NONE)

A better solution to this was started by Mirko and revised by me.
It is was posted yesterday.

^ permalink raw reply

* Re: [PATCH] bridge: make brctl showstp display port id
From: Stephen Hemminger @ 2012-05-01 16:53 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev@vger.kernel.org
In-Reply-To: <OF4DD70AEF.9EE04586-ONC12579F1.00502FF3-C12579F1.00506F9C@transmode.se>

On Tue, 1 May 2012 16:38:34 +0200
Joakim Tjernlund <joakim.tjernlund@transmode.se> wrote:

> Stephen Hemminger <shemminger@vyatta.com> wrote on 2012/04/30 18:07:47:
> >
> > On Mon, 30 Apr 2012 10:39:08 +0200
> > Joakim Tjernlund <Joakim.Tjernlund@transmode.se> wrote:
> >
> > > My brctl showstp br0 always shows a 0 port id:
> > > eth2 (1)
> > >  port id      0         state             disabled
> > >  designated root   8000.00069c00b2fb   path cost       100
> > >
> > > because port id is printed as a hex number in sys fs. Change the
> > > two hex occurrences(port no and port id) to decimal, just like all
> > > the other numbers in this area.
> > >
> > > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > > ---
> > >  net/bridge/br_sysfs_if.c |    4 ++--
> > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> > > index fd5799c..9c4c2eb 100644
> > > --- a/net/bridge/br_sysfs_if.c
> > > +++ b/net/bridge/br_sysfs_if.c
> > > @@ -86,13 +86,13 @@ static BRPORT_ATTR(designated_cost, S_IRUGO, show_designated_cost, NULL);
> > >
> > >  static ssize_t show_port_id(struct net_bridge_port *p, char *buf)
> > >  {
> > > -   return sprintf(buf, "0x%x\n", p->port_id);
> > > +   return sprintf(buf, "%d\n", p->port_id);
> > >  }
> > >  static BRPORT_ATTR(port_id, S_IRUGO, show_port_id, NULL);
> > >
> > >  static ssize_t show_port_no(struct net_bridge_port *p, char *buf)
> > >  {
> > > -   return sprintf(buf, "0x%x\n", p->port_no);
> > > +   return sprintf(buf, "%d\n", p->port_no);
> > >  }
> > >
> > >  static BRPORT_ATTR(port_no, S_IRUGO, show_port_no, NULL);
> >
> > No. This would be a visible change to applications.
> > The bridge utilities should be fixed instead.
> 
> Well, my reasoning was that if not the native tools got this right, what are the
> chances that other apps got it right? I don't know any other apps than
> brctl using this info, do you? We could check one or two apps to see what they do.

Unfortunately, that isn't really possible. API's get very hard to change
safely. It isn't worth the effort.

^ permalink raw reply

* 3.4-rc: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
From: Stefan Richter @ 2012-05-01 16:24 UTC (permalink / raw)
  To: netdev

Hi,

I switched from 3.3 to 3.4-rc5 yesterday and am getting the following warning
some time after boot (e.g. this time about two hours after boot).  What's up
with that?

May  1 15:08:35 stein kernel: WARNING: at net/sched/sch_generic.c:256 dev_watchdog+0x16b/0x20f()
May  1 15:08:35 stein kernel: Hardware name: System Product Name
May  1 15:08:35 stein kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
May  1 15:08:35 stein kernel: Modules linked in: usb_storage snd_isight snd_firewire_lib firewire_ohci firewire_core crc_itu_t cpufreq_ondemand hwmon_vid nfsd lockd sunrpc ext4 jbd2 crc16 sr_mod cdrom snd_hda_codec_realtek powernow_k8 freq_table mperf yenta_socket k10temp pcmcia_core pcmcia_rsrc pata_atiixp sg snd_hda_intel snd_hda_codec snd_pcm snd_timer snd snd_page_alloc processor r8169 mii
May  1 15:08:35 stein kernel: Pid: 0, comm: swapper/3 Not tainted 3.4.0-rc5 #4
May  1 15:08:35 stein kernel: Call Trace:
May  1 15:08:35 stein kernel: <IRQ>  [<ffffffff8102562d>] ? warn_slowpath_common+0x78/0x8c
May  1 15:08:35 stein kernel: [<ffffffff810256e2>] ? warn_slowpath_fmt+0x45/0x4a
May  1 15:08:35 stein kernel: [<ffffffff810174c4>] ? lapic_next_event+0x18/0x1f
May  1 15:08:35 stein kernel: [<ffffffff812c2040>] ? dev_watchdog+0x16b/0x20f
May  1 15:08:35 stein kernel: [<ffffffff8103da4c>] ? hrtimer_interrupt+0x100/0x19b
May  1 15:08:35 stein kernel: [<ffffffff8102e3a8>] ? run_timer_softirq+0x177/0x209
May  1 15:08:35 stein kernel: [<ffffffff8104e201>] ? clockevents_program_event+0xaa/0xce
May  1 15:08:35 stein kernel: [<ffffffff812c1ed5>] ? qdisc_reset+0x35/0x35
May  1 15:08:35 stein kernel: [<ffffffff8102a122>] ? __do_softirq+0x7f/0x106
May  1 15:08:35 stein kernel: [<ffffffff8131750c>] ? call_softirq+0x1c/0x30
May  1 15:08:35 stein kernel: [<ffffffff810034ea>] ? do_softirq+0x31/0x67
May  1 15:08:35 stein kernel: [<ffffffff8102a3cf>] ? irq_exit+0x44/0xae
May  1 15:08:35 stein kernel: [<ffffffff8100322b>] ? do_IRQ+0x94/0xad
May  1 15:08:35 stein kernel: [<ffffffff81315c67>] ? common_interrupt+0x67/0x67
May  1 15:08:35 stein kernel: <EOI>  [<ffffffff81008a6c>] ? default_idle+0x20/0x34
May  1 15:08:35 stein kernel: [<ffffffff81008b46>] ? amd_e400_idle+0xc6/0xe2
May  1 15:08:35 stein kernel: [<ffffffff810092c6>] ? cpu_idle+0x62/0x86
May  1 15:08:35 stein kernel: ---[ end trace 11b2855a75e24d28 ]---
May  1 15:08:35 stein kernel: r8169 0000:0b:00.0: eth0: link up

The "link up" message occurs several times per session.

(booted on Apr 30 20:38:16)
Apr 30 20:38:17 stein kernel: r8169 0000:0b:00.0: eth0: link up
Apr 30 23:57:36 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 01:57:46 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 04:20:21 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 04:28:57 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 06:11:35 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 06:13:59 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 07:51:58 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:14:56 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:15:20 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:16:08 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:26:20 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:35:04 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:46:00 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:50:24 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 08:57:48 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 09:07:26 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 10:33:23 stein kernel: r8169 0000:0b:00.0: eth0: link up

(booted on May  1 12:47:45)
May  1 12:47:46 stein kernel: r8169 0000:0b:00.0: eth0: link up
May  1 15:08:35 stein kernel: r8169 0000:0b:00.0: eth0: link up

I do not remember such messages from any earlier kernel.

There are no noticeable issues, but I don't run anything demanding on the
interface right now which would make any downtime obvious.

It is an RTL8168(?) on an Asus M3A78-EM motherboard.

-[0000:00]-+-00.0  Advanced Micro Devices [AMD] RS780 Host Bridge [1022:9600]
           +-01.0-[01]--+-05.0  Advanced Micro Devices [AMD] nee ATI RS780 [Radeon HD 3200] [1002:9610]
           |            \-05.1  Advanced Micro Devices [AMD] nee ATI RS780 HDMI Audio [Radeon HD 3000-3300 Series] [1002:960f]
           +-02.0-[02-07]----00.0-[03-07]--+-01.0-[04]----00.0  Agere Systems FW643 PCI Express 1394b Controller (PHY/Link) [11c1:5901]
           |                               +-02.0-[05]----00.0  Agere Systems FW643 PCI Express 1394b Controller (PHY/Link) [11c1:5901]
           |                               +-03.0-[06]--
           |                               \-04.0-[07]--
           +-04.0-[08-09]----00.0-[09]----00.0  Texas Instruments XIO2213A/B/XIO2221 IEEE-1394b OHCI Controller [Cheetah Express] [104c:823f]
           +-05.0-[0a]----00.0  JMicron Technology Corp. IEEE 1394 Host Controller [197b:2380]
           +-06.0-[0b]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168]
           +-11.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] [1002:4391]
           +-12.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397]
           +-12.1  Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller [1002:4398]
           +-12.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396]
           +-13.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397]
           +-13.1  Advanced Micro Devices [AMD] nee ATI SB7x0 USB OHCI1 Controller [1002:4398]
           +-13.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396]
           +-14.0  Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller [1002:4385]
           +-14.1  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 IDE Controller [1002:439c]
           +-14.2  Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA) [1002:4383]
           +-14.3  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller [1002:439d]
           +-14.4-[0c-10]----06.0  Ricoh Co Ltd RL5c475 [1180:0475]
           +-14.5  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller [1002:4399]
           +-18.0  Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration [1022:1200]
           +-18.1  Advanced Micro Devices [AMD] Family 10h Processor Address Map [1022:1201]
           +-18.2  Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller [1022:1202]
           +-18.3  Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control [1022:1203]
           \-18.4  Advanced Micro Devices [AMD] Family 10h Processor Link Control [1022:1204]

-- 
Stefan Richter
-=====-===-- -=-= ----=
http://arcgraph.de/sr/

^ permalink raw reply

* Re: [PATCH 3/4 v2 net-next] net: make GRO aware of skb->head_frag
From: Alexander Duyck @ 2012-05-01 16:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Duyck, David Miller, netdev, Neal Cardwell, Tom Herbert,
	Jeff Kirsher, Michael Chan, Matt Carlson, Herbert Xu,
	Ben Hutchings, Ilpo Järvinen, Maciej Żenczykowski
In-Reply-To: <1335854378.11396.26.camel@edumazet-glaptop>

On 04/30/2012 11:39 PM, Eric Dumazet wrote:
> On Mon, 2012-04-30 at 22:33 -0700, Alexander Duyck wrote:
>
>> The question I had was more specific to GRO.  As long as we have
>> skb->users == 1 and the skb isn't cloned we should be fine.   It just
>> hadn't occurred to me before that napi_gro_receive had the extra
>> requirement that the skb couldn't be cloned.
>>
> OK
>
> By the way, even if skb was cloned, we would be allowed to steal
> skb->head.
>
> When we clone an oskb we :
>
> 1) allocate a struct nskb sk_buff (or use the shadow in case of TCP)
> 2) increment dataref
The problem I have is with this piece right here.  So you increment
dataref.  Now you have an skb that is still pointing to the shared info
on this page and dataref is 2.  What about the side that is stealing the
head?  Is it going to be tracking the dataref as well and decrementing
it before put_page or does it just assume that dataref is 1 and call
put_page directly?  I am guessing the latter since I didn't see anything
that allowed for tracking the dataref of stolen heads.

> 3) link skb->head
>
> After cloning()  nskb->users == 1, and oskb->users is unchanged
>
> I believe you are referring to skb_get(oskb), that ones increment
> oskb->users and skb_shared(oskb) then returns true.
skb_clone and skb_get are two completely separate things.  My concern
with the skb_get portion is if skb->users is greater than 1 we should
not be freeing the skbuff back to the head cache.  This was addressed
with the fact that GRO is requiring that skb->users be 1 before it is
handed the skb, although I didn't look too closely at the other patches
that are also stealing skb->head.  My concern with skb_clone, as I
mentioned above, is that I am not sure how the dataref tracking will
still work.

It looks like Dave applied the patches last night so I will pull the
latest net-next and do some poking around.  It is possible I am just
being dense and not getting this, but I just feel like there is
something that got missed or overlooked.

Thanks,

Alex

^ permalink raw reply

* [PATCH 1/1] sky2: fix YUKON FE+ rev. A0, ingress VLAN tagged frame problem
From: Michele Cucchi @ 2012-05-01 15:49 UTC (permalink / raw)
  To: Mirko Lindner, Stephen Hemminger; +Cc: netdev, Michele Cucchi

The cards based on chip YUKON FE+ revision A0, fails to correctly receive VLAN tagged frames. The ingress frames stays on raw physical interface and don't reach VLANS allocated interfaces.
The problem is the double call of __vlan_hwaccel_put_tag function, one in the sky2_skb_rx function and the other in the vlans global function vlan_untag.
Adding a check of card version and revision in the sky2_skb_rx function allow to skip the first time call of __vlan_hwaccel_put_tag and to fix the problem.

This fix is tested on 3.4-rc5.

Signed-off-by: Michele Cucchi <cucchimichele@tiscali.it>
---
 drivers/net/ethernet/marvell/sky2.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index c9b504e..2be0351 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2649,7 +2649,10 @@ static inline void sky2_tx_done(struct net_device *dev, u16 last)
 static inline void sky2_skb_rx(const struct sky2_port *sky2,
 			       u32 status, struct sk_buff *skb)
 {
-	if (status & GMR_FS_VLAN)
+	/* Workaround for YUKON FE+ rev. A0, vlan untag problem */
+	if ((status & GMR_FS_VLAN) &&
+	(!(sky2->hw->chip_id == CHIP_ID_YUKON_FE_P &&
+	sky2->hw->chip_rev == CHIP_REV_YU_FE2_A0)))
 		__vlan_hwaccel_put_tag(skb, be16_to_cpu(sky2->rx_tag));

 	if (skb->ip_summed == CHECKSUM_NONE)
-- 
1.7.2.5

^ permalink raw reply related

* Re: [PATCH] mwl8k: Add 0x2a02 PCI device-id (Marvell 88W8361)
From: Lennert Buytenhek @ 2012-05-01 15:49 UTC (permalink / raw)
  To: sedat.dilek
  Cc: John W. Linville, linux-wireless, netdev, linux-kernel, lautriv,
	Jim Cromie, Ben Hutchings, Hauke Mehrtens
In-Reply-To: <CA+icZUV=qtXwVxb+uvQJcsza0y+6k2LHJFFPnECY0M=QWBexaQ@mail.gmail.com>

On Tue, May 01, 2012 at 03:54:49PM +0200, Sedat Dilek wrote:

> Some more concerns...
> Dunno, if [4] is really correct and how to use one and same device-id
> for 8363 ***and*** 8361P:
> ...
> { PCI_VDEVICE(MARVELL, 0x2a02), .driver_data = MWL8363, },

No, you'll have to add a MWL8361P to the enum, and add a
mwl8k_device_info table entry for that, and then add the PCI ID
table entry with the MWL8361P identifier.

^ permalink raw reply

* Re: [PATCH] bridge: make brctl showstp display port id
From: Joakim Tjernlund @ 2012-05-01 14:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20120430090747.1d64198b@nehalam.linuxnetplumber.net>

Stephen Hemminger <shemminger@vyatta.com> wrote on 2012/04/30 18:07:47:
>
> On Mon, 30 Apr 2012 10:39:08 +0200
> Joakim Tjernlund <Joakim.Tjernlund@transmode.se> wrote:
>
> > My brctl showstp br0 always shows a 0 port id:
> > eth2 (1)
> >  port id      0         state             disabled
> >  designated root   8000.00069c00b2fb   path cost       100
> >
> > because port id is printed as a hex number in sys fs. Change the
> > two hex occurrences(port no and port id) to decimal, just like all
> > the other numbers in this area.
> >
> > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> > ---
> >  net/bridge/br_sysfs_if.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> > index fd5799c..9c4c2eb 100644
> > --- a/net/bridge/br_sysfs_if.c
> > +++ b/net/bridge/br_sysfs_if.c
> > @@ -86,13 +86,13 @@ static BRPORT_ATTR(designated_cost, S_IRUGO, show_designated_cost, NULL);
> >
> >  static ssize_t show_port_id(struct net_bridge_port *p, char *buf)
> >  {
> > -   return sprintf(buf, "0x%x\n", p->port_id);
> > +   return sprintf(buf, "%d\n", p->port_id);
> >  }
> >  static BRPORT_ATTR(port_id, S_IRUGO, show_port_id, NULL);
> >
> >  static ssize_t show_port_no(struct net_bridge_port *p, char *buf)
> >  {
> > -   return sprintf(buf, "0x%x\n", p->port_no);
> > +   return sprintf(buf, "%d\n", p->port_no);
> >  }
> >
> >  static BRPORT_ATTR(port_no, S_IRUGO, show_port_no, NULL);
>
> No. This would be a visible change to applications.
> The bridge utilities should be fixed instead.

Well, my reasoning was that if not the native tools got this right, what are the
chances that other apps got it right? I don't know any other apps than
brctl using this info, do you? We could check one or two apps to see what they do.

 Jocke

^ permalink raw reply

* Re: [net-next v2 6/8] ixgbe: add syfs interface for to export read only driver information
From: Ben Greear @ 2012-05-01 14:30 UTC (permalink / raw)
  To: David Miller
  Cc: jeffrey.t.kirsher, donald.c.skidmore, netdev, gospo, sassmann,
	bhutchings
In-Reply-To: <20120501.100241.1409452912879198250.davem@davemloft.net>

On 05/01/2012 07:02 AM, David Miller wrote:
> From: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
> Date: Tue,  1 May 2012 01:51:07 -0700
>
>> From: Don Skidmore<donald.c.skidmore@intel.com>
>>
>> This patch exports non-thermal (which was done via hwmon in an earlier
>> patch) data to sysfs which isn't readily available elsewhere.  All of the
>> fields are read only as this interface is to only export driver data.
>>
>> Signed-off-by: Don Skidmore<donald.c.skidmore@intel.com>
>> Tested-by: Stephen Ko<stephen.s.ko@intel.com>
>> Signed-off-by: Jeff Kirsher<jeffrey.t.kirsher@intel.com>
>
> I don't like it.
>
> Some of this stuff is generic and belongs somewhere like ethtool, for
> example the descriptor sizes and queue sizes.
>
> The others are reading registers, and we have an ethtool API for that
> already.
>
> But putting anything like this in sysfs is pointless, because the
> stuff that other cards have too will then go into differently named
> sysfs files which, as is oft repeated here, is a terrible user
> experience.
>
> If you want to do this right, add a new ethtool interface that allows
> the publication of card specific unchanging values, in a style like
> what we already do for statistics.  Have one query that gets the
> string list, and then another which fetches the actual values.

And if you decide to go forward with this, I have some ideas for API
and would like to discuss it.  Or, if I beat you to it, I'll post some
RFC patches when I get a chance to write them up.

Basically, I'm thinking of a get-strings(flags), get-types(flags), and get-values(flags)
API.  The types would return an array of enums that would define the data
type, like uint32, int8, etc.  Strings would be same as it is today,
just with a new type.  Values would return array of uint64 as it does now.

To each method you could specify a uint32 flags that would be device specific.
I would like to use flags in wifi to specify whether to get the underlying NIC
stats v/s just the soft-device stats, for instance.  And maybe some additional
granularity if some stats are more costly to get than others so that one could
probe expensive stats more rarely....

And while strings would still be free-form, we could at least attempt to use
the same strings for the same kinds of stats where possible.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* [PATCH 1/2] iproute2: allow IPv6 addresses for l2tp local and remote parameters
From: James Chapman @ 2012-05-01 14:25 UTC (permalink / raw)
  To: netdev; +Cc: Chris Elston
In-Reply-To: <1335882323-6219-1-git-send-email-jchapman@katalix.com>

From: Chris Elston <celston@katalix.com>

Adds support for parsing IPv6 addresses to the parameters local and
remote in the l2tp commands. Requires netlink attributes L2TP_ATTR_IP6_SADDR
and L2TP_ATTR_IP6_DADDR, added in a required kernel patch already submitted
to netdev.

Also enables printing of IPv6 addresses returned by the L2TP_CMD_TUNNEL_GET
request.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
---
 ip/ipl2tp.c |   59 +++++++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index c5683f5..a05e1a3 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -50,8 +50,8 @@ struct l2tp_parm {
 	uint8_t cookie[8];
 	int peer_cookie_len;
 	uint8_t peer_cookie[8];
-	struct in_addr local_ip;
-	struct in_addr peer_ip;
+	inet_prefix local_ip;
+	inet_prefix peer_ip;
 
 	uint16_t pw_type;
 	uint16_t mtu;
@@ -97,6 +97,8 @@ static int create_tunnel(struct l2tp_parm *p)
 		struct genlmsghdr	g;
 		char   			buf[1024];
 	} req;
+	uint32_t local_attr = L2TP_ATTR_IP_SADDR;
+	uint32_t peer_attr = L2TP_ATTR_IP_DADDR;
 
 	memset(&req, 0, sizeof(req));
 	req.n.nlmsg_type = genl_family;
@@ -110,8 +112,14 @@ static int create_tunnel(struct l2tp_parm *p)
 	addattr8(&req.n, 1024, L2TP_ATTR_PROTO_VERSION, 3);
 	addattr16(&req.n, 1024, L2TP_ATTR_ENCAP_TYPE, p->encap);
 
-	addattr32(&req.n, 1024, L2TP_ATTR_IP_SADDR, p->local_ip.s_addr);
-	addattr32(&req.n, 1024, L2TP_ATTR_IP_DADDR, p->peer_ip.s_addr);
+	if (p->local_ip.family == AF_INET6)
+		local_attr = L2TP_ATTR_IP6_SADDR;
+	addattr_l(&req.n, 1024, local_attr, &p->local_ip.data, p->local_ip.bytelen);
+
+	if (p->peer_ip.family == AF_INET6)
+		peer_attr = L2TP_ATTR_IP6_DADDR;
+	addattr_l(&req.n, 1024, peer_attr, &p->peer_ip.data, p->peer_ip.bytelen);
+
 	if (p->encap == L2TP_ENCAPTYPE_UDP) {
 		addattr16(&req.n, 1024, L2TP_ATTR_UDP_SPORT, p->local_udp_port);
 		addattr16(&req.n, 1024, L2TP_ATTR_UDP_DPORT, p->peer_udp_port);
@@ -225,13 +233,14 @@ static void print_cookie(char *name, const uint8_t *cookie, int len)
 static void print_tunnel(const struct l2tp_data *data)
 {
 	const struct l2tp_parm *p = &data->config;
+	char buf[INET6_ADDRSTRLEN];
 
 	printf("Tunnel %u, encap %s\n",
 	       p->tunnel_id,
 	       p->encap == L2TP_ENCAPTYPE_UDP ? "UDP" :
 	       p->encap == L2TP_ENCAPTYPE_IP ? "IP" : "??");
-	printf("  From %s ", inet_ntoa(p->local_ip));
-	printf("to %s\n", inet_ntoa(p->peer_ip));
+	printf("  From %s ", inet_ntop(p->local_ip.family, p->local_ip.data, buf, sizeof(buf)));
+	printf("to %s\n", inet_ntop(p->peer_ip.family, p->peer_ip.data, buf, sizeof(buf)));
 	printf("  Peer tunnel %u\n",
 	       p->peer_tunnel_id);
 
@@ -315,10 +324,30 @@ static int get_response(struct nlmsghdr *n, void *arg)
 
 	if (attrs[L2TP_ATTR_RECV_TIMEOUT])
 		p->reorder_timeout = rta_getattr_u64(attrs[L2TP_ATTR_RECV_TIMEOUT]);
-	if (attrs[L2TP_ATTR_IP_SADDR])
-		p->local_ip.s_addr = rta_getattr_u32(attrs[L2TP_ATTR_IP_SADDR]);
-	if (attrs[L2TP_ATTR_IP_DADDR])
-		p->peer_ip.s_addr = rta_getattr_u32(attrs[L2TP_ATTR_IP_DADDR]);
+	if (attrs[L2TP_ATTR_IP_SADDR]) {
+		p->local_ip.family = AF_INET;
+		p->local_ip.data[0] = rta_getattr_u32(attrs[L2TP_ATTR_IP_SADDR]);
+		p->local_ip.bytelen = 4;
+		p->local_ip.bitlen = -1;
+	}
+	if (attrs[L2TP_ATTR_IP_DADDR]) {
+		p->peer_ip.family = AF_INET;
+		p->peer_ip.data[0] = rta_getattr_u32(attrs[L2TP_ATTR_IP_DADDR]);
+		p->peer_ip.bytelen = 4;
+		p->peer_ip.bitlen = -1;
+	}
+	if (attrs[L2TP_ATTR_IP6_SADDR]) {
+		p->local_ip.family = AF_INET6;
+		memcpy(&p->local_ip.data, RTA_DATA(attrs[L2TP_ATTR_IP6_SADDR]),
+			p->local_ip.bytelen = 16);
+		p->local_ip.bitlen = -1;
+	}
+	if (attrs[L2TP_ATTR_IP6_DADDR]) {
+		p->peer_ip.family = AF_INET6;
+		memcpy(&p->peer_ip.data, RTA_DATA(attrs[L2TP_ATTR_IP6_DADDR]),
+			p->peer_ip.bytelen = 16);
+		p->peer_ip.bitlen = -1;
+	}
 	if (attrs[L2TP_ATTR_UDP_SPORT])
 		p->local_udp_port = rta_getattr_u16(attrs[L2TP_ATTR_UDP_SPORT]);
 	if (attrs[L2TP_ATTR_UDP_DPORT])
@@ -529,10 +558,12 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
 			p->ifname = *argv;
 		} else if (strcmp(*argv, "remote") == 0) {
 			NEXT_ARG();
-			p->peer_ip.s_addr = get_addr32(*argv);
+			if (get_addr(&p->peer_ip, *argv, AF_UNSPEC))
+				invarg("invalid remote address\n", *argv);
 		} else if (strcmp(*argv, "local") == 0) {
 			NEXT_ARG();
-			p->local_ip.s_addr = get_addr32(*argv);
+			if (get_addr(&p->local_ip, *argv, AF_UNSPEC))
+				invarg("invalid local address\n", *argv);
 		} else if ((strcmp(*argv, "tunnel_id") == 0) ||
 			   (strcmp(*argv, "tid") == 0)) {
 			__u32 uval;
@@ -648,10 +679,10 @@ static int do_add(int argc, char **argv)
 		missarg("peer_tunnel_id");
 
 	if (p.tunnel) {
-		if (p.local_ip.s_addr == 0)
+		if (p.local_ip.family == AF_UNSPEC)
 			missarg("local");
 
-		if (p.peer_ip.s_addr == 0)
+		if (p.peer_ip.family == AF_UNSPEC)
 			missarg("remote");
 
 		if (p.encap == L2TP_ENCAPTYPE_UDP) {
-- 
1.7.0.4

^ permalink raw reply related

* [PATCH 2/2] iproute2: add ip-l2tp man page
From: James Chapman @ 2012-05-01 14:25 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1335882323-6219-1-git-send-email-jchapman@katalix.com>

Add a man page to cover the "ip l2tp" commands. Add a reference to it
in the main ip page.

This version removes the unnecessary setting of promiscuous mode
in the examples.

Signed-off-by: James Chapman <jchapman@katalix.com>
---
 man/man8/ip-l2tp.8 |  376 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 man/man8/ip.8      |    3 +-
 2 files changed, 378 insertions(+), 1 deletions(-)
 create mode 100644 man/man8/ip-l2tp.8

diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
new file mode 100644
index 0000000..18a83d4
--- /dev/null
+++ b/man/man8/ip-l2tp.8
@@ -0,0 +1,376 @@
+.TH IP\-L2TP 8 "19 Apr 2012" "iproute2" "Linux"
+.SH "NAME"
+ip-l2tp - L2TPv3 static unmanaged tunnel configuration
+.SH "SYNOPSIS"
+.sp
+.ad l
+.in +8
+.ti -8
+.B ip
+.RI "[ " OPTIONS " ]"
+.B l2tp
+.RI " { " COMMAND " | "
+.BR help " }"
+.sp
+.ti -8
+.BR "ip l2tp add tunnel"
+.br
+.B remote
+.RI "[ " ADDR " ]"
+.B local
+.RI "[ " ADDR " ]"
+.br
+.B tunnel_id
+.IR ID
+.B peer_tunnel_id
+.IR ID
+.br
+.RB "[ " encap " { " ip " | " udp " } ]"
+.br
+.RB "[ " udp_sport
+.IR PORT
+.RB " ] [ " udp_dport
+.IR PORT
+.RB " ]"
+.br
+.ti -8
+.BR "ip l2tp add session"
+.RB "[ " name
+.IR NAME
+.RB " ]"
+.br
+.B tunnel_id
+.IR ID
+.B session_id
+.IR ID
+.B peer_session_id
+.IR ID
+.br
+.RB "[ " cookie
+.IR HEXSTR
+.RB " ] [ " peer_cookie
+.IR HEXSTR
+.RB " ]"
+.br
+.RB "[ " offset
+.IR OFFSET
+.RB " ] [ " peer_offset
+.IR OFFSET
+.RB " ]"
+.br
+.ti -8
+.BR "ip l2tp del tunnel"
+.B tunnel_id
+.IR ID
+.br
+.ti -8
+.BR "ip l2tp del session"
+.B tunnel_id
+.IR ID
+.B session_id
+.IR ID
+.br
+.ti -8
+.BR "ip l2tp show tunnel"
+.B "[" tunnel_id
+.IR ID
+.B "]"
+.br
+.ti -8
+.BR "ip l2tp show session"
+.B "[" tunnel_id
+.IR ID
+.B "] [" session_id
+.IR ID
+.B "]"
+.br
+.ti -8
+.IR NAME " := "
+.IR STRING
+.ti -8
+.IR ADDR " := { " IP_ADDRESS " }"
+.ti -8
+.IR PORT " := { " NUMBER " }"
+.ti -8
+.IR ID " := { " NUMBER " }"
+.ti -8
+.ti -8
+.IR HEXSTR " := { 8 or 16 hex digits (4 / 8 bytes) }"
+.SH DESCRIPTION
+The
+.B ip l2tp
+commands are used to establish static, or so-called
+.I unmanaged
+L2TPv3 ethernet tunnels. For unmanaged tunnels, there is no L2TP
+control protocol so no userspace daemon is required - tunnels are
+manually created by issuing commands at a local system and at a remote
+peer.
+.PP
+L2TPv3 is suitable for Layer-2 tunnelling. Static tunnels are useful
+to establish network links across IP networks when the tunnels are
+fixed. L2TPv3 tunnels can carry data of more than one session. Each
+session is identified by a session_id and its parent tunnel's
+tunnel_id. A tunnel must be created before a session can be created in
+the tunnel.
+.PP
+When creating an L2TP tunnel, the IP address of the remote peer is
+specified, which can be either an IPv4 or IPv6 address. The local IP
+address to be used to reach the peer must also be specified. This is
+the address on which the local system will listen for and accept
+received L2TP data packets from the peer.
+.PP
+L2TPv3 defines two packet encapsulation formats: UDP or IP. UDP
+encapsulation is most common. IP encapsulation uses a dedicated IP
+protocol value to carry L2TP data without the overhead of UDP. Use IP
+encapsulation only when there are no NAT devices or firewalls in the
+network path.
+.PP
+When an L2TPv3 ethernet session is created, a virtual network
+interface is created for the session, which must then be configured
+and brought up, just like any other network interface. When data is
+passed through the interface, it is carried over the L2TP tunnel to
+the peer. By configuring the system's routing tables or adding the
+interface to a bridge, the L2TP interface is like a virtual wire
+(pseudowire) connected to the peer.
+.PP
+Establishing an unmanaged L2TPv3 ethernet pseudowire involves manually
+creating L2TP contexts on the local system and at the peer. Parameters
+used at each site must correspond or no data will be passed. No
+consistency checks are possible since there is no control protocol
+used to establish unmanaged L2TP tunnels. Once the virtual network
+interface of a given L2TP session is configured and enabled, data can
+be transmitted, even if the peer isn't yet configured. If the peer
+isn't configured, the L2TP data packets will be discarded by
+the peer.
+.PP
+To establish an unmanaged L2TP tunnel, use
+.B l2tp add tunnel
+and
+.B l2tp add session
+commands described in this document. Then configure and enable the
+tunnel's virtual network interface, as required.
+.PP
+Note that unmanaged tunnels carry only ethernet frames. If you need to
+carry PPP traffic (L2TPv2) or your peer doesn't support unmanaged
+L2TPv3 tunnels, you will need an L2TP server which implements the L2TP
+control protocol. The L2TP control protocol allows dynamic L2TP
+tunnels and sessions to be established and provides for detecting and
+acting upon network failures.
+.SS ip l2tp add tunnel - add a new tunnel
+.TP
+.BI name " NAME "
+sets the session network interface name. Default is l2tpethN.
+.TP
+.BI tunnel_id " ID"
+set the tunnel id, which is a 32-bit integer value. Uniquely
+identifies the tunnel. The value used must match the peer_tunnel_id
+value being used at the peer.
+.TP
+.BI peer_tunnel_id " ID"
+set the peer tunnel id, which is a 32-bit integer value assigned to
+the tunnel by the peer. The value used must match the tunnel_id value
+being used at the peer.
+.TP
+.BI remote " ADDR"
+set the IP address of the remote peer. May be specified as an IPv4
+address or an IPv6 address.
+.TP
+.BI local " ADDR"
+set the IP address of the local interface to be used for the
+tunnel. This address must be the address of a local interface. May be
+specified as an IPv4 address or an IPv6 address.
+.TP
+.BI encap " ENCAP"
+set the encapsulation type of the tunnel.
+.br
+Valid values for encapsulation are:
+.BR udp ", " ip "."
+.TP
+.BI udp_sport " PORT"
+set the UDP source port to be used for the tunnel. Must be present
+when udp encapsulation is selected. Ignored when ip encapsulation is
+selected.
+.TP
+.BI udp_dport " PORT"
+set the UDP destination port to be used for the tunnel. Must be
+present when udp encapsulation is selected. Ignored when ip
+encapsulation is selected.
+.SS ip l2tp del tunnel - destroy a tunnel
+.TP
+.BI tunnel_id " ID"
+set the tunnel id of the tunnel to be deleted. All sessions within the
+tunnel must be deleted first.
+.SS ip l2tp show tunnel - show information about tunnels
+.TP
+.BI tunnel_id " ID"
+set the tunnel id of the tunnel to be shown. If not specified,
+information about all tunnels is printed.
+.SS ip l2tp add session - add a new session to a tunnel
+.TP
+.BI name " NAME "
+sets the session network interface name. Default is l2tpethN.
+.TP
+.BI tunnel_id " ID"
+set the tunnel id, which is a 32-bit integer value. Uniquely
+identifies the tunnel into which the session will be created. The
+tunnel must already exist.
+.TP
+.BI session_id " ID"
+set the session id, which is a 32-bit integer value. Uniquely
+identifies the session being created. The value used must match the
+peer_session_id value being used at the peer.
+.TP
+.BI peer_session_id " ID"
+set the peer session id, which is a 32-bit integer value assigned to
+the session by the peer. The value used must match the session_id
+value being used at the peer.
+.TP
+.BI cookie " HEXSTR"
+sets an optional cookie value to be assigned to the session. This is a
+4 or 8 byte value, specified as 8 or 16 hex digits,
+e.g. 014d3636deadbeef. The value must match the peer_cookie value set
+at the peer. The cookie value is carried in L2TP data packets and is
+checked for expected value at the peer. Default is to use no cookie.
+.TP
+.BI peer_cookie " HEXSTR"
+sets an optional peer cookie value to be assigned to the session. This
+is a 4 or 8 byte value, specified as 8 or 16 hex digits,
+e.g. 014d3636deadbeef. The value must match the cookie value set at
+the peer. It tells the local system what cookie value to expect to
+find in received L2TP packets. Default is to use no cookie.
+.TP
+.BI offset " OFFSET"
+sets the byte offset from the L2TP header where user data starts in
+transmitted L2TP data packets. This is hardly ever used. If set, the
+value must match the peer_offset value used at the peer. Default is 0.
+.TP
+.BI peer_offset " OFFSET"
+sets the byte offset from the L2TP header where user data starts in
+received L2TP data packets. This is hardly ever used. If set, the
+value must match the offset value used at the peer. Default is 0.
+.SS ip l2tp del session - destroy a session
+.TP
+.BI tunnel_id " ID"
+set the tunnel id in which the session to be deleted is located.
+.TP
+.BI session_id " ID"
+set the session id of the session to be deleted.
+.SS ip l2tp show session - show information about sessions
+.TP
+.BI tunnel_id " ID"
+set the tunnel id of the session(s) to be shown. If not specified,
+information about sessions in all tunnels is printed.
+.TP
+.BI session_id " ID"
+set the session id of the session to be shown. If not specified,
+information about all sessions is printed.
+.SH EXAMPLES
+.PP
+.SS Setup L2TP tunnels and sessions
+.nf
+site-A:# ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 \\
+           encap udp local 1.2.3.4 remote 5.6.7.8 \\
+           udp_sport 5000 udp_dport 6000
+site-A:# ip l2tp add session tunnel_id 3000 session_id 1000 \\
+           peer_session_id 2000
+
+site-B:# ip l2tp add tunnel tunnel_id 4000 peer_tunnel_id 3000 \\
+           encap udp local 5.6.7.8 remote 1.2.3.4 \\
+           udp_sport 6000 udp_dport 5000
+site-B:# ip l2tp add session tunnel_id 4000 session_id 2000 \\
+           peer_session_id 1000
+
+site-A:# ip link set l2tpeth0 up mtu 1488
+
+site-B:# ip link set l2tpeth0 up mtu 1488
+.fi
+.PP
+Notice that the IP addresses, UDP ports and tunnel / session ids are
+matched and reversed at each site.
+.SS Configure as IP interfaces
+The two interfaces can be configured with IP addresses if only IP data
+is to be carried. This is perhaps the simplest configuration.
+.PP
+.nf
+site-A:# ip addr add 10.42.1.1 peer 10.42.1.2 dev l2tpeth0
+
+site-B:# ip addr add 10.42.1.2 peer 10.42.1.1 dev l2tpeth0
+
+site-A:# ping 10.42.1.2
+.fi
+.PP
+Now the link should be usable. Add static routes as needed to have
+data sent over the new link.
+.PP
+.SS Configure as bridged interfaces
+To carry non-IP data, the L2TP network interface is added to a bridge
+instead of being assigned its own IP address, using standard Linux
+utilities. Since raw ethernet frames are then carried inside the
+tunnel, the MTU of the L2TP interfaces must be set to allow space for
+those headers.
+.PP
+.nf
+site-A:# ip link set l2tpeth0 up mtu 1446
+site-A:# brctl addbr br0
+site-A:# brctl addif br0 l2tpeth0
+site-A:# brctl addif br0 eth0
+site-A:# ip link set br0 up
+.fi
+.PP
+If you are using VLANs, setup a bridge per VLAN and bridge each VLAN
+over a separate L2TP session. For example, to bridge VLAN ID 5 on eth1
+over an L2TP pseudowire:
+.PP
+.nf
+site-A:# ip link set l2tpeth0 up mtu 1446
+site-A:# brctl addbr brvlan5
+site-A:# brctl addif brvlan5 l2tpeth0.5
+site-A:# brctl addif brvlan5 eth1.5
+site-A:# ip link set brvlan5 up
+.fi
+.PP
+Adding the L2TP interface to a bridge causes the bridge to forward
+traffic over the L2TP pseudowire just like it forwards over any other
+interface. The bridge learns MAC addresses of hosts attached to each
+interface and intelligently forwards frames from one bridge port to
+another. IP addresses are not assigned to the l2tpethN interfaces. If
+the bridge is correctly configured at both sides of the L2TP
+pseudowire, it should be possible to reach hosts in the peer's bridged
+network.
+.PP
+When raw ethernet frames are bridged across an L2TP tunnel, large
+frames may be fragmented and forwarded as individual IP fragments to
+the recipient, depending on the MTU of the physical interface used by
+the tunnel. When the ethernet frames carry protocols which are
+reassembled by the recipient, like IP, this isn't a problem. However,
+such fragmentation can cause problems for protocols like PPPoE where
+the recipient expects to receive ethernet frames exactly as
+transmitted. In such cases, it is important that frames leaving the
+tunnel are reassembled back into a single frame before being
+forwarded on. To do so, enable netfilter connection tracking
+(conntrack) or manually load the Linux netfilter degrag modules at
+each tunnel endpoint.
+.PP
+.nf
+site-A:# modprobe nf_degrag_ipv4
+
+site-B:# modprobe nf_degrag_ipv4
+.fi
+.PP
+If L2TP is being used over IPv6, use the IPv6 degrag module.
+.SH INTEROPABILITY
+.PP
+Unmanaged (static) L2TPv3 tunnels are supported by some network
+equipment equipment vendors such as Cisco.
+.PP
+In Linux, L2TP Hello messages are not supported in unmanaged
+tunnels. Hello messages are used by L2TP clients and servers to detect
+link failures in order to automate tearing down and reestablishing
+dynamic tunnels. If a non-Linux peer supports Hello messages in
+unmanaged tunnels, it must be turned off to interoperate with Linux.
+.SH SEE ALSO
+.br
+.BR brctl (8)
+.BR ip (8)
+.SH AUTHOR
+James Chapman <jchapman@katalix.com>
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 0f9f454..ede3d12 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -113,7 +113,7 @@ host addresses.
 
 .TP
 .B l2tp
-- tunnel PPP over IP (L2TP).
+- tunnel ethernet over IP (L2TPv3).
 
 .TP
 .B link
@@ -205,6 +205,7 @@ was written by Alexey N. Kuznetsov and added in Linux 2.2.
 .SH SEE ALSO
 .BR ip-address (8),
 .BR ip-addrlabel (8),
+.BR ip-l2tp (8),
 .BR ip-link (8),
 .BR ip-maddress (8),
 .BR ip-monitor (8),
-- 
1.7.0.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox