Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: TCP delayed ACK heuristic
From: Cong Wang @ 2012-12-19  6:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Ben Greear, David Miller, Stephen Hemminger, Rick Jones,
	Thomas Graf
In-Reply-To: <1355848224.9380.30.camel@edumazet-glaptop>

On Tue, 2012-12-18 at 08:30 -0800, Eric Dumazet wrote:
>  
> 
> ACKS might also be delayed because of bidirectional traffic, and is more
> controlled by the application response time. TCP stack can not easily
> estimate it.

So we still need a knob?

> 
> If you focus on bulk receive, LRO/GRO should already lower number of
> ACKS to an acceptable level and without major disruption.

Indeed.

> 
> Stretch acks are not only the receiver concern, there are issues for the
> sender that you cannot always control/change.
> 
> I recommend reading RFC2525 2.13
> 

Very helpful information!

On the sender's side, it needs to "notify" the receiver not to send
stretch acks when it is in slow-start. But I think the receiver can
detect slow-start too on its own side (based one the window size?).

Thanks!

^ permalink raw reply

* RE: TCP delayed ACK heuristic
From: Cong Wang @ 2012-12-19  7:00 UTC (permalink / raw)
  To: David Laight
  Cc: netdev, Ben Greear, David Miller, Eric Dumazet, Stephen Hemminger,
	Rick Jones, Thomas Graf
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B70F4@saturn3.aculab.com>

On Tue, 2012-12-18 at 16:39 +0000, David Laight wrote:
> There are problems with only implementing the acks
> specified by RFC1122. 

Yeah, the problem is if we can violate this RFC for getting better
performance. Or it is just a no-no?

Although RFC 2525 mentions this as "Stretch ACK Violation", I am still
not sure if that means we can violate RFC1122 legally.

Thanks.

^ permalink raw reply

* Re: [PATCH] bridge: Correctly encode addresses when dumping mdb entries
From: Cong Wang @ 2012-12-19  7:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1355867648-17316-1-git-send-email-vyasevic@redhat.com>

On Tue, 18 Dec 2012 at 21:54 GMT, Vlad Yasevich <vyasevic@redhat.com> wrote:
> When dumping mdb table, set the addresses the kernel returns
> based on the address protocol type.
>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

Looks good.

Acked-by: Cong Wang <amwang@redhat.com>

^ permalink raw reply

* Re: [PATCH v2 2/2] net: asix: handle packets crossing URB boundaries
From: Christian Riesch @ 2012-12-19  7:50 UTC (permalink / raw)
  To: Lucas Stach
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Oliver Neukum,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <1355844083-14285-2-git-send-email-dev-8ppwABl0HbeELgA04lAiVw@public.gmane.org>

Hi Lucas,

On 2012-12-18 16:21, Lucas Stach wrote:
> ASIX AX88772B started to pack data even more tightly. Packets and the ASIX packet
> header may now cross URB boundaries. To handle this we have to introduce
> some state between individual calls to asix_rx_fixup().

cc'ed Eric Dumazet, he did some asix_rx_fixup fixes in spring.

>
> Signed-off-by: Lucas Stach <dev-8ppwABl0HbeELgA04lAiVw@public.gmane.org>
> ---
> I've running this patch for some weeks already now and it gets rid of all
> the commonly seen rx failures with AX88772B.
>
> v2: don't forget to free driver_private
> ---
>   drivers/net/usb/asix.h         | 11 ++++++
>   drivers/net/usb/asix_common.c  | 81 +++++++++++++++++++++++++++++-------------
>   drivers/net/usb/asix_devices.c | 17 +++++++++
>   3 Dateien geändert, 85 Zeilen hinzugefügt(+), 24 Zeilen entfernt(-)

Your patch breaks the driver of the AX88172A in 
drivers/net/usb/ax88172a.c. It uses the asix_rx_fixup() function as 
well, but you did not add the struct asix_rx_fixup_info to its 
driver_priv struct.

Regards, Christian

>
> diff --git a/drivers/net/usb/asix.h b/drivers/net/usb/asix.h
> index 7afe8ac..4dfdbf6 100644
> --- a/drivers/net/usb/asix.h
> +++ b/drivers/net/usb/asix.h
> @@ -167,6 +167,17 @@ struct asix_data {
>   	u8 res;
>   };
>
> +struct asix_rx_fixup_info {
> +	struct sk_buff *ax_skb;
> +	u32 header;
> +	u16 size;
> +	bool split_head;
> +};
> +
> +struct asix_private {
> +	struct asix_rx_fixup_info rx_fixup_info;
> +};
> +
>   /* ASIX specific flags */
>   #define FLAG_EEPROM_MAC		(1UL << 0)  /* init device MAC from eeprom */
>
> diff --git a/drivers/net/usb/asix_common.c b/drivers/net/usb/asix_common.c
> index 50d1673..17f9801 100644
> --- a/drivers/net/usb/asix_common.c
> +++ b/drivers/net/usb/asix_common.c
> @@ -53,44 +53,77 @@ void asix_write_cmd_async(struct usbnet *dev, u8 cmd, u16 value, u16 index,
>
>   int asix_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
>   {
> +	struct asix_private *dp = dev->driver_priv;
> +	struct asix_rx_fixup_info *rx = &dp->rx_fixup_info;
>   	int offset = 0;
>
> -	while (offset + sizeof(u32) < skb->len) {
> -		struct sk_buff *ax_skb;
> -		u16 size;
> -		u32 header = get_unaligned_le32(skb->data + offset);
> -
> -		offset += sizeof(u32);
> -
> -		/* get the packet length */
> -		size = (u16) (header & 0x7ff);
> -		if (size != ((~header >> 16) & 0x07ff)) {
> -			netdev_err(dev->net, "asix_rx_fixup() Bad Header Length\n");
> -			return 0;
> +	while (offset + sizeof(u16) <= skb->len) {
> +		u16 remaining = 0;
> +		unsigned char *data;
> +
> +		if (!rx->size) {
> +			if ((skb->len - offset == sizeof(u16)) ||
> +			    rx->split_head) {
> +				if(!rx->split_head) {
> +					rx->header = get_unaligned_le16(
> +							skb->data + offset);
> +					rx->split_head = true;
> +					offset += sizeof(u16);
> +					break;
> +				} else {
> +					rx->header |= (get_unaligned_le16(
> +							skb->data + offset)
> +							<< 16);
> +					rx->split_head = false;
> +					offset += sizeof(u16);
> +				}
> +			} else {
> +				rx->header = get_unaligned_le32(skb->data +
> +								offset);
> +				offset += sizeof(u32);
> +			}
> +
> +			/* get the packet length */
> +			rx->size = (u16) (rx->header & 0x7ff);
> +			if (rx->size != ((~rx->header >> 16) & 0x7ff)) {
> +				netdev_err(dev->net, "asix_rx_fixup() Bad Header Length 0x%x, offset %d\n",
> +					   rx->header, offset);
> +				rx->size = 0;
> +				return 0;
> +			}
> +			rx->ax_skb = netdev_alloc_skb_ip_align(dev->net,
> +							       rx->size);
> +			if (!rx->ax_skb)
> +				return 0;
>   		}
>
> -		if ((size > dev->net->mtu + ETH_HLEN + VLAN_HLEN) ||
> -		    (size + offset > skb->len)) {
> +		if (rx->size > dev->net->mtu + ETH_HLEN + VLAN_HLEN) {
>   			netdev_err(dev->net, "asix_rx_fixup() Bad RX Length %d\n",
> -				   size);
> +				   rx->size);
> +			kfree_skb(rx->ax_skb);
>   			return 0;
>   		}
> -		ax_skb = netdev_alloc_skb_ip_align(dev->net, size);
> -		if (!ax_skb)
> -			return 0;
>
> -		skb_put(ax_skb, size);
> -		memcpy(ax_skb->data, skb->data + offset, size);
> -		usbnet_skb_return(dev, ax_skb);
> +		if (rx->size > skb->len - offset) {
> +			remaining = rx->size - (skb->len - offset);
> +			rx->size = skb->len - offset;
> +		}
> +
> +		data = skb_put(rx->ax_skb, rx->size);
> +		memcpy(data, skb->data + offset, rx->size);
> +		if (!remaining)
> +			usbnet_skb_return(dev, rx->ax_skb);
>
> -		offset += (size + 1) & 0xfffe;
> +		offset += (rx->size + 1) & 0xfffe;
> +		rx->size = remaining;
>   	}
>
>   	if (skb->len != offset) {
> -		netdev_err(dev->net, "asix_rx_fixup() Bad SKB Length %d\n",
> -			   skb->len);
> +		netdev_err(dev->net, "asix_rx_fixup() Bad SKB Length %d, %d\n",
> +			   skb->len, offset);
>   		return 0;
>   	}
> +
>   	return 1;
>   }
>
> diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
> index 0ecc3bc..c651243 100644
> --- a/drivers/net/usb/asix_devices.c
> +++ b/drivers/net/usb/asix_devices.c
> @@ -495,9 +495,19 @@ static int ax88772_bind(struct usbnet *dev, struct usb_interface *intf)
>   		dev->rx_urb_size = 2048;
>   	}
>
> +	dev->driver_priv = kzalloc(sizeof(struct asix_private), GFP_KERNEL);
> +	if (!dev->driver_priv)
> +		return -ENOMEM;
> +
>   	return 0;
>   }
>
> +void ax88772_unbind(struct usbnet *dev, struct usb_interface *intf)
> +{
> +	if (dev->driver_priv)
> +		kfree(dev->driver_priv);
> +}
> +
>   static const struct ethtool_ops ax88178_ethtool_ops = {
>   	.get_drvinfo		= asix_get_drvinfo,
>   	.get_link		= asix_get_link,
> @@ -829,6 +839,10 @@ static int ax88178_bind(struct usbnet *dev, struct usb_interface *intf)
>   		dev->rx_urb_size = 2048;
>   	}
>
> +	dev->driver_priv = kzalloc(sizeof(struct asix_private), GFP_KERNEL);
> +	if (!dev->driver_priv)
> +			return -ENOMEM;
> +
>   	return 0;
>   }
>
> @@ -875,6 +889,7 @@ static const struct driver_info hawking_uf200_info = {
>   static const struct driver_info ax88772_info = {
>   	.description = "ASIX AX88772 USB 2.0 Ethernet",
>   	.bind = ax88772_bind,
> +	.unbind = ax88772_unbind,
>   	.status = asix_status,
>   	.link_reset = ax88772_link_reset,
>   	.reset = ax88772_reset,
> @@ -886,6 +901,7 @@ static const struct driver_info ax88772_info = {
>   static const struct driver_info ax88772b_info = {
>   	.description = "ASIX AX88772B USB 2.0 Ethernet",
>   	.bind = ax88772_bind,
> +	.unbind = ax88772_unbind,
>   	.status = asix_status,
>   	.link_reset = ax88772_link_reset,
>   	.reset = ax88772_reset,
> @@ -899,6 +915,7 @@ static const struct driver_info ax88772b_info = {
>   static const struct driver_info ax88178_info = {
>   	.description = "ASIX AX88178 USB 2.0 Ethernet",
>   	.bind = ax88178_bind,
> +	.unbind = ax88772_unbind,
>   	.status = asix_status,
>   	.link_reset = ax88178_link_reset,
>   	.reset = ax88178_reset,
>

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: Julian Anastasov @ 2012-12-19  8:11 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev
In-Reply-To: <1355898095-7444-1-git-send-email-roy.qing.li@gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2288 bytes --]


	Hello,

On Wed, 19 Dec 2012, roy.qing.li@gmail.com wrote:

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> 1. ECN uses the two least significant (right-most) bits of the DiffServ
> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
> 
> 2. When setting CE, we should check if ECN Capable Transport supports,
> both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
>     00: Non ECN-Capable Transport — Non-ECT
>     10: ECN Capable Transport — ECT(0)
>     01: ECN Capable Transport — ECT(1)
>     11: Congestion Encountered — CE
> 
> 3. Remove the misunderstand comment
> 
> 4. fix the checksum computation
> 
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
>  include/net/inet_ecn.h |   22 ++++------------------
>  1 file changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
> index aab7375..545a683 100644
> --- a/include/net/inet_ecn.h
> +++ b/include/net/inet_ecn.h
> @@ -73,27 +73,13 @@ static inline void INET_ECN_dontxmit(struct sock *sk)
>  
>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>  {
> -	u32 check = (__force u32)iph->check;
> -	u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
> -
> -	/*
> -	 * After the last operation we have (in binary):
> -	 * INET_ECN_NOT_ECT => 01
> -	 * INET_ECN_ECT_1   => 10
> -	 * INET_ECN_ECT_0   => 11
> -	 * INET_ECN_CE      => 00
> -	 */

	I think, the above comment explains how an
increment (iph->tos + 1) serves the purpose to check
for ECT_1 and ECT_0, there is no such thing as
addressing the next byte from header. It is just an
optimized logic that avoids complex INET_ECN_is_XXX
checks.

> -	if (!(ecn & 2))
> +	u32 ecn = iph->tos & INET_ECN_MASK;
> +
> +	if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>  		return !ecn;

	May be return INET_ECN_is_ce(ecn) ?

>  
> -	/*
> -	 * The following gives us:
> -	 * INET_ECN_ECT_1 => check += htons(0xFFFD)
> -	 * INET_ECN_ECT_0 => check += htons(0xFFFE)
> -	 */
> -	check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
> +	csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
>  
> -	iph->check = (__force __sum16)(check + (check>=0xFFFF));
>  	iph->tos |= INET_ECN_CE;
>  	return 1;
>  }
> -- 
> 1.7.10.4

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Shmulik Ladkani @ 2012-12-19  8:10 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <1355857263-31197-1-git-send-email-vyasevic@redhat.com>

Thanks Vlad,

On Tue, 18 Dec 2012 14:00:51 -0500 Vlad Yasevich <vyasevic@redhat.com> wrote:
> A single vlan may also be designated as untagged.  Any untagged traffic
> recieved by the port will be assigned to this vlan.

Why the "untagged vlan" is per-bridge global?
Usually, 802.1q switches define the PVID (port's VID) which controls
the value of VID, in case ingress frame is either untagged or
priority-tagged (per port configuration).
This gives greater flexibility.

> Any traffic exiting
> the port with a VID matching the untagged vlan will exit untagged (the
> bridge will strip the vlan header).  This is similar to "Native Vlan" support
> available in most switches.

802.1q switches usually allow conifguring per-vlan, per-port
tagged/untagged egress policy: each vid has its port membership map and
an accompanying port egress-policy map.
This gives great flexibility defining all sorts of configurations.

Personally, I'd prefer a fully flexible vlan bridge allowing all sorts
of configurations (as available in 802.1q switches).

What's the reason limiting such configurations?

Regards,
Shmulik

^ permalink raw reply

* Re: [patch net-next 1/4] net: add change_carrier netdev op
From: Jiri Pirko @ 2012-12-19  8:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: netdev, davem, edumazet, bhutchings, mirqus, shemminger, greearb,
	fbl, john.r.fastabend
In-Reply-To: <1355871771.30992.44.camel@dcbw.foobar.com>

Wed, Dec 19, 2012 at 12:02:51AM CET, dcbw@redhat.com wrote:
>On Tue, 2012-12-18 at 23:14 +0100, Jiri Pirko wrote:
>> This allows a driver to register change_carrier callback which will be
>> called whenever user will like to change carrier state. This is useful
>> for devices like dummy, gre, team and so on.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/netdevice.h |  9 +++++++++
>>  net/core/dev.c            | 19 +++++++++++++++++++
>>  2 files changed, 28 insertions(+)
>> 
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 02e0f6b..8047330 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -891,6 +891,11 @@ struct netdev_fcoe_hbainfo {
>>   * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh)
>>   * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
>>   *			     struct net_device *dev)
>> + *
>> + * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
>> + *	Called to update device carrier. Soft-devices which do not manage
>> + *	real hardware like dummy, team, etc. can define this to provide
>> + *	possibility to set their carrier state.
>
>How about something like:
>
>---
>Called to change device carrier.  Virtual devices (like dummy, team,
>etc) which do not represent real hardware may define this to allow their
>userspace components to manage their virtual carrier state.  Devices
>that determine carrier state from physical hardware properties (eg
>network cables) or protocol-dependent mechanisms (eg
>USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
>---
>
>I'm just worried that without some clearer wording, somebody will end up
>implementing the function for an actual hardware driver down the road
>and expect things to work when they change the carrier to "on" but the
>protocol isn't set up or cable isn't plugged in.  I'm also worried that
>it might be used to simply work around bugs in the drivers that should
>be fixed in the drivers instead.

Okay - I'll repost this single patch with your text as comment.
Thanks.

>
>Dan
>
>>  struct net_device_ops {
>>  	int			(*ndo_init)(struct net_device *dev);
>> @@ -1008,6 +1013,8 @@ struct net_device_ops {
>>  	int			(*ndo_bridge_getlink)(struct sk_buff *skb,
>>  						      u32 pid, u32 seq,
>>  						      struct net_device *dev);
>> +	int			(*ndo_change_carrier)(struct net_device *dev,
>> +						      bool new_carrier);
>>  };
>>  
>>  /*
>> @@ -2194,6 +2201,8 @@ extern int		dev_set_mtu(struct net_device *, int);
>>  extern void		dev_set_group(struct net_device *, int);
>>  extern int		dev_set_mac_address(struct net_device *,
>>  					    struct sockaddr *);
>> +extern int		dev_change_carrier(struct net_device *,
>> +					   bool new_carrier);
>>  extern int		dev_hard_start_xmit(struct sk_buff *skb,
>>  					    struct net_device *dev,
>>  					    struct netdev_queue *txq);
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index d0cbc93..268a714 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -5027,6 +5027,25 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
>>  }
>>  EXPORT_SYMBOL(dev_set_mac_address);
>>  
>> +/**
>> + *	dev_change_carrier - Change device carrier
>> + *	@dev: device
>> + *	@new_carries: new value
>> + *
>> + *	Change device carrier
>> + */
>> +int dev_change_carrier(struct net_device *dev, bool new_carrier)
>> +{
>> +	const struct net_device_ops *ops = dev->netdev_ops;
>> +
>> +	if (!ops->ndo_change_carrier)
>> +		return -EOPNOTSUPP;
>> +	if (!netif_device_present(dev))
>> +		return -ENODEV;
>> +	return ops->ndo_change_carrier(dev, new_carrier);
>> +}
>> +EXPORT_SYMBOL(dev_change_carrier);
>> +
>>  /*
>>   *	Perform the SIOCxIFxxx calls, inside rcu_read_lock()
>>   */
>
>

^ permalink raw reply

* Re: [PATCH V2 00/12] Add basic VLAN support to bridges
From: Jiri Pirko @ 2012-12-19  8:27 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, shemminger, davem, or.gerlitz, jhs, mst
In-Reply-To: <50D0F23D.4020508@redhat.com>

Tue, Dec 18, 2012 at 11:46:21PM CET, vyasevic@redhat.com wrote:
>On 12/18/2012 05:32 PM, Jiri Pirko wrote:
>>
>>
>>I see that this patchset replicates a lot of code which is already
>>present in net/8021q/ or include/linux/if_vlan.h. I think it would
>>be nice to move this code into some "common" place, wouldn't it?
>>
>
>The only replication that I am aware of is in br_vlan_untag().  I
>thought about pulling that piece out, but I think there is a reason
>why it's not available when 801q support isn't turned on.  I noted that
>openvswitch implemented its own vlan header manipulation functions as well.

openvswitch should use the "common" code as well.

>
>What else are you seeing that's duplicate?

For example I spotted check of ndo_vlan_rx_[add/kill]_vid and
NETIF_F_HW_VLAN_FILTER and ndo_vlan_rx_[add/kill]_vid call


>
>-vlad
>
>>Jiri
>>
>>Tue, Dec 18, 2012 at 08:00:51PM CET, vyasevic@redhat.com wrote:
>>>This series of patches provides an ability to add VLANs to the bridge
>>>ports.  This is similar to what can be found in most switches.  The bridge
>>>port may have any number of VLANs added to it including vlan 0 priority tagged
>>>traffic.  When vlans are added to the port, only traffic tagged with particular
>>>vlan will forwarded over this port.  Additionally, vlan ids are added to FDB
>>>entries and become part of the lookup.  This way we correctly identify the FDB
>>>entry.
>>>
>>>A single vlan may also be designated as untagged.  Any untagged traffic
>>>recieved by the port will be assigned to this vlan.  Any traffic exiting
>>>the port with a VID matching the untagged vlan will exit untagged (the
>>>bridge will strip the vlan header).  This is similar to "Native Vlan" support
>>>available in most switches.
>>>
>>>The default behavior ofthe bridge is unchanged if no vlans have been
>>>configured.
>>>
>>>Changes since v1:
>>>- Fixed some forwarding bugs.
>>>- Add vlan to local fdb entries.  New local entries are created per vlan
>>>   to facilite correct forwarding to bridge interface.
>>>- Allow configuration of vlans directly on the bridge master device
>>>   in addition to ports.
>>>
>>>Changes since rfc v2:
>>>- Per-port vlan bitmap is gone and is replaced with a vlan list.
>>>- Added bridge vlan list, which is referenced by each port.  Entries in
>>>   the birdge vlan list have port bitmap that shows which port are parts
>>>   of which vlan.
>>>- Netlink API changes.
>>>- Dropped sysfs support for now.  If people think this is really usefull,
>>>   can add it back.
>>>- Support for native/untagged vlans.
>>>
>>>Changes since rfc v1:
>>>- Comments addressed regarding formatting and RCU usage
>>>- iocts have been removed and changed over the netlink interface.
>>>- Added support of user added ndb entries.
>>>- changed sysfs interface to export a bitmap.  Also added a write interface.
>>>   I am not sure how much I like it, but it made my testing easier/faster.  I
>>>   might change the write interface to take text instead of binary.
>>>
>>>
>>>Vlad Yasevich (12):
>>>  bridge: Add vlan filtering infrastructure
>>>  bridge: Validate that vlan is permitted on ingress
>>>  bridge: Verify that a vlan is allowed to egress on give port
>>>  bridge: Cache vlan in the cb for faster egress lookup.
>>>  bridge: Add vlan to unicast fdb entries
>>>  bridge: Add vlan id to multicast groups
>>>  bridge: Add netlink interface to configure vlans on bridge ports
>>>  bridge: Add vlan support to static neighbors
>>>  bridge: Add the ability to configure untagged vlans
>>>  bridge: Implement untagged vlan handling
>>>  bridge: Dump vlan information from a bridge port
>>>  bridge: Add vlan support for local fdb entries
>>>
>>>drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    5 +-
>>>drivers/net/macvlan.c                         |    2 +-
>>>drivers/net/vxlan.c                           |    3 +-
>>>include/linux/netdevice.h                     |    4 +-
>>>include/uapi/linux/if_bridge.h                |   23 ++-
>>>include/uapi/linux/neighbour.h                |    1 +
>>>include/uapi/linux/rtnetlink.h                |    1 +
>>>net/bridge/br_device.c                        |   34 ++-
>>>net/bridge/br_fdb.c                           |  253 ++++++++++++---
>>>net/bridge/br_forward.c                       |  160 ++++++++++
>>>net/bridge/br_if.c                            |  404 ++++++++++++++++++++++++-
>>>net/bridge/br_input.c                         |   65 ++++-
>>>net/bridge/br_multicast.c                     |   71 +++--
>>>net/bridge/br_netlink.c                       |  178 ++++++++++--
>>>net/bridge/br_private.h                       |   71 ++++-
>>>net/core/rtnetlink.c                          |   40 ++-
>>>16 files changed, 1190 insertions(+), 125 deletions(-)
>>>
>>>--
>>>1.7.7.6
>>>
>>>--
>>>To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>the body of a message to majordomo@vger.kernel.org
>>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: RongQing Li @ 2012-12-19  8:41 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1212191001530.1620@ja.ssi.bg>

2012/12/19 Julian Anastasov <ja@ssi.bg>:
>
>         Hello,
>
> On Wed, 19 Dec 2012, roy.qing.li@gmail.com wrote:
>
>> From: Li RongQing <roy.qing.li@gmail.com>
>>
>> 1. ECN uses the two least significant (right-most) bits of the DiffServ
>> field in the IPv4, so it should be in iph->tos, not in (iph->tos+1)
>>
>> 2. When setting CE, we should check if ECN Capable Transport supports,
>> both 10 and 01 mean ECN Capable Transport, so only check 10 is not enough
>>     00: Non ECN-Capable Transport — Non-ECT
>>     10: ECN Capable Transport — ECT(0)
>>     01: ECN Capable Transport — ECT(1)
>>     11: Congestion Encountered — CE
>>
>> 3. Remove the misunderstand comment
>>
>> 4. fix the checksum computation
>>
>> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
>> ---
>>  include/net/inet_ecn.h |   22 ++++------------------
>>  1 file changed, 4 insertions(+), 18 deletions(-)
>>
>> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
>> index aab7375..545a683 100644
>> --- a/include/net/inet_ecn.h
>> +++ b/include/net/inet_ecn.h
>> @@ -73,27 +73,13 @@ static inline void INET_ECN_dontxmit(struct sock *sk)
>>
>>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>>  {
>> -     u32 check = (__force u32)iph->check;
>> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>> -
>> -     /*
>> -      * After the last operation we have (in binary):
>> -      * INET_ECN_NOT_ECT => 01
>> -      * INET_ECN_ECT_1   => 10
>> -      * INET_ECN_ECT_0   => 11
>> -      * INET_ECN_CE      => 00
>> -      */
>
>         I think, the above comment explains how an
> increment (iph->tos + 1) serves the purpose to check
> for ECT_1 and ECT_0, there is no such thing as
> addressing the next byte from header. It is just an
> optimized logic that avoids complex INET_ECN_is_XXX
> checks.
Thanks for your reply.
Do you mean this comment are valuable?


>
>> -     if (!(ecn & 2))
>> +     u32 ecn = iph->tos & INET_ECN_MASK;
>> +
>> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>>               return !ecn;
>
>         May be return INET_ECN_is_ce(ecn) ?
>

I like to set the return value to void, since noone cares about the
return value.

-Roy

>>
>> -     /*
>> -      * The following gives us:
>> -      * INET_ECN_ECT_1 => check += htons(0xFFFD)
>> -      * INET_ECN_ECT_0 => check += htons(0xFFFE)
>> -      */
>> -     check += (__force u16)htons(0xFFFB) + (__force u16)htons(ecn);
>> +     csum_replace2(&iph->check, iph->tos, iph->tos|INET_ECN_CE);
>>
>> -     iph->check = (__force __sum16)(check + (check>=0xFFFF));
>>       iph->tos |= INET_ECN_CE;
>>       return 1;
>>  }
>> --
>> 1.7.10.4
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: Julian Anastasov @ 2012-12-19  8:58 UTC (permalink / raw)
  To: RongQing Li; +Cc: netdev
In-Reply-To: <CAJFZqHyhMS22pKuW50D0uZLA5-WDUYBEZurPGCW6gSyRY-6JHg@mail.gmail.com>


	Hello,

On Wed, 19 Dec 2012, RongQing Li wrote:

> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
> >>  {
> >> -     u32 check = (__force u32)iph->check;
> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
> >> -
> >> -     /*
> >> -      * After the last operation we have (in binary):
> >> -      * INET_ECN_NOT_ECT => 01
> >> -      * INET_ECN_ECT_1   => 10
> >> -      * INET_ECN_ECT_0   => 11
> >> -      * INET_ECN_CE      => 00
> >> -      */
> >
> >         I think, the above comment explains how an
> > increment (iph->tos + 1) serves the purpose to check
> > for ECT_1 and ECT_0, there is no such thing as
> > addressing the next byte from header. It is just an
> > optimized logic that avoids complex INET_ECN_is_XXX
> > checks.
> Thanks for your reply.
> Do you mean this comment are valuable?

	It looks better to me with the comment and the
original checks. But I can't comment the correctness of
the other changes in your patch.

> >> -     if (!(ecn & 2))
> >> +     u32 ecn = iph->tos & INET_ECN_MASK;
> >> +
> >> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
> >>               return !ecn;
> >
> >         May be return INET_ECN_is_ce(ecn) ?
> >
> 
> I like to set the return value to void, since noone cares about the
> return value.

	It is used by INET_ECN_set_ce and its users in
net/sched/

> -Roy

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [patch net-next V3 1/4] net: add change_carrier netdev op
From: Jiri Pirko @ 2012-12-19  9:00 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend
In-Reply-To: <1355868858-1713-2-git-send-email-jiri@resnulli.us>

This allows a driver to register change_carrier callback which will be
called whenever user will like to change carrier state. This is useful
for devices like dummy, gre, team and so on.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h | 12 ++++++++++++
 net/core/dev.c            | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+)

V2->V3: updated comment by Dan Williams

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 02e0f6b..22ca4a8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -891,6 +891,14 @@ struct netdev_fcoe_hbainfo {
  * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh)
  * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
  *			     struct net_device *dev)
+ *
+ * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
+ *	Called to change device carrier. Soft-devices (like dummy, team, etc)
+ *	which do not represent real hardware may define this to allow their
+ *	userspace components to manage their virtual carrier state. Devices
+ *	that determine carrier state from physical hardware properties (eg
+ *	network cables) or protocol-dependent mechanisms (eg
+ *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1008,6 +1016,8 @@ struct net_device_ops {
 	int			(*ndo_bridge_getlink)(struct sk_buff *skb,
 						      u32 pid, u32 seq,
 						      struct net_device *dev);
+	int			(*ndo_change_carrier)(struct net_device *dev,
+						      bool new_carrier);
 };
 
 /*
@@ -2194,6 +2204,8 @@ extern int		dev_set_mtu(struct net_device *, int);
 extern void		dev_set_group(struct net_device *, int);
 extern int		dev_set_mac_address(struct net_device *,
 					    struct sockaddr *);
+extern int		dev_change_carrier(struct net_device *,
+					   bool new_carrier);
 extern int		dev_hard_start_xmit(struct sk_buff *skb,
 					    struct net_device *dev,
 					    struct netdev_queue *txq);
diff --git a/net/core/dev.c b/net/core/dev.c
index d0cbc93..268a714 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5027,6 +5027,25 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
 }
 EXPORT_SYMBOL(dev_set_mac_address);
 
+/**
+ *	dev_change_carrier - Change device carrier
+ *	@dev: device
+ *	@new_carries: new value
+ *
+ *	Change device carrier
+ */
+int dev_change_carrier(struct net_device *dev, bool new_carrier)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_change_carrier)
+		return -EOPNOTSUPP;
+	if (!netif_device_present(dev))
+		return -ENODEV;
+	return ops->ndo_change_carrier(dev, new_carrier);
+}
+EXPORT_SYMBOL(dev_change_carrier);
+
 /*
  *	Perform the SIOCxIFxxx calls, inside rcu_read_lock()
  */
-- 
1.8.0

^ permalink raw reply related

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: RongQing Li @ 2012-12-19  9:11 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: netdev
In-Reply-To: <alpine.LFD.2.00.1212191051590.1906@ja.ssi.bg>

2012/12/19 Julian Anastasov <ja@ssi.bg>:
>
>         Hello,
>
> On Wed, 19 Dec 2012, RongQing Li wrote:
>
>> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>> >>  {
>> >> -     u32 check = (__force u32)iph->check;
>> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>> >> -
>> >> -     /*
>> >> -      * After the last operation we have (in binary):
>> >> -      * INET_ECN_NOT_ECT => 01
>> >> -      * INET_ECN_ECT_1   => 10
>> >> -      * INET_ECN_ECT_0   => 11
>> >> -      * INET_ECN_CE      => 00
>> >> -      */
>> >
>> >         I think, the above comment explains how an
>> > increment (iph->tos + 1) serves the purpose to check
>> > for ECT_1 and ECT_0, there is no such thing as
>> > addressing the next byte from header. It is just an
>> > optimized logic that avoids complex INET_ECN_is_XXX
>> > checks.
>> Thanks for your reply.
>> Do you mean this comment are valuable?
>
>         It looks better to me with the comment and the
> original checks. But I can't comment the correctness of
> the other changes in your patch.

I do not know how they are useful, and how the original check
works, since the value in comments are wrong, the correct is:

enum {
        INET_ECN_NOT_ECT = 0,
        INET_ECN_ECT_1 = 1,
        INET_ECN_ECT_0 = 2,
        INET_ECN_CE = 3,
        INET_ECN_MASK = 3,
};


   00: Non ECN-Capable Transport — Non-ECT
    10: ECN Capable Transport — ECT(0)
    01: ECN Capable Transport — ECT(1)
    11: Congestion Encountered — CE

-Roy


>
>> >> -     if (!(ecn & 2))
>> >> +     u32 ecn = iph->tos & INET_ECN_MASK;
>> >> +
>> >> +     if (INET_ECN_is_ce(ecn) || INET_ECN_is_not_ect(ecn))
>> >>               return !ecn;
>> >
>> >         May be return INET_ECN_is_ce(ecn) ?
>> >
>>
>> I like to set the return value to void, since noone cares about the
>> return value.
>
>         It is used by INET_ECN_set_ce and its users in
> net/sched/
>
>> -Roy
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* RE: [PATCH v2] netlink: align attributes on 64-bits
From: David Laight @ 2012-12-19  9:17 UTC (permalink / raw)
  To: Thomas Graf; +Cc: nicolas.dichtel, bhutchings, netdev, davem
In-Reply-To: <20121218171103.GI27746@casper.infradead.org>

> > Consider the structure:
> > 	struct bar {
> > 		__u32 foo1;
> > 		__u64 foo2;
> > 	}
> > On i386 it will have size 12 and foo2 will be at offset 4.
> > On sparc32 (and most 64bit) it will have size 16 with foo2
> > at offset 8 (and 4 bytes of pad after foo1).
> 
> This is a known problem and I can't think of anything
> that can be done about it except for memcpy()ing the
> data before accessing it.

You can't use memcpy() to copy a pointer to a misaligned
structure into an aligned buffer. The compiler assumes
the pointer is aligned and will use instructions that
depend on the alignment.

> If you have ideas, I'm more that willing to listen :)
> ... Netlink has and will be host bound. It also
> uses host byte order for that reason.

I think:
1) Alignment is only needed on systems that have 'strict alignment'
   requirements (maybe disable for testing?)
2) Alignment is only needed for parameters whose size is a
   multiple of the alignment (a structure containing a
   field that needs 8 byte alignment will always be a multiple
   of 8 bytes long).
3) You need to add NA_HDR_LEN to the write pointer before
   determining the size of the pad.

So a structure of three uint32_t will never need aligning.

	David

^ permalink raw reply

* Re: [patch net-next 1/4] net: add change_carrier netdev op
From: David Miller @ 2012-12-19  9:28 UTC (permalink / raw)
  To: jiri
  Cc: dcbw, netdev, edumazet, bhutchings, mirqus, shemminger, greearb,
	fbl, john.r.fastabend
In-Reply-To: <20121219082050.GA1637@minipsycho.orion>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 19 Dec 2012 09:20:50 +0100

> Okay - I'll repost this single patch with your text as comment.

Please always repost an entire series, rather than just part of
it.

^ permalink raw reply

* Re: [RFC PATCH] fix IP_ECN_set_ce
From: David Miller @ 2012-12-19  9:31 UTC (permalink / raw)
  To: roy.qing.li; +Cc: ja, netdev
In-Reply-To: <CAJFZqHzFvspqFyZuZsTSyOOKdCq7SS=xMY5hH3kycaqO+=bXGw@mail.gmail.com>

From: RongQing Li <roy.qing.li@gmail.com>
Date: Wed, 19 Dec 2012 17:11:59 +0800

> 2012/12/19 Julian Anastasov <ja@ssi.bg>:
>>
>>         Hello,
>>
>> On Wed, 19 Dec 2012, RongQing Li wrote:
>>
>>> >>  static inline int IP_ECN_set_ce(struct iphdr *iph)
>>> >>  {
>>> >> -     u32 check = (__force u32)iph->check;
>>> >> -     u32 ecn = (iph->tos + 1) & INET_ECN_MASK;
>>> >> -
>>> >> -     /*
>>> >> -      * After the last operation we have (in binary):
>>> >> -      * INET_ECN_NOT_ECT => 01
>>> >> -      * INET_ECN_ECT_1   => 10
>>> >> -      * INET_ECN_ECT_0   => 11
>>> >> -      * INET_ECN_CE      => 00
>>> >> -      */
>>> >
>>> >         I think, the above comment explains how an
>>> > increment (iph->tos + 1) serves the purpose to check
>>> > for ECT_1 and ECT_0, there is no such thing as
>>> > addressing the next byte from header. It is just an
>>> > optimized logic that avoids complex INET_ECN_is_XXX
>>> > checks.
>>> Thanks for your reply.
>>> Do you mean this comment are valuable?
>>
>>         It looks better to me with the comment and the
>> original checks. But I can't comment the correctness of
>> the other changes in your patch.
> 
> I do not know how they are useful, and how the original check
> works, since the value in comments are wrong, the correct is:
> 
> enum {
>         INET_ECN_NOT_ECT = 0,
>         INET_ECN_ECT_1 = 1,
>         INET_ECN_ECT_0 = 2,
>         INET_ECN_CE = 3,
>         INET_ECN_MASK = 3,
> };
> 
> 
>    00: Non ECN-Capable Transport ― Non-ECT
>     10: ECN Capable Transport ― ECT(0)
>     01: ECN Capable Transport ― ECT(1)
>     11: Congestion Encountered ― CE

You really don't understand the comment, it is saying what
the incremented value corresponds to, ECN wise.

If iph->tos + 1 is 01, we had INET_ECN_NOT_ECT in iph->tos to
begine with, and so on an so forth.

Because you are having so much trouble with this most fundamental
aspect of this code, I have zero confidence in your being able to
make reasonable changes here.

I am not applying this patch.

^ permalink raw reply

* Re: [patch net-next 1/4] net: add change_carrier netdev op
From: Jiri Pirko @ 2012-12-19  9:32 UTC (permalink / raw)
  To: David Miller
  Cc: dcbw, netdev, edumazet, bhutchings, mirqus, shemminger, greearb,
	fbl, john.r.fastabend
In-Reply-To: <20121219.012825.1076148821070092794.davem@davemloft.net>

Wed, Dec 19, 2012 at 10:28:25AM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Wed, 19 Dec 2012 09:20:50 +0100
>
>> Okay - I'll repost this single patch with your text as comment.
>
>Please always repost an entire series, rather than just part of
>it.

Sorry, I thought that in this case where the change is in one patch and
only cosmethic it wouldn't be a problem. Reposting the whole patchset.

^ permalink raw reply

* [patch net-next V3 0/4] net: allow to change carrier from userspace
From: Jiri Pirko @ 2012-12-19  9:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend

This is basically a V3 of a repost of my previous patchset:
"[patch net-next-2.6 0/2] net: allow to change carrier via sysfs" from Aug 30

The way net-sysfs stores values changed and this patchset reflects it.
Also, I exposed carrier via rtnetlink iface.

So far, only dummy driver uses carrier change ndo. In very near future
team driver will use that as well.

V2->V3:
 - updated ndo_change_carrier comment by Dan Williams

V1->v2:
 - added bigger comment to ndo and also note to operstate.txt documentation
   stating the clear purpose of this iface

Jiri Pirko (4):
  net: add change_carrier netdev op
  net: allow to change carrier via sysfs
  rtnl: expose carrier value with possibility to set it
  dummy: implement carrier change

 Documentation/networking/operstates.txt |  4 ++++
 drivers/net/dummy.c                     | 10 ++++++++++
 include/linux/netdevice.h               | 12 ++++++++++++
 include/uapi/linux/if_link.h            |  1 +
 net/core/dev.c                          | 19 +++++++++++++++++++
 net/core/net-sysfs.c                    | 15 ++++++++++++++-
 net/core/rtnetlink.c                    | 10 ++++++++++
 7 files changed, 70 insertions(+), 1 deletion(-)

-- 
1.8.0

^ permalink raw reply

* [patch net-next 1/4] net: add change_carrier netdev op
From: Jiri Pirko @ 2012-12-19  9:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend
In-Reply-To: <1355909756-2389-1-git-send-email-jiri@resnulli.us>

This allows a driver to register change_carrier callback which will be
called whenever user will like to change carrier state. This is useful
for devices like dummy, gre, team and so on.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 include/linux/netdevice.h | 12 ++++++++++++
 net/core/dev.c            | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 02e0f6b..22ca4a8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -891,6 +891,14 @@ struct netdev_fcoe_hbainfo {
  * int (*ndo_bridge_setlink)(struct net_device *dev, struct nlmsghdr *nlh)
  * int (*ndo_bridge_getlink)(struct sk_buff *skb, u32 pid, u32 seq,
  *			     struct net_device *dev)
+ *
+ * int (*ndo_change_carrier)(struct net_device *dev, bool new_carrier);
+ *	Called to change device carrier. Soft-devices (like dummy, team, etc)
+ *	which do not represent real hardware may define this to allow their
+ *	userspace components to manage their virtual carrier state. Devices
+ *	that determine carrier state from physical hardware properties (eg
+ *	network cables) or protocol-dependent mechanisms (eg
+ *	USB_CDC_NOTIFY_NETWORK_CONNECTION) should NOT implement this function.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1008,6 +1016,8 @@ struct net_device_ops {
 	int			(*ndo_bridge_getlink)(struct sk_buff *skb,
 						      u32 pid, u32 seq,
 						      struct net_device *dev);
+	int			(*ndo_change_carrier)(struct net_device *dev,
+						      bool new_carrier);
 };
 
 /*
@@ -2194,6 +2204,8 @@ extern int		dev_set_mtu(struct net_device *, int);
 extern void		dev_set_group(struct net_device *, int);
 extern int		dev_set_mac_address(struct net_device *,
 					    struct sockaddr *);
+extern int		dev_change_carrier(struct net_device *,
+					   bool new_carrier);
 extern int		dev_hard_start_xmit(struct sk_buff *skb,
 					    struct net_device *dev,
 					    struct netdev_queue *txq);
diff --git a/net/core/dev.c b/net/core/dev.c
index d0cbc93..268a714 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5027,6 +5027,25 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
 }
 EXPORT_SYMBOL(dev_set_mac_address);
 
+/**
+ *	dev_change_carrier - Change device carrier
+ *	@dev: device
+ *	@new_carries: new value
+ *
+ *	Change device carrier
+ */
+int dev_change_carrier(struct net_device *dev, bool new_carrier)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (!ops->ndo_change_carrier)
+		return -EOPNOTSUPP;
+	if (!netif_device_present(dev))
+		return -ENODEV;
+	return ops->ndo_change_carrier(dev, new_carrier);
+}
+EXPORT_SYMBOL(dev_change_carrier);
+
 /*
  *	Perform the SIOCxIFxxx calls, inside rcu_read_lock()
  */
-- 
1.8.0

^ permalink raw reply related

* [patch net-next 2/4] net: allow to change carrier via sysfs
From: Jiri Pirko @ 2012-12-19  9:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend
In-Reply-To: <1355909756-2389-1-git-send-email-jiri@resnulli.us>

Make carrier writable

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/core/net-sysfs.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 334efd5..7eda40a 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -126,6 +126,19 @@ static ssize_t show_broadcast(struct device *dev,
 	return -EINVAL;
 }
 
+static int change_carrier(struct net_device *net, unsigned long new_carrier)
+{
+	if (!netif_running(net))
+		return -EINVAL;
+	return dev_change_carrier(net, (bool) new_carrier);
+}
+
+static ssize_t store_carrier(struct device *dev, struct device_attribute *attr,
+			 const char *buf, size_t len)
+{
+	return netdev_store(dev, attr, buf, len, change_carrier);
+}
+
 static ssize_t show_carrier(struct device *dev,
 			    struct device_attribute *attr, char *buf)
 {
@@ -331,7 +344,7 @@ static struct device_attribute net_class_attributes[] = {
 	__ATTR(link_mode, S_IRUGO, show_link_mode, NULL),
 	__ATTR(address, S_IRUGO, show_address, NULL),
 	__ATTR(broadcast, S_IRUGO, show_broadcast, NULL),
-	__ATTR(carrier, S_IRUGO, show_carrier, NULL),
+	__ATTR(carrier, S_IRUGO | S_IWUSR, show_carrier, store_carrier),
 	__ATTR(speed, S_IRUGO, show_speed, NULL),
 	__ATTR(duplex, S_IRUGO, show_duplex, NULL),
 	__ATTR(dormant, S_IRUGO, show_dormant, NULL),
-- 
1.8.0

^ permalink raw reply related

* [patch net-next 3/4] rtnl: expose carrier value with possibility to set it
From: Jiri Pirko @ 2012-12-19  9:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend
In-Reply-To: <1355909756-2389-1-git-send-email-jiri@resnulli.us>

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 Documentation/networking/operstates.txt |  4 ++++
 include/uapi/linux/if_link.h            |  1 +
 net/core/rtnetlink.c                    | 10 ++++++++++
 3 files changed, 15 insertions(+)

diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt
index 1a77a3c..9769457 100644
--- a/Documentation/networking/operstates.txt
+++ b/Documentation/networking/operstates.txt
@@ -88,6 +88,10 @@ set this flag. On netif_carrier_off(), the scheduler stops sending
 packets. The name 'carrier' and the inversion are historical, think of
 it as lower layer.
 
+Note that for certain kind of soft-devices, which are not managing any
+real hardware, there is possible to set this bit from userpsace.
+One should use TVL IFLA_CARRIER to do so.
+
 netif_carrier_ok() can be used to query that bit.
 
 __LINK_STATE_DORMANT, maps to IFF_DORMANT:
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 60f3b6b..c4edfe1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -142,6 +142,7 @@ enum {
 #define IFLA_PROMISCUITY IFLA_PROMISCUITY
 	IFLA_NUM_TX_QUEUES,
 	IFLA_NUM_RX_QUEUES,
+	IFLA_CARRIER,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1868625..2ef7a56 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -780,6 +780,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_MTU */
 	       + nla_total_size(4) /* IFLA_LINK */
 	       + nla_total_size(4) /* IFLA_MASTER */
+	       + nla_total_size(1) /* IFLA_CARRIER */
 	       + nla_total_size(4) /* IFLA_PROMISCUITY */
 	       + nla_total_size(4) /* IFLA_NUM_TX_QUEUES */
 	       + nla_total_size(4) /* IFLA_NUM_RX_QUEUES */
@@ -909,6 +910,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 	     nla_put_u32(skb, IFLA_LINK, dev->iflink)) ||
 	    (dev->master &&
 	     nla_put_u32(skb, IFLA_MASTER, dev->master->ifindex)) ||
+	    nla_put_u8(skb, IFLA_CARRIER, netif_carrier_ok(dev)) ||
 	    (dev->qdisc &&
 	     nla_put_string(skb, IFLA_QDISC, dev->qdisc->ops->id)) ||
 	    (dev->ifalias &&
@@ -1108,6 +1110,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_MTU]		= { .type = NLA_U32 },
 	[IFLA_LINK]		= { .type = NLA_U32 },
 	[IFLA_MASTER]		= { .type = NLA_U32 },
+	[IFLA_CARRIER]		= { .type = NLA_U8 },
 	[IFLA_TXQLEN]		= { .type = NLA_U32 },
 	[IFLA_WEIGHT]		= { .type = NLA_U32 },
 	[IFLA_OPERSTATE]	= { .type = NLA_U8 },
@@ -1438,6 +1441,13 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 		modified = 1;
 	}
 
+	if (tb[IFLA_CARRIER]) {
+		err = dev_change_carrier(dev, nla_get_u8(tb[IFLA_CARRIER]));
+		if (err)
+			goto errout;
+		modified = 1;
+	}
+
 	if (tb[IFLA_TXQLEN])
 		dev->tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
 
-- 
1.8.0

^ permalink raw reply related

* [patch net-next 4/4] dummy: implement carrier change
From: Jiri Pirko @ 2012-12-19  9:35 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, bhutchings, mirqus, shemminger, greearb, fbl,
	john.r.fastabend
In-Reply-To: <1355909756-2389-1-git-send-email-jiri@resnulli.us>

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 drivers/net/dummy.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index c260af5..42aa54a 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -100,6 +100,15 @@ static void dummy_dev_uninit(struct net_device *dev)
 	free_percpu(dev->dstats);
 }
 
+static int dummy_change_carrier(struct net_device *dev, bool new_carrier)
+{
+	if (new_carrier)
+		netif_carrier_on(dev);
+	else
+		netif_carrier_off(dev);
+	return 0;
+}
+
 static const struct net_device_ops dummy_netdev_ops = {
 	.ndo_init		= dummy_dev_init,
 	.ndo_uninit		= dummy_dev_uninit,
@@ -108,6 +117,7 @@ static const struct net_device_ops dummy_netdev_ops = {
 	.ndo_set_rx_mode	= set_multicast_list,
 	.ndo_set_mac_address	= eth_mac_addr,
 	.ndo_get_stats64	= dummy_get_stats64,
+	.ndo_change_carrier	= dummy_change_carrier,
 };
 
 static void dummy_setup(struct net_device *dev)
-- 
1.8.0

^ permalink raw reply related

* Re: [RFC PATCH net-next 0/5] Ease netns management for userland
From: Nicolas Dichtel @ 2012-12-19  9:47 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, davem, aatteka
In-Reply-To: <87zk1g8tnq.fsf@xmission.com>

Le 14/12/2012 17:50, Eric W. Biederman a écrit :
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> Le 13/12/2012 20:08, Eric W. Biederman a écrit :
>
>>> No.  The difficulty monitoring which network namespaces are being used
>>> is an unintended side effect.
>> Why is netlink a bad idea? Having a way to know all existing netns is a start
>> point to monitor netns, isn't it?
>
> In the same way that having a neighbour table that contains all existing
> ip address to mac addresses mappings is a starting point to monitor all
> existing hosts.
>
> All does not scale.
>
> All removes a lot of perfectly valid use cases like checkpoint-restart,
> and nesting containers.
>
> All as different from what is already implemented requires implementing
> yet another namespace to put the names of all into it.  We have enough
> namespaces now thank you very much.
>
> An unfiltered global list is about as interesting to use as putting
> all files in /.  Sure you know which directory you put your file in but
> which file is it?
>
> What has already been implemented should be roughly as good for
> monitoring as what is available with lsof.
>
> And of course there is the fact that a global list of anything that is
> the same from every perspective violates the principle of relativity,
> and is in contradiction with the phsical reality in which we exist.
>
> So there is no way that having a global all inclusive list of network
> namespaces makes the least lick of sense and I really don't want to
> think about it.

Thank you for your explanations and your patience, this is very useful.


Nicolas

^ permalink raw reply

* RE: TCP delayed ACK heuristic
From: David Laight @ 2012-12-19  9:52 UTC (permalink / raw)
  To: Rick Jones
  Cc: Cong Wang, netdev, Ben Greear, David Miller, Eric Dumazet,
	Stephen Hemminger, Thomas Graf
In-Reply-To: <50D0ADD4.7030903@hp.com>

> > I've seen problems when the sending side is doing (I think)
> > 'slow start' with Nagle disabled.
> > The sender would only send 4 segments before waiting for an
> > ACK - even when it had more than a full sized segment waiting.
> > Sender was Linux 2.6.something (probably low 20s).
> > I changed the application flow to send data in the reverse
> > direction to avoid the problem.
> > That was on a ~0 delay local connection - which means that
> > there is almost never outstanding data, and the 'slow start'
> > happened almost all the time.
> > Nagle is completely the wrong algorithm for the data flow.
> 
> If Nagle was already disabled, why the last sentence?  And from your
> description, even if Nagle were enabled, I would think that it was
> remote ACK+cwnd behaviour getting in your way, not Nagle, given that
> Nagle is to be decided on a user-send by user-send basis and release
> queued data (to the mercies of other heuristics) when it gets to be an
> MSS-worth.

With Nagle enabled the first segment is sent, the following ones
get buffered until full segments can be sent. Although (probably)
only 4 segments will be sent (1 small and 3 full) the 3rd of these
does generate an ack.

> ... but it sounds like you have an actual
> application looking to do that??

We are relaying data packets received over multiple SS7 signalling
links (64k hdlc) over a TCP connection. The connection will be local,
in some cases the host ethernet MAC, switch, and target cpu are all
on the same PCI(e) card (MII crossover links).
While a delay of a millisecond or two wouldn't matter (1ms is 8 byte
times) the Nagle delay is far too long - and since the data isn't
command/response the Nagle would delay happen a lot.

One of the conformance tests managed to make the system 'busy'.
Since all it does is make one 64k channel busy it shouldn't have
been able to generate a backlog of receive data - but it managed to
get over 100 data packets unacked (app level ack).

> Allowing a byte-limit-cwnd's worth of single-byte-payload TCP segments
> could easily be seen as being rather anti-social :)

If the actual RTT is almost zero (as in our case) and the network
really shouldn't be dropping packets the it doesn't matter.

I suspect that if the tx rate is faster than the RTT then the
'slow start' turns off and you can get a lot of small segments
in flight. But when the RTT is zero 'slow start' almost always
applies and you only send 4.

> And forcing/maintaining the original segment boundaries in
> retransmissions for small packets isn't such a hot idea either.

True, not splitting them might be useful, but to need to avoid
merges.

	David

^ permalink raw reply

* Re: [PATCH v2] netlink: align attributes on 64-bits
From: Nicolas Dichtel @ 2012-12-19 11:22 UTC (permalink / raw)
  To: Thomas Graf; +Cc: bhutchings, netdev, davem, David.Laight
In-Reply-To: <20121218125735.GG27746@casper.infradead.org>

Le 18/12/2012 13:57, Thomas Graf a écrit :
> On 12/17/12 at 05:49pm, Nicolas Dichtel wrote:
>> @@ -492,6 +492,15 @@ static inline struct nlmsghdr *nlmsg_put_answer(struct sk_buff *skb,
>>    */
>>   static inline struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)
>>   {
>> +	/* Because attributes may be aligned on 64-bits boundary with fake
>> +	 * attribute (type 0, size 4 (attributes are 32-bits align by default)),
>> +	 * an exact payload size cannot be calculated. Hence, we need to reserve
>> +	 * more space for these attributes.
>> +	 * 128 is arbitrary: it allows to align up to 32 attributes.
>> +	 */
>> +	if (payload < NLMSG_DEFAULT_SIZE)
>> +		payload = min(payload + 128, (size_t)NLMSG_DEFAULT_SIZE);
>
> This is doomed to fail eventually. A netlink message may carry
> hundreds of attributes eventually. See my suggestion below.
>
>> diff --git a/lib/nlattr.c b/lib/nlattr.c
>> index 18eca78..7440a80 100644
>> --- a/lib/nlattr.c
>> +++ b/lib/nlattr.c
>> @@ -450,9 +450,18 @@ EXPORT_SYMBOL(__nla_put_nohdr);
>>    */
>>   int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
>>   {
>> -	if (unlikely(skb_tailroom(skb) < nla_total_size(attrlen)))
>> +	int align = IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8) ? 0 : 4;
>
> This does not look right. In order for the attribute data to be
> aligned properly you would need to check skb_tail_pointer(skb) +
> NLA_HDRLEN for proper alignment or you end up aligning the
> attribute header.
>
> How about we change nla_total_size() to return the size with
> needed padding taken into account. That should fix the message
> size caluclation problem and we only need to reserve room for
> the initial padding to align the very first attribute.
>
> Below is an untested patch that does this. What do you think?
>
> diff --git a/include/net/netlink.h b/include/net/netlink.h
> index 9690b0f..7ce8e76 100644
> --- a/include/net/netlink.h
> +++ b/include/net/netlink.h
> @@ -114,7 +114,6 @@
>    * Attribute Length Calculations:
>    *   nla_attr_size(payload)		length of attribute w/o padding
>    *   nla_total_size(payload)		length of attribute w/ padding
> - *   nla_padlen(payload)		length of padding
>    *
>    * Attribute Payload Access:
>    *   nla_data(nla)			head of attribute payload
> @@ -492,6 +491,14 @@ static inline struct nlmsghdr *nlmsg_put_answer(struct sk_buff *skb,
>    */
>   static inline struct sk_buff *nlmsg_new(size_t payload, gfp_t flags)
>   {
> +	/* If an exact size if specified, reserve some additional space to
> +	 * align the first attribute, all subsequent attributes should have
> +	 * padding accounted for.
> +	 */
> +	if (payload != NLMSG_DEFAULT_SIZE)
> +		payload = min_t(size_t, payload + NLA_ATTR_ALIGN,
> +				NLMSG_DEFAULT_SIZE);
> +
>   	return alloc_skb(nlmsg_total_size(payload), flags);
>   }
>
> @@ -653,16 +660,12 @@ static inline int nla_attr_size(int payload)
>    */
>   static inline int nla_total_size(int payload)
>   {
> -	return NLA_ALIGN(nla_attr_size(payload));
> -}
> +	size_t len = NLA_ALIGN(nla_attr_size(payload));
>
> -/**
> - * nla_padlen - length of padding at the tail of attribute
> - * @payload: length of payload
> - */
> -static inline int nla_padlen(int payload)
> -{
> -	return nla_total_size(payload) - nla_attr_size(payload);
> +	if (!IS_ALIGNED(len, NLA_ATTR_ALIGN))
> +		len = ALIGN(len + NLA_HDRLEN, NLA_ATTR_ALIGN);
> +
> +	return len;
>   }
>
>   /**
> diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
> index 78d5b8a..1856729 100644
> --- a/include/uapi/linux/netlink.h
> +++ b/include/uapi/linux/netlink.h
> @@ -149,5 +149,10 @@ struct nlattr {
>   #define NLA_ALIGN(len)		(((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
>   #define NLA_HDRLEN		((int) NLA_ALIGN(sizeof(struct nlattr)))
>
> +/* Padding attribute type, added automatically to align attributes,
> + * must be ignored by readers. */
> +#define NLA_PADDING		0
> +#define NLA_ATTR_ALIGN		8
> +
>
>   #endif /* _UAPI__LINUX_NETLINK_H */
> diff --git a/lib/nlattr.c b/lib/nlattr.c
> index 18eca78..b09473c 100644
> --- a/lib/nlattr.c
> +++ b/lib/nlattr.c
> @@ -322,18 +322,36 @@ int nla_strcmp(const struct nlattr *nla, const char *str)
>    * Adds a netlink attribute header to a socket buffer and reserves
>    * room for the payload but does not copy it.
>    *
> + * May add a padding attribute of type NLA_PADDING before the
> + * real attribute to ensure proper alignment.
> + *
>    * The caller is responsible to ensure that the skb provides enough
>    * tailroom for the attribute header and payload.
>    */
>   struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
>   {
>   	struct nlattr *nla;
> +	size_t offset;
> +
> +	offset = (size_t) skb_tail_pointer(skb);
> +	if (!IS_ALIGNED(offset + NLA_HDRLEN, NLA_ATTR_ALIGN)) {
> +		struct nlattr *pad;
> +		size_t padlen;
> +
> +		padlen = nla_total_size(offset) - offset - NLA_HDRLEN;
Here padlen will return 4, which is wrong: padlen + NLA_HDRLEN = 8, alignment is 
the same than before. Here is a proposal fix:

diff --git a/lib/nlattr.c b/lib/nlattr.c
index e4f0329..1556313 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -338,7 +338,10 @@ struct nlattr *__nla_reserve(struct sk_buff *skb, int 
attrtype, int attrlen)
  		struct nlattr *pad;
  		size_t padlen;

-		padlen = nla_total_size(offset) - offset -  NLA_HDRLEN;
+		/* We need to remove NLA_HDRLEN two times: one time for the
+		 * attribute hdr and one time for the pad attribute hdr.
+		 */
+		padlen = nla_total_size(offset) - offset -  2 * NLA_HDRLEN;
  		pad = (struct nlattr *) skb_put(skb, nla_attr_size(padlen));
  		pad->nla_type = 0;
  		pad->nla_len = nla_attr_size(padlen);

With this patch, it seems goods. attribute are always aligned on 8 bytes. Also
I did not notice any problem with size calculation (I try some ip link, ip xfrm, 
ip [m]route).

Do you want to make more tests? Or will your repost the full patch?
I can do it if you don't have time.

^ permalink raw reply related

* [PATCH 2/2] drivers/net: Use of_match_ptr() macro in smsc911x.c
From: Sachin Kamat @ 2012-12-19 11:17 UTC (permalink / raw)
  To: netdev; +Cc: steve.glendinning, davem, sachin.kamat, patches, nico
In-Reply-To: <1355915830-29481-1-git-send-email-sachin.kamat@linaro.org>

Add CONFIG_OF guard and use of_match_ptr macro.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
---
Compile tested on linux-next.
---
 drivers/net/ethernet/smsc/smsc911x.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 4616bf2..e112877 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2575,11 +2575,13 @@ static const struct dev_pm_ops smsc911x_pm_ops = {
 #define SMSC911X_PM_OPS NULL
 #endif
 
+#ifdef CONFIG_OF
 static const struct of_device_id smsc911x_dt_ids[] = {
 	{ .compatible = "smsc,lan9115", },
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, smsc911x_dt_ids);
+#endif
 
 static struct platform_driver smsc911x_driver = {
 	.probe = smsc911x_drv_probe,
@@ -2588,7 +2590,7 @@ static struct platform_driver smsc911x_driver = {
 		.name	= SMSC_CHIPNAME,
 		.owner	= THIS_MODULE,
 		.pm	= SMSC911X_PM_OPS,
-		.of_match_table = smsc911x_dt_ids,
+		.of_match_table = of_match_ptr(smsc911x_dt_ids),
 	},
 };
 
-- 
1.7.4.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox