Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH] net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
From: David Laight @ 2014-01-22 15:30 UTC (permalink / raw)
  To: 'Vlad Yasevich', 'Matija Glavinic Pecotic',
	linux-sctp@vger.kernel.org
  Cc: Alexander Sverdlin, netdev@vger.kernel.org
In-Reply-To: <52DFDB1E.7070701@gmail.com>

From: Vlad Yasevich 
...
> > IIRC the 'size' taken of the socket buffer is the skb's 'true size'
> > and that includes any padding before and after the actual rx data.
> > For short packets the driver may have copied the data into a smaller
> > skb, for long packets it is likely to be more than that of a full
> > length ethernet packet.
> > In either case it can be significantly more than sizeof(sk_buff)
> > (190?) plus the size of the actual data.
> 
> SCTP currently doesn't support GRO, so each packet is limited to
> ethernet packet plus sk_buff overhead.

The ethernet driver might still pass up a 2k buffer, or even a 4k one.
Especially if it supports GRO for TCP.

> What throws a real monkey
> wrench into this whole accounting business is SCTP bundling.  If you
> bundle multiple messages into the single packet, accounting for it
> is a mess.

How about dividing the 'truesize' by the reference count?
(except that isn't quite right...)
I assume there are multiple skb headers but only one data buffer?
At least the chunks are all for the same connection so end up on
one socket (except I remember some other horrid stuff for datagrams).
 
> > I'm also not sure that continuously removing 'credit' is a good idea.
> > I've done a lot of comms protocol code, removing credit and 'window
> > slamming' acks are not good ideas.
> 
> This patch doesn't continuously remove 'credit'.  It advertises the
> closest approximation of the window that we support and computes it
> the same way as we do for Initial Window (available sk_rcvbuff >> 1).
> As the receiver drains the receive queue, available buffer will increase
> and the available window will grow.

Let's assume the socket buffer size is 100k.
We advertise a window of 50k.
We now receive 100 bytes of data, the remote has 49900 window left.
The 'truesize' is something like 190+64(ish)+100+tail_pad, say 400.
Socket buffer space is reduced to 99600 and any ack would give 49800.
So we have reduced the window by 100 bytes.
With that much space it probably doesn't matter much.

However if the connection is receive limited then the remote system
will have a second packet in flight that uses the rest of the window.
We then receive an in-sequence but out-of-window packet that refers
to window that we had previously given to the remote system.

I don't know what the sctp (or tcp) code does with such packets.
In my experience it is best to treat them as valid data (unless
there are larger free memory issues) and ack them at some point.
Hopefully the rules of the underlying protocol let you do this!

If you discard these packets then every packet gets sent twice
(or even more often if the data is very short).

	David


^ permalink raw reply

* Re: [PATCH 1/1] Per socket value for max datagram queue length
From: Daniel Borkmann @ 2014-01-22 15:32 UTC (permalink / raw)
  To: Dan Ballard
  Cc: Lennart Poettering, kay.sievers, Arnd Bergmann, David S. Miller,
	Eric Dumazet, Eliezer Tamir, Neil Horman, Li Zefan, linux-arch,
	linux-kernel, netdev, Hannes Frederic Sowa
In-Reply-To: <13e8aad86903481261581de7c29444e3@mindstab.net>

On 01/22/2014 04:11 PM, Dan Ballard wrote:
> Provides a new option for setsockopt SO_MAX_DGRAM_QLEN that sets and
> gets a socket specific max datagram queue length. Currently each socket
> has one but it's only ever initialized from
> /proc/sys/net/unix/max_dgram_qlen and then never adjustable later. Now
> each socket can have it individually tweaked during it's life.
>
> Signed-off-by: Dan Ballard <dan@mindstab.net>
> ---
>   include/uapi/asm-generic/socket.h |    2 ++
>   net/core/sock.c                   |    7 +++++++
>   2 files changed, 9 insertions(+)
>
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index 38f14d0..f8c3e6b 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -80,4 +80,6 @@
>
>   #define SO_MAX_PACING_RATE     47
>
> +#define SO_MAX_DGRAM_QLEN      48
> +

This needs to be added in more than just asm-generic, e.g.
have a look at SO_MAX_PACING_RATE or SO_BPF_EXTENSIONS.

Also you might need to rebase to current net-next head and
maybe describe use cases more in-depth; next to what Hannes
just commented.

>   #endif /* __ASM_GENERIC_SOCKET_H */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5393b4b..1ff69d1 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -915,6 +915,10 @@ set_rcvbuf:
>                                           sk->sk_max_pacing_rate);
>                  break;
>
> +       case SO_MAX_DGRAM_QLEN:
> +               sk->sk_max_ack_backlog = val;
> +               break;
> +
>          default:
>                  ret = -ENOPROTOOPT;
>                  break;
> @@ -1182,6 +1186,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
>                  v.val = sk->sk_max_pacing_rate;
>                  break;
>
> +       case SO_MAX_DGRAM_QLEN:
> +               v.val = sk->sk_max_ack_backlog;
> +               break;
>          default:
>                  return -ENOPROTOOPT;
>          }

^ permalink raw reply

* Re: [PATCH] net: Correctly sync addresses from multiple sources to single device
From: Vlad Yasevich @ 2014-01-22 15:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, andrey.dmitrov
In-Reply-To: <20140121.145355.1406240839568386348.davem@davemloft.net>

On 01/21/2014 05:53 PM, David Miller wrote:
> From: Vlad Yasevich <vyasevic@redhat.com>
> Date: Fri, 17 Jan 2014 14:52:12 -0500
> 
>> When we have multiple devices attempting to sync the same address
>> to a single destination, each device should be permitted to sync
>> it once.  To accomplish this, pass the sync count of the source
>> address to __hw_addr_add_ex().  'sync_cnt' tracks how many time
>> a given address has been successfully synced.  If the address
>> is found in the destination list, but the 'sync_cnt' of the source
>> is 0, then this address has not been synced from this interface
>> and we need to allow the sync operation to succeed thus incrementing
>> reference counts.
>>
>> Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
>> CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> 
> I applied this to net-next since 3.13 just got released, and it doesn't
> compile.
> 
> net/core/dev_addr_lists.c: In function ‘__hw_addr_sync_one’:
> net/core/dev_addr_lists.c:144:26: error: ‘struct netdev_hw_addr’ has no member named ‘sync_count’
> 

That does it...  Time to add a hook to format_patch to check for dirty
index.

-vlad

^ permalink raw reply

* Re: [PATCH 1/1] Per socket value for max datagram queue length
From: Eric Dumazet @ 2014-01-22 15:30 UTC (permalink / raw)
  To: Dan Ballard
  Cc: Lennart Poettering, kay.sievers, Arnd Bergmann, David S. Miller,
	Eric Dumazet, Eliezer Tamir, Neil Horman, Li Zefan, linux-arch,
	linux-kernel, netdev
In-Reply-To: <13e8aad86903481261581de7c29444e3@mindstab.net>

On Wed, 2014-01-22 at 07:11 -0800, Dan Ballard wrote:
> Provides a new option for setsockopt SO_MAX_DGRAM_QLEN that sets and
> gets a socket specific max datagram queue length. Currently each socket
> has one but it's only ever initialized from
> /proc/sys/net/unix/max_dgram_qlen and then never adjustable later. Now
> each socket can have it individually tweaked during it's life.
> 

Your patch suffers from many problems.

It breaks listen()

It doesn't compile on a lot of arches.

sk_max_ack_backlog is an unsigned short.

max_dgram_qlen is used, but only for Unix sockets at this time
Try : "git grep -n max_dgram_qlen" for details


> Signed-off-by: Dan Ballard <dan@mindstab.net>
> ---
>   include/uapi/asm-generic/socket.h |    2 ++
>   net/core/sock.c                   |    7 +++++++
>   2 files changed, 9 insertions(+)
> 
> diff --git a/include/uapi/asm-generic/socket.h 
> b/include/uapi/asm-generic/socket.h
> index 38f14d0..f8c3e6b 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -80,4 +80,6 @@
> 
>   #define SO_MAX_PACING_RATE     47
> 
> +#define SO_MAX_DGRAM_QLEN      48
> +
>   #endif /* __ASM_GENERIC_SOCKET_H */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5393b4b..1ff69d1 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -915,6 +915,10 @@ set_rcvbuf:
>                                           sk->sk_max_pacing_rate);
>                  break;
> 
> +       case SO_MAX_DGRAM_QLEN:
> +               sk->sk_max_ack_backlog = val;
> +               break;
> +
>          default:
>                  ret = -ENOPROTOOPT;
>                  break;
> @@ -1182,6 +1186,9 @@ int sock_getsockopt(struct socket *sock, int 
> level, int optname,
>                  v.val = sk->sk_max_pacing_rate;
>                  break;
> 
> +       case SO_MAX_DGRAM_QLEN:
> +               v.val = sk->sk_max_ack_backlog;
> +               break;
>          default:
>                  return -ENOPROTOOPT;
>          }

^ permalink raw reply

* Re: [PATCH net-next] bridge: Remove unnecessary vlan_put_tag in br_handle_vlan
From: Vlad Yasevich @ 2014-01-22 15:22 UTC (permalink / raw)
  To: Toshiaki Makita, David S . Miller, Stephen Hemminger, netdev
In-Reply-To: <1390350577-4122-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

On 01/21/2014 07:29 PM, Toshiaki Makita wrote:
> br_handle_vlan() pushes HW accelerated vlan tag into skbuff when outgoing
> port is the bridge device.
> This is unnecessary because __netif_receive_skb_core() can handle skbs
> with HW accelerated vlan tag. In current implementation,
> __netif_receive_skb_core() needs to extract the vlan tag embedded in skb
> data. This could cause low network performance especially when receiving
> frames at a high frame rate on the bridge device.
> 
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

Acked-by: Vlad Yasevich <vyasevic@redhat.com>

-vlad

> ---
>  net/bridge/br_vlan.c | 21 ---------------------
>  1 file changed, 21 deletions(-)
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 7ffc801..4ca4d0a 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -151,27 +151,6 @@ struct sk_buff *br_handle_vlan(struct net_bridge *br,
>  	br_vlan_get_tag(skb, &vid);
>  	if (test_bit(vid, pv->untagged_bitmap))
>  		skb = br_vlan_untag(skb);
> -	else {
> -		/* Egress policy says "send tagged".  If output device
> -		 * is the  bridge, we need to add the VLAN header
> -		 * ourselves since we'll be going through the RX path.
> -		 * Sending to ports puts the frame on the TX path and
> -		 * we let dev_hard_start_xmit() add the header.
> -		 */
> -		if (skb->protocol != htons(ETH_P_8021Q) &&
> -		    pv->port_idx == 0) {
> -			/* vlan_put_tag expects skb->data to point to
> -			 * mac header.
> -			 */
> -			skb_push(skb, ETH_HLEN);
> -			skb = __vlan_put_tag(skb, skb->vlan_proto, skb->vlan_tci);
> -			if (!skb)
> -				goto out;
> -			/* put skb->data back to where it was */
> -			skb_pull(skb, ETH_HLEN);
> -			skb->vlan_tci = 0;
> -		}
> -	}
>  
>  out:
>  	return skb;
> 

^ permalink raw reply

* Re: [PATCH 1/1] Per socket value for max datagram queue length
From: Hannes Frederic Sowa @ 2014-01-22 15:20 UTC (permalink / raw)
  To: Dan Ballard
  Cc: Lennart Poettering, kay.sievers, Arnd Bergmann, David S. Miller,
	Eric Dumazet, Eliezer Tamir, Neil Horman, Li Zefan, linux-arch,
	linux-kernel, netdev
In-Reply-To: <13e8aad86903481261581de7c29444e3@mindstab.net>

On Wed, Jan 22, 2014 at 07:11:20AM -0800, Dan Ballard wrote:
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5393b4b..1ff69d1 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -915,6 +915,10 @@ set_rcvbuf:
>                                          sk->sk_max_pacing_rate);
>                 break;
> 
> +       case SO_MAX_DGRAM_QLEN:
> +               sk->sk_max_ack_backlog = val;
> +               break;
> +

Shouldn't the backlog be capped for unprivileged users to some configurable
value? I even think that max_dgram_qlen should be the upper bound.

I guess it is not that serious as socket read accounting does account all
packets which sit in the backlog queue.

Greetings,

  Hannes

^ permalink raw reply

* [PATCH] net/openvswitch: Remove the skb encapsulation mark after decapsulation
From: Or Gerlitz @ 2014-01-22 15:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, Or Gerlitz, Joseph Gasparakis, Pravin B Shelar

We must unset the skb encapsulation mark before injecting the
decapsulated packet into ovs for the rest of its journey.

This follows the practice set by 0afb166 "vxlan: Add capability of Rx
checksum offload for inner packet" and the overall idea behind the
skb encapsulation field.

Cc: Joseph Gasparakis <joseph.gasparakis@intel.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 net/openvswitch/vport-vxlan.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index e797a50..3742c71 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -68,6 +68,9 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 	key = cpu_to_be64(ntohl(vx_vni) >> 8);
 	ovs_flow_tun_key_init(&tun_key, iph, key, TUNNEL_KEY);
 
+	/* we must unset the encapsulation mark after decapsulation */
+	skb->encapsulation = 0;
+
 	ovs_vport_receive(vport, skb, &tun_key);
 }
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH 1/1] Per socket value for max datagram queue length
From: Dan Ballard @ 2014-01-22 15:11 UTC (permalink / raw)
  To: Lennart Poettering, kay.sievers, Arnd Bergmann, David S. Miller,
	Eric Dumazet, Eliezer Tamir, Neil Horman, Li Zefan, linux-arch,
	linux-kernel, netdev

Provides a new option for setsockopt SO_MAX_DGRAM_QLEN that sets and
gets a socket specific max datagram queue length. Currently each socket
has one but it's only ever initialized from
/proc/sys/net/unix/max_dgram_qlen and then never adjustable later. Now
each socket can have it individually tweaked during it's life.

Signed-off-by: Dan Ballard <dan@mindstab.net>
---
  include/uapi/asm-generic/socket.h |    2 ++
  net/core/sock.c                   |    7 +++++++
  2 files changed, 9 insertions(+)

diff --git a/include/uapi/asm-generic/socket.h 
b/include/uapi/asm-generic/socket.h
index 38f14d0..f8c3e6b 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -80,4 +80,6 @@

  #define SO_MAX_PACING_RATE     47

+#define SO_MAX_DGRAM_QLEN      48
+
  #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 5393b4b..1ff69d1 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -915,6 +915,10 @@ set_rcvbuf:
                                          sk->sk_max_pacing_rate);
                 break;

+       case SO_MAX_DGRAM_QLEN:
+               sk->sk_max_ack_backlog = val;
+               break;
+
         default:
                 ret = -ENOPROTOOPT;
                 break;
@@ -1182,6 +1186,9 @@ int sock_getsockopt(struct socket *sock, int 
level, int optname,
                 v.val = sk->sk_max_pacing_rate;
                 break;

+       case SO_MAX_DGRAM_QLEN:
+               v.val = sk->sk_max_ack_backlog;
+               break;
         default:
                 return -ENOPROTOOPT;
         }
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH] net: Correctly sync addresses from multiple sources to single device
From: Vlad Yasevich @ 2014-01-22 15:06 UTC (permalink / raw)
  To: Andrey Dmitrov, Vlad Yasevich
  Cc: netdev, davem, Alexandra N. Kossovsky, Konstantin Ushakov
In-Reply-To: <52DFD32D.4060805@oktetlabs.ru>

On 01/22/2014 09:18 AM, Andrey Dmitrov wrote:
> On 01/17/2014 11:52 PM, Vlad Yasevich wrote:
>> When we have multiple devices attempting to sync the same address
>> to a single destination, each device should be permitted to sync
>> it once.  To accomplish this, pass the sync count of the source
>> address to __hw_addr_add_ex().  'sync_cnt' tracks how many time
>> a given address has been successfully synced.  If the address
>> is found in the destination list, but the 'sync_cnt' of the source
>> is 0, then this address has not been synced from this interface
>> and we need to allow the sync operation to succeed thus incrementing
>> reference counts.
>>
>> Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
>> CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
>> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>> ---
>>   net/core/dev_addr_lists.c | 12 +++++++-----
>>   1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
>> index ec40a84..bd1b880 100644
>> --- a/net/core/dev_addr_lists.c
>> +++ b/net/core/dev_addr_lists.c
>> @@ -48,7 +48,8 @@ static int __hw_addr_create_ex(struct
>> netdev_hw_addr_list *list,
>>     static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
>>                   const unsigned char *addr, int addr_len,
>> -                unsigned char addr_type, bool global, bool sync)
>> +                unsigned char addr_type, bool global, bool sync,
>> +                int sync_count)
>>   {
>>       struct netdev_hw_addr *ha;
>>   @@ -66,7 +67,7 @@ static int __hw_addr_add_ex(struct
>> netdev_hw_addr_list *list,
>>                       ha->global_use = true;
>>               }
>>               if (sync) {
>> -                if (ha->synced)
>> +                if (ha->synced && sync_count)
>>                       return -EEXIST;
>>                   else
>>                       ha->synced = true;
>> @@ -84,7 +85,8 @@ static int __hw_addr_add(struct netdev_hw_addr_list
>> *list,
>>                const unsigned char *addr, int addr_len,
>>                unsigned char addr_type)
>>   {
>> -    return __hw_addr_add_ex(list, addr, addr_len, addr_type, false,
>> false);
>> +    return __hw_addr_add_ex(list, addr, addr_len, addr_type, false,
>> false,
>> +                0);
>>   }
>>     static int __hw_addr_del_entry(struct netdev_hw_addr_list *list,
>> @@ -139,7 +141,7 @@ static int __hw_addr_sync_one(struct
>> netdev_hw_addr_list *to_list,
>>       int err;
>>         err = __hw_addr_add_ex(to_list, ha->addr, addr_len, ha->type,
>> -                   false, true);
>> +                   false, true, ha->sync_count);
>>       if (err && err != -EEXIST)
>>           return err;
>>   @@ -676,7 +678,7 @@ static int __dev_mc_add(struct net_device *dev,
>> const unsigned char *addr,
>>         netif_addr_lock_bh(dev);
>>       err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
>> -                   NETDEV_HW_ADDR_T_MULTICAST, global, false);
>> +                   NETDEV_HW_ADDR_T_MULTICAST, global, false, 0);
>>       if (!err)
>>           __dev_set_rx_mode(dev);
>>       netif_addr_unlock_bh(dev);
> 
> Thanks. The patch was tested with linux-3.12.6.
> 
> However, while building the kernel I got complains on:
> 
>>     err = __hw_addr_add_ex(to_list, ha->addr, addr_len, ha->type,
>> -                   false, true);
>> +                   false, true, ha->sync_count);
> 
> Namely ha (struct netdev_hw_addr) has no field sync_count. It has sync_cnt.
> So I’ve fixed this when testing the patch. Is it a typo or something
> intentional?
> 
> Your patch solves the initial problem, but it looks like it introduces a
> new one.
> 
> When interface joins multicast group corresponding MAC address is added
> to the interface maddr list. It should be removed when socket explicitly
> leaves the group or is just closed. It’s the case without your patch.
> However, with the patched kernel address is still there after socket is
> closed.
> 
> You can reproduce it with mcast_client binary (mcast_client.c was
> attached to the first letter in the thread). To be specific:
> 
> $ ip maddr show eth3
> 9:  eth3
>  link  33:33:00:00:00:01 users 3
>  link  01:00:5e:00:00:01 users 3
>  link  33:33:ff:01:39:7c users 3
>  link  01:80:c2:00:00:21 users 2
>  inet  224.0.0.1
>  inet6 ff02::1:ff01:397c
>  inet6 ff02::1
>  inet6 ff01::1
> 
> $ gcc mcast_client.c -o cl
> $ sudo ./cl
> Abort it with Ctrl+c
> $ ip maddr show eth3
> 9:  eth3
>  link  33:33:00:00:00:01 users 3
>  link  01:00:5e:00:00:01 users 3
>  link  33:33:ff:01:39:7c users 3
>  link  01:80:c2:00:00:21 users 2
>  link  01:00:5e:11:58:a8
>  inet  224.0.0.1
>  inet6 ff02::1:ff01:397c
>  inet6 ff02::1
>  inet6 ff01::1
> 
> 'link 01:00:5e:11:58:a8’ address is still there. Should’ve been removed.

You are right.  Sync really doesn't deal well with with multiple sources
syncing the same address.  Looks like we need to keep track of how
many time the address has been synced from the different sources.

Looks like I have to restore
commit 4543fbefe6e06a9e40d9f2b28d688393a299f079. :)

-vlad
> 
> Andrey
> -- 
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next v4] ipv6: enable anycast addresses as source addresses for datagrams
From: Hannes Frederic Sowa @ 2014-01-22 15:03 UTC (permalink / raw)
  To: Francois-Xavier Le Bail; +Cc: netdev, David Stevens, David Miller
In-Reply-To: <1390372958-3654-1-git-send-email-fx.lebail@yahoo.com>

On Wed, Jan 22, 2014 at 07:42:37AM +0100, Francois-Xavier Le Bail wrote:
> This change allows to consider an anycast address valid as source address
> when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
> So, when sending a datagram with ancillary data, the unicast and anycast
> addresses are handled in the same way.
> 
> - Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
>   on given interface or is global.
> - Uses it in ip6_datagram_send_ctl().
> 
> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks!

^ permalink raw reply

* Re: [PATCH] net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
From: Vlad Yasevich @ 2014-01-22 14:52 UTC (permalink / raw)
  To: David Laight, 'Matija Glavinic Pecotic',
	linux-sctp@vger.kernel.org
  Cc: Alexander Sverdlin, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6D463A0F@AcuExch.aculab.com>

On 01/22/2014 08:41 AM, David Laight wrote:
> From: Matija Glavinic Pecotic
>> Implementation of (a)rwnd calculation might lead to severe
performance issues
>> and associations completely stalling. These problems are described
and solution
>> is proposed which improves lksctp's robustness in congestion state.
>>
>> 1) Sudden drop of a_rwnd and incomplete window recovery afterwards
>>
>> Data accounted in sctp_assoc_rwnd_decrease takes only payload size
(sctp data),
>> but size of sk_buff, which is blamed against receiver buffer, is not
accounted
>> in rwnd. Theoretically, this should not be the problem as actual size
of buffer
>> is double the amount requested on the socket (SO_RECVBUF). Problem
here is
>> that this will have bad scaling for data which is less then sizeof
sk_buff.
>> E.g. in 4G (LTE) networks, link interfacing radio side will have a
large portion
>> of traffic of this size (less then 100B).
> ...
>>
>> Proposed solution:
>>
>> Both problems share the same root cause, and that is improper scaling
of socket
>> buffer with rwnd. Solution in which sizeof(sk_buff) is taken into
concern while
>> calculating rwnd is not possible due to fact that there is no linear
>> relationship between amount of data blamed in increase/decrease with
IP packet
>> in which payload arrived. Even in case such solution would be followed,
>> complexity of the code would increase. Due to nature of current rwnd
handling,
>> slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure
state is
>> entered is rationale, but it gives false representation to the sender
of current
>> buffer space. Furthermore, it implements additional congestion
control mechanism
>> which is defined on implementation, and not on standard basis.
>>
>> Proposed solution simplifies whole algorithm having on mind
definition from rfc:
>>
>> o  Receiver Window (rwnd): This gives the sender an indication of the
space
>>    available in the receiver's inbound buffer.
>>
>> Core of the proposed solution is given with these lines:
>>
>> sctp_assoc_rwnd_account:
>> 	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
>> 		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
>> 	else
>> 		asoc->rwnd = 0;
>>
>> We advertise to sender (half of) actual space we have. Half is in the
braces
>> depending whether you would like to observe size of socket buffer as
SO_RECVBUF
>> or twice the amount, i.e. size is the one visible from userspace,
that is,
>> from kernelspace.
>> In this way sender is given with good approximation of our buffer space,
>> regardless of the buffer policy - we always advertise what we have.
Proposed
>> solution fixes described problems and removes necessity for rwnd
restoration
>> algorithm. Finally, as proposed solution is simplification, some
lines of code,
>> along with some bytes in struct sctp_association are saved.
>
> IIRC the 'size' taken of the socket buffer is the skb's 'true size'
and that
> includes any padding before and after the actual rx data. For short
packets
> the driver may have copied the data into a smaller skb, for long
packets it
> is likely to be more than that of a full length ethernet packet.
> In either case it can be significantly more than sizeof(sk_buff)
(190?) plus
> the size of the actual data.

SCTP currently doesn't support GRO, so each packet is limited to
ethernet packet plus sk_buff overhead.  What throws a real monkey
wrench into this whole accounting business is SCTP bundling.  If you
bundle multiple messages into the single packet, accounting for it
is a mess.

>
> I'm also not sure that continuously removing 'credit' is a good idea.
> I've done a lot of comms protocol code, removing credit and 'window
> slamming' acks are not good ideas.

This patch doesn't continuously remove 'credit'.  It advertises the
closest approximation of the window that we support and computes it
the same way as we do for Initial Window (available sk_rcvbuff >> 1).
As the receiver drains the receive queue, available buffer will increase
and the available window will grow.

In fact, the current implementation does more 'windows slamming' then
this proposed patch.

>
> Perhaps the advertised window should be bounded by the configured socket
> buffer size, and only reduced if the actual space isn't likely to be large
> enough given the typical overhead of the received data.

Problem is there is no typical overhead due to the message oriented
nature of the SCTP, and the socket buffer limits entire buffer space
(overhead included).   What happens when the socket buffer buffer is
consumed faster then the window due to overhead being significantly
larger then the payload?  You have "window slamming" of the worst
kind.  The available window slowly decreases until it hits a point
receive buffer exhaustion and then it slams shut.

This patch is better is that it gradually reduces the window based
on available buffer space providing a more accurate depiction of what
is happening on the receiver.

>
> Similarly, as the window is opened after congestion it should be increased
> by the amount of data actually removed (not the number of free bytes).
> When there is a significant amount of space the window could be increased
> faster - allowing a smaller number of larger skb carrying more data be
queued.
>
> As a matter of interest, how does TCP handle this?

TCP also calculates it's available window based on available receive
buffer space.

-vlad
>
> 	David
>
>
N�����r��y���b�X��ǧv�^�)޺{.n�+����{���i�{ay�\x1dʇڙ�,j\a��f���h���z�\x1e�w���\f���j:+v���w�j�m����\a����zZ+�����ݢj"��!tml=
>

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: metrics: Handle v6/v4-mapped sockets in tcp-metrics
From: Eric Dumazet @ 2014-01-22 14:49 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: David Miller, netdev
In-Reply-To: <1390395524-27930-1-git-send-email-christoph.paasch@uclouvain.be>

On Wed, 2014-01-22 at 13:58 +0100, Christoph Paasch wrote:
> A socket may be v6/v4-mapped. In that case sk->sk_family is AF_INET6,
> but the IP being used is actually an IPv4-address.
> Current's tcp-metrics will thus represent it as an IPv6-address:
> 
> root@server:~# ip tcp_metrics
> ::ffff:10.1.1.2 age 22.920sec rtt 18750us rttvar 15000us cwnd 10
> 10.1.1.2 age 47.970sec rtt 16250us rttvar 10000us cwnd 10
> 
> This patch modifies the tcp-metrics so that they are able to handle the
> v6/v4-mapped sockets correctly.
> 
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH v3] sch_htb: let skb->priority refer to non-leaf class
From: Eric Dumazet @ 2014-01-22 14:47 UTC (permalink / raw)
  To: Harry Mason; +Cc: Sergei Shtylyov, Jamal Hadi Salim, linux-netdev
In-Reply-To: <1389964952.4698.20.camel@azathoth.dev.smoothwall.net>

On Fri, 2014-01-17 at 13:22 +0000, Harry Mason wrote:
> If the class in skb->priority is not a leaf, apply filters from the
> selected class, not the qdisc. This lets netfilter or user space
> partially classify the packet.
> 
> Signed-off-by: Harry Mason <harry.mason@smoothwall.net>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: [PATCH net-next] IPv6: enable TCP to use an anycast address
From: François-Xavier Le Bail @ 2014-01-22 14:32 UTC (permalink / raw)
  To: Alexey Kuznetsov
  Cc: netdev, David Stevens, Hannes Frederic Sowa, David Miller,
	Hideaki Yoshifuji

Hello,

> Actually, I was alerted by reset processing in your patch, it cannot be right.

Perhaps, I missed something.

My tests were:
1) enable anycast in tcp_v6_conn_request().
So, a client-serveur exchange with anycast destination address is possible.

2) try to connect with a port not listen by the server.
With unicast destination address, the client got a reset.
With anycast destination address, no reset, the client try several syn and give up.

3) enable anycast in tcp_v6_send_reset().
With anycast destination address, the client got a reset.
So unicast and anycast behave the same.

Do you see any problem with this behaviour ?

> Do not you think this must not be enabled for common use? At least
> some separate sysctl disabled by default.

We could indeed use a sysctl, disabled by default.

But my goal was to enable anycast address usage transparently, if possible.

It's only a possibility for some use cases and if it's dont break unicast TCP usage,
like I think it, I propose to enable this change without sysctl.

So, my proposals for the TCP case, in descending order of preference are:

1) No sysctl.
2) A sysctl disabled par default, used in tcp_v6_conn_request() and tcp_v6_send_reset().
3) A sysctl disabled par default, used in tcp_v6_send_reset(), if problem only here.

Greetings,
Francois-Xavier

^ permalink raw reply

* Re: [PATCH net-next 3/3] bonding: cleanup some redundant code and wrong variables
From: Nikolay Aleksandrov @ 2014-01-22 14:28 UTC (permalink / raw)
  To: Ding Tianhong, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	David S. Miller, Netdev
In-Reply-To: <52DF8DC2.2030609@huawei.com>

On 01/22/2014 10:22 AM, Ding Tianhong wrote:
> The dev_set_mac_address() will check the dev->netdev_ops->ndo_set_mac_address,
> so no need to check it in bond_set_mac_address().
> 
> Fix the wrong variables for pr_err().
> 
> Cc: Jay Vosburgh <fubar@us.ibm.com>
> Cc: Veaceslav Falico <vfalico@redhat.com>
> Cc: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  drivers/net/bonding/bond_main.c | 10 +---------
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index ce0f5c0..9d92f46 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3509,15 +3509,7 @@ static int bond_set_mac_address(struct net_device *bond_dev, void *addr)
>  	 */
>  
>  	bond_for_each_slave(bond, slave, iter) {
> -		const struct net_device_ops *slave_ops = slave->dev->netdev_ops;
>  		pr_debug("slave %p %s\n", slave, slave->dev->name);
> -
> -		if (slave_ops->ndo_set_mac_address == NULL) {
> -			res = -EOPNOTSUPP;
> -			pr_debug("EOPNOTSUPP %s\n", slave->dev->name);
> -			goto unwind;
> -		}
> -
>  		res = dev_set_mac_address(slave->dev, addr);
>  		if (res) {
>  			/* TODO: consider downing the slave
> @@ -4317,7 +4309,7 @@ static int bond_check_params(struct bond_params *params)
>  						      fail_over_mac_tbl);
>  		if (fail_over_mac_value == -1) {
>  			pr_err("Error: invalid fail_over_mac \"%s\"\n",
> -			       arp_validate == NULL ? "NULL" : arp_validate);
> +			       fail_over_mac == NULL ? "NULL" : fail_over_mac);
>  			return -EINVAL;
>  		}
My option API changes include a fix for this which also removes the NULL check
as it's already checked earlier if fail_over_mac is != NULL. If you'd like to
fix this yourself I think you can also skip the NULL check when printing the error.

Cheers,
 Nik
>  
> 

^ permalink raw reply

* Re: [PATCH net-next 1/3] bonding: Set the correct value to fail_over_mac at enslavement
From: Nikolay Aleksandrov @ 2014-01-22 14:20 UTC (permalink / raw)
  To: Ding Tianhong, Jay Vosburgh, Veaceslav Falico, David S. Miller,
	Netdev, Andy Gospodarek
In-Reply-To: <52DF8DB8.9000006@huawei.com>

On 01/22/2014 10:22 AM, Ding Tianhong wrote:
> If the new slave don't support setting the MAC address, there are
> two ways to handle this situation:
> 
> 1). If the new slave is the first slave, set bond to the new slave's
>     MAC address, if the mode is active-backup, set fail_over_mac to
>     active, otherwise set fail_over_mac to none.
> 2). If the new slave is not the first slave and the fail_over_mac is
>     active, it means that the slave could work normally in active-backup
>     mode, otherwise if the fail_over_mac is none, the slave could not
>     work normally for no active-backup mode, so bond could not ensalve
>     the new dev.
> 
> Cc: Jay Vosburgh <fubar@us.ibm.com>
> Cc: Veaceslav Falico <vfalico@redhat.com>
> Cc: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  drivers/net/bonding/bond_main.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 3220b48..598f100 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1334,9 +1334,17 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>  
>  	if (slave_ops->ndo_set_mac_address == NULL) {
>  		if (!bond_has_slaves(bond)) {
> -			pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address. Setting fail_over_mac to active.",
> +			pr_warning("%s: Warning: The first slave device specified does not support setting the MAC address.\n",
>  				   bond_dev->name);
> -			bond->params.fail_over_mac = BOND_FOM_ACTIVE;
> +			if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
> +				bond->params.fail_over_mac = BOND_FOM_ACTIVE;
> +				pr_warning("%s: Setting fail_over_mac to active for active-backup mode.\n",
> +					   bond_dev->name);
> +			} else {
> +				bond->params.fail_over_mac = BOND_FOM_NONE;
> +				pr_warning("%s: Setting fail_over_mac to none for no active-backup modes",
At least make the warnings consistent: "... for no active-backup mode.\n".
Also you might consider switching to pr_warn.

> +					   bond_dev->name);
> +			}
>  		} else if (bond->params.fail_over_mac != BOND_FOM_ACTIVE) {
>  			pr_err("%s: Error: The slave device specified does not support setting the MAC address, but fail_over_mac is not set to active.\n",
>  			       bond_dev->name);
> 

^ permalink raw reply

* Re: [PATCH] net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
From: Vlad Yasevich @ 2014-01-22 14:18 UTC (permalink / raw)
  To: Neil Horman, Matija Glavinic Pecotic
  Cc: linux-sctp, Alexander Sverdlin, netdev
In-Reply-To: <20140122123059.GA9393@hmsreliant.think-freely.org>

On 01/22/2014 07:30 AM, Neil Horman wrote:
> On Fri, Jan 17, 2014 at 08:01:24AM +0100, Matija Glavinic Pecotic wrote:
>>
>> Proposed solution simplifies whole algorithm having on mind
definition from rfc:
>>
>> o  Receiver Window (rwnd): This gives the sender an indication of the
space
>>    available in the receiver's inbound buffer.
>>
>> Core of the proposed solution is given with these lines:
>>
>> sctp_assoc_rwnd_account:
>> 	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
>> 		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
>> 	else
>> 		asoc->rwnd = 0;
>>
>> We advertise to sender (half of) actual space we have. Half is in the
braces
>> depending whether you would like to observe size of socket buffer as
SO_RECVBUF
>> or twice the amount, i.e. size is the one visible from userspace,
that is,
>> from kernelspace.
>> In this way sender is given with good approximation of our buffer space,
>> regardless of the buffer policy - we always advertise what we have.
Proposed
>> solution fixes described problems and removes necessity for rwnd
restoration
>> algorithm. Finally, as proposed solution is simplification, some
lines of code,
>> along with some bytes in struct sctp_association are saved.
>>
>> Signed-off-by: Matija Glavinic Pecotic
<matija.glavinic-pecotic.ext@nsn.com>
>> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
>>
>
>
> General comment - While this seems to make sense to me generally speaking,
> doesn't it currently violate section 6 of the RFC?
>
>
> A SCTP receiver MUST be able to receive a minimum of 1500 bytes in
>    one SCTP packet.  This means that a SCTP endpoint MUST NOT indicate
>    less than 1500 bytes in its Initial a_rwnd sent in the INIT or INIT
>    ACK.
>
> Since we set the initial rwnd value to the larger of sk->sk_rcvbuf/2 or
> SCTP_MIN_RWND (1500 bytes), won't we potentially advertize half that
amount?

Not initially.  Initial window will still be advertized properly.  Once
we receive the first packet and consumed some space, we'll advertize
half of available receive buffer.  It is perfectly OK to advertize a
window smaller the MIN_WINDOW in the middle of the transfer.

> It seems we need to double the minimum socket receive buffer here.

Not here specifically, but yes.  It is already broken and this patch
doesn't change current behavior.  This is something SCTP may need to do
separately.

-vlad

>
> Neil
>
>

^ permalink raw reply

* Re: [PATCH] net: Correctly sync addresses from multiple sources to single device
From: Andrey Dmitrov @ 2014-01-22 14:18 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, davem, Alexandra N. Kossovsky, Konstantin Ushakov
In-Reply-To: <1389988332-22472-1-git-send-email-vyasevic@redhat.com>

On 01/17/2014 11:52 PM, Vlad Yasevich wrote:
> When we have multiple devices attempting to sync the same address
> to a single destination, each device should be permitted to sync
> it once.  To accomplish this, pass the sync count of the source
> address to __hw_addr_add_ex().  'sync_cnt' tracks how many time
> a given address has been successfully synced.  If the address
> is found in the destination list, but the 'sync_cnt' of the source
> is 0, then this address has not been synced from this interface
> and we need to allow the sync operation to succeed thus incrementing
> reference counts.
>
> Reported-by: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
> CC: Andrey Dmitrov <andrey.dmitrov@oktetlabs.ru>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> ---
>   net/core/dev_addr_lists.c | 12 +++++++-----
>   1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
> index ec40a84..bd1b880 100644
> --- a/net/core/dev_addr_lists.c
> +++ b/net/core/dev_addr_lists.c
> @@ -48,7 +48,8 @@ static int __hw_addr_create_ex(struct netdev_hw_addr_list *list,
>   
>   static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
>   			    const unsigned char *addr, int addr_len,
> -			    unsigned char addr_type, bool global, bool sync)
> +			    unsigned char addr_type, bool global, bool sync,
> +			    int sync_count)
>   {
>   	struct netdev_hw_addr *ha;
>   
> @@ -66,7 +67,7 @@ static int __hw_addr_add_ex(struct netdev_hw_addr_list *list,
>   					ha->global_use = true;
>   			}
>   			if (sync) {
> -				if (ha->synced)
> +				if (ha->synced && sync_count)
>   					return -EEXIST;
>   				else
>   					ha->synced = true;
> @@ -84,7 +85,8 @@ static int __hw_addr_add(struct netdev_hw_addr_list *list,
>   			 const unsigned char *addr, int addr_len,
>   			 unsigned char addr_type)
>   {
> -	return __hw_addr_add_ex(list, addr, addr_len, addr_type, false, false);
> +	return __hw_addr_add_ex(list, addr, addr_len, addr_type, false, false,
> +				0);
>   }
>   
>   static int __hw_addr_del_entry(struct netdev_hw_addr_list *list,
> @@ -139,7 +141,7 @@ static int __hw_addr_sync_one(struct netdev_hw_addr_list *to_list,
>   	int err;
>   
>   	err = __hw_addr_add_ex(to_list, ha->addr, addr_len, ha->type,
> -			       false, true);
> +			       false, true, ha->sync_count);
>   	if (err && err != -EEXIST)
>   		return err;
>   
> @@ -676,7 +678,7 @@ static int __dev_mc_add(struct net_device *dev, const unsigned char *addr,
>   
>   	netif_addr_lock_bh(dev);
>   	err = __hw_addr_add_ex(&dev->mc, addr, dev->addr_len,
> -			       NETDEV_HW_ADDR_T_MULTICAST, global, false);
> +			       NETDEV_HW_ADDR_T_MULTICAST, global, false, 0);
>   	if (!err)
>   		__dev_set_rx_mode(dev);
>   	netif_addr_unlock_bh(dev);

Thanks. The patch was tested with linux-3.12.6.

However, while building the kernel I got complains on:

> 	err = __hw_addr_add_ex(to_list, ha->addr, addr_len, ha->type,
> -			       false, true);
> +			       false, true, ha->sync_count);

Namely ha (struct netdev_hw_addr) has no field sync_count. It has sync_cnt.
So I’ve fixed this when testing the patch. Is it a typo or something intentional?

Your patch solves the initial problem, but it looks like it introduces a new one.

When interface joins multicast group corresponding MAC address is added to the interface maddr list. It should be removed when socket explicitly leaves the group or is just closed. It’s the case without your patch. However, with the patched kernel address is still there after socket is closed.

You can reproduce it with mcast_client binary (mcast_client.c was attached to the first letter in the thread). To be specific:

$ ip maddr show eth3
9:  eth3
  link  33:33:00:00:00:01 users 3
  link  01:00:5e:00:00:01 users 3
  link  33:33:ff:01:39:7c users 3
  link  01:80:c2:00:00:21 users 2
  inet  224.0.0.1
  inet6 ff02::1:ff01:397c
  inet6 ff02::1
  inet6 ff01::1

$ gcc mcast_client.c -o cl
$ sudo ./cl
Abort it with Ctrl+c
$ ip maddr show eth3
9:  eth3
  link  33:33:00:00:00:01 users 3
  link  01:00:5e:00:00:01 users 3
  link  33:33:ff:01:39:7c users 3
  link  01:80:c2:00:00:21 users 2
  link  01:00:5e:11:58:a8
  inet  224.0.0.1
  inet6 ff02::1:ff01:397c
  inet6 ff02::1
  inet6 ff01::1

'link 01:00:5e:11:58:a8’ address is still there. Should’ve been removed.

Andrey

^ permalink raw reply

* virtio_net: doesn't free last skb in tx buffer
From: Daniel Andersson Tenninge @ 2014-01-22 14:02 UTC (permalink / raw)
  To: netdev; +Cc: Rusty Russell, Michael S. Tsirkin, Patrick McHardy, UABFRA

Hi,

I have been chasing a fault when closing a network namespace with a soft 
device in (vlan, macvlan, ...) and the "real" device is a virtio device. 
I couldn't decide if I should email the netdev or virtualization list 
but I hope it is sent right.

The problem is that the namespace can never close because there is a 
neighbour entry holding a reference to the device. This neighbour entry 
is not free:d since a dst_entry is left with a reference to it and this 
dst_entry is referenced from the last sk_buff sent on the virtio device.

 From the code it looks like whenever a sk_buff should be sent in the 
start_xmit (driver/net/virtio_net.c) function the driver checks the tx 
ring buffer if there are any old sk_buff there that should be free:d. 
When the network namespace is closed down the last sk_buff cannot be 
free:d until another is sent and if this doesn't happen the namespace 
will hang forever.

Now this could be done in the net_device_ops ndo_uninit, but it seems 
like if the there is a device on top of the virtio device, e.g. a vlan 
device, the ndo_uninit is called for the vlan but the vlan never passes 
it down to the real_dev. The same can be seen in the macvlan driver as 
well. Is it made like this intentional and in that case what is the 
preferred way for a device driver to clean up stuff during network 
namespace shutdown?

Wouldn't this cause a more generic problem for network device drivers 
that expect to be able to clean up stuff in ndo_uninit or is this not 
meant to be used for this and the problem is specific to the virtio_net 
leaving data after start_xmit returns?

BR,
--Daniel

^ permalink raw reply

* [PATCH net-next v2 25/25] bonding: convert slaves to use the new option API
From: Nikolay Aleksandrov @ 2014-01-22 13:53 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov
In-Reply-To: <1390398820-5355-1-git-send-email-nikolay@redhat.com>

This patch adds the necessary changes so slaves would use
the new bonding option API. Also move the option to its own set function
in bond_options.c and fix some style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_options.c | 53 ++++++++++++++++++++++++++++++++++++++
 drivers/net/bonding/bond_options.h |  2 ++
 drivers/net/bonding/bond_sysfs.c   | 51 +++---------------------------------
 3 files changed, 59 insertions(+), 47 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 60d7001..05a402c 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -300,6 +300,13 @@ static struct bond_option bond_opts[] = {
 		.values = bond_lp_interval_tbl,
 		.set = bond_option_lp_interval_set
 	},
+	[BOND_OPT_SLAVES] = {
+		.id = BOND_OPT_SLAVES,
+		.name = "slaves",
+		.desc = "Slave membership management",
+		.flags = BOND_OPTFLAG_RAWVAL,
+		.set = bond_option_slaves_set
+	},
 	{ }
 };
 
@@ -1221,3 +1228,49 @@ err_no_cmd:
 	goto out;
 
 }
+
+int bond_option_slaves_set(struct bonding *bond, struct bond_opt_value *newval)
+{
+	char command[IFNAMSIZ + 1] = { 0, };
+	struct net_device *dev;
+	char *ifname;
+	int ret;
+
+	sscanf(newval->string, "%16s", command); /* IFNAMSIZ*/
+	ifname = command + 1;
+	if ((strlen(command) <= 1) ||
+	    !dev_valid_name(ifname))
+		goto err_no_cmd;
+
+	dev = __dev_get_by_name(dev_net(bond->dev), ifname);
+	if (!dev) {
+		pr_info("%s: Interface %s does not exist!\n",
+			bond->dev->name, ifname);
+		ret = -ENODEV;
+		goto out;
+	}
+
+	switch (command[0]) {
+	case '+':
+		pr_info("%s: Adding slave %s.\n", bond->dev->name, dev->name);
+		ret = bond_enslave(bond->dev, dev);
+		break;
+
+	case '-':
+		pr_info("%s: Removing slave %s.\n", bond->dev->name, dev->name);
+		ret = bond_release(bond->dev, dev);
+		break;
+
+	default:
+		goto err_no_cmd;
+	}
+
+out:
+	return ret;
+
+err_no_cmd:
+	pr_err("no command found in slaves file for bond %s. Use +ifname or -ifname.\n",
+	       bond->dev->name);
+	ret = -EPERM;
+	goto out;
+}
diff --git a/drivers/net/bonding/bond_options.h b/drivers/net/bonding/bond_options.h
index eb3a773..433d37f 100644
--- a/drivers/net/bonding/bond_options.h
+++ b/drivers/net/bonding/bond_options.h
@@ -61,6 +61,7 @@ enum {
 	BOND_OPT_ALL_SLAVES_ACTIVE,
 	BOND_OPT_RESEND_IGMP,
 	BOND_OPT_LP_INTERVAL,
+	BOND_OPT_SLAVES,
 	BOND_OPT_LAST
 };
 
@@ -165,4 +166,5 @@ int bond_option_resend_igmp_set(struct bonding *bond,
 				struct bond_opt_value *newval);
 int bond_option_lp_interval_set(struct bonding *bond,
 				struct bond_opt_value *newval);
+int bond_option_slaves_set(struct bonding *bond, struct bond_opt_value *newval);
 #endif /* _BOND_OPTIONS_H */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index f7bc791..643fcc1 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -200,58 +200,15 @@ static ssize_t bonding_store_slaves(struct device *d,
 				    struct device_attribute *attr,
 				    const char *buffer, size_t count)
 {
-	char command[IFNAMSIZ + 1] = { 0, };
-	char *ifname;
-	int res, ret = count;
-	struct net_device *dev;
 	struct bonding *bond = to_bond(d);
+	int ret;
 
-	if (!rtnl_trylock())
-		return restart_syscall();
-
-	sscanf(buffer, "%16s", command); /* IFNAMSIZ*/
-	ifname = command + 1;
-	if ((strlen(command) <= 1) ||
-	    !dev_valid_name(ifname))
-		goto err_no_cmd;
-
-	dev = __dev_get_by_name(dev_net(bond->dev), ifname);
-	if (!dev) {
-		pr_info("%s: Interface %s does not exist!\n",
-			bond->dev->name, ifname);
-		ret = -ENODEV;
-		goto out;
-	}
-
-	switch (command[0]) {
-	case '+':
-		pr_info("%s: Adding slave %s.\n", bond->dev->name, dev->name);
-		res = bond_enslave(bond->dev, dev);
-		break;
-
-	case '-':
-		pr_info("%s: Removing slave %s.\n", bond->dev->name, dev->name);
-		res = bond_release(bond->dev, dev);
-		break;
-
-	default:
-		goto err_no_cmd;
-	}
-
-	if (res)
-		ret = res;
-	goto out;
-
-err_no_cmd:
-	pr_err("no command found in slaves file for bond %s. Use +ifname or -ifname.\n",
-	       bond->dev->name);
-	ret = -EPERM;
+	ret = bond_opt_tryset_rtnl(bond, BOND_OPT_SLAVES, (char *)buffer);
+	if (!ret)
+		ret = count;
 
-out:
-	rtnl_unlock();
 	return ret;
 }
-
 static DEVICE_ATTR(slaves, S_IRUGO | S_IWUSR, bonding_show_slaves,
 		   bonding_store_slaves);
 
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next v2 24/25] bonding: convert lp_interval to use the new option API
From: Nikolay Aleksandrov @ 2014-01-22 13:53 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov
In-Reply-To: <1390398820-5355-1-git-send-email-nikolay@redhat.com>

This patch adds the necessary changes so lp_interval would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_netlink.c |  3 ++-
 drivers/net/bonding/bond_options.c | 23 +++++++++++++++--------
 drivers/net/bonding/bond_options.h |  3 +++
 drivers/net/bonding/bond_sysfs.c   | 14 ++------------
 drivers/net/bonding/bonding.h      |  1 -
 5 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 1e099c6..a8fa725 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -301,7 +301,8 @@ static int bond_changelink(struct net_device *bond_dev,
 		int lp_interval =
 			nla_get_u32(data[IFLA_BOND_LP_INTERVAL]);
 
-		err = bond_option_lp_interval_set(bond, lp_interval);
+		bond_opt_initval(&newval, lp_interval);
+		err = __bond_opt_set(bond, BOND_OPT_LP_INTERVAL, &newval);
 		if (err)
 			return err;
 	}
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 1373093..60d7001 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -118,6 +118,11 @@ static struct bond_opt_value bond_resend_igmp_tbl[] = {
 	{ NULL,      -1,  0}
 };
 
+static struct bond_opt_value bond_lp_interval_tbl[] = {
+	{ "minval",  1,       BOND_VALFLAG_MIN | BOND_VALFLAG_DEFAULT},
+	{ "maxval",  INT_MAX, BOND_VALFLAG_MAX},
+};
+
 static struct bond_option bond_opts[] = {
 	[BOND_OPT_MODE] = {
 		.id = BOND_OPT_MODE,
@@ -288,6 +293,13 @@ static struct bond_option bond_opts[] = {
 		.values = bond_resend_igmp_tbl,
 		.set = bond_option_resend_igmp_set
 	},
+	[BOND_OPT_LP_INTERVAL] = {
+		.id = BOND_OPT_LP_INTERVAL,
+		.name = "lp_interval",
+		.desc = "The number of seconds between instances where the bonding driver sends learning packets to each slave's peer switch",
+		.values = bond_lp_interval_tbl,
+		.set = bond_option_lp_interval_set
+	},
 	{ }
 };
 
@@ -1101,15 +1113,10 @@ int bond_option_min_links_set(struct bonding *bond,
 	return 0;
 }
 
-int bond_option_lp_interval_set(struct bonding *bond, int lp_interval)
+int bond_option_lp_interval_set(struct bonding *bond,
+				struct bond_opt_value *newval)
 {
-	if (lp_interval <= 0) {
-		pr_err("%s: lp_interval must be between 1 and %d\n",
-		       bond->dev->name, INT_MAX);
-		return -EINVAL;
-	}
-
-	bond->params.lp_interval = lp_interval;
+	bond->params.lp_interval = newval->value;
 
 	return 0;
 }
diff --git a/drivers/net/bonding/bond_options.h b/drivers/net/bonding/bond_options.h
index f0c2cbb..eb3a773 100644
--- a/drivers/net/bonding/bond_options.h
+++ b/drivers/net/bonding/bond_options.h
@@ -60,6 +60,7 @@ enum {
 	BOND_OPT_QUEUE_ID,
 	BOND_OPT_ALL_SLAVES_ACTIVE,
 	BOND_OPT_RESEND_IGMP,
+	BOND_OPT_LP_INTERVAL,
 	BOND_OPT_LAST
 };
 
@@ -162,4 +163,6 @@ int bond_option_all_slaves_active_set(struct bonding *bond,
 				      struct bond_opt_value *newval);
 int bond_option_resend_igmp_set(struct bonding *bond,
 				struct bond_opt_value *newval);
+int bond_option_lp_interval_set(struct bonding *bond,
+				struct bond_opt_value *newval);
 #endif /* _BOND_OPTIONS_H */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 1693ebd..f7bc791 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1070,22 +1070,12 @@ static ssize_t bonding_store_lp_interval(struct device *d,
 					 const char *buf, size_t count)
 {
 	struct bonding *bond = to_bond(d);
-	int new_value, ret;
-
-	if (sscanf(buf, "%d", &new_value) != 1) {
-		pr_err("%s: no lp interval value specified.\n",
-			bond->dev->name);
-		return -EINVAL;
-	}
-
-	if (!rtnl_trylock())
-		return restart_syscall();
+	int ret;
 
-	ret = bond_option_lp_interval_set(bond, new_value);
+	ret = bond_opt_tryset_rtnl(bond, BOND_OPT_LP_INTERVAL, (char *)buf);
 	if (!ret)
 		ret = count;
 
-	rtnl_unlock();
 	return ret;
 }
 
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 1683e40..8032e07 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -454,7 +454,6 @@ int bond_netlink_init(void);
 void bond_netlink_fini(void);
 int bond_option_arp_ip_target_add(struct bonding *bond, __be32 target);
 int bond_option_arp_ip_target_rem(struct bonding *bond, __be32 target);
-int bond_option_lp_interval_set(struct bonding *bond, int min_links);
 struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond);
 struct net_device *bond_option_active_slave_get(struct bonding *bond);
 const char *bond_slave_link_status(s8 link);
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next v2 23/25] bonding: convert resend_igmp to use the new option API
From: Nikolay Aleksandrov @ 2014-01-22 13:53 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov
In-Reply-To: <1390398820-5355-1-git-send-email-nikolay@redhat.com>

This patch adds the necessary changes so resend_igmp would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_netlink.c |  3 ++-
 drivers/net/bonding/bond_options.c | 29 +++++++++++++++++++----------
 drivers/net/bonding/bond_options.h |  3 +++
 drivers/net/bonding/bond_sysfs.c   | 14 ++------------
 drivers/net/bonding/bonding.h      |  1 -
 5 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 828013a..1e099c6 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -265,7 +265,8 @@ static int bond_changelink(struct net_device *bond_dev,
 		int resend_igmp =
 			nla_get_u32(data[IFLA_BOND_RESEND_IGMP]);
 
-		err = bond_option_resend_igmp_set(bond, resend_igmp);
+		bond_opt_initval(&newval, resend_igmp);
+		err = __bond_opt_set(bond, BOND_OPT_RESEND_IGMP, &newval);
 		if (err)
 			return err;
 	}
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 7fafc34..1373093 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -111,6 +111,13 @@ static struct bond_opt_value bond_all_slaves_active_tbl[] = {
 	{ NULL,  -1, 0}
 };
 
+static struct bond_opt_value bond_resend_igmp_tbl[] = {
+	{ "off",     0,   0},
+	{ "maxval",  255, BOND_VALFLAG_MAX},
+	{ "default", 1,   BOND_VALFLAG_DEFAULT},
+	{ NULL,      -1,  0}
+};
+
 static struct bond_option bond_opts[] = {
 	[BOND_OPT_MODE] = {
 		.id = BOND_OPT_MODE,
@@ -274,6 +281,13 @@ static struct bond_option bond_opts[] = {
 		.values = bond_all_slaves_active_tbl,
 		.set = bond_option_all_slaves_active_set
 	},
+	[BOND_OPT_RESEND_IGMP] = {
+		.id = BOND_OPT_RESEND_IGMP,
+		.name = "resend_igmp",
+		.desc = "Number of IGMP membership reports to send on link failure",
+		.values = bond_resend_igmp_tbl,
+		.set = bond_option_resend_igmp_set
+	},
 	{ }
 };
 
@@ -1038,17 +1052,12 @@ int bond_option_xmit_hash_policy_set(struct bonding *bond,
 	return 0;
 }
 
-int bond_option_resend_igmp_set(struct bonding *bond, int resend_igmp)
+int bond_option_resend_igmp_set(struct bonding *bond,
+				struct bond_opt_value *newval)
 {
-	if (resend_igmp < 0 || resend_igmp > 255) {
-		pr_err("%s: Invalid resend_igmp value %d not in range 0-255; rejected.\n",
-		       bond->dev->name, resend_igmp);
-		return -EINVAL;
-	}
-
-	bond->params.resend_igmp = resend_igmp;
-	pr_info("%s: Setting resend_igmp to %d.\n",
-		bond->dev->name, resend_igmp);
+	pr_info("%s: Setting resend_igmp to %llu.\n",
+		bond->dev->name, newval->value);
+	bond->params.resend_igmp = newval->value;
 
 	return 0;
 }
diff --git a/drivers/net/bonding/bond_options.h b/drivers/net/bonding/bond_options.h
index 09ee8c8..f0c2cbb 100644
--- a/drivers/net/bonding/bond_options.h
+++ b/drivers/net/bonding/bond_options.h
@@ -59,6 +59,7 @@ enum {
 	BOND_OPT_ACTIVE_SLAVE,
 	BOND_OPT_QUEUE_ID,
 	BOND_OPT_ALL_SLAVES_ACTIVE,
+	BOND_OPT_RESEND_IGMP,
 	BOND_OPT_LAST
 };
 
@@ -159,4 +160,6 @@ int bond_option_queue_id_set(struct bonding *bond,
 			     struct bond_opt_value *newval);
 int bond_option_all_slaves_active_set(struct bonding *bond,
 				      struct bond_opt_value *newval);
+int bond_option_resend_igmp_set(struct bonding *bond,
+				struct bond_opt_value *newval);
 #endif /* _BOND_OPTIONS_H */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 20210d2..1693ebd 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1043,23 +1043,13 @@ static ssize_t bonding_store_resend_igmp(struct device *d,
 					 struct device_attribute *attr,
 					 const char *buf, size_t count)
 {
-	int new_value, ret = count;
 	struct bonding *bond = to_bond(d);
+	int ret;
 
-	if (sscanf(buf, "%d", &new_value) != 1) {
-		pr_err("%s: no resend_igmp value specified.\n",
-		       bond->dev->name);
-		return -EINVAL;
-	}
-
-	if (!rtnl_trylock())
-		return restart_syscall();
-
-	ret = bond_option_resend_igmp_set(bond, new_value);
+	ret = bond_opt_tryset_rtnl(bond, BOND_OPT_RESEND_IGMP, (char *)buf);
 	if (!ret)
 		ret = count;
 
-	rtnl_unlock();
 	return ret;
 }
 
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 9801c5d..1683e40 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -454,7 +454,6 @@ int bond_netlink_init(void);
 void bond_netlink_fini(void);
 int bond_option_arp_ip_target_add(struct bonding *bond, __be32 target);
 int bond_option_arp_ip_target_rem(struct bonding *bond, __be32 target);
-int bond_option_resend_igmp_set(struct bonding *bond, int resend_igmp);
 int bond_option_lp_interval_set(struct bonding *bond, int min_links);
 struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond);
 struct net_device *bond_option_active_slave_get(struct bonding *bond);
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next v2 22/25] bonding: convert all_slaves_active to use the new option API
From: Nikolay Aleksandrov @ 2014-01-22 13:53 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov
In-Reply-To: <1390398820-5355-1-git-send-email-nikolay@redhat.com>

This patch adds the necessary changes so all_slaves_active would use
the new bonding option API.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_netlink.c |  4 ++--
 drivers/net/bonding/bond_options.c | 29 +++++++++++++++++------------
 drivers/net/bonding/bond_options.h |  3 +++
 drivers/net/bonding/bond_sysfs.c   | 15 +++------------
 drivers/net/bonding/bonding.h      |  2 --
 5 files changed, 25 insertions(+), 28 deletions(-)

diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index dba1f2e..828013a 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -282,8 +282,8 @@ static int bond_changelink(struct net_device *bond_dev,
 		int all_slaves_active =
 			nla_get_u8(data[IFLA_BOND_ALL_SLAVES_ACTIVE]);
 
-		err = bond_option_all_slaves_active_set(bond,
-							all_slaves_active);
+		bond_opt_initval(&newval, all_slaves_active);
+		err = __bond_opt_set(bond, BOND_OPT_ALL_SLAVES_ACTIVE, &newval);
 		if (err)
 			return err;
 	}
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 8775c72..7fafc34 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -105,6 +105,12 @@ static struct bond_opt_value bond_use_carrier_tbl[] = {
 	{ NULL,  -1, 0}
 };
 
+static struct bond_opt_value bond_all_slaves_active_tbl[] = {
+	{ "off", 0,  BOND_VALFLAG_DEFAULT},
+	{ "on",  1,  0},
+	{ NULL,  -1, 0}
+};
+
 static struct bond_option bond_opts[] = {
 	[BOND_OPT_MODE] = {
 		.id = BOND_OPT_MODE,
@@ -261,6 +267,13 @@ static struct bond_option bond_opts[] = {
 		.flags = BOND_OPTFLAG_RAWVAL,
 		.set = bond_option_queue_id_set
 	},
+	[BOND_OPT_ALL_SLAVES_ACTIVE] = {
+		.id = BOND_OPT_ALL_SLAVES_ACTIVE,
+		.name = "all_slaves_active",
+		.desc = "Keep all frames received on an interface by setting active flag for all slaves",
+		.values = bond_all_slaves_active_tbl,
+		.set = bond_option_all_slaves_active_set
+	},
 	{ }
 };
 
@@ -1049,25 +1062,17 @@ int bond_option_num_peer_notif_set(struct bonding *bond,
 }
 
 int bond_option_all_slaves_active_set(struct bonding *bond,
-				      int all_slaves_active)
+				      struct bond_opt_value *newval)
 {
 	struct list_head *iter;
 	struct slave *slave;
 
-	if (all_slaves_active == bond->params.all_slaves_active)
+	if (newval->value == bond->params.all_slaves_active)
 		return 0;
-
-	if ((all_slaves_active == 0) || (all_slaves_active == 1)) {
-		bond->params.all_slaves_active = all_slaves_active;
-	} else {
-		pr_info("%s: Ignoring invalid all_slaves_active value %d.\n",
-			bond->dev->name, all_slaves_active);
-		return -EINVAL;
-	}
-
+	bond->params.all_slaves_active = newval->value;
 	bond_for_each_slave(bond, slave, iter) {
 		if (!bond_is_active_slave(slave)) {
-			if (all_slaves_active)
+			if (newval->value)
 				slave->inactive = 0;
 			else
 				slave->inactive = 1;
diff --git a/drivers/net/bonding/bond_options.h b/drivers/net/bonding/bond_options.h
index 396d504..09ee8c8 100644
--- a/drivers/net/bonding/bond_options.h
+++ b/drivers/net/bonding/bond_options.h
@@ -58,6 +58,7 @@ enum {
 	BOND_OPT_USE_CARRIER,
 	BOND_OPT_ACTIVE_SLAVE,
 	BOND_OPT_QUEUE_ID,
+	BOND_OPT_ALL_SLAVES_ACTIVE,
 	BOND_OPT_LAST
 };
 
@@ -156,4 +157,6 @@ int bond_option_active_slave_set(struct bonding *bond,
 				 struct bond_opt_value *newval);
 int bond_option_queue_id_set(struct bonding *bond,
 			     struct bond_opt_value *newval);
+int bond_option_all_slaves_active_set(struct bonding *bond,
+				      struct bond_opt_value *newval);
 #endif /* _BOND_OPTIONS_H */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a9cd3f5..20210d2 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1015,22 +1015,13 @@ static ssize_t bonding_store_slaves_active(struct device *d,
 					   const char *buf, size_t count)
 {
 	struct bonding *bond = to_bond(d);
-	int new_value, ret;
-
-	if (sscanf(buf, "%d", &new_value) != 1) {
-		pr_err("%s: no all_slaves_active value specified.\n",
-		       bond->dev->name);
-		return -EINVAL;
-	}
-
-	if (!rtnl_trylock())
-		return restart_syscall();
+	int ret;
 
-	ret = bond_option_all_slaves_active_set(bond, new_value);
+	ret = bond_opt_tryset_rtnl(bond, BOND_OPT_ALL_SLAVES_ACTIVE,
+				   (char *)buf);
 	if (!ret)
 		ret = count;
 
-	rtnl_unlock();
 	return ret;
 }
 static DEVICE_ATTR(all_slaves_active, S_IRUGO | S_IWUSR,
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 4ac79e1..9801c5d 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -455,8 +455,6 @@ void bond_netlink_fini(void);
 int bond_option_arp_ip_target_add(struct bonding *bond, __be32 target);
 int bond_option_arp_ip_target_rem(struct bonding *bond, __be32 target);
 int bond_option_resend_igmp_set(struct bonding *bond, int resend_igmp);
-int bond_option_all_slaves_active_set(struct bonding *bond,
-				      int all_slaves_active);
 int bond_option_lp_interval_set(struct bonding *bond, int min_links);
 struct net_device *bond_option_active_slave_get_rcu(struct bonding *bond);
 struct net_device *bond_option_active_slave_get(struct bonding *bond);
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next v2 21/25] bonding: convert queue_id to use the new option API
From: Nikolay Aleksandrov @ 2014-01-22 13:53 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov
In-Reply-To: <1390398820-5355-1-git-send-email-nikolay@redhat.com>

This patch adds the necessary changes so queue_id would use
the new bonding option API. Also move it to its own set function in
bond_options.c and fix some style errors.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_options.c | 70 ++++++++++++++++++++++++++++++++++++++
 drivers/net/bonding/bond_options.h |  3 ++
 drivers/net/bonding/bond_sysfs.c   | 65 +++--------------------------------
 3 files changed, 77 insertions(+), 61 deletions(-)

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 2315104d..8775c72 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -254,6 +254,13 @@ static struct bond_option bond_opts[] = {
 						BIT(BOND_MODE_ALB)),
 		.set = bond_option_active_slave_set
 	},
+	[BOND_OPT_QUEUE_ID] = {
+		.id = BOND_OPT_QUEUE_ID,
+		.name = "queue_id",
+		.desc = "Set queue id of a slave",
+		.flags = BOND_OPTFLAG_RAWVAL,
+		.set = bond_option_queue_id_set
+	},
 	{ }
 };
 
@@ -1130,3 +1137,66 @@ int bond_option_ad_select_set(struct bonding *bond,
 
 	return 0;
 }
+
+int bond_option_queue_id_set(struct bonding *bond,
+			     struct bond_opt_value *newval)
+{
+	struct slave *slave, *update_slave;
+	struct net_device *sdev;
+	struct list_head *iter;
+	char *delim;
+	int ret = 0;
+	u16 qid;
+
+	/* delim will point to queue id if successful */
+	delim = strchr(newval->string, ':');
+	if (!delim)
+		goto err_no_cmd;
+
+	/* Terminate string that points to device name and bump it
+	 * up one, so we can read the queue id there.
+	 */
+	*delim = '\0';
+	if (sscanf(++delim, "%hd\n", &qid) != 1)
+		goto err_no_cmd;
+
+	/* Check buffer length, valid ifname and queue id */
+	if (strlen(newval->string) > IFNAMSIZ ||
+	    !dev_valid_name(newval->string) ||
+	    qid > bond->dev->real_num_tx_queues)
+		goto err_no_cmd;
+
+	/* Get the pointer to that interface if it exists */
+	sdev = __dev_get_by_name(dev_net(bond->dev), newval->string);
+	if (!sdev)
+		goto err_no_cmd;
+
+	/* Search for thes slave and check for duplicate qids */
+	update_slave = NULL;
+	bond_for_each_slave(bond, slave, iter) {
+		if (sdev == slave->dev)
+			/* We don't need to check the matching
+			 * slave for dups, since we're overwriting it
+			 */
+			update_slave = slave;
+		else if (qid && qid == slave->queue_id) {
+			goto err_no_cmd;
+		}
+	}
+
+	if (!update_slave)
+		goto err_no_cmd;
+
+	/* Actually set the qids for the slave */
+	update_slave->queue_id = qid;
+
+out:
+	return ret;
+
+err_no_cmd:
+	pr_info("invalid input for queue_id set for %s.\n",
+		bond->dev->name);
+	ret = -EPERM;
+	goto out;
+
+}
diff --git a/drivers/net/bonding/bond_options.h b/drivers/net/bonding/bond_options.h
index 1f486a7..396d504 100644
--- a/drivers/net/bonding/bond_options.h
+++ b/drivers/net/bonding/bond_options.h
@@ -57,6 +57,7 @@ enum {
 	BOND_OPT_PRIMARY_RESELECT,
 	BOND_OPT_USE_CARRIER,
 	BOND_OPT_ACTIVE_SLAVE,
+	BOND_OPT_QUEUE_ID,
 	BOND_OPT_LAST
 };
 
@@ -153,4 +154,6 @@ int bond_option_use_carrier_set(struct bonding *bond,
 				struct bond_opt_value *newval);
 int bond_option_active_slave_set(struct bonding *bond,
 				 struct bond_opt_value *newval);
+int bond_option_queue_id_set(struct bonding *bond,
+			     struct bond_opt_value *newval);
 #endif /* _BOND_OPTIONS_H */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 181a59d..a9cd3f5 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -985,72 +985,15 @@ static ssize_t bonding_store_queue_id(struct device *d,
 				      struct device_attribute *attr,
 				      const char *buffer, size_t count)
 {
-	struct slave *slave, *update_slave;
 	struct bonding *bond = to_bond(d);
-	struct list_head *iter;
-	u16 qid;
-	int ret = count;
-	char *delim;
-	struct net_device *sdev = NULL;
-
-	if (!rtnl_trylock())
-		return restart_syscall();
-
-	/* delim will point to queue id if successful */
-	delim = strchr(buffer, ':');
-	if (!delim)
-		goto err_no_cmd;
-
-	/*
-	 * Terminate string that points to device name and bump it
-	 * up one, so we can read the queue id there.
-	 */
-	*delim = '\0';
-	if (sscanf(++delim, "%hd\n", &qid) != 1)
-		goto err_no_cmd;
-
-	/* Check buffer length, valid ifname and queue id */
-	if (strlen(buffer) > IFNAMSIZ ||
-	    !dev_valid_name(buffer) ||
-	    qid > bond->dev->real_num_tx_queues)
-		goto err_no_cmd;
-
-	/* Get the pointer to that interface if it exists */
-	sdev = __dev_get_by_name(dev_net(bond->dev), buffer);
-	if (!sdev)
-		goto err_no_cmd;
-
-	/* Search for thes slave and check for duplicate qids */
-	update_slave = NULL;
-	bond_for_each_slave(bond, slave, iter) {
-		if (sdev == slave->dev)
-			/*
-			 * We don't need to check the matching
-			 * slave for dups, since we're overwriting it
-			 */
-			update_slave = slave;
-		else if (qid && qid == slave->queue_id) {
-			goto err_no_cmd;
-		}
-	}
-
-	if (!update_slave)
-		goto err_no_cmd;
+	int ret;
 
-	/* Actually set the qids for the slave */
-	update_slave->queue_id = qid;
+	ret = bond_opt_tryset_rtnl(bond, BOND_OPT_QUEUE_ID, (char *)buffer);
+	if (!ret)
+		ret = count;
 
-out:
-	rtnl_unlock();
 	return ret;
-
-err_no_cmd:
-	pr_info("invalid input for queue_id set for %s.\n",
-		bond->dev->name);
-	ret = -EPERM;
-	goto out;
 }
-
 static DEVICE_ATTR(queue_id, S_IRUGO | S_IWUSR, bonding_show_queue_id,
 		   bonding_store_queue_id);
 
-- 
1.8.4.2

^ permalink raw reply related

* [PATCH net-next v2 20/25] bonding: convert active_slave to use the new option API
From: Nikolay Aleksandrov @ 2014-01-22 13:53 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov
In-Reply-To: <1390398820-5355-1-git-send-email-nikolay@redhat.com>

This patch adds the necessary changes so active_slave would use
the new bonding option API. Also some trivial/style fixes.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_main.c    |  4 +++-
 drivers/net/bonding/bond_netlink.c |  9 +++++----
 drivers/net/bonding/bond_options.c | 30 +++++++++++++++++++++++-------
 drivers/net/bonding/bond_options.h |  3 +++
 drivers/net/bonding/bond_sysfs.c   | 24 ++----------------------
 drivers/net/bonding/bonding.h      |  1 -
 6 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 59edf18..2ca949f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3123,6 +3123,7 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 	struct ifslave k_sinfo;
 	struct ifslave __user *u_sinfo = NULL;
 	struct mii_ioctl_data *mii = NULL;
+	struct bond_opt_value newval;
 	struct net *net;
 	int res = 0;
 
@@ -3218,7 +3219,8 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 		break;
 	case BOND_CHANGE_ACTIVE_OLD:
 	case SIOCBONDCHANGEACTIVE:
-		res = bond_option_active_slave_set(bond, slave_dev);
+		bond_opt_initstr(&newval, slave_dev->name);
+		res = __bond_opt_set(bond, BOND_OPT_ACTIVE_SLAVE, &newval);
 		break;
 	default:
 		res = -EOPNOTSUPP;
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 2175e19..dba1f2e 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -116,16 +116,17 @@ static int bond_changelink(struct net_device *bond_dev,
 	if (data[IFLA_BOND_ACTIVE_SLAVE]) {
 		int ifindex = nla_get_u32(data[IFLA_BOND_ACTIVE_SLAVE]);
 		struct net_device *slave_dev;
+		char *active_slave = "";
 
-		if (ifindex == 0) {
-			slave_dev = NULL;
-		} else {
+		if (ifindex != 0) {
 			slave_dev = __dev_get_by_index(dev_net(bond_dev),
 						       ifindex);
 			if (!slave_dev)
 				return -ENODEV;
+			active_slave = slave_dev->name;
 		}
-		err = bond_option_active_slave_set(bond, slave_dev);
+		bond_opt_initstr(&newval, active_slave);
+		err = __bond_opt_set(bond, BOND_OPT_ACTIVE_SLAVE, &newval);
 		if (err)
 			return err;
 	}
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 18b1cc0..2315104d 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -244,6 +244,16 @@ static struct bond_option bond_opts[] = {
 		.values = bond_use_carrier_tbl,
 		.set = bond_option_use_carrier_set
 	},
+	[BOND_OPT_ACTIVE_SLAVE] = {
+		.id = BOND_OPT_ACTIVE_SLAVE,
+		.name = "active_slave",
+		.desc = "Currently active slave",
+		.flags = BOND_OPTFLAG_RAWVAL,
+		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_ACTIVEBACKUP) |
+						BIT(BOND_MODE_TLB) |
+						BIT(BOND_MODE_ALB)),
+		.set = bond_option_active_slave_set
+	},
 	{ }
 };
 
@@ -556,10 +566,21 @@ struct net_device *bond_option_active_slave_get(struct bonding *bond)
 }
 
 int bond_option_active_slave_set(struct bonding *bond,
-				 struct net_device *slave_dev)
+				 struct bond_opt_value *newval)
 {
+	char ifname[IFNAMSIZ] = { 0, };
+	struct net_device *slave_dev;
 	int ret = 0;
 
+	sscanf(newval->string, "%15s", ifname); /* IFNAMSIZ */
+	if (!strlen(ifname) || newval->string[0] == '\n') {
+		slave_dev = NULL;
+	} else {
+		slave_dev = __dev_get_by_name(dev_net(bond->dev), ifname);
+		if (!slave_dev)
+			return -ENODEV;
+	}
+
 	if (slave_dev) {
 		if (!netif_is_bond_slave(slave_dev)) {
 			pr_err("Device %s is not bonding slave.\n",
@@ -574,12 +595,6 @@ int bond_option_active_slave_set(struct bonding *bond,
 		}
 	}
 
-	if (!USES_PRIMARY(bond->params.mode)) {
-		pr_err("%s: Unable to change active slave; %s is in mode %d\n",
-		       bond->dev->name, bond->dev->name, bond->params.mode);
-		return -EINVAL;
-	}
-
 	block_netpoll_tx();
 	write_lock_bh(&bond->curr_slave_lock);
 
@@ -616,6 +631,7 @@ int bond_option_active_slave_set(struct bonding *bond,
 
 	write_unlock_bh(&bond->curr_slave_lock);
 	unblock_netpoll_tx();
+
 	return ret;
 }
 
diff --git a/drivers/net/bonding/bond_options.h b/drivers/net/bonding/bond_options.h
index 229f5c1..1f486a7 100644
--- a/drivers/net/bonding/bond_options.h
+++ b/drivers/net/bonding/bond_options.h
@@ -56,6 +56,7 @@ enum {
 	BOND_OPT_PRIMARY,
 	BOND_OPT_PRIMARY_RESELECT,
 	BOND_OPT_USE_CARRIER,
+	BOND_OPT_ACTIVE_SLAVE,
 	BOND_OPT_LAST
 };
 
@@ -150,4 +151,6 @@ int bond_option_primary_reselect_set(struct bonding *bond,
 				     struct bond_opt_value *newval);
 int bond_option_use_carrier_set(struct bonding *bond,
 				struct bond_opt_value *newval);
+int bond_option_active_slave_set(struct bonding *bond,
+				 struct bond_opt_value *newval);
 #endif /* _BOND_OPTIONS_H */
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 0040482..181a59d 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -809,34 +809,14 @@ static ssize_t bonding_store_active_slave(struct device *d,
 					  struct device_attribute *attr,
 					  const char *buf, size_t count)
 {
-	int ret;
 	struct bonding *bond = to_bond(d);
-	char ifname[IFNAMSIZ];
-	struct net_device *dev;
-
-	if (!rtnl_trylock())
-		return restart_syscall();
-
-	sscanf(buf, "%15s", ifname); /* IFNAMSIZ */
-	if (!strlen(ifname) || buf[0] == '\n') {
-		dev = NULL;
-	} else {
-		dev = __dev_get_by_name(dev_net(bond->dev), ifname);
-		if (!dev) {
-			ret = -ENODEV;
-			goto out;
-		}
-	}
+	int ret;
 
-	ret = bond_option_active_slave_set(bond, dev);
+	ret = bond_opt_tryset_rtnl(bond, BOND_OPT_ACTIVE_SLAVE, (char *)buf);
 	if (!ret)
 		ret = count;
 
- out:
-	rtnl_unlock();
-
 	return ret;
-
 }
 static DEVICE_ATTR(active_slave, S_IRUGO | S_IWUSR,
 		   bonding_show_active_slave, bonding_store_active_slave);
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 51f3b0a..4ac79e1 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -452,7 +452,6 @@ void bond_setup(struct net_device *bond_dev);
 unsigned int bond_get_num_tx_queues(void);
 int bond_netlink_init(void);
 void bond_netlink_fini(void);
-int bond_option_active_slave_set(struct bonding *bond, struct net_device *slave_dev);
 int bond_option_arp_ip_target_add(struct bonding *bond, __be32 target);
 int bond_option_arp_ip_target_rem(struct bonding *bond, __be32 target);
 int bond_option_resend_igmp_set(struct bonding *bond, int resend_igmp);
-- 
1.8.4.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox