Netdev List

* Re: [PATCH net] fix BUG: scheduling while atomic in netlink broadcast
From: Akshay Narayan @ 2017-05-19 18:47 UTC
  To: Cong Wang; +Cc: Linux Kernel Network Developers, David Miller
In-Reply-To: <CAM_iQpXraspjEvh9Wy5jOscFz7zKS4Z3XsigxCJFJ5_RuCOnyw@mail.gmail.com>

> I don't want to defend the use of yield() but it looks like there is another
> problem.

I believe this use of yield() should be replaced with cond_resched()
even if it turns out there is an unrelated problem.

> Does this module call netlink_broadcast() with __GFP_DIRECT_RECLAIM
> in IRQ context? If so you should adjust the gfp flags.

The module only calls netlink_broadcast() from a pluggable TCP
congestion-control hook (tcp_ccp_cong_avoid() in the trace); as far as
I understand, this does not run in IRQ context. The full trace, which
is perhaps clearer, is below.

May 19 14:30:44 ccp kernel: [  178.885546] BUG: scheduling while atomic: mm-link/3105/0x00000200
May 19 14:30:44 ccp kernel: [  178.885552] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat libcrc32c xt_connmark nf_conntrack ccp(OE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_intel8x0 pcbc snd_ac97_codec joydev ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel snd_seq aes_x86_64 crypto_simd snd_seq_device snd_timer snd input_leds i2c_piix4 glue_helper cryptd soundcore mac_hid serio_raw vboxvideo ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt vboxguest intel_rapl_perf parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid e1000 ahci libahci psmouse fjes pata_acpi video
May 19 14:30:44 ccp kernel: [  178.885665] CPU: 0 PID: 3105 Comm: mm-link Tainted: G        W  OE   4.10.0-21-generic #23-Ubuntu
May 19 14:30:44 ccp kernel: [  178.885666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
May 19 14:30:44 ccp kernel: [  178.885667] Call Trace:
May 19 14:30:44 ccp kernel: [  178.885674]  dump_stack+0x63/0x81
May 19 14:30:44 ccp kernel: [  178.885678]  __schedule_bug+0x54/0x70
May 19 14:30:44 ccp kernel: [  178.885682]  __schedule+0x536/0x6f0
May 19 14:30:44 ccp kernel: [  178.885685]  schedule+0x36/0x80
May 19 14:30:44 ccp kernel: [  178.885687]  sys_sched_yield+0x4f/0x60
May 19 14:30:44 ccp kernel: [  178.885688]  yield+0x33/0x40
May 19 14:30:44 ccp kernel: [  178.885691]  netlink_broadcast_filtered+0x29b/0x3c0
May 19 14:30:44 ccp kernel: [  178.885692]  netlink_broadcast+0x1d/0x20
May 19 14:30:44 ccp kernel: [  178.885697]  nl_sendmsg+0xb8/0x664 [ccp]
May 19 14:30:44 ccp kernel: [  178.885699]  nl_send_ack_notif+0x7d/0x90 [ccp]
May 19 14:30:44 ccp kernel: [  178.885702]  tcp_ccp_cong_avoid+0x69/0x70 [ccp]
May 19 14:30:44 ccp kernel: [  178.885704]  tcp_ack+0x980/0xa60
May 19 14:30:44 ccp kernel: [  178.885708]  tcp_rcv_state_process+0x2be/0xda0
May 19 14:30:44 ccp kernel: [  178.885712]  ? security_sock_rcv_skb+0x3b/0x50
May 19 14:30:44 ccp kernel: [  178.885715]  ? sk_filter_trim_cap+0x3b/0x270
May 19 14:30:44 ccp kernel: [  178.885717]  tcp_v4_do_rcv+0xb2/0x200
May 19 14:30:44 ccp kernel: [  178.885719]  tcp_v4_rcv+0x90a/0xa00
May 19 14:30:44 ccp kernel: [  178.885722]  ip_local_deliver_finish+0x96/0x1c0
May 19 14:30:44 ccp kernel: [  178.885725]  ip_local_deliver+0x6f/0xe0
May 19 14:30:44 ccp kernel: [  178.885727]  ? ip_rcv_finish+0x3f0/0x3f0
May 19 14:30:44 ccp kernel: [  178.885730]  ip_rcv_finish+0x118/0x3f0
May 19 14:30:44 ccp kernel: [  178.885732]  ip_rcv+0x282/0x390
May 19 14:30:44 ccp kernel: [  178.885735]  ? inet_del_offload+0x40/0x40
May 19 14:30:44 ccp kernel: [  178.885737]  __netif_receive_skb_core+0x514/0xa40
May 19 14:30:44 ccp kernel: [  178.885740]  ? __check_object_size+0x10/0x1d7
May 19 14:30:44 ccp kernel: [  178.885742]  __netif_receive_skb+0x18/0x60
May 19 14:30:44 ccp kernel: [  178.885744]  netif_receive_skb_internal+0x32/0xa0
May 19 14:30:44 ccp kernel: [  178.885746]  netif_receive_skb+0x1c/0x70
May 19 14:30:44 ccp kernel: [  178.885749]  tun_get_user+0x425/0x800
May 19 14:30:44 ccp kernel: [  178.885751]  tun_chr_write_iter+0x57/0x70
May 19 14:30:44 ccp kernel: [  178.885752]  new_sync_write+0xd5/0x130
May 19 14:30:44 ccp kernel: [  178.885754]  __vfs_write+0x26/0x40
May 19 14:30:44 ccp kernel: [  178.885756]  vfs_write+0xb5/0x1a0
May 19 14:30:44 ccp kernel: [  178.885757]  SyS_write+0x55/0xc0
May 19 14:30:44 ccp kernel: [  178.885760]  entry_SYSCALL_64_fastpath+0x1e/0xad
May 19 14:30:44 ccp kernel: [  178.885762] RIP: 0033:0x7f8d9abbf670
May 19 14:30:44 ccp kernel: [  178.885763] RSP: 002b:00007ffc2f16d8b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
May 19 14:30:44 ccp kernel: [  178.885765] RAX: ffffffffffffffda RBX: 00007ffc2f16dde0 RCX: 00007f8d9abbf670
May 19 14:30:44 ccp kernel: [  178.885767] RDX: 0000000000000038 RSI: 0000557403fd86e0 RDI: 0000000000000008
May 19 14:30:44 ccp kernel: [  178.885768] RBP: 00007ffc2f16dcd0 R08: 0000557403fd8d40 R09: 0000557403fd8690
May 19 14:30:44 ccp kernel: [  178.885769] R10: 000e20d41e89a5c5 R11: 0000000000000246 R12: 0000000000000108
May 19 14:30:44 ccp kernel: [  178.885770] R13: 000000000000145c R14: 00007ffc2f16dd98 R15: 0000000000000003

* Re: [PATCH net] bonding: fix randomly populated arp target array
From: Andy Gospodarek @ 2017-05-19 19:23 UTC
  To: Jarod Wilson
  Cc: linux-kernel, Mahesh Bandewar, Jay Vosburgh, Veaceslav Falico,
	netdev, stable
In-Reply-To: <20170519184646.42572-1-jarod@redhat.com>

On Fri, May 19, 2017 at 02:46:46PM -0400, Jarod Wilson wrote:
> In commit dc9c4d0fe023, the arp_target array moved from a static global
> to a local variable. By the nature of static globals, the array used to
> be initialized to all 0. At present, it's full of random data, which
> gets interpreted as arp_target values when none have actually been
> specified. Systems end up booting with spew along these lines:
> 
> [   32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
> [   32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
> [   32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
> [   32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
> [   32.204892] lacp0: Setting MII monitoring interval to 100
> [   32.211071] lacp0: Removing ARP target 216.124.228.17
> [   32.216824] lacp0: Removing ARP target 218.160.255.255
> [   32.222646] lacp0: Removing ARP target 185.170.136.184
> [   32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
> [   32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
> [   32.243987] lacp0: Removing ARP target 56.125.228.17
> [   32.249625] lacp0: Removing ARP target 218.160.255.255
> [   32.255432] lacp0: Removing ARP target 15.157.233.184
> [   32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
> [   32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
> [   32.276632] lacp0: Removing ARP target 16.0.0.0
> [   32.281755] lacp0: Removing ARP target 218.160.255.255
> [   32.287567] lacp0: Removing ARP target 72.125.228.17
> [   32.293165] lacp0: Removing ARP target 218.160.255.255
> [   32.298970] lacp0: Removing ARP target 8.125.228.17
> [   32.304458] lacp0: Removing ARP target 218.160.255.255
> 
> None of these were actually specified as ARP targets, and the driver does
> seem to clean up the mess okay, but it's rather noisy and confusing, leaks
> values to userspace, and the 255.255.255.255 spew shows up even when debug
> prints are disabled.
> 
> The fix: just zero out arp_target at init time.
> 
> While we're in here, init arp_all_targets_value in the right place.
> 

Looks good.  Thanks, Jarod!

Acked-by: Andy Gospodarek <andy@greyhouse.net>

> Fixes: dc9c4d0fe023 ("bonding: reduce scope of some global variables")
> CC: Mahesh Bandewar <maheshb@google.com>
> CC: Jay Vosburgh <j.vosburgh@gmail.com>
> CC: Veaceslav Falico <vfalico@gmail.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: netdev@vger.kernel.org
> CC: stable@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
>  drivers/net/bonding/bond_main.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 2be78807fd6e..73313318399c 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4271,10 +4271,10 @@ static int bond_check_params(struct bond_params *params)
>  	int arp_validate_value, fail_over_mac_value, primary_reselect_value, i;
>  	struct bond_opt_value newval;
>  	const struct bond_opt_value *valptr;
> -	int arp_all_targets_value;
> +	int arp_all_targets_value = 0;
>  	u16 ad_actor_sys_prio = 0;
>  	u16 ad_user_port_key = 0;
> -	__be32 arp_target[BOND_MAX_ARP_TARGETS];
> +	__be32 arp_target[BOND_MAX_ARP_TARGETS] = { 0 };
>  	int arp_ip_count;
>  	int bond_mode	= BOND_MODE_ROUNDROBIN;
>  	int xmit_hashtype = BOND_XMIT_POLICY_LAYER2;
> @@ -4501,7 +4501,6 @@ static int bond_check_params(struct bond_params *params)
>  		arp_validate_value = 0;
>  	}
>  
> -	arp_all_targets_value = 0;
>  	if (arp_all_targets) {
>  		bond_opt_initstr(&newval, arp_all_targets);
>  		valptr = bond_opt_parse(bond_opt_get(BOND_OPT_ARP_ALL_TARGETS),
> -- 
> 2.12.1
> 

* Loan offer 3 %
From: Frau SCHMIDT @ 2017-05-19 18:01 UTC

Dear Sir or Madam,

Are you interested in a financial loan at 3%?
Contact me for more details and conditions. I can help all
those who need a loan.
I can offer you a loan in the amount of 10,000,000 EUR.
My email: info@rschmidt.online

Kind regards

Frau SCHMIDT

* Alignment in BPF verifier
From: Edward Cree @ 2017-05-19 20:00 UTC
  To: Alexei Starovoitov, David Miller, Daniel Borkmann
  Cc: alexei.starovoitov, netdev
In-Reply-To: <a0a47297-4765-3ba0-3ea2-013dda40e84a@solarflare.com>

Well, I've managed to get somewhat confused by reg->id.
In particular, I'm unsure which bpf_reg_types can have an id, and what
 exactly it means.  There seems to be some code that checks the id of map
 value pointers, which seems strange as maps have fixed sizes (and the
 comments in enum bpf_reg_type make it seem like id is a PTR_TO_PACKET
 thing).  Is this maybe because of map-of-maps support, i.e. can the
 contained maps have differing element sizes?  Or do we allow
 *(map_value + var + imm) if map_value + var was appropriately
 bounds-checked?

Does the 'id' identify the variable that was added to an object pointer, or
 the object itself?  Or does it blur these and identify (what the comment in
 enum bpf_reg_type calls) "skb->data + (u16) var"?

Here's what I'm thinking of doing:
struct bpf_reg_state {
    enum bpf_reg_type type;
    union {
        /* valid when type == PTR_TO_PACKET */
        u16 range;

        /* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
         *   PTR_TO_MAP_VALUE_OR_NULL
         */
        struct bpf_map *map_ptr;
    };
    /* Used to find other pointers with the same variable base, so they
     * can share range and align knowledge.
     */
    u32 id;
    u32 off; /* fixed part of pointer offset */
    /* For scalar types (CONST_IMM | UNKNOWN_VALUE), this represents our
     * knowledge of the actual value.
     * For pointer types, this represents the variable part of the offset
     * from the pointed-to object, and is shared with all bpf_reg_states
     * with the same id as us.
     */
    struct tnum align;
    /* Used to determine if any memory access using this register will
     * result in a bad access. These two fields must be last.
     * See states_equal()
     * These refer to the same value as align, not necessarily the actual
     * contents of the register.
     */
    s64 min_value;
    u64 max_value;
};

Does that sound reasonable?  (And does my added comment on min/max_value
 accurately describe the current semantics, or will I need to change that
 as well?)

-Ed

PS. I think this approach would also mean several of the bpf_reg_types can
 be folded together:
* PTR_TO_MAP_VALUE and PTR_TO_MAP_VALUE_ADJ are the same
* FRAME_PTR is just a PTR_TO_STACK with known-zero offset
* CONST_IMM is similarly a special case of UNKNOWN_VALUE

* Re: [PATCH net-next v2] bridge: fix hello and hold timers starting/stopping
From: Nikolay Aleksandrov @ 2017-05-19 20:12 UTC
  To: Ivan Vecera, netdev; +Cc: davem, sashok, stephen, bridge, lucien.xin
In-Reply-To: <20170519173043.10201-1-cera@cera.cz>

On 5/19/17 8:30 PM, Ivan Vecera wrote:
> Current bridge code incorrectly handles starting/stopping of hello and
> hold timers during STP enable/disable.
> 
> 1. Timers are stopped in br_stp_start() during the NO_STP->USER_STP
>     transition. The timers are already stopped in NO_STP state, so
>     this is a confusing no-op.
> 
> 2. During the USER_STP->NO_STP transition the timers are started. This
>     does not make sense and is confusing because the timers should not be
>     active in NO_STP state.
> 
> Cc: davem@davemloft.net
> Cc: sashok@cumulusnetworks.com
> Cc: stephen@networkplumber.org
> Cc: bridge@lists.linux-foundation.org
> Cc: lucien.xin@gmail.com
> Cc: nikolay@cumulusnetworks.com
> Signed-off-by: Ivan Vecera <cera@cera.cz>
> ---
>   net/bridge/br_stp_if.c | 11 -----------
>   1 file changed, 11 deletions(-)
> 

LGTM, thanks!

Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

* Re: [PATCH v6 net-next 0/7] Extend socket timestamping API
From: Richard Cochran @ 2017-05-19 20:15 UTC
  To: Miroslav Lichvar; +Cc: netdev, Willem de Bruijn
In-Reply-To: <20170519155241.15817-1-mlichvar@redhat.com>

On Fri, May 19, 2017 at 05:52:34PM +0200, Miroslav Lichvar wrote:
> Changes v5->v6:
> - fixed skb_is_swtx_tstamp() when OPT_TX_SWHW is disabled and improved
>   its description
> - improved OPT_PKTINFO documentation
> - improved scm_timestamping documentation

For the series:

Acked-by: Richard Cochran <richardcochran@gmail.com>

* [PATCH 0/4] Configuring traffic classes via new hardware offload mechanism in tc/mqprio
From: Amritha Nambiar @ 2017-05-20  0:58 UTC
  To: intel-wired-lan
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar,
	sridhar.samudrala, mitch.a.williams, neerav.parikh,
	jeffrey.t.kirsher, netdev

The following series introduces a new hardware offload mode in tc/mqprio in which the TCs, the queue configurations and the bandwidth rate limits are all offloaded to the hardware.
The i40e driver enables the new mqprio hardware offload mechanism, configuring the TCs, queues and bandwidth rates by creating HW channel VSIs.

In this mode, which is selected by setting the 'hw' option to 2, the priority-to-traffic-class mapping and the user-specified queue ranges are used to configure the traffic classes. This is achieved by creating HW channels (VSIs). A new channel is created for each traffic class configuration offloaded via the mqprio framework, except for the first TC (TC0), which belongs to the main VSI. TC0 of the main VSI is also reconfigured according to the user-provided queue parameters. Finally, bandwidth rate limits are set on these traffic classes through the mqprio offload framework, which passes these rates to the driver in addition to the number of TCs and the queue configurations.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0
  qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
               queues:(0:3) (4:7)
               min rates:0bit 0bit
               max rates:55Mbit 60Mbit

---

Amritha Nambiar (4):
      [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
      [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler
      [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping
      [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio

 drivers/net/ethernet/intel/i40e/i40e.h         |   42 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |    6 
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 1365 +++++++++++++++++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    |    2 
 include/linux/netdevice.h                      |    2 
 include/net/pkt_cls.h                          |    7 
 include/uapi/linux/pkt_sched.h                 |   13 
 net/sched/sch_mqprio.c                         |  169 +++
 8 files changed, 1449 insertions(+), 157 deletions(-)

* [PATCH 1/4] [next-queue]net: mqprio: Introduce new hardware offload mode in mqprio for offloading full TC configurations
From: Amritha Nambiar @ 2017-05-20  0:58 UTC
  To: intel-wired-lan
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar,
	sridhar.samudrala, mitch.a.williams, neerav.parikh,
	jeffrey.t.kirsher, netdev
In-Reply-To: <149524122523.11022.4541073724650541658.stgit@anamdev.jf.intel.com>

This patch introduces a new hardware offload mode in mqprio
which makes full use of the mqprio options: the TCs, the
queue configurations and the bandwidth rates for the TCs.
This mode is selected by setting the value 2 for the "hw" option.
This new offload mode supports new attributes for traffic
classes, such as minimum and maximum bandwidth rate limits.

It also introduces a new data structure, 'tc_mqprio_qopt_offload',
for offloading mqprio queue options; it is shared between the kernel
and the device driver. It contains a copy of the existing data
structure for mqprio queue options and can be extended when new
traffic class attributes, such as bandwidth rate limits, are added.
The existing data structure for mqprio queue options remains shared
between the kernel and userspace.

This patch enables configuring additional attributes associated
with a traffic class, such as minimum and maximum bandwidth
rates, which can be offloaded to the hardware in the new offload mode.
The min and max bandwidth rate limits are provided by the user
along with the TCs and the queue configurations when creating
the mqprio qdisc.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0
qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             min rates:0bit 0bit
             max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/netdevice.h      |    2 
 include/net/pkt_cls.h          |    7 ++
 include/uapi/linux/pkt_sched.h |   13 +++
 net/sched/sch_mqprio.c         |  169 +++++++++++++++++++++++++++++++++++++---
 4 files changed, 180 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0150b2d..17b9066 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -779,6 +779,7 @@ enum {
 	TC_SETUP_CLSFLOWER,
 	TC_SETUP_MATCHALL,
 	TC_SETUP_CLSBPF,
+	TC_SETUP_MQPRIO_EXT,
 };
 
 struct tc_cls_u32_offload;
@@ -791,6 +792,7 @@ struct tc_to_netdev {
 		struct tc_cls_matchall_offload *cls_mall;
 		struct tc_cls_bpf_offload *cls_bpf;
 		struct tc_mqprio_qopt *mqprio;
+		struct tc_mqprio_qopt_offload *mqprio_qopt;
 	};
 	bool egress_dev;
 };
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2c213a6..5ab8052 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -549,6 +549,13 @@ struct tc_cls_bpf_offload {
 	u32 gen_flags;
 };
 
+struct tc_mqprio_qopt_offload {
+	/* struct tc_mqprio_qopt must always be the first element */
+	struct tc_mqprio_qopt qopt;
+	u32	flags;
+	u64	min_rate[TC_QOPT_MAX_QUEUE];
+	u64	max_rate[TC_QOPT_MAX_QUEUE];
+};
 
 /* This structure holds cookie structure that is passed from user
  * to the kernel for actions and classifiers
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf55..cf2a146 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -620,6 +620,7 @@ struct tc_drr_stats {
 enum {
 	TC_MQPRIO_HW_OFFLOAD_NONE,	/* no offload requested */
 	TC_MQPRIO_HW_OFFLOAD_TCS,	/* offload TCs, no queue counts */
+	TC_MQPRIO_HW_OFFLOAD,		/* fully supported offload */
 	__TC_MQPRIO_HW_OFFLOAD_MAX
 };
 
@@ -633,6 +634,18 @@ struct tc_mqprio_qopt {
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+#define TC_MQPRIO_F_MIN_RATE  0x1
+#define TC_MQPRIO_F_MAX_RATE  0x2
+
+enum {
+	TCA_MQPRIO_UNSPEC,
+	TCA_MQPRIO_MIN_RATE64,
+	TCA_MQPRIO_MAX_RATE64,
+	__TCA_MQPRIO_MAX,
+};
+
+#define TCA_MQPRIO_MAX (__TCA_MQPRIO_MAX - 1)
+
 /* SFB */
 
 enum {
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 0a4cf27..6457ec9 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -18,10 +18,13 @@
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/sch_generic.h>
+#include <net/pkt_cls.h>
 
 struct mqprio_sched {
 	struct Qdisc		**qdiscs;
 	int hw_offload;
+	u32 flags;
+	u64 min_rate[TC_QOPT_MAX_QUEUE], max_rate[TC_QOPT_MAX_QUEUE];
 };
 
 static void mqprio_destroy(struct Qdisc *sch)
@@ -39,10 +42,21 @@ static void mqprio_destroy(struct Qdisc *sch)
 	}
 
 	if (priv->hw_offload && dev->netdev_ops->ndo_setup_tc) {
-		struct tc_mqprio_qopt offload = { 0 };
-		struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
-					   { .mqprio = &offload } };
+		struct tc_mqprio_qopt_offload offload = { 0 };
+		struct tc_to_netdev tc = { 0 };
 
+		switch (priv->hw_offload) {
+		case TC_MQPRIO_HW_OFFLOAD_TCS:
+			tc.type = TC_SETUP_MQPRIO;
+			tc.mqprio = &offload.qopt;
+			break;
+		case TC_MQPRIO_HW_OFFLOAD:
+			tc.type = TC_SETUP_MQPRIO_EXT;
+			tc.mqprio_qopt = &offload;
+			break;
+		default:
+			return;
+		}
 		dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 	} else {
 		netdev_set_num_tc(dev, 0);
@@ -99,6 +113,24 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt)
 	return 0;
 }
 
+static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
+	[TCA_MQPRIO_MIN_RATE64] = { .type = NLA_NESTED },
+	[TCA_MQPRIO_MAX_RATE64] = { .type = NLA_NESTED },
+};
+
+static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
+		      const struct nla_policy *policy, int len)
+{
+	int nested_len = nla_len(nla) - NLA_ALIGN(len);
+
+	if (nested_len >= nla_attr_size(0))
+		return nla_parse(tb, maxtype, nla_data(nla) + NLA_ALIGN(len),
+				 nested_len, policy, NULL);
+
+	memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
+	return 0;
+}
+
 static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct net_device *dev = qdisc_dev(sch);
@@ -107,6 +139,10 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	struct Qdisc *qdisc;
 	int i, err = -EOPNOTSUPP;
 	struct tc_mqprio_qopt *qopt = NULL;
+	struct nlattr *tb[TCA_MQPRIO_MAX + 1];
+	struct nlattr *attr;
+	int rem;
+	int len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
 
 	BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE);
 	BUILD_BUG_ON(TC_BITMASK != TC_QOPT_BITMASK);
@@ -124,6 +160,51 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	if (mqprio_parse_opt(dev, qopt))
 		return -EINVAL;
 
+	if (len > 0) {
+		err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy,
+				 sizeof(*qopt));
+		if (err < 0)
+			return err;
+
+		if (tb[TCA_MQPRIO_MIN_RATE64]) {
+			if (qopt->hw != TC_MQPRIO_HW_OFFLOAD)
+				return -EINVAL;
+
+			i = 0;
+			nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64],
+					    rem) {
+				if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64)
+					return -EINVAL;
+
+				if (i >= qopt->num_tc)
+					return -EINVAL;
+
+				priv->min_rate[i] = *(u64 *)nla_data(attr);
+				i++;
+			}
+			priv->flags |= TC_MQPRIO_F_MIN_RATE;
+		}
+
+		if (tb[TCA_MQPRIO_MAX_RATE64]) {
+			if (qopt->hw != TC_MQPRIO_HW_OFFLOAD)
+				return -EINVAL;
+
+			i = 0;
+			nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64],
+					    rem) {
+				if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64)
+					return -EINVAL;
+
+				if (i >= qopt->num_tc)
+					return -EINVAL;
+
+				priv->max_rate[i] = *(u64 *)nla_data(attr);
+				i++;
+			}
+			priv->flags |= TC_MQPRIO_F_MAX_RATE;
+		}
+	}
+
 	/* pre-allocate qdisc, attachment can't fail */
 	priv->qdiscs = kcalloc(dev->num_tx_queues, sizeof(priv->qdiscs[0]),
 			       GFP_KERNEL);
@@ -148,15 +229,36 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
 	 * supplied and verified mapping
 	 */
 	if (qopt->hw) {
-		struct tc_mqprio_qopt offload = *qopt;
-		struct tc_to_netdev tc = { .type = TC_SETUP_MQPRIO,
-					   { .mqprio = &offload } };
+		struct tc_mqprio_qopt_offload offload = {.qopt = *qopt};
+		struct tc_to_netdev tc = { 0 };
+
+		switch (qopt->hw) {
+		case TC_MQPRIO_HW_OFFLOAD_TCS:
+			tc.type = TC_SETUP_MQPRIO;
+			tc.mqprio = &offload.qopt;
+			break;
+		case TC_MQPRIO_HW_OFFLOAD:
+			tc.type = TC_SETUP_MQPRIO_EXT;
+			tc.mqprio_qopt = &offload;
+
+			offload.flags = priv->flags;
+			if (priv->flags & TC_MQPRIO_F_MIN_RATE)
+				for (i = 0; i < offload.qopt.num_tc; i++)
+					offload.min_rate[i] = priv->min_rate[i];
+
+			if (priv->flags & TC_MQPRIO_F_MAX_RATE)
+				for (i = 0; i < offload.qopt.num_tc; i++)
+					offload.max_rate[i] = priv->max_rate[i];
+			break;
+		default:
+			return -EINVAL;
+		}
 
 		err = dev->netdev_ops->ndo_setup_tc(dev, sch->handle, 0, &tc);
 		if (err)
 			return err;
 
-		priv->hw_offload = offload.hw;
+		priv->hw_offload = offload.qopt.hw;
 	} else {
 		netdev_set_num_tc(dev, qopt->num_tc);
 		for (i = 0; i < qopt->num_tc; i++)
@@ -226,11 +328,51 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
 	return 0;
 }
 
+static int dump_rates(struct mqprio_sched *priv,
+		      struct tc_mqprio_qopt *opt, struct sk_buff *skb)
+{
+	struct nlattr *nest;
+	int i;
+
+	if (priv->flags & TC_MQPRIO_F_MIN_RATE) {
+		nest = nla_nest_start(skb, TCA_MQPRIO_MIN_RATE64);
+		if (!nest)
+			goto nla_put_failure;
+
+		for (i = 0; i < opt->num_tc; i++) {
+			if (nla_put(skb, TCA_MQPRIO_MIN_RATE64,
+				    sizeof(priv->min_rate[i]),
+				    &priv->min_rate[i]))
+				goto nla_put_failure;
+		}
+		nla_nest_end(skb, nest);
+	}
+
+	if (priv->flags & TC_MQPRIO_F_MAX_RATE) {
+		nest = nla_nest_start(skb, TCA_MQPRIO_MAX_RATE64);
+		if (!nest)
+			goto nla_put_failure;
+
+		for (i = 0; i < opt->num_tc; i++) {
+			if (nla_put(skb, TCA_MQPRIO_MAX_RATE64,
+				    sizeof(priv->max_rate[i]),
+				    &priv->max_rate[i]))
+				goto nla_put_failure;
+		}
+		nla_nest_end(skb, nest);
+	}
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, nest);
+	return -1;
+}
+
 static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct net_device *dev = qdisc_dev(sch);
 	struct mqprio_sched *priv = qdisc_priv(sch);
-	unsigned char *b = skb_tail_pointer(skb);
+	struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
 	struct tc_mqprio_qopt opt = { 0 };
 	struct Qdisc *qdisc;
 	unsigned int i;
@@ -261,12 +403,17 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 		opt.offset[i] = dev->tc_to_txq[i].offset;
 	}
 
-	if (nla_put(skb, TCA_OPTIONS, sizeof(opt), &opt))
+	if (nla_put(skb, TCA_OPTIONS, NLA_ALIGN(sizeof(opt)), &opt))
 		goto nla_put_failure;
 
-	return skb->len;
+	if (priv->flags & TC_MQPRIO_F_MIN_RATE ||
+	    priv->flags & TC_MQPRIO_F_MAX_RATE)
+		if (dump_rates(priv, &opt, skb) != 0)
+			goto nla_put_failure;
+
+	return nla_nest_end(skb, nla);
 nla_put_failure:
-	nlmsg_trim(skb, b);
+	nlmsg_trim(skb, nla);
 	return -1;
 }
 

* [PATCH 2/4] [next-queue]net: i40e: Add infrastructure for queue channel support with the TCs and queue configurations offloaded via mqprio scheduler
From: Amritha Nambiar @ 2017-05-20  0:58 UTC
  To: intel-wired-lan
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar,
	sridhar.samudrala, mitch.a.williams, neerav.parikh,
	jeffrey.t.kirsher, netdev
In-Reply-To: <149524122523.11022.4541073724650541658.stgit@anamdev.jf.intel.com>

This patch sets up the infrastructure for offloading TCs and
queue configurations to the hardware by creating HW channels (VSIs).
A new channel is created for each traffic class configuration
offloaded via the mqprio framework, except for the first TC (TC0).
TC0 of the main VSI is also reconfigured according to the
user-provided queue parameters. Queue counts that are not a power
of 2 are handled by reconfiguring RSS, reprogramming the LUTs with
the queue count value. This patch also handles configuring the TX
rings for the channels and setting up the RX queue map for each channel.

Also, the channels so created are removed, and all queue
configuration is reset to the defaults, when the qdisc is detached
from the root of the device.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |   36 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |  740 +++++++++++++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |    2 
 3 files changed, 771 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 395ca94..0915b02 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -330,6 +330,24 @@ struct i40e_flex_pit {
 	u8 pit_index;
 };
 
+struct i40e_channel {
+	struct list_head list;
+	bool initialized;
+	u8 type;
+	u16 vsi_number;
+	u16 stat_counter_idx;
+	u16 base_queue;
+	u16 num_queue_pairs; /* Requested by user */
+	u16 allowed_queue_pairs;
+	u16 seid;
+
+	u8 enabled_tc;
+	struct i40e_aqc_vsi_properties_data info;
+
+	/* track this channel belongs to which VSI */
+	struct i40e_vsi *parent_vsi;
+};
+
 /* struct that defines the Ethernet device */
 struct i40e_pf {
 	struct pci_dev *pdev;
@@ -442,6 +460,7 @@ struct i40e_pf {
 #define I40E_FLAG_CLIENT_L2_CHANGE		BIT_ULL(56)
 #define I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE		BIT_ULL(57)
 #define I40E_FLAG_LEGACY_RX			BIT_ULL(58)
+#define I40E_FLAG_TC_MQPRIO			BIT_ULL(59)
 
 	struct i40e_client_instance *cinst;
 	bool stat_offsets_loaded;
@@ -523,6 +542,9 @@ struct i40e_pf {
 	u32 ioremap_len;
 	u32 fd_inv;
 	u16 phy_led_val;
+
+#define I40E_MAX_QUEUES_PER_CH	64
+	u16 override_q_count;
 };
 
 /**
@@ -684,6 +706,16 @@ struct i40e_vsi {
 	bool current_isup;	/* Sync 'link up' logging */
 	enum i40e_aq_link_speed current_speed;	/* Sync link speed logging */
 
+	/* channel specific fields */
+	u16 cnt_q_avail; /* num of queues available for channel usage */
+	u16 orig_rss_size;
+	u16 current_rss_size;
+
+	/* keeps track of next_base_queue to be used for channel setup */
+	atomic_t next_base_queue;
+
+	struct list_head ch_list;
+
 	void *priv;	/* client driver data reference. */
 
 	/* VSI specific handlers */
@@ -716,6 +748,9 @@ struct i40e_q_vector {
 	bool arm_wb_state;
 #define ITR_COUNTDOWN_START 100
 	u8 itr_countdown;	/* when 0 should adjust ITR */
+
+	/* Following field(s) are specific to channel usage */
+	bool is_an_atq;
 } ____cacheline_internodealigned_in_smp;
 
 /* lan device */
@@ -972,4 +1007,5 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
+int i40e_create_queue_channel(struct i40e_vsi *vsi, struct i40e_channel *ch);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8d1d3b85..e1bea45 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2864,7 +2864,7 @@ static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
 	struct i40e_vsi *vsi = ring->vsi;
 	cpumask_var_t mask;
 
-	if (!ring->q_vector || !ring->netdev)
+	if (!ring->q_vector || !ring->netdev || ring->ch)
 		return;
 
 	/* Single TC mode enable XPS */
@@ -2937,7 +2937,17 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	 * initialization. This has to be done regardless of
 	 * DCB as by default everything is mapped to TC0.
 	 */
-	tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+
+	if (ring->ch) {
+		tx_ctx.rdylist =
+			le16_to_cpu(ring->ch->info.qs_handle[ring->dcb_tc]);
+
+		dev_dbg(&vsi->back->pdev->dev, "ch, pf_q %d, rdylist %d\n",
+			pf_q, tx_ctx.rdylist);
+	} else {
+		tx_ctx.rdylist = le16_to_cpu(vsi->info.qs_handle[ring->dcb_tc]);
+	}
+
 	tx_ctx.rdylist_act = 0;
 
 	/* clear the context in the HMC */
@@ -2959,12 +2969,25 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	}
 
 	/* Now associate this queue with this PCI function */
-	if (vsi->type == I40E_VSI_VMDQ2) {
-		qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
-		qtx_ctl |= ((vsi->id) << I40E_QTX_CTL_VFVM_INDX_SHIFT) &
-			   I40E_QTX_CTL_VFVM_INDX_MASK;
+	if (ring->ch) {
+		if (ring->ch->type == I40E_VSI_VMDQ2)
+			qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
+		else if (ring->ch->type == I40E_VSI_SRIOV)
+			qtx_ctl = I40E_QTX_CTL_VF_QUEUE;
+
+		qtx_ctl |= (ring->ch->vsi_number <<
+			    I40E_QTX_CTL_VFVM_INDX_SHIFT) &
+			    I40E_QTX_CTL_VFVM_INDX_MASK;
+		dev_dbg(&vsi->back->pdev->dev, "ch, pf_q %d, qtx_ctl 0x%x\n",
+			pf_q, qtx_ctl);
 	} else {
-		qtx_ctl = I40E_QTX_CTL_PF_QUEUE;
+		if (vsi->type == I40E_VSI_VMDQ2) {
+			qtx_ctl = I40E_QTX_CTL_VM_QUEUE;
+			qtx_ctl |= ((vsi->id) << I40E_QTX_CTL_VFVM_INDX_SHIFT) &
+				    I40E_QTX_CTL_VFVM_INDX_MASK;
+		} else {
+			qtx_ctl = I40E_QTX_CTL_PF_QUEUE;
+		}
 	}
 
 	qtx_ctl |= ((hw->pf_id << I40E_QTX_CTL_PF_INDX_SHIFT) &
@@ -5060,6 +5083,699 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 }
 
 /**
+ * i40e_remove_queue_channel - Remove queue channel for the TC
+ * @vsi: VSI to be configured
+ *
+ * Remove queue channel for the TC
+ **/
+static void i40e_remove_queue_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	int ret, i;
+
+	/* Reset rss size that was stored when reconfiguring rss for
+	 * channel VSIs with non-power-of-2 queue count.
+	 */
+	vsi->current_rss_size = 0;
+
+	/* perform cleanup for channels if they exist */
+	if (list_empty(&vsi->ch_list))
+		return;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		struct i40e_vsi *p_vsi;
+
+		list_del(&ch->list);
+		p_vsi = ch->parent_vsi;
+		if (!p_vsi || !ch->initialized) {
+			kfree(ch);
+			continue;
+		}
+		/* Reset queue contexts */
+		for (i = 0; i < ch->num_queue_pairs; i++) {
+			struct i40e_ring *tx_ring, *rx_ring;
+			u16 pf_q;
+
+			pf_q = ch->base_queue + i;
+			tx_ring = vsi->tx_rings[pf_q];
+			tx_ring->ch = NULL;
+
+			rx_ring = vsi->rx_rings[pf_q];
+			rx_ring->ch = NULL;
+		}
+
+		/* delete VSI from FW */
+		ret = i40e_aq_delete_element(&vsi->back->hw, ch->seid,
+					     NULL);
+		if (ret)
+			dev_err(&vsi->back->pdev->dev,
+				"unable to remove channel (%d) for parent VSI(%d)\n",
+				ch->seid, p_vsi->seid);
+		kfree(ch);
+	}
+}
+
+/**
+ * i40e_is_any_channel - check if any channel exists
+ * @vsi: ptr to VSI to which channels are associated with
+ *
+ * Returns true if any initialized channel exists for the VSI, false otherwise
+ **/
+static bool i40e_is_any_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (ch->initialized)
+			return true;
+	}
+
+	return false;
+}
+
+/**
+ * i40e_get_max_queues_for_channel - get max queue count used by channels
+ * @vsi: ptr to VSI to which channels are associated with
+ *
+ * Helper function which returns the max queue count ever used by any of
+ * the channels whose parent is the specified VSI
+ **/
+static int i40e_get_max_queues_for_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch, *ch_tmp;
+	int max = 0;
+
+	list_for_each_entry_safe(ch, ch_tmp, &vsi->ch_list, list) {
+		if (!ch->initialized)
+			continue;
+		if (ch->allowed_queue_pairs > max)
+			max = ch->allowed_queue_pairs;
+	}
+
+	return max;
+}
+
+/**
+ * i40e_validate_num_queues - validate num_queues w.r.t channel
+ * @pf: ptr to PF device
+ * @num_queues: number of queues
+ * @vsi: the parent VSI
+ * @reconfig_rss: set to true if RSS should be reconfigured, false otherwise
+ *
+ * This function validates number of queues in the context of new channel
+ * which is being established and determines if RSS should be reconfigured
+ * or not for parent VSI.
+ **/
+static int i40e_validate_num_queues(struct i40e_pf *pf, int num_queues,
+				    struct i40e_vsi *vsi, bool *reconfig_rss)
+{
+	int max_ch_queues;
+
+	if (!reconfig_rss)
+		return -EINVAL;
+
+	*reconfig_rss = false;
+
+	if (num_queues > I40E_MAX_QUEUES_PER_CH) {
+		dev_info(&pf->pdev->dev,
+			 "Failed to create VMDq VSI. User requested num_queues (%d) > I40E_MAX_QUEUES_PER_CH (%u)\n",
+			 num_queues, I40E_MAX_QUEUES_PER_CH);
+		return -EINVAL;
+	}
+
+	if (vsi->current_rss_size) {
+		if (num_queues > vsi->current_rss_size) {
+			dev_info(&pf->pdev->dev,
+				 "Error: num_queues (%d) > vsi's current_size(%d)\n",
+				 num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		} else if ((num_queues < vsi->current_rss_size) &&
+			   (!is_power_of_2(num_queues))) {
+			dev_info(&pf->pdev->dev,
+				 "Error: num_queues (%d) < vsi's current_size(%d), but not power of 2\n",
+				 num_queues, vsi->current_rss_size);
+			return -EINVAL;
+		}
+	}
+
+	if (!is_power_of_2(num_queues)) {
+		/* Find the max num_queues configured for any channel, if
+		 * channels exist.
+		 * If they do, enforce 'num_queues' to be no less than the
+		 * max num_queues ever configured for a channel.
+		 */
+		max_ch_queues = i40e_get_max_queues_for_channel(vsi);
+		if (num_queues < max_ch_queues) {
+			dev_info(&pf->pdev->dev,
+				 "Error: num_queues (%d) < max queues configured for channel (%d)\n",
+				 num_queues, max_ch_queues);
+			return -EINVAL;
+		}
+		*reconfig_rss = true;
+	}
+
+	return 0;
+}
+
+/**
+ * i40e_vsi_reconfig_rss - reconfig RSS based on specified rss_size
+ * @vsi: the VSI being setup
+ * @rss_size: size of RSS, accordingly LUT gets reprogrammed
+ *
+ * This function reconfigures RSS by reprogramming LUTs using 'rss_size'
+ **/
+static int i40e_vsi_reconfig_rss(struct i40e_vsi *vsi, u16 rss_size)
+{
+	struct i40e_pf *pf = vsi->back;
+	u8 seed[I40E_HKEY_ARRAY_SIZE];
+	struct i40e_hw *hw = &pf->hw;
+	int local_rss_size;
+	u8 *lut;
+	int ret;
+
+	if (!vsi->rss_size)
+		return -EINVAL;
+
+	if (rss_size > vsi->rss_size)
+		return -EINVAL;
+
+	local_rss_size = min_t(int, vsi->rss_size, rss_size);
+	lut = kzalloc(vsi->rss_table_size, GFP_KERNEL);
+	if (!lut)
+		return -ENOMEM;
+
+	/* Ignoring user configured lut if there is one */
+	i40e_fill_rss_lut(pf, lut, vsi->rss_table_size, local_rss_size);
+
+	/* Use user configured hash key if there is one, otherwise
+	 * use default.
+	 */
+	if (vsi->rss_hkey_user)
+		memcpy(seed, vsi->rss_hkey_user, I40E_HKEY_ARRAY_SIZE);
+	else
+		netdev_rss_key_fill((void *)seed, I40E_HKEY_ARRAY_SIZE);
+
+	ret = i40e_config_rss(vsi, seed, lut, vsi->rss_table_size);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "Cannot set RSS lut, err %s aq_err %s\n",
+			 i40e_stat_str(hw, ret),
+			 i40e_aq_str(hw, hw->aq.asq_last_status));
+		kfree(lut);
+		return ret;
+	}
+	kfree(lut);
+
+	/* Do the update w.r.t. storing rss_size */
+	if (!vsi->orig_rss_size)
+		vsi->orig_rss_size = vsi->rss_size;
+	vsi->current_rss_size = local_rss_size;
+
+	return ret;
+}
+
+/**
+ * i40e_channel_setup_queue_map - Setup a channel queue map based on enabled_tc
+ * @pf: ptr to PF device
+ * @ctxt: VSI context structure
+ * @enabled_tc: Enabled TCs bitmap
+ * @ch: ptr to channel structure
+ *
+ * Setup queue map based on enabled_tc for a specific channel, using the
+ * channel's base_queue as the starting queue
+ **/
+static void i40e_channel_setup_queue_map(struct i40e_pf *pf,
+					 struct i40e_vsi_context *ctxt,
+					 u8 enabled_tc, struct i40e_channel *ch)
+{
+	u16 qcount, num_tc_qps, qmap;
+	int pow, num_qps;
+	u16 sections = 0;
+	/* At least TC0 is enabled in the non-DCB, non-MQPRIO case */
+	u16 numtc = 1;
+	u8 offset = 0;
+
+	sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
+	sections |= I40E_AQ_VSI_PROP_SCHED_VALID;
+
+	if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
+		qcount = min_t(int, ch->num_queue_pairs,
+			       pf->num_lan_msix);
+		ch->allowed_queue_pairs = qcount;
+	} else {
+		qcount = 1;
+	}
+
+	/* find num of qps per traffic class */
+	num_tc_qps = qcount / numtc;
+	num_tc_qps = min_t(int, num_tc_qps, i40e_pf_get_max_q_per_tc(pf));
+	num_qps = qcount;
+
+	/* find the next higher power-of-2 of num queue pairs */
+	pow = ilog2(num_qps);
+	if (!is_power_of_2(num_qps))
+		pow++;
+
+	qmap = (offset << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) |
+		(pow << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT);
+
+	/* Setup queue TC[0].qmap for given VSI context */
+	ctxt->info.tc_mapping[0] = cpu_to_le16(qmap);
+
+	ctxt->info.up_enable_bits = enabled_tc;
+	ctxt->info.mapping_flags |= cpu_to_le16(I40E_AQ_VSI_QUE_MAP_CONTIG);
+	ctxt->info.queue_mapping[0] = cpu_to_le16(ch->base_queue);
+	ctxt->info.valid_sections |= cpu_to_le16(sections);
+}
+
+/**
+ * i40e_add_channel - add a channel by adding VSI
+ * @pf: ptr to PF device
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @ch: ptr to channel structure
+ *
+ * Add a channel (VSI) using add_vsi and queue_map
+ **/
+static int i40e_add_channel(struct i40e_pf *pf, u16 uplink_seid,
+			    struct i40e_channel *ch)
+{
+	struct i40e_hw *hw = &pf->hw;
+	struct i40e_vsi_context ctxt;
+	u8 enabled_tc = 0x1; /* TC0 enabled */
+	int ret;
+
+	if (!(ch->type == I40E_VSI_SRIOV || ch->type == I40E_VSI_VMDQ2)) {
+		dev_info(&pf->pdev->dev,
+			 "add new vsi failed, ch->type %d\n", ch->type);
+		return -EINVAL;
+	}
+
+	memset(&ctxt, 0, sizeof(ctxt));
+	ctxt.pf_num = hw->pf_id;
+	ctxt.vf_num = 0;
+	ctxt.uplink_seid = uplink_seid;
+	ctxt.connection_type = I40E_AQ_VSI_CONN_TYPE_NORMAL;
+	if (ch->type == I40E_VSI_SRIOV)
+		ctxt.flags = I40E_AQ_VSI_TYPE_VF;
+	else if (ch->type == I40E_VSI_VMDQ2)
+		ctxt.flags = I40E_AQ_VSI_TYPE_VMDQ2;
+
+	if (pf->flags & I40E_FLAG_VEB_MODE_ENABLED) {
+		ctxt.info.valid_sections |=
+		     cpu_to_le16(I40E_AQ_VSI_PROP_SWITCH_VALID);
+		ctxt.info.switch_id =
+		   cpu_to_le16(I40E_AQ_VSI_SW_ID_FLAG_ALLOW_LB);
+	}
+
+	/* Set queue map for a given VSI context */
+	i40e_channel_setup_queue_map(pf, &ctxt, enabled_tc, ch);
+
+	/* Now time to create VSI */
+	ret = i40e_aq_add_vsi(hw, &ctxt, NULL);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "add new vsi failed, err %s aq_err %s\n",
+			 i40e_stat_str(&pf->hw, ret),
+			 i40e_aq_str(&pf->hw,
+				     pf->hw.aq.asq_last_status));
+		return -ENOENT;
+	}
+
+	/* Success, update channel */
+	ch->enabled_tc = enabled_tc;
+	ch->seid = ctxt.seid;
+	ch->vsi_number = ctxt.vsi_number;
+	ch->stat_counter_idx = cpu_to_le16(ctxt.info.stat_counter_idx);
+
+	/* copy just the sections touched not the entire info
+	 * since not all sections are valid as returned by
+	 * update vsi params
+	 */
+	ch->info.mapping_flags = ctxt.info.mapping_flags;
+	memcpy(&ch->info.queue_mapping,
+	       &ctxt.info.queue_mapping, sizeof(ctxt.info.queue_mapping));
+	memcpy(&ch->info.tc_mapping, ctxt.info.tc_mapping,
+	       sizeof(ctxt.info.tc_mapping));
+
+	/* Now it's time to update 'num_queue_pairs' if it is more than
+	 * 'allowed_queue_pairs' because RX queue map is setup based on
+	 * value of 'allowed_queue_pairs' (min of num_queue_pairs,
+	 * num_lan_msix). This update is needed so that TX rings are setup
+	 * correctly.
+	 */
+	if (ch->num_queue_pairs > ch->allowed_queue_pairs)
+		ch->num_queue_pairs = ch->allowed_queue_pairs;
+
+	return 0;
+}
+
+static int i40e_channel_config_bw(struct i40e_vsi *vsi, struct i40e_channel *ch,
+				  u8 *bw_share)
+{
+	struct i40e_aqc_configure_vsi_tc_bw_data bw_data;
+	i40e_status ret;
+	int i;
+
+	bw_data.tc_valid_bits = ch->enabled_tc;
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		bw_data.tc_bw_credits[i] = bw_share[i];
+
+	ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, ch->seid,
+				       &bw_data, NULL);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Config VSI BW allocation per TC failed, aq_err: %d for new_vsi->seid %u\n",
+			 vsi->back->hw.aq.asq_last_status, ch->seid);
+		return -EINVAL;
+	}
+
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		ch->info.qs_handle[i] = bw_data.qs_handles[i];
+
+	return 0;
+}
+
+/**
+ * i40e_channel_config_tx_ring - config TX ring associated with new channel
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ch: ptr to channel structure
+ *
+ * Configure TX rings associated with channel (VSI) since queues are being
+ * borrowed from the parent VSI.
+ **/
+static int i40e_channel_config_tx_ring(struct i40e_pf *pf,
+				       struct i40e_vsi *vsi,
+				       struct i40e_channel *ch)
+{
+	i40e_status ret;
+	int i;
+	u8 bw_share[I40E_MAX_TRAFFIC_CLASS] = {0};
+
+	/* Enable ETS TCs with equal BW Share for now across all VSIs */
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+		if (ch->enabled_tc & BIT(i))
+			bw_share[i] = 1;
+	}
+
+	/* configure BW for new VSI */
+	ret = i40e_channel_config_bw(vsi, ch, bw_share);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Failed configuring TC map %d for channel (seid %u)\n",
+			 ch->enabled_tc, ch->seid);
+		return ret;
+	}
+
+	for (i = 0; i < ch->num_queue_pairs; i++) {
+		struct i40e_ring *tx_ring, *rx_ring;
+		u16 pf_q;
+
+		pf_q = ch->base_queue + i;
+
+		/* Get the TX ring ptr of the main VSI, to re-setup the TX queue
+		 * context
+		 */
+		tx_ring = vsi->tx_rings[pf_q];
+		tx_ring->ch = ch;
+
+		/* Get the RX ring ptr */
+		rx_ring = vsi->rx_rings[pf_q];
+		rx_ring->ch = ch;
+	}
+
+	return 0;
+}
+
+/**
+ * i40e_setup_hw_channel - setup new channel
+ * @pf: ptr to PF device
+ * @vsi: the VSI being setup
+ * @ch: ptr to channel structure
+ * @uplink_seid: underlying HW switching element (VEB) ID
+ * @type: type of channel to be created (VMDq2/VF)
+ *
+ * Setup new channel (VSI) based on specified type (VMDq2/VF)
+ * and configures TX rings accordingly
+ **/
+static inline int i40e_setup_hw_channel(struct i40e_pf *pf,
+					struct i40e_vsi *vsi,
+					struct i40e_channel *ch,
+					u16 uplink_seid, u8 type)
+{
+	struct i40e_q_vector *q_vector;
+	int base_queue = 0;
+	int ret, i;
+
+	ch->initialized = false;
+	ch->base_queue = atomic_read(&vsi->next_base_queue);
+	ch->type = type;
+
+	/* Proceed with creation of channel (VMDq2/VF) VSI */
+	ret = i40e_add_channel(pf, uplink_seid, ch);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "failed to add_channel using uplink_seid %u\n",
+			 uplink_seid);
+		return ret;
+	}
+
+	/* Mark the successful creation of channel */
+	ch->initialized = true;
+
+	/* Mark q_vectors indicating that they are part of newly created
+	 * channel (VSI)
+	 */
+	base_queue = ch->base_queue;
+	for (i = 0; i < ch->num_queue_pairs; i++) {
+		q_vector = vsi->tx_rings[base_queue + i]->q_vector;
+
+		if (!q_vector)
+			continue;
+
+		q_vector->is_an_atq = true;
+	}
+
+	/* Reconfigure TX queues using QTX_CTL register */
+	ret = i40e_channel_config_tx_ring(pf, vsi, ch);
+	if (ret) {
+		dev_info(&pf->pdev->dev,
+			 "failed to configure TX rings for channel %u\n",
+			 ch->seid);
+		return ret;
+	}
+
+	/* update 'next_base_queue' */
+	atomic_set(&vsi->next_base_queue,
+		   atomic_read(&vsi->next_base_queue) + ch->num_queue_pairs);
+
+	dev_dbg(&pf->pdev->dev,
+		"Added channel: vsi_seid %u, vsi_number %u, stat_counter_idx %u, num_queue_pairs %u, pf->next_base_queue %d\n",
+		ch->seid, ch->vsi_number, ch->stat_counter_idx,
+		ch->num_queue_pairs,
+		atomic_read(&vsi->next_base_queue));
+
+	return ret;
+}
+
+/**
+ * i40e_setup_channel - setup new channel using uplink element
+ * @pf: ptr to PF device
+ * @vsi: the parent VSI of the channel
+ * @ch: ptr to channel structure
+ *
+ * Setup new channel (VSI) of type VMDq2 or VF, chosen based on the
+ * parent VSI's type, using the LAN VSI's uplink switching element
+ * (uplink_seid)
+ **/
+static bool i40e_setup_channel(struct i40e_pf *pf, struct i40e_vsi *vsi,
+			       struct i40e_channel *ch)
+{
+	u16 seid = vsi->seid;
+	u8 vsi_type;
+	int ret;
+
+	if (vsi->type == I40E_VSI_MAIN) {
+		vsi_type = I40E_VSI_VMDQ2;
+	} else if (vsi->type == I40E_VSI_SRIOV) {
+		vsi_type = I40E_VSI_SRIOV;
+	} else {
+		dev_err(&pf->pdev->dev, "unsupported vsi type(%d) of parent vsi\n",
+			vsi->type);
+		return false;
+	}
+
+	/* underlying switching element */
+	seid = pf->vsi[pf->lan_vsi]->uplink_seid;
+
+	/* create channel (VSI), configure TX rings */
+	ret = i40e_setup_hw_channel(pf, vsi, ch, seid, vsi_type);
+	if (ret) {
+		dev_err(&pf->pdev->dev, "failed to setup hw_channel\n");
+		return false;
+	}
+
+	return ch->initialized;
+}
+
+/**
+ * i40e_create_queue_channel - function to create channel
+ * @vsi: VSI to be configured
+ * @ch: ptr to channel (it contains channel specific params)
+ *
+ * This function creates channel (VSI) using num_queues specified by user,
+ * reconfigures RSS if needed.
+ **/
+int i40e_create_queue_channel(struct i40e_vsi *vsi,
+			      struct i40e_channel *ch)
+{
+	struct i40e_pf *pf = vsi->back;
+	bool reconfig_rss;
+	int err;
+
+	if (!ch)
+		return -EINVAL;
+
+	if (!ch->num_queue_pairs) {
+		dev_err(&pf->pdev->dev, "Invalid num_queues requested: %d\n",
+			ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* validate user requested num_queues for channel */
+	err = i40e_validate_num_queues(pf, ch->num_queue_pairs, vsi,
+				       &reconfig_rss);
+	if (err) {
+		dev_info(&pf->pdev->dev, "Failed to validate num_queues (%d)\n",
+			 ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* By default we are in VEPA mode; if this is the first VF/VMDq
+	 * VSI to be added, switch to VEB mode.
+	 */
+	if ((!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) ||
+	    (!i40e_is_any_channel(vsi))) {
+		if (!is_power_of_2(vsi->tc_config.tc_info[0].qcount)) {
+			dev_info(&pf->pdev->dev,
+				 "Failed to create channel. Override queues (%u) not power of 2\n",
+				 vsi->tc_config.tc_info[0].qcount);
+			return -EINVAL;
+		}
+
+		if (vsi->type == I40E_VSI_SRIOV) {
+			if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
+				dev_info(&pf->pdev->dev,
+					 "Expected to be VEB mode by this time\n");
+				return -EINVAL;
+			}
+		}
+		if (!(pf->flags & I40E_FLAG_VEB_MODE_ENABLED)) {
+			pf->flags |= I40E_FLAG_VEB_MODE_ENABLED;
+
+			if (vsi->type == I40E_VSI_MAIN) {
+				if (pf->flags & I40E_FLAG_TC_MQPRIO)
+					i40e_do_reset(pf,
+					BIT_ULL(__I40E_PF_RESET_REQUESTED),
+						      true);
+				else
+					i40e_do_reset_safe(pf,
+					BIT_ULL(__I40E_PF_RESET_REQUESTED));
+			}
+		}
+		/* From now on, the main VSI's number of queues will be the
+		 * value of TC0's queue count
+		 */
+	}
+
+	/* By this time, vsi->cnt_q_avail should be non-zero and at
+	 * least num_queues
+	 */
+	if (!vsi->cnt_q_avail || (vsi->cnt_q_avail < ch->num_queue_pairs)) {
+		dev_info(&pf->pdev->dev,
+			 "Error: cnt_q_avail (%u) less than num_queues %d\n",
+			 vsi->cnt_q_avail, ch->num_queue_pairs);
+		return -EINVAL;
+	}
+
+	/* Reconfigure RSS only if the VSI type is MAIN_VSI; for an SR-IOV
+	 * VF VSI, RSS reconfiguration (when 'reconfig_rss' is true) is left
+	 * to the VF driver
+	 */
+	if (reconfig_rss && (vsi->type == I40E_VSI_MAIN)) {
+		err = i40e_vsi_reconfig_rss(vsi, ch->num_queue_pairs);
+		if (err) {
+			dev_info(&pf->pdev->dev,
+				 "Error: unable to reconfig rss for num_queues (%u)\n",
+				 ch->num_queue_pairs);
+			return -EINVAL;
+		}
+	}
+
+	if (!i40e_setup_channel(pf, vsi, ch)) {
+		dev_info(&pf->pdev->dev, "Failed to setup channel\n");
+		return -EINVAL;
+	}
+
+	dev_info(&pf->pdev->dev,
+		 "Setup channel (id:%u) utilizing num_queues %d\n",
+		 ch->seid, ch->num_queue_pairs);
+
+	/* in case of VF, this will be main SRIOV VSI */
+	ch->parent_vsi = vsi;
+
+	/* and update the main VSI's count of queues available for channels */
+	vsi->cnt_q_avail -= ch->num_queue_pairs;
+
+	return 0;
+}
+
+/**
+ * i40e_configure_queue_channel - Add queue channel for the given TCs
+ * @vsi: VSI to be configured
+ *
+ * Configures queue channel mapping to the given TCs
+ **/
+static int i40e_configure_queue_channel(struct i40e_vsi *vsi)
+{
+	struct i40e_channel *ch;
+	int ret = 0, i;
+
+	INIT_LIST_HEAD(&vsi->ch_list);
+
+	/* Create app vsi with the TCs. Main VSI with TC0 is already set up */
+	for (i = 1; i < I40E_MAX_TRAFFIC_CLASS; i++)
+		if (vsi->tc_config.enabled_tc & BIT(i)) {
+			ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+			if (!ch) {
+				ret = -ENOMEM;
+				goto err_free;
+			}
+
+			INIT_LIST_HEAD(&ch->list);
+			ch->num_queue_pairs =
+				vsi->tc_config.tc_info[i].qcount;
+			ch->base_queue =
+				vsi->tc_config.tc_info[i].qoffset;
+
+			list_add_tail(&ch->list, &vsi->ch_list);
+
+			ret = i40e_create_queue_channel(vsi, ch);
+			if (ret) {
+				dev_err(&vsi->back->pdev->dev,
+					"Failed creating queue channel with TC%d: queues %d\n",
+					i, ch->num_queue_pairs);
+				goto err_free;
+			}
+		}
+	return ret;
+
+err_free:
+	i40e_remove_queue_channel(vsi);
+	return ret;
+}
+
+/**
  * i40e_veb_config_tc - Configure TCs for given VEB
  * @veb: given VEB
  * @enabled_tc: TC bitmap
@@ -5502,10 +6218,20 @@ static int i40e_setup_tc(struct net_device *netdev, u8 tc)
 		goto exit;
 	}
 
+	if (pf->flags & I40E_FLAG_TC_MQPRIO) {
+		ret = i40e_configure_queue_channel(vsi);
+		if (ret) {
+			netdev_info(netdev,
+				    "Failed configuring queue channels\n");
+			goto exit;
+		}
+	}
+
 	/* Unquiesce VSI */
 	i40e_unquiesce_vsi(vsi);
 
 exit:
+	i40e_unquiesce_vsi(vsi);
 	return ret;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index f5de511..02e1e84 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -420,6 +420,8 @@ struct i40e_ring {
 					 * i40e_clean_rx_ring_irq() is called
 					 * for this ring.
 					 */
+
+	struct i40e_channel *ch;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ring_uses_build_skb(struct i40e_ring *ring)

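In i40e_channel_setup_queue_map() above, a channel's queue count is written into the VSI context's tc_mapping word as a power-of-two exponent: ilog2(num_qps), plus one when the count is not a power of two. A minimal standalone sketch of that encoding follows. The shift values (0 for the offset field, 9 for the count field) are assumptions about I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT and I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT and should be checked against i40e_adminq_cmd.h. The SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions; verify against i40e_adminq_cmd.h. */
#define SKETCH_TC_QUE_OFFSET_SHIFT	0
#define SKETCH_TC_QUE_NUMBER_SHIFT	9

/* Smallest p with (1 << p) >= n. For n >= 1 this equals
 * "ilog2(n), plus one if n is not a power of 2".
 */
static unsigned int qcount_to_pow(unsigned int n)
{
	unsigned int p = 0;

	assert(n >= 1);
	while ((1u << p) < n)
		p++;
	return p;
}

/* Build a tc_mapping word from a queue offset and a queue-pair count. */
static uint16_t sketch_build_qmap(unsigned int offset, unsigned int num_qps)
{
	return (uint16_t)((offset << SKETCH_TC_QUE_OFFSET_SHIFT) |
			  (qcount_to_pow(num_qps) << SKETCH_TC_QUE_NUMBER_SHIFT));
}
```

For example, a channel with 3 queue pairs is encoded with exponent 2, so its map spans 4 queues.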

* [PATCH 3/4] [next-queue]net: i40e: Enable mqprio full offload mode in the i40e driver for configuring TCs and queue mapping
From: Amritha Nambiar @ 2017-05-20  0:58 UTC
  To: intel-wired-lan
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar,
	sridhar.samudrala, mitch.a.williams, neerav.parikh,
	jeffrey.t.kirsher, netdev
In-Reply-To: <149524122523.11022.4541073724650541658.stgit@anamdev.jf.intel.com>

The i40e driver is modified to enable the new mqprio hardware
offload mode and to configure the TCs and queues by
creating channel VSIs. In this mode, the priority-to-traffic-class
mapping and the user-specified queue ranges are used
to configure the traffic classes when the 'hw' option is set
to 2.
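Once mqprio has configured the netdev this way, the core stack picks a TX queue in two steps when XPS does not decide first. The packet's priority selects a TC through the 16-entry priority map, and a flow hash then selects a queue inside that TC's count@offset range. The simplified sketch below uses the values from the example command in this message (map 0 0 0 0 1 2 2 3, queues 2@0 2@2 1@4 1@5). struct sketch_mqprio is a hypothetical stand-in, not a kernel structure. A plain modulo replaces the reciprocal_scale() that the kernel's skb_tx_hash() uses.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_PRIO	16	/* mqprio's map covers 16 priorities */
#define SKETCH_MAX_TC	8

/* Hypothetical mirror of the mqprio parameters, not a kernel struct. */
struct sketch_mqprio {
	uint8_t num_tc;
	uint8_t prio_tc_map[SKETCH_NUM_PRIO];
	uint16_t count[SKETCH_MAX_TC];	/* queues per TC ("count@...") */
	uint16_t offset[SKETCH_MAX_TC];	/* first queue per TC ("...@offset") */
};

/* priority -> TC via the map, then flow hash -> queue inside the TC range */
static uint16_t sketch_pick_tx_queue(const struct sketch_mqprio *cfg,
				     unsigned int prio, uint32_t hash)
{
	uint8_t tc = cfg->prio_tc_map[prio & (SKETCH_NUM_PRIO - 1)];

	assert(tc < cfg->num_tc && cfg->count[tc] > 0);
	return (uint16_t)(cfg->offset[tc] + hash % cfg->count[tc]);
}

/* "map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5"; priorities 8-15 map to TC0 */
static const struct sketch_mqprio example_cfg = {
	.num_tc = 4,
	.prio_tc_map = { 0, 0, 0, 0, 1, 2, 2, 3 },
	.count = { 2, 2, 1, 1 },
	.offset = { 0, 2, 4, 5 },
};
```

With this configuration, priorities 5 and 6 always land on queue 4 and priority 7 on queue 5, because TC2 and TC3 each own a single queue.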

Example:
# tc qdisc add dev eth0 root mqprio num_tc 4\
  map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5 hw 2

# tc qdisc show dev eth0
qdisc mqprio 8038: root  tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0
             queues:(0:1) (2:3) (4:4) (5:5)
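The same queue ranges drive the bookkeeping in i40e_vsi_setup_queue_map_mqprio() in this patch. TC0's queues stay with the main VSI, RSS is sized to the largest TC, and the remaining queues are left for channel VSIs starting right after TC0. The sketch below restates that arithmetic in plain C. Like i40e_mqprio_get_enabled_tc(), it assumes that TCs 0..num_tc-1 are all enabled. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_TC 8

/* Hypothetical summary of the fields the patch updates on the main VSI. */
struct sketch_layout {
	uint8_t enabled_tc;		/* tc_config.enabled_tc */
	uint16_t num_queue_pairs;	/* vsi->num_queue_pairs */
	uint16_t rss_size;		/* vsi->rss_size */
	uint16_t cnt_q_avail;		/* queues left for channel VSIs */
	uint16_t next_base_queue;	/* first queue of the first channel */
};

static struct sketch_layout sketch_derive_layout(uint8_t num_tc,
						 const uint16_t *count,
						 const uint16_t *offset)
{
	struct sketch_layout l = { 0 };
	uint16_t max_qcount = count[0];
	uint8_t i;

	assert(num_tc >= 1 && num_tc <= SKETCH_MAX_TC);

	/* TCs 0..num_tc-1 are enabled, as in i40e_mqprio_get_enabled_tc() */
	for (i = 0; i < num_tc; i++) {
		l.enabled_tc |= (uint8_t)(1u << i);
		if (count[i] > max_qcount)
			max_qcount = count[i];
	}

	/* Queue pairs end where the last TC's range ends */
	l.num_queue_pairs = offset[num_tc - 1] + count[num_tc - 1];
	/* RSS is spread over as many queues as the largest TC has */
	l.rss_size = max_qcount;

	/* TC0's queues stay with the main VSI; the rest go to channels */
	if (count[0] && count[0] < l.num_queue_pairs) {
		l.cnt_q_avail = l.num_queue_pairs - count[0];
		l.next_base_queue = count[0];
	}
	return l;
}
```

For the example above, this gives enabled_tc 0xF, 6 queue pairs, and an RSS size of 2. It also leaves 4 queues (2 through 5) for the TC1 to TC3 channel VSIs.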

The HW channels created are removed and all the queue configuration
is set to default when the qdisc is detached from the root of the
device.

#tc qdisc del dev eth0 root
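In the code shown in this thread, the teardown is split across two functions. i40e_remove_queue_channel() detaches every channel from the rings it borrowed and frees it. If RSS had been reconfigured, i40e_vsi_config_tc() then restores rss_size to min(alloc_rss_size, num_queue_pairs). The sketch below models both steps using a hypothetical singly linked list and an array that stands in for ring->ch. It is not the driver's list_head-based code, and it omits the admin-queue call that deletes each channel VSI from firmware. It also assumes that num_queue_pairs already holds the post-teardown queue count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NUM_RINGS 64

/* Hypothetical, simplified channel record (the driver uses list_head). */
struct sketch_channel {
	struct sketch_channel *next;
	uint16_t base_queue;
	uint16_t num_queue_pairs;
};

/* Hypothetical, simplified view of the main VSI's channel state. */
struct sketch_vsi {
	struct sketch_channel *ch_list;
	/* ring_owner[q] stands in for tx_rings[q]->ch and rx_rings[q]->ch */
	struct sketch_channel *ring_owner[SKETCH_NUM_RINGS];
	uint16_t num_queue_pairs;
	uint16_t rss_size;
	uint16_t current_rss_size;
	int reconfig_rss;
};

static void sketch_teardown_channels(struct sketch_vsi *vsi,
				     uint16_t alloc_rss_size)
{
	struct sketch_channel *ch = vsi->ch_list;

	vsi->current_rss_size = 0;

	/* Detach each channel from the rings it borrowed, then free it. */
	while (ch) {
		struct sketch_channel *next = ch->next;
		uint16_t i;

		for (i = 0; i < ch->num_queue_pairs; i++) {
			assert(ch->base_queue + i < SKETCH_NUM_RINGS);
			vsi->ring_owner[ch->base_queue + i] = NULL;
		}
		free(ch);
		ch = next;
	}
	vsi->ch_list = NULL;

	/* Restore the default RSS size only if mqprio had changed it. */
	if (vsi->reconfig_rss) {
		vsi->rss_size = alloc_rss_size < vsi->num_queue_pairs ?
				alloc_rss_size : vsi->num_queue_pairs;
		vsi->reconfig_rss = 0;
	}
}
```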

This patch also disables setting up channels via ethtool (ethtool -L)
when the TCs are configured using the mqprio scheduler.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |    4 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |    6 
 drivers/net/ethernet/intel/i40e/i40e_main.c    |  311 ++++++++++++++++++++++--
 3 files changed, 292 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 0915b02..a62f65a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -54,6 +54,8 @@
 #include <linux/clocksource.h>
 #include <linux/net_tstamp.h>
 #include <linux/ptp_clock_kernel.h>
+#include <net/pkt_cls.h>
+
 #include "i40e_type.h"
 #include "i40e_prototype.h"
 #include "i40e_client.h"
@@ -685,6 +687,7 @@ struct i40e_vsi {
 	enum i40e_vsi_type type;  /* VSI type, e.g., LAN, FCoE, etc */
 	s16 vf_id;		/* Virtual function ID for SRIOV VSIs */
 
+	struct tc_mqprio_qopt_offload mqprio_qopt; /* queue parameters */
 	struct i40e_tc_configuration tc_config;
 	struct i40e_aqc_vsi_properties_data info;
 
@@ -710,6 +713,7 @@ struct i40e_vsi {
 	u16 cnt_q_avail; /* num of queues available for channel usage */
 	u16 orig_rss_size;
 	u16 current_rss_size;
+	bool reconfig_rss;
 
 	/* keeps track of next_base_queue to be used for channel setup */
 	atomic_t next_base_queue;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 3d58762..ab52979 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -3841,6 +3841,12 @@ static int i40e_set_channels(struct net_device *dev,
 	if (vsi->type != I40E_VSI_MAIN)
 		return -EINVAL;
 
+	/* We do not support setting channels via ethtool when TCs are
+	 * configured through mqprio
+	 */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return -EINVAL;
+
 	/* verify they are not requesting separate vectors */
 	if (!count || ch->rx_count || ch->tx_count)
 		return -EINVAL;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e1bea45..7f61d4f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -68,6 +68,7 @@ static int i40e_reset(struct i40e_pf *pf);
 static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired);
 static void i40e_fdir_sb_setup(struct i40e_pf *pf);
 static int i40e_veb_get_bw_info(struct i40e_veb *veb);
+static int i40e_vsi_config_rss(struct i40e_vsi *vsi);
 
 /* i40e_pci_tbl - PCI Device ID Table
  *
@@ -1560,6 +1561,105 @@ static int i40e_set_mac(struct net_device *netdev, void *p)
 }
 
 /**
+ * i40e_vsi_setup_queue_map_mqprio - Prepares VSI tc_config to have queue
+ * configurations based on MQPRIO options.
+ * @vsi: the VSI being configured
+ * @ctxt: VSI context structure
+ * @enabled_tc: bitmap of traffic classes to enable
+ **/
+static int i40e_vsi_setup_queue_map_mqprio(struct i40e_vsi *vsi,
+					   struct i40e_vsi_context *ctxt,
+					   u8 enabled_tc)
+{
+	u8 netdev_tc = 0, offset = 0;
+	u16 qcount = 0, max_qcount, qmap, sections = 0;
+	int i, override_q, pow, num_qps, ret;
+
+	if (vsi->type != I40E_VSI_MAIN)
+		return -EINVAL;
+
+	sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
+	sections |= I40E_AQ_VSI_PROP_SCHED_VALID;
+
+	vsi->tc_config.numtc = vsi->mqprio_qopt.qopt.num_tc;
+	vsi->tc_config.enabled_tc = enabled_tc ? enabled_tc : 1;
+
+	num_qps = vsi->mqprio_qopt.qopt.count[0];
+
+	/* find the next higher power-of-2 of num queue pairs */
+	pow = ilog2(num_qps);
+	if (!is_power_of_2(num_qps))
+		pow++;
+
+	qmap = (offset << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) |
+		(pow << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT);
+
+	/* Setup queue offset/count for all TCs for given VSI */
+	max_qcount = vsi->mqprio_qopt.qopt.count[0];
+
+	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++) {
+		/* See if the given TC is enabled for the given VSI */
+		if (vsi->tc_config.enabled_tc & BIT(i)) {
+			offset = vsi->mqprio_qopt.qopt.offset[i];
+			qcount = vsi->mqprio_qopt.qopt.count[i];
+
+			if (qcount > max_qcount)
+				max_qcount = qcount;
+
+			vsi->tc_config.tc_info[i].qoffset = offset;
+			vsi->tc_config.tc_info[i].qcount = qcount;
+			vsi->tc_config.tc_info[i].netdev_tc = netdev_tc++;
+
+		} else {
+			/* TC is not enabled so set the offset to
+			 * default queue and allocate one queue
+			 * for the given TC.
+			 */
+			vsi->tc_config.tc_info[i].qoffset = 0;
+			vsi->tc_config.tc_info[i].qcount = 1;
+			vsi->tc_config.tc_info[i].netdev_tc = 0;
+		}
+	}
+
+	/* Set actual Tx/Rx queue pairs */
+	vsi->num_queue_pairs = offset + qcount;
+
+	/* Setup queue TC[0].qmap for given VSI context */
+	ctxt->info.tc_mapping[0] = cpu_to_le16(qmap);
+
+	ctxt->info.mapping_flags |=
+		cpu_to_le16(I40E_AQ_VSI_QUE_MAP_CONTIG);
+
+	ctxt->info.queue_mapping[0] = cpu_to_le16(vsi->base_queue);
+	ctxt->info.valid_sections |= cpu_to_le16(sections);
+
+	/* Reconfigure RSS for main VSI with max queue count */
+	vsi->rss_size = max_qcount;
+
+	ret = i40e_vsi_config_rss(vsi);
+	if (ret) {
+		dev_info(&vsi->back->pdev->dev,
+			 "Failed to reconfig rss for num_queues (%u)\n",
+			 max_qcount);
+		return ret;
+	}
+	vsi->reconfig_rss = true;
+	dev_dbg(&vsi->back->pdev->dev,
+		"Reconfigured rss with num_queues (%u)\n", max_qcount);
+
+	/* Find queue count available for channel VSIs and starting offset
+	 * for channel VSIs
+	 */
+	override_q = vsi->mqprio_qopt.qopt.count[0];
+	if (override_q && (override_q < vsi->num_queue_pairs)) {
+		vsi->cnt_q_avail = vsi->num_queue_pairs - override_q;
+		atomic_set(&vsi->next_base_queue, override_q);
+	}
+
+	return 0;
+}
+
+/**
  * i40e_vsi_setup_queue_map - Setup a VSI queue map based on enabled_tc
  * @vsi: the VSI being setup
  * @ctxt: VSI context structure
@@ -1597,7 +1697,7 @@ static void i40e_vsi_setup_queue_map(struct i40e_vsi *vsi,
 			numtc = 1;
 		}
 	} else {
-		/* At least TC0 is enabled in case of non-DCB case */
+		/* At least TC0 is enabled in non-DCB, non-MQPRIO case */
 		numtc = 1;
 	}
 
@@ -3150,6 +3250,7 @@ static void i40e_vsi_config_dcb_rings(struct i40e_vsi *vsi)
 			rx_ring->dcb_tc = 0;
 			tx_ring->dcb_tc = 0;
 		}
+		return;
 	}
 
 	for (n = 0; n < I40E_MAX_TRAFFIC_CLASS; n++) {
@@ -4777,6 +4878,25 @@ static u8 i40e_dcb_get_enabled_tc(struct i40e_dcbx_config *dcbcfg)
 }
 
 /**
+ * i40e_mqprio_get_enabled_tc - Get enabled traffic classes
+ * @pf: PF being queried
+ *
+ * Query the current MQPRIO configuration and return a bitmap of the
+ * enabled traffic classes.
+ **/
+static u8 i40e_mqprio_get_enabled_tc(struct i40e_pf *pf)
+{
+	struct i40e_vsi *vsi = pf->vsi[pf->lan_vsi];
+	u8 num_tc = vsi->mqprio_qopt.qopt.num_tc;
+	u8 enabled_tc = 1, i;
+
+	for (i = 1; i < num_tc; i++)
+		enabled_tc |= BIT(i);
+
+	return enabled_tc;
+}
+
+/**
  * i40e_pf_get_num_tc - Get enabled traffic classes for PF
  * @pf: PF being queried
  *
@@ -4789,7 +4909,10 @@ static u8 i40e_pf_get_num_tc(struct i40e_pf *pf)
 	u8 num_tc = 0;
 	struct i40e_dcbx_config *dcbcfg = &hw->local_dcbx_config;
 
-	/* If DCB is not enabled then always in single TC */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return pf->vsi[pf->lan_vsi]->mqprio_qopt.qopt.num_tc;
+
+	/* If neither MQPRIO nor DCB is enabled, then always in single TC */
 	if (!(pf->flags & I40E_FLAG_DCB_ENABLED))
 		return 1;
 
@@ -4818,7 +4941,12 @@ static u8 i40e_pf_get_num_tc(struct i40e_pf *pf)
  **/
 static u8 i40e_pf_get_tc_map(struct i40e_pf *pf)
 {
-	/* If DCB is not enabled for this PF then just return default TC */
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return i40e_mqprio_get_enabled_tc(pf);
+
+	/* If neither MQPRIO nor DCB is enabled for this PF then just return
+	 * default TC
+	 */
 	if (!(pf->flags & I40E_FLAG_DCB_ENABLED))
 		return I40E_DEFAULT_TRAFFIC_CLASS;
 
@@ -4912,6 +5040,10 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 	for (i = 0; i < I40E_MAX_TRAFFIC_CLASS; i++)
 		bw_data.tc_bw_credits[i] = bw_share[i];
 
+	if ((vsi->back->flags & I40E_FLAG_TC_MQPRIO) ||
+	    !vsi->mqprio_qopt.qopt.hw)
+		return 0;
+
 	ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, vsi->seid, &bw_data,
 				       NULL);
 	if (ret) {
@@ -4970,6 +5102,9 @@ static void i40e_vsi_config_netdev_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 					vsi->tc_config.tc_info[i].qoffset);
 	}
 
+	if (pf->flags & I40E_FLAG_TC_MQPRIO)
+		return;
+
 	/* Assign UP2TC map for the VSI */
 	for (i = 0; i < I40E_MAX_USER_PRIORITY; i++) {
 		/* Get the actual TC# for the UP */
@@ -5020,7 +5155,8 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 	int i;
 
 	/* Check if enabled_tc is same as existing or new TCs */
-	if (vsi->tc_config.enabled_tc == enabled_tc)
+	if (vsi->tc_config.enabled_tc == enabled_tc &&
+	    vsi->mqprio_qopt.qopt.hw != TC_MQPRIO_HW_OFFLOAD)
 		return ret;
 
 	/* Enable ETS TCs with equal BW Share for now across all VSIs */
@@ -5043,7 +5179,30 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 	ctxt.vf_num = 0;
 	ctxt.uplink_seid = vsi->uplink_seid;
 	ctxt.info = vsi->info;
-	i40e_vsi_setup_queue_map(vsi, &ctxt, enabled_tc, false);
+
+	if (vsi->back->flags & I40E_FLAG_TC_MQPRIO) {
+		ret = i40e_vsi_setup_queue_map_mqprio(vsi, &ctxt, enabled_tc);
+		if (ret)
+			goto out;
+
+	} else {
+		i40e_vsi_setup_queue_map(vsi, &ctxt, enabled_tc, false);
+	}
+
+	/* On destroying the qdisc, reset vsi->rss_size, as number of enabled
+	 * queues changed.
+	 */
+	if (!vsi->mqprio_qopt.qopt.hw && vsi->reconfig_rss) {
+		vsi->rss_size = min_t(int, vsi->back->alloc_rss_size,
+				      vsi->num_queue_pairs);
+		ret = i40e_vsi_config_rss(vsi);
+		if (ret) {
+			dev_info(&vsi->back->pdev->dev,
+				 "Failed to reconfig rss for num_queues\n");
+			return ret;
+		}
+		vsi->reconfig_rss = false;
+	}
 
 	if (vsi->back->flags & I40E_FLAG_IWARP_ENABLED) {
 		ctxt.info.valid_sections |=
@@ -5051,7 +5210,9 @@ static int i40e_vsi_config_tc(struct i40e_vsi *vsi, u8 enabled_tc)
 		ctxt.info.queueing_opt_flags |= I40E_AQ_VSI_QUE_OPT_TCP_ENA;
 	}
 
-	/* Update the VSI after updating the VSI queue-mapping information */
+	/* Update the VSI after updating the VSI queue-mapping
+	 * information
+	 */
 	ret = i40e_aq_update_vsi_params(&vsi->back->hw, &ctxt, NULL);
 	if (ret) {
 		dev_info(&vsi->back->pdev->dev,
@@ -6168,48 +6329,142 @@ void i40e_down(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_validate_mqprio_queue_mapping - validate queue mapping info
+ * @vsi: the VSI being configured
+ * @mqprio_qopt: queue parameters
+ **/
+int i40e_validate_mqprio_queue_mapping(struct i40e_vsi *vsi,
+				struct tc_mqprio_qopt_offload *mqprio_qopt)
+{
+	int i;
+
+	if ((mqprio_qopt->qopt.offset[0] != 0) ||
+	    (mqprio_qopt->qopt.num_tc < 1))
+		return -EINVAL;
+
+	for (i = 0; ; i++) {
+		if (!mqprio_qopt->qopt.count[i])
+			return -EINVAL;
+
+		if (mqprio_qopt->min_rate[i] || mqprio_qopt->max_rate[i])
+			return -EINVAL;
+
+		if (i >= mqprio_qopt->qopt.num_tc - 1)
+			break;
+
+		if (mqprio_qopt->qopt.offset[i + 1] !=
+		    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i]))
+			return -EINVAL;
+	}
+
+	if (vsi->num_queue_pairs <
+	    (mqprio_qopt->qopt.offset[i] + mqprio_qopt->qopt.count[i])) {
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/**
  * i40e_setup_tc - configure multiple traffic classes
  * @netdev: net device to configure
- * @tc: number of traffic classes to enable
+ * @tc: pointer to struct tc_to_netdev
  **/
-static int i40e_setup_tc(struct net_device *netdev, u8 tc)
+static int i40e_setup_tc(struct net_device *netdev, struct tc_to_netdev *tc)
 {
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
-	u8 enabled_tc = 0;
+	u8 enabled_tc = 0, num_tc = 0, hw = 0;
 	int ret = -EINVAL;
 	int i;
 
-	/* Check if DCB enabled to continue */
-	if (!(pf->flags & I40E_FLAG_DCB_ENABLED)) {
-		netdev_info(netdev, "DCB is not enabled for adapter\n");
-		goto exit;
+	if (tc->type == TC_SETUP_MQPRIO) {
+		hw = tc->mqprio->hw;
+		num_tc = tc->mqprio->num_tc;
+	} else if (tc->type == TC_SETUP_MQPRIO_EXT) {
+		hw = tc->mqprio_qopt->qopt.hw;
+		num_tc = tc->mqprio_qopt->qopt.num_tc;
 	}
 
-	/* Check if MFP enabled */
-	if (pf->flags & I40E_FLAG_MFP_ENABLED) {
-		netdev_info(netdev, "Configuring TC not supported in MFP mode\n");
-		goto exit;
+	if (!hw) {
+		pf->flags &= ~I40E_FLAG_TC_MQPRIO;
+		if (tc->type == TC_SETUP_MQPRIO_EXT)
+			memcpy(&vsi->mqprio_qopt, tc->mqprio_qopt,
+			       sizeof(*tc->mqprio_qopt));
+		goto config_tc;
 	}
 
-	/* Check whether tc count is within enabled limit */
-	if (tc > i40e_pf_get_num_tc(pf)) {
-		netdev_info(netdev, "TC count greater than enabled on link for adapter\n");
-		goto exit;
+	switch (hw) {
+	case TC_MQPRIO_HW_OFFLOAD_TCS:
+		pf->flags &= ~I40E_FLAG_TC_MQPRIO;
+		/* Check if DCB enabled to continue */
+		if (!(pf->flags & I40E_FLAG_DCB_ENABLED)) {
+			netdev_info(netdev,
+				    "DCB is not enabled for adapter\n");
+			goto exit;
+		}
+
+		/* Check if MFP enabled */
+		if (pf->flags & I40E_FLAG_MFP_ENABLED) {
+			netdev_info(netdev,
+				    "Configuring TC not supported in MFP mode\n");
+			goto exit;
+		}
+
+		/* Check whether tc count is within enabled limit */
+		if (num_tc > i40e_pf_get_num_tc(pf)) {
+			netdev_info(netdev,
+				    "TC count greater than enabled on link for adapter\n");
+			goto exit;
+		}
+		break;
+	case TC_MQPRIO_HW_OFFLOAD:
+		if (pf->flags & I40E_FLAG_DCB_ENABLED) {
+			netdev_info(netdev,
+				    "Full offload of TC Mqprio options is not supported when DCB is enabled\n");
+			goto exit;
+		}
+
+		/* Check if MFP enabled */
+		if (pf->flags & I40E_FLAG_MFP_ENABLED) {
+			netdev_info(netdev,
+				    "Configuring TC not supported in MFP mode\n");
+			goto exit;
+		}
+
+		ret = i40e_validate_mqprio_queue_mapping(vsi,
+							 tc->mqprio_qopt);
+		if (ret)
+			goto exit;
+
+		memcpy(&vsi->mqprio_qopt, tc->mqprio_qopt,
+		       sizeof(*tc->mqprio_qopt));
+
+		pf->flags |= I40E_FLAG_TC_MQPRIO;
+		pf->flags &= ~I40E_FLAG_DCB_ENABLED;
+
+		break;
+	default:
+		return -EINVAL;
 	}
 
+config_tc:
 	/* Generate TC map for number of tc requested */
-	for (i = 0; i < tc; i++)
+	for (i = 0; i < num_tc; i++)
 		enabled_tc |= BIT(i);
 
 	/* Requesting same TC configuration as already enabled */
-	if (enabled_tc == vsi->tc_config.enabled_tc)
+	if (enabled_tc == vsi->tc_config.enabled_tc &&
+	    hw != TC_MQPRIO_HW_OFFLOAD)
 		return 0;
 
 	/* Quiesce VSI queues */
 	i40e_quiesce_vsi(vsi);
 
+	if (!hw && !(pf->flags & I40E_FLAG_TC_MQPRIO))
+		i40e_remove_queue_channel(vsi);
+
 	/* Configure VSI for enabled TCs */
 	ret = i40e_vsi_config_tc(vsi, enabled_tc);
 	if (ret) {
@@ -6229,8 +6484,11 @@ static int i40e_setup_tc(struct net_device *netdev, u8 tc)
 
 	/* Unquiesce VSI */
 	i40e_unquiesce_vsi(vsi);
+	return ret;
 
 exit:
+	/* Reset the configuration data */
+	memset(&vsi->tc_config, 0, sizeof(vsi->tc_config));
 	i40e_unquiesce_vsi(vsi);
 	return ret;
 }
@@ -6238,12 +6496,7 @@ static int i40e_setup_tc(struct net_device *netdev, u8 tc)
 static int __i40e_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
 			   struct tc_to_netdev *tc)
 {
-	if (tc->type != TC_SETUP_MQPRIO)
-		return -EINVAL;
-
-	tc->mqprio->hw = TC_MQPRIO_HW_OFFLOAD_TCS;
-
-	return i40e_setup_tc(netdev, tc->mqprio->num_tc);
+	return i40e_setup_tc(netdev, tc);
 }
 
 /**

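As a userspace sketch of the queue-mapping rule that i40e_validate_mqprio_queue_mapping() enforces (TC0 starts at queue 0, every TC has at least one queue, each TC starts exactly where the previous one ends, and the last TC fits within the VSI's queue pairs), together with the enabled-TC bitmap that i40e_mqprio_get_enabled_tc() builds. The struct mqprio_map type, the MAX_TC value of 8 and both function names are assumptions for illustration, not driver code, and the min/max rate checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TC 8 /* assumption: i40e supports up to 8 traffic classes */

/* Hypothetical stand-in for the queue fields of struct tc_mqprio_qopt */
struct mqprio_map {
	uint8_t num_tc;
	uint16_t count[MAX_TC];  /* number of queues in each TC */
	uint16_t offset[MAX_TC]; /* first queue of each TC */
};

/* True if the TCs cover queues 0..N-1 back to back (no gaps, no
 * overlaps, no empty TC) and N fits in the VSI's queue pairs.
 */
static bool queue_map_is_valid(const struct mqprio_map *m,
			       unsigned int num_queue_pairs)
{
	unsigned int i, next = 0;

	if (m->num_tc < 1 || m->num_tc > MAX_TC)
		return false;

	for (i = 0; i < m->num_tc; i++) {
		if (m->count[i] == 0)
			return false;
		/* TC0 must start at queue 0, every later TC right after
		 * the previous one
		 */
		if (m->offset[i] != next)
			return false;
		next = m->offset[i] + m->count[i];
	}

	return next <= num_queue_pairs;
}

/* One bit per enabled TC; TC0 is always set */
static uint8_t enabled_tc_bitmap(uint8_t num_tc)
{
	uint8_t map = 1;
	uint8_t i;

	for (i = 1; i < num_tc && i < MAX_TC; i++)
		map |= (uint8_t)(1u << i);

	return map;
}
```

With the "queues 4@0 4@4" layout used in the tc example of patch 4/4, the map is accepted for a VSI with at least 8 queue pairs, and enabled_tc_bitmap(2) yields 0x3.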
* [PATCH 4/4] [next-queue]net: i40e: Add support to set max bandwidth rates for TCs offloaded via tc/mqprio
From: Amritha Nambiar @ 2017-05-20  0:58 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: alexander.h.duyck, kiran.patil, amritha.nambiar,
	sridhar.samudrala, mitch.a.williams, neerav.parikh,
	jeffrey.t.kirsher, netdev
In-Reply-To: <149524122523.11022.4541073724650541658.stgit@anamdev.jf.intel.com>

This patch enables setting up maximum Tx rates for the traffic
classes in i40e. The maximum rate offloaded to the hardware through
the mqprio framework is configured for the VSI. Configuring a
minimum Tx rate limit is not supported by the device. The minimum
usable value for Tx rate is 50Mbps.

Example:
# tc qdisc add dev eth0 root mqprio num_tc 2  map 0 0 0 0 1 1 1 1\
  queues 4@0 4@4 min_rate 0Mbit 0Mbit max_rate 55Mbit 60Mbit hw 2

To dump the bandwidth rates:

# tc qdisc show dev eth0
qdisc mqprio 804a: root  tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
             queues:(0:3) (4:7)
             min rates:0bit 0bit
             max rates:55Mbit 60Mbit

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |    2 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |  102 ++++++++++++++++++++++++++-
 2 files changed, 100 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index a62f65a..83a060d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -346,6 +346,8 @@ struct i40e_channel {
 	u8 enabled_tc;
 	struct i40e_aqc_vsi_properties_data info;
 
+	u32 max_tx_rate;
+
 	/* track this channel belongs to which VSI */
 	struct i40e_vsi *parent_vsi;
 };
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 7f61d4f..3261dab 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -69,6 +69,8 @@ static void i40e_rebuild(struct i40e_pf *pf, bool reinit, bool lock_acquired);
 static void i40e_fdir_sb_setup(struct i40e_pf *pf);
 static int i40e_veb_get_bw_info(struct i40e_veb *veb);
 static int i40e_vsi_config_rss(struct i40e_vsi *vsi);
+static int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 ch_seid,
+			     u32 max_tx_rate);
 
 /* i40e_pci_tbl - PCI Device ID Table
  *
@@ -5033,7 +5035,7 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 				       u8 *bw_share)
 {
 	struct i40e_aqc_configure_vsi_tc_bw_data bw_data;
-	i40e_status ret;
+	i40e_status ret = 0;
 	int i;
 
 	bw_data.tc_valid_bits = enabled_tc;
@@ -5041,8 +5043,20 @@ static int i40e_vsi_configure_bw_alloc(struct i40e_vsi *vsi, u8 enabled_tc,
 		bw_data.tc_bw_credits[i] = bw_share[i];
 
 	if ((vsi->back->flags & I40E_FLAG_TC_MQPRIO) ||
-	    !vsi->mqprio_qopt.qopt.hw)
-		return 0;
+	    !vsi->mqprio_qopt.qopt.hw) {
+		if (vsi->mqprio_qopt.max_rate[0]) {
+			u32 max_tx_rate = vsi->mqprio_qopt.max_rate[0];
+
+			max_tx_rate = (max_tx_rate * 8) / 1000000;
+
+			ret = i40e_set_bw_limit(vsi, vsi->seid, max_tx_rate);
+			if (ret)
+				dev_err(&vsi->back->pdev->dev,
+					"Failed to set tx rate (%u Mbps) for vsi->seid %u, error code %d.\n",
+					max_tx_rate, vsi->seid, ret);
+		}
+		return ret;
+	}
 
 	ret = i40e_aq_config_vsi_tc_bw(&vsi->back->hw, vsi->seid, &bw_data,
 				       NULL);
@@ -5297,6 +5311,71 @@ static void i40e_remove_queue_channel(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_set_bw_limit - setup BW limit based on max_tx_rate
+ * @vsi: the VSI being setup
+ * @ch_seid: seid of the channel (VSI)
+ * @max_tx_rate: max TX rate to be configured as BW limit
+ *
+ * This function sets up BW limit for a given channel (ch_seid)
+ * based on max TX rate specified.
+ **/
+static int i40e_set_bw_limit(struct i40e_vsi *vsi, u16 ch_seid, u32 max_tx_rate)
+{
+	struct i40e_pf *pf = vsi->back;
+	int speed = 0;
+	int ret = 0;
+
+	switch (pf->hw.phy.link_info.link_speed) {
+	case I40E_LINK_SPEED_40GB:
+		speed = 40000;
+		break;
+	case I40E_LINK_SPEED_20GB:
+		speed = 20000;
+		break;
+	case I40E_LINK_SPEED_10GB:
+		speed = 10000;
+		break;
+	case I40E_LINK_SPEED_1GB:
+		speed = 1000;
+		break;
+	default:
+		break;
+	}
+
+	if (max_tx_rate > speed) {
+		dev_err(&pf->pdev->dev,
+			"Invalid tx rate %u specified for channel seid %u.\n",
+			max_tx_rate, ch_seid);
+		return -EINVAL;
+	}
+
+	if ((max_tx_rate < 50) && (max_tx_rate > 0)) {
+		dev_warn(&pf->pdev->dev,
+			 "Setting tx rate to minimum usable value of 50Mbps.\n");
+		max_tx_rate = 50;
+	}
+
+#define I40E_BW_CREDIT_DIVISOR 50     /* 50Mbps per BW credit */
+#define I40E_MAX_BW_INACTIVE_ACCUM 1
+
+	/* TX rate credits are in values of 50Mbps, 0 is disabled */
+	ret = i40e_aq_config_vsi_bw_limit(&pf->hw, ch_seid,
+					  max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+					  I40E_MAX_BW_INACTIVE_ACCUM,
+					  NULL);
+	if (ret)
+		dev_err(&pf->pdev->dev,
+			"Failed to set tx rate (%u Mbps) for vsi->seid %u, error code %d.\n",
+			max_tx_rate, ch_seid, ret);
+	else
+		dev_info(&pf->pdev->dev,
+			 "Set tx rate of %u Mbps (count of 50Mbps %u) for vsi->seid %u\n",
+			 max_tx_rate, max_tx_rate / I40E_BW_CREDIT_DIVISOR,
+			 ch_seid);
+	return ret;
+}
+
+/**
  * i40e_is_any_channel - channel exist or not
  * @vsi: ptr to VSI to which channels are associated with
  *
@@ -5882,6 +5961,11 @@ int i40e_create_queue_channel(struct i40e_vsi *vsi,
 		 "Setup channel (id:%u) utilizing num_queues %d\n",
 		 ch->seid, ch->num_queue_pairs);
 
+	/* configure VSI for BW limit */
+	if (ch->max_tx_rate)
+		if (i40e_set_bw_limit(vsi, ch->seid, ch->max_tx_rate))
+			return -EINVAL;
+
 	/* in case of VF, this will be main SRIOV VSI */
 	ch->parent_vsi = vsi;
 
@@ -5918,6 +6002,13 @@ static int i40e_configure_queue_channel(struct i40e_vsi *vsi)
 				vsi->tc_config.tc_info[i].qcount;
 			ch->base_queue =
 				vsi->tc_config.tc_info[i].qoffset;
+			ch->max_tx_rate =
+				vsi->mqprio_qopt.max_rate[i];
+
+			/* Bandwidth limit through tc interface is in bytes/s,
+			 * change to Mbit/s
+			 */
+			ch->max_tx_rate = (ch->max_tx_rate * 8) / 1000000;
 
 			list_add_tail(&ch->list, &vsi->ch_list);
 
@@ -6346,8 +6437,11 @@ int i40e_validate_mqprio_queue_mapping(struct i40e_vsi *vsi,
 		if (!mqprio_qopt->qopt.count[i])
 			return -EINVAL;
 
-		if (mqprio_qopt->min_rate[i] || mqprio_qopt->max_rate[i])
+		if (mqprio_qopt->min_rate[i]) {
+			dev_err(&vsi->back->pdev->dev,
+				"Invalid min tx rate (greater than 0) specified\n");
 			return -EINVAL;
+		}
 
 		if (i >= mqprio_qopt->qopt.num_tc - 1)
 			break;

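The rate handling in this patch can be modelled in plain C as a sketch: tc passes max_rate in bytes/s, the driver converts it to Mbit/s, rejects rates above the link speed, raises non-zero rates below 50 Mbit/s to 50, and programs the limit as a count of 50 Mbit/s credits. The function names are hypothetical. One difference is deliberate: the sketch divides a 64-bit value by 125000 rather than computing (rate * 8) / 1000000 in a u32 as the hunks above do, because a 32-bit rate * 8 wraps for rates of 536870912 bytes/s (about 4.29 Gbit/s) and above, which is well within the range of a 10G or 40G link.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BW_CREDIT_MBPS  50 /* one BW credit is 50 Mbit/s, per the patch */
#define MIN_USABLE_MBPS 50 /* smaller non-zero rates are raised to this */

/* tc/mqprio hands over rates in bytes/s; (bytes * 8) / 1000000 equals
 * bytes / 125000, and doing it in 64 bits avoids wrap-around.
 */
static uint64_t bytes_per_sec_to_mbps(uint64_t bytes_per_sec)
{
	return bytes_per_sec / 125000;
}

/* Returns the number of 50 Mbit/s credits to program (0 = no limit),
 * or -EINVAL if the rate exceeds the link speed.
 */
static long rate_to_credits(uint64_t max_rate_bytes, uint32_t link_mbps)
{
	uint64_t mbps = bytes_per_sec_to_mbps(max_rate_bytes);

	if (mbps > link_mbps)
		return -EINVAL;

	if (mbps > 0 && mbps < MIN_USABLE_MBPS)
		mbps = MIN_USABLE_MBPS;

	return (long)(mbps / BW_CREDIT_MBPS);
}
```

Because the credit count is truncated, the 55Mbit and 60Mbit limits from the commit message example both map to a single credit, i.e. an effective 50 Mbit/s.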
* Re: [PATCH net] bonding: fix randomly populated arp target array
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-05-19 20:38 UTC (permalink / raw)
  To: Jarod Wilson
  Cc: linux-kernel, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	linux-netdev, stable
In-Reply-To: <20170519184646.42572-1-jarod@redhat.com>

On Fri, May 19, 2017 at 11:46 AM, Jarod Wilson <jarod@redhat.com> wrote:
> In commit dc9c4d0fe023, the arp_target array moved from a static global
> to a local variable. By the nature of static globals, the array used to
> be initialized to all 0. At present, it's full of random data, which
> gets interpreted as arp_target values, when none have actually been
> specified. Systems end up booting with spew along these lines:
>
> [   32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
> [   32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
> [   32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
> [   32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
> [   32.204892] lacp0: Setting MII monitoring interval to 100
> [   32.211071] lacp0: Removing ARP target 216.124.228.17
> [   32.216824] lacp0: Removing ARP target 218.160.255.255
> [   32.222646] lacp0: Removing ARP target 185.170.136.184
> [   32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
> [   32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
> [   32.243987] lacp0: Removing ARP target 56.125.228.17
> [   32.249625] lacp0: Removing ARP target 218.160.255.255
> [   32.255432] lacp0: Removing ARP target 15.157.233.184
> [   32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
> [   32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
> [   32.276632] lacp0: Removing ARP target 16.0.0.0
> [   32.281755] lacp0: Removing ARP target 218.160.255.255
> [   32.287567] lacp0: Removing ARP target 72.125.228.17
> [   32.293165] lacp0: Removing ARP target 218.160.255.255
> [   32.298970] lacp0: Removing ARP target 8.125.228.17
> [   32.304458] lacp0: Removing ARP target 218.160.255.255
>
> None of these were actually specified as ARP targets, and the driver does
> seem to clean up the mess okay, but it's rather noisy and confusing, leaks
> values to userspace, and the 255.255.255.255 spew shows up even when debug
> prints are disabled.
>
> The fix: just zero out arp_target at init time.
>
> While we're in here, init arp_all_targets_value in the right place.
>
> Fixes: dc9c4d0fe023 ("bonding: reduce scope of some global variables")
> CC: Mahesh Bandewar <maheshb@google.com>
> CC: Jay Vosburgh <j.vosburgh@gmail.com>
> CC: Veaceslav Falico <vfalico@gmail.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: netdev@vger.kernel.org
> CC: stable@vger.kernel.org
> Signed-off-by: Jarod Wilson <jarod@redhat.com>
> ---
>  drivers/net/bonding/bond_main.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 2be78807fd6e..73313318399c 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4271,10 +4271,10 @@ static int bond_check_params(struct bond_params *params)
>         int arp_validate_value, fail_over_mac_value, primary_reselect_value, i;
>         struct bond_opt_value newval;
>         const struct bond_opt_value *valptr;
> -       int arp_all_targets_value;
> +       int arp_all_targets_value = 0;

I think this is unnecessary as long as the var is initialized before its use.

>         u16 ad_actor_sys_prio = 0;
>         u16 ad_user_port_key = 0;
> -       __be32 arp_target[BOND_MAX_ARP_TARGETS];
> +       __be32 arp_target[BOND_MAX_ARP_TARGETS] = { 0 };

this is the only change required to avoid the reported error.

>         int arp_ip_count;
>         int bond_mode   = BOND_MODE_ROUNDROBIN;
>         int xmit_hashtype = BOND_XMIT_POLICY_LAYER2;
> @@ -4501,7 +4501,6 @@ static int bond_check_params(struct bond_params *params)
>                 arp_validate_value = 0;
>         }
>
> -       arp_all_targets_value = 0;
>         if (arp_all_targets) {
>                 bond_opt_initstr(&newval, arp_all_targets);
>                 valptr = bond_opt_parse(bond_opt_get(BOND_OPT_ARP_ALL_TARGETS),
> --
> 2.12.1
>
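To illustrate why the "= { 0 }" initializer is sufficient: in C, an initializer that names only the first element zero-initializes all remaining elements, whereas an automatic array without an initializer holds indeterminate values. A minimal sketch, assuming BOND_MAX_ARP_TARGETS is 16 and using a hypothetical fill_targets() helper with host-order uint32_t in place of __be32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ARP_TARGETS 16 /* assumption: value of BOND_MAX_ARP_TARGETS */

/* Hypothetical helper: copy up to MAX_ARP_TARGETS parsed addresses into a
 * local table and hand the whole table back. Slots that were never
 * written stay 0, which the caller can treat as "no target".
 */
static size_t fill_targets(const uint32_t *parsed, size_t nparsed,
			   uint32_t out[MAX_ARP_TARGETS])
{
	uint32_t arp_target[MAX_ARP_TARGETS] = { 0 }; /* every slot is 0 */
	size_t i;

	for (i = 0; i < nparsed && i < MAX_ARP_TARGETS; i++)
		arp_target[i] = parsed[i];

	memcpy(out, arp_target, sizeof(arp_target));
	return i; /* number of targets actually stored */
}
```

Dropping the initializer from arp_target would leave slots 2..15 with whatever happened to be on the stack, which is the behaviour the patch describes.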

* Re: Alignment in BPF verifier
From: David Miller @ 2017-05-19 20:39 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev
In-Reply-To: <bed5b512-6069-53cc-f128-05be05f89889@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 19 May 2017 21:00:13 +0100

> Well, I've managed to get somewhat confused by reg->id.
> In particular, I'm unsure which bpf_reg_types can have an id, and what
>  exactly it means.  There seems to be some code that checks around map value
>  pointers, which seems strange as maps have fixed sizes (and the comments in
>  enum bpf_reg_type make it seem like id is a PTR_TO_PACKET thing) - is this
>  maybe because of map-of-maps support, can the contained maps have differing
>  element sizes?  Or do we allow *(map_value + var + imm), if map_value + var
>  was appropriately bounds-checked?
> 
> Does the 'id' identify the variable that was added to an object pointer, or
>  the object itself?  Or does it blur these and identify (what the comment in
>  enum bpf_reg_type calls) "skb->data + (u16) var"?

The reg->id value changes any time a variable gets added to a packet
pointer.

You will also notice right now that only packet pointers have their
alignment tracked.

I have changes pending that will do that for MAP pointers too, but
it needs more work.

* Re: [PATCH net] fix BUG: scheduling while atomic in netlink broadcast
From: Eric Dumazet @ 2017-05-19 20:40 UTC (permalink / raw)
  To: Akshay Narayan; +Cc: Cong Wang, Linux Kernel Network Developers, David Miller
In-Reply-To: <CAN-7y0rZV-coOpOAzVGZoZB=Mzo4LP7Jpxx3wk+PX_jNMr_=iA@mail.gmail.com>

On Fri, 2017-05-19 at 14:47 -0400, Akshay Narayan wrote:
> > I don't want to defend the use of yield() but it looks like there is other
> > problem.
> 
> I believe this use of yield() should be replaced with cond_resched()
> even if it turns out there is an unrelated problem.
> 
> > Does this module call netlink_broadcast() with __GFP_DIRECT_RECLAIM
> > in IRQ context? If so you should adjust the gfp flags.
> 
> The module only calls netlink_broadcast() from a pluggable TCP
> function; from my understanding this is not in the IRQ context. Full
> trace, perhaps more clear, attached below.
> 
> May 19 14:30:44 ccp kernel: [  178.885546] BUG: scheduling while atomic: mm-link/3105/0x00000200
> May 19 14:30:44 ccp kernel: [  178.885552] Modules linked in:
> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat libcrc32c xt_connmark nf_conntrack
> ccp(OE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_intel8x0
> pcbc snd_ac97_codec joydev ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event
> snd_rawmidi aesni_intel snd_seq aes_x86_64 crypto_simd snd_seq_device
> snd_timer snd input_leds i2c_piix4 glue_helper cryptd soundcore mac_hid
> serio_raw vboxvideo ttm drm_kms_helper drm fb_sys_fops syscopyarea
> sysfillrect sysimgblt vboxguest intel_rapl_perf parport_pc ppdev lp
> parport ip_tables x_tables autofs4 hid_generic usbhid hid e1000 ahci
> libahci psmouse fjes pata_acpi video
> May 19 14:30:44 ccp kernel: [  178.885665] CPU: 0 PID: 3105 Comm: mm-link Tainted: G        W  OE   4.10.0-21-generic #23-Ubuntu
> May 19 14:30:44 ccp kernel: [  178.885666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> May 19 14:30:44 ccp kernel: [  178.885667] Call Trace:
> May 19 14:30:44 ccp kernel: [  178.885674]  dump_stack+0x63/0x81
> May 19 14:30:44 ccp kernel: [  178.885678]  __schedule_bug+0x54/0x70
> May 19 14:30:44 ccp kernel: [  178.885682]  __schedule+0x536/0x6f0
> May 19 14:30:44 ccp kernel: [  178.885685]  schedule+0x36/0x80
> May 19 14:30:44 ccp kernel: [  178.885687]  sys_sched_yield+0x4f/0x60
> May 19 14:30:44 ccp kernel: [  178.885688]  yield+0x33/0x40
> May 19 14:30:44 ccp kernel: [  178.885691]  netlink_broadcast_filtered+0x29b/0x3c0
> May 19 14:30:44 ccp kernel: [  178.885692]  netlink_broadcast+0x1d/0x20
> May 19 14:30:44 ccp kernel: [  178.885697]  nl_sendmsg+0xb8/0x664 [ccp]
> May 19 14:30:44 ccp kernel: [  178.885699]  nl_send_ack_notif+0x7d/0x90 [ccp]
> May 19 14:30:44 ccp kernel: [  178.885702]  tcp_ccp_cong_avoid+0x69/0x70 [ccp]
> May 19 14:30:44 ccp kernel: [  178.885704]  tcp_ack+0x980/0xa60
> May 19 14:30:44 ccp kernel: [  178.885708]  tcp_rcv_state_process+0x2be/0xda0
> May 19 14:30:44 ccp kernel: [  178.885712]  ? security_sock_rcv_skb+0x3b/0x50
> May 19 14:30:44 ccp kernel: [  178.885715]  ? sk_filter_trim_cap+0x3b/0x270

No idea what ccp is, it is not in upstream kernel, and it looks buggy.

Please do not send patches that are not needed in upstream kernel.

* Re: [PATCH v2 1/3] bpf: Use 1<<16 as ceiling for immediate alignment in verifier.
From: David Miller @ 2017-05-19 20:41 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev
In-Reply-To: <a0a47297-4765-3ba0-3ea2-013dda40e84a@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 19 May 2017 18:17:42 +0100

> One question: is there a way to build the verifier as userland code
>  (or at least as a module), or will I have to reboot every time I
>  want to test a change?

There currently is no such mechanism; you will have to reboot every
time.

I have considered working on making the code buildable outside of the
kernel.  It shouldn't be too hard.

* Re: Alignment in BPF verifier
From: David Miller @ 2017-05-19 20:48 UTC (permalink / raw)
  To: ecree; +Cc: ast, daniel, alexei.starovoitov, netdev
In-Reply-To: <bed5b512-6069-53cc-f128-05be05f89889@solarflare.com>

From: Edward Cree <ecree@solarflare.com>
Date: Fri, 19 May 2017 21:00:13 +0100

> Here's what I'm thinking of doing:
> struct bpf_reg_state {
>     enum bpf_reg_type type;
>     union {
>         /* valid when type == PTR_TO_PACKET */
>         u16 range;
> 
>         /* valid when type == CONST_PTR_TO_MAP | PTR_TO_MAP_VALUE |
>          *   PTR_TO_MAP_VALUE_OR_NULL
>          */
>         struct bpf_map *map_ptr;
>     };
>     /* Used to find other pointers with the same variable base, so they
>      * can share range and align knowledge.
>      */
>     u32 id;
>     u32 off; /* fixed part of pointer offset */
>     /* For scalar types (CONST_IMM | UNKNOWN_VALUE), this represents our
>      * knowledge of the actual value.
>      * For pointer types, this represents the variable part of the offset
>      * from the pointed-to object, and is shared with all bpf_reg_states
>      * with the same id as us.
>      */
>     struct tnum align;
>     /* Used to determine if any memory access using this register will
>      * result in a bad access. These two fields must be last.
>      * See states_equal()
>      * These refer to the same value as align, not necessarily the actual
>      * contents of the register.
>      */
>     s64 min_value;
>     u64 max_value;
> };

Be very careful with the layout of bpf_reg_state.

There are layout dependencies in the state pruning.  Please take a look
at states_equal().  It is walking the set of registers at two snapshot
locations and trying to see if they are "equivalent".

What's happening here is that the verifier makes a stack of all branch
points in the program.  On the first pass it analyzes the register
values, following one of the two paths at each branch.  Then, when it
reaches the end of the program on that path (BPF_EXIT), it starts
popping the entries off the stack.

The naive implementation would pop each stack entry, and then traverse
the other arm of the branch.  But for programs with lots of branches
this gets very expensive.

So at each stack pop, the verifier tries to determine if it can skip
traversing the branch's other path.  And it does this by analyzing
register state.
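A toy model of that scheme, offered only as a sketch: the program is a hypothetical array of NOP/BRANCH/EXIT instructions and the "register state" is a single integer, whereas the real verifier tracks full register and stack state and has its own rules for where it checks for pruning. Pending branch arms go on an explicit stack, the fall-through arm is followed first, and when pruning is enabled a state arriving at a branch target is dropped if an equal state was already recorded there.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSNS 32
#define MAX_STACK 64
#define MAX_SEEN  8

/* Hypothetical toy instruction set */
enum op { OP_NOP, OP_BRANCH, OP_EXIT };

struct insn {
	enum op op;
	int target; /* jump target, only used by OP_BRANCH */
};

/* Toy verifier state: program counter plus one "register" */
struct state {
	int pc;
	int r0;
};

/* Walks every path of prog depth-first and returns the number of
 * instructions processed. With prune set, a state reaching a branch
 * target is dropped if an equal state was already recorded there.
 */
static int verify(const struct insn *prog, int len, bool prune)
{
	struct state stack[MAX_STACK]; /* pending (not yet taken) arms */
	struct state seen[MAX_INSNS][MAX_SEEN];
	int nseen[MAX_INSNS] = { 0 };
	bool is_target[MAX_INSNS] = { false };
	struct state cur = { 0, 0 };
	int sp = 0, steps = 0, i;

	assert(len <= MAX_INSNS);
	for (i = 0; i < len; i++)
		if (prog[i].op == OP_BRANCH)
			is_target[prog[i].target] = true;

	for (;;) {
		if (prune && is_target[cur.pc]) {
			bool hit = false;

			for (i = 0; i < nseen[cur.pc]; i++)
				if (seen[cur.pc][i].r0 == cur.r0)
					hit = true;
			if (hit)
				goto pop; /* equivalent state already verified */
			if (nseen[cur.pc] < MAX_SEEN)
				seen[cur.pc][nseen[cur.pc]++] = cur;
		}

		steps++;
		switch (prog[cur.pc].op) {
		case OP_NOP:
			cur.pc++;
			continue;
		case OP_BRANCH:
			assert(sp < MAX_STACK);
			/* remember the taken arm, follow the fall-through */
			stack[sp].pc = prog[cur.pc].target;
			stack[sp].r0 = cur.r0;
			sp++;
			cur.pc++;
			continue;
		case OP_EXIT:
			break;
		}
pop:
		if (sp == 0)
			return steps;
		cur = stack[--sp];
	}
}
```

For a program made of three back-to-back branches that each skip a NOP and then exit, verify() processes 22 instructions without pruning and 7 with it; without pruning the work roughly doubles with every additional branch.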

The tests are basically:

		if (memcmp(rold, rcur, sizeof(*rold)) == 0)
			continue;

If they are exactly equivalent, then we're fine.

		/* If the ranges were not the same, but everything else was and
		 * we didn't do a variable access into a map then we are a-ok.
		 */
		if (!varlen_map_access &&
		    memcmp(rold, rcur, offsetofend(struct bpf_reg_state, id)) == 0)
			continue;

If we didn't do any variable MAP accesses, and everything in the register
"up to and including member ID" is the same, we're fine.

And then we drop down into some packet pointer specific tests to try
and optimize things further.

So you have to be careful what you place before and/or after 'id'.

Probably we need to put the alignment stuff before 'id' so that it
is considered by the offsetofend() length memcmp().
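The effect of field placement on that prefix comparison can be shown with a small sketch. offsetofend() is defined here the same way as the kernel's helper, and struct reg_state is a hypothetical, simplified register state (all members are uint32_t, so there is no padding for memcmp() to trip over): a field declared before id takes part in the comparison, a field declared after it does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same definition as the kernel's offsetofend() helper */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical, simplified register state */
struct reg_state {
	uint32_t type;
	uint32_t align; /* declared before id: part of the prefix compare */
	uint32_t id;
	uint32_t range; /* declared after id: ignored by the prefix compare */
};

/* Mirrors the second test in states_equal(): compare everything up to
 * and including 'id', ignore whatever follows it.
 */
static bool regs_equal_up_to_id(const struct reg_state *rold,
				const struct reg_state *rcur)
{
	return memcmp(rold, rcur, offsetofend(struct reg_state, id)) == 0;
}
```

Two states differing only in range compare equal here, while a difference in align makes them unequal, which is why alignment data placed after id would silently be ignored by that test.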

Hope that helps.

* Re: [PATCH net-next] tcp: warn on negative reordering values
From: David Miller @ 2017-05-19 20:56 UTC (permalink / raw)
  To: soheil.kdev; +Cc: netdev, soheil, ncardwell, ycheng, edumazet
In-Reply-To: <20170516213903.78909-1-soheil.kdev@gmail.com>

From: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
Date: Tue, 16 May 2017 17:39:02 -0400

> From: Soheil Hassas Yeganeh <soheil@google.com>
> 
> Commit bafbb9c73241 ("tcp: eliminate negative reordering
> in tcp_clean_rtx_queue") fixes an issue for negative
> reordering metrics.
> 
> To be resilient to such errors, warn and return
> when a negative metric is passed to tcp_update_reordering().
> 
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.

* Re: [PATCH net] fix BUG: scheduling while atomic in netlink broadcast
From: Cong Wang @ 2017-05-19 20:58 UTC (permalink / raw)
  To: Akshay Narayan; +Cc: Linux Kernel Network Developers, David Miller
In-Reply-To: <CAN-7y0rZV-coOpOAzVGZoZB=Mzo4LP7Jpxx3wk+PX_jNMr_=iA@mail.gmail.com>

On Fri, May 19, 2017 at 11:47 AM, Akshay Narayan <akshayn@mit.edu> wrote:
>> I don't want to defend the use of yield() but it looks like there is other
>> problem.
>
> I believe this use of yield() should be replaced with cond_resched()
> even if it turns out there is an unrelated problem.

Yeah, it is a different problem: cond_resched() itself could sleep
too, so it doesn't fix the schedule-in-atomic problem, not to mention
that the kmalloc() would sleep.

>
>> Does this module call netlink_broadcast() with __GFP_DIRECT_RECLAIM
>> in IRQ context? If so you should adjust the gfp flags.
>
> The module only calls netlink_broadcast() from a pluggable TCP
> function; from my understanding this is not in the IRQ context. Full
> trace, perhaps more clear, attached below.

It is process context but with a spinlock (bh_lock_sock) held, so
you still can't sleep. IOW, you have to pass a proper gfp flag to
reflect this.

* [PATCH net-next 01/20] net: dsa: change scope of STP state setter
From: Vivien Didelot @ 2017-05-19 21:00 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170519210055.9366-1-vivien.didelot@savoirfairelinux.com>

Instead of having multiple STP state helpers scoping a slave device
supporting both the DSA logic and the switchdev binding, provide a
single dsa_port_set_state helper scoping a DSA port, as well as its
dsa_port_set_state_now wrapper which skips the prepare phase.

This allows us to better separate the DSA logic from the slave device
handling.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 91236d602301..403d1dfe7f50 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -85,13 +85,15 @@ static inline bool dsa_port_is_bridged(struct dsa_port *dp)
 	return !!dp->bridge_dev;
 }
 
-static void dsa_slave_set_state(struct net_device *dev, u8 state)
+static int dsa_port_set_state(struct dsa_port *dp, u8 state,
+			      struct switchdev_trans *trans)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_port *dp = p->dp;
 	struct dsa_switch *ds = dp->ds;
 	int port = dp->index;
 
+	if (switchdev_trans_ph_prepare(trans))
+		return ds->ops->port_stp_state_set ? 0 : -EOPNOTSUPP;
+
 	if (ds->ops->port_stp_state_set)
 		ds->ops->port_stp_state_set(ds, port, state);
 
@@ -110,6 +112,17 @@ static void dsa_slave_set_state(struct net_device *dev, u8 state)
 	}
 
 	dp->stp_state = state;
+
+	return 0;
+}
+
+static void dsa_port_set_state_now(struct dsa_port *dp, u8 state)
+{
+	int err;
+
+	err = dsa_port_set_state(dp, state, NULL);
+	if (err)
+		pr_err("DSA: failed to set STP state %u (%d)\n", state, err);
 }
 
 static int dsa_slave_open(struct net_device *dev)
@@ -147,7 +160,7 @@ static int dsa_slave_open(struct net_device *dev)
 			goto clear_promisc;
 	}
 
-	dsa_slave_set_state(dev, stp_state);
+	dsa_port_set_state_now(p->dp, stp_state);
 
 	if (p->phy)
 		phy_start(p->phy);
@@ -189,7 +202,7 @@ static int dsa_slave_close(struct net_device *dev)
 	if (ds->ops->port_disable)
 		ds->ops->port_disable(ds, p->dp->index, p->phy);
 
-	dsa_slave_set_state(dev, BR_STATE_DISABLED);
+	dsa_port_set_state_now(p->dp, BR_STATE_DISABLED);
 
 	return 0;
 }
@@ -386,21 +399,6 @@ static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	return -EOPNOTSUPP;
 }
 
-static int dsa_slave_stp_state_set(struct net_device *dev,
-				   const struct switchdev_attr *attr,
-				   struct switchdev_trans *trans)
-{
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
-
-	if (switchdev_trans_ph_prepare(trans))
-		return ds->ops->port_stp_state_set ? 0 : -EOPNOTSUPP;
-
-	dsa_slave_set_state(dev, attr->u.stp_state);
-
-	return 0;
-}
-
 static int dsa_slave_vlan_filtering(struct net_device *dev,
 				    const struct switchdev_attr *attr,
 				    struct switchdev_trans *trans)
@@ -465,11 +463,13 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 				   const struct switchdev_attr *attr,
 				   struct switchdev_trans *trans)
 {
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_port *dp = p->dp;
 	int ret;
 
 	switch (attr->id) {
 	case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
-		ret = dsa_slave_stp_state_set(dev, attr, trans);
+		ret = dsa_port_set_state(dp, attr->u.stp_state, trans);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
 		ret = dsa_slave_vlan_filtering(dev, attr, trans);
@@ -621,7 +621,7 @@ static void dsa_slave_bridge_port_leave(struct net_device *dev,
 	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
 	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
 	 */
-	dsa_slave_set_state(dev, BR_STATE_FORWARDING);
+	dsa_port_set_state_now(p->dp, BR_STATE_FORWARDING);
 }
 
 static int dsa_slave_port_attr_get(struct net_device *dev,
-- 
2.13.0

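The two-phase contract behind dsa_port_set_state() can be exercised outside the kernel. What follows is a minimal user-space sketch, not kernel code. Every mock_* type and function is a hypothetical stand-in, and the lines elided between the two hunks above are left out. As the patch relies on, the sketch assumes that a NULL transaction is never in the prepare phase. That assumption is what lets dsa_port_set_state_now() pass NULL to commit immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for switchdev_trans, dsa_switch and dsa_port. */
struct mock_trans {
	bool ph_prepare;		/* true during the prepare phase */
};

struct mock_switch;

struct mock_ops {
	/* Optional driver hook; NULL means the switch cannot do it. */
	void (*port_stp_state_set)(struct mock_switch *ds, int port,
				   unsigned char state);
};

struct mock_switch {
	const struct mock_ops *ops;
	unsigned char hw_state[4];	/* state last written to "hardware" */
};

struct mock_port {
	struct mock_switch *ds;
	int index;
	unsigned char stp_state;	/* software copy of the STP state */
};

/* Modeled on switchdev_trans_ph_prepare(): NULL is never "prepare". */
static bool mock_trans_ph_prepare(const struct mock_trans *trans)
{
	return trans && trans->ph_prepare;
}

static int mock_port_set_state(struct mock_port *dp, unsigned char state,
			       const struct mock_trans *trans)
{
	struct mock_switch *ds;

	assert(dp != NULL && dp->ds != NULL);
	ds = dp->ds;

	/* Prepare: only report whether the commit can succeed. */
	if (mock_trans_ph_prepare(trans))
		return ds->ops->port_stp_state_set ? 0 : -EOPNOTSUPP;

	/* Commit, or no transaction at all: apply the change. */
	if (ds->ops->port_stp_state_set)
		ds->ops->port_stp_state_set(ds, dp->index, state);

	dp->stp_state = state;
	return 0;
}

/* Like dsa_port_set_state_now(): no transaction, errors are only logged. */
static void mock_port_set_state_now(struct mock_port *dp, unsigned char state)
{
	int err = mock_port_set_state(dp, state, NULL);

	if (err)
		fprintf(stderr, "failed to set STP state %u (%d)\n",
			(unsigned int)state, err);
}

/* A fake driver hook that records what it was asked to program. */
static void mock_record_state(struct mock_switch *ds, int port,
			      unsigned char state)
{
	ds->hw_state[port] = state;
}
```

Suppose a switch's ops lack port_stp_state_set. The prepare phase then returns -EOPNOTSUPP, but a direct call with a NULL transaction still updates stp_state. This matches the unchanged commit path in the patch.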

* [PATCH net-next 03/20] net: dsa: change scope of bridging code
From: Vivien Didelot @ 2017-05-19 21:00 UTC
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170519210055.9366-1-vivien.didelot@savoirfairelinux.com>

Now that the bridge join and leave functions only deal with a DSA port,
change their scope from the DSA slave net_device to the generic DSA
port (struct dsa_port).

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 371f6d267917..1ad62ef8c261 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -572,13 +572,11 @@ static int dsa_slave_port_obj_dump(struct net_device *dev,
 	return err;
 }
 
-static int dsa_slave_bridge_port_join(struct net_device *dev,
-				      struct net_device *br)
+static int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_notifier_bridge_info info = {
-		.sw_index = p->dp->ds->index,
-		.port = p->dp->index,
+		.sw_index = dp->ds->index,
+		.port = dp->index,
 		.br = br,
 	};
 	int err;
@@ -586,24 +584,22 @@ static int dsa_slave_bridge_port_join(struct net_device *dev,
 	/* Here the port is already bridged. Reflect the current configuration
 	 * so that drivers can program their chips accordingly.
 	 */
-	p->dp->bridge_dev = br;
+	dp->bridge_dev = br;
 
-	err = dsa_port_notify(p->dp, DSA_NOTIFIER_BRIDGE_JOIN, &info);
+	err = dsa_port_notify(dp, DSA_NOTIFIER_BRIDGE_JOIN, &info);
 
 	/* The bridging is rolled back on error */
 	if (err)
-		p->dp->bridge_dev = NULL;
+		dp->bridge_dev = NULL;
 
 	return err;
 }
 
-static void dsa_slave_bridge_port_leave(struct net_device *dev,
-					struct net_device *br)
+static void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
 	struct dsa_notifier_bridge_info info = {
-		.sw_index = p->dp->ds->index,
-		.port = p->dp->index,
+		.sw_index = dp->ds->index,
+		.port = dp->index,
 		.br = br,
 	};
 	int err;
@@ -611,16 +607,16 @@ static void dsa_slave_bridge_port_leave(struct net_device *dev,
 	/* Here the port is already unbridged. Reflect the current configuration
 	 * so that drivers can program their chips accordingly.
 	 */
-	p->dp->bridge_dev = NULL;
+	dp->bridge_dev = NULL;
 
-	err = dsa_port_notify(p->dp, DSA_NOTIFIER_BRIDGE_LEAVE, &info);
+	err = dsa_port_notify(dp, DSA_NOTIFIER_BRIDGE_LEAVE, &info);
 	if (err)
-		netdev_err(dev, "failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
+		pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
 
 	/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
 	 * so allow it to be in BR_STATE_FORWARDING to be kept functional
 	 */
-	dsa_port_set_state_now(p->dp, BR_STATE_FORWARDING);
+	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
 }
 
 static int dsa_slave_port_attr_get(struct net_device *dev,
@@ -1526,14 +1522,16 @@ static bool dsa_slave_dev_check(struct net_device *dev)
 static int dsa_slave_changeupper(struct net_device *dev,
 				 struct netdev_notifier_changeupper_info *info)
 {
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_port *dp = p->dp;
 	int err = NOTIFY_DONE;
 
 	if (netif_is_bridge_master(info->upper_dev)) {
 		if (info->linking) {
-			err = dsa_slave_bridge_port_join(dev, info->upper_dev);
+			err = dsa_port_bridge_join(dp, info->upper_dev);
 			err = notifier_from_errno(err);
 		} else {
-			dsa_slave_bridge_port_leave(dev, info->upper_dev);
+			dsa_port_bridge_leave(dp, info->upper_dev);
 			err = NOTIFY_OK;
 		}
 	}
-- 
2.13.0

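The join path sets dp->bridge_dev before notifying the switches, so drivers can read the new configuration. If the notifier fails, it clears the field again. What follows is a minimal user-space sketch of that ordering. The mock_* types and the per-port notify hook are hypothetical simplifications. The kernel instead goes through dsa_port_notify() and a notifier chain, and takes the switch index from dp->ds->index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; a bridge is reduced to its name. */
struct mock_bridge {
	const char *name;
};

enum mock_event { MOCK_BRIDGE_JOIN, MOCK_BRIDGE_LEAVE };

struct mock_bridge_info {
	int sw_index;
	int port;
	struct mock_bridge *br;
};

struct mock_port {
	int sw_index;			/* dp->ds->index in the kernel */
	int index;
	struct mock_bridge *bridge_dev;
	/* Stand-in for dsa_port_notify(): 0 or a negative errno. */
	int (*notify)(struct mock_port *dp, enum mock_event ev,
		      const struct mock_bridge_info *info);
};

static int mock_port_bridge_join(struct mock_port *dp, struct mock_bridge *br)
{
	struct mock_bridge_info info = {
		.sw_index = dp->sw_index,
		.port = dp->index,
		.br = br,
	};
	int err;

	assert(dp->notify != NULL);

	/* Reflect the new configuration before drivers are notified. */
	dp->bridge_dev = br;

	err = dp->notify(dp, MOCK_BRIDGE_JOIN, &info);

	/* Roll back so that a failed join leaves the port unbridged. */
	if (err)
		dp->bridge_dev = NULL;

	return err;
}

static void mock_port_bridge_leave(struct mock_port *dp, struct mock_bridge *br)
{
	struct mock_bridge_info info = {
		.sw_index = dp->sw_index,
		.port = dp->index,
		.br = br,
	};

	dp->bridge_dev = NULL;

	/* The port has already left; a notifier error is only logged. */
	if (dp->notify(dp, MOCK_BRIDGE_LEAVE, &info))
		fprintf(stderr, "failed to notify BRIDGE_LEAVE\n");
}

/* Fake notifier that verifies the port state was updated first. */
static int mock_notify_check(struct mock_port *dp, enum mock_event ev,
			     const struct mock_bridge_info *info)
{
	if (ev == MOCK_BRIDGE_JOIN)
		return dp->bridge_dev == info->br ? 0 : -EINVAL;
	return dp->bridge_dev == NULL ? 0 : -EINVAL;
}

/* Fake notifier that always fails. */
static int mock_notify_fail(struct mock_port *dp, enum mock_event ev,
			    const struct mock_bridge_info *info)
{
	(void)dp;
	(void)ev;
	(void)info;
	return -EOPNOTSUPP;
}
```

mock_notify_check only succeeds if bridge_dev already holds the value being announced. It therefore fails if the assignment and the notification are swapped.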

* [PATCH net-next 04/20] net: dsa: change scope of FDB handlers
From: Vivien Didelot @ 2017-05-19 21:00 UTC
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170519210055.9366-1-vivien.didelot@savoirfairelinux.com>

Change the scope of the switchdev FDB object handlers from the DSA slave
device to the generic DSA port, so that the future port-wide API can
also be used for other port types, such as CPU and DSA links.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 50 ++++++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 1ad62ef8c261..e9c3ea09cc09 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -299,47 +299,44 @@ static int dsa_slave_port_vlan_dump(struct net_device *dev,
 	return -EOPNOTSUPP;
 }
 
-static int dsa_slave_port_fdb_add(struct net_device *dev,
-				  const struct switchdev_obj_port_fdb *fdb,
-				  struct switchdev_trans *trans)
+static int dsa_port_fdb_add(struct dsa_port *dp,
+			    const struct switchdev_obj_port_fdb *fdb,
+			    struct switchdev_trans *trans)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	if (switchdev_trans_ph_prepare(trans)) {
 		if (!ds->ops->port_fdb_prepare || !ds->ops->port_fdb_add)
 			return -EOPNOTSUPP;
 
-		return ds->ops->port_fdb_prepare(ds, p->dp->index, fdb, trans);
+		return ds->ops->port_fdb_prepare(ds, dp->index, fdb, trans);
 	}
 
-	ds->ops->port_fdb_add(ds, p->dp->index, fdb, trans);
+	ds->ops->port_fdb_add(ds, dp->index, fdb, trans);
 
 	return 0;
 }
 
-static int dsa_slave_port_fdb_del(struct net_device *dev,
-				  const struct switchdev_obj_port_fdb *fdb)
+static int dsa_port_fdb_del(struct dsa_port *dp,
+			    const struct switchdev_obj_port_fdb *fdb)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 	int ret = -EOPNOTSUPP;
 
 	if (ds->ops->port_fdb_del)
-		ret = ds->ops->port_fdb_del(ds, p->dp->index, fdb);
+		ret = ds->ops->port_fdb_del(ds, dp->index, fdb);
 
 	return ret;
 }
 
-static int dsa_slave_port_fdb_dump(struct net_device *dev,
-				   struct switchdev_obj_port_fdb *fdb,
-				   switchdev_obj_dump_cb_t *cb)
+static int dsa_port_fdb_dump(struct dsa_port *dp,
+			     struct switchdev_obj_port_fdb *fdb,
+			     switchdev_obj_dump_cb_t *cb)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	if (ds->ops->port_fdb_dump)
-		return ds->ops->port_fdb_dump(ds, p->dp->index, fdb, cb);
+		return ds->ops->port_fdb_dump(ds, dp->index, fdb, cb);
 
 	return -EOPNOTSUPP;
 }
@@ -488,6 +485,8 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 				  const struct switchdev_obj *obj,
 				  struct switchdev_trans *trans)
 {
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_port *dp = p->dp;
 	int err;
 
 	/* For the prepare phase, ensure the full set of changes is feasable in
@@ -497,9 +496,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 
 	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
-		err = dsa_slave_port_fdb_add(dev,
-					     SWITCHDEV_OBJ_PORT_FDB(obj),
-					     trans);
+		err = dsa_port_fdb_add(dp, SWITCHDEV_OBJ_PORT_FDB(obj), trans);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
 		err = dsa_slave_port_mdb_add(dev, SWITCHDEV_OBJ_PORT_MDB(obj),
@@ -521,12 +518,13 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 static int dsa_slave_port_obj_del(struct net_device *dev,
 				  const struct switchdev_obj *obj)
 {
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_port *dp = p->dp;
 	int err;
 
 	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
-		err = dsa_slave_port_fdb_del(dev,
-					     SWITCHDEV_OBJ_PORT_FDB(obj));
+		err = dsa_port_fdb_del(dp, SWITCHDEV_OBJ_PORT_FDB(obj));
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
 		err = dsa_slave_port_mdb_del(dev, SWITCHDEV_OBJ_PORT_MDB(obj));
@@ -547,13 +545,13 @@ static int dsa_slave_port_obj_dump(struct net_device *dev,
 				   struct switchdev_obj *obj,
 				   switchdev_obj_dump_cb_t *cb)
 {
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_port *dp = p->dp;
 	int err;
 
 	switch (obj->id) {
 	case SWITCHDEV_OBJ_ID_PORT_FDB:
-		err = dsa_slave_port_fdb_dump(dev,
-					      SWITCHDEV_OBJ_PORT_FDB(obj),
-					      cb);
+		err = dsa_port_fdb_dump(dp, SWITCHDEV_OBJ_PORT_FDB(obj), cb);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
 		err = dsa_slave_port_mdb_dump(dev, SWITCHDEV_OBJ_PORT_MDB(obj),
-- 
2.13.0

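dsa_port_fdb_add() checks for both driver hooks in the prepare phase. It has to, because the commit phase calls port_fdb_add() unconditionally and ignores its result. The sketch below models that contract in user space. The mock_* types, the fake driver and its entry counter are hypothetical, and the FDB object is reduced to a MAC address and a VID. The sketch also assumes that a commit only ever follows a successful prepare.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the switchdev/DSA types. */
struct mock_trans {
	bool ph_prepare;
};

struct mock_fdb {
	unsigned char addr[6];
	unsigned short vid;
};

struct mock_switch;

struct mock_ops {
	int (*port_fdb_prepare)(struct mock_switch *ds, int port,
				const struct mock_fdb *fdb,
				const struct mock_trans *trans);
	void (*port_fdb_add)(struct mock_switch *ds, int port,
			     const struct mock_fdb *fdb,
			     const struct mock_trans *trans);
	int (*port_fdb_del)(struct mock_switch *ds, int port,
			    const struct mock_fdb *fdb);
};

struct mock_switch {
	const struct mock_ops *ops;
	int entries;			/* fake hardware FDB occupancy */
};

struct mock_port {
	struct mock_switch *ds;
	int index;
};

static bool mock_trans_ph_prepare(const struct mock_trans *trans)
{
	return trans && trans->ph_prepare;
}

static int mock_port_fdb_add(struct mock_port *dp, const struct mock_fdb *fdb,
			     const struct mock_trans *trans)
{
	struct mock_switch *ds = dp->ds;

	if (mock_trans_ph_prepare(trans)) {
		/* Refuse now if the commit phase would have nothing to call. */
		if (!ds->ops->port_fdb_prepare || !ds->ops->port_fdb_add)
			return -EOPNOTSUPP;

		return ds->ops->port_fdb_prepare(ds, dp->index, fdb, trans);
	}

	/* Commit: assumed to follow a successful prepare phase. */
	assert(ds->ops->port_fdb_add != NULL);
	ds->ops->port_fdb_add(ds, dp->index, fdb, trans);

	return 0;
}

static int mock_port_fdb_del(struct mock_port *dp, const struct mock_fdb *fdb)
{
	struct mock_switch *ds = dp->ds;
	int ret = -EOPNOTSUPP;

	if (ds->ops->port_fdb_del)
		ret = ds->ops->port_fdb_del(ds, dp->index, fdb);

	return ret;
}

/* Fake driver hooks that only count entries. */
static int mock_fdb_prepare(struct mock_switch *ds, int port,
			    const struct mock_fdb *fdb,
			    const struct mock_trans *trans)
{
	(void)ds; (void)port; (void)fdb; (void)trans;
	return 0;
}

static void mock_fdb_add(struct mock_switch *ds, int port,
			 const struct mock_fdb *fdb,
			 const struct mock_trans *trans)
{
	(void)port; (void)fdb; (void)trans;
	ds->entries++;
}

static int mock_fdb_del(struct mock_switch *ds, int port,
			const struct mock_fdb *fdb)
{
	(void)port; (void)fdb;
	if (ds->entries == 0)
		return -ENOENT;
	ds->entries--;
	return 0;
}
```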

* [PATCH net-next 05/20] net: dsa: change scope of MDB handlers
From: Vivien Didelot @ 2017-05-19 21:00 UTC
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170519210055.9366-1-vivien.didelot@savoirfairelinux.com>

Change the scope of the switchdev MDB object handlers from the DSA slave
device to the generic DSA port, so that the future port-wide API can
also be used for other port types, such as CPU and DSA links.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index e9c3ea09cc09..0921d306aedf 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -341,46 +341,43 @@ static int dsa_port_fdb_dump(struct dsa_port *dp,
 	return -EOPNOTSUPP;
 }
 
-static int dsa_slave_port_mdb_add(struct net_device *dev,
-				  const struct switchdev_obj_port_mdb *mdb,
-				  struct switchdev_trans *trans)
+static int dsa_port_mdb_add(struct dsa_port *dp,
+			    const struct switchdev_obj_port_mdb *mdb,
+			    struct switchdev_trans *trans)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	if (switchdev_trans_ph_prepare(trans)) {
 		if (!ds->ops->port_mdb_prepare || !ds->ops->port_mdb_add)
 			return -EOPNOTSUPP;
 
-		return ds->ops->port_mdb_prepare(ds, p->dp->index, mdb, trans);
+		return ds->ops->port_mdb_prepare(ds, dp->index, mdb, trans);
 	}
 
-	ds->ops->port_mdb_add(ds, p->dp->index, mdb, trans);
+	ds->ops->port_mdb_add(ds, dp->index, mdb, trans);
 
 	return 0;
 }
 
-static int dsa_slave_port_mdb_del(struct net_device *dev,
-				  const struct switchdev_obj_port_mdb *mdb)
+static int dsa_port_mdb_del(struct dsa_port *dp,
+			    const struct switchdev_obj_port_mdb *mdb)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	if (ds->ops->port_mdb_del)
-		return ds->ops->port_mdb_del(ds, p->dp->index, mdb);
+		return ds->ops->port_mdb_del(ds, dp->index, mdb);
 
 	return -EOPNOTSUPP;
 }
 
-static int dsa_slave_port_mdb_dump(struct net_device *dev,
-				   struct switchdev_obj_port_mdb *mdb,
-				   switchdev_obj_dump_cb_t *cb)
+static int dsa_port_mdb_dump(struct dsa_port *dp,
+			     struct switchdev_obj_port_mdb *mdb,
+			     switchdev_obj_dump_cb_t *cb)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	if (ds->ops->port_mdb_dump)
-		return ds->ops->port_mdb_dump(ds, p->dp->index, mdb, cb);
+		return ds->ops->port_mdb_dump(ds, dp->index, mdb, cb);
 
 	return -EOPNOTSUPP;
 }
@@ -499,8 +496,7 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 		err = dsa_port_fdb_add(dp, SWITCHDEV_OBJ_PORT_FDB(obj), trans);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
-		err = dsa_slave_port_mdb_add(dev, SWITCHDEV_OBJ_PORT_MDB(obj),
-					     trans);
+		err = dsa_port_mdb_add(dp, SWITCHDEV_OBJ_PORT_MDB(obj), trans);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
 		err = dsa_slave_port_vlan_add(dev,
@@ -527,7 +523,7 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
 		err = dsa_port_fdb_del(dp, SWITCHDEV_OBJ_PORT_FDB(obj));
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
-		err = dsa_slave_port_mdb_del(dev, SWITCHDEV_OBJ_PORT_MDB(obj));
+		err = dsa_port_mdb_del(dp, SWITCHDEV_OBJ_PORT_MDB(obj));
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
 		err = dsa_slave_port_vlan_del(dev,
@@ -554,8 +550,7 @@ static int dsa_slave_port_obj_dump(struct net_device *dev,
 		err = dsa_port_fdb_dump(dp, SWITCHDEV_OBJ_PORT_FDB(obj), cb);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_MDB:
-		err = dsa_slave_port_mdb_dump(dev, SWITCHDEV_OBJ_PORT_MDB(obj),
-					      cb);
+		err = dsa_port_mdb_dump(dp, SWITCHDEV_OBJ_PORT_MDB(obj), cb);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
 		err = dsa_slave_port_vlan_dump(dev,
-- 
2.13.0

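After this patch, the switchdev entry points still receive a net_device. They resolve the dsa_port once through netdev_priv() and pass only that port to the port-scoped helpers. A port without a slave net_device, such as a CPU or DSA link port, could then call the same helpers directly, which is the goal stated in the commit message. What follows is a minimal user-space sketch of that split. Every mock_* name is hypothetical, mock_netdev_priv() stands in for netdev_priv(), and the fake MDB handler simply counts entries.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for switchdev_obj, dsa_port, dsa_slave_priv
 * and net_device. */
enum mock_obj_id { MOCK_OBJ_PORT_MDB, MOCK_OBJ_OTHER };

struct mock_obj {
	enum mock_obj_id id;
};

struct mock_port {
	int index;
	int mdb_entries;		/* fake hardware MDB occupancy */
};

struct mock_slave_priv {
	struct mock_port *dp;
};

struct mock_netdev {
	struct mock_slave_priv priv;	/* what netdev_priv() would return */
};

static struct mock_slave_priv *mock_netdev_priv(struct mock_netdev *dev)
{
	return &dev->priv;
}

/* Port-scoped helper: needs no net_device at all. */
static int mock_port_mdb_del(struct mock_port *dp, const struct mock_obj *obj)
{
	assert(obj->id == MOCK_OBJ_PORT_MDB);

	if (dp->mdb_entries == 0)
		return -ENOENT;

	dp->mdb_entries--;
	return 0;
}

/* Slave entry point: resolve the port once, then dispatch on the object. */
static int mock_slave_port_obj_del(struct mock_netdev *dev,
				   const struct mock_obj *obj)
{
	struct mock_slave_priv *p = mock_netdev_priv(dev);
	struct mock_port *dp = p->dp;
	int err;

	switch (obj->id) {
	case MOCK_OBJ_PORT_MDB:
		err = mock_port_mdb_del(dp, obj);
		break;
	default:
		err = -EOPNOTSUPP;
		break;
	}

	return err;
}
```

A user port goes through mock_slave_port_obj_del(). A hypothetical CPU port, which has no mock_netdev, can call mock_port_mdb_del() on its own mock_port.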

* [PATCH net-next 06/20] net: dsa: change scope of VLAN handlers
From: Vivien Didelot @ 2017-05-19 21:00 UTC
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170519210055.9366-1-vivien.didelot@savoirfairelinux.com>

Change the scope of the switchdev VLAN object handlers from the DSA
slave device to the generic DSA port, so that the future port-wide API
can also be used for other port types, such as CPU and DSA links.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 40 ++++++++++++++++------------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0921d306aedf..de39da69fd33 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -254,12 +254,10 @@ static int dsa_slave_set_mac_address(struct net_device *dev, void *a)
 	return 0;
 }
 
-static int dsa_slave_port_vlan_add(struct net_device *dev,
-				   const struct switchdev_obj_port_vlan *vlan,
-				   struct switchdev_trans *trans)
+static int dsa_port_vlan_add(struct dsa_port *dp,
+			     const struct switchdev_obj_port_vlan *vlan,
+			     struct switchdev_trans *trans)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_port *dp = p->dp;
 	struct dsa_switch *ds = dp->ds;
 
 	if (switchdev_trans_ph_prepare(trans)) {
@@ -274,27 +272,25 @@ static int dsa_slave_port_vlan_add(struct net_device *dev,
 	return 0;
 }
 
-static int dsa_slave_port_vlan_del(struct net_device *dev,
-				   const struct switchdev_obj_port_vlan *vlan)
+static int dsa_port_vlan_del(struct dsa_port *dp,
+			     const struct switchdev_obj_port_vlan *vlan)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	if (!ds->ops->port_vlan_del)
 		return -EOPNOTSUPP;
 
-	return ds->ops->port_vlan_del(ds, p->dp->index, vlan);
+	return ds->ops->port_vlan_del(ds, dp->index, vlan);
 }
 
-static int dsa_slave_port_vlan_dump(struct net_device *dev,
-				    struct switchdev_obj_port_vlan *vlan,
-				    switchdev_obj_dump_cb_t *cb)
+static int dsa_port_vlan_dump(struct dsa_port *dp,
+			      struct switchdev_obj_port_vlan *vlan,
+			      switchdev_obj_dump_cb_t *cb)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	if (ds->ops->port_vlan_dump)
-		return ds->ops->port_vlan_dump(ds, p->dp->index, vlan, cb);
+		return ds->ops->port_vlan_dump(ds, dp->index, vlan, cb);
 
 	return -EOPNOTSUPP;
 }
@@ -499,9 +495,8 @@ static int dsa_slave_port_obj_add(struct net_device *dev,
 		err = dsa_port_mdb_add(dp, SWITCHDEV_OBJ_PORT_MDB(obj), trans);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		err = dsa_slave_port_vlan_add(dev,
-					      SWITCHDEV_OBJ_PORT_VLAN(obj),
-					      trans);
+		err = dsa_port_vlan_add(dp, SWITCHDEV_OBJ_PORT_VLAN(obj),
+					trans);
 		break;
 	default:
 		err = -EOPNOTSUPP;
@@ -526,8 +521,7 @@ static int dsa_slave_port_obj_del(struct net_device *dev,
 		err = dsa_port_mdb_del(dp, SWITCHDEV_OBJ_PORT_MDB(obj));
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		err = dsa_slave_port_vlan_del(dev,
-					      SWITCHDEV_OBJ_PORT_VLAN(obj));
+		err = dsa_port_vlan_del(dp, SWITCHDEV_OBJ_PORT_VLAN(obj));
 		break;
 	default:
 		err = -EOPNOTSUPP;
@@ -553,9 +547,7 @@ static int dsa_slave_port_obj_dump(struct net_device *dev,
 		err = dsa_port_mdb_dump(dp, SWITCHDEV_OBJ_PORT_MDB(obj), cb);
 		break;
 	case SWITCHDEV_OBJ_ID_PORT_VLAN:
-		err = dsa_slave_port_vlan_dump(dev,
-					       SWITCHDEV_OBJ_PORT_VLAN(obj),
-					       cb);
+		err = dsa_port_vlan_dump(dp, SWITCHDEV_OBJ_PORT_VLAN(obj), cb);
 		break;
 	default:
 		err = -EOPNOTSUPP;
-- 
2.13.0

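dsa_port_vlan_dump() passes a caller-supplied callback to the driver, which is expected to invoke it once per VLAN entry. A user-space sketch of that callback contract follows. The mock_* types, the fake driver table and the collecting callback are hypothetical. For simplicity, the callback here takes an explicit void *ctx argument, whereas switchdev_obj_dump_cb_t takes only the switchdev object. Stopping at the first non-zero callback return is a choice made by this sketch's fake driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for switchdev_obj_port_vlan and the DSA types. */
struct mock_vlan {
	unsigned short vid_begin, vid_end;
};

typedef int mock_dump_cb_t(struct mock_vlan *vlan, void *ctx);

struct mock_switch;

struct mock_ops {
	int (*port_vlan_dump)(struct mock_switch *ds, int port,
			      struct mock_vlan *vlan, mock_dump_cb_t *cb,
			      void *ctx);
};

struct mock_switch {
	const struct mock_ops *ops;
	unsigned short vids[4][8];	/* fake per-port VLAN table */
	int nvids[4];
};

struct mock_port {
	struct mock_switch *ds;
	int index;
};

/* Fake driver: report each VID as a one-entry range. */
static int mock_driver_vlan_dump(struct mock_switch *ds, int port,
				 struct mock_vlan *vlan, mock_dump_cb_t *cb,
				 void *ctx)
{
	for (int i = 0; i < ds->nvids[port]; i++) {
		int err;

		vlan->vid_begin = vlan->vid_end = ds->vids[port][i];
		err = cb(vlan, ctx);
		if (err)
			return err;	/* stop at the first callback error */
	}

	return 0;
}

static int mock_port_vlan_dump(struct mock_port *dp, struct mock_vlan *vlan,
			       mock_dump_cb_t *cb, void *ctx)
{
	struct mock_switch *ds = dp->ds;

	if (ds->ops->port_vlan_dump)
		return ds->ops->port_vlan_dump(ds, dp->index, vlan, cb, ctx);

	return -EOPNOTSUPP;
}

/* Example callback: collect VIDs until a fixed capacity is reached. */
struct mock_collect {
	int count;
	int limit;
	unsigned short seen[8];
};

static int mock_collect_cb(struct mock_vlan *vlan, void *ctx)
{
	struct mock_collect *c = ctx;

	assert(c != NULL);
	if (c->count >= c->limit)
		return -EMSGSIZE;

	c->seen[c->count++] = vlan->vid_begin;
	return 0;
}
```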

* [PATCH net-next 07/20] net: dsa: change scope of VLAN filtering setter
From: Vivien Didelot @ 2017-05-19 21:00 UTC
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170519210055.9366-1-vivien.didelot@savoirfairelinux.com>

Change the scope of the switchdev VLAN filtering attribute setter from
the DSA slave device to the generic DSA port, so that the future
port-wide API can also be used for other port types, such as CPU and DSA
links.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index de39da69fd33..216eb38a847d 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -388,20 +388,18 @@ static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	return -EOPNOTSUPP;
 }
 
-static int dsa_slave_vlan_filtering(struct net_device *dev,
-				    const struct switchdev_attr *attr,
-				    struct switchdev_trans *trans)
+static int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
+				   struct switchdev_trans *trans)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_switch *ds = dp->ds;
 
 	/* bridge skips -EOPNOTSUPP, so skip the prepare phase */
 	if (switchdev_trans_ph_prepare(trans))
 		return 0;
 
 	if (ds->ops->port_vlan_filtering)
-		return ds->ops->port_vlan_filtering(ds, p->dp->index,
-						    attr->u.vlan_filtering);
+		return ds->ops->port_vlan_filtering(ds, dp->index,
+						    vlan_filtering);
 
 	return 0;
 }
@@ -461,7 +459,8 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 		ret = dsa_port_set_state(dp, attr->u.stp_state, trans);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
-		ret = dsa_slave_vlan_filtering(dev, attr, trans);
+		ret = dsa_port_vlan_filtering(dp, attr->u.vlan_filtering,
+					      trans);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
 		ret = dsa_slave_ageing_time(dev, attr, trans);
-- 
2.13.0

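dsa_port_vlan_filtering() now takes a plain bool rather than the whole switchdev_attr. The patch's comment notes that the bridge skips -EOPNOTSUPP, so the function succeeds unconditionally in the prepare phase. As a result, a driver without port_vlan_filtering is silently accepted in both phases, and a driver error only surfaces at commit time. The sketch below models this in user space. All mock_* names are hypothetical, as is the fake driver, which rejects filtering on port 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the switchdev/DSA types. */
struct mock_trans {
	bool ph_prepare;
};

struct mock_switch;

struct mock_ops {
	int (*port_vlan_filtering)(struct mock_switch *ds, int port,
				   bool vlan_filtering);
};

struct mock_switch {
	const struct mock_ops *ops;
	bool filtering[4];		/* per-port state in fake hardware */
};

struct mock_port {
	struct mock_switch *ds;
	int index;
};

static bool mock_trans_ph_prepare(const struct mock_trans *trans)
{
	return trans && trans->ph_prepare;
}

static int mock_port_vlan_filtering(struct mock_port *dp, bool vlan_filtering,
				    const struct mock_trans *trans)
{
	struct mock_switch *ds;

	assert(dp != NULL && dp->ds != NULL);
	ds = dp->ds;

	/* Nothing is checked up front: prepare always succeeds. */
	if (mock_trans_ph_prepare(trans))
		return 0;

	if (ds->ops->port_vlan_filtering)
		return ds->ops->port_vlan_filtering(ds, dp->index,
						    vlan_filtering);

	/* No driver hook: treated as success. */
	return 0;
}

/* Fake driver that cannot enable filtering on port 0. */
static int mock_driver_vlan_filtering(struct mock_switch *ds, int port,
				      bool vlan_filtering)
{
	if (port == 0 && vlan_filtering)
		return -EINVAL;

	ds->filtering[port] = vlan_filtering;
	return 0;
}
```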

* [PATCH net-next 08/20] net: dsa: change scope of ageing time setter
From: Vivien Didelot @ 2017-05-19 21:00 UTC
  To: netdev
  Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
	Andrew Lunn, Vivien Didelot
In-Reply-To: <20170519210055.9366-1-vivien.didelot@savoirfairelinux.com>

Change the scope of the switchdev bridge ageing time attribute setter
from the DSA slave device to the generic DSA port, so that the future
port-wide API can also be used for other port types, such as CPU and DSA
links.

Also, ds->ports is now a contiguous array of dsa_port structures, so the
address of an element can never be NULL. Remove the useless NULL check in
dsa_fastest_ageing_time().

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 net/dsa/slave.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 216eb38a847d..b0150f79dcdd 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -412,21 +412,19 @@ static unsigned int dsa_fastest_ageing_time(struct dsa_switch *ds,
 	for (i = 0; i < ds->num_ports; ++i) {
 		struct dsa_port *dp = &ds->ports[i];
 
-		if (dp && dp->ageing_time && dp->ageing_time < ageing_time)
+		if (dp->ageing_time && dp->ageing_time < ageing_time)
 			ageing_time = dp->ageing_time;
 	}
 
 	return ageing_time;
 }
 
-static int dsa_slave_ageing_time(struct net_device *dev,
-				 const struct switchdev_attr *attr,
-				 struct switchdev_trans *trans)
+static int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
+				struct switchdev_trans *trans)
 {
-	struct dsa_slave_priv *p = netdev_priv(dev);
-	struct dsa_switch *ds = p->dp->ds;
-	unsigned long ageing_jiffies = clock_t_to_jiffies(attr->u.ageing_time);
+	unsigned long ageing_jiffies = clock_t_to_jiffies(ageing_clock);
 	unsigned int ageing_time = jiffies_to_msecs(ageing_jiffies);
+	struct dsa_switch *ds = dp->ds;
 
 	if (switchdev_trans_ph_prepare(trans)) {
 		if (ds->ageing_time_min && ageing_time < ds->ageing_time_min)
@@ -437,7 +435,7 @@ static int dsa_slave_ageing_time(struct net_device *dev,
 	}
 
 	/* Keep the fastest ageing time in case of multiple bridges */
-	p->dp->ageing_time = ageing_time;
+	dp->ageing_time = ageing_time;
 	ageing_time = dsa_fastest_ageing_time(ds, ageing_time);
 
 	if (ds->ops->set_ageing_time)
@@ -463,7 +461,7 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
 					      trans);
 		break;
 	case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
-		ret = dsa_slave_ageing_time(dev, attr, trans);
+		ret = dsa_port_ageing_time(dp, attr->u.ageing_time, trans);
 		break;
 	default:
 		ret = -EOPNOTSUPP;
-- 
2.13.0

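dsa_port_ageing_time() converts the bridge's clock_t value to milliseconds and range-checks it during prepare. At commit it stores the value in the port, then programs the fastest ageing time of all ports, because the switch-wide setting may serve several bridges. A worked user-space sketch follows. It assumes USER_HZ = 100 and HZ = 250, and it uses exact integer conversions instead of the kernel's clock_t_to_jiffies() and jiffies_to_msecs(). The ageing_time_max check is an assumption that mirrors the visible ageing_time_min test, and the mock_* names are hypothetical. With these values, the bridge default of 30000 clock ticks (300 s) becomes 75000 jiffies and then 300000 ms.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_USER_HZ	100	/* assumed clock_t rate seen by user space */
#define MOCK_HZ		250	/* assumed kernel tick rate */
#define MOCK_NUM_PORTS	4

struct mock_trans {
	bool ph_prepare;
};

struct mock_switch;

struct mock_port {
	struct mock_switch *ds;
	unsigned int ageing_time;	/* ms, 0 = never configured */
};

struct mock_switch {
	unsigned int ageing_time_min;	/* ms, 0 = no lower bound */
	unsigned int ageing_time_max;	/* ms, 0 = no upper bound */
	unsigned int programmed;	/* stands in for ops->set_ageing_time() */
	struct mock_port ports[MOCK_NUM_PORTS];
};

static bool mock_trans_ph_prepare(const struct mock_trans *trans)
{
	return trans && trans->ph_prepare;
}

/* Exact-division simplifications of the kernel conversion helpers. */
static unsigned long mock_clock_t_to_jiffies(unsigned long x)
{
	return x * MOCK_HZ / MOCK_USER_HZ;
}

static unsigned int mock_jiffies_to_msecs(unsigned long j)
{
	return (unsigned int)(j * 1000 / MOCK_HZ);
}

/* Smallest non-zero ageing time among all ports, seeded with ageing_time. */
static unsigned int mock_fastest_ageing_time(struct mock_switch *ds,
					     unsigned int ageing_time)
{
	for (int i = 0; i < MOCK_NUM_PORTS; ++i) {
		/* Points into an array, so it is never NULL. */
		struct mock_port *dp = &ds->ports[i];

		if (dp->ageing_time && dp->ageing_time < ageing_time)
			ageing_time = dp->ageing_time;
	}

	return ageing_time;
}

static int mock_port_ageing_time(struct mock_port *dp,
				 unsigned long ageing_clock,
				 const struct mock_trans *trans)
{
	unsigned long ageing_jiffies = mock_clock_t_to_jiffies(ageing_clock);
	unsigned int ageing_time = mock_jiffies_to_msecs(ageing_jiffies);
	struct mock_switch *ds = dp->ds;

	assert(ds != NULL);

	if (mock_trans_ph_prepare(trans)) {
		if (ds->ageing_time_min && ageing_time < ds->ageing_time_min)
			return -ERANGE;
		if (ds->ageing_time_max && ageing_time > ds->ageing_time_max)
			return -ERANGE;
		return 0;
	}

	/* Keep the fastest ageing time in case of multiple bridges. */
	dp->ageing_time = ageing_time;
	ds->programmed = mock_fastest_ageing_time(ds, ageing_time);

	return 0;
}
```

For example, suppose port 0 is set to 300 s and port 1 to 60 s. If port 0 is then raised to 450 s, the programmed value stays at 60000 ms, because port 1 is still the fastest.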
