Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next 3/3] net: dsa: mv88e6xxx: add a stats setup function
From: Vivien Didelot @ 2018-05-11 21:16 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, kernel, Vivien Didelot, davem, andrew, f.fainelli
In-Reply-To: <20180511211636.25995-1-vivien.didelot@savoirfairelinux.com>

Now that the Global 1 specific setup function only setup the statistics
unit, kill it in favor of a mv88e6xxx_stats_setup function.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index df92fed44674..a4efc6544c0d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -995,14 +995,6 @@ static void mv88e6xxx_get_ethtool_stats(struct dsa_switch *ds, int port,
 
 }
 
-static int mv88e6xxx_stats_set_histogram(struct mv88e6xxx_chip *chip)
-{
-	if (chip->info->ops->stats_set_histogram)
-		return chip->info->ops->stats_set_histogram(chip);
-
-	return 0;
-}
-
 static int mv88e6xxx_get_regs_len(struct dsa_switch *ds, int port)
 {
 	return 32 * sizeof(u16);
@@ -2267,14 +2259,16 @@ static int mv88e6xxx_set_ageing_time(struct dsa_switch *ds,
 	return err;
 }
 
-static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip *chip)
+static int mv88e6xxx_stats_setup(struct mv88e6xxx_chip *chip)
 {
 	int err;
 
 	/* Initialize the statistics unit */
-	err = mv88e6xxx_stats_set_histogram(chip);
-	if (err)
-		return err;
+	if (chip->info->ops->stats_set_histogram) {
+		err = chip->info->ops->stats_set_histogram(chip);
+		if (err)
+			return err;
+	}
 
 	return mv88e6xxx_g1_stats_clear(chip);
 }
@@ -2300,11 +2294,6 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
 			goto unlock;
 	}
 
-	/* Setup Switch Global 1 Registers */
-	err = mv88e6xxx_g1_setup(chip);
-	if (err)
-		goto unlock;
-
 	err = mv88e6xxx_irl_setup(chip);
 	if (err)
 		goto unlock;
@@ -2368,6 +2357,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
 			goto unlock;
 	}
 
+	err = mv88e6xxx_stats_setup(chip);
+	if (err)
+		goto unlock;
+
 unlock:
 	mutex_unlock(&chip->reg_lock);
 
-- 
2.17.0

^ permalink raw reply related

* RE: [PATCH net-next 2/4] bonding: use common mac addr checks
From: Banerjee, Debabrata @ 2018-05-11 21:25 UTC (permalink / raw)
  To: 'Jay Vosburgh'
  Cc: David S . Miller, netdev@vger.kernel.org, Veaceslav Falico,
	Andy Gospodarek
In-Reply-To: <4921.1526072038@famine>

> From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
> Debabrata Banerjee <dbanerje@akamai.com> wrote:

> >-				if
> (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
> >-							     mac_bcast) &&
> >-
> !is_zero_ether_addr(rx_hash_table[index].mac_dst)) {
> >+				if
> (is_valid_ether_addr(rx_hash_table[index].mac_dst)) {
> 
> 	This change and the similar ones below will now fail non-broadcast
> multicast Ethernet addresses, where the prior code would not.  Is this an
> intentional change?

Yes I don't see how it makes sense to use multicast addresses at all, but I may be missing something. It's also illegal according to rfc1812 3.3.2, but obviously this balancing mode is trying to be very clever. We probably shouldn't violate the rfc anyway.

^ permalink raw reply

* Re: [GIT] Networking
From: Linus Torvalds @ 2018-05-11 21:25 UTC (permalink / raw)
  To: David Miller
  Cc: Andrew Morton, Network Development, Linux Kernel Mailing List
In-Reply-To: <20180511.170018.656888133931954275.davem@davemloft.net>

David, is there something you want to tell us?

Drugs are bad, m'kay..

               Linus

On Fri, May 11, 2018 at 2:00 PM David Miller <davem@davemloft.net> wrote:

> "from Kevin Easton", "Thanks to Bhadram Varka", "courtesy of Gustavo A.
> R.  Silva", "To Eric Dumazet we are most grateful for this fix", "This
> fix from YU Bo, we do appreciate", "Once again we are blessed by the
> honorable Eric Dumazet with this fix", "This fix is bestowed upon us by
> Andrew Tomt", "another great gift from Eric Dumazet", "to Hangbin Liu we
> give thanks for this", "Paolo Abeni, he gave us this", "thank you Moshe
> Shemesh", "from our good brother David Howells", "Daniel Juergens,
> you're the best!", "Debabrata Benerjee saved us!", "The ship is now
> water tight, thanks to Andrey Ignatov", "from Colin Ian King, man we've
> got holes everywhere!", "Jiri Pirko what would we do without you!

^ permalink raw reply

* Re: [PATCH net-next 2/4] bonding: use common mac addr checks
From: Jay Vosburgh @ 2018-05-11 21:29 UTC (permalink / raw)
  To: Banerjee, Debabrata
  Cc: David S . Miller, netdev@vger.kernel.org, Veaceslav Falico,
	Andy Gospodarek
In-Reply-To: <9b51c882f54244e5972da43d7955c959@usma1ex-dag1mb2.msg.corp.akamai.com>

Banerjee, Debabrata <dbanerje@akamai.com> wrote:

>> From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
>> Debabrata Banerjee <dbanerje@akamai.com> wrote:
>
>> >-				if
>> (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
>> >-							     mac_bcast) &&
>> >-
>> !is_zero_ether_addr(rx_hash_table[index].mac_dst)) {
>> >+				if
>> (is_valid_ether_addr(rx_hash_table[index].mac_dst)) {
>> 
>> 	This change and the similar ones below will now fail non-broadcast
>> multicast Ethernet addresses, where the prior code would not.  Is this an
>> intentional change?
>
>Yes I don't see how it makes sense to use multicast addresses at all, but I may be missing something. It's also illegal according to rfc1812 3.3.2, but obviously this balancing mode is trying to be very clever. We probably shouldn't violate the rfc anyway.

	Fair enough, but I think it would be good to call this out in
the change log just in case it does somehow cause a regression.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
From: Martin KaFai Lau @ 2018-05-11 21:41 UTC (permalink / raw)
  To: Joe Stringer; +Cc: daniel, netdev, ast, john fastabend
In-Reply-To: <CAOftzPg-2JdMOgvwTtubKijaF8mMO+s5w7CdYmFDuBDK3gAiog@mail.gmail.com>

On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote:
> On 10 May 2018 at 22:00, Martin KaFai Lau <kafai@fb.com> wrote:
> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote:
> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF
> >> programs to find out if there is a socket listening on this host, and
> >> returns a socket pointer which the BPF program can then access to
> >> determine, for instance, whether to forward or drop traffic. sk_lookup()
> >> takes a reference on the socket, so when a BPF program makes use of this
> >> function, it must subsequently pass the returned pointer into the newly
> >> added sk_release() to return the reference.
> >>
> >> By way of example, the following pseudocode would filter inbound
> >> connections at XDP if there is no corresponding service listening for
> >> the traffic:
> >>
> >>   struct bpf_sock_tuple tuple;
> >>   struct bpf_sock_ops *sk;
> >>
> >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> >>   sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0);
> >>   if (!sk) {
> >>     // Couldn't find a socket listening for this traffic. Drop.
> >>     return TC_ACT_SHOT;
> >>   }
> >>   bpf_sk_release(sk, 0);
> >>   return TC_ACT_OK;
> >>
> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> >> ---
> 
> ...
> 
> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = {
> >>  };
> >>  #endif
> >>
> >> +struct sock *
> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) {
> > Would it be possible to have another version that
> > returns a sk without taking its refcnt?
> > It may have performance benefit.
> 
> Not really. The sockets are not RCU-protected, and established sockets
> may be torn down without notice. If we don't take a reference, there's
> no guarantee that the socket will continue to exist for the duration
> of running the BPF program.
> 
> From what I follow, the comment below has a hidden implication which
> is that sockets without SOCK_RCU_FREE, eg established sockets, may be
> directly freed regardless of RCU.
Right, SOCK_RCU_FREE sk is the one I am concern about.
For example, TCP_LISTEN socket does not require taking a refcnt
now.  Doing a bpf_sk_lookup() may have a rather big
impact on handling TCP syn flood.  or the usual intention
is to redirect instead of passing it up to the stack?


> 
> /* Sockets having SOCK_RCU_FREE will call this function after one RCU
>  * grace period. This is the case for UDP sockets and TCP listeners.
>  */
> static void __sk_destruct(struct rcu_head *head)
> ...
> 
> Therefore without the refcount, it won't be safe.

^ permalink raw reply

* Re: [PATCH bpf-next 3/7] samples: bpf: compile and link against full libbpf
From: Jakub Kicinski @ 2018-05-11 21:47 UTC (permalink / raw)
  To: alexei.starovoitov, daniel; +Cc: oss-drivers, netdev
In-Reply-To: <20180510172443.17238-4-jakub.kicinski@netronome.com>

On Thu, 10 May 2018 10:24:39 -0700, Jakub Kicinski wrote:
> samples/bpf currently cherry-picks object files from tools/lib/bpf
> to link against.  Just compile the full library and link statically
> against it.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Looks like this breaks some build configs :(  Fix is forthcoming, sorry!

^ permalink raw reply

* Re: [PATCH net v2] rps: Correct wrong skb_flow_limit check when enable RPS
From: Willem de Bruijn @ 2018-05-11 21:47 UTC (permalink / raw)
  To: Gao Feng
  Cc: David Miller, Daniel Borkmann, Eric Dumazet, Willem de Bruijn,
	jakub.kicinski, ktkhai, Alexei Starovoitov, Rasmus Villemoes,
	John Fastabend, Jesper Dangaard Brouer, David Ahern,
	Network Development
In-Reply-To: <1525990182-12042-1-git-send-email-gfree.wind@vip.163.com>

On Thu, May 10, 2018 at 6:09 PM,  <gfree.wind@vip.163.com> wrote:
> From: Gao Feng <gfree.wind@vip.163.com>
>
> The skb flow limit is implemented for each CPU independently. In the
> current codes, the function skb_flow_limit gets the softnet_data by
> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
> the current cpu when enable RPS. As the result, the skb_flow_limit checks
> the stats of current CPU, while the skb is going to append the queue of
> another CPU. It isn't the expected behavior.
>
> Now pass the softnet_data as a param to make consistent.
>
> Fixes: 99bbc7074190 ("rps: selective flow shedding during softnet overflow")
> Signed-off-by: Gao Feng <gfree.wind@vip.163.com>

See also the discussion in the v1 of this patch.

The merits of moving flow_limit state from irq to rps cpu can
be argued, but the existing behavior is intentional and correct,
so this should not be applied to net and be backported to stable
branches.

My bad for reviving the discussion in the v1 thread while v2 was
already pending, sorry.

^ permalink raw reply

* Re: [PATCH net-next 3/4] bonding: allow use of tx hashing in balance-alb
From: Jay Vosburgh @ 2018-05-11 21:49 UTC (permalink / raw)
  To: Debabrata Banerjee
  Cc: David S . Miller, netdev, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180511192548.8119-4-dbanerje@akamai.com>

Debabrata Banerjee <dbanerje@akamai.com> wrote:

>The rx load balancing provided by balance-alb is not mutually
>exclusive with using hashing for tx selection, and should provide a decent
>speed increase because this eliminates spinlocks and cache contention.
>
>Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>---
> drivers/net/bonding/bond_alb.c     | 20 ++++++++++++++++++--
> drivers/net/bonding/bond_main.c    | 25 +++++++++++++++----------
> drivers/net/bonding/bond_options.c |  2 +-
> include/net/bonding.h              | 10 +++++++++-
> 4 files changed, 43 insertions(+), 14 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 180e50f7806f..6228635880d5 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1478,8 +1478,24 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
> 	}
> 
> 	if (do_tx_balance) {
>-		hash_index = _simple_hash(hash_start, hash_size);
>-		tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
>+		if (bond->params.tlb_dynamic_lb) {
>+			hash_index = _simple_hash(hash_start, hash_size);
>+			tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
>+		} else {
>+			/*
>+			 * do_tx_balance means we are free to select the tx_slave
>+			 * So we do exactly what tlb would do for hash selection
>+			 */
>+
>+			struct bond_up_slave *slaves;
>+			unsigned int count;
>+
>+			slaves = rcu_dereference(bond->slave_arr);
>+			count = slaves ? READ_ONCE(slaves->count) : 0;
>+			if (likely(count))
>+				tx_slave = slaves->arr[bond_xmit_hash(bond, skb) %
>+						       count];
>+		}
> 	}
> 
> 	return bond_do_alb_xmit(skb, bond, tx_slave);
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 1f1e97b26f95..f7f8a49cb32b 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -159,7 +159,7 @@ module_param(min_links, int, 0);
> MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on carrier");
> 
> module_param(xmit_hash_policy, charp, 0);
>-MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
>+MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
> 				   "0 for layer 2 (default), 1 for layer 3+4, "
> 				   "2 for layer 2+3, 3 for encap layer 2+3, "
> 				   "4 for encap layer 3+4");
>@@ -1735,7 +1735,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> 		unblock_netpoll_tx();
> 	}
> 
>-	if (bond_mode_uses_xmit_hash(bond))
>+	if (bond_mode_can_use_xmit_hash(bond))
> 		bond_update_slave_arr(bond, NULL);
> 
> 	bond->nest_level = dev_get_nest_level(bond_dev);
>@@ -1870,7 +1870,7 @@ static int __bond_release_one(struct net_device *bond_dev,
> 	if (BOND_MODE(bond) == BOND_MODE_8023AD)
> 		bond_3ad_unbind_slave(slave);
> 
>-	if (bond_mode_uses_xmit_hash(bond))
>+	if (bond_mode_can_use_xmit_hash(bond))
> 		bond_update_slave_arr(bond, slave);
> 
> 	netdev_info(bond_dev, "Releasing %s interface %s\n",
>@@ -3102,7 +3102,7 @@ static int bond_slave_netdev_event(unsigned long event,
> 		 * events. If these (miimon/arpmon) parameters are configured
> 		 * then array gets refreshed twice and that should be fine!
> 		 */
>-		if (bond_mode_uses_xmit_hash(bond))
>+		if (bond_mode_can_use_xmit_hash(bond))
> 			bond_update_slave_arr(bond, NULL);
> 		break;
> 	case NETDEV_CHANGEMTU:
>@@ -3322,7 +3322,7 @@ static int bond_open(struct net_device *bond_dev)
> 		 */
> 		if (bond_alb_initialize(bond, (BOND_MODE(bond) == BOND_MODE_ALB)))
> 			return -ENOMEM;
>-		if (bond->params.tlb_dynamic_lb)
>+		if (bond->params.tlb_dynamic_lb || BOND_MODE(bond) == BOND_MODE_ALB)
> 			queue_delayed_work(bond->wq, &bond->alb_work, 0);
> 	}
> 
>@@ -3341,7 +3341,7 @@ static int bond_open(struct net_device *bond_dev)
> 		bond_3ad_initiate_agg_selection(bond, 1);
> 	}
> 
>-	if (bond_mode_uses_xmit_hash(bond))
>+	if (bond_mode_can_use_xmit_hash(bond))
> 		bond_update_slave_arr(bond, NULL);
> 
> 	return 0;
>@@ -3892,7 +3892,7 @@ static void bond_slave_arr_handler(struct work_struct *work)
>  * to determine the slave interface -
>  * (a) BOND_MODE_8023AD
>  * (b) BOND_MODE_XOR
>- * (c) BOND_MODE_TLB && tlb_dynamic_lb == 0
>+ * (c) (BOND_MODE_TLB || BOND_MODE_ALB) && tlb_dynamic_lb == 0
>  *
>  * The caller is expected to hold RTNL only and NO other lock!
>  */
>@@ -3945,6 +3945,11 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
> 			continue;
> 		if (skipslave == slave)
> 			continue;
>+
>+		netdev_dbg(bond->dev,
>+			   "Adding slave dev %s to tx hash array[%d]\n",
>+			   slave->dev->name, new_arr->count);
>+
> 		new_arr->arr[new_arr->count++] = slave;
> 	}
> 
>@@ -4320,9 +4325,9 @@ static int bond_check_params(struct bond_params *params)
> 	}
> 
> 	if (xmit_hash_policy) {
>-		if ((bond_mode != BOND_MODE_XOR) &&
>-		    (bond_mode != BOND_MODE_8023AD) &&
>-		    (bond_mode != BOND_MODE_TLB)) {
>+		if (bond_mode == BOND_MODE_ROUNDROBIN ||
>+		    bond_mode == BOND_MODE_ACTIVEBACKUP ||
>+		    bond_mode == BOND_MODE_BROADCAST) {
> 			pr_info("xmit_hash_policy param is irrelevant in mode %s\n",
> 				bond_mode_name(bond_mode));
> 		} else {
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 58c705f24f96..8a945c9341d6 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -395,7 +395,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
> 		.id = BOND_OPT_TLB_DYNAMIC_LB,
> 		.name = "tlb_dynamic_lb",
> 		.desc = "Enable dynamic flow shuffling",
>-		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB)),
>+		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB) | BIT(BOND_MODE_ALB)),
> 		.values = bond_tlb_dynamic_lb_tbl,
> 		.flags = BOND_OPTFLAG_IFDOWN,
> 		.set = bond_option_tlb_dynamic_lb_set,
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index b52235158836..9a41a50b0bd2 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -285,10 +285,18 @@ static inline bool bond_needs_speed_duplex(const struct bonding *bond)
> 
> static inline bool bond_is_nondyn_tlb(const struct bonding *bond)
> {
>-	return (BOND_MODE(bond) == BOND_MODE_TLB)  &&
>+	return (BOND_MODE(bond) == BOND_MODE_TLB || BOND_MODE(bond) == BOND_MODE_ALB) &&

	I believe this could use bond_is_lb(bond) instead.

	-J
	
> 	       (bond->params.tlb_dynamic_lb == 0);
> }
> 
>+static inline bool bond_mode_can_use_xmit_hash(const struct bonding *bond)
>+{
>+	return (BOND_MODE(bond) == BOND_MODE_8023AD ||
>+		BOND_MODE(bond) == BOND_MODE_XOR ||
>+		BOND_MODE(bond) == BOND_MODE_TLB ||
>+		BOND_MODE(bond) == BOND_MODE_ALB);
>+}
>+
> static inline bool bond_mode_uses_xmit_hash(const struct bonding *bond)
> {
> 	return (BOND_MODE(bond) == BOND_MODE_8023AD ||
>-- 
>2.17.0
>

^ permalink raw reply

* Re: [PATCH net] macmace: Set platform device coherent_dma_mask
From: Michael Schmitz @ 2018-05-11 22:02 UTC (permalink / raw)
  To: Finn Thain
  Cc: Geert Uytterhoeven, David S. Miller, linux-m68k, netdev,
	Linux Kernel Mailing List, Christoph Hellwig
In-Reply-To: <alpine.LNX.2.21.1805112003250.8@nippy.intranet>

Hi Finn,

Am 11.05.2018 um 22:06 schrieb Finn Thain:
>> You would have to be careful not to overwrite a pdev->dev.dma_mask and 
>> pdev->dev.dma_coherent_mask that might have been set in a platform 
>> device passed via platform_device_register here. Coldfire is the only 
>> m68k platform currently using that, but there might be others in future.
>>
> 
> That Coldfire patch could be reverted if this is a better solution.

True, but there might be other uses for deviating from a platform
default (I'm thinking of Atari SCSI and floppy drivers here). But we
could chose the correct mask to set in arch_setup_pdev_archdata()
instead, as it's a platform property not a driver property in that case.

>> ... But I don't think there are smaller DMA masks used by m68k drivers 
>> that use the platform device mechanism at present. I've only looked at 
>> arch/m68k though.
> 
> So we're back at the same problem that Geert's suggestion also raised: how 
> to identify potentially affected platform devices and drivers?
> 
> Maybe we can take a leaf out of Christoph's book, and leave a noisy 
> WARNING splat in the log.
> 
> void arch_setup_pdev_archdata(struct platform_device *pdev)
> {
>         WARN_ON_ONCE(pdev->dev.coherent_dma_mask != DMA_MASK_NONE ||
>                      pdev->dev.dma_mask != NULL);

I'd suggest using WARN_ON() so we catch all uses on a particular platform.

I initially thought it necessary to warn on unset mask here, but I see
that would throw up a lot of redundant false positives.

Cheers,

	Michael

^ permalink raw reply

* Re: [PATCH net-next 4/4] bonding: allow carrier and link status to determine link state
From: Jay Vosburgh @ 2018-05-11 22:04 UTC (permalink / raw)
  To: Debabrata Banerjee
  Cc: David S . Miller, netdev, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180511192548.8119-5-dbanerje@akamai.com>

Debabrata Banerjee <dbanerje@akamai.com> wrote:

>In a mixed environment it may be difficult to tell if your hardware
>support carrier, if it does not it can always report true. With a new
>use_carrier option of 2, we can check both carrier and link status
>sequentially, instead of one or the other

	What do you mean by "mixed environment," and under what
circumstances are you seeing an actual benefit from doing the MII /
ethtool test in addition to the standard netif_carrier_ok test?

	The use_carrier option was meant for backwards compatibility
with old-in-2005 device drivers, so this seem counterintuitive to me.  I
don't recall seeing any devices lacking netif_carrier support for some
time.  At this point, I would tend to argue that a new device driver
that does not implement netif_carrier support should be fixed, and not
have another hack added to bonding to work around it.

	-J

>Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
>---
> Documentation/networking/bonding.txt |  4 ++--
> drivers/net/bonding/bond_main.c      | 12 ++++++++----
> drivers/net/bonding/bond_options.c   |  7 ++++---
> 3 files changed, 14 insertions(+), 9 deletions(-)
>
>diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
>index 9ba04c0bab8d..f063730e7e73 100644
>--- a/Documentation/networking/bonding.txt
>+++ b/Documentation/networking/bonding.txt
>@@ -828,8 +828,8 @@ use_carrier
> 	MII / ETHTOOL ioctl method to determine the link state.
> 
> 	A value of 1 enables the use of netif_carrier_ok(), a value of
>-	0 will use the deprecated MII / ETHTOOL ioctls.  The default
>-	value is 1.
>+	0 will use the deprecated MII / ETHTOOL ioctls. A value of 2
>+	will check both.  The default value is 1.
> 
> xmit_hash_policy
> 
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index f7f8a49cb32b..7e9652c4b35c 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -132,7 +132,7 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link down, "
> 			    "in milliseconds");
> module_param(use_carrier, int, 0);
> MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; "
>-			      "0 for off, 1 for on (default)");
>+			      "0 for off, 1 for on (default), 2 for carrier then legacy checks");
> module_param(mode, charp, 0);
> MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, "
> 		       "1 for active-backup, 2 for balance-xor, "
>@@ -434,12 +434,16 @@ static int bond_check_dev_link(struct bonding *bond,
> 	int (*ioctl)(struct net_device *, struct ifreq *, int);
> 	struct ifreq ifr;
> 	struct mii_ioctl_data *mii;
>+	bool carrier = true;
> 
> 	if (!reporting && !netif_running(slave_dev))
> 		return 0;
> 
> 	if (bond->params.use_carrier)
>-		return netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
>+		carrier = netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
>+
>+	if (!carrier)
>+		return carrier;
> 
> 	/* Try to get link status using Ethtool first. */
> 	if (slave_dev->ethtool_ops->get_link)
>@@ -4399,8 +4403,8 @@ static int bond_check_params(struct bond_params *params)
> 		downdelay = 0;
> 	}
> 
>-	if ((use_carrier != 0) && (use_carrier != 1)) {
>-		pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0/1), so it was set to 1\n",
>+	if (use_carrier < 0 || use_carrier > 2) {
>+		pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0-2), so it was set to 1\n",
> 			use_carrier);
> 		use_carrier = 1;
> 	}
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 8a945c9341d6..dba6cef05134 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -164,9 +164,10 @@ static const struct bond_opt_value bond_primary_reselect_tbl[] = {
> };
> 
> static const struct bond_opt_value bond_use_carrier_tbl[] = {
>-	{ "off", 0,  0},
>-	{ "on",  1,  BOND_VALFLAG_DEFAULT},
>-	{ NULL,  -1, 0}
>+	{ "off",  0,  0},
>+	{ "on",   1,  BOND_VALFLAG_DEFAULT},
>+	{ "both", 2,  0},
>+	{ NULL,  -1,  0}
> };
> 
> static const struct bond_opt_value bond_all_slaves_active_tbl[] = {
>-- 
>2.17.0
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply

* Re: [PATCH ghak81 RFC V1 1/5] audit: normalize loginuid read access
From: Richard Guy Briggs @ 2018-05-11 22:17 UTC (permalink / raw)
  To: Paul Moore
  Cc: Linux NetDev Upstream Mailing List, LKML, David Howells,
	Linux Security Module list, Linux-Audit Mailing List,
	Netfilter Devel List, SElinux list,
	Integrity Measurement Architecture, Ingo Molnar
In-Reply-To: <20180510212114.pefeyw5cuqlmjewp@madcap2.tricolour.ca>

On 2018-05-10 17:21, Richard Guy Briggs wrote:
> On 2018-05-09 11:13, Paul Moore wrote:
> > On Fri, May 4, 2018 at 4:54 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > > Recognizing that the loginuid is an internal audit value, use an access
> > > function to retrieve the audit loginuid value for the task rather than
> > > reaching directly into the task struct to get it.
> > >
> > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > ---
> > >  kernel/auditsc.c | 16 ++++++++--------
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index 479c031..f3817d0 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -374,7 +374,7 @@ static int audit_field_compare(struct task_struct *tsk,
> > >         case AUDIT_COMPARE_EGID_TO_OBJ_GID:
> > >                 return audit_compare_gid(cred->egid, name, f, ctx);
> > >         case AUDIT_COMPARE_AUID_TO_OBJ_UID:
> > > -               return audit_compare_uid(tsk->loginuid, name, f, ctx);
> > > +               return audit_compare_uid(audit_get_loginuid(tsk), name, f, ctx);
> > >         case AUDIT_COMPARE_SUID_TO_OBJ_UID:
> > >                 return audit_compare_uid(cred->suid, name, f, ctx);
> > >         case AUDIT_COMPARE_SGID_TO_OBJ_GID:
> > > @@ -385,7 +385,7 @@ static int audit_field_compare(struct task_struct *tsk,
> > >                 return audit_compare_gid(cred->fsgid, name, f, ctx);
> > >         /* uid comparisons */
> > >         case AUDIT_COMPARE_UID_TO_AUID:
> > > -               return audit_uid_comparator(cred->uid, f->op, tsk->loginuid);
> > > +               return audit_uid_comparator(cred->uid, f->op, audit_get_loginuid(tsk));
> > >         case AUDIT_COMPARE_UID_TO_EUID:
> > >                 return audit_uid_comparator(cred->uid, f->op, cred->euid);
> > >         case AUDIT_COMPARE_UID_TO_SUID:
> > > @@ -394,11 +394,11 @@ static int audit_field_compare(struct task_struct *tsk,
> > >                 return audit_uid_comparator(cred->uid, f->op, cred->fsuid);
> > >         /* auid comparisons */
> > >         case AUDIT_COMPARE_AUID_TO_EUID:
> > > -               return audit_uid_comparator(tsk->loginuid, f->op, cred->euid);
> > > +               return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->euid);
> > >         case AUDIT_COMPARE_AUID_TO_SUID:
> > > -               return audit_uid_comparator(tsk->loginuid, f->op, cred->suid);
> > > +               return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->suid);
> > >         case AUDIT_COMPARE_AUID_TO_FSUID:
> > > -               return audit_uid_comparator(tsk->loginuid, f->op, cred->fsuid);
> > > +               return audit_uid_comparator(audit_get_loginuid(tsk), f->op, cred->fsuid);
> > >         /* euid comparisons */
> > >         case AUDIT_COMPARE_EUID_TO_SUID:
> > >                 return audit_uid_comparator(cred->euid, f->op, cred->suid);
> > > @@ -611,7 +611,7 @@ static int audit_filter_rules(struct task_struct *tsk,
> > >                                 result = match_tree_refs(ctx, rule->tree);
> > >                         break;
> > >                 case AUDIT_LOGINUID:
> > > -                       result = audit_uid_comparator(tsk->loginuid, f->op, f->uid);
> > > +                       result = audit_uid_comparator(audit_get_loginuid(tsk), f->op, f->uid);
> > >                         break;
> > >                 case AUDIT_LOGINUID_SET:
> > >                         result = audit_comparator(audit_loginuid_set(tsk), f->op, f->val);
> > > @@ -2287,8 +2287,8 @@ int audit_signal_info(int sig, struct task_struct *t)
> > >             (sig == SIGTERM || sig == SIGHUP ||
> > >              sig == SIGUSR1 || sig == SIGUSR2)) {
> > >                 audit_sig_pid = task_tgid_nr(tsk);
> > > -               if (uid_valid(tsk->loginuid))
> > > -                       audit_sig_uid = tsk->loginuid;
> > > +               if (uid_valid(audit_get_loginuid(tsk)))
> > > +                       audit_sig_uid = audit_get_loginuid(tsk);
> > 
> > I realize this comment is a little silly given the nature of loginuid,
> > but if we are going to abstract away loginuid accesses (which I think
> > is good), we should probably access it once, store it in a local
> > variable, perform the validity check on the local variable, then
> > commit the local variable to audit_sig_uid.  I realize a TOCTOU
> > problem is unlikely here, but with this new layer of abstraction it
> > seems that some additional safety might be a good thing.
> 
> Ok, I'll just assign it to where it is going and check it there, holding
> the audit_ctl_lock the whole time, since it should have been done
> anyways for all of audit_sig_{pid,uid,sid} anyways to get a consistent
> view from the AUDIT_SIGNAL_INFO fetch.

Hmmm, holding audit_ctl_lock won't work because it could sleep trying to
get the lock and the signal info is set in a context where sleeping
isn't permitted.  I'll just use a local var...

> > >                 else
> > >                         audit_sig_uid = uid;
> > >                 security_task_getsecid(tsk, &audit_sig_sid);
> 
> > paul moore
> 
> - RGB

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH v6 1/6] net: phy: at803x: Export at803x_debug_reg_mask()
From: Paul Burton @ 2018-05-11 22:22 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Darren Hart, netdev, linux-mips, David S . Miller
In-Reply-To: <20180511192446.GD12738@lunn.ch>

Hi Andrew,

On Fri, May 11, 2018 at 09:24:46PM +0200, Andrew Lunn wrote:
> > I could reorder the probe function a little to initialize the PHY before
> > performing the MAC reset, drop this patch and the AR803X hibernation
> > stuff from patch 2 if you like. But again, I can't actually test the
> > result on the affected hardware.
> 
> Hi Paul
> 
> I don't like a MAC driver poking around in PHY registers.
> 
> So if you can rearrange the code, that would be great.
> 
>    Thanks
> 	Andrew

Sure, I'll give it a shot.

After digging into it I see 2 ways to go here:

  1) We could just always reset the PHY before we reset the MAC. That
     would give us a window of however long the PHY takes to enter its
     low power state & stop providing the RX clock during which we'd
     need the MAC reset to complete. In the case of the AR8031 that's
     "about 10 seconds" according to its data sheet. In this particular
     case that feels like plenty, but it does also feel a bit icky to
     rely on the timing chosen by the PHY manufacturer to line up with
     that of the MAC reset.

  2) We could introduce a couple of new phy_* functions to disable &
     enable low power states like the AR8031's hibernation feature, by
     calling new function pointers in struct phy_driver. Then pch_gbe &
     other MACs could call those to have the PHY driver disable
     hibernation at times where we know we'll need the RX clock and
     re-enable it afterwards.

I'm currently leaning towards option 2. How does that sound to you? Or
can you see another way to handle this?

Thanks,
    Paul

^ permalink raw reply

* safe skb resetting after decapsulation and encapsulation
From: Jason A. Donenfeld @ 2018-05-11 22:56 UTC (permalink / raw)
  To: Netdev

Hey Netdev,

A UDP skb comes in via the encap_rcv interface. I do a lot of wild
things to the bytes in the skb -- change where the head starts, modify
a few fragments, decrypt some stuff, trim off some things at the end,
etc. In other words, I'm decapsulating the skb in a pretty intense
way. I benefit from reusing the same skb, performance wise, but after
I'm done processing it, it's really a totally new skb. Eventually it's
time to pass off my skb to netif_receive_skb/netif_rx, but before I do
that, I need to "reinitialize" the skb. (The same goes for when
sending out an skb -- I get it from userspace via ndo_start_xmit, do
crazy things to it, and eventually pass it off to the udp_tunnel send
functions, but first "reinitializing" it.)

At the moment I'm using a function that looks like this:

static void jasons_wild_and_crazy_skb_reset(struct sk_buff *skb)
{
    skb_scrub_packet(skb, true); //1
    memset(&skb->headers_start, 0, offsetof(struct sk_buff,
headers_end) - offsetof(struct sk_buff, headers_start)); //2
    skb->queue_mapping = 0; //3
    skb->nohdr = 0; //4
    skb->peeked = 0; //5
    skb->mac_len = 0; //6
    skb->dev = NULL; //7
#ifdef CONFIG_NET_SCHED
    skb->tc_index = 0; //8
    skb_reset_tc(skb); //9
#endif
    skb->hdr_len = skb_headroom(skb); //10
    skb_reset_mac_header(skb); //11
    skb_reset_network_header(skb); //12
    skb_probe_transport_header(skb, 0); //13
    skb_reset_inner_headers(skb); //14
}

I'm sure that some of this is wrong. Most of it is based on part of an
Octeon ethernet driver I read a few years ago. I numbered each
statement above, hoping to go through it with you all in detail here,
and see what we can cut away and see what we can approve.

1. Obviously correct and required.
2. This is probably wrong. At least it causes crashes when receiving
packets from RHEL 7.5's latest i40e driver in their vendor
frankenkernel, because those flags there have some critical bits
related to allocation. But there are a lot flags in there that I might
consider going through one by one and zeroing out.
3-5. Fields that should be zero, I assume, after
decapsulating/decrypting (and encapsulating/encrypting).
6. WireGuard is layer 3, so there's no mac.
7. We're later going to change the dev this came in on.
8-9: Same flakey rationale as 2,3-5.
10: Since the headroom has changed during the various modifications, I
need to let the packet field know about it.
11-14: The beginning of the headers has changed, and so resetting and
probing is necessary for this to work at all.

So I'm wondering - how much of this is necessary? How much am I
unnecessarily reinventing things that exist elsewhere? I'm pretty sure
in most cases the driver would work with only 1,10-14, but I worry
that bad things would happen in more unusual configurations. I've
tried to systematically go through the entire stack and see where
these might be used or not used, but it seems really inconsistent.

So, I'm writing wondering if somebody has an easy simplification or
rule for handling this kind of intense decapsulation/decryption (and
encapsulation/encryption operation on the other way) operation. I'd
like to make sure I get this down solid.

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH net-next] udp: Fix kernel panic in UDP GSO path
From: Willem de Bruijn @ 2018-05-11 23:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sean Tranchetti, Willem de Bruijn, David Miller,
	Network Development, Subash Abhinov Kasiviswanathan
In-Reply-To: <ffd8b916-a060-55b5-35c6-13cf81902301@gmail.com>

On Thu, May 10, 2018 at 8:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/10/2018 05:38 PM, Sean Tranchetti wrote:
>> Using GSO in the UDP path on a device with
>> scatter-gather netdevice feature disabled will result in a kernel
>> panic with the following call stack:
>>
>> This panic is the result of allocating SKBs with small size
>> for the newly segmented SKB. If the scatter-gather feature is
>> disabled, the code attempts to call skb_put() on the small SKB
>> with an argument of nearly the entire unsegmented SKB length.
>>
>> After this patch, attempting to use GSO with scatter-gather
>> disabled will result in -EINVAL being returned.
>>
>> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
>> Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
>> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
>> ---
>>  net/ipv4/ip_output.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index b5e21eb..0d63690 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -1054,8 +1054,16 @@ static int __ip_append_data(struct sock *sk,
>>                       copy = length;
>>
>>               if (!(rt->dst.dev->features&NETIF_F_SG)) {
>> +                     struct sk_buff *tmp;
>>                       unsigned int off;
>>
>> +                     if (paged) {
>> +                             err = -EINVAL;
>> +                             while ((tmp = __skb_dequeue(queue)) != NULL)
>> +                                     kfree(tmp);
>> +                             goto error;
>> +                     }
>> +
>>                       off = skb->len;
>>                       if (getfrag(from, skb_put(skb, copy),
>>                                       offset, copy, off, skb) < 0) {
>>
>
>
> Hmm, no, we absolutely need to fix GSO instead.
>
> Think of a bonding device (or any virtual devices), your patch wont avoid the crash.

Thanks for reporting the issue.

Paged skbuffs is an optimization for gso, but the feature should
continue to work even if gso skbs are linear, indeed (if at the cost
of copying during skb_segment).

We need to make paged contingent on scatter-gather. Rough
patch below. That is for ipv4 only, the same will be needed for ipv6.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b5e21eb198d8..b38731d8a44f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,

        exthdrlen = !skb ? rt->dst.header_len : 0;
        mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
-       paged = !!cork->gso_size;
+       paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);

^ permalink raw reply related

* [PATCH net] net: dsa: bcm_sf2: Fix RX_CLS_LOC_ANY overwrite for last rule
From: Florian Fainelli @ 2018-05-11 23:24 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, Andrew Lunn, Vivien Didelot, David S. Miller,
	open list

When we let the kernel pick up a rule location with RX_CLS_LOC_ANY, we
would be able to overwrite the last rules because of a number of issues:

- the IPv4 code path would not be checking that rule_index is within
  bounds, the IPv6 code path would only be checking the second index and
  not the first one

- find_first_zero_bit() needs to operate on the full bitmap size
  (priv->num_cfp_rules) otherwise it would be off by one in the results
  it returns and the checks against bcm_sf2_cfp_rule_size() would be non
  functioning

Fixes: 3306145866b6 ("net: dsa: bcm_sf2: Move IPv4 CFP processing to specific functions")
Fixes: ba0696c22e7c ("net: dsa: bcm_sf2: Add support for IPv6 CFP rules")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/bcm_sf2_cfp.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 23b45da784cb..ade5fa3d747d 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -354,10 +354,13 @@ static int bcm_sf2_cfp_ipv4_rule_set(struct bcm_sf2_priv *priv, int port,
 	/* Locate the first rule available */
 	if (fs->location == RX_CLS_LOC_ANY)
 		rule_index = find_first_zero_bit(priv->cfp.used,
-						 bcm_sf2_cfp_rule_size(priv));
+						 priv->num_cfp_rules);
 	else
 		rule_index = fs->location;
 
+	if (rule_index > bcm_sf2_cfp_rule_size(priv))
+		return -ENOSPC;
+
 	layout = &udf_tcpip4_layout;
 	/* We only use one UDF slice for now */
 	slice_num = bcm_sf2_get_slice_number(layout, 0);
@@ -563,9 +566,11 @@ static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
 	 */
 	if (fs->location == RX_CLS_LOC_ANY)
 		rule_index[0] = find_first_zero_bit(priv->cfp.used,
-						    bcm_sf2_cfp_rule_size(priv));
+						    priv->num_cfp_rules);
 	else
 		rule_index[0] = fs->location;
+	if (rule_index[0] > bcm_sf2_cfp_rule_size(priv))
+		return -ENOSPC;
 
 	/* Flag it as used (cleared on error path) such that we can immediately
 	 * obtain a second one to chain from.
@@ -573,7 +578,7 @@ static int bcm_sf2_cfp_ipv6_rule_set(struct bcm_sf2_priv *priv, int port,
 	set_bit(rule_index[0], priv->cfp.used);
 
 	rule_index[1] = find_first_zero_bit(priv->cfp.used,
-					    bcm_sf2_cfp_rule_size(priv));
+					    priv->num_cfp_rules);
 	if (rule_index[1] > bcm_sf2_cfp_rule_size(priv)) {
 		ret = -ENOSPC;
 		goto out_err;
-- 
2.14.1

^ permalink raw reply related

* [PATCH net-next 0/8] sctp: refactor sctp_outq_flush
From: Marcelo Ricardo Leitner @ 2018-05-11 23:28 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long

Currently sctp_outq_flush does many different things and arguably
unrelated, such as doing transport selection and outq dequeueing.

This patchset refactors it into smaller and more dedicated functions.
The end behavior should be the same.

The next patchset will rework the function parameters.

Marcelo Ricardo Leitner (8):
  sctp: add sctp_packet_singleton
  sctp: factor out sctp_outq_select_transport
  sctp: move the flush of ctrl chunks into its own function
  sctp: move outq data rtx code out of sctp_outq_flush
  sctp: move flushing of data chunks out of sctp_outq_flush
  sctp: move transport flush code out of sctp_outq_flush
  sctp: make use of gfp on retransmissions
  sctp: rework switch cases in sctp_outq_flush_data

 net/sctp/outqueue.c | 593 +++++++++++++++++++++++++++-------------------------
 1 file changed, 311 insertions(+), 282 deletions(-)

^ permalink raw reply

* [PATCH net-next 1/8] sctp: add sctp_packet_singleton
From: Marcelo Ricardo Leitner @ 2018-05-11 23:28 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

Factor out the code for generating singletons. It's used only once, but
helps to keep the context contained.

The const variables are to ease the reading of subsequent calls in there.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index dee7cbd5483149024f2f3195db2fe4d473b1a00a..300bd0dfc7c14c9df579dbe2f9e78dd8356ae1a3 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -776,6 +776,20 @@ void sctp_outq_uncork(struct sctp_outq *q, gfp_t gfp)
 	sctp_outq_flush(q, 0, gfp);
 }
 
+static int sctp_packet_singleton(struct sctp_transport *transport,
+				 struct sctp_chunk *chunk, gfp_t gfp)
+{
+	const struct sctp_association *asoc = transport->asoc;
+	const __u16 sport = asoc->base.bind_addr.port;
+	const __u16 dport = asoc->peer.port;
+	const __u32 vtag = asoc->peer.i.init_tag;
+	struct sctp_packet singleton;
+
+	sctp_packet_init(&singleton, transport, sport, dport);
+	sctp_packet_config(&singleton, vtag, 0);
+	sctp_packet_append_chunk(&singleton, chunk);
+	return sctp_packet_transmit(&singleton, gfp);
+}
 
 /*
  * Try to flush an outqueue.
@@ -789,10 +803,7 @@ void sctp_outq_uncork(struct sctp_outq *q, gfp_t gfp)
 static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 {
 	struct sctp_packet *packet;
-	struct sctp_packet singleton;
 	struct sctp_association *asoc = q->asoc;
-	__u16 sport = asoc->base.bind_addr.port;
-	__u16 dport = asoc->peer.port;
 	__u32 vtag = asoc->peer.i.init_tag;
 	struct sctp_transport *transport = NULL;
 	struct sctp_transport *new_transport;
@@ -905,10 +916,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		case SCTP_CID_INIT:
 		case SCTP_CID_INIT_ACK:
 		case SCTP_CID_SHUTDOWN_COMPLETE:
-			sctp_packet_init(&singleton, transport, sport, dport);
-			sctp_packet_config(&singleton, vtag, 0);
-			sctp_packet_append_chunk(&singleton, chunk);
-			error = sctp_packet_transmit(&singleton, gfp);
+			error = sctp_packet_singleton(transport, chunk, gfp);
 			if (error < 0) {
 				asoc->base.sk->sk_err = -error;
 				return;
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 2/8] sctp: factor out sctp_outq_select_transport
From: Marcelo Ricardo Leitner @ 2018-05-11 23:28 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

We had two spots doing such complex operation and they were very close to
each other, a bit more tailored to here or there.

This patch unifies these under the same function,
sctp_outq_select_transport, which knows how to handle control chunks and
original transmissions (but not retransmissions).

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 187 +++++++++++++++++++++++++---------------------------
 1 file changed, 90 insertions(+), 97 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 300bd0dfc7c14c9df579dbe2f9e78dd8356ae1a3..bda50596d4bfebeac03966c5a161473df1c1986a 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -791,6 +791,90 @@ static int sctp_packet_singleton(struct sctp_transport *transport,
 	return sctp_packet_transmit(&singleton, gfp);
 }
 
+static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
+				       struct sctp_association *asoc,
+				       struct sctp_transport **transport,
+				       struct list_head *transport_list)
+{
+	struct sctp_transport *new_transport = chunk->transport;
+	struct sctp_transport *curr = *transport;
+	bool changed = false;
+
+	if (!new_transport) {
+		if (!sctp_chunk_is_data(chunk)) {
+			/*
+			 * If we have a prior transport pointer, see if
+			 * the destination address of the chunk
+			 * matches the destination address of the
+			 * current transport.  If not a match, then
+			 * try to look up the transport with a given
+			 * destination address.  We do this because
+			 * after processing ASCONFs, we may have new
+			 * transports created.
+			 */
+			if (curr && sctp_cmp_addr_exact(&chunk->dest,
+							&curr->ipaddr))
+				new_transport = curr;
+			else
+				new_transport = sctp_assoc_lookup_paddr(asoc,
+								  &chunk->dest);
+		}
+
+		/* if we still don't have a new transport, then
+		 * use the current active path.
+		 */
+		if (!new_transport)
+			new_transport = asoc->peer.active_path;
+	} else {
+		__u8 type;
+
+		switch (new_transport->state) {
+		case SCTP_INACTIVE:
+		case SCTP_UNCONFIRMED:
+		case SCTP_PF:
+			/* If the chunk is Heartbeat or Heartbeat Ack,
+			 * send it to chunk->transport, even if it's
+			 * inactive.
+			 *
+			 * 3.3.6 Heartbeat Acknowledgement:
+			 * ...
+			 * A HEARTBEAT ACK is always sent to the source IP
+			 * address of the IP datagram containing the
+			 * HEARTBEAT chunk to which this ack is responding.
+			 * ...
+			 *
+			 * ASCONF_ACKs also must be sent to the source.
+			 */
+			type = chunk->chunk_hdr->type;
+			if (type != SCTP_CID_HEARTBEAT &&
+			    type != SCTP_CID_HEARTBEAT_ACK &&
+			    type != SCTP_CID_ASCONF_ACK)
+				new_transport = asoc->peer.active_path;
+			break;
+		default:
+			break;
+		}
+	}
+
+	/* Are we switching transports? Take care of transport locks. */
+	if (new_transport != curr) {
+		changed = true;
+		curr = new_transport;
+		*transport = curr;
+		if (list_empty(&curr->send_ready))
+			list_add_tail(&curr->send_ready, transport_list);
+
+		sctp_packet_config(&curr->packet, asoc->peer.i.init_tag,
+				   asoc->peer.ecn_capable);
+		/* We've switched transports, so apply the
+		 * Burst limit to the new transport.
+		 */
+		sctp_transport_burst_limited(curr);
+	}
+
+	return changed;
+}
+
 /*
  * Try to flush an outqueue.
  *
@@ -806,7 +890,6 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 	struct sctp_association *asoc = q->asoc;
 	__u32 vtag = asoc->peer.i.init_tag;
 	struct sctp_transport *transport = NULL;
-	struct sctp_transport *new_transport;
 	struct sctp_chunk *chunk, *tmp;
 	enum sctp_xmit status;
 	int error = 0;
@@ -843,68 +926,12 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 
 		list_del_init(&chunk->list);
 
-		/* Pick the right transport to use. */
-		new_transport = chunk->transport;
-
-		if (!new_transport) {
-			/*
-			 * If we have a prior transport pointer, see if
-			 * the destination address of the chunk
-			 * matches the destination address of the
-			 * current transport.  If not a match, then
-			 * try to look up the transport with a given
-			 * destination address.  We do this because
-			 * after processing ASCONFs, we may have new
-			 * transports created.
-			 */
-			if (transport &&
-			    sctp_cmp_addr_exact(&chunk->dest,
-						&transport->ipaddr))
-					new_transport = transport;
-			else
-				new_transport = sctp_assoc_lookup_paddr(asoc,
-								&chunk->dest);
-
-			/* if we still don't have a new transport, then
-			 * use the current active path.
-			 */
-			if (!new_transport)
-				new_transport = asoc->peer.active_path;
-		} else if ((new_transport->state == SCTP_INACTIVE) ||
-			   (new_transport->state == SCTP_UNCONFIRMED) ||
-			   (new_transport->state == SCTP_PF)) {
-			/* If the chunk is Heartbeat or Heartbeat Ack,
-			 * send it to chunk->transport, even if it's
-			 * inactive.
-			 *
-			 * 3.3.6 Heartbeat Acknowledgement:
-			 * ...
-			 * A HEARTBEAT ACK is always sent to the source IP
-			 * address of the IP datagram containing the
-			 * HEARTBEAT chunk to which this ack is responding.
-			 * ...
-			 *
-			 * ASCONF_ACKs also must be sent to the source.
-			 */
-			if (chunk->chunk_hdr->type != SCTP_CID_HEARTBEAT &&
-			    chunk->chunk_hdr->type != SCTP_CID_HEARTBEAT_ACK &&
-			    chunk->chunk_hdr->type != SCTP_CID_ASCONF_ACK)
-				new_transport = asoc->peer.active_path;
-		}
-
-		/* Are we switching transports?
-		 * Take care of transport locks.
+		/* Pick the right transport to use. Should always be true for
+		 * the first chunk as we don't have a transport by then.
 		 */
-		if (new_transport != transport) {
-			transport = new_transport;
-			if (list_empty(&transport->send_ready)) {
-				list_add_tail(&transport->send_ready,
-					      &transport_list);
-			}
+		if (sctp_outq_select_transport(chunk, asoc, &transport,
+					       &transport_list))
 			packet = &transport->packet;
-			sctp_packet_config(packet, vtag,
-					   asoc->peer.ecn_capable);
-		}
 
 		switch (chunk->chunk_hdr->type) {
 		/*
@@ -1072,43 +1099,9 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 				goto sctp_flush_out;
 			}
 
-			/* If there is a specified transport, use it.
-			 * Otherwise, we want to use the active path.
-			 */
-			new_transport = chunk->transport;
-			if (!new_transport ||
-			    ((new_transport->state == SCTP_INACTIVE) ||
-			     (new_transport->state == SCTP_UNCONFIRMED) ||
-			     (new_transport->state == SCTP_PF)))
-				new_transport = asoc->peer.active_path;
-			if (new_transport->state == SCTP_UNCONFIRMED) {
-				WARN_ONCE(1, "Attempt to send packet on unconfirmed path.");
-				sctp_sched_dequeue_done(q, chunk);
-				sctp_chunk_fail(chunk, 0);
-				sctp_chunk_free(chunk);
-				continue;
-			}
-
-			/* Change packets if necessary.  */
-			if (new_transport != transport) {
-				transport = new_transport;
-
-				/* Schedule to have this transport's
-				 * packet flushed.
-				 */
-				if (list_empty(&transport->send_ready)) {
-					list_add_tail(&transport->send_ready,
-						      &transport_list);
-				}
-
+			if (sctp_outq_select_transport(chunk, asoc, &transport,
+						       &transport_list))
 				packet = &transport->packet;
-				sctp_packet_config(packet, vtag,
-						   asoc->peer.ecn_capable);
-				/* We've switched transports, so apply the
-				 * Burst limit to the new transport.
-				 */
-				sctp_transport_burst_limited(transport);
-			}
 
 			pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
 				 "skb->users:%d\n",
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 3/8] sctp: move the flush of ctrl chunks into its own function
From: Marcelo Ricardo Leitner @ 2018-05-11 23:28 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

Named sctp_outq_flush_ctrl and, with that, keep the contexts contained.

One small fix embedded is the reset of one_packet at every iteration.
This allows bundling of some control chunks in case they were preceded by
another control chunk that cannot be bundled.

Other than this, it has the same behavior.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 89 ++++++++++++++++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index bda50596d4bfebeac03966c5a161473df1c1986a..1081e1eea703be5d65d9828c3e4265fbb7a155f9 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -875,45 +875,21 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
 	return changed;
 }

-/*
- * Try to flush an outqueue.
- *
- * Description: Send everything in q which we legally can, subject to
- * congestion limitations.
- * * Note: This function can be called from multiple contexts so appropriate
- * locking concerns must be made.  Today we use the sock lock to protect
- * this function.
- */
-static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+static void sctp_outq_flush_ctrl(struct sctp_outq *q,
+				 struct sctp_transport **_transport,
+				 struct list_head *transport_list,
+				 gfp_t gfp)
 {
-	struct sctp_packet *packet;
+	struct sctp_transport *transport = *_transport;
 	struct sctp_association *asoc = q->asoc;
-	__u32 vtag = asoc->peer.i.init_tag;
-	struct sctp_transport *transport = NULL;
+	struct sctp_packet *packet = NULL;
 	struct sctp_chunk *chunk, *tmp;
 	enum sctp_xmit status;
-	int error = 0;
-	int start_timer = 0;
-	int one_packet = 0;
-
-	/* These transports have chunks to send. */
-	struct list_head transport_list;
-	struct list_head *ltransport;
-
-	INIT_LIST_HEAD(&transport_list);
-	packet = NULL;
-
-	/*
-	 * 6.10 Bundling
-	 *   ...
-	 *   When bundling control chunks with DATA chunks, an
-	 *   endpoint MUST place control chunks first in the outbound
-	 *   SCTP packet.  The transmitter MUST transmit DATA chunks
-	 *   within a SCTP packet in increasing order of TSN.
-	 *   ...
-	 */
+	int one_packet, error;

 	list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
+		one_packet = 0;
+
 		/* RFC 5061, 5.3
 		 * F1) This means that until such time as the ASCONF
 		 * containing the add is acknowledged, the sender MUST
@@ -930,8 +906,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		 * the first chunk as we don't have a transport by then.
 		 */
 		if (sctp_outq_select_transport(chunk, asoc, &transport,
-					       &transport_list))
+					       &transport_list)) {
+			transport = *_transport;
 			packet = &transport->packet;
+		}

 		switch (chunk->chunk_hdr->type) {
 		/*
@@ -954,6 +932,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 			if (sctp_test_T_bit(chunk))
 				packet->vtag = asoc->c.my_vtag;
 			/* fallthru */
+
 		/* The following chunks are "response" chunks, i.e.
 		 * they are generated in response to something we
 		 * received.  If we are sending these, then we can
@@ -979,7 +958,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		case SCTP_CID_RECONF:
 			status = sctp_packet_transmit_chunk(packet, chunk,
 							    one_packet, gfp);
-			if (status  != SCTP_XMIT_OK) {
+			if (status != SCTP_XMIT_OK) {
 				/* put the chunk back */
 				list_add(&chunk->list, &q->control_chunk_list);
 				break;
@@ -1006,6 +985,46 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 			BUG();
 		}
 	}
+}
+
+/*
+ * Try to flush an outqueue.
+ *
+ * Description: Send everything in q which we legally can, subject to
+ * congestion limitations.
+ * * Note: This function can be called from multiple contexts so appropriate
+ * locking concerns must be made.  Today we use the sock lock to protect
+ * this function.
+ */
+static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+{
+	struct sctp_packet *packet;
+	struct sctp_association *asoc = q->asoc;
+	__u32 vtag = asoc->peer.i.init_tag;
+	struct sctp_transport *transport = NULL;
+	struct sctp_chunk *chunk;
+	enum sctp_xmit status;
+	int error = 0;
+	int start_timer = 0;
+
+	/* These transports have chunks to send. */
+	struct list_head transport_list;
+	struct list_head *ltransport;
+
+	INIT_LIST_HEAD(&transport_list);
+	packet = NULL;
+
+	/*
+	 * 6.10 Bundling
+	 *   ...
+	 *   When bundling control chunks with DATA chunks, an
+	 *   endpoint MUST place control chunks first in the outbound
+	 *   SCTP packet.  The transmitter MUST transmit DATA chunks
+	 *   within a SCTP packet in increasing order of TSN.
+	 *   ...
+	 */
+
+	sctp_outq_flush_ctrl(q, &transport, &transport_list, gfp);

 	if (q->asoc->src_out_of_asoc_ok)
 		goto sctp_flush_out;

^ permalink raw reply related

* [PATCH net-next 4/8] sctp: move outq data rtx code out of sctp_outq_flush
From: Marcelo Ricardo Leitner @ 2018-05-11 23:28 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

This patch renames current sctp_outq_flush_rtx to __sctp_outq_flush_rtx
and create a new sctp_outq_flush_rtx, with the code that was on
sctp_outq_flush. Again, the idea is to have functions with small and
defined objectives.

Yes, there is an open-coded path selection in the now sctp_outq_flush_rtx.
That is kept as is for now because it may be very different when we
implement retransmission path selection algorithms for CMT-SCTP.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 101 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 58 insertions(+), 43 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 1081e1eea703be5d65d9828c3e4265fbb7a155f9..74c3961eec4fca8b4ce9bb380f8465fae4625763 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -601,14 +601,14 @@ void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport,
 
 /*
  * Transmit DATA chunks on the retransmit queue.  Upon return from
- * sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which
+ * __sctp_outq_flush_rtx() the packet 'pkt' may contain chunks which
  * need to be transmitted by the caller.
  * We assume that pkt->transport has already been set.
  *
  * The return value is a normal kernel error return value.
  */
-static int sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
-			       int rtx_timeout, int *start_timer)
+static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
+				 int rtx_timeout, int *start_timer)
 {
 	struct sctp_transport *transport = pkt->transport;
 	struct sctp_chunk *chunk, *chunk1;
@@ -987,6 +987,57 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 	}
 }
 
+/* Returns false if new data shouldn't be sent */
+static bool sctp_outq_flush_rtx(struct sctp_outq *q,
+				struct sctp_transport **_transport,
+				struct list_head *transport_list,
+				int rtx_timeout)
+{
+	struct sctp_transport *transport = *_transport;
+	struct sctp_packet *packet = transport ? &transport->packet : NULL;
+	struct sctp_association *asoc = q->asoc;
+	int error, start_timer = 0;
+
+	if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
+		return false;
+
+	if (transport != asoc->peer.retran_path) {
+		/* Switch transports & prepare the packet.  */
+		transport = asoc->peer.retran_path;
+		*_transport = transport;
+
+		if (list_empty(&transport->send_ready))
+			list_add_tail(&transport->send_ready,
+				      transport_list);
+
+		packet = &transport->packet;
+		sctp_packet_config(packet, asoc->peer.i.init_tag,
+				   asoc->peer.ecn_capable);
+	}
+
+	error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer);
+	if (error < 0)
+		asoc->base.sk->sk_err = -error;
+
+	if (start_timer) {
+		sctp_transport_reset_t3_rtx(transport);
+		transport->last_time_sent = jiffies;
+	}
+
+	/* This can happen on COOKIE-ECHO resend.  Only
+	 * one chunk can get bundled with a COOKIE-ECHO.
+	 */
+	if (packet->has_cookie_echo)
+		return false;
+
+	/* Don't send new data if there is still data
+	 * waiting to retransmit.
+	 */
+	if (!list_empty(&q->retransmit))
+		return false;
+
+	return true;
+}
 /*
  * Try to flush an outqueue.
  *
@@ -1000,12 +1051,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 {
 	struct sctp_packet *packet;
 	struct sctp_association *asoc = q->asoc;
-	__u32 vtag = asoc->peer.i.init_tag;
 	struct sctp_transport *transport = NULL;
 	struct sctp_chunk *chunk;
 	enum sctp_xmit status;
 	int error = 0;
-	int start_timer = 0;
 
 	/* These transports have chunks to send. */
 	struct list_head transport_list;
@@ -1052,45 +1101,11 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		 * current cwnd).
 		 */
 		if (!list_empty(&q->retransmit)) {
-			if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
-				goto sctp_flush_out;
-			if (transport == asoc->peer.retran_path)
-				goto retran;
-
-			/* Switch transports & prepare the packet.  */
-
-			transport = asoc->peer.retran_path;
-
-			if (list_empty(&transport->send_ready)) {
-				list_add_tail(&transport->send_ready,
-					      &transport_list);
-			}
-
+			if (!sctp_outq_flush_rtx(q, _transport, transport_list,
+						 rtx_timeout))
+				break;
+			/* We may have switched current transport */
 			packet = &transport->packet;
-			sctp_packet_config(packet, vtag,
-					   asoc->peer.ecn_capable);
-		retran:
-			error = sctp_outq_flush_rtx(q, packet,
-						    rtx_timeout, &start_timer);
-			if (error < 0)
-				asoc->base.sk->sk_err = -error;
-
-			if (start_timer) {
-				sctp_transport_reset_t3_rtx(transport);
-				transport->last_time_sent = jiffies;
-			}
-
-			/* This can happen on COOKIE-ECHO resend.  Only
-			 * one chunk can get bundled with a COOKIE-ECHO.
-			 */
-			if (packet->has_cookie_echo)
-				goto sctp_flush_out;
-
-			/* Don't send new data if there is still data
-			 * waiting to retransmit.
-			 */
-			if (!list_empty(&q->retransmit))
-				goto sctp_flush_out;
 		}
 
 		/* Apply Max.Burst limitation to the current transport in
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 5/8] sctp: move flushing of data chunks out of sctp_outq_flush
From: Marcelo Ricardo Leitner @ 2018-05-11 23:28 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

To the new sctp_outq_flush_data. Again, smaller functions and with well
defined objectives.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 144 ++++++++++++++++++++++++++--------------------------
 1 file changed, 73 insertions(+), 71 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 74c3961eec4fca8b4ce9bb380f8465fae4625763..e445a59db26004553984088d50e458a93b03dcb8 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1038,45 +1038,17 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q,
 
 	return true;
 }
-/*
- * Try to flush an outqueue.
- *
- * Description: Send everything in q which we legally can, subject to
- * congestion limitations.
- * * Note: This function can be called from multiple contexts so appropriate
- * locking concerns must be made.  Today we use the sock lock to protect
- * this function.
- */
-static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+
+static void sctp_outq_flush_data(struct sctp_outq *q,
+				 struct sctp_transport **_transport,
+				 struct list_head *transport_list,
+				 int rtx_timeout, gfp_t gfp)
 {
-	struct sctp_packet *packet;
+	struct sctp_transport *transport = *_transport;
+	struct sctp_packet *packet = transport ? &transport->packet : NULL;
 	struct sctp_association *asoc = q->asoc;
-	struct sctp_transport *transport = NULL;
 	struct sctp_chunk *chunk;
 	enum sctp_xmit status;
-	int error = 0;
-
-	/* These transports have chunks to send. */
-	struct list_head transport_list;
-	struct list_head *ltransport;
-
-	INIT_LIST_HEAD(&transport_list);
-	packet = NULL;
-
-	/*
-	 * 6.10 Bundling
-	 *   ...
-	 *   When bundling control chunks with DATA chunks, an
-	 *   endpoint MUST place control chunks first in the outbound
-	 *   SCTP packet.  The transmitter MUST transmit DATA chunks
-	 *   within a SCTP packet in increasing order of TSN.
-	 *   ...
-	 */
-
-	sctp_outq_flush_ctrl(q, &transport, &transport_list, gfp);
-
-	if (q->asoc->src_out_of_asoc_ok)
-		goto sctp_flush_out;
 
 	/* Is it OK to send data chunks?  */
 	switch (asoc->state) {
@@ -1105,6 +1077,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 						 rtx_timeout))
 				break;
 			/* We may have switched current transport */
+			transport = *_transport;
 			packet = &transport->packet;
 		}
 
@@ -1130,12 +1103,14 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 
 			if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
 				sctp_outq_head_data(q, chunk);
-				goto sctp_flush_out;
+				break;
 			}
 
 			if (sctp_outq_select_transport(chunk, asoc, &transport,
-						       &transport_list))
+						       &transport_list)) {
+				transport = *_transport;
 				packet = &transport->packet;
+			}
 
 			pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
 				 "skb->users:%d\n",
@@ -1147,8 +1122,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 
 			/* Add the chunk to the packet.  */
 			status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
-
 			switch (status) {
+			case SCTP_XMIT_OK:
+				break;
+
 			case SCTP_XMIT_PMTU_FULL:
 			case SCTP_XMIT_RWND_FULL:
 			case SCTP_XMIT_DELAY:
@@ -1160,41 +1137,25 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 					 status);
 
 				sctp_outq_head_data(q, chunk);
-				goto sctp_flush_out;
-
-			case SCTP_XMIT_OK:
-				/* The sender is in the SHUTDOWN-PENDING state,
-				 * The sender MAY set the I-bit in the DATA
-				 * chunk header.
-				 */
-				if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
-					chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
-				if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
-					asoc->stats.ouodchunks++;
-				else
-					asoc->stats.oodchunks++;
-
-				/* Only now it's safe to consider this
-				 * chunk as sent, sched-wise.
-				 */
-				sctp_sched_dequeue_done(q, chunk);
-
-				break;
-
-			default:
-				BUG();
+				return;
 			}
 
-			/* BUG: We assume that the sctp_packet_transmit()
-			 * call below will succeed all the time and add the
-			 * chunk to the transmitted list and restart the
-			 * timers.
-			 * It is possible that the call can fail under OOM
-			 * conditions.
-			 *
-			 * Is this really a problem?  Won't this behave
-			 * like a lost TSN?
+			/* The sender is in the SHUTDOWN-PENDING state,
+			 * The sender MAY set the I-bit in the DATA
+			 * chunk header.
 			 */
+			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
+				chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
+			if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+				asoc->stats.ouodchunks++;
+			else
+				asoc->stats.oodchunks++;
+
+			/* Only now it's safe to consider this
+			 * chunk as sent, sched-wise.
+			 */
+			sctp_sched_dequeue_done(q, chunk);
+
 			list_add_tail(&chunk->transmitted_list,
 				      &transport->transmitted);
 
@@ -1205,7 +1166,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 			 * COOKIE-ECHO chunk.
 			 */
 			if (packet->has_cookie_echo)
-				goto sctp_flush_out;
+				break;
 		}
 		break;
 
@@ -1213,6 +1174,47 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		/* Do nothing.  */
 		break;
 	}
+}
+
+/*
+ * Try to flush an outqueue.
+ *
+ * Description: Send everything in q which we legally can, subject to
+ * congestion limitations.
+ * * Note: This function can be called from multiple contexts so appropriate
+ * locking concerns must be made.  Today we use the sock lock to protect
+ * this function.
+ */
+static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+{
+	struct sctp_packet *packet;
+	struct sctp_association *asoc = q->asoc;
+	struct sctp_transport *transport = NULL;
+	int error = 0;
+
+	/* These transports have chunks to send. */
+	struct list_head transport_list;
+	struct list_head *ltransport;
+
+	INIT_LIST_HEAD(&transport_list);
+	packet = NULL;
+
+	/*
+	 * 6.10 Bundling
+	 *   ...
+	 *   When bundling control chunks with DATA chunks, an
+	 *   endpoint MUST place control chunks first in the outbound
+	 *   SCTP packet.  The transmitter MUST transmit DATA chunks
+	 *   within a SCTP packet in increasing order of TSN.
+	 *   ...
+	 */
+
+	sctp_outq_flush_ctrl(q, &transport, &transport_list, gfp);
+
+	if (q->asoc->src_out_of_asoc_ok)
+		goto sctp_flush_out;
+
+	sctp_outq_flush_data(q, &transport, &transport_list, rtx_timeout, gfp);
 
 sctp_flush_out:
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 6/8] sctp: move transport flush code out of sctp_outq_flush
From: Marcelo Ricardo Leitner @ 2018-05-11 23:29 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

To the new sctp_outq_flush_transports.

Comment on Nagle is outdated and removed. Nagle is performed earlier, while
checking if the chunk fits the packet: if the outq length is not enough to
fill the packet, it returns SCTP_XMIT_DELAY.

So by when it gets to sctp_outq_flush_transports, it has to go through all
enlisted transports.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 56 +++++++++++++++++++++++++----------------------------
 1 file changed, 26 insertions(+), 30 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e445a59db26004553984088d50e458a93b03dcb8..e867bde0b2d93f730f0cb89ad2f54a2094f47833 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1176,6 +1176,29 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 	}
 }
 
+static void sctp_outq_flush_transports(struct sctp_outq *q,
+				       struct list_head *transport_list,
+				       gfp_t gfp)
+{
+	struct list_head *ltransport;
+	struct sctp_packet *packet;
+	struct sctp_transport *t;
+	int error = 0;
+
+	while ((ltransport = sctp_list_dequeue(transport_list)) != NULL) {
+		t = list_entry(ltransport, struct sctp_transport, send_ready);
+		packet = &t->packet;
+		if (!sctp_packet_empty(packet)) {
+			error = sctp_packet_transmit(packet, gfp);
+			if (error < 0)
+				q->asoc->base.sk->sk_err = -error;
+		}
+
+		/* Clear the burst limited state, if any */
+		sctp_transport_burst_reset(t);
+	}
+}
+
 /*
  * Try to flush an outqueue.
  *
@@ -1187,17 +1210,10 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
  */
 static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 {
-	struct sctp_packet *packet;
-	struct sctp_association *asoc = q->asoc;
+	/* Current transport being used. It's NOT the same as curr active one */
 	struct sctp_transport *transport = NULL;
-	int error = 0;
-
 	/* These transports have chunks to send. */
-	struct list_head transport_list;
-	struct list_head *ltransport;
-
-	INIT_LIST_HEAD(&transport_list);
-	packet = NULL;
+	LIST_HEAD(transport_list);
 
 	/*
 	 * 6.10 Bundling
@@ -1218,27 +1234,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 
 sctp_flush_out:
 
-	/* Before returning, examine all the transports touched in
-	 * this call.  Right now, we bluntly force clear all the
-	 * transports.  Things might change after we implement Nagle.
-	 * But such an examination is still required.
-	 *
-	 * --xguo
-	 */
-	while ((ltransport = sctp_list_dequeue(&transport_list)) != NULL) {
-		struct sctp_transport *t = list_entry(ltransport,
-						      struct sctp_transport,
-						      send_ready);
-		packet = &t->packet;
-		if (!sctp_packet_empty(packet)) {
-			error = sctp_packet_transmit(packet, gfp);
-			if (error < 0)
-				asoc->base.sk->sk_err = -error;
-		}
-
-		/* Clear the burst limited state, if any */
-		sctp_transport_burst_reset(t);
-	}
+	sctp_outq_flush_transports(q, &transport_list, gfp);
 }
 
 /* Update unack_data based on the incoming SACK chunk */
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 7/8] sctp: make use of gfp on retransmissions
From: Marcelo Ricardo Leitner @ 2018-05-11 23:29 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

Retransmissions may be triggered when in user context, so lets make use
of gfp.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e867bde0b2d93f730f0cb89ad2f54a2094f47833..388e0665057be6ca7864b8bfdc0925e95e8b2858 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -608,7 +608,7 @@ void sctp_retransmit(struct sctp_outq *q, struct sctp_transport *transport,
  * The return value is a normal kernel error return value.
  */
 static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
-				 int rtx_timeout, int *start_timer)
+				 int rtx_timeout, int *start_timer, gfp_t gfp)
 {
 	struct sctp_transport *transport = pkt->transport;
 	struct sctp_chunk *chunk, *chunk1;
@@ -684,12 +684,12 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
 				 * control chunks are already freed so there
 				 * is nothing we can do.
 				 */
-				sctp_packet_transmit(pkt, GFP_ATOMIC);
+				sctp_packet_transmit(pkt, gfp);
 				goto redo;
 			}
 
 			/* Send this packet.  */
-			error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+			error = sctp_packet_transmit(pkt, gfp);
 
 			/* If we are retransmitting, we should only
 			 * send a single packet.
@@ -705,7 +705,7 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
 
 		case SCTP_XMIT_RWND_FULL:
 			/* Send this packet. */
-			error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+			error = sctp_packet_transmit(pkt, gfp);
 
 			/* Stop sending DATA as there is no more room
 			 * at the receiver.
@@ -715,7 +715,7 @@ static int __sctp_outq_flush_rtx(struct sctp_outq *q, struct sctp_packet *pkt,
 
 		case SCTP_XMIT_DELAY:
 			/* Send this packet. */
-			error = sctp_packet_transmit(pkt, GFP_ATOMIC);
+			error = sctp_packet_transmit(pkt, gfp);
 
 			/* Stop sending DATA because of nagle delay. */
 			done = 1;
@@ -991,7 +991,7 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 static bool sctp_outq_flush_rtx(struct sctp_outq *q,
 				struct sctp_transport **_transport,
 				struct list_head *transport_list,
-				int rtx_timeout)
+				int rtx_timeout, gfp_t gfp)
 {
 	struct sctp_transport *transport = *_transport;
 	struct sctp_packet *packet = transport ? &transport->packet : NULL;
@@ -1015,7 +1015,8 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q,
 				   asoc->peer.ecn_capable);
 	}
 
-	error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer);
+	error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer,
+				      gfp);
 	if (error < 0)
 		asoc->base.sk->sk_err = -error;
 
@@ -1074,7 +1075,7 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 		 */
 		if (!list_empty(&q->retransmit)) {
 			if (!sctp_outq_flush_rtx(q, _transport, transport_list,
-						 rtx_timeout))
+						 rtx_timeout, gfp))
 				break;
 			/* We may have switched current transport */
 			transport = *_transport;
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 8/8] sctp: rework switch cases in sctp_outq_flush_data
From: Marcelo Ricardo Leitner @ 2018-05-11 23:29 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long
In-Reply-To: <cover.1526077476.git.marcelo.leitner@gmail.com>

Remove an inner one, which tended to be error prone due to the cascading
and it can be replaced by a simple if ().

Rework the outer one so that the actual flush code is not inside it. Now
we first validate if we can or cannot send data, return if not, and then
the flush code.

Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 191 +++++++++++++++++++++++++---------------------------
 1 file changed, 93 insertions(+), 98 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 388e0665057be6ca7864b8bfdc0925e95e8b2858..c7f65bcd7bd6ee6996080d091bda1651f7bb8c44 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1058,122 +1058,117 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 		 * chunk.
 		 */
 		if (!packet || !packet->has_cookie_echo)
-			break;
+			return;
 
 		/* fallthru */
 	case SCTP_STATE_ESTABLISHED:
 	case SCTP_STATE_SHUTDOWN_PENDING:
 	case SCTP_STATE_SHUTDOWN_RECEIVED:
-		/*
-		 * RFC 2960 6.1  Transmission of DATA Chunks
-		 *
-		 * C) When the time comes for the sender to transmit,
-		 * before sending new DATA chunks, the sender MUST
-		 * first transmit any outstanding DATA chunks which
-		 * are marked for retransmission (limited by the
-		 * current cwnd).
-		 */
-		if (!list_empty(&q->retransmit)) {
-			if (!sctp_outq_flush_rtx(q, _transport, transport_list,
-						 rtx_timeout, gfp))
-				break;
-			/* We may have switched current transport */
-			transport = *_transport;
-			packet = &transport->packet;
-		}
+		break;
 
-		/* Apply Max.Burst limitation to the current transport in
-		 * case it will be used for new data.  We are going to
-		 * rest it before we return, but we want to apply the limit
-		 * to the currently queued data.
-		 */
-		if (transport)
-			sctp_transport_burst_limited(transport);
-
-		/* Finally, transmit new packets.  */
-		while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
-			__u32 sid = ntohs(chunk->subh.data_hdr->stream);
-
-			/* Has this chunk expired? */
-			if (sctp_chunk_abandoned(chunk)) {
-				sctp_sched_dequeue_done(q, chunk);
-				sctp_chunk_fail(chunk, 0);
-				sctp_chunk_free(chunk);
-				continue;
-			}
+	default:
+		/* Do nothing. */
+		return;
+	}
 
-			if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
-				sctp_outq_head_data(q, chunk);
-				break;
-			}
+	/*
+	 * RFC 2960 6.1  Transmission of DATA Chunks
+	 *
+	 * C) When the time comes for the sender to transmit,
+	 * before sending new DATA chunks, the sender MUST
+	 * first transmit any outstanding DATA chunks which
+	 * are marked for retransmission (limited by the
+	 * current cwnd).
+	 */
+	if (!list_empty(&q->retransmit)) {
+		if (!sctp_outq_flush_rtx(q, _transport, transport_list,
+					 rtx_timeout, gfp))
+			return;
+		/* We may have switched current transport */
+		transport = *_transport;
+		packet = &transport->packet;
+	}
 
-			if (sctp_outq_select_transport(chunk, asoc, &transport,
-						       &transport_list)) {
-				transport = *_transport;
-				packet = &transport->packet;
-			}
+	/* Apply Max.Burst limitation to the current transport in
+	 * case it will be used for new data.  We are going to
+	 * rest it before we return, but we want to apply the limit
+	 * to the currently queued data.
+	 */
+	if (transport)
+		sctp_transport_burst_limited(transport);
 
-			pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
-				 "skb->users:%d\n",
-				 __func__, q, chunk, chunk && chunk->chunk_hdr ?
-				 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
-				 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
-				 chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
-				 refcount_read(&chunk->skb->users) : -1);
-
-			/* Add the chunk to the packet.  */
-			status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
-			switch (status) {
-			case SCTP_XMIT_OK:
-				break;
+	/* Finally, transmit new packets.  */
+	while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
+		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
 
-			case SCTP_XMIT_PMTU_FULL:
-			case SCTP_XMIT_RWND_FULL:
-			case SCTP_XMIT_DELAY:
-				/* We could not append this chunk, so put
-				 * the chunk back on the output queue.
-				 */
-				pr_debug("%s: could not transmit tsn:0x%x, status:%d\n",
-					 __func__, ntohl(chunk->subh.data_hdr->tsn),
-					 status);
+		/* Has this chunk expired? */
+		if (sctp_chunk_abandoned(chunk)) {
+			sctp_sched_dequeue_done(q, chunk);
+			sctp_chunk_fail(chunk, 0);
+			sctp_chunk_free(chunk);
+			continue;
+		}
 
-				sctp_outq_head_data(q, chunk);
-				return;
-			}
+		if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+			sctp_outq_head_data(q, chunk);
+			break;
+		}
 
-			/* The sender is in the SHUTDOWN-PENDING state,
-			 * The sender MAY set the I-bit in the DATA
-			 * chunk header.
-			 */
-			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
-				chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
-			if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
-				asoc->stats.ouodchunks++;
-			else
-				asoc->stats.oodchunks++;
+		if (sctp_outq_select_transport(chunk, asoc, &transport,
+					       &transport_list)) {
+			transport = *_transport;
+			packet = &transport->packet;
+		}
 
-			/* Only now it's safe to consider this
-			 * chunk as sent, sched-wise.
+		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
+			 "skb->users:%d\n",
+			 __func__, q, chunk, chunk && chunk->chunk_hdr ?
+			 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
+			 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
+			 chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
+			 refcount_read(&chunk->skb->users) : -1);
+
+		/* Add the chunk to the packet.  */
+		status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
+		if (status != SCTP_XMIT_OK) {
+			/* We could not append this chunk, so put
+			 * the chunk back on the output queue.
 			 */
-			sctp_sched_dequeue_done(q, chunk);
+			pr_debug("%s: could not transmit tsn:0x%x, status:%d\n",
+				 __func__, ntohl(chunk->subh.data_hdr->tsn),
+				 status);
 
-			list_add_tail(&chunk->transmitted_list,
-				      &transport->transmitted);
+			sctp_outq_head_data(q, chunk);
+			break;
+		}
 
-			sctp_transport_reset_t3_rtx(transport);
-			transport->last_time_sent = jiffies;
+		/* The sender is in the SHUTDOWN-PENDING state,
+		 * The sender MAY set the I-bit in the DATA
+		 * chunk header.
+		 */
+		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
+			chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
+		if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+			asoc->stats.ouodchunks++;
+		else
+			asoc->stats.oodchunks++;
 
-			/* Only let one DATA chunk get bundled with a
-			 * COOKIE-ECHO chunk.
-			 */
-			if (packet->has_cookie_echo)
-				break;
-		}
-		break;
+		/* Only now it's safe to consider this
+		 * chunk as sent, sched-wise.
+		 */
+		sctp_sched_dequeue_done(q, chunk);
 
-	default:
-		/* Do nothing.  */
-		break;
+		list_add_tail(&chunk->transmitted_list,
+			      &transport->transmitted);
+
+		sctp_transport_reset_t3_rtx(transport);
+		transport->last_time_sent = jiffies;
+
+		/* Only let one DATA chunk get bundled with a
+		 * COOKIE-ECHO chunk.
+		 */
+		if (packet->has_cookie_echo)
+			break;
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next 0/3] sctp: Introduce sctp_flush_ctx
From: Marcelo Ricardo Leitner @ 2018-05-11 23:29 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Vlad Yasevich, Xin Long

This struct will hold all the context used during the outq flush, so we
don't have to pass lots of pointers all around.

Checked on x86_64, the compiler inlines all these functions and there is no
derreference added because of the struct.

Marcelo Ricardo Leitner (3):
  sctp: add sctp_flush_ctx, a context struct on outq_flush routines
  sctp: add asoc and packet to sctp_flush_ctx
  sctp: checkpatch fixups

 net/sctp/outqueue.c | 259 ++++++++++++++++++++++++----------------------------
 1 file changed, 119 insertions(+), 140 deletions(-)

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox