Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH RFC 0/2] ila: ilarouter bpf code for tc and xdp
From: Alexei Starovoitov @ 2016-09-23 17:16 UTC (permalink / raw)
  To: David S . Miller
  Cc: Daniel Borkmann, Jesper Dangaard Brouer, Tom Herbert,
	Jamal Hadi Salim, Thomas Graf, netdev

From: Aaron Yue <haoxuany@andrew.cmu.edu>

Jesper,

here is old email and cover letter that didn't make it to the list
due to vger outage (I guess).
The verifier patch that Aaron is talking about has landed long ago.

The dataplane of ILA router is very short and simple.
Control plane is very different matter. It's not ready for prime time yet.

----------

This patch contains the tc and xdp implementation of kernelspace bpf code.
It requires userspace to insert to the ILA bpf maps, in tc's case, the 
precomputed ILA mappings, and in xdp's case, both the precomputed ILA
mappings and the MAC address.

The xdp bpf code also requires a verifier patch to allow direct map access
from the packet (will be patched in by Alexei).

Aaron Yue (2):
  samples/bpf: ilarouter for tc
  samples/bpf: ilarouter for xdp

 samples/bpf/Makefile        |   2 +
 samples/bpf/ila.h           |  80 ++++++++++++++++++++++++++++
 samples/bpf/ilarouter_tc.c  | 124 ++++++++++++++++++++++++++++++++++++++++++++
 samples/bpf/ilarouter_xdp.c |  88 +++++++++++++++++++++++++++++++
 samples/bpf/inet_helper.h   |  38 ++++++++++++++
 5 files changed, 332 insertions(+)
 create mode 100644 samples/bpf/ila.h
 create mode 100644 samples/bpf/ilarouter_tc.c
 create mode 100644 samples/bpf/ilarouter_xdp.c
 create mode 100644 samples/bpf/inet_helper.h

-- 
2.8.0.rc2

^ permalink raw reply

* [PATCH net] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
From: Lance Richardson @ 2016-09-23 16:54 UTC (permalink / raw)
  To: netdev

Similar to commit 3be07244b733 ("ip6_gre: fix flowi6_proto value in
xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup.
Since the correct proto is already set in the tunnel flowi6 template via
commit 252f3f5a1189 ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit
path."), simply delete the line setting the incorrect flowi6_proto value.

Suggested-by: Jiri Benc <jbenc@redhat.com>
Fixes: commit c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
---
 net/ipv6/ip6_gre.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 704274c..edc3daa 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -648,7 +648,6 @@ static int ip6gre_xmit_other(struct sk_buff *skb, struct net_device *dev)
 		encap_limit = t->parms.encap_limit;
 
 	memcpy(&fl6, &t->fl.u.ip6, sizeof(fl6));
-	fl6.flowi6_proto = skb->protocol;
 
 	err = gre_handle_offloads(skb, !!(t->parms.o_flags & TUNNEL_CSUM));
 	if (err)
-- 
2.5.5

^ permalink raw reply related

* Re: Alignment issues with freescale FEC driver
From: Eric Dumazet @ 2016-09-23 16:54 UTC (permalink / raw)
  To: Eric Nelson
  Cc: linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	rmk+kernel, Fugang Duan, Troy Kisky, Otavio Salvador, Simone
In-Reply-To: <02afb707-65de-5101-a79b-355929c4e00b@nelint.com>

On Fri, Sep 23, 2016 at 9:43 AM, Eric Nelson <eric@nelint.com> wrote:
>
> Hello all,
>
> We're seeing alignment issues from the ethernet stack on an i.MX6UL board:
>
> root@mx6ul:~# cat /proc/cpu/alignment
> User: 0
> System: 470010 (inet_gro_receive+0x104/0x278)
>
> This seems to be related to the ip header alignment, and there
> was much discussion in mailing list threads [1] and [2].
>
> In particular, Russell referred to a patch here, but I haven't been
> able to find it:
> https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002844.html
>
> Eric Dumazet also suggested a path toward fixing it, but I don't quite
> understand the suggestion:
> http://www.spinics.net/lists/netdev/msg213166.htm
>
> The immediate problem is addressed by just reading the id and frag_offs
> fields in the iphdr structure as shown in this patch:
>
> commit 98810abc911b1286a7e4a2ebdfbad66f12fae19d
> Author: Eric Nelson <eric@nelint.com>
> Date: Fri Sep 23 08:26:03 2016 -0700
>
> net: ipv4: af_inet: don't read multiple 16-bit iphdr fields as a 32-bit
> value
>
> Change-Id: Idc7122c22c13ca078be31907d30ab1c3148ba807
> Signed-off-by: Eric Nelson <eric@nelint.com>
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 0cc98b1..c17ef6e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1301,6 +1301,7 @@ static struct sk_buff **inet_gro_receive(struct
> sk_buff **head,
> unsigned int hlen;
> unsigned int off;
> unsigned int id;
> + unsigned int frag;
> int flush = 1;
> int proto;
>
> @@ -1326,9 +1327,9 @@ static struct sk_buff **inet_gro_receive(struct
> sk_buff **head,
> if (unlikely(ip_fast_csum((u8 *)iph, 5)))
> goto out_unlock;
>
> - id = ntohl(*(__be32 *)&iph->id);
> - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
> - id >>= 16;
> + id = ntohs(*(__be16 *)&iph->id);
> + frag = ntohs(*(__be16 *)&iph->frag_off);
> + flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag &
> ~IP_DF));
>
> for (p = *head; p; p = p->next) {
> struct iphdr *iph2;
>

This solves nothing, because a few lines after you'll have yet another
unaligned access :


((__force u32)iph->saddr ^ (__force u32)iph2->saddr) |
((__force u32)iph->daddr ^ (__force u32)iph2->daddr)) {

So you might have one less problematic access, out of hundreds of them
all over the places.

Really the problem is that whole stack depends on the assumption that
IP headers are aligned on arches that care
(ie where NET_IP_ALIGN == 2)

If your build does have NET_IP_ALIGN = 2 and you get a fault here, it
might be because of a buggy driver.

The other known case is some GRE encapsulations that break the
assumption, and this is discussed somewhere else.

^ permalink raw reply

* Re: [PATCH] net: bcmgenet: Fix EPHY reset in power up
From: Florian Fainelli @ 2016-09-23 16:54 UTC (permalink / raw)
  To: Jaedon Shin, Andrew Lunn; +Cc: David Miller, Philippe Reynes, netdev
In-Reply-To: <5A71930B-ED19-4690-942D-779F1454C77A@gmail.com>

On 09/23/2016 08:04 AM, Jaedon Shin wrote:
> Hi Andrew,
> 
> On 23 Sep 2016, at 11:06 PM, Andrew Lunn <andrew@lunn.ch> wrote:
>>
>> On Fri, Sep 23, 2016 at 10:20:04PM +0900, Jaedon Shin wrote:
>>> The bcmgenet_mii_reset() is always not running in power up sequence
>>> after 'commit 62469c76007e ("net: ethernet: bcmgenet: use phydev from
>>> struct net_device")'. This'll show extremely high latency and duplicate
>>> packets while interface down and up repeatedly.
>>>
>>> For now, adds again a private phydev for mii reset when runs power up to
>>> open interface.
>>
>> Hi Jaedon
>>
>> How does this fix the issue? It sounds like you are papering over the
>> crack, not truly fixing it.
>>
>>       Andrew
> 
> Yes, It feel like a workaround, but I think it must need v4.8 stable
> version. If we find better way that fixes internal PHY to initialize
> after re-open interface, this patch will be dropped.

I can observe the faulting behavior with 4.8-rc7 that the link below
fixed initially:

# ping fainelli-linux
PING fainelli-linux (10.112.156.244): 56 data bytes
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.352 ms
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.472 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.496 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.517 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.536 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.557 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=752.448 ms (DUP!)
64 bytes from 10.112.156.244: seq=2 ttl=61 time=1.291 ms
64 bytes from 10.112.156.244: seq=2 ttl=61 time=1.421 ms (DUP!)
64 bytes from 10.112.156.244: seq=2 ttl=61 time=1.444 ms (DUP!)
64 bytes from 10.112.156.244: seq=2 ttl=61 time=1.464 ms (DUP!)
64 bytes from 10.112.156.244: seq=2 ttl=61 time=1.483 ms (DUP!)
64 bytes from 10.112.156.244: seq=2 ttl=61 time=1.505 ms (DUP!)
64 bytes from 10.112.156.244: seq=2 ttl=61 time=24.964 ms (DUP!)

If we revert this patch, we indeed get the normal and expected behavior
back:

# ping fainelli-linux
PING fainelli-linux (10.112.156.244): 56 data bytes
64 bytes from 10.112.156.244: seq=0 ttl=61 time=0.417 ms
64 bytes from 10.112.156.244: seq=1 ttl=61 time=0.415 ms
64 bytes from 10.112.156.244: seq=2 ttl=61 time=0.424 ms

Actually, the key thing is this:

- without Philippe's patch we call twice bcmgenet_mii_reset, and that is
intended:
	- first time from bcmgenet_power_up() to make sure the PHY is
initialized *before* we get to initialize the UniMAC, this is critical
	- second time from bcmgenet_mii_probe(), through the normal phy_init_hw()

- with Philippe's patch, we only get to call bcmgenet_mii_reset once, in
bcmgenet_mii_probe() because the first time in bcmgenet_power_up(),
dev->phydev is NULL, because of a prior call to phy_disconnect() in
bcmgenet_close(), unfortunately, there has been MAC activity, so the PHY
gets in a bad state

Jaedon, feel free to use the explanation above, and send a plain revert
of commit 62469c76007e11428e2ee3c6de90cbe74b588d44.

Thanks!

Thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH] softirq: let ksoftirqd do its job
From: Jesper Dangaard Brouer @ 2016-09-23 16:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Borkmann, David Miller, eric.dumazet, riel, pabeni, hannes,
	linux-kernel, netdev, corbet, Ingo Molnar
In-Reply-To: <20160923115333.GJ5008@twins.programming.kicks-ass.net>

On Fri, 23 Sep 2016 13:53:33 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Sep 23, 2016 at 01:35:59PM +0200, Daniel Borkmann wrote:
> > On 09/02/2016 08:39 AM, David Miller wrote:  
> > >
> > >I'm just kind of assuming this won't go through my tree, but I can take
> > >it if that's what everyone agrees to.  
> > 
> > Was this actually picked up somewhere in the mean time?  
> 
> I can queue it for tip. In fact, I've just done so to avoid loosing it.
> If anybody else wants it holler.

Good that you are picking this up! It is a very important fix, as least
for networking.

This is your git tree, right:
 https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/

Doesn't look like you pushed it yet, or do I need to look at a specific
branch?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH] fs/select: add vmalloc fallback for select(2)
From: Jason Baron @ 2016-09-23 16:47 UTC (permalink / raw)
  To: Nicholas Piggin, Hillf Danton
  Cc: 'Vlastimil Babka', 'Alexander Viro',
	linux-fsdevel, linux-kernel, linux-mm, 'Michal Hocko',
	netdev, Eric Dumazet
In-Reply-To: <20160923172434.7ad8f2e0@roar.ozlabs.ibm.com>

Hi,

On 09/23/2016 03:24 AM, Nicholas Piggin wrote:
> On Fri, 23 Sep 2016 14:42:53 +0800
> "Hillf Danton" <hillf.zj@alibaba-inc.com> wrote:
>
>>>
>>> The select(2) syscall performs a kmalloc(size, GFP_KERNEL) where size grows
>>> with the number of fds passed. We had a customer report page allocation
>>> failures of order-4 for this allocation. This is a costly order, so it might
>>> easily fail, as the VM expects such allocation to have a lower-order fallback.
>>>
>>> Such trivial fallback is vmalloc(), as the memory doesn't have to be
>>> physically contiguous. Also the allocation is temporary for the duration of the
>>> syscall, so it's unlikely to stress vmalloc too much.
>>>
>>> Note that the poll(2) syscall seems to use a linked list of order-0 pages, so
>>> it doesn't need this kind of fallback.
>
> How about something like this? (untested)
>
> Eric isn't wrong about vmalloc sucking :)
>
> Thanks,
> Nick
>
>
> ---
>   fs/select.c | 57 +++++++++++++++++++++++++++++++++++++++++++--------------
>   1 file changed, 43 insertions(+), 14 deletions(-)
>
> diff --git a/fs/select.c b/fs/select.c
> index 8ed9da5..3b4834c 100644
> --- a/fs/select.c
> +++ b/fs/select.c
> @@ -555,6 +555,7 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>   	void *bits;
>   	int ret, max_fds;
>   	unsigned int size;
> +	size_t nr_bytes;
>   	struct fdtable *fdt;
>   	/* Allocate small arguments on the stack to save memory and be faster */
>   	long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
> @@ -576,21 +577,39 @@ int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>   	 * since we used fdset we need to allocate memory in units of
>   	 * long-words.
>   	 */
> -	size = FDS_BYTES(n);
> +	ret = -ENOMEM;
>   	bits = stack_fds;
> -	if (size > sizeof(stack_fds) / 6) {
> -		/* Not enough space in on-stack array; must use kmalloc */
> +	size = FDS_BYTES(n);
> +	nr_bytes = 6 * size;
> +
> +	if (unlikely(nr_bytes > PAGE_SIZE)) {
> +		/* Avoid multi-page allocation if possible */
>   		ret = -ENOMEM;
> -		bits = kmalloc(6 * size, GFP_KERNEL);
> -		if (!bits)
> -			goto out_nofds;
> +		fds.in = kmalloc(size, GFP_KERNEL);
> +		fds.out = kmalloc(size, GFP_KERNEL);
> +		fds.ex = kmalloc(size, GFP_KERNEL);
> +		fds.res_in = kmalloc(size, GFP_KERNEL);
> +		fds.res_out = kmalloc(size, GFP_KERNEL);
> +		fds.res_ex = kmalloc(size, GFP_KERNEL);
> +
> +		if (!(fds.in && fds.out && fds.ex &&
> +				fds.res_in && fds.res_out && fds.res_ex))
> +			goto out;
> +	} else {
> +		if (nr_bytes > sizeof(stack_fds)) {
> +			/* Not enough space in on-stack array */
> +			if (nr_bytes > PAGE_SIZE * 2)

The 'if' looks extraneous?

Also, I wonder if we can just avoid some allocations altogether by 
checking by if the user fd_set pointers are NULL? That can avoid failures :)

Thanks,

-Jason

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Alignment issues with freescale FEC driver
From: Eric Nelson @ 2016-09-23 16:43 UTC (permalink / raw)
  To: linux-arm-kernel@lists.infradead.org, netdev@vger.kernel.org,
	rmk+kernel, edumazet, Fugang Duan, Troy Kisky
  Cc: Otavio Salvador, Simone

Hello all,

We're seeing alignment issues from the ethernet stack on an i.MX6UL board:

root@mx6ul:~# cat /proc/cpu/alignment
User: 0
System: 470010 (inet_gro_receive+0x104/0x278)

This seems to be related to the ip header alignment, and there
was much discussion in mailing list threads [1] and [2].

In particular, Russell referred to a patch here, but I haven't been
able to find it:
https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002844.html

Eric Dumazet also suggested a path toward fixing it, but I don't quite
understand the suggestion:
http://www.spinics.net/lists/netdev/msg213166.htm

The immediate problem is addressed by just reading the id and frag_offs
fields in the iphdr structure as shown in this patch:

commit 98810abc911b1286a7e4a2ebdfbad66f12fae19d
Author: Eric Nelson <eric@nelint.com>
Date: Fri Sep 23 08:26:03 2016 -0700

net: ipv4: af_inet: don't read multiple 16-bit iphdr fields as a 32-bit
value

Change-Id: Idc7122c22c13ca078be31907d30ab1c3148ba807
Signed-off-by: Eric Nelson <eric@nelint.com>

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0cc98b1..c17ef6e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1301,6 +1301,7 @@ static struct sk_buff **inet_gro_receive(struct
sk_buff **head,
unsigned int hlen;
unsigned int off;
unsigned int id;
+ unsigned int frag;
int flush = 1;
int proto;

@@ -1326,9 +1327,9 @@ static struct sk_buff **inet_gro_receive(struct
sk_buff **head,
if (unlikely(ip_fast_csum((u8 *)iph, 5)))
goto out_unlock;

- id = ntohl(*(__be32 *)&iph->id);
- flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id & ~IP_DF));
- id >>= 16;
+ id = ntohs(*(__be16 *)&iph->id);
+ frag = ntohs(*(__be16 *)&iph->frag_off);
+ flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (frag &
~IP_DF));

for (p = *head; p; p = p->next) {
struct iphdr *iph2;


The reading of both fields in one "ntohl" seems obfuscated at best and
certainly worthy of a comment about the optimization but I understand
from other notes that the fundamental problem is that the IP header should
be aligned on a 4-byte boundary and that's not possible without a memcpy.

I'd like to hear suggestions about how we can address this.

Regards,


Eric

[1] - http://www.spinics.net/lists/netdev/msg213114.html
[2] -
https://lists.linaro.org/pipermail/linaro-toolchain/2012-October/002828.html

^ permalink raw reply related

* Re: device-tree support for writing to phy registers?
From: Florian Fainelli @ 2016-09-23 16:29 UTC (permalink / raw)
  To: Tim Harvey, netdev
In-Reply-To: <CAJ+vNU0sEv_5B-2w-25HtkP1mubbHOYi278prqsDar23Sz80pw@mail.gmail.com>

On 09/23/2016 08:40 AM, Tim Harvey wrote:
> Greetings,
> 
> I've got a TI DP83867 GbE phy that requires some register writes to
> configure its refclock output. Is there a generic device-tree API for
> writing to raw registers or is that something that would be need to be
> added to a specific phy driver with a device-tree binding?

There are no standard properties that indicate how to write to register
from Device Tree (unfortunately there are non standard that allow this
to happen, e.g: marvell,reg-init), because that would mean that Device
Tree acts as some kind of firmware/binary interface, which is a bit of
stretch. Some bindings may indicate how to write to registers in a way
that accepts a address = value pair, but quite frankly, this is
absolutely horrible and not controllable nor easily transferable from
one model of device to the other, strongly discouraged.

> There is a
> DP83867 phy driver but it doesn't contain anything related to
> configuring its CLKOUT via register 0x170.

Then, I guess you should add a set of properties and corresponding code
reading these properties that would result in getting the register
programmed with the values you need.

> 
> Alternatively, is it generally considered 'ok' to take care of this in
> the bootloader and not provide the MAC driver the gpio for phy-reset
> so that bootloader configuration persists through the kernel?

It depends on what your platform does, punting on the bootloader is
usually fine, but also breaks nicely when you start implementing power
management in the kernel properly (e.g: deep sleep states) and you are
not calling back into the bootloader, yet your hardware lost its state
between power transitions.

-- 
Florian

^ permalink raw reply

* [PATCH] netns: move {inc,dec}_net_namespaces into #ifdef
From: Arnd Bergmann @ 2016-09-23 16:06 UTC (permalink / raw)
  To: David S. Miller
  Cc: Arnd Bergmann, Eric W. Biederman, WANG Cong, netdev, linux-kernel

With the newly enforced limit on the number of namespaces,
we get a build warning if CONFIG_NETNS is disabled:

net/core/net_namespace.c:273:13: error: 'dec_net_namespaces' defined but not used [-Werror=unused-function]
net/core/net_namespace.c:268:24: error: 'inc_net_namespaces' defined but not used [-Werror=unused-function]

This moves the two added functions inside the #ifdef that guards
their callers.

Fixes: 703286608a22 ("netns: Add a limit on the number of net namespaces")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 net/core/net_namespace.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index d0eb13d3226b..989434f36f96 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -265,16 +265,6 @@ struct net *get_net_ns_by_id(struct net *net, int id)
 	return peer;
 }
 
-static struct ucounts *inc_net_namespaces(struct user_namespace *ns)
-{
-	return inc_ucount(ns, current_euid(), UCOUNT_NET_NAMESPACES);
-}
-
-static void dec_net_namespaces(struct ucounts *ucounts)
-{
-	dec_ucount(ucounts, UCOUNT_NET_NAMESPACES);
-}
-
 /*
  * setup_net runs the initializers for the network namespace object.
  */
@@ -319,6 +309,16 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 
 
 #ifdef CONFIG_NET_NS
+static struct ucounts *inc_net_namespaces(struct user_namespace *ns)
+{
+	return inc_ucount(ns, current_euid(), UCOUNT_NET_NAMESPACES);
+}
+
+static void dec_net_namespaces(struct ucounts *ucounts)
+{
+	dec_ucount(ucounts, UCOUNT_NET_NAMESPACES);
+}
+
 static struct kmem_cache *net_cachep;
 static struct workqueue_struct *netns_wq;
 
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
From: Shmulik Ladkani @ 2016-09-23 15:40 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev
In-Reply-To: <0037729a-a3fc-c1c9-a620-905c73e0b9d4@mojatatu.com>

On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > Even today, one may create loops using existing 'egress redirect',
> > e.g. this rediculously errorneous construct:
> >
> >  # ip l add v0 type veth peer name v0p
> >  # tc filter add dev v0p parent ffff: basic \
> >     action mirred egress redirect dev v0
> 
> I think we actually recover from this one by eventually
> dropping (theres a ttl field).

[off topic]

Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
and after one second I got:

# ip -s l show type veth
16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    71660305923 469890864 0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    3509       24       0       0       0       0       
17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    3509       24       0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    71660713017 469893555 0       0       0       0

> The other question is what to set skb->dev and skb->iif?
> Some information will be lost if you move around netdevs a
> bit.

[back to topic]

Good point.

Similarly to all constructs injecting skbs to device rx (bond/team,
vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
obligated to assign 'skb2->dev' as the new rx device.

Regarding 'skb2->skb_iif', original act_mirred code already has:

 	skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
 	skb2->dev = dev;                     <--- THIS IS TARGET DEV
	err = dev_queue_xmit(skb2);

I'm preserving this; OTOH the suggested modification in the patch is

-	err = dev_queue_xmit(skb2);
+	if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
+		err = dev_queue_xmit(skb2);
+	else
+		netif_receive_skb(skb2);

now, the call to 'netif_receive_skb' will eventually override skb_iif to
the target RX dev's index, upon entry to __netif_receive_skb_core.

I think this IS the expected behavior - as done by other "rx injection"
constructs.

My doubts were around whether we should call 'dev_forward_skb' instead
of 'netif_receive_skb'.
The former does some things I assumed we're not interested of, like
testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
OTOH, it DOES scrub the skb.
Maybe we should scrub it as well prior the netif_receive_skb call?

Thanks,
Shmulik

^ permalink raw reply

* device-tree support for writing to phy registers?
From: Tim Harvey @ 2016-09-23 15:40 UTC (permalink / raw)
  To: netdev

Greetings,

I've got a TI DP83867 GbE phy that requires some register writes to
configure its refclock output. Is there a generic device-tree API for
writing to raw registers or is that something that would be need to be
added to a specific phy driver with a device-tree binding? There is a
DP83867 phy driver but it doesn't contain anything related to
configuring its CLKOUT via register 0x170.

Alternatively, is it generally considered 'ok' to take care of this in
the bootloader and not provide the MAC driver the gpio for phy-reset
so that bootloader configuration persists through the kernel?

Regards,

Tim

^ permalink raw reply

* Re: [RFC] net: store port/representative id in metadata_dst
From: Jakub Kicinski @ 2016-09-23 15:29 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Benc, Jiri Pirko, netdev, Thomas Graf, Roopa Prabhu,
	ogerlitz, sridhar.samudrala, ast, daniel, simon.horman,
	Paolo Abeni, Pravin B Shelar, hannes, kubakici
In-Reply-To: <57E53ADE.10806@gmail.com>

On Fri, 23 Sep 2016 07:23:26 -0700, John Fastabend wrote:
> On 16-09-23 05:55 AM, Jakub Kicinski wrote:
> > On Fri, 23 Sep 2016 11:06:09 +0200, Jiri Benc wrote:  
> >> On Fri, 23 Sep 2016 08:34:29 +0200, Jiri Pirko wrote:  
> >>> So if I understand that correctly, this would need some "shared netdev"
> >>> which would effectively serve only as a sink for all port netdevices to
> >>> tx packets to. On RX, this would be completely avoided. This lower
> >>> device looks like half zombie to me.    
> >>
> >> Looks more like a quarter zombie. Even tx would not be allowed unless
> >> going through one of the ports, as all skbs without
> >> METADATA_HW_PORT_MUX metadata_dst would be dropped. But it would be
> >> possible to attach qdisc to the "lower" netdevice and it would actually
> >> have an effect. On rx this netdevice would be ignored completely. This
> >> is very weird behavior.
> >>  
> >>> I don't like it :( I wonder if the
> >>> solution would not be possible without this lower netdev.    
> >>
> >> I agree. This approach doesn't sound correct. The skbs should not be
> >> requeued.  
> > 
> > Thanks for the responses!  
> 
> Nice timing we were just thinking about this.
> 
> > 
> > I think SR-IOV NICs are coming at this problem from a different angle,
> > we already have a big, feature-full per-port netdevs and now we want to
> > spawn representators for VFs to handle fallback traffic.  This patch
> > would help us mux VFR traffic on all the queues of the physical port
> > netdevs (the ones which were already present in legacy mode, that's the
> > lower device).  
> 
> Yep, I like the idea in general. I had a slightly different approach in
> mind though. If you look at __dev_queue_xmit() there is a void
> accel_priv pointer (gather you found this based on your commit note).
> My take was we could extend this a bit so it can be used by the VFR
> devices and they could do a dev_queue_xmit_accel(). In this way there is
> no need to touch /net/core/{filter, dst, ip_tunnel}.c etc. Maybe the
> accel logic needs to be extended to push the priv pointer all the way
> through the xmit routine of the target netdev though. This should look
> a lot like the macvlan accelerated xmit device path without the
> switching logic.
> 
> Of course maybe the name would be extended to dev_queue_xmit_extended()
> or something.
> 
> So the flow on ingress would be,
> 
>   1. pkt_received_by_PF_netdev
>   2. PF_netdev reads some tag off packet/descriptor and sets correct
>      skb->dev field. This is needed so stack "sees" packets from
>      correct VF ports.
>   3. packet passed up to stack.
> 
> I guess it is a bit "zombie" like on the receive path because the packet
> is never actually handled by VF netdev code per se and on egress can
> traverse both the VFR and PF netdevs qdiscs. But on the other hand the
> VFR netdevs and PF netdevs are all in the same driver. Plus using a
> queue per VFR is a bit of a waste as its not needed and also hardware
> may not have any mechanism to push VF traffic onto a rx queue.
> 
> On egress,
> 
>   1. VFR xmit is called
>   2. VFR xmit calls dev_queue_xmit_accel() with some meta-data if needed
>      for the lower netdev
>   3. lower netdev sends out the packet.
> 
> Again we don't need to waste any queues for each VFR and the VFR can be
> a LLTX device. In this scheme I think you avoid much of the changes in
> your patch and keep it all contained in the driver. Any thoughts?

Goes without saying that you have a much better understanding of packet
scheduling so please bear with me :)  My target model is that I have
n_cpus x "n_tc/prio" queues on the PF and I want to transmit the
fallback traffic over those same queues.  So no new HW queues are used
for VFRs at all.  This is a reverse of macvlan offload which AFAICT has
"bastard hw queues" which actually TX for a separate software device.

My understanding was that I can rework this model to have software
queues for VFRs (#sw queues == #PF queues + #VFRs) but no extra HW
queues (#hw queues == #PF queues) but then when the driver sees a
packet on sw-only VFR queue it has to pick one of the PF queues (which
one?), lock PF software queue to own it, and only then can it
transmit.  With the dst_metadata there is no need for extra locking or
queue selection.

> To address 'I wonder if the solution can be done without this lower
> netdev' I think it can be but it creates two issues which I'm not sure
> have a good solution.
> 
> Without a lowerdev we either (a) give each VFR its own queue which I
> don't like because it complicates mgmt and uses resources or (b) we
> implicitly share queues. The later could be fine it just looks a bit
> cleaner IMO to make it explicit.
> 
> With regard to VF-PF flow rules if we allow matching on ingress port
> then can all your flow rules be pushed through the PF netdevices or
> if you want any of the VFR netdevs? After all I expsect the flow rule
> table is actually a shared resource between all attached ports.

With the VF-PF forwarding rules I was just inching towards re-opening
the discussion on whether there should be an CPU port netdev.  I guess
there are good reasons why there isn't so maybe let's not go there :)
The meaning of PF netdevs in SR-IOV switchdev mode is "external ports"
AFAICT which could make it cumbersome to reach the host.

^ permalink raw reply

* Re: [PATCH V3 1/3] Documentation: devicetree: add qca8k binding
From: Rob Herring @ 2016-09-23 15:23 UTC (permalink / raw)
  To: John Crispin
  Cc: David S. Miller, Andrew Lunn, Florian Fainelli,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, qsdk-review-A+ZNKFmMK5xy9aJCnZT0Uw,
	devicetree-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1473949601-20674-2-git-send-email-john-Pj+rj9U5foFAfugRpC6u6w@public.gmane.org>

On Thu, Sep 15, 2016 at 04:26:39PM +0200, John Crispin wrote:
> Add device-tree binding for ar8xxx switch families.
> 
> Cc: devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Signed-off-by: John Crispin <john-Pj+rj9U5foFAfugRpC6u6w@public.gmane.org>
> ---
> Changes in V2
> * fixup example to include phy nodes and corresponding phandles
> * add a note explaining why we need to phy nodes
> 
> Changes in V3
> * add note stating that the cpu port is always 0
> 
>  .../devicetree/bindings/net/dsa/qca8k.txt          |   89 ++++++++++++++++++++
>  1 file changed, 89 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/dsa/qca8k.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
> new file mode 100644
> index 0000000..9c67ee4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
> @@ -0,0 +1,89 @@
> +* Qualcomm Atheros QCA8xxx switch family
> +
> +Required properties:
> +
> +- compatible: should be "qca,qca8337"
> +- #size-cells: must be 0
> +- #address-cells: must be 1
> +
> +Subnodes:
> +
> +The integrated switch subnode should be specified according to the binding
> +described in dsa/dsa.txt. As the QCA8K switches do not have a N:N mapping of
> +port and PHY id, each subnode describing a port needs to have a valid phandle
> +referencing the internal PHY connected to it. The CPU port of this switch is
> +always port 0.
> +
> +Example:
> +
> +
> +	&mdio0 {
> +		phy_port1: phy@0 {
> +			reg = <0>;
> +		};
> +
> +		phy_port2: phy@1 {
> +			reg = <1>;
> +		};
> +
> +		phy_port3: phy@2 {
> +			reg = <2>;
> +		};
> +
> +		phy_port4: phy@3 {
> +			reg = <3>;
> +		};
> +
> +		phy_port5: phy@4 {
> +			reg = <4>;
> +		};
> +
> +		switch0@0 {

The unit address here is the mdio device address and should be unique. 
You have 2 devices at 0.

> +			compatible = "qca,qca8337";
> +			#address-cells = <1>;
> +			#size-cells = <0>;
> +
> +			reg = <0>;

Not documented.

> +
> +			ports {
> +				#address-cells = <1>;
> +				#size-cells = <0>;
> +				port@0 {
> +					reg = <0>;
> +					label = "cpu";
> +					ethernet = <&gmac1>;
> +					phy-mode = "rgmii";
> +				};
> +
> +				port@1 {
> +					reg = <1>;
> +					label = "lan1";
> +					phy-handle = <&phy_port1>;
> +				};
> +
> +				port@2 {
> +					reg = <2>;
> +					label = "lan2";
> +					phy-handle = <&phy_port2>;
> +				};
> +
> +				port@3 {
> +					reg = <3>;
> +					label = "lan3";
> +					phy-handle = <&phy_port3>;
> +				};
> +
> +				port@4 {
> +					reg = <4>;
> +					label = "lan4";
> +					phy-handle = <&phy_port4>;
> +				};
> +
> +				port@5 {
> +					reg = <5>;
> +					label = "wan";
> +					phy-handle = <&phy_port5>;
> +				};
> +			};
> +		};
> +	};
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next 15/15] rxrpc: Add a tracepoint to log which packets will be retransmitted
From: David Howells @ 2016-09-23 15:17 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

Add a tracepoint to log in rxrpc_resend() which packets will be
retransmitted.  Note that if a positive ACK comes in whilst we have dropped
the lock to retransmit another packet, the actual retransmission may not
happen, though some of the effects will (such as altering the congestion
management).

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/trace/events/rxrpc.h |   27 +++++++++++++++++++++++++++
 net/rxrpc/call_event.c       |    2 ++
 2 files changed, 29 insertions(+)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index d67a8c6b085a..56475497043d 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -543,6 +543,33 @@ TRACE_EVENT(rxrpc_propose_ack,
 		      rxrpc_propose_ack_outcomes[__entry->outcome])
 	    );
 
+TRACE_EVENT(rxrpc_retransmit,
+	    TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq, u8 annotation,
+		     s64 expiry),
+
+	    TP_ARGS(call, seq, annotation, expiry),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,	call		)
+		    __field(rxrpc_seq_t,		seq		)
+		    __field(u8,				annotation	)
+		    __field(s64,			expiry		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call = call;
+		    __entry->seq = seq;
+		    __entry->annotation = annotation;
+		    __entry->expiry = expiry;
+			   ),
+
+	    TP_printk("c=%p q=%x a=%02x xp=%lld",
+		      __entry->call,
+		      __entry->seq,
+		      __entry->annotation,
+		      __entry->expiry)
+	    );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index fd5b11339ffb..a78a92fe5d77 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -196,6 +196,8 @@ static void rxrpc_resend(struct rxrpc_call *call)
 
 		/* Okay, we need to retransmit a packet. */
 		call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS | annotation;
+		trace_rxrpc_retransmit(call, seq, annotation | anno_type,
+				       ktime_to_ns(ktime_sub(skb->tstamp, max_age)));
 	}
 
 	resend_at = ktime_sub(ktime_add_ms(oldest, rxrpc_resend_timeout), now);

^ permalink raw reply related

* [PATCH net-next 14/15] rxrpc: Add tracepoint for ACK proposal
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

Add a tracepoint to log proposed ACKs, including whether the proposal is
used to update a pending ACK or is discarded in favour of an easlier,
higher priority ACK.

Whilst we're at it, get rid of the rxrpc_acks() function and access the
name array directly.  We do, however, need to validate the ACK reason
number given to trace_rxrpc_rx_ack() to make sure we don't overrun the
array.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/rxrpc/packet.h       |    1 +
 include/trace/events/rxrpc.h |   42 ++++++++++++++++++++++++++++++++++++++++--
 net/rxrpc/ar-internal.h      |   25 +++++++++++++++++++++++--
 net/rxrpc/call_event.c       |   21 ++++++++++++++-------
 net/rxrpc/input.c            |   19 +++++++++++++------
 net/rxrpc/misc.c             |   30 +++++++++++++++++++-----------
 net/rxrpc/output.c           |    3 ++-
 net/rxrpc/recvmsg.c          |    3 ++-
 8 files changed, 114 insertions(+), 30 deletions(-)

diff --git a/include/rxrpc/packet.h b/include/rxrpc/packet.h
index fd6eb3a60a8c..703a64b4681a 100644
--- a/include/rxrpc/packet.h
+++ b/include/rxrpc/packet.h
@@ -123,6 +123,7 @@ struct rxrpc_ackpacket {
 #define RXRPC_ACK_PING_RESPONSE		7	/* response to RXRPC_ACK_PING */
 #define RXRPC_ACK_DELAY			8	/* nothing happened since received packet */
 #define RXRPC_ACK_IDLE			9	/* ACK due to fully received ACK window */
+#define RXRPC_ACK__INVALID		10	/* Representation of invalid ACK reason */
 
 	uint8_t		nAcks;		/* number of ACKs */
 #define RXRPC_MAXACKS	255
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 9413b17ba04b..d67a8c6b085a 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -251,7 +251,7 @@ TRACE_EVENT(rxrpc_rx_ack,
 
 	    TP_printk("c=%p %s f=%08x n=%u",
 		      __entry->call,
-		      rxrpc_acks(__entry->reason),
+		      rxrpc_ack_names[__entry->reason],
 		      __entry->first,
 		      __entry->n_acks)
 	    );
@@ -314,7 +314,7 @@ TRACE_EVENT(rxrpc_tx_ack,
 	    TP_printk(" c=%p ACK  %08x %s f=%08x r=%08x n=%u",
 		      __entry->call,
 		      __entry->serial,
-		      rxrpc_acks(__entry->reason),
+		      rxrpc_ack_names[__entry->reason],
 		      __entry->ack_first,
 		      __entry->ack_serial,
 		      __entry->n_acks)
@@ -505,6 +505,44 @@ TRACE_EVENT(rxrpc_rx_lose,
 		      __entry->hdr.type <= 15 ? rxrpc_pkts[__entry->hdr.type] : "?UNK")
 	    );
 
+TRACE_EVENT(rxrpc_propose_ack,
+	    TP_PROTO(struct rxrpc_call *call, enum rxrpc_propose_ack_trace why,
+		     u8 ack_reason, rxrpc_serial_t serial, bool immediate,
+		     bool background, enum rxrpc_propose_ack_outcome outcome),
+
+	    TP_ARGS(call, why, ack_reason, serial, immediate, background,
+		    outcome),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,		call		)
+		    __field(enum rxrpc_propose_ack_trace,	why		)
+		    __field(rxrpc_serial_t,			serial		)
+		    __field(u8,					ack_reason	)
+		    __field(bool,				immediate	)
+		    __field(bool,				background	)
+		    __field(enum rxrpc_propose_ack_outcome,	outcome		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call	= call;
+		    __entry->why	= why;
+		    __entry->serial	= serial;
+		    __entry->ack_reason	= ack_reason;
+		    __entry->immediate	= immediate;
+		    __entry->background	= background;
+		    __entry->outcome	= outcome;
+			   ),
+
+	    TP_printk("c=%p %s %s r=%08x i=%u b=%u%s",
+		      __entry->call,
+		      rxrpc_propose_ack_traces[__entry->why],
+		      rxrpc_ack_names[__entry->ack_reason],
+		      __entry->serial,
+		      __entry->immediate,
+		      __entry->background,
+		      rxrpc_propose_ack_outcomes[__entry->outcome])
+	    );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index e564eca75985..042dbcc52654 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -689,8 +689,28 @@ enum rxrpc_timer_trace {
 
 extern const char rxrpc_timer_traces[rxrpc_timer__nr_trace][8];
 
+enum rxrpc_propose_ack_trace {
+	rxrpc_propose_ack_input_data,
+	rxrpc_propose_ack_ping_for_params,
+	rxrpc_propose_ack_respond_to_ack,
+	rxrpc_propose_ack_respond_to_ping,
+	rxrpc_propose_ack_retry_tx,
+	rxrpc_propose_ack_terminal_ack,
+	rxrpc_propose_ack__nr_trace
+};
+
+enum rxrpc_propose_ack_outcome {
+	rxrpc_propose_ack_use,
+	rxrpc_propose_ack_update,
+	rxrpc_propose_ack_subsume,
+	rxrpc_propose_ack__nr_outcomes
+};
+
+extern const char rxrpc_propose_ack_traces[rxrpc_propose_ack__nr_trace][8];
+extern const char *const rxrpc_propose_ack_outcomes[rxrpc_propose_ack__nr_outcomes];
+
 extern const char *const rxrpc_pkts[];
-extern const char *rxrpc_acks(u8 reason);
+extern const char const rxrpc_ack_names[RXRPC_ACK__INVALID + 1][4];
 
 #include <trace/events/rxrpc.h>
 
@@ -719,7 +739,8 @@ int rxrpc_reject_call(struct rxrpc_sock *);
  * call_event.c
  */
 void rxrpc_set_timer(struct rxrpc_call *, enum rxrpc_timer_trace);
-void rxrpc_propose_ACK(struct rxrpc_call *, u8, u16, u32, bool, bool);
+void rxrpc_propose_ACK(struct rxrpc_call *, u8, u16, u32, bool, bool,
+		       enum rxrpc_propose_ack_trace);
 void rxrpc_process_call(struct work_struct *);
 
 /*
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 90e970ba048a..fd5b11339ffb 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -58,14 +58,13 @@ out:
  */
 static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
 				u16 skew, u32 serial, bool immediate,
-				bool background)
+				bool background,
+				enum rxrpc_propose_ack_trace why)
 {
+	enum rxrpc_propose_ack_outcome outcome = rxrpc_propose_ack_use;
 	unsigned long now, ack_at, expiry = rxrpc_soft_ack_delay;
 	s8 prior = rxrpc_ack_priority[ack_reason];
 
-	_enter("{%d},%s,%%%x,%u",
-	       call->debug_id, rxrpc_acks(ack_reason), serial, immediate);
-
 	/* Update DELAY, IDLE, REQUESTED and PING_RESPONSE ACK serial
 	 * numbers, but we don't alter the timeout.
 	 */
@@ -74,15 +73,18 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
 	       call->ackr_reason, rxrpc_ack_priority[call->ackr_reason]);
 	if (ack_reason == call->ackr_reason) {
 		if (RXRPC_ACK_UPDATEABLE & (1 << ack_reason)) {
+			outcome = rxrpc_propose_ack_update;
 			call->ackr_serial = serial;
 			call->ackr_skew = skew;
 		}
 		if (!immediate)
-			return;
+			goto trace;
 	} else if (prior > rxrpc_ack_priority[call->ackr_reason]) {
 		call->ackr_reason = ack_reason;
 		call->ackr_serial = serial;
 		call->ackr_skew = skew;
+	} else {
+		outcome = rxrpc_propose_ack_subsume;
 	}
 
 	switch (ack_reason) {
@@ -124,17 +126,22 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
 			rxrpc_set_timer(call, rxrpc_timer_set_for_ack);
 		}
 	}
+
+trace:
+	trace_rxrpc_propose_ack(call, why, ack_reason, serial, immediate,
+				background, outcome);
 }
 
 /*
  * propose an ACK be sent, locking the call structure
  */
 void rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
-		       u16 skew, u32 serial, bool immediate, bool background)
+		       u16 skew, u32 serial, bool immediate, bool background,
+		       enum rxrpc_propose_ack_trace why)
 {
 	spin_lock_bh(&call->lock);
 	__rxrpc_propose_ACK(call, ack_reason, skew, serial,
-			    immediate, background);
+			    immediate, background, why);
 	spin_unlock_bh(&call->lock);
 }
 
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 19b1e189f5dc..349698d87ad1 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -49,7 +49,8 @@ static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb,
 	if (call->peer->rtt_usage < 3 ||
 	    ktime_before(ktime_add_ms(call->peer->rtt_last_req, 1000), now))
 		rxrpc_propose_ACK(call, RXRPC_ACK_PING, skew, sp->hdr.serial,
-				  true, true);
+				  true, true,
+				  rxrpc_propose_ack_ping_for_params);
 }
 
 /*
@@ -382,7 +383,8 @@ skip:
 ack:
 	if (ack)
 		rxrpc_propose_ACK(call, ack, skew, ack_serial,
-				  immediate_ack, true);
+				  immediate_ack, true,
+				  rxrpc_propose_ack_input_data);
 
 	if (sp->hdr.seq == READ_ONCE(call->rx_hard_ack) + 1)
 		rxrpc_notify_socket(call);
@@ -539,6 +541,7 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
 static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
 			    u16 skew)
 {
+	u8 ack_reason;
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	union {
 		struct rxrpc_ackpacket ack;
@@ -561,8 +564,10 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
 	first_soft_ack = ntohl(buf.ack.firstPacket);
 	hard_ack = first_soft_ack - 1;
 	nr_acks = buf.ack.nAcks;
+	ack_reason = (buf.ack.reason < RXRPC_ACK__INVALID ?
+		      buf.ack.reason : RXRPC_ACK__INVALID);
 
-	trace_rxrpc_rx_ack(call, first_soft_ack, buf.ack.reason, nr_acks);
+	trace_rxrpc_rx_ack(call, first_soft_ack, ack_reason, nr_acks);
 
 	_proto("Rx ACK %%%u { m=%hu f=#%u p=#%u s=%%%u r=%s n=%u }",
 	       sp->hdr.serial,
@@ -570,7 +575,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
 	       first_soft_ack,
 	       ntohl(buf.ack.previousPacket),
 	       acked_serial,
-	       rxrpc_acks(buf.ack.reason),
+	       rxrpc_ack_names[ack_reason],
 	       buf.ack.nAcks);
 
 	if (buf.ack.reason == RXRPC_ACK_PING_RESPONSE)
@@ -583,10 +588,12 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
 	if (buf.ack.reason == RXRPC_ACK_PING) {
 		_proto("Rx ACK %%%u PING Request", sp->hdr.serial);
 		rxrpc_propose_ACK(call, RXRPC_ACK_PING_RESPONSE,
-				  skew, sp->hdr.serial, true, true);
+				  skew, sp->hdr.serial, true, true,
+				  rxrpc_propose_ack_respond_to_ping);
 	} else if (sp->hdr.flags & RXRPC_REQUEST_ACK) {
 		rxrpc_propose_ACK(call, RXRPC_ACK_REQUESTED,
-				  skew, sp->hdr.serial, true, true);
+				  skew, sp->hdr.serial, true, true,
+				  rxrpc_propose_ack_respond_to_ack);
 	}
 
 	offset = sp->offset + nr_acks + 3;
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index fa9942fabdf2..1ca14835d87f 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -91,17 +91,10 @@ const s8 rxrpc_ack_priority[] = {
 	[RXRPC_ACK_PING]		= 9,
 };
 
-const char *rxrpc_acks(u8 reason)
-{
-	static const char *const str[] = {
-		"---", "REQ", "DUP", "OOS", "WIN", "MEM", "PNG", "PNR", "DLY",
-		"IDL", "-?-"
-	};
-
-	if (reason >= ARRAY_SIZE(str))
-		reason = ARRAY_SIZE(str) - 1;
-	return str[reason];
-}
+const char const rxrpc_ack_names[RXRPC_ACK__INVALID + 1][4] = {
+	"---", "REQ", "DUP", "OOS", "WIN", "MEM", "PNG", "PNR", "DLY",
+	"IDL", "-?-"
+};
 
 const char rxrpc_skb_traces[rxrpc_skb__nr_trace][7] = {
 	[rxrpc_skb_rx_cleaned]		= "Rx CLN",
@@ -202,3 +195,18 @@ const char rxrpc_timer_traces[rxrpc_timer__nr_trace][8] = {
 	[rxrpc_timer_set_for_send]		= "SetTx ",
 	[rxrpc_timer_set_for_resend]		= "SetRTx",
 };
+
+const char rxrpc_propose_ack_traces[rxrpc_propose_ack__nr_trace][8] = {
+	[rxrpc_propose_ack_input_data]		= "DataIn ",
+	[rxrpc_propose_ack_ping_for_params]	= "Params ",
+	[rxrpc_propose_ack_respond_to_ack]	= "Rsp2Ack",
+	[rxrpc_propose_ack_respond_to_ping]	= "Rsp2Png",
+	[rxrpc_propose_ack_retry_tx]		= "RetryTx",
+	[rxrpc_propose_ack_terminal_ack]	= "ClTerm ",
+};
+
+const char *const rxrpc_propose_ack_outcomes[rxrpc_propose_ack__nr_outcomes] = {
+	[rxrpc_propose_ack_use]			= "",
+	[rxrpc_propose_ack_update]		= " Update",
+	[rxrpc_propose_ack_subsume]		= " Subsume",
+};
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index e47fbd1c836d..0c563e325c9d 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -210,7 +210,8 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 			rxrpc_propose_ACK(call, pkt->ack.reason,
 					  ntohs(pkt->ack.maxSkew),
 					  ntohl(pkt->ack.serial),
-					  true, true);
+					  true, true,
+					  rxrpc_propose_ack_retry_tx);
 			break;
 		case RXRPC_PACKET_TYPE_ABORT:
 			break;
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 99e4c0ae30f1..8c7f3de45bac 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -141,7 +141,8 @@ static void rxrpc_end_rx_phase(struct rxrpc_call *call)
 	ASSERTCMP(call->rx_hard_ack, ==, call->rx_top);
 
 	if (call->state == RXRPC_CALL_CLIENT_RECV_REPLY) {
-		rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, true, false);
+		rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, true, false,
+				  rxrpc_propose_ack_terminal_ack);
 		rxrpc_send_call_packet(call, RXRPC_PACKET_TYPE_ACK);
 	}
 

^ permalink raw reply related

* [PATCH net-next 13/15] rxrpc: Add a tracepoint to log injected Rx packet loss
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

Add a tracepoint to log received packets that get discarded due to Rx
packet loss.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/trace/events/rxrpc.h |   21 +++++++++++++++++++++
 net/rxrpc/input.c            |   11 +++++------
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 6001bf93dc79..9413b17ba04b 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -484,6 +484,27 @@ TRACE_EVENT(rxrpc_timer,
 		      __entry->timer - __entry->now)
 	    );
 
+TRACE_EVENT(rxrpc_rx_lose,
+	    TP_PROTO(struct rxrpc_skb_priv *sp),
+
+	    TP_ARGS(sp),
+
+	    TP_STRUCT__entry(
+		    __field_struct(struct rxrpc_host_header,	hdr		)
+			     ),
+
+	    TP_fast_assign(
+		    memcpy(&__entry->hdr, &sp->hdr, sizeof(__entry->hdr));
+			   ),
+
+	    TP_printk("%08x:%08x:%08x:%04x %08x %08x %02x %02x %s *LOSE*",
+		      __entry->hdr.epoch, __entry->hdr.cid,
+		      __entry->hdr.callNumber, __entry->hdr.serviceId,
+		      __entry->hdr.serial, __entry->hdr.seq,
+		      __entry->hdr.type, __entry->hdr.flags,
+		      __entry->hdr.type <= 15 ? rxrpc_pkts[__entry->hdr.type] : "?UNK")
+	    );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index fb3e2f6afa3b..19b1e189f5dc 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -837,20 +837,19 @@ void rxrpc_data_ready(struct sock *udp_sk)
 	skb_orphan(skb);
 	sp = rxrpc_skb(skb);
 
+	/* dig out the RxRPC connection details */
+	if (rxrpc_extract_header(sp, skb) < 0)
+		goto bad_message;
+
 	if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
 		static int lose;
 		if ((lose++ & 7) == 7) {
+			trace_rxrpc_rx_lose(sp);
 			rxrpc_lose_skb(skb, rxrpc_skb_rx_lost);
 			return;
 		}
 	}
 
-	_net("Rx UDP packet from %08x:%04hu",
-	     ntohl(ip_hdr(skb)->saddr), ntohs(udp_hdr(skb)->source));
-
-	/* dig out the RxRPC connection details */
-	if (rxrpc_extract_header(sp, skb) < 0)
-		goto bad_message;
 	trace_rxrpc_rx_packet(sp);
 
 	_net("Rx RxRPC %s ep=%x call=%x:%x",

^ permalink raw reply related

* [PATCH net-next 12/15] rxrpc: Add data Tx tracepoint and adjust Tx ACK tracepoint
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

Add a tracepoint to log transmission of DATA packets (including loss
injection).

Adjust the ACK transmission tracepoint to include the packet serial number
and to line this up with the DATA transmission display.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/trace/events/rxrpc.h |   50 +++++++++++++++++++++++++++++++++++-------
 net/rxrpc/conn_event.c       |    5 ++--
 net/rxrpc/output.c           |    5 +++-
 3 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 57322897d745..6001bf93dc79 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -256,33 +256,67 @@ TRACE_EVENT(rxrpc_rx_ack,
 		      __entry->n_acks)
 	    );
 
+TRACE_EVENT(rxrpc_tx_data,
+	    TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t seq,
+		     rxrpc_serial_t serial, u8 flags, bool lose),
+
+	    TP_ARGS(call, seq, serial, flags, lose),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,	call		)
+		    __field(rxrpc_seq_t,		seq		)
+		    __field(rxrpc_serial_t,		serial		)
+		    __field(u8,				flags		)
+		    __field(bool,			lose		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call = call;
+		    __entry->seq = seq;
+		    __entry->serial = serial;
+		    __entry->flags = flags;
+		    __entry->lose = lose;
+			   ),
+
+	    TP_printk("c=%p DATA %08x q=%08x fl=%02x%s",
+		      __entry->call,
+		      __entry->serial,
+		      __entry->seq,
+		      __entry->flags,
+		      __entry->lose ? " *LOSE*" : "")
+	    );
+
 TRACE_EVENT(rxrpc_tx_ack,
-	    TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t first,
-		     rxrpc_serial_t serial, u8 reason, u8 n_acks),
+	    TP_PROTO(struct rxrpc_call *call, rxrpc_serial_t serial,
+		     rxrpc_seq_t ack_first, rxrpc_serial_t ack_serial,
+		     u8 reason, u8 n_acks),
 
-	    TP_ARGS(call, first, serial, reason, n_acks),
+	    TP_ARGS(call, serial, ack_first, ack_serial, reason, n_acks),
 
 	    TP_STRUCT__entry(
 		    __field(struct rxrpc_call *,	call		)
-		    __field(rxrpc_seq_t,		first		)
 		    __field(rxrpc_serial_t,		serial		)
+		    __field(rxrpc_seq_t,		ack_first	)
+		    __field(rxrpc_serial_t,		ack_serial	)
 		    __field(u8,				reason		)
 		    __field(u8,				n_acks		)
 			     ),
 
 	    TP_fast_assign(
 		    __entry->call = call;
-		    __entry->first = first;
 		    __entry->serial = serial;
+		    __entry->ack_first = ack_first;
+		    __entry->ack_serial = ack_serial;
 		    __entry->reason = reason;
 		    __entry->n_acks = n_acks;
 			   ),
 
-	    TP_printk("c=%p %s f=%08x r=%08x n=%u",
+	    TP_printk(" c=%p ACK  %08x %s f=%08x r=%08x n=%u",
 		      __entry->call,
-		      rxrpc_acks(__entry->reason),
-		      __entry->first,
 		      __entry->serial,
+		      rxrpc_acks(__entry->reason),
+		      __entry->ack_first,
+		      __entry->ack_serial,
 		      __entry->n_acks)
 	    );
 
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index 75a15a4c74c3..a1cf1ec5f29e 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -98,9 +98,6 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
 		pkt.info.rwind		= htonl(rxrpc_rx_window_size);
 		pkt.info.jumbo_max	= htonl(rxrpc_rx_jumbo_max);
 		len += sizeof(pkt.ack) + sizeof(pkt.info);
-
-		trace_rxrpc_tx_ack(NULL, chan->last_seq, 0,
-				   RXRPC_ACK_DUPLICATE, 0);
 		break;
 	}
 
@@ -122,6 +119,8 @@ static void rxrpc_conn_retransmit_call(struct rxrpc_connection *conn,
 		_proto("Tx ABORT %%%u { %d } [re]", serial, conn->local_abort);
 		break;
 	case RXRPC_PACKET_TYPE_ACK:
+		trace_rxrpc_tx_ack(NULL, serial, chan->last_seq, 0,
+				   RXRPC_ACK_DUPLICATE, 0);
 		_proto("Tx ACK %%%u [re]", serial);
 		break;
 	}
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 5c1e008a5323..e47fbd1c836d 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -177,7 +177,7 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 	pkt->whdr.serial = htonl(serial);
 	switch (type) {
 	case RXRPC_PACKET_TYPE_ACK:
-		trace_rxrpc_tx_ack(call,
+		trace_rxrpc_tx_ack(call, serial,
 				   ntohl(pkt->ack.firstPacket),
 				   ntohl(pkt->ack.serial),
 				   pkt->ack.reason, pkt->ack.nAcks);
@@ -275,6 +275,8 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb)
 	if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
 		static int lose;
 		if ((lose++ & 7) == 7) {
+			trace_rxrpc_tx_data(call, sp->hdr.seq, serial,
+					    whdr.flags, true);
 			rxrpc_lose_skb(skb, rxrpc_skb_tx_lost);
 			_leave(" = 0 [lose]");
 			return 0;
@@ -302,6 +304,7 @@ int rxrpc_send_data_packet(struct rxrpc_call *call, struct sk_buff *skb)
 		goto send_fragmentable;
 
 done:
+	trace_rxrpc_tx_data(call, sp->hdr.seq, serial, whdr.flags, false);
 	if (ret >= 0) {
 		ktime_t now = ktime_get_real();
 		skb->tstamp = now;

^ permalink raw reply related

* [PATCH net-next 11/15] rxrpc: Add a tracepoint for the call timer
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

Add a tracepoint to log call timer initiation, setting and expiry.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/trace/events/rxrpc.h |   36 ++++++++++++++++++++++++++++++++++++
 net/rxrpc/ar-internal.h      |   13 ++++++++++++-
 net/rxrpc/call_event.c       |    7 ++++---
 net/rxrpc/call_object.c      |    6 ++++--
 net/rxrpc/misc.c             |    8 ++++++++
 net/rxrpc/sendmsg.c          |    2 +-
 6 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index e8f2afbbe0bf..57322897d745 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -414,6 +414,42 @@ TRACE_EVENT(rxrpc_rtt_rx,
 		      __entry->avg)
 	    );
 
+TRACE_EVENT(rxrpc_timer,
+	    TP_PROTO(struct rxrpc_call *call, enum rxrpc_timer_trace why,
+		     unsigned long now),
+
+	    TP_ARGS(call, why, now),
+
+	    TP_STRUCT__entry(
+		    __field(struct rxrpc_call *,		call		)
+		    __field(enum rxrpc_timer_trace,		why		)
+		    __field(unsigned long,			now		)
+		    __field(unsigned long,			expire_at	)
+		    __field(unsigned long,			ack_at		)
+		    __field(unsigned long,			resend_at	)
+		    __field(unsigned long,			timer		)
+			     ),
+
+	    TP_fast_assign(
+		    __entry->call	= call;
+		    __entry->why	= why;
+		    __entry->now	= now;
+		    __entry->expire_at	= call->expire_at;
+		    __entry->ack_at	= call->ack_at;
+		    __entry->resend_at	= call->resend_at;
+		    __entry->timer	= call->timer.expires;
+			   ),
+
+	    TP_printk("c=%p %s now=%lx x=%ld a=%ld r=%ld t=%ld",
+		      __entry->call,
+		      rxrpc_timer_traces[__entry->why],
+		      __entry->now,
+		      __entry->expire_at - __entry->now,
+		      __entry->ack_at - __entry->now,
+		      __entry->resend_at - __entry->now,
+		      __entry->timer - __entry->now)
+	    );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index a494d56eb236..e564eca75985 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -678,6 +678,17 @@ enum rxrpc_rtt_rx_trace {
 
 extern const char rxrpc_rtt_rx_traces[rxrpc_rtt_rx__nr_trace][5];
 
+enum rxrpc_timer_trace {
+	rxrpc_timer_begin,
+	rxrpc_timer_expired,
+	rxrpc_timer_set_for_ack,
+	rxrpc_timer_set_for_resend,
+	rxrpc_timer_set_for_send,
+	rxrpc_timer__nr_trace
+};
+
+extern const char rxrpc_timer_traces[rxrpc_timer__nr_trace][8];
+
 extern const char *const rxrpc_pkts[];
 extern const char *rxrpc_acks(u8 reason);
 
@@ -707,7 +718,7 @@ int rxrpc_reject_call(struct rxrpc_sock *);
 /*
  * call_event.c
  */
-void rxrpc_set_timer(struct rxrpc_call *);
+void rxrpc_set_timer(struct rxrpc_call *, enum rxrpc_timer_trace);
 void rxrpc_propose_ACK(struct rxrpc_call *, u8, u16, u32, bool, bool);
 void rxrpc_process_call(struct work_struct *);
 
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 8bc5c8e37ab4..90e970ba048a 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -24,7 +24,7 @@
 /*
  * Set the timer
  */
-void rxrpc_set_timer(struct rxrpc_call *call)
+void rxrpc_set_timer(struct rxrpc_call *call, enum rxrpc_timer_trace why)
 {
 	unsigned long t, now = jiffies;
 
@@ -45,6 +45,7 @@ void rxrpc_set_timer(struct rxrpc_call *call)
 
 		if (call->timer.expires != t || !timer_pending(&call->timer)) {
 			mod_timer(&call->timer, t);
+			trace_rxrpc_timer(call, why, now);
 		}
 	}
 
@@ -120,7 +121,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
 		_debug("deferred ACK %ld < %ld", expiry, call->ack_at - now);
 		if (time_before(ack_at, call->ack_at)) {
 			call->ack_at = ack_at;
-			rxrpc_set_timer(call);
+			rxrpc_set_timer(call, rxrpc_timer_set_for_ack);
 		}
 	}
 }
@@ -293,7 +294,7 @@ recheck_state:
 		goto recheck_state;
 	}
 
-	rxrpc_set_timer(call);
+	rxrpc_set_timer(call, rxrpc_timer_set_for_resend);
 
 	/* other events may have been raised since we started checking */
 	if (call->events && call->state < RXRPC_CALL_COMPLETE) {
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index f2fadf667e19..a53f4c2c0025 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -76,8 +76,10 @@ static void rxrpc_call_timer_expired(unsigned long _call)
 
 	_enter("%d", call->debug_id);
 
-	if (call->state < RXRPC_CALL_COMPLETE)
+	if (call->state < RXRPC_CALL_COMPLETE) {
+		trace_rxrpc_timer(call, rxrpc_timer_expired, jiffies);
 		rxrpc_queue_call(call);
+	}
 }
 
 /*
@@ -200,7 +202,7 @@ static void rxrpc_start_call_timer(struct rxrpc_call *call)
 	call->ack_at = expire_at;
 	call->resend_at = expire_at;
 	call->timer.expires = expire_at + 1;
-	rxrpc_set_timer(call);
+	rxrpc_set_timer(call, rxrpc_timer_begin);
 }
 
 /*
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index fe648711c2f7..fa9942fabdf2 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -194,3 +194,11 @@ const char rxrpc_rtt_rx_traces[rxrpc_rtt_rx__nr_trace][5] = {
 	[rxrpc_rtt_rx_ping_response]	= "PONG",
 	[rxrpc_rtt_rx_requested_ack]	= "RACK",
 };
+
+const char rxrpc_timer_traces[rxrpc_timer__nr_trace][8] = {
+	[rxrpc_timer_begin]			= "Begin ",
+	[rxrpc_timer_expired]			= "*EXPR*",
+	[rxrpc_timer_set_for_ack]		= "SetAck",
+	[rxrpc_timer_set_for_send]		= "SetTx ",
+	[rxrpc_timer_set_for_resend]		= "SetRTx",
+};
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 93e6584cd751..99939372b5a4 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -153,7 +153,7 @@ static void rxrpc_queue_packet(struct rxrpc_call *call, struct sk_buff *skb,
 
 		if (time_before(resend_at, call->resend_at)) {
 			call->resend_at = resend_at;
-			rxrpc_set_timer(call);
+			rxrpc_set_timer(call, rxrpc_timer_set_for_send);
 		}
 	}
 

^ permalink raw reply related

* [PATCH net-next 10/15] rxrpc: Don't call the tx_ack tracepoint if don't generate an ACK
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

rxrpc_send_call_packet() is invoking the tx_ack tracepoint before it checks
whether there's an ACK to transmit (another thread may jump in and transmit
it).

Fix this by only invoking the tracepoint if we get a valid ACK to transmit.

Further, only allocate a serial number if we're going to actually transmit
something.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/output.c |   26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 282cb1e36d06..5c1e008a5323 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -80,9 +80,6 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
 	pkt->ackinfo.rwind	= htonl(call->rx_winsize);
 	pkt->ackinfo.jumbo_max	= htonl(jmax);
 
-	trace_rxrpc_tx_ack(call, hard_ack + 1, serial, call->ackr_reason,
-			   top - hard_ack);
-
 	*ackp++ = 0;
 	*ackp++ = 0;
 	*ackp++ = 0;
@@ -119,8 +116,6 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 		return -ENOMEM;
 	}
 
-	serial = atomic_inc_return(&conn->serial);
-
 	msg.msg_name	= &call->peer->srx.transport;
 	msg.msg_namelen	= call->peer->srx.transport_len;
 	msg.msg_control	= NULL;
@@ -131,7 +126,6 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 	pkt->whdr.cid		= htonl(call->cid);
 	pkt->whdr.callNumber	= htonl(call->call_id);
 	pkt->whdr.seq		= 0;
-	pkt->whdr.serial	= htonl(serial);
 	pkt->whdr.type		= type;
 	pkt->whdr.flags		= conn->out_clientflag;
 	pkt->whdr.userStatus	= 0;
@@ -157,14 +151,6 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 
 		spin_unlock_bh(&call->lock);
 
-		_proto("Tx ACK %%%u { m=%hu f=#%u p=#%u s=%%%u r=%s n=%u }",
-		       serial,
-		       ntohs(pkt->ack.maxSkew),
-		       ntohl(pkt->ack.firstPacket),
-		       ntohl(pkt->ack.previousPacket),
-		       ntohl(pkt->ack.serial),
-		       rxrpc_acks(pkt->ack.reason),
-		       pkt->ack.nAcks);
 
 		iov[0].iov_len += sizeof(pkt->ack) + n;
 		iov[1].iov_base = &pkt->ackinfo;
@@ -176,7 +162,6 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 	case RXRPC_PACKET_TYPE_ABORT:
 		abort_code = call->abort_code;
 		pkt->abort_code = htonl(abort_code);
-		_proto("Tx ABORT %%%u { %d }", serial, abort_code);
 		iov[0].iov_len += sizeof(pkt->abort_code);
 		len += sizeof(pkt->abort_code);
 		ioc = 1;
@@ -188,6 +173,17 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 		goto out;
 	}
 
+	serial = atomic_inc_return(&conn->serial);
+	pkt->whdr.serial = htonl(serial);
+	switch (type) {
+	case RXRPC_PACKET_TYPE_ACK:
+		trace_rxrpc_tx_ack(call,
+				   ntohl(pkt->ack.firstPacket),
+				   ntohl(pkt->ack.serial),
+				   pkt->ack.reason, pkt->ack.nAcks);
+		break;
+	}
+
 	if (ping) {
 		call->ackr_ping = serial;
 		smp_wmb();

^ permalink raw reply related

* [PATCH net-next 09/15] rxrpc: Pass the last Tx packet marker in the annotation buffer
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

When the last packet of data to be transmitted on a call is queued, tx_top
is set and then the RXRPC_CALL_TX_LAST flag is set.  Unfortunately, this
leaves a race in the ACK processing side of things because the flag affects
the interpretation of tx_top and also allows us to start receiving reply
data before we've finished transmitting.

To fix this, make the following changes:

 (1) rxrpc_queue_packet() now sets a marker in the annotation buffer
     instead of setting the RXRPC_CALL_TX_LAST flag.

 (2) rxrpc_rotate_tx_window() detects the marker and sets the flag in the
     same context as the routines that use it.

 (3) rxrpc_end_tx_phase() is simplified to just shift the call state.
     The Tx window must have been rotated before calling to discard the
     last packet.

 (4) rxrpc_receiving_reply() is added to handle the arrival of the first
     DATA packet of a reply to a client call (which is an implicit ACK of
     the Tx phase).

 (5) The last part of rxrpc_input_ack() is reordered to perform Tx
     rotation, then soft-ACK application and then to end the phase if we've
     rotated the last packet.  In the event of a terminal ACK, the soft-ACK
     application will be skipped as nAcks should be 0.

 (6) rxrpc_input_ackall() now has to rotate as well as ending the phase.

In addition:

 (7) Alter the transmit tracepoint to log the rotation of the last packet.

 (8) Remove the no-longer relevant queue_reqack tracepoint note.  The
     ACK-REQUESTED packet header flag is now set as needed when we actually
     transmit the packet and may vary by retransmission.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/ar-internal.h |    7 ++-
 net/rxrpc/input.c       |  102 +++++++++++++++++++++++++++++++----------------
 net/rxrpc/misc.c        |    3 +
 net/rxrpc/sendmsg.c     |   14 +++---
 4 files changed, 81 insertions(+), 45 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 9e3ba4dc9578..a494d56eb236 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -508,7 +508,9 @@ struct rxrpc_call {
 #define RXRPC_TX_ANNO_NAK	2
 #define RXRPC_TX_ANNO_RETRANS	3
 #define RXRPC_TX_ANNO_MASK	0x03
-#define RXRPC_TX_ANNO_RESENT	0x04
+#define RXRPC_TX_ANNO_LAST	0x04
+#define RXRPC_TX_ANNO_RESENT	0x08
+
 #define RXRPC_RX_ANNO_JUMBO	0x3f		/* Jumbo subpacket number + 1 if not zero */
 #define RXRPC_RX_ANNO_JLAST	0x40		/* Set if last element of a jumbo packet */
 #define RXRPC_RX_ANNO_VERIFIED	0x80		/* Set if verified and decrypted */
@@ -621,9 +623,10 @@ extern const char rxrpc_call_traces[rxrpc_call__nr_trace][4];
 enum rxrpc_transmit_trace {
 	rxrpc_transmit_wait,
 	rxrpc_transmit_queue,
-	rxrpc_transmit_queue_reqack,
 	rxrpc_transmit_queue_last,
 	rxrpc_transmit_rotate,
+	rxrpc_transmit_rotate_last,
+	rxrpc_transmit_await_reply,
 	rxrpc_transmit_end,
 	rxrpc_transmit__nr_trace
 };
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index d3d69ab1f0a1..fb3e2f6afa3b 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -59,6 +59,7 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
 {
 	struct sk_buff *skb, *list = NULL;
 	int ix;
+	u8 annotation;
 
 	spin_lock(&call->lock);
 
@@ -66,16 +67,22 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
 		call->tx_hard_ack++;
 		ix = call->tx_hard_ack & RXRPC_RXTX_BUFF_MASK;
 		skb = call->rxtx_buffer[ix];
+		annotation = call->rxtx_annotations[ix];
 		rxrpc_see_skb(skb, rxrpc_skb_tx_rotated);
 		call->rxtx_buffer[ix] = NULL;
 		call->rxtx_annotations[ix] = 0;
 		skb->next = list;
 		list = skb;
+
+		if (annotation & RXRPC_TX_ANNO_LAST)
+			set_bit(RXRPC_CALL_TX_LAST, &call->flags);
 	}
 
 	spin_unlock(&call->lock);
 
-	trace_rxrpc_transmit(call, rxrpc_transmit_rotate);
+	trace_rxrpc_transmit(call, (test_bit(RXRPC_CALL_TX_LAST, &call->flags) ?
+				    rxrpc_transmit_rotate_last :
+				    rxrpc_transmit_rotate));
 	wake_up(&call->waitq);
 
 	while (list) {
@@ -92,42 +99,65 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
  * This occurs when we get an ACKALL packet, the first DATA packet of a reply,
  * or a final ACK packet.
  */
-static bool rxrpc_end_tx_phase(struct rxrpc_call *call, const char *abort_why)
+static bool rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
+			       const char *abort_why)
 {
-	_enter("");
-
-	switch (call->state) {
-	case RXRPC_CALL_CLIENT_RECV_REPLY:
-		return true;
-	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
-	case RXRPC_CALL_SERVER_AWAIT_ACK:
-		break;
-	default:
-		rxrpc_proto_abort(abort_why, call, call->tx_top);
-		return false;
-	}
 
-	rxrpc_rotate_tx_window(call, call->tx_top);
+	ASSERT(test_bit(RXRPC_CALL_TX_LAST, &call->flags));
 
 	write_lock(&call->state_lock);
 
 	switch (call->state) {
-	default:
-		break;
+	case RXRPC_CALL_CLIENT_SEND_REQUEST:
 	case RXRPC_CALL_CLIENT_AWAIT_REPLY:
-		call->tx_phase = false;
-		call->state = RXRPC_CALL_CLIENT_RECV_REPLY;
+		if (reply_begun)
+			call->state = RXRPC_CALL_CLIENT_RECV_REPLY;
+		else
+			call->state = RXRPC_CALL_CLIENT_AWAIT_REPLY;
 		break;
+
 	case RXRPC_CALL_SERVER_AWAIT_ACK:
 		__rxrpc_call_completed(call);
 		rxrpc_notify_socket(call);
 		break;
+
+	default:
+		goto bad_state;
 	}
 
 	write_unlock(&call->state_lock);
-	trace_rxrpc_transmit(call, rxrpc_transmit_end);
+	if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) {
+		trace_rxrpc_transmit(call, rxrpc_transmit_await_reply);
+	} else {
+		trace_rxrpc_transmit(call, rxrpc_transmit_end);
+	}
 	_leave(" = ok");
 	return true;
+
+bad_state:
+	write_unlock(&call->state_lock);
+	kdebug("end_tx %s", rxrpc_call_states[call->state]);
+	rxrpc_proto_abort(abort_why, call, call->tx_top);
+	return false;
+}
+
+/*
+ * Begin the reply reception phase of a call.
+ */
+static bool rxrpc_receiving_reply(struct rxrpc_call *call)
+{
+	rxrpc_seq_t top = READ_ONCE(call->tx_top);
+
+	if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags))
+		rxrpc_rotate_tx_window(call, top);
+	if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) {
+		rxrpc_proto_abort("TXL", call, top);
+		return false;
+	}
+	if (!rxrpc_end_tx_phase(call, true, "ETD"))
+		return false;
+	call->tx_phase = false;
+	return true;
 }
 
 /*
@@ -226,8 +256,9 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb,
 	/* Received data implicitly ACKs all of the request packets we sent
 	 * when we're acting as a client.
 	 */
-	if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY &&
-	    !rxrpc_end_tx_phase(call, "ETD"))
+	if ((call->state == RXRPC_CALL_CLIENT_SEND_REQUEST ||
+	     call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) &&
+	    !rxrpc_receiving_reply(call))
 		return;
 
 	call->ackr_prev_seq = seq;
@@ -587,27 +618,26 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
 	}
 	call->acks_latest = sp->hdr.serial;
 
-	if (test_bit(RXRPC_CALL_TX_LAST, &call->flags) &&
-	    hard_ack == call->tx_top) {
-		rxrpc_end_tx_phase(call, "ETA");
-		return;
-	}
-
 	if (before(hard_ack, call->tx_hard_ack) ||
 	    after(hard_ack, call->tx_top))
 		return rxrpc_proto_abort("AKW", call, 0);
+	if (nr_acks > call->tx_top - hard_ack)
+		return rxrpc_proto_abort("AKN", call, 0);
 
 	if (after(hard_ack, call->tx_hard_ack))
 		rxrpc_rotate_tx_window(call, hard_ack);
 
-	if (after(first_soft_ack, call->tx_top))
+	if (nr_acks > 0) {
+		if (skb_copy_bits(skb, sp->offset, buf.acks, nr_acks) < 0)
+			return rxrpc_proto_abort("XSA", call, 0);
+		rxrpc_input_soft_acks(call, buf.acks, first_soft_ack, nr_acks);
+	}
+
+	if (test_bit(RXRPC_CALL_TX_LAST, &call->flags)) {
+		rxrpc_end_tx_phase(call, false, "ETA");
 		return;
+	}
 
-	if (nr_acks > call->tx_top - first_soft_ack + 1)
-		nr_acks = first_soft_ack - call->tx_top + 1;
-	if (skb_copy_bits(skb, sp->offset, buf.acks, nr_acks) < 0)
-		return rxrpc_proto_abort("XSA", call, 0);
-	rxrpc_input_soft_acks(call, buf.acks, first_soft_ack, nr_acks);
 }
 
 /*
@@ -619,7 +649,9 @@ static void rxrpc_input_ackall(struct rxrpc_call *call, struct sk_buff *skb)
 
 	_proto("Rx ACKALL %%%u", sp->hdr.serial);
 
-	rxrpc_end_tx_phase(call, "ETL");
+	rxrpc_rotate_tx_window(call, call->tx_top);
+	if (test_bit(RXRPC_CALL_TX_LAST, &call->flags))
+		rxrpc_end_tx_phase(call, false, "ETL");
 }
 
 /*
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index 0d425e707f22..fe648711c2f7 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -155,9 +155,10 @@ const char rxrpc_client_traces[rxrpc_client__nr_trace][7] = {
 const char rxrpc_transmit_traces[rxrpc_transmit__nr_trace][4] = {
 	[rxrpc_transmit_wait]		= "WAI",
 	[rxrpc_transmit_queue]		= "QUE",
-	[rxrpc_transmit_queue_reqack]	= "QRA",
 	[rxrpc_transmit_queue_last]	= "QLS",
 	[rxrpc_transmit_rotate]		= "ROT",
+	[rxrpc_transmit_rotate_last]	= "RLS",
+	[rxrpc_transmit_await_reply]	= "AWR",
 	[rxrpc_transmit_end]		= "END",
 };
 
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 7cb34b2dfba9..93e6584cd751 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -94,11 +94,15 @@ static void rxrpc_queue_packet(struct rxrpc_call *call, struct sk_buff *skb,
 	struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
 	rxrpc_seq_t seq = sp->hdr.seq;
 	int ret, ix;
+	u8 annotation = RXRPC_TX_ANNO_UNACK;
 
 	_net("queue skb %p [%d]", skb, seq);
 
 	ASSERTCMP(seq, ==, call->tx_top + 1);
 
+	if (last)
+		annotation |= RXRPC_TX_ANNO_LAST;
+
 	/* We have to set the timestamp before queueing as the retransmit
 	 * algorithm can see the packet as soon as we queue it.
 	 */
@@ -106,18 +110,14 @@ static void rxrpc_queue_packet(struct rxrpc_call *call, struct sk_buff *skb,
 
 	ix = seq & RXRPC_RXTX_BUFF_MASK;
 	rxrpc_get_skb(skb, rxrpc_skb_tx_got);
-	call->rxtx_annotations[ix] = RXRPC_TX_ANNO_UNACK;
+	call->rxtx_annotations[ix] = annotation;
 	smp_wmb();
 	call->rxtx_buffer[ix] = skb;
 	call->tx_top = seq;
-	if (last) {
-		set_bit(RXRPC_CALL_TX_LAST, &call->flags);
+	if (last)
 		trace_rxrpc_transmit(call, rxrpc_transmit_queue_last);
-	} else if (sp->hdr.flags & RXRPC_REQUEST_ACK) {
-		trace_rxrpc_transmit(call, rxrpc_transmit_queue_reqack);
-	} else {
+	else
 		trace_rxrpc_transmit(call, rxrpc_transmit_queue);
-	}
 
 	if (last || call->state == RXRPC_CALL_SERVER_ACK_REQUEST) {
 		_debug("________awaiting reply/ACK__________");

^ permalink raw reply related

* [PATCH net-next 08/15] rxrpc: Fix call timer
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

Fix the call timer in the following ways:

 (1) If call->resend_at or call->ack_at are before or equal to the current
     time, then ignore that timeout.

 (2) If call->expire_at is before or equal to the current time, then don't
     set the timer at all (possibly we should queue the call).

 (3) Don't skip modifying the timer if timer_pending() is true.  This
     indicates that the timer is working, not that it has expired and is
     running/waiting to run its expiry handler.

Also call rxrpc_set_timer() to start the call timer going rather than
calling add_timer().

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/call_event.c  |   25 ++++++++++++++-----------
 net/rxrpc/call_object.c |    4 ++--
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 3a7f90a2659c..8bc5c8e37ab4 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -28,24 +28,27 @@ void rxrpc_set_timer(struct rxrpc_call *call)
 {
 	unsigned long t, now = jiffies;
 
-	_enter("{%ld,%ld,%ld:%ld}",
-	       call->ack_at - now, call->resend_at - now, call->expire_at - now,
-	       call->timer.expires - now);
-
 	read_lock_bh(&call->state_lock);
 
 	if (call->state < RXRPC_CALL_COMPLETE) {
-		t = call->ack_at;
-		if (time_before(call->resend_at, t))
+		t = call->expire_at;
+		if (time_before_eq(t, now))
+			goto out;
+
+		if (time_after(call->resend_at, now) &&
+		    time_before(call->resend_at, t))
 			t = call->resend_at;
-		if (time_before(call->expire_at, t))
-			t = call->expire_at;
-		if (!timer_pending(&call->timer) ||
-		    time_before(t, call->timer.expires)) {
-			_debug("set timer %ld", t - now);
+
+		if (time_after(call->ack_at, now) &&
+		    time_before(call->ack_at, t))
+			t = call->ack_at;
+
+		if (call->timer.expires != t || !timer_pending(&call->timer)) {
 			mod_timer(&call->timer, t);
 		}
 	}
+
+out:
 	read_unlock_bh(&call->state_lock);
 }
 
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index f50a6094e198..f2fadf667e19 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -199,8 +199,8 @@ static void rxrpc_start_call_timer(struct rxrpc_call *call)
 	call->expire_at = expire_at;
 	call->ack_at = expire_at;
 	call->resend_at = expire_at;
-	call->timer.expires = expire_at;
-	add_timer(&call->timer);
+	call->timer.expires = expire_at + 1;
+	rxrpc_set_timer(call);
 }
 
 /*

^ permalink raw reply related

* [PATCH net-next 07/15] rxrpc: Fix accidental cancellation of scheduled resend by ACK parser
From: David Howells @ 2016-09-23 15:16 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

When rxrpc_input_soft_acks() is parsing the soft-ACKs from an ACK packet,
it updates the Tx packet annotations in the annotation buffer.  If a
soft-ACK is an ACK, then we overwrite unack'd, nak'd or to-be-retransmitted
states and that is fine; but if the soft-ACK is an NACK, we overwrite the
to-be-retransmitted with a nak - which isn't.

Instead, we need to let any scheduled retransmission stand if the packet
was NAK'd.

Note that we don't reissue a resend if the annotation is in the
to-be-retransmitted state because someone else must've scheduled the
resend already.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/input.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 06027b6d9c19..d3d69ab1f0a1 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -479,6 +479,8 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
 		case RXRPC_ACK_TYPE_NACK:
 			if (anno_type == RXRPC_TX_ANNO_NAK)
 				continue;
+			if (anno_type == RXRPC_TX_ANNO_RETRANS)
+				continue;
 			call->rxtx_annotations[ix] =
 				RXRPC_TX_ANNO_NAK | annotation;
 			resend = true;

^ permalink raw reply related

* [PATCH net-next 06/15] rxrpc: Need to start the resend timer on initial transmission
From: David Howells @ 2016-09-23 15:15 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

When a DATA packet has its initial transmission, we may need to start or
adjust the resend timer.  Without this we end up relying on being sent a
NACK to initiate the resend.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/ar-internal.h |    1 +
 net/rxrpc/call_event.c  |    2 +-
 net/rxrpc/sendmsg.c     |    9 +++++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 808ab750dc6b..9e3ba4dc9578 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -704,6 +704,7 @@ int rxrpc_reject_call(struct rxrpc_sock *);
 /*
  * call_event.c
  */
+void rxrpc_set_timer(struct rxrpc_call *);
 void rxrpc_propose_ACK(struct rxrpc_call *, u8, u16, u32, bool, bool);
 void rxrpc_process_call(struct work_struct *);
 
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index a2909da5d581..3a7f90a2659c 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -24,7 +24,7 @@
 /*
  * Set the timer
  */
-static void rxrpc_set_timer(struct rxrpc_call *call)
+void rxrpc_set_timer(struct rxrpc_call *call)
 {
 	unsigned long t, now = jiffies;
 
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index ca3811bfbd17..7cb34b2dfba9 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -146,6 +146,15 @@ static void rxrpc_queue_packet(struct rxrpc_call *call, struct sk_buff *skb,
 	if (ret < 0) {
 		_debug("need instant resend %d", ret);
 		rxrpc_instant_resend(call, ix);
+	} else {
+		unsigned long resend_at;
+
+		resend_at = jiffies + msecs_to_jiffies(rxrpc_resend_timeout);
+
+		if (time_before(resend_at, call->resend_at)) {
+			call->resend_at = resend_at;
+			rxrpc_set_timer(call);
+		}
 	}
 
 	rxrpc_free_skb(skb, rxrpc_skb_tx_freed);

^ permalink raw reply related

* [PATCH net-next 05/15] rxrpc: Use before_eq() and friends to compare serial numbers
From: David Howells @ 2016-09-23 15:15 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

before_eq() and friends should be used to compare serial numbers (when not
checking for (non)equality) rather than casting to int, subtracting and
checking the result.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/input.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index cbb5d53f09d7..06027b6d9c19 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -578,7 +578,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
 	}
 
 	/* Discard any out-of-order or duplicate ACKs. */
-	if ((int)sp->hdr.serial - (int)call->acks_latest <= 0) {
+	if (before_eq(sp->hdr.serial, call->acks_latest)) {
 		_debug("discard ACK %d <= %d",
 		       sp->hdr.serial, call->acks_latest);
 		return;

^ permalink raw reply related

* [PATCH net-next 04/15] rxrpc: Should be using ktime_add_ms() not ktime_add_ns()
From: David Howells @ 2016-09-23 15:15 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <147464371753.5090.1634919599283321856.stgit@warthog.procyon.org.uk>

ktime_add_ms() should be used to add the resend time (in ms) rather than
ktime_add_ns().

Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/call_event.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 6e2ea8f4ae75..a2909da5d581 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -187,7 +187,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
 		call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS | annotation;
 	}
 
-	resend_at = ktime_sub(ktime_add_ns(oldest, rxrpc_resend_timeout), now);
+	resend_at = ktime_sub(ktime_add_ms(oldest, rxrpc_resend_timeout), now);
 	call->resend_at = jiffies + nsecs_to_jiffies(ktime_to_ns(resend_at));
 
 	/* Now go through the Tx window and perform the retransmissions.  We

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox