* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Jiri Pirko @ 2019-08-29 13:49 UTC (permalink / raw)
To: Andrew Lunn
Cc: Horatiu Vultur, alexandre.belloni, UNGLinuxDriver, davem,
allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829132611.GC6998@lunn.ch>
Thu, Aug 29, 2019 at 03:26:11PM CEST, andrew@lunn.ch wrote:
>> NACK
>>
>> This is invalid usecase for switchdev infra. Switchdev is there for
>> bridge offload purposes only.
>
>Hi Jiri
>
>I would argue this is for bridge offload. In another email, you say
>promisc is promisc. Does that mean the Mellonox hardware forwards
>every frame ingressing a port to the CPU by default as soon as it is
>enslaved to a bridge and promisc mode turned on? Or course not. At the
>moment, every switchdev driver wrongly implement promisc mode.
>
>This patchset is about correctly implementing promisc mode, so that
>applications can use it as expected. And that means configuring the
>hardware bridge to also forward a copy of frames to the CPU.
Wait, I believe there has been some misundestanding. Promisc mode is NOT
about getting packets to the cpu. It's about setting hw filters in a way
that no rx packet is dropped. For normal nics it means that all packets
get to the cpu, but that is just because it is the only direction they
can make.
If you want to get packets from the hw forwarding dataplane to cpu, you
should not use promisc mode for that. That would be incorrect.
If you want to get packets from the hw forwarding dataplane to cpu, you
should use tc trap action. It is there exactly for this purpose.
Promisc is for setting rx filters.
>
>I see trap as a different use case. tcpdump/pcap is not going to use
>traps.
>
> Andrew
^ permalink raw reply
* RE: [PATCH][net-next] net: drop_monitor: change the stats variable to u64 in net_dm_stats_put
From: David Laight @ 2019-08-29 13:45 UTC (permalink / raw)
To: 'Li,Rongqing', Ido Schimmel
Cc: netdev@vger.kernel.org, idosch@mellanox.com
In-Reply-To: <84063fe4df95437d81beb0d18f4043a5@baidu.com>
From: Li,Rongqing
> Sent: 22 August 2019 13:32
> > On Thu, Aug 22, 2019 at 02:22:33PM +0800, Li RongQing wrote:
> > > only the element drop of struct net_dm_stats is used, so simplify it
> > > to u64
> >
> > Thanks for the patch, but I don't really see the value here. The struct allows for
> > easy extensions in the future. What do you gain from this change? We merely
> > read stats and report them to user space, so I guess it's not about
> > performance either.
> >
>
> I think u64 can reduce to call memset and dereference stats.drop
The compiler should inline the memset().
Also you should have called it 'dropped' not 'stats'.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Ivan Vecera @ 2019-08-29 13:39 UTC (permalink / raw)
To: Andrew Lunn
Cc: Horatiu Vultur, Jiri Pirko, alexandre.belloni, UNGLinuxDriver,
davem, allan.nielsen, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829131543.GB6998@lunn.ch>
On Thu, 29 Aug 2019 15:15:43 +0200
Andrew Lunn <andrew@lunn.ch> wrote:
> The problem with this is, the driver only gets called when promisc
> goes from 0 to !0. So, the port is added to the bridge. Promisc goes
> 0->1, and the driver gets called. We can evaluate as you said above,
> and leave the port filtering frames, not forwarding everything. When
> tcpdump is started, the core does promisc 1->2, but does not call into
> the driver. Also, currently sending a notification is not
> unconditional.
Hi Andrew,
got it. What about to change the existing notify call so NETDEV_CHANGE
notification will be also sent when (old_promiscuity !=
new_promiscuity)?
Ivan
^ permalink raw reply
* [patch net-next] sched: act_vlan: implement stats_update callback
From: Jiri Pirko @ 2019-08-29 13:38 UTC (permalink / raw)
To: netdev; +Cc: davem, mlxsw, pengfeil, jhs, xiyou.wangcong
From: Jiri Pirko <jiri@mellanox.com>
Implement this callback in order to get the offloaded stats added to the
kernel stats.
Reported-by: Pengfei Liu <pengfeil@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
net/sched/act_vlan.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index a3c9eea1ee8a..216b75709875 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -301,6 +301,19 @@ static int tcf_vlan_walker(struct net *net, struct sk_buff *skb,
return tcf_generic_walker(tn, skb, cb, type, ops, extack);
}
+static void tcf_vlan_stats_update(struct tc_action *a, u64 bytes, u32 packets,
+ u64 lastuse, bool hw)
+{
+ struct tcf_vlan *v = to_vlan(a);
+ struct tcf_t *tm = &v->tcf_tm;
+
+ _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
+ if (hw)
+ _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats_hw),
+ bytes, packets);
+ tm->lastuse = max_t(u64, tm->lastuse, lastuse);
+}
+
static int tcf_vlan_search(struct net *net, struct tc_action **a, u32 index)
{
struct tc_action_net *tn = net_generic(net, vlan_net_id);
@@ -325,6 +338,7 @@ static struct tc_action_ops act_vlan_ops = {
.init = tcf_vlan_init,
.cleanup = tcf_vlan_cleanup,
.walk = tcf_vlan_walker,
+ .stats_update = tcf_vlan_stats_update,
.get_fill_size = tcf_vlan_get_fill_size,
.lookup = tcf_vlan_search,
.size = sizeof(struct tcf_vlan),
--
2.20.1
^ permalink raw reply related
* Re: [PATCH v2 bpf-next 1/3] capability: introduce CAP_BPF and CAP_TRACING
From: Nicolas Dichtel @ 2019-08-29 13:36 UTC (permalink / raw)
To: Alexei Starovoitov, luto
Cc: davem, peterz, rostedt, netdev, bpf, kernel-team, linux-api
In-Reply-To: <20190829051253.1927291-1-ast@kernel.org>
Le 29/08/2019 à 07:12, Alexei Starovoitov a écrit :
[snip]
> CAP_BPF and CAP_NET_ADMIN together allow the following:
> - Attach to cgroup-bpf hooks and query
> - skb, xdp, flow_dissector test_run command
>
> CAP_NET_ADMIN allows:
> - Attach networking bpf programs to xdp, tc, lwt, flow dissector
I'm not sure to understand the difference between these last two points.
But, with the current kernel, CAP_NET_ADMIN is not enough to attach bpf prog
with tc and it's still not enough after your patch.
The following command is rejected:
$ tc filter add dev eth0 ingress matchall action bpf obj ./tc_test_kern.o sec test
Prog section 'test' rejected: Operation not permitted (1)!
- Type: 4
- Instructions: 22 (0 over limit)
- License: GPL
Verifier analysis:
Error fetching program/map!
bad action parsing
parse_action: bad value (5:bpf)!
Illegal "action"
$
Like Andy, I'm also wondering about the backward compatibility. With my current
docker, I'm able to play with tc bpf with CAP_SYS_ADMIN. But if I update my
kernel with your patches, CAP_SYS_ADMIN doesn't allow anymore that and CAP_BPF
is not implemented in my current docker, thus I cannot give the correct
capabilities. In other words, an old docker cannot run on a new kernel.
Regards,
Nicolas
^ permalink raw reply
* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Steven Rostedt @ 2019-08-29 13:34 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Peter Zijlstra, Andy Lutomirski, Alexei Starovoitov, Kees Cook,
LSM List, James Morris, Jann Horn, Masami Hiramatsu,
David S. Miller, Daniel Borkmann, Network Development, bpf,
kernel-team, Linux API
In-Reply-To: <20190828220826.nlkpp632rsomocve@ast-mbp.dhcp.thefacebook.com>
On Wed, 28 Aug 2019 15:08:28 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Wed, Aug 28, 2019 at 09:14:21AM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> >
> > > > Tracing:
> > > >
> > > > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > > > are necessary to:
> >
> > That's not tracing, that's perf.
> >
> re: your first comment above.
> I'm not sure what difference you see in words 'tracing' and 'perf'.
> I really hope we don't partition the overall tracing category
> into CAP_PERF and CAP_FTRACE only because these pieces are maintained
> by different people.
I think Peter meant: It's not tracing, it's profiling.
And there is a bit of separation between the two, although there is an
overlap.
Yes, perf can do tracing but it's designed more for profiling.
> On one side perf_event_open() isn't really doing tracing (as step by
> step ftracing of function sequences), but perf_event_open() opens
> an event and the sequence of events (may include IP) becomes a trace.
> imo CAP_TRACING is the best name to descibe the privileged space
> of operations possible via perf_event_open, ftrace, kprobe, stack traces, etc.
I have no issue with what you suggest. I guess it comes down to how
fine grain people want to go. Do we want it to be all or nothing?
Should CAP_TRACING allow for write access to tracefs? Or should we go
with needing both CAP_TRACING and permissions in that directory
(like changing the group ownership of the files at every boot).
Perhaps we should have a CAP_TRACING_RO, that gives read access to
tracefs (and write if the users have permissions). And have CAP_TRACING
to allow full write access as well (allowing for users to add kprobe
events and enabling tracers like the function tracer).
>
> Another reason are kuprobes. They can be crated via perf_event_open
> and via tracefs. Are they in CAP_PERF or in CAP_FTRACE ? In both, right?
> Should then CAP_KPROBE be used ? that would be an overkill.
> It would partition the space even further without obvious need.
>
> Looking from BPF angle... BPF doesn't have integration with ftrace yet.
> bpf_trace_printk is using ftrace mechanism, but that's 1% of ftrace.
> In the long run I really like to see bpf using all of ftrace.
> Whereas bpf is using a lot of 'perf'.
> And extending some perf things in bpf specific way.
> Take a look at how BPF_F_STACK_BUILD_ID. It's clearly perf/stack_tracing
> feature that generic perf can use one day.
> Currently it sits in bpf land and accessible via bpf only.
> Though its bpf only today I categorize it under CAP_TRACING.
>
> I think CAP_TRACING privilege should allow task to do all of perf_event_open,
> kuprobe, stack trace, ftrace, and kallsyms.
> We can think of some exceptions that should stay under CAP_SYS_ADMIN,
> but most of the functionality available by 'perf' binary should be
> usable with CAP_TRACING. 'perf' can do bpf too.
> With CAP_BPF it would be all set.
As the above seems to favor the idea of CAP_TRACING allowing write
access to tracefs, should we have a CAP_TRACING_RO for just read access
and limited perf abilities?
-- Steve
^ permalink raw reply
* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Andrew Lunn @ 2019-08-29 13:26 UTC (permalink / raw)
To: Jiri Pirko
Cc: Horatiu Vultur, alexandre.belloni, UNGLinuxDriver, davem,
allan.nielsen, ivecera, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829095100.GH2312@nanopsycho>
> NACK
>
> This is invalid usecase for switchdev infra. Switchdev is there for
> bridge offload purposes only.
Hi Jiri
I would argue this is for bridge offload. In another email, you say
promisc is promisc. Does that mean the Mellonox hardware forwards
every frame ingressing a port to the CPU by default as soon as it is
enslaved to a bridge and promisc mode turned on? Or course not. At the
moment, every switchdev driver wrongly implement promisc mode.
This patchset is about correctly implementing promisc mode, so that
applications can use it as expected. And that means configuring the
hardware bridge to also forward a copy of frames to the CPU.
I see trap as a different use case. tcpdump/pcap is not going to use
traps.
Andrew
^ permalink raw reply
* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Andrew Lunn @ 2019-08-29 13:15 UTC (permalink / raw)
To: Ivan Vecera
Cc: Horatiu Vultur, Jiri Pirko, alexandre.belloni, UNGLinuxDriver,
davem, allan.nielsen, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829145518.393fb99d@ceranb>
On Thu, Aug 29, 2019 at 02:55:18PM +0200, Ivan Vecera wrote:
> On Thu, 29 Aug 2019 14:44:14 +0200
> Horatiu Vultur <horatiu.vultur@microchip.com> wrote:
>
> > When a port is added to a bridge, then the port gets in promisc mode.
> > But in our case the HW has bridge capabilities so it is not required
> > to set the port in promisc mode.
> > But if someone else requires the port to be in promisc mode (tcpdump
> > or any other application) then I would like to set the port in promisc
> > mode.
> > In previous emails Andrew came with the suggestion to look at
> > dev->promiscuity and check if the port is a bridge port. Using this
> > information I could see when to add the port in promisc mode. He also
> > suggested to add a new switchdev call(maybe I missunderstood him, or I
> > have done it at the wrong place) in case there are no callbacks in the
> > driver to get this information.
>
> I would use the 1st suggestion.
>
> for/in your driver:
> if (dev->promiscuity > 0) {
> if (dev->promiscuity == 1 && netif_is_bridge_port(dev)) {
> /* no need to set promisc mode because promiscuity
> * was requested by bridge
> */
> ...
> } else {
> /* need to set promisc mode as someone else requested
> * promiscuity
> */
> }
> }
Hi Ivan
The problem with this is, the driver only gets called when promisc
goes from 0 to !0. So, the port is added to the bridge. Promisc goes
0->1, and the driver gets called. We can evaluate as you said above,
and leave the port filtering frames, not forwarding everything. When
tcpdump is started, the core does promisc 1->2, but does not call into
the driver. Also, currently sending a notification is not
unconditional.
What this patch adds is an unconditional call into the switchdev
driver so it can evaluate the condition above, with every change to
the promisc counter.
Andrew
^ permalink raw reply
* Re: [PATCH v3 1/2] net: core: Notify on changes to dev->promiscuity.
From: Horatiu Vultur @ 2019-08-29 13:15 UTC (permalink / raw)
To: Ivan Vecera
Cc: Jiri Pirko, alexandre.belloni, UNGLinuxDriver, davem, andrew,
allan.nielsen, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190829145518.393fb99d@ceranb>
The 08/29/2019 14:55, Ivan Vecera wrote:
> External E-Mail
>
>
> On Thu, 29 Aug 2019 14:44:14 +0200
> Horatiu Vultur <horatiu.vultur@microchip.com> wrote:
>
> > When a port is added to a bridge, then the port gets in promisc mode.
> > But in our case the HW has bridge capabilities so it is not required
> > to set the port in promisc mode.
> > But if someone else requires the port to be in promisc mode (tcpdump
> > or any other application) then I would like to set the port in promisc
> > mode.
> > In previous emails Andrew came with the suggestion to look at
> > dev->promiscuity and check if the port is a bridge port. Using this
> > information I could see when to add the port in promisc mode. He also
> > suggested to add a new switchdev call(maybe I missunderstood him, or I
> > have done it at the wrong place) in case there are no callbacks in the
> > driver to get this information.
>
Hi Ivan,
> I would use the 1st suggestion.
>
> for/in your driver:
> if (dev->promiscuity > 0) {
> if (dev->promiscuity == 1 && netif_is_bridge_port(dev)) {
> /* no need to set promisc mode because promiscuity
> * was requested by bridge
> */
> ...
> } else {
> /* need to set promisc mode as someone else requested
> * promiscuity
> */
> }
> }
Yes that was my first try. It worked for many cases but there was one
which it didn't work.
If first add the port to the bridge, then you get callbacks and
notifications in the driver where you can use the code that you
suggested. But if next you enable promisc mode on the port (ip link set
dev eth0 promisc on), then the driver will not get any callbacks or
notifications. Therefore it could not see the new value of
dev->promiscuity to update the HW.
The code that updates the promisc mode it would not notify in case
the promiscuity is changed and because the port has already the flag
IFF_PROMISC it would not notify any listeners because the flags are not
changed.
Here [1] is the code that I refer to.
[1] https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L7592
>
> Thanks,
> Ivan
>
--
/Horatiu
^ permalink raw reply
* [PATCH net] rxrpc: Fix lack of conn cleanup when local endpoint is cleaned up [ver #2]
From: David Howells @ 2019-08-29 13:12 UTC (permalink / raw)
To: netdev; +Cc: dhowells, marc.dionne, linux-afs, linux-kernel
When a local endpoint is ceases to be in use, such as when the kafs module
is unloaded, the kernel will emit an assertion failure if there are any
outstanding client connections:
rxrpc: Assertion failed
------------[ cut here ]------------
kernel BUG at net/rxrpc/local_object.c:433!
and even beyond that, will evince other oopses if there are service
connections still present.
Fix this by:
(1) Removing the triggering of connection reaping when an rxrpc socket is
released. These don't actually clean up the connections anyway - and
further, the local endpoint may still be in use through another
socket.
(2) Mark the local endpoint as dead when we start the process of tearing
it down.
(3) When destroying a local endpoint, strip all of its client connections
from the idle list and discard the ref on each that the list was
holding.
(4) When destroying a local endpoint, call the service connection reaper
directly (rather than through a workqueue) to immediately kill off all
outstanding service connections.
(5) Make the service connection reaper reap connections for which the
local endpoint is marked dead.
Only after destroying the connections can we close the socket lest we get
an oops in a workqueue that's looking at a connection or a peer.
Fixes: 3d18cbb7fd0c ("rxrpc: Fix conn expiry timers")
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
---
net/rxrpc/af_rxrpc.c | 3 ---
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/conn_client.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
net/rxrpc/conn_object.c | 2 +-
net/rxrpc/local_object.c | 5 ++++-
5 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 0dbbfd1b6487..d72ddb67bb74 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -862,7 +862,6 @@ static void rxrpc_sock_destructor(struct sock *sk)
static int rxrpc_release_sock(struct sock *sk)
{
struct rxrpc_sock *rx = rxrpc_sk(sk);
- struct rxrpc_net *rxnet = rxrpc_net(sock_net(&rx->sk));
_enter("%p{%d,%d}", sk, sk->sk_state, refcount_read(&sk->sk_refcnt));
@@ -898,8 +897,6 @@ static int rxrpc_release_sock(struct sock *sk)
rxrpc_release_calls_on_socket(rx);
flush_workqueue(rxrpc_workqueue);
rxrpc_purge_queue(&sk->sk_receive_queue);
- rxrpc_queue_work(&rxnet->service_conn_reaper);
- rxrpc_queue_work(&rxnet->client_conn_reaper);
rxrpc_unuse_local(rx->local);
rx->local = NULL;
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 852e58781fda..8051dfdcf26d 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -910,6 +910,7 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *);
void rxrpc_put_client_conn(struct rxrpc_connection *);
void rxrpc_discard_expired_client_conns(struct work_struct *);
void rxrpc_destroy_all_client_connections(struct rxrpc_net *);
+void rxrpc_clean_up_local_conns(struct rxrpc_local *);
/*
* conn_event.c
diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index aea82f909c60..3f1da1b49f69 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -1162,3 +1162,47 @@ void rxrpc_destroy_all_client_connections(struct rxrpc_net *rxnet)
_leave("");
}
+
+/*
+ * Clean up the client connections on a local endpoint.
+ */
+void rxrpc_clean_up_local_conns(struct rxrpc_local *local)
+{
+ struct rxrpc_connection *conn, *tmp;
+ struct rxrpc_net *rxnet = local->rxnet;
+ unsigned int nr_active;
+ LIST_HEAD(graveyard);
+
+ _enter("");
+
+ spin_lock(&rxnet->client_conn_cache_lock);
+ nr_active = rxnet->nr_active_client_conns;
+
+ list_for_each_entry_safe(conn, tmp, &rxnet->idle_client_conns,
+ cache_link) {
+ if (conn->params.local == local) {
+ ASSERTCMP(conn->cache_state, ==, RXRPC_CONN_CLIENT_IDLE);
+
+ trace_rxrpc_client(conn, -1, rxrpc_client_discard);
+ if (!test_and_clear_bit(RXRPC_CONN_EXPOSED, &conn->flags))
+ BUG();
+ conn->cache_state = RXRPC_CONN_CLIENT_INACTIVE;
+ list_move(&conn->cache_link, &graveyard);
+ nr_active--;
+ }
+ }
+
+ rxnet->nr_active_client_conns = nr_active;
+ spin_unlock(&rxnet->client_conn_cache_lock);
+ ASSERTCMP(nr_active, >=, 0);
+
+ while (!list_empty(&graveyard)) {
+ conn = list_entry(graveyard.next,
+ struct rxrpc_connection, cache_link);
+ list_del_init(&conn->cache_link);
+
+ rxrpc_put_connection(conn);
+ }
+
+ _leave(" [culled]");
+}
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index 434ef392212b..ed05b6922132 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -398,7 +398,7 @@ void rxrpc_service_connection_reaper(struct work_struct *work)
if (conn->state == RXRPC_CONN_SERVICE_PREALLOC)
continue;
- if (rxnet->live) {
+ if (rxnet->live && !conn->params.local->dead) {
idle_timestamp = READ_ONCE(conn->idle_timestamp);
expire_at = idle_timestamp + rxrpc_connection_expiry * HZ;
if (conn->params.local->service_closed)
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index 72a6e12a9304..36587260cabd 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -426,11 +426,14 @@ static void rxrpc_local_destroyer(struct rxrpc_local *local)
_enter("%d", local->debug_id);
+ local->dead = true;
+
mutex_lock(&rxnet->local_mutex);
list_del_init(&local->link);
mutex_unlock(&rxnet->local_mutex);
- ASSERT(RB_EMPTY_ROOT(&local->client_conns));
+ rxrpc_clean_up_local_conns(local);
+ rxrpc_service_connection_reaper(&rxnet->service_conn_reaper);
ASSERT(!local->service);
if (socket) {
^ permalink raw reply related
* Re: [PATCH net 0/5] rxrpc: Fix use of skb_cow_data()
From: David Howells @ 2019-08-29 13:11 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708397261.25861.2158085372781699276.stgit@warthog.procyon.org.uk>
Apologies, I started the patch selection in the wrong place and missed a
couple. Will resend.
David
^ permalink raw reply
* [PATCH net 7/7] rxrpc: Use skb_unshare() rather than skb_cow_data()
From: David Howells @ 2019-08-29 13:08 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708405310.26102.7954021163316252673.stgit@warthog.procyon.org.uk>
The in-place decryption routines in AF_RXRPC's rxkad security module
currently call skb_cow_data() to make sure the data isn't shared and that
the skb can be written over. This has a problem, however, as the softirq
handler may be still holding a ref or the Rx ring may be holding multiple
refs when skb_cow_data() is called in rxkad_verify_packet() - and so
skb_shared() returns true and __pskb_pull_tail() dislikes that. If this
occurs, something like the following report will be generated.
kernel BUG at net/core/skbuff.c:1463!
...
RIP: 0010:pskb_expand_head+0x253/0x2b0
...
Call Trace:
__pskb_pull_tail+0x49/0x460
skb_cow_data+0x6f/0x300
rxkad_verify_packet+0x18b/0xb10 [rxrpc]
rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
afs_extract_data+0x51/0x2d0 [kafs]
afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
afs_deliver_to_call+0xac/0x430 [kafs]
afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
afs_make_call+0x282/0x3f0 [kafs]
afs_fs_fetch_data+0x164/0x300 [kafs]
afs_fetch_data+0x54/0x130 [kafs]
afs_readpages+0x20d/0x340 [kafs]
read_pages+0x66/0x180
__do_page_cache_readahead+0x188/0x1a0
ondemand_readahead+0x17d/0x2e0
generic_file_read_iter+0x740/0xc10
__vfs_read+0x145/0x1a0
vfs_read+0x8c/0x140
ksys_read+0x4a/0xb0
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by using skb_unshare() instead in the input path for DATA packets
that have a security index != 0. Non-DATA packets don't need in-place
encryption and neither do unencrypted DATA packets.
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Julian Wollrath <jwollrath@web.de>
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/trace/events/rxrpc.h | 12 ++++++++----
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/input.c | 18 ++++++++++++++++++
net/rxrpc/rxkad.c | 32 +++++++++-----------------------
net/rxrpc/skbuff.c | 25 ++++++++++++++++++++-----
5 files changed, 56 insertions(+), 32 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index e2356c51883b..a13a62db3565 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -32,6 +32,8 @@ enum rxrpc_skb_trace {
rxrpc_skb_received,
rxrpc_skb_rotated,
rxrpc_skb_seen,
+ rxrpc_skb_unshared,
+ rxrpc_skb_unshared_nomem,
};
enum rxrpc_local_trace {
@@ -231,7 +233,9 @@ enum rxrpc_tx_point {
EM(rxrpc_skb_purged, "PUR") \
EM(rxrpc_skb_received, "RCV") \
EM(rxrpc_skb_rotated, "ROT") \
- E_(rxrpc_skb_seen, "SEE")
+ EM(rxrpc_skb_seen, "SEE") \
+ EM(rxrpc_skb_unshared, "UNS") \
+ E_(rxrpc_skb_unshared_nomem, "US0")
#define rxrpc_local_traces \
EM(rxrpc_local_got, "GOT") \
@@ -633,9 +637,9 @@ TRACE_EVENT(rxrpc_call,
TRACE_EVENT(rxrpc_skb,
TP_PROTO(struct sk_buff *skb, enum rxrpc_skb_trace op,
- int usage, int mod_count, const void *where),
+ int usage, int mod_count, u8 flags, const void *where),
- TP_ARGS(skb, op, usage, mod_count, where),
+ TP_ARGS(skb, op, usage, mod_count, flags, where),
TP_STRUCT__entry(
__field(struct sk_buff *, skb )
@@ -648,7 +652,7 @@ TRACE_EVENT(rxrpc_skb,
TP_fast_assign(
__entry->skb = skb;
- __entry->flags = rxrpc_skb(skb)->rx_flags;
+ __entry->flags = flags;
__entry->op = op;
__entry->usage = usage;
__entry->mod_count = mod_count;
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 2d5294f3e62f..852e58781fda 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -1110,6 +1110,7 @@ void rxrpc_kernel_data_consumed(struct rxrpc_call *, struct sk_buff *);
void rxrpc_packet_destructor(struct sk_buff *);
void rxrpc_new_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_see_skb(struct sk_buff *, enum rxrpc_skb_trace);
+void rxrpc_eaten_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_get_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_free_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_purge_queue(struct sk_buff_head *);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 31090bdf1fae..d122c53c8697 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1249,6 +1249,24 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
goto bad_message;
if (!rxrpc_validate_data(skb))
goto bad_message;
+
+ /* Unshare the packet so that it can be modified for in-place
+ * decryption.
+ */
+ if (sp->hdr.securityIndex != 0) {
+ struct sk_buff *nskb = skb_unshare(skb, GFP_ATOMIC);
+ if (!nskb) {
+ rxrpc_eaten_skb(skb, rxrpc_skb_unshared_nomem);
+ goto out;
+ }
+
+ if (nskb != skb) {
+ rxrpc_eaten_skb(skb, rxrpc_skb_received);
+ rxrpc_new_skb(skb, rxrpc_skb_unshared);
+ skb = nskb;
+ sp = rxrpc_skb(skb);
+ }
+ }
break;
case RXRPC_PACKET_TYPE_CHALLENGE:
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index ae8cd8926456..c60c520fde7c 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -187,10 +187,8 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,
struct rxrpc_skb_priv *sp;
struct rxrpc_crypt iv;
struct scatterlist sg[16];
- struct sk_buff *trailer;
unsigned int len;
u16 check;
- int nsg;
int err;
sp = rxrpc_skb(skb);
@@ -214,15 +212,14 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,
crypto_skcipher_encrypt(req);
/* we want to encrypt the skbuff in-place */
- nsg = skb_cow_data(skb, 0, &trailer);
- err = -ENOMEM;
- if (nsg < 0 || nsg > 16)
+ err = -EMSGSIZE;
+ if (skb_shinfo(skb)->nr_frags > 16)
goto out;
len = data_size + call->conn->size_align - 1;
len &= ~(call->conn->size_align - 1);
- sg_init_table(sg, nsg);
+ sg_init_table(sg, ARRAY_SIZE(sg));
err = skb_to_sgvec(skb, sg, 0, len);
if (unlikely(err < 0))
goto out;
@@ -319,11 +316,10 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
struct rxkad_level1_hdr sechdr;
struct rxrpc_crypt iv;
struct scatterlist sg[16];
- struct sk_buff *trailer;
bool aborted;
u32 data_size, buf;
u16 check;
- int nsg, ret;
+ int ret;
_enter("");
@@ -336,11 +332,7 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
/* Decrypt the skbuff in-place. TODO: We really want to decrypt
* directly into the target buffer.
*/
- nsg = skb_cow_data(skb, 0, &trailer);
- if (nsg < 0 || nsg > 16)
- goto nomem;
-
- sg_init_table(sg, nsg);
+ sg_init_table(sg, ARRAY_SIZE(sg));
ret = skb_to_sgvec(skb, sg, offset, 8);
if (unlikely(ret < 0))
return ret;
@@ -388,10 +380,6 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
if (aborted)
rxrpc_send_abort_packet(call);
return -EPROTO;
-
-nomem:
- _leave(" = -ENOMEM");
- return -ENOMEM;
}
/*
@@ -406,7 +394,6 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
struct rxkad_level2_hdr sechdr;
struct rxrpc_crypt iv;
struct scatterlist _sg[4], *sg;
- struct sk_buff *trailer;
bool aborted;
u32 data_size, buf;
u16 check;
@@ -423,12 +410,11 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
/* Decrypt the skbuff in-place. TODO: We really want to decrypt
* directly into the target buffer.
*/
- nsg = skb_cow_data(skb, 0, &trailer);
- if (nsg < 0)
- goto nomem;
-
sg = _sg;
- if (unlikely(nsg > 4)) {
+ nsg = skb_shinfo(skb)->nr_frags;
+ if (nsg <= 4) {
+ nsg = 4;
+ } else {
sg = kmalloc_array(nsg, sizeof(*sg), GFP_NOIO);
if (!sg)
goto nomem;
diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c
index 8e6f45f84b9b..0348d2bf6f7d 100644
--- a/net/rxrpc/skbuff.c
+++ b/net/rxrpc/skbuff.c
@@ -24,7 +24,8 @@ void rxrpc_new_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
int n = atomic_inc_return(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
}
/*
@@ -35,7 +36,8 @@ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
const void *here = __builtin_return_address(0);
if (skb) {
int n = atomic_read(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
}
}
@@ -46,10 +48,21 @@ void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
int n = atomic_inc_return(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
skb_get(skb);
}
+/*
+ * Note the dropping of a ref on a socket buffer by the core.
+ */
+void rxrpc_eaten_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
+{
+ const void *here = __builtin_return_address(0);
+ int n = atomic_inc_return(&rxrpc_n_rx_skbs);
+ trace_rxrpc_skb(skb, op, 0, n, 0, here);
+}
+
/*
* Note the destruction of a socket buffer.
*/
@@ -60,7 +73,8 @@ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
int n;
CHECK_SLAB_OKAY(&skb->users);
n = atomic_dec_return(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
kfree_skb(skb);
}
}
@@ -75,7 +89,8 @@ void rxrpc_purge_queue(struct sk_buff_head *list)
while ((skb = skb_dequeue((list))) != NULL) {
int n = atomic_dec_return(select_skb_count(skb));
trace_rxrpc_skb(skb, rxrpc_skb_purged,
- refcount_read(&skb->users), n, here);
+ refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
kfree_skb(skb);
}
}
^ permalink raw reply related
* [PATCH net 6/7] rxrpc: Use the tx-phase skb flag to simplify tracing
From: David Howells @ 2019-08-29 13:08 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708405310.26102.7954021163316252673.stgit@warthog.procyon.org.uk>
Use the previously-added transmit-phase skbuff private flag to simplify the
socket buffer tracing a bit. Which phase the skbuff comes from can now be
divined from the skb rather than having to be guessed from the call state.
We can also reduce the number of rxrpc_skb_trace values by eliminating the
difference between Tx and Rx in the symbols.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/trace/events/rxrpc.h | 51 ++++++++++++++++++------------------------
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/call_event.c | 8 +++----
net/rxrpc/call_object.c | 6 ++---
net/rxrpc/conn_event.c | 6 ++---
net/rxrpc/input.c | 22 +++++++++---------
net/rxrpc/local_event.c | 4 ++-
net/rxrpc/output.c | 6 ++---
net/rxrpc/peer_event.c | 10 ++++----
net/rxrpc/recvmsg.c | 6 ++---
net/rxrpc/sendmsg.c | 10 ++++----
net/rxrpc/skbuff.c | 15 +++++++-----
12 files changed, 69 insertions(+), 76 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index fa06b528c73c..e2356c51883b 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -23,20 +23,15 @@
#define __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY
enum rxrpc_skb_trace {
- rxrpc_skb_rx_cleaned,
- rxrpc_skb_rx_freed,
- rxrpc_skb_rx_got,
- rxrpc_skb_rx_lost,
- rxrpc_skb_rx_purged,
- rxrpc_skb_rx_received,
- rxrpc_skb_rx_rotated,
- rxrpc_skb_rx_seen,
- rxrpc_skb_tx_cleaned,
- rxrpc_skb_tx_freed,
- rxrpc_skb_tx_got,
- rxrpc_skb_tx_new,
- rxrpc_skb_tx_rotated,
- rxrpc_skb_tx_seen,
+ rxrpc_skb_cleaned,
+ rxrpc_skb_freed,
+ rxrpc_skb_got,
+ rxrpc_skb_lost,
+ rxrpc_skb_new,
+ rxrpc_skb_purged,
+ rxrpc_skb_received,
+ rxrpc_skb_rotated,
+ rxrpc_skb_seen,
};
enum rxrpc_local_trace {
@@ -228,20 +223,15 @@ enum rxrpc_tx_point {
* Declare tracing information enums and their string mappings for display.
*/
#define rxrpc_skb_traces \
- EM(rxrpc_skb_rx_cleaned, "Rx CLN") \
- EM(rxrpc_skb_rx_freed, "Rx FRE") \
- EM(rxrpc_skb_rx_got, "Rx GOT") \
- EM(rxrpc_skb_rx_lost, "Rx *L*") \
- EM(rxrpc_skb_rx_purged, "Rx PUR") \
- EM(rxrpc_skb_rx_received, "Rx RCV") \
- EM(rxrpc_skb_rx_rotated, "Rx ROT") \
- EM(rxrpc_skb_rx_seen, "Rx SEE") \
- EM(rxrpc_skb_tx_cleaned, "Tx CLN") \
- EM(rxrpc_skb_tx_freed, "Tx FRE") \
- EM(rxrpc_skb_tx_got, "Tx GOT") \
- EM(rxrpc_skb_tx_new, "Tx NEW") \
- EM(rxrpc_skb_tx_rotated, "Tx ROT") \
- E_(rxrpc_skb_tx_seen, "Tx SEE")
+ EM(rxrpc_skb_cleaned, "CLN") \
+ EM(rxrpc_skb_freed, "FRE") \
+ EM(rxrpc_skb_got, "GOT") \
+ EM(rxrpc_skb_lost, "*L*") \
+ EM(rxrpc_skb_new, "NEW") \
+ EM(rxrpc_skb_purged, "PUR") \
+ EM(rxrpc_skb_received, "RCV") \
+ EM(rxrpc_skb_rotated, "ROT") \
+ E_(rxrpc_skb_seen, "SEE")
#define rxrpc_local_traces \
EM(rxrpc_local_got, "GOT") \
@@ -650,6 +640,7 @@ TRACE_EVENT(rxrpc_skb,
TP_STRUCT__entry(
__field(struct sk_buff *, skb )
__field(enum rxrpc_skb_trace, op )
+ __field(u8, flags )
__field(int, usage )
__field(int, mod_count )
__field(const void *, where )
@@ -657,14 +648,16 @@ TRACE_EVENT(rxrpc_skb,
TP_fast_assign(
__entry->skb = skb;
+ __entry->flags = rxrpc_skb(skb)->rx_flags;
__entry->op = op;
__entry->usage = usage;
__entry->mod_count = mod_count;
__entry->where = where;
),
- TP_printk("s=%p %s u=%d m=%d p=%pSR",
+ TP_printk("s=%p %cx %s u=%d m=%d p=%pSR",
__entry->skb,
+ __entry->flags & RXRPC_SKB_TX_BUFFER ? 'T' : 'R',
__print_symbolic(__entry->op, rxrpc_skb_traces),
__entry->usage,
__entry->mod_count,
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 63d3a91ce5e9..2d5294f3e62f 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -185,6 +185,7 @@ struct rxrpc_host_header {
* - max 48 bytes (struct sk_buff::cb)
*/
struct rxrpc_skb_priv {
+ atomic_t nr_ring_pins; /* Number of rxtx ring pins */
u8 nr_subpackets; /* Number of subpackets */
u8 rx_flags; /* Received packet flags */
#define RXRPC_SKB_INCL_LAST 0x01 /* - Includes last packet */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index c767679bfa5d..cedbbb3a7c2e 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -199,7 +199,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
continue;
skb = call->rxtx_buffer[ix];
- rxrpc_see_skb(skb, rxrpc_skb_tx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
if (anno_type == RXRPC_TX_ANNO_UNACK) {
if (ktime_after(skb->tstamp, max_age)) {
@@ -255,18 +255,18 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
continue;
skb = call->rxtx_buffer[ix];
- rxrpc_get_skb(skb, rxrpc_skb_tx_got);
+ rxrpc_get_skb(skb, rxrpc_skb_got);
spin_unlock_bh(&call->lock);
if (rxrpc_send_data_packet(call, skb, true) < 0) {
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
return;
}
if (rxrpc_is_client_call(call))
rxrpc_expose_client_call(call);
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
spin_lock_bh(&call->lock);
/* We need to clear the retransmit state, but there are two
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index c9ab2da957fe..014548c259ce 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -429,9 +429,7 @@ static void rxrpc_cleanup_ring(struct rxrpc_call *call)
int i;
for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++) {
- rxrpc_free_skb(call->rxtx_buffer[i],
- (call->tx_phase ? rxrpc_skb_tx_cleaned :
- rxrpc_skb_rx_cleaned));
+ rxrpc_free_skb(call->rxtx_buffer[i], rxrpc_skb_cleaned);
call->rxtx_buffer[i] = NULL;
}
}
@@ -587,7 +585,7 @@ void rxrpc_cleanup_call(struct rxrpc_call *call)
ASSERTCMP(call->conn, ==, NULL);
rxrpc_cleanup_ring(call);
- rxrpc_free_skb(call->tx_pending, rxrpc_skb_tx_cleaned);
+ rxrpc_free_skb(call->tx_pending, rxrpc_skb_cleaned);
call_rcu(&call->rcu, rxrpc_rcu_destroy_call);
}
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index df6624c140be..a1ceef4f5cd0 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -472,7 +472,7 @@ void rxrpc_process_connection(struct work_struct *work)
/* go through the conn-level event packets, releasing the ref on this
* connection that each one has when we've finished with it */
while ((skb = skb_dequeue(&conn->rx_queue))) {
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
ret = rxrpc_process_event(conn, skb, &abort_code);
switch (ret) {
case -EPROTO:
@@ -484,7 +484,7 @@ void rxrpc_process_connection(struct work_struct *work)
goto requeue_and_leave;
case -ECONNABORTED:
default:
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
break;
}
}
@@ -501,6 +501,6 @@ void rxrpc_process_connection(struct work_struct *work)
protocol_error:
if (rxrpc_abort_connection(conn, ret, abort_code) < 0)
goto requeue_and_leave;
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
goto out;
}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 140cede77655..31090bdf1fae 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -233,7 +233,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
ix = call->tx_hard_ack & RXRPC_RXTX_BUFF_MASK;
skb = call->rxtx_buffer[ix];
annotation = call->rxtx_annotations[ix];
- rxrpc_see_skb(skb, rxrpc_skb_tx_rotated);
+ rxrpc_see_skb(skb, rxrpc_skb_rotated);
call->rxtx_buffer[ix] = NULL;
call->rxtx_annotations[ix] = 0;
skb->next = list;
@@ -258,7 +258,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
skb = list;
list = skb->next;
skb_mark_not_on_list(skb);
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
return rot_last;
@@ -443,7 +443,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
state = READ_ONCE(call->state);
if (state >= RXRPC_CALL_COMPLETE) {
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
return;
}
@@ -559,7 +559,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
* and also rxrpc_fill_out_ack().
*/
if (!terminal)
- rxrpc_get_skb(skb, rxrpc_skb_rx_got);
+ rxrpc_get_skb(skb, rxrpc_skb_got);
call->rxtx_annotations[ix] = annotation;
smp_wmb();
call->rxtx_buffer[ix] = skb;
@@ -620,7 +620,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
unlock:
spin_unlock(&call->input_lock);
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" [queued]");
}
@@ -1056,7 +1056,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
break;
}
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
no_free:
_leave("");
}
@@ -1119,7 +1119,7 @@ static void rxrpc_post_packet_to_local(struct rxrpc_local *local,
skb_queue_tail(&local->event_queue, skb);
rxrpc_queue_local(local);
} else {
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
}
@@ -1134,7 +1134,7 @@ static void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb)
skb_queue_tail(&local->reject_queue, skb);
rxrpc_queue_local(local);
} else {
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
}
@@ -1198,7 +1198,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
if (skb->tstamp == 0)
skb->tstamp = ktime_get_real();
- rxrpc_new_skb(skb, rxrpc_skb_rx_received);
+ rxrpc_new_skb(skb, rxrpc_skb_received);
skb_pull(skb, sizeof(struct udphdr));
@@ -1215,7 +1215,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
static int lose;
if ((lose++ & 7) == 7) {
trace_rxrpc_rx_lose(sp);
- rxrpc_free_skb(skb, rxrpc_skb_rx_lost);
+ rxrpc_free_skb(skb, rxrpc_skb_lost);
return 0;
}
}
@@ -1389,7 +1389,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
goto out;
discard:
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
out:
trace_rxrpc_rx_done(0, 0);
return 0;
diff --git a/net/rxrpc/local_event.c b/net/rxrpc/local_event.c
index e93a78f7c05e..3ce6d628cd75 100644
--- a/net/rxrpc/local_event.c
+++ b/net/rxrpc/local_event.c
@@ -90,7 +90,7 @@ void rxrpc_process_local_events(struct rxrpc_local *local)
if (skb) {
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
_debug("{%d},{%u}", local->debug_id, sp->hdr.type);
switch (sp->hdr.type) {
@@ -108,7 +108,7 @@ void rxrpc_process_local_events(struct rxrpc_local *local)
break;
}
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
_leave("");
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 369e516c4bdf..935bb60fff56 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -565,7 +565,7 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
memset(&whdr, 0, sizeof(whdr));
while ((skb = skb_dequeue(&local->reject_queue))) {
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
sp = rxrpc_skb(skb);
switch (skb->mark) {
@@ -581,7 +581,7 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
ioc = 2;
break;
default:
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
continue;
}
@@ -606,7 +606,7 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
rxrpc_tx_point_reject);
}
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
_leave("");
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 7666ec72d37e..c97ebdc043e4 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -163,11 +163,11 @@ void rxrpc_error_report(struct sock *sk)
_leave("UDP socket errqueue empty");
return;
}
- rxrpc_new_skb(skb, rxrpc_skb_rx_received);
+ rxrpc_new_skb(skb, rxrpc_skb_received);
serr = SKB_EXT_ERR(skb);
if (!skb->len && serr->ee.ee_origin == SO_EE_ORIGIN_TIMESTAMPING) {
_leave("UDP empty message");
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
return;
}
@@ -177,7 +177,7 @@ void rxrpc_error_report(struct sock *sk)
peer = NULL;
if (!peer) {
rcu_read_unlock();
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" [no peer]");
return;
}
@@ -189,7 +189,7 @@ void rxrpc_error_report(struct sock *sk)
serr->ee.ee_code == ICMP_FRAG_NEEDED)) {
rxrpc_adjust_mtu(peer, serr);
rcu_read_unlock();
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
rxrpc_put_peer(peer);
_leave(" [MTU update]");
return;
@@ -197,7 +197,7 @@ void rxrpc_error_report(struct sock *sk)
rxrpc_store_error(peer, serr);
rcu_read_unlock();
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
rxrpc_put_peer(peer);
_leave("");
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index e49eacfaf4d6..3b0becb12041 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -190,7 +190,7 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
hard_ack++;
ix = hard_ack & RXRPC_RXTX_BUFF_MASK;
skb = call->rxtx_buffer[ix];
- rxrpc_see_skb(skb, rxrpc_skb_rx_rotated);
+ rxrpc_see_skb(skb, rxrpc_skb_rotated);
sp = rxrpc_skb(skb);
subpacket = call->rxtx_annotations[ix] & RXRPC_RX_ANNO_SUBPACKET;
@@ -205,7 +205,7 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
/* Barrier against rxrpc_input_data(). */
smp_store_release(&call->rx_hard_ack, hard_ack);
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
trace_rxrpc_receive(call, rxrpc_receive_rotate, serial, hard_ack);
if (last) {
@@ -340,7 +340,7 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
break;
}
smp_rmb();
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
sp = rxrpc_skb(skb);
if (!(flags & MSG_PEEK)) {
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 472dc3b7d91f..6a1547b270fe 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -176,7 +176,7 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
skb->tstamp = ktime_get_real();
ix = seq & RXRPC_RXTX_BUFF_MASK;
- rxrpc_get_skb(skb, rxrpc_skb_tx_got);
+ rxrpc_get_skb(skb, rxrpc_skb_got);
call->rxtx_annotations[ix] = annotation;
smp_wmb();
call->rxtx_buffer[ix] = skb;
@@ -248,7 +248,7 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
}
out:
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" = %d", ret);
return ret;
}
@@ -289,7 +289,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
skb = call->tx_pending;
call->tx_pending = NULL;
- rxrpc_see_skb(skb, rxrpc_skb_tx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
copied = 0;
do {
@@ -338,7 +338,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
sp = rxrpc_skb(skb);
sp->rx_flags |= RXRPC_SKB_TX_BUFFER;
- rxrpc_new_skb(skb, rxrpc_skb_tx_new);
+ rxrpc_new_skb(skb, rxrpc_skb_new);
_debug("ALLOC SEND %p", skb);
@@ -440,7 +440,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
return ret;
call_terminated:
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" = %d", call->error);
return call->error;
diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c
index 9ad5045b7c2f..8e6f45f84b9b 100644
--- a/net/rxrpc/skbuff.c
+++ b/net/rxrpc/skbuff.c
@@ -14,7 +14,8 @@
#include <net/af_rxrpc.h>
#include "ar-internal.h"
-#define select_skb_count(op) (op >= rxrpc_skb_tx_cleaned ? &rxrpc_n_tx_skbs : &rxrpc_n_rx_skbs)
+#define is_tx_skb(skb) (rxrpc_skb(skb)->rx_flags & RXRPC_SKB_TX_BUFFER)
+#define select_skb_count(skb) (is_tx_skb(skb) ? &rxrpc_n_tx_skbs : &rxrpc_n_rx_skbs)
/*
* Note the allocation or reception of a socket buffer.
@@ -22,7 +23,7 @@
void rxrpc_new_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
- int n = atomic_inc_return(select_skb_count(op));
+ int n = atomic_inc_return(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
}
@@ -33,7 +34,7 @@ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
if (skb) {
- int n = atomic_read(select_skb_count(op));
+ int n = atomic_read(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
}
}
@@ -44,7 +45,7 @@ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
- int n = atomic_inc_return(select_skb_count(op));
+ int n = atomic_inc_return(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
skb_get(skb);
}
@@ -58,7 +59,7 @@ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
if (skb) {
int n;
CHECK_SLAB_OKAY(&skb->users);
- n = atomic_dec_return(select_skb_count(op));
+ n = atomic_dec_return(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
kfree_skb(skb);
}
@@ -72,8 +73,8 @@ void rxrpc_purge_queue(struct sk_buff_head *list)
const void *here = __builtin_return_address(0);
struct sk_buff *skb;
while ((skb = skb_dequeue((list))) != NULL) {
- int n = atomic_dec_return(select_skb_count(rxrpc_skb_rx_purged));
- trace_rxrpc_skb(skb, rxrpc_skb_rx_purged,
+ int n = atomic_dec_return(select_skb_count(skb));
+ trace_rxrpc_skb(skb, rxrpc_skb_purged,
refcount_read(&skb->users), n, here);
kfree_skb(skb);
}
^ permalink raw reply related
* [PATCH net 5/7] rxrpc: Add a private skb flag to indicate transmission-phase skbs
From: David Howells @ 2019-08-29 13:08 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708405310.26102.7954021163316252673.stgit@warthog.procyon.org.uk>
Add a flag in the private data on an skbuff to indicate that this is a
transmission-phase buffer rather than a receive-phase buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/sendmsg.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 20d7907a5bc6..63d3a91ce5e9 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -188,6 +188,7 @@ struct rxrpc_skb_priv {
u8 nr_subpackets; /* Number of subpackets */
u8 rx_flags; /* Received packet flags */
#define RXRPC_SKB_INCL_LAST 0x01 /* - Includes last packet */
+#define RXRPC_SKB_TX_BUFFER 0x02 /* - Is transmit buffer */
union {
int remain; /* amount of space remaining for next write */
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index bae14438f869..472dc3b7d91f 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -336,6 +336,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
if (!skb)
goto maybe_error;
+ sp = rxrpc_skb(skb);
+ sp->rx_flags |= RXRPC_SKB_TX_BUFFER;
rxrpc_new_skb(skb, rxrpc_skb_tx_new);
_debug("ALLOC SEND %p", skb);
@@ -346,7 +348,6 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
skb_reserve(skb, call->conn->security_size);
skb->len += call->conn->security_size;
- sp = rxrpc_skb(skb);
sp->remain = chunk;
if (sp->remain > skb_tailroom(skb))
sp->remain = skb_tailroom(skb);
^ permalink raw reply related
* [PATCH net 4/7] rxrpc: Abstract out rxtx ring cleanup
From: David Howells @ 2019-08-29 13:08 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708405310.26102.7954021163316252673.stgit@warthog.procyon.org.uk>
Abstract out rxtx ring cleanup into its own function from its two callers.
This makes it easier to apply the same changes to both.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/call_object.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 217b12be9e08..c9ab2da957fe 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -421,6 +421,21 @@ void rxrpc_get_call(struct rxrpc_call *call, enum rxrpc_call_trace op)
trace_rxrpc_call(call, op, n, here, NULL);
}
+/*
+ * Clean up the RxTx skb ring.
+ */
+static void rxrpc_cleanup_ring(struct rxrpc_call *call)
+{
+ int i;
+
+ for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++) {
+ rxrpc_free_skb(call->rxtx_buffer[i],
+ (call->tx_phase ? rxrpc_skb_tx_cleaned :
+ rxrpc_skb_rx_cleaned));
+ call->rxtx_buffer[i] = NULL;
+ }
+}
+
/*
* Detach a call from its owning socket.
*/
@@ -429,7 +444,6 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
const void *here = __builtin_return_address(0);
struct rxrpc_connection *conn = call->conn;
bool put = false;
- int i;
_enter("{%d,%d}", call->debug_id, atomic_read(&call->usage));
@@ -479,13 +493,7 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
if (conn)
rxrpc_disconnect_call(call);
- for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++) {
- rxrpc_free_skb(call->rxtx_buffer[i],
- (call->tx_phase ? rxrpc_skb_tx_cleaned :
- rxrpc_skb_rx_cleaned));
- call->rxtx_buffer[i] = NULL;
- }
-
+ rxrpc_cleanup_ring(call);
_leave("");
}
@@ -568,8 +576,6 @@ static void rxrpc_rcu_destroy_call(struct rcu_head *rcu)
*/
void rxrpc_cleanup_call(struct rxrpc_call *call)
{
- int i;
-
_net("DESTROY CALL %d", call->debug_id);
memset(&call->sock_node, 0xcd, sizeof(call->sock_node));
@@ -580,12 +586,7 @@ void rxrpc_cleanup_call(struct rxrpc_call *call)
ASSERT(test_bit(RXRPC_CALL_RELEASED, &call->flags));
ASSERTCMP(call->conn, ==, NULL);
- /* Clean up the Rx/Tx buffer */
- for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++)
- rxrpc_free_skb(call->rxtx_buffer[i],
- (call->tx_phase ? rxrpc_skb_tx_cleaned :
- rxrpc_skb_rx_cleaned));
-
+ rxrpc_cleanup_ring(call);
rxrpc_free_skb(call->tx_pending, rxrpc_skb_tx_cleaned);
call_rcu(&call->rcu, rxrpc_rcu_destroy_call);
^ permalink raw reply related
* [PATCH net 3/7] rxrpc: Pass the input handler's data skb reference to the Rx ring
From: David Howells @ 2019-08-29 13:07 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708405310.26102.7954021163316252673.stgit@warthog.procyon.org.uk>
Pass the reference held on a DATA skb in the rxrpc input handler into the
Rx ring rather than getting an additional ref for this and then dropping
the original ref at the end.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/input.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 35b1a9368d80..140cede77655 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -422,7 +422,8 @@ static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
}
/*
- * Process a DATA packet, adding the packet to the Rx ring.
+ * Process a DATA packet, adding the packet to the Rx ring. The caller's
+ * packet ref must be passed on or discarded.
*/
static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
{
@@ -441,8 +442,10 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
sp->hdr.serial, seq0, sp->hdr.flags, sp->nr_subpackets);
state = READ_ONCE(call->state);
- if (state >= RXRPC_CALL_COMPLETE)
+ if (state >= RXRPC_CALL_COMPLETE) {
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
return;
+ }
if (call->state == RXRPC_CALL_SERVER_RECV_REQUEST) {
unsigned long timo = READ_ONCE(call->next_req_timo);
@@ -555,7 +558,8 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
* Barriers against rxrpc_recvmsg_data() and rxrpc_rotate_rx_window()
* and also rxrpc_fill_out_ack().
*/
- rxrpc_get_skb(skb, rxrpc_skb_rx_got);
+ if (!terminal)
+ rxrpc_get_skb(skb, rxrpc_skb_rx_got);
call->rxtx_annotations[ix] = annotation;
smp_wmb();
call->rxtx_buffer[ix] = skb;
@@ -616,6 +620,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
unlock:
spin_unlock(&call->input_lock);
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
_leave(" [queued]");
}
@@ -1024,7 +1029,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
switch (sp->hdr.type) {
case RXRPC_PACKET_TYPE_DATA:
rxrpc_input_data(call, skb);
- break;
+ goto no_free;
case RXRPC_PACKET_TYPE_ACK:
rxrpc_input_ack(call, skb);
@@ -1051,6 +1056,8 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
break;
}
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+no_free:
_leave("");
}
@@ -1375,8 +1382,11 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
mutex_unlock(&call->user_mutex);
}
+ /* Process a call packet; this either discards or passes on the ref
+ * elsewhere.
+ */
rxrpc_input_call_packet(call, skb);
- goto discard;
+ goto out;
discard:
rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
^ permalink raw reply related
* [PATCH net 2/7] rxrpc: Use info in skbuff instead of reparsing a jumbo packet
From: David Howells @ 2019-08-29 13:07 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708405310.26102.7954021163316252673.stgit@warthog.procyon.org.uk>
Use the information now cached in the skbuff private data to avoid the need
to reparse a jumbo packet. We can find all the subpackets by dead
reckoning, so it's only necessary to note how many there are, whether the
last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK
flag set.
This is necessary as once recvmsg() can see the packet, it can start
modifying it, such as doing in-place decryption.
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/ar-internal.h | 3 -
net/rxrpc/input.c | 231 +++++++++++++++++++++++------------------------
net/rxrpc/recvmsg.c | 41 +++++---
3 files changed, 139 insertions(+), 136 deletions(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 87cff6c218b6..20d7907a5bc6 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -617,8 +617,7 @@ struct rxrpc_call {
#define RXRPC_TX_ANNO_LAST 0x04
#define RXRPC_TX_ANNO_RESENT 0x08
-#define RXRPC_RX_ANNO_JUMBO 0x3f /* Jumbo subpacket number + 1 if not zero */
-#define RXRPC_RX_ANNO_JLAST 0x40 /* Set if last element of a jumbo packet */
+#define RXRPC_RX_ANNO_SUBPACKET 0x3f /* Subpacket number in jumbogram */
#define RXRPC_RX_ANNO_VERIFIED 0x80 /* Set if verified and decrypted */
rxrpc_seq_t tx_hard_ack; /* Dead slot in buffer; the first transmitted but
* not hard-ACK'd packet follows this.
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index ffcec5117954..35b1a9368d80 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -405,10 +405,10 @@ static bool rxrpc_validate_data(struct sk_buff *skb)
* (that information is encoded in the ACK packet).
*/
static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
- u8 annotation, bool *_jumbo_bad)
+ bool is_jumbo, bool *_jumbo_bad)
{
/* Discard normal packets that are duplicates. */
- if (annotation == 0)
+ if (is_jumbo)
return;
/* Skip jumbo subpackets that are duplicates. When we've had three or
@@ -428,19 +428,17 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
enum rxrpc_call_state state;
- unsigned int offset = sizeof(struct rxrpc_wire_header);
- unsigned int ix;
+ unsigned int j;
rxrpc_serial_t serial = sp->hdr.serial, ack_serial = 0;
- rxrpc_seq_t seq = sp->hdr.seq, hard_ack;
- bool immediate_ack = false, jumbo_bad = false, queued;
- u16 len;
- u8 ack = 0, flags, annotation = 0;
+ rxrpc_seq_t seq0 = sp->hdr.seq, hard_ack;
+ bool immediate_ack = false, jumbo_bad = false;
+ u8 ack = 0;
_enter("{%u,%u},{%u,%u}",
- call->rx_hard_ack, call->rx_top, skb->len, seq);
+ call->rx_hard_ack, call->rx_top, skb->len, seq0);
- _proto("Rx DATA %%%u { #%u f=%02x }",
- sp->hdr.serial, seq, sp->hdr.flags);
+ _proto("Rx DATA %%%u { #%u f=%02x n=%u }",
+ sp->hdr.serial, seq0, sp->hdr.flags, sp->nr_subpackets);
state = READ_ONCE(call->state);
if (state >= RXRPC_CALL_COMPLETE)
@@ -469,137 +467,136 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
!rxrpc_receiving_reply(call))
goto unlock;
- call->ackr_prev_seq = seq;
-
+ call->ackr_prev_seq = seq0;
hard_ack = READ_ONCE(call->rx_hard_ack);
- if (after(seq, hard_ack + call->rx_winsize)) {
- ack = RXRPC_ACK_EXCEEDS_WINDOW;
- ack_serial = serial;
- goto ack;
- }
- flags = sp->hdr.flags;
- if (flags & RXRPC_JUMBO_PACKET) {
+ if (sp->nr_subpackets > 1) {
if (call->nr_jumbo_bad > 3) {
ack = RXRPC_ACK_NOSPACE;
ack_serial = serial;
goto ack;
}
- annotation = 1;
}
-next_subpacket:
- queued = false;
- ix = seq & RXRPC_RXTX_BUFF_MASK;
- len = skb->len;
- if (flags & RXRPC_JUMBO_PACKET)
- len = RXRPC_JUMBO_DATALEN;
-
- if (flags & RXRPC_LAST_PACKET) {
- if (test_bit(RXRPC_CALL_RX_LAST, &call->flags) &&
- seq != call->rx_top) {
- rxrpc_proto_abort("LSN", call, seq);
- goto unlock;
- }
- } else {
- if (test_bit(RXRPC_CALL_RX_LAST, &call->flags) &&
- after_eq(seq, call->rx_top)) {
- rxrpc_proto_abort("LSA", call, seq);
- goto unlock;
+ for (j = 0; j < sp->nr_subpackets; j++) {
+ rxrpc_serial_t serial = sp->hdr.serial + j;
+ rxrpc_seq_t seq = seq0 + j;
+ unsigned int ix = seq & RXRPC_RXTX_BUFF_MASK;
+ bool terminal = (j == sp->nr_subpackets - 1);
+ bool last = terminal && (sp->rx_flags & RXRPC_SKB_INCL_LAST);
+ u8 flags, annotation = j;
+
+ _proto("Rx DATA+%u %%%u { #%x t=%u l=%u }",
+ j, serial, seq, terminal, last);
+
+ if (last) {
+ if (test_bit(RXRPC_CALL_RX_LAST, &call->flags) &&
+ seq != call->rx_top) {
+ rxrpc_proto_abort("LSN", call, seq);
+ goto unlock;
+ }
+ } else {
+ if (test_bit(RXRPC_CALL_RX_LAST, &call->flags) &&
+ after_eq(seq, call->rx_top)) {
+ rxrpc_proto_abort("LSA", call, seq);
+ goto unlock;
+ }
}
- }
-
- trace_rxrpc_rx_data(call->debug_id, seq, serial, flags, annotation);
- if (before_eq(seq, hard_ack)) {
- ack = RXRPC_ACK_DUPLICATE;
- ack_serial = serial;
- goto skip;
- }
- if (flags & RXRPC_REQUEST_ACK && !ack) {
- ack = RXRPC_ACK_REQUESTED;
- ack_serial = serial;
- }
+ flags = 0;
+ if (last)
+ flags |= RXRPC_LAST_PACKET;
+ if (!terminal)
+ flags |= RXRPC_JUMBO_PACKET;
+ if (test_bit(j, sp->rx_req_ack))
+ flags |= RXRPC_REQUEST_ACK;
+ trace_rxrpc_rx_data(call->debug_id, seq, serial, flags, annotation);
- if (call->rxtx_buffer[ix]) {
- rxrpc_input_dup_data(call, seq, annotation, &jumbo_bad);
- if (ack != RXRPC_ACK_DUPLICATE) {
+ if (before_eq(seq, hard_ack)) {
ack = RXRPC_ACK_DUPLICATE;
ack_serial = serial;
+ continue;
}
- immediate_ack = true;
- goto skip;
- }
-
- /* Queue the packet. We use a couple of memory barriers here as need
- * to make sure that rx_top is perceived to be set after the buffer
- * pointer and that the buffer pointer is set after the annotation and
- * the skb data.
- *
- * Barriers against rxrpc_recvmsg_data() and rxrpc_rotate_rx_window()
- * and also rxrpc_fill_out_ack().
- */
- rxrpc_get_skb(skb, rxrpc_skb_rx_got);
- call->rxtx_annotations[ix] = annotation;
- smp_wmb();
- call->rxtx_buffer[ix] = skb;
- if (after(seq, call->rx_top)) {
- smp_store_release(&call->rx_top, seq);
- } else if (before(seq, call->rx_top)) {
- /* Send an immediate ACK if we fill in a hole */
- if (!ack) {
- ack = RXRPC_ACK_DELAY;
- ack_serial = serial;
- }
- immediate_ack = true;
- }
- if (flags & RXRPC_LAST_PACKET) {
- set_bit(RXRPC_CALL_RX_LAST, &call->flags);
- trace_rxrpc_receive(call, rxrpc_receive_queue_last, serial, seq);
- } else {
- trace_rxrpc_receive(call, rxrpc_receive_queue, serial, seq);
- }
- queued = true;
- if (after_eq(seq, call->rx_expect_next)) {
- if (after(seq, call->rx_expect_next)) {
- _net("OOS %u > %u", seq, call->rx_expect_next);
- ack = RXRPC_ACK_OUT_OF_SEQUENCE;
- ack_serial = serial;
+ if (call->rxtx_buffer[ix]) {
+ rxrpc_input_dup_data(call, seq, sp->nr_subpackets > 1,
+ &jumbo_bad);
+ if (ack != RXRPC_ACK_DUPLICATE) {
+ ack = RXRPC_ACK_DUPLICATE;
+ ack_serial = serial;
+ }
+ immediate_ack = true;
+ continue;
}
- call->rx_expect_next = seq + 1;
- }
-skip:
- offset += len;
- if (flags & RXRPC_JUMBO_PACKET) {
- if (skb_copy_bits(skb, offset, &flags, 1) < 0) {
- rxrpc_proto_abort("XJF", call, seq);
- goto unlock;
- }
- offset += sizeof(struct rxrpc_jumbo_header);
- seq++;
- serial++;
- annotation++;
- if (flags & RXRPC_JUMBO_PACKET)
- annotation |= RXRPC_RX_ANNO_JLAST;
if (after(seq, hard_ack + call->rx_winsize)) {
ack = RXRPC_ACK_EXCEEDS_WINDOW;
ack_serial = serial;
- if (!jumbo_bad) {
- call->nr_jumbo_bad++;
- jumbo_bad = true;
+ if (flags & RXRPC_JUMBO_PACKET) {
+ if (!jumbo_bad) {
+ call->nr_jumbo_bad++;
+ jumbo_bad = true;
+ }
}
+
goto ack;
}
- _proto("Rx DATA Jumbo %%%u", serial);
- goto next_subpacket;
- }
+ if (flags & RXRPC_REQUEST_ACK && !ack) {
+ ack = RXRPC_ACK_REQUESTED;
+ ack_serial = serial;
+ }
+
+ /* Queue the packet. We use a couple of memory barriers here as need
+ * to make sure that rx_top is perceived to be set after the buffer
+ * pointer and that the buffer pointer is set after the annotation and
+ * the skb data.
+ *
+ * Barriers against rxrpc_recvmsg_data() and rxrpc_rotate_rx_window()
+ * and also rxrpc_fill_out_ack().
+ */
+ rxrpc_get_skb(skb, rxrpc_skb_rx_got);
+ call->rxtx_annotations[ix] = annotation;
+ smp_wmb();
+ call->rxtx_buffer[ix] = skb;
+ if (after(seq, call->rx_top)) {
+ smp_store_release(&call->rx_top, seq);
+ } else if (before(seq, call->rx_top)) {
+ /* Send an immediate ACK if we fill in a hole */
+ if (!ack) {
+ ack = RXRPC_ACK_DELAY;
+ ack_serial = serial;
+ }
+ immediate_ack = true;
+ }
+
+ if (terminal) {
+ /* From this point on, we're not allowed to touch the
+ * packet any longer as its ref now belongs to the Rx
+ * ring.
+ */
+ skb = NULL;
+ }
- if (queued && flags & RXRPC_LAST_PACKET && !ack) {
- ack = RXRPC_ACK_DELAY;
- ack_serial = serial;
+ if (last) {
+ set_bit(RXRPC_CALL_RX_LAST, &call->flags);
+ if (!ack) {
+ ack = RXRPC_ACK_DELAY;
+ ack_serial = serial;
+ }
+ trace_rxrpc_receive(call, rxrpc_receive_queue_last, serial, seq);
+ } else {
+ trace_rxrpc_receive(call, rxrpc_receive_queue, serial, seq);
+ }
+
+ if (after_eq(seq, call->rx_expect_next)) {
+ if (after(seq, call->rx_expect_next)) {
+ _net("OOS %u > %u", seq, call->rx_expect_next);
+ ack = RXRPC_ACK_OUT_OF_SEQUENCE;
+ ack_serial = serial;
+ }
+ call->rx_expect_next = seq + 1;
+ }
}
ack:
@@ -612,7 +609,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
false, true,
rxrpc_propose_ack_input_data);
- if (sp->hdr.seq == READ_ONCE(call->rx_hard_ack) + 1) {
+ if (seq0 == READ_ONCE(call->rx_hard_ack) + 1) {
trace_rxrpc_notify_socket(call->debug_id, serial);
rxrpc_notify_socket(call);
}
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 9a7e1bc9791d..e49eacfaf4d6 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -177,7 +177,8 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
struct sk_buff *skb;
rxrpc_serial_t serial;
rxrpc_seq_t hard_ack, top;
- u8 flags;
+ bool last = false;
+ u8 subpacket;
int ix;
_enter("%d", call->debug_id);
@@ -191,10 +192,13 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
skb = call->rxtx_buffer[ix];
rxrpc_see_skb(skb, rxrpc_skb_rx_rotated);
sp = rxrpc_skb(skb);
- flags = sp->hdr.flags;
- serial = sp->hdr.serial;
- if (call->rxtx_annotations[ix] & RXRPC_RX_ANNO_JUMBO)
- serial += (call->rxtx_annotations[ix] & RXRPC_RX_ANNO_JUMBO) - 1;
+
+ subpacket = call->rxtx_annotations[ix] & RXRPC_RX_ANNO_SUBPACKET;
+ serial = sp->hdr.serial + subpacket;
+
+ if (subpacket == sp->nr_subpackets - 1 &&
+ sp->rx_flags & RXRPC_SKB_INCL_LAST)
+ last = true;
call->rxtx_buffer[ix] = NULL;
call->rxtx_annotations[ix] = 0;
@@ -203,9 +207,8 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
- _debug("%u,%u,%02x", hard_ack, top, flags);
trace_rxrpc_receive(call, rxrpc_receive_rotate, serial, hard_ack);
- if (flags & RXRPC_LAST_PACKET) {
+ if (last) {
rxrpc_end_rx_phase(call, serial);
} else {
/* Check to see if there's an ACK that needs sending. */
@@ -233,18 +236,19 @@ static int rxrpc_verify_packet(struct rxrpc_call *call, struct sk_buff *skb,
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
rxrpc_seq_t seq = sp->hdr.seq;
u16 cksum = sp->hdr.cksum;
+ u8 subpacket = annotation & RXRPC_RX_ANNO_SUBPACKET;
_enter("");
/* For all but the head jumbo subpacket, the security checksum is in a
* jumbo header immediately prior to the data.
*/
- if ((annotation & RXRPC_RX_ANNO_JUMBO) > 1) {
+ if (subpacket > 0) {
__be16 tmp;
if (skb_copy_bits(skb, offset - 2, &tmp, 2) < 0)
BUG();
cksum = ntohs(tmp);
- seq += (annotation & RXRPC_RX_ANNO_JUMBO) - 1;
+ seq += subpacket;
}
return call->conn->security->verify_packet(call, skb, offset, len,
@@ -265,19 +269,18 @@ static int rxrpc_locate_data(struct rxrpc_call *call, struct sk_buff *skb,
u8 *_annotation,
unsigned int *_offset, unsigned int *_len)
{
+ struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
unsigned int offset = sizeof(struct rxrpc_wire_header);
unsigned int len;
int ret;
u8 annotation = *_annotation;
+ u8 subpacket = annotation & RXRPC_RX_ANNO_SUBPACKET;
/* Locate the subpacket */
+ offset += subpacket * RXRPC_JUMBO_SUBPKTLEN;
len = skb->len - offset;
- if ((annotation & RXRPC_RX_ANNO_JUMBO) > 0) {
- offset += (((annotation & RXRPC_RX_ANNO_JUMBO) - 1) *
- RXRPC_JUMBO_SUBPKTLEN);
- len = (annotation & RXRPC_RX_ANNO_JLAST) ?
- skb->len - offset : RXRPC_JUMBO_SUBPKTLEN;
- }
+ if (subpacket < sp->nr_subpackets - 1)
+ len = RXRPC_JUMBO_DATALEN;
if (!(annotation & RXRPC_RX_ANNO_VERIFIED)) {
ret = rxrpc_verify_packet(call, skb, annotation, offset, len);
@@ -303,6 +306,7 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
{
struct rxrpc_skb_priv *sp;
struct sk_buff *skb;
+ rxrpc_serial_t serial;
rxrpc_seq_t hard_ack, top, seq;
size_t remain;
bool last;
@@ -339,9 +343,12 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
sp = rxrpc_skb(skb);
- if (!(flags & MSG_PEEK))
+ if (!(flags & MSG_PEEK)) {
+ serial = sp->hdr.serial;
+ serial += call->rxtx_annotations[ix] & RXRPC_RX_ANNO_SUBPACKET;
trace_rxrpc_receive(call, rxrpc_receive_front,
- sp->hdr.serial, seq);
+ serial, seq);
+ }
if (msg)
sock_recv_timestamp(msg, sock->sk, skb);
^ permalink raw reply related
* [PATCH net 1/7] rxrpc: Improve jumbo packet counting
From: David Howells @ 2019-08-29 13:07 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708405310.26102.7954021163316252673.stgit@warthog.procyon.org.uk>
Improve the information stored about jumbo packets so that we don't need to
reparse them so much later.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
---
net/rxrpc/ar-internal.h | 10 +++++++---
net/rxrpc/input.c | 23 ++++++++++++++---------
net/rxrpc/protocol.h | 9 +++++++++
3 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 145335611af6..87cff6c218b6 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -185,11 +185,15 @@ struct rxrpc_host_header {
* - max 48 bytes (struct sk_buff::cb)
*/
struct rxrpc_skb_priv {
- union {
- u8 nr_jumbo; /* Number of jumbo subpackets */
- };
+ u8 nr_subpackets; /* Number of subpackets */
+ u8 rx_flags; /* Received packet flags */
+#define RXRPC_SKB_INCL_LAST 0x01 /* - Includes last packet */
union {
int remain; /* amount of space remaining for next write */
+
+ /* List of requested ACKs on subpackets */
+ unsigned long rx_req_ack[(RXRPC_MAX_NR_JUMBO + BITS_PER_LONG - 1) /
+ BITS_PER_LONG];
};
struct rxrpc_host_header hdr; /* RxRPC packet header from this packet */
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index dd47d465d1d3..ffcec5117954 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -347,7 +347,7 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call)
}
/*
- * Scan a jumbo packet to validate its structure and to work out how many
+ * Scan a data packet to validate its structure and to work out how many
* subpackets it contains.
*
* A jumbo packet is a collection of consecutive packets glued together with
@@ -358,16 +358,21 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call)
* the last are RXRPC_JUMBO_DATALEN in size. The last subpacket may be of any
* size.
*/
-static bool rxrpc_validate_jumbo(struct sk_buff *skb)
+static bool rxrpc_validate_data(struct sk_buff *skb)
{
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
unsigned int offset = sizeof(struct rxrpc_wire_header);
unsigned int len = skb->len;
- int nr_jumbo = 1;
u8 flags = sp->hdr.flags;
- do {
- nr_jumbo++;
+ for (;;) {
+ if (flags & RXRPC_REQUEST_ACK)
+ __set_bit(sp->nr_subpackets, sp->rx_req_ack);
+ sp->nr_subpackets++;
+
+ if (!(flags & RXRPC_JUMBO_PACKET))
+ break;
+
if (len - offset < RXRPC_JUMBO_SUBPKTLEN)
goto protocol_error;
if (flags & RXRPC_LAST_PACKET)
@@ -376,9 +381,10 @@ static bool rxrpc_validate_jumbo(struct sk_buff *skb)
if (skb_copy_bits(skb, offset, &flags, 1) < 0)
goto protocol_error;
offset += sizeof(struct rxrpc_jumbo_header);
- } while (flags & RXRPC_JUMBO_PACKET);
+ }
- sp->nr_jumbo = nr_jumbo;
+ if (flags & RXRPC_LAST_PACKET)
+ sp->rx_flags |= RXRPC_SKB_INCL_LAST;
return true;
protocol_error:
@@ -1237,8 +1243,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
if (sp->hdr.callNumber == 0 ||
sp->hdr.seq == 0)
goto bad_message;
- if (sp->hdr.flags & RXRPC_JUMBO_PACKET &&
- !rxrpc_validate_jumbo(skb))
+ if (!rxrpc_validate_data(skb))
goto bad_message;
break;
diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h
index 99ce322d7caa..49bb972539aa 100644
--- a/net/rxrpc/protocol.h
+++ b/net/rxrpc/protocol.h
@@ -89,6 +89,15 @@ struct rxrpc_jumbo_header {
#define RXRPC_JUMBO_DATALEN 1412 /* non-terminal jumbo packet data length */
#define RXRPC_JUMBO_SUBPKTLEN (RXRPC_JUMBO_DATALEN + sizeof(struct rxrpc_jumbo_header))
+/*
+ * The maximum number of subpackets that can possibly fit in a UDP packet is:
+ *
+ * ((max_IP - IP_hdr - UDP_hdr) / RXRPC_JUMBO_SUBPKTLEN) + 1
+ * = ((65535 - 28 - 28) / 1416) + 1
+ * = 46 non-terminal packets and 1 terminal packet.
+ */
+#define RXRPC_MAX_NR_JUMBO 47
+
/*****************************************************************************/
/*
* on-the-wire Rx ACK packet data payload
^ permalink raw reply related
* [PATCH net 0/7] rxrpc: Fix use of skb_cow_data()
From: David Howells @ 2019-08-29 13:07 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
Here's a series of patches that replaces the use of skb_cow_data() in rxrpc
with skb_unshare() early on in the input process. The problem that is
being seen is that skb_cow_data() indirectly requires that the maximum
usage count on an sk_buff be 1, and it may generate an assertion failure in
pskb_expand_head() if not.
This can occur because rxrpc_input_data() may be still holding a ref when
it has just attached the sk_buff to the rx ring and given that attachment
its own ref. If recvmsg happens fast enough, skb_cow_data() can see the
ref still held by the softirq handler.
Further, a packet may contain multiple subpackets, each of which gets its
own attachment to the ring and its own ref - also making skb_cow_data() go
bang.
Fix this by:
(1) The DATA packet is currently parsed for subpackets twice by the input
routines. Parse it just once instead and make notes in the sk_buff
private data.
(2) Use the notes from (1) when attaching the packet to the ring multiple
times. Once the packet is attached to the ring, recvmsg can see it
and start modifying it, so the softirq handler is not permitted to
look inside it from that point.
(3) Pass the ref from the input code to the ring rather than getting an
extra ref. rxrpc_input_data() uses a ref on the second refcount to
prevent the packet from evaporating under it.
(4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet()
before we take call->input_lock. Other sorts of packets don't get
modified and so can be left.
A trace is emitted if skb_unshare() eats the skb. Note that
skb_share() for our accounting in this regard as we can't see the
parameters in the packet to log in a trace line if it releases it.
(5) Remove the calls to skb_cow_data(). These are then no longer
necessary.
There are also patches to improve the rxrpc_skb tracepoint to make sure
that Tx-derived buffers are identified separately from Rx-derived buffers
in the trace.
The patches are tagged here:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20190827
and can also be found on the following branch:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
David
---
David Howells (7):
rxrpc: Improve jumbo packet counting
rxrpc: Use info in skbuff instead of reparsing a jumbo packet
rxrpc: Pass the input handler's data skb reference to the Rx ring
rxrpc: Abstract out rxtx ring cleanup
rxrpc: Add a private skb flag to indicate transmission-phase skbs
rxrpc: Use the tx-phase skb flag to simplify tracing
rxrpc: Use skb_unshare() rather than skb_cow_data()
include/trace/events/rxrpc.h | 59 ++++----
net/rxrpc/ar-internal.h | 16 ++
net/rxrpc/call_event.c | 8 +
net/rxrpc/call_object.c | 33 ++---
net/rxrpc/conn_event.c | 6 -
net/rxrpc/input.c | 304 +++++++++++++++++++++++-------------------
net/rxrpc/local_event.c | 4 -
net/rxrpc/output.c | 6 -
net/rxrpc/peer_event.c | 10 +
net/rxrpc/protocol.h | 9 +
net/rxrpc/recvmsg.c | 47 ++++--
net/rxrpc/rxkad.c | 32 +---
net/rxrpc/sendmsg.c | 13 +-
net/rxrpc/skbuff.c | 40 ++++--
14 files changed, 319 insertions(+), 268 deletions(-)
^ permalink raw reply
* [PATCH net 5/5] rxrpc: Use skb_unshare() rather than skb_cow_data()
From: David Howells @ 2019-08-29 13:06 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708397261.25861.2158085372781699276.stgit@warthog.procyon.org.uk>
The in-place decryption routines in AF_RXRPC's rxkad security module
currently call skb_cow_data() to make sure the data isn't shared and that
the skb can be written over. This has a problem, however, as the softirq
handler may be still holding a ref or the Rx ring may be holding multiple
refs when skb_cow_data() is called in rxkad_verify_packet() - and so
skb_shared() returns true and __pskb_pull_tail() dislikes that. If this
occurs, something like the following report will be generated.
kernel BUG at net/core/skbuff.c:1463!
...
RIP: 0010:pskb_expand_head+0x253/0x2b0
...
Call Trace:
__pskb_pull_tail+0x49/0x460
skb_cow_data+0x6f/0x300
rxkad_verify_packet+0x18b/0xb10 [rxrpc]
rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
afs_extract_data+0x51/0x2d0 [kafs]
afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
afs_deliver_to_call+0xac/0x430 [kafs]
afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
afs_make_call+0x282/0x3f0 [kafs]
afs_fs_fetch_data+0x164/0x300 [kafs]
afs_fetch_data+0x54/0x130 [kafs]
afs_readpages+0x20d/0x340 [kafs]
read_pages+0x66/0x180
__do_page_cache_readahead+0x188/0x1a0
ondemand_readahead+0x17d/0x2e0
generic_file_read_iter+0x740/0xc10
__vfs_read+0x145/0x1a0
vfs_read+0x8c/0x140
ksys_read+0x4a/0xb0
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by using skb_unshare() instead in the input path for DATA packets
that have a security index != 0. Non-DATA packets don't need in-place
encryption and neither do unencrypted DATA packets.
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Julian Wollrath <jwollrath@web.de>
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/trace/events/rxrpc.h | 12 ++++++++----
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/input.c | 18 ++++++++++++++++++
net/rxrpc/rxkad.c | 32 +++++++++-----------------------
net/rxrpc/skbuff.c | 25 ++++++++++++++++++++-----
5 files changed, 56 insertions(+), 32 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index e2356c51883b..a13a62db3565 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -32,6 +32,8 @@ enum rxrpc_skb_trace {
rxrpc_skb_received,
rxrpc_skb_rotated,
rxrpc_skb_seen,
+ rxrpc_skb_unshared,
+ rxrpc_skb_unshared_nomem,
};
enum rxrpc_local_trace {
@@ -231,7 +233,9 @@ enum rxrpc_tx_point {
EM(rxrpc_skb_purged, "PUR") \
EM(rxrpc_skb_received, "RCV") \
EM(rxrpc_skb_rotated, "ROT") \
- E_(rxrpc_skb_seen, "SEE")
+ EM(rxrpc_skb_seen, "SEE") \
+ EM(rxrpc_skb_unshared, "UNS") \
+ E_(rxrpc_skb_unshared_nomem, "US0")
#define rxrpc_local_traces \
EM(rxrpc_local_got, "GOT") \
@@ -633,9 +637,9 @@ TRACE_EVENT(rxrpc_call,
TRACE_EVENT(rxrpc_skb,
TP_PROTO(struct sk_buff *skb, enum rxrpc_skb_trace op,
- int usage, int mod_count, const void *where),
+ int usage, int mod_count, u8 flags, const void *where),
- TP_ARGS(skb, op, usage, mod_count, where),
+ TP_ARGS(skb, op, usage, mod_count, flags, where),
TP_STRUCT__entry(
__field(struct sk_buff *, skb )
@@ -648,7 +652,7 @@ TRACE_EVENT(rxrpc_skb,
TP_fast_assign(
__entry->skb = skb;
- __entry->flags = rxrpc_skb(skb)->rx_flags;
+ __entry->flags = flags;
__entry->op = op;
__entry->usage = usage;
__entry->mod_count = mod_count;
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 2d5294f3e62f..852e58781fda 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -1110,6 +1110,7 @@ void rxrpc_kernel_data_consumed(struct rxrpc_call *, struct sk_buff *);
void rxrpc_packet_destructor(struct sk_buff *);
void rxrpc_new_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_see_skb(struct sk_buff *, enum rxrpc_skb_trace);
+void rxrpc_eaten_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_get_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_free_skb(struct sk_buff *, enum rxrpc_skb_trace);
void rxrpc_purge_queue(struct sk_buff_head *);
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 31090bdf1fae..d122c53c8697 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1249,6 +1249,24 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
goto bad_message;
if (!rxrpc_validate_data(skb))
goto bad_message;
+
+ /* Unshare the packet so that it can be modified for in-place
+ * decryption.
+ */
+ if (sp->hdr.securityIndex != 0) {
+ struct sk_buff *nskb = skb_unshare(skb, GFP_ATOMIC);
+ if (!nskb) {
+ rxrpc_eaten_skb(skb, rxrpc_skb_unshared_nomem);
+ goto out;
+ }
+
+ if (nskb != skb) {
+ rxrpc_eaten_skb(skb, rxrpc_skb_received);
+ rxrpc_new_skb(skb, rxrpc_skb_unshared);
+ skb = nskb;
+ sp = rxrpc_skb(skb);
+ }
+ }
break;
case RXRPC_PACKET_TYPE_CHALLENGE:
diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index ae8cd8926456..c60c520fde7c 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -187,10 +187,8 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,
struct rxrpc_skb_priv *sp;
struct rxrpc_crypt iv;
struct scatterlist sg[16];
- struct sk_buff *trailer;
unsigned int len;
u16 check;
- int nsg;
int err;
sp = rxrpc_skb(skb);
@@ -214,15 +212,14 @@ static int rxkad_secure_packet_encrypt(const struct rxrpc_call *call,
crypto_skcipher_encrypt(req);
/* we want to encrypt the skbuff in-place */
- nsg = skb_cow_data(skb, 0, &trailer);
- err = -ENOMEM;
- if (nsg < 0 || nsg > 16)
+ err = -EMSGSIZE;
+ if (skb_shinfo(skb)->nr_frags > 16)
goto out;
len = data_size + call->conn->size_align - 1;
len &= ~(call->conn->size_align - 1);
- sg_init_table(sg, nsg);
+ sg_init_table(sg, ARRAY_SIZE(sg));
err = skb_to_sgvec(skb, sg, 0, len);
if (unlikely(err < 0))
goto out;
@@ -319,11 +316,10 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
struct rxkad_level1_hdr sechdr;
struct rxrpc_crypt iv;
struct scatterlist sg[16];
- struct sk_buff *trailer;
bool aborted;
u32 data_size, buf;
u16 check;
- int nsg, ret;
+ int ret;
_enter("");
@@ -336,11 +332,7 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
/* Decrypt the skbuff in-place. TODO: We really want to decrypt
* directly into the target buffer.
*/
- nsg = skb_cow_data(skb, 0, &trailer);
- if (nsg < 0 || nsg > 16)
- goto nomem;
-
- sg_init_table(sg, nsg);
+ sg_init_table(sg, ARRAY_SIZE(sg));
ret = skb_to_sgvec(skb, sg, offset, 8);
if (unlikely(ret < 0))
return ret;
@@ -388,10 +380,6 @@ static int rxkad_verify_packet_1(struct rxrpc_call *call, struct sk_buff *skb,
if (aborted)
rxrpc_send_abort_packet(call);
return -EPROTO;
-
-nomem:
- _leave(" = -ENOMEM");
- return -ENOMEM;
}
/*
@@ -406,7 +394,6 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
struct rxkad_level2_hdr sechdr;
struct rxrpc_crypt iv;
struct scatterlist _sg[4], *sg;
- struct sk_buff *trailer;
bool aborted;
u32 data_size, buf;
u16 check;
@@ -423,12 +410,11 @@ static int rxkad_verify_packet_2(struct rxrpc_call *call, struct sk_buff *skb,
/* Decrypt the skbuff in-place. TODO: We really want to decrypt
* directly into the target buffer.
*/
- nsg = skb_cow_data(skb, 0, &trailer);
- if (nsg < 0)
- goto nomem;
-
sg = _sg;
- if (unlikely(nsg > 4)) {
+ nsg = skb_shinfo(skb)->nr_frags;
+ if (nsg <= 4) {
+ nsg = 4;
+ } else {
sg = kmalloc_array(nsg, sizeof(*sg), GFP_NOIO);
if (!sg)
goto nomem;
diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c
index 8e6f45f84b9b..0348d2bf6f7d 100644
--- a/net/rxrpc/skbuff.c
+++ b/net/rxrpc/skbuff.c
@@ -24,7 +24,8 @@ void rxrpc_new_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
int n = atomic_inc_return(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
}
/*
@@ -35,7 +36,8 @@ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
const void *here = __builtin_return_address(0);
if (skb) {
int n = atomic_read(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
}
}
@@ -46,10 +48,21 @@ void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
int n = atomic_inc_return(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
skb_get(skb);
}
+/*
+ * Note the dropping of a ref on a socket buffer by the core.
+ */
+void rxrpc_eaten_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
+{
+ const void *here = __builtin_return_address(0);
+ int n = atomic_inc_return(&rxrpc_n_rx_skbs);
+ trace_rxrpc_skb(skb, op, 0, n, 0, here);
+}
+
/*
* Note the destruction of a socket buffer.
*/
@@ -60,7 +73,8 @@ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
int n;
CHECK_SLAB_OKAY(&skb->users);
n = atomic_dec_return(select_skb_count(skb));
- trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
+ trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
kfree_skb(skb);
}
}
@@ -75,7 +89,8 @@ void rxrpc_purge_queue(struct sk_buff_head *list)
while ((skb = skb_dequeue((list))) != NULL) {
int n = atomic_dec_return(select_skb_count(skb));
trace_rxrpc_skb(skb, rxrpc_skb_purged,
- refcount_read(&skb->users), n, here);
+ refcount_read(&skb->users), n,
+ rxrpc_skb(skb)->rx_flags, here);
kfree_skb(skb);
}
}
^ permalink raw reply related
* [PATCH net 4/5] rxrpc: Use the tx-phase skb flag to simplify tracing
From: David Howells @ 2019-08-29 13:06 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708397261.25861.2158085372781699276.stgit@warthog.procyon.org.uk>
Use the previously-added transmit-phase skbuff private flag to simplify the
socket buffer tracing a bit. Which phase the skbuff comes from can now be
divined from the skb rather than having to be guessed from the call state.
We can also reduce the number of rxrpc_skb_trace values by eliminating the
difference between Tx and Rx in the symbols.
Signed-off-by: David Howells <dhowells@redhat.com>
---
include/trace/events/rxrpc.h | 51 ++++++++++++++++++------------------------
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/call_event.c | 8 +++----
net/rxrpc/call_object.c | 6 ++---
net/rxrpc/conn_event.c | 6 ++---
net/rxrpc/input.c | 22 +++++++++---------
net/rxrpc/local_event.c | 4 ++-
net/rxrpc/output.c | 6 ++---
net/rxrpc/peer_event.c | 10 ++++----
net/rxrpc/recvmsg.c | 6 ++---
net/rxrpc/sendmsg.c | 10 ++++----
net/rxrpc/skbuff.c | 15 +++++++-----
12 files changed, 69 insertions(+), 76 deletions(-)
diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index fa06b528c73c..e2356c51883b 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -23,20 +23,15 @@
#define __RXRPC_DECLARE_TRACE_ENUMS_ONCE_ONLY
enum rxrpc_skb_trace {
- rxrpc_skb_rx_cleaned,
- rxrpc_skb_rx_freed,
- rxrpc_skb_rx_got,
- rxrpc_skb_rx_lost,
- rxrpc_skb_rx_purged,
- rxrpc_skb_rx_received,
- rxrpc_skb_rx_rotated,
- rxrpc_skb_rx_seen,
- rxrpc_skb_tx_cleaned,
- rxrpc_skb_tx_freed,
- rxrpc_skb_tx_got,
- rxrpc_skb_tx_new,
- rxrpc_skb_tx_rotated,
- rxrpc_skb_tx_seen,
+ rxrpc_skb_cleaned,
+ rxrpc_skb_freed,
+ rxrpc_skb_got,
+ rxrpc_skb_lost,
+ rxrpc_skb_new,
+ rxrpc_skb_purged,
+ rxrpc_skb_received,
+ rxrpc_skb_rotated,
+ rxrpc_skb_seen,
};
enum rxrpc_local_trace {
@@ -228,20 +223,15 @@ enum rxrpc_tx_point {
* Declare tracing information enums and their string mappings for display.
*/
#define rxrpc_skb_traces \
- EM(rxrpc_skb_rx_cleaned, "Rx CLN") \
- EM(rxrpc_skb_rx_freed, "Rx FRE") \
- EM(rxrpc_skb_rx_got, "Rx GOT") \
- EM(rxrpc_skb_rx_lost, "Rx *L*") \
- EM(rxrpc_skb_rx_purged, "Rx PUR") \
- EM(rxrpc_skb_rx_received, "Rx RCV") \
- EM(rxrpc_skb_rx_rotated, "Rx ROT") \
- EM(rxrpc_skb_rx_seen, "Rx SEE") \
- EM(rxrpc_skb_tx_cleaned, "Tx CLN") \
- EM(rxrpc_skb_tx_freed, "Tx FRE") \
- EM(rxrpc_skb_tx_got, "Tx GOT") \
- EM(rxrpc_skb_tx_new, "Tx NEW") \
- EM(rxrpc_skb_tx_rotated, "Tx ROT") \
- E_(rxrpc_skb_tx_seen, "Tx SEE")
+ EM(rxrpc_skb_cleaned, "CLN") \
+ EM(rxrpc_skb_freed, "FRE") \
+ EM(rxrpc_skb_got, "GOT") \
+ EM(rxrpc_skb_lost, "*L*") \
+ EM(rxrpc_skb_new, "NEW") \
+ EM(rxrpc_skb_purged, "PUR") \
+ EM(rxrpc_skb_received, "RCV") \
+ EM(rxrpc_skb_rotated, "ROT") \
+ E_(rxrpc_skb_seen, "SEE")
#define rxrpc_local_traces \
EM(rxrpc_local_got, "GOT") \
@@ -650,6 +640,7 @@ TRACE_EVENT(rxrpc_skb,
TP_STRUCT__entry(
__field(struct sk_buff *, skb )
__field(enum rxrpc_skb_trace, op )
+ __field(u8, flags )
__field(int, usage )
__field(int, mod_count )
__field(const void *, where )
@@ -657,14 +648,16 @@ TRACE_EVENT(rxrpc_skb,
TP_fast_assign(
__entry->skb = skb;
+ __entry->flags = rxrpc_skb(skb)->rx_flags;
__entry->op = op;
__entry->usage = usage;
__entry->mod_count = mod_count;
__entry->where = where;
),
- TP_printk("s=%p %s u=%d m=%d p=%pSR",
+ TP_printk("s=%p %cx %s u=%d m=%d p=%pSR",
__entry->skb,
+ __entry->flags & RXRPC_SKB_TX_BUFFER ? 'T' : 'R',
__print_symbolic(__entry->op, rxrpc_skb_traces),
__entry->usage,
__entry->mod_count,
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 63d3a91ce5e9..2d5294f3e62f 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -185,6 +185,7 @@ struct rxrpc_host_header {
* - max 48 bytes (struct sk_buff::cb)
*/
struct rxrpc_skb_priv {
+ atomic_t nr_ring_pins; /* Number of rxtx ring pins */
u8 nr_subpackets; /* Number of subpackets */
u8 rx_flags; /* Received packet flags */
#define RXRPC_SKB_INCL_LAST 0x01 /* - Includes last packet */
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index c767679bfa5d..cedbbb3a7c2e 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -199,7 +199,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
continue;
skb = call->rxtx_buffer[ix];
- rxrpc_see_skb(skb, rxrpc_skb_tx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
if (anno_type == RXRPC_TX_ANNO_UNACK) {
if (ktime_after(skb->tstamp, max_age)) {
@@ -255,18 +255,18 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
continue;
skb = call->rxtx_buffer[ix];
- rxrpc_get_skb(skb, rxrpc_skb_tx_got);
+ rxrpc_get_skb(skb, rxrpc_skb_got);
spin_unlock_bh(&call->lock);
if (rxrpc_send_data_packet(call, skb, true) < 0) {
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
return;
}
if (rxrpc_is_client_call(call))
rxrpc_expose_client_call(call);
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
spin_lock_bh(&call->lock);
/* We need to clear the retransmit state, but there are two
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index c9ab2da957fe..014548c259ce 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -429,9 +429,7 @@ static void rxrpc_cleanup_ring(struct rxrpc_call *call)
int i;
for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++) {
- rxrpc_free_skb(call->rxtx_buffer[i],
- (call->tx_phase ? rxrpc_skb_tx_cleaned :
- rxrpc_skb_rx_cleaned));
+ rxrpc_free_skb(call->rxtx_buffer[i], rxrpc_skb_cleaned);
call->rxtx_buffer[i] = NULL;
}
}
@@ -587,7 +585,7 @@ void rxrpc_cleanup_call(struct rxrpc_call *call)
ASSERTCMP(call->conn, ==, NULL);
rxrpc_cleanup_ring(call);
- rxrpc_free_skb(call->tx_pending, rxrpc_skb_tx_cleaned);
+ rxrpc_free_skb(call->tx_pending, rxrpc_skb_cleaned);
call_rcu(&call->rcu, rxrpc_rcu_destroy_call);
}
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index df6624c140be..a1ceef4f5cd0 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -472,7 +472,7 @@ void rxrpc_process_connection(struct work_struct *work)
/* go through the conn-level event packets, releasing the ref on this
* connection that each one has when we've finished with it */
while ((skb = skb_dequeue(&conn->rx_queue))) {
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
ret = rxrpc_process_event(conn, skb, &abort_code);
switch (ret) {
case -EPROTO:
@@ -484,7 +484,7 @@ void rxrpc_process_connection(struct work_struct *work)
goto requeue_and_leave;
case -ECONNABORTED:
default:
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
break;
}
}
@@ -501,6 +501,6 @@ void rxrpc_process_connection(struct work_struct *work)
protocol_error:
if (rxrpc_abort_connection(conn, ret, abort_code) < 0)
goto requeue_and_leave;
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
goto out;
}
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 140cede77655..31090bdf1fae 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -233,7 +233,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
ix = call->tx_hard_ack & RXRPC_RXTX_BUFF_MASK;
skb = call->rxtx_buffer[ix];
annotation = call->rxtx_annotations[ix];
- rxrpc_see_skb(skb, rxrpc_skb_tx_rotated);
+ rxrpc_see_skb(skb, rxrpc_skb_rotated);
call->rxtx_buffer[ix] = NULL;
call->rxtx_annotations[ix] = 0;
skb->next = list;
@@ -258,7 +258,7 @@ static bool rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
skb = list;
list = skb->next;
skb_mark_not_on_list(skb);
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
return rot_last;
@@ -443,7 +443,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
state = READ_ONCE(call->state);
if (state >= RXRPC_CALL_COMPLETE) {
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
return;
}
@@ -559,7 +559,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
* and also rxrpc_fill_out_ack().
*/
if (!terminal)
- rxrpc_get_skb(skb, rxrpc_skb_rx_got);
+ rxrpc_get_skb(skb, rxrpc_skb_got);
call->rxtx_annotations[ix] = annotation;
smp_wmb();
call->rxtx_buffer[ix] = skb;
@@ -620,7 +620,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
unlock:
spin_unlock(&call->input_lock);
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" [queued]");
}
@@ -1056,7 +1056,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
break;
}
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
no_free:
_leave("");
}
@@ -1119,7 +1119,7 @@ static void rxrpc_post_packet_to_local(struct rxrpc_local *local,
skb_queue_tail(&local->event_queue, skb);
rxrpc_queue_local(local);
} else {
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
}
@@ -1134,7 +1134,7 @@ static void rxrpc_reject_packet(struct rxrpc_local *local, struct sk_buff *skb)
skb_queue_tail(&local->reject_queue, skb);
rxrpc_queue_local(local);
} else {
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
}
@@ -1198,7 +1198,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
if (skb->tstamp == 0)
skb->tstamp = ktime_get_real();
- rxrpc_new_skb(skb, rxrpc_skb_rx_received);
+ rxrpc_new_skb(skb, rxrpc_skb_received);
skb_pull(skb, sizeof(struct udphdr));
@@ -1215,7 +1215,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
static int lose;
if ((lose++ & 7) == 7) {
trace_rxrpc_rx_lose(sp);
- rxrpc_free_skb(skb, rxrpc_skb_rx_lost);
+ rxrpc_free_skb(skb, rxrpc_skb_lost);
return 0;
}
}
@@ -1389,7 +1389,7 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
goto out;
discard:
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
out:
trace_rxrpc_rx_done(0, 0);
return 0;
diff --git a/net/rxrpc/local_event.c b/net/rxrpc/local_event.c
index e93a78f7c05e..3ce6d628cd75 100644
--- a/net/rxrpc/local_event.c
+++ b/net/rxrpc/local_event.c
@@ -90,7 +90,7 @@ void rxrpc_process_local_events(struct rxrpc_local *local)
if (skb) {
struct rxrpc_skb_priv *sp = rxrpc_skb(skb);
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
_debug("{%d},{%u}", local->debug_id, sp->hdr.type);
switch (sp->hdr.type) {
@@ -108,7 +108,7 @@ void rxrpc_process_local_events(struct rxrpc_local *local)
break;
}
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
_leave("");
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 369e516c4bdf..935bb60fff56 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -565,7 +565,7 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
memset(&whdr, 0, sizeof(whdr));
while ((skb = skb_dequeue(&local->reject_queue))) {
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
sp = rxrpc_skb(skb);
switch (skb->mark) {
@@ -581,7 +581,7 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
ioc = 2;
break;
default:
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
continue;
}
@@ -606,7 +606,7 @@ void rxrpc_reject_packets(struct rxrpc_local *local)
rxrpc_tx_point_reject);
}
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
}
_leave("");
diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 7666ec72d37e..c97ebdc043e4 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -163,11 +163,11 @@ void rxrpc_error_report(struct sock *sk)
_leave("UDP socket errqueue empty");
return;
}
- rxrpc_new_skb(skb, rxrpc_skb_rx_received);
+ rxrpc_new_skb(skb, rxrpc_skb_received);
serr = SKB_EXT_ERR(skb);
if (!skb->len && serr->ee.ee_origin == SO_EE_ORIGIN_TIMESTAMPING) {
_leave("UDP empty message");
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
return;
}
@@ -177,7 +177,7 @@ void rxrpc_error_report(struct sock *sk)
peer = NULL;
if (!peer) {
rcu_read_unlock();
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" [no peer]");
return;
}
@@ -189,7 +189,7 @@ void rxrpc_error_report(struct sock *sk)
serr->ee.ee_code == ICMP_FRAG_NEEDED)) {
rxrpc_adjust_mtu(peer, serr);
rcu_read_unlock();
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
rxrpc_put_peer(peer);
_leave(" [MTU update]");
return;
@@ -197,7 +197,7 @@ void rxrpc_error_report(struct sock *sk)
rxrpc_store_error(peer, serr);
rcu_read_unlock();
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
rxrpc_put_peer(peer);
_leave("");
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index e49eacfaf4d6..3b0becb12041 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -190,7 +190,7 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
hard_ack++;
ix = hard_ack & RXRPC_RXTX_BUFF_MASK;
skb = call->rxtx_buffer[ix];
- rxrpc_see_skb(skb, rxrpc_skb_rx_rotated);
+ rxrpc_see_skb(skb, rxrpc_skb_rotated);
sp = rxrpc_skb(skb);
subpacket = call->rxtx_annotations[ix] & RXRPC_RX_ANNO_SUBPACKET;
@@ -205,7 +205,7 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
/* Barrier against rxrpc_input_data(). */
smp_store_release(&call->rx_hard_ack, hard_ack);
- rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
trace_rxrpc_receive(call, rxrpc_receive_rotate, serial, hard_ack);
if (last) {
@@ -340,7 +340,7 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct rxrpc_call *call,
break;
}
smp_rmb();
- rxrpc_see_skb(skb, rxrpc_skb_rx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
sp = rxrpc_skb(skb);
if (!(flags & MSG_PEEK)) {
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 472dc3b7d91f..6a1547b270fe 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -176,7 +176,7 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
skb->tstamp = ktime_get_real();
ix = seq & RXRPC_RXTX_BUFF_MASK;
- rxrpc_get_skb(skb, rxrpc_skb_tx_got);
+ rxrpc_get_skb(skb, rxrpc_skb_got);
call->rxtx_annotations[ix] = annotation;
smp_wmb();
call->rxtx_buffer[ix] = skb;
@@ -248,7 +248,7 @@ static int rxrpc_queue_packet(struct rxrpc_sock *rx, struct rxrpc_call *call,
}
out:
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" = %d", ret);
return ret;
}
@@ -289,7 +289,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
skb = call->tx_pending;
call->tx_pending = NULL;
- rxrpc_see_skb(skb, rxrpc_skb_tx_seen);
+ rxrpc_see_skb(skb, rxrpc_skb_seen);
copied = 0;
do {
@@ -338,7 +338,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
sp = rxrpc_skb(skb);
sp->rx_flags |= RXRPC_SKB_TX_BUFFER;
- rxrpc_new_skb(skb, rxrpc_skb_tx_new);
+ rxrpc_new_skb(skb, rxrpc_skb_new);
_debug("ALLOC SEND %p", skb);
@@ -440,7 +440,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
return ret;
call_terminated:
- rxrpc_free_skb(skb, rxrpc_skb_tx_freed);
+ rxrpc_free_skb(skb, rxrpc_skb_freed);
_leave(" = %d", call->error);
return call->error;
diff --git a/net/rxrpc/skbuff.c b/net/rxrpc/skbuff.c
index 9ad5045b7c2f..8e6f45f84b9b 100644
--- a/net/rxrpc/skbuff.c
+++ b/net/rxrpc/skbuff.c
@@ -14,7 +14,8 @@
#include <net/af_rxrpc.h>
#include "ar-internal.h"
-#define select_skb_count(op) (op >= rxrpc_skb_tx_cleaned ? &rxrpc_n_tx_skbs : &rxrpc_n_rx_skbs)
+#define is_tx_skb(skb) (rxrpc_skb(skb)->rx_flags & RXRPC_SKB_TX_BUFFER)
+#define select_skb_count(skb) (is_tx_skb(skb) ? &rxrpc_n_tx_skbs : &rxrpc_n_rx_skbs)
/*
* Note the allocation or reception of a socket buffer.
@@ -22,7 +23,7 @@
void rxrpc_new_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
- int n = atomic_inc_return(select_skb_count(op));
+ int n = atomic_inc_return(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
}
@@ -33,7 +34,7 @@ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
if (skb) {
- int n = atomic_read(select_skb_count(op));
+ int n = atomic_read(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
}
}
@@ -44,7 +45,7 @@ void rxrpc_see_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
void rxrpc_get_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
{
const void *here = __builtin_return_address(0);
- int n = atomic_inc_return(select_skb_count(op));
+ int n = atomic_inc_return(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
skb_get(skb);
}
@@ -58,7 +59,7 @@ void rxrpc_free_skb(struct sk_buff *skb, enum rxrpc_skb_trace op)
if (skb) {
int n;
CHECK_SLAB_OKAY(&skb->users);
- n = atomic_dec_return(select_skb_count(op));
+ n = atomic_dec_return(select_skb_count(skb));
trace_rxrpc_skb(skb, op, refcount_read(&skb->users), n, here);
kfree_skb(skb);
}
@@ -72,8 +73,8 @@ void rxrpc_purge_queue(struct sk_buff_head *list)
const void *here = __builtin_return_address(0);
struct sk_buff *skb;
while ((skb = skb_dequeue((list))) != NULL) {
- int n = atomic_dec_return(select_skb_count(rxrpc_skb_rx_purged));
- trace_rxrpc_skb(skb, rxrpc_skb_rx_purged,
+ int n = atomic_dec_return(select_skb_count(skb));
+ trace_rxrpc_skb(skb, rxrpc_skb_purged,
refcount_read(&skb->users), n, here);
kfree_skb(skb);
}
^ permalink raw reply related
* [PATCH net 3/5] rxrpc: Add a private skb flag to indicate transmission-phase skbs
From: David Howells @ 2019-08-29 13:06 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708397261.25861.2158085372781699276.stgit@warthog.procyon.org.uk>
Add a flag in the private data on an skbuff to indicate that this is a
transmission-phase buffer rather than a receive-phase buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/ar-internal.h | 1 +
net/rxrpc/sendmsg.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 20d7907a5bc6..63d3a91ce5e9 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -188,6 +188,7 @@ struct rxrpc_skb_priv {
u8 nr_subpackets; /* Number of subpackets */
u8 rx_flags; /* Received packet flags */
#define RXRPC_SKB_INCL_LAST 0x01 /* - Includes last packet */
+#define RXRPC_SKB_TX_BUFFER 0x02 /* - Is transmit buffer */
union {
int remain; /* amount of space remaining for next write */
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index bae14438f869..472dc3b7d91f 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -336,6 +336,8 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
if (!skb)
goto maybe_error;
+ sp = rxrpc_skb(skb);
+ sp->rx_flags |= RXRPC_SKB_TX_BUFFER;
rxrpc_new_skb(skb, rxrpc_skb_tx_new);
_debug("ALLOC SEND %p", skb);
@@ -346,7 +348,6 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
skb_reserve(skb, call->conn->security_size);
skb->len += call->conn->security_size;
- sp = rxrpc_skb(skb);
sp->remain = chunk;
if (sp->remain > skb_tailroom(skb))
sp->remain = skb_tailroom(skb);
^ permalink raw reply related
* [PATCH net 2/5] rxrpc: Abstract out rxtx ring cleanup
From: David Howells @ 2019-08-29 13:06 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708397261.25861.2158085372781699276.stgit@warthog.procyon.org.uk>
Abstract out rxtx ring cleanup into its own function from its two callers.
This makes it easier to apply the same changes to both.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/call_object.c | 33 +++++++++++++++++----------------
1 file changed, 17 insertions(+), 16 deletions(-)
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 217b12be9e08..c9ab2da957fe 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -421,6 +421,21 @@ void rxrpc_get_call(struct rxrpc_call *call, enum rxrpc_call_trace op)
trace_rxrpc_call(call, op, n, here, NULL);
}
+/*
+ * Clean up the RxTx skb ring.
+ */
+static void rxrpc_cleanup_ring(struct rxrpc_call *call)
+{
+ int i;
+
+ for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++) {
+ rxrpc_free_skb(call->rxtx_buffer[i],
+ (call->tx_phase ? rxrpc_skb_tx_cleaned :
+ rxrpc_skb_rx_cleaned));
+ call->rxtx_buffer[i] = NULL;
+ }
+}
+
/*
* Detach a call from its owning socket.
*/
@@ -429,7 +444,6 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
const void *here = __builtin_return_address(0);
struct rxrpc_connection *conn = call->conn;
bool put = false;
- int i;
_enter("{%d,%d}", call->debug_id, atomic_read(&call->usage));
@@ -479,13 +493,7 @@ void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
if (conn)
rxrpc_disconnect_call(call);
- for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++) {
- rxrpc_free_skb(call->rxtx_buffer[i],
- (call->tx_phase ? rxrpc_skb_tx_cleaned :
- rxrpc_skb_rx_cleaned));
- call->rxtx_buffer[i] = NULL;
- }
-
+ rxrpc_cleanup_ring(call);
_leave("");
}
@@ -568,8 +576,6 @@ static void rxrpc_rcu_destroy_call(struct rcu_head *rcu)
*/
void rxrpc_cleanup_call(struct rxrpc_call *call)
{
- int i;
-
_net("DESTROY CALL %d", call->debug_id);
memset(&call->sock_node, 0xcd, sizeof(call->sock_node));
@@ -580,12 +586,7 @@ void rxrpc_cleanup_call(struct rxrpc_call *call)
ASSERT(test_bit(RXRPC_CALL_RELEASED, &call->flags));
ASSERTCMP(call->conn, ==, NULL);
- /* Clean up the Rx/Tx buffer */
- for (i = 0; i < RXRPC_RXTX_BUFF_SIZE; i++)
- rxrpc_free_skb(call->rxtx_buffer[i],
- (call->tx_phase ? rxrpc_skb_tx_cleaned :
- rxrpc_skb_rx_cleaned));
-
+ rxrpc_cleanup_ring(call);
rxrpc_free_skb(call->tx_pending, rxrpc_skb_tx_cleaned);
call_rcu(&call->rcu, rxrpc_rcu_destroy_call);
^ permalink raw reply related
* [PATCH net 1/5] rxrpc: Pass the input handler's data skb reference to the Rx ring
From: David Howells @ 2019-08-29 13:06 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156708397261.25861.2158085372781699276.stgit@warthog.procyon.org.uk>
Pass the reference held on a DATA skb in the rxrpc input handler into the
Rx ring rather than getting an additional ref for this and then dropping
the original ref at the end.
Signed-off-by: David Howells <dhowells@redhat.com>
---
net/rxrpc/input.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 35b1a9368d80..140cede77655 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -422,7 +422,8 @@ static void rxrpc_input_dup_data(struct rxrpc_call *call, rxrpc_seq_t seq,
}
/*
- * Process a DATA packet, adding the packet to the Rx ring.
+ * Process a DATA packet, adding the packet to the Rx ring. The caller's
+ * packet ref must be passed on or discarded.
*/
static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
{
@@ -441,8 +442,10 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
sp->hdr.serial, seq0, sp->hdr.flags, sp->nr_subpackets);
state = READ_ONCE(call->state);
- if (state >= RXRPC_CALL_COMPLETE)
+ if (state >= RXRPC_CALL_COMPLETE) {
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
return;
+ }
if (call->state == RXRPC_CALL_SERVER_RECV_REQUEST) {
unsigned long timo = READ_ONCE(call->next_req_timo);
@@ -555,7 +558,8 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
* Barriers against rxrpc_recvmsg_data() and rxrpc_rotate_rx_window()
* and also rxrpc_fill_out_ack().
*/
- rxrpc_get_skb(skb, rxrpc_skb_rx_got);
+ if (!terminal)
+ rxrpc_get_skb(skb, rxrpc_skb_rx_got);
call->rxtx_annotations[ix] = annotation;
smp_wmb();
call->rxtx_buffer[ix] = skb;
@@ -616,6 +620,7 @@ static void rxrpc_input_data(struct rxrpc_call *call, struct sk_buff *skb)
unlock:
spin_unlock(&call->input_lock);
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
_leave(" [queued]");
}
@@ -1024,7 +1029,7 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
switch (sp->hdr.type) {
case RXRPC_PACKET_TYPE_DATA:
rxrpc_input_data(call, skb);
- break;
+ goto no_free;
case RXRPC_PACKET_TYPE_ACK:
rxrpc_input_ack(call, skb);
@@ -1051,6 +1056,8 @@ static void rxrpc_input_call_packet(struct rxrpc_call *call,
break;
}
+ rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+no_free:
_leave("");
}
@@ -1375,8 +1382,11 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
mutex_unlock(&call->user_mutex);
}
+ /* Process a call packet; this either discards or passes on the ref
+ * elsewhere.
+ */
rxrpc_input_call_packet(call, skb);
- goto discard;
+ goto out;
discard:
rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
^ permalink raw reply related
* [PATCH net 0/5] rxrpc: Fix use of skb_cow_data()
From: David Howells @ 2019-08-29 13:06 UTC (permalink / raw)
To: netdev; +Cc: dhowells, linux-afs, linux-kernel
Here's a series of patches that replaces the use of skb_cow_data() in rxrpc
with skb_unshare() early on in the input process. The problem that is
being seen is that skb_cow_data() indirectly requires that the maximum
usage count on an sk_buff be 1, and it may generate an assertion failure in
pskb_expand_head() if not.
This can occur because rxrpc_input_data() may be still holding a ref when
it has just attached the sk_buff to the rx ring and given that attachment
its own ref. If recvmsg happens fast enough, skb_cow_data() can see the
ref still held by the softirq handler.
Further, a packet may contain multiple subpackets, each of which gets its
own attachment to the ring and its own ref - also making skb_cow_data() go
bang.
Fix this by:
(1) The DATA packet is currently parsed for subpackets twice by the input
routines. Parse it just once instead and make notes in the sk_buff
private data.
(2) Use the notes from (1) when attaching the packet to the ring multiple
times. Once the packet is attached to the ring, recvmsg can see it
and start modifying it, so the softirq handler is not permitted to
look inside it from that point.
(3) Pass the ref from the input code to the ring rather than getting an
extra ref. rxrpc_input_data() uses a ref on the second refcount to
prevent the packet from evaporating under it.
(4) Call skb_unshare() on secured DATA packets in rxrpc_input_packet()
before we take call->input_lock. Other sorts of packets don't get
modified and so can be left.
A trace is emitted if skb_unshare() eats the skb. Note that
skb_share() for our accounting in this regard as we can't see the
parameters in the packet to log in a trace line if it releases it.
(5) Remove the calls to skb_cow_data(). These are then no longer
necessary.
There are also patches to improve the rxrpc_skb tracepoint to make sure
that Tx-derived buffers are identified separately from Rx-derived buffers
in the trace.
The patches are tagged here:
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-fixes-20190827
and can also be found on the following branch:
http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
David
---
David Howells (5):
rxrpc: Pass the input handler's data skb reference to the Rx ring
rxrpc: Abstract out rxtx ring cleanup
rxrpc: Add a private skb flag to indicate transmission-phase skbs
rxrpc: Use the tx-phase skb flag to simplify tracing
rxrpc: Use skb_unshare() rather than skb_cow_data()
include/trace/events/rxrpc.h | 59 ++++++++++++++++++++----------------------
net/rxrpc/ar-internal.h | 3 ++
net/rxrpc/call_event.c | 8 +++---
net/rxrpc/call_object.c | 33 +++++++++++------------
net/rxrpc/conn_event.c | 6 ++--
net/rxrpc/input.c | 52 ++++++++++++++++++++++++++++---------
net/rxrpc/local_event.c | 4 +--
net/rxrpc/output.c | 6 ++--
net/rxrpc/peer_event.c | 10 ++++---
net/rxrpc/recvmsg.c | 6 ++--
net/rxrpc/rxkad.c | 32 ++++++-----------------
net/rxrpc/sendmsg.c | 13 +++++----
net/rxrpc/skbuff.c | 40 ++++++++++++++++++++--------
13 files changed, 151 insertions(+), 121 deletions(-)
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox