* [PATCHv2 net] sunvnet: set queue mapping when doing packet copies
@ 2015-01-29 19:42 David L Stevens
2015-01-29 20:39 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: David L Stevens @ 2015-01-29 19:42 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Sowmini Varadhan, Eric Dumazet
This patch fixes a bug where vnet_skb_shape() didn't set the already-selected
queue mapping when a packet copy was required. This results in using the
wrong queue index for stops/starts, hung tx queues and watchdog timeouts
under heavy load. It also transfers the destructor, truesize, and the owning
socket for flow control (code adapted from skb_segment).
Signed-off-by: David L Stevens <david.stevens@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
--
Changes since v1:
move truesize, destructor and socket per Eric Dumazet <eric.dumazet@gmail.com>
---
drivers/net/ethernet/sun/sunvnet.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
index 2b719cc..0a1d3eb 100644
--- a/drivers/net/ethernet/sun/sunvnet.c
+++ b/drivers/net/ethernet/sun/sunvnet.c
@@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies)
skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size;
skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
}
+ nskb->queue_mapping = skb->queue_mapping;
+ /* Following permits correct back-pressure, for protocols
+ * using skb_set_owner_w().
+ * Idea is to transfer ownership from skb to nskb.
+ */
+ if (skb->destructor == sock_wfree) {
+ swap(nskb->truesize, skb->truesize);
+ swap(nskb->destructor, skb->destructor);
+ swap(nskb->sk, skb->sk);
+ }
dev_kfree_skb(skb);
skb = nskb;
}
--
1.7.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net] sunvnet: set queue mapping when doing packet copies
2015-01-29 19:42 [PATCHv2 net] sunvnet: set queue mapping when doing packet copies David L Stevens
@ 2015-01-29 20:39 ` Eric Dumazet
2015-01-29 21:36 ` David L Stevens
2015-01-29 22:15 ` David Miller
0 siblings, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2015-01-29 20:39 UTC (permalink / raw)
To: David L Stevens; +Cc: David Miller, netdev, Sowmini Varadhan
On Thu, 2015-01-29 at 14:42 -0500, David L Stevens wrote:
> This patch fixes a bug where vnet_skb_shape() didn't set the already-selected
> queue mapping when a packet copy was required. This results in using the
> wrong queue index for stops/starts, hung tx queues and watchdog timeouts
> under heavy load. It also transfers the destructor, truesize, and the owning
> socket for flow control (code adapted from skb_segment).
>
> Signed-off-by: David L Stevens <david.stevens@oracle.com>
> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
>
> --
> Changes since v1:
> move truesize, destructor and socket per Eric Dumazet <eric.dumazet@gmail.com>
>
> ---
> drivers/net/ethernet/sun/sunvnet.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
> index 2b719cc..0a1d3eb 100644
> --- a/drivers/net/ethernet/sun/sunvnet.c
> +++ b/drivers/net/ethernet/sun/sunvnet.c
> @@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies)
> skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size;
> skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
> }
> + nskb->queue_mapping = skb->queue_mapping;
> + /* Following permits correct back-pressure, for protocols
> + * using skb_set_owner_w().
> + * Idea is to transfer ownership from skb to nskb.
> + */
> + if (skb->destructor == sock_wfree) {
Sorry, but you should remove this test.
(TCP uses another destructor, not sock_wfree())
All sent packets will support these swap() operations,
regardless of destructor.
> + swap(nskb->truesize, skb->truesize);
> + swap(nskb->destructor, skb->destructor);
> + swap(nskb->sk, skb->sk);
> + }
> dev_kfree_skb(skb);
> skb = nskb;
> }
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net] sunvnet: set queue mapping when doing packet copies
2015-01-29 20:39 ` Eric Dumazet
@ 2015-01-29 21:36 ` David L Stevens
2015-01-29 22:19 ` Eric Dumazet
2015-01-29 22:15 ` David Miller
1 sibling, 1 reply; 7+ messages in thread
From: David L Stevens @ 2015-01-29 21:36 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, netdev, Sowmini Varadhan
On 01/29/2015 03:39 PM, Eric Dumazet wrote:
> Sorry, but you should remove this test.
>
> (TCP uses another destructor, not sock_wfree())
>
> All sent packets will support these swap() operations,
> regardless of destructor.
In general the destructor should match the allocator, right? So it bothers me
here that we'd be replacing it in an skb allocated with alloc_and_align_skb(),
with an an arbitrary destructor from an skb NOT allocated with alloc_and_align_skb().
If it has a different destructor that does special handling related to the allocator,
it is the original skb, not the new one, that needs the old destructor. This TCP
accounting has less to do with the buffer destructor than with the freeing of the
contents of the buffer, but that isn't necessarily true for all destructors.
Checking for a known, specific destructor is less troubling, so I don't want
to remove the test entirely.
Since the concern here is specifically TCP flow control, do you think it's sufficient
to substitute tcp_wfree for the sock_wfree here?
+-DLS
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net] sunvnet: set queue mapping when doing packet copies
2015-01-29 20:39 ` Eric Dumazet
2015-01-29 21:36 ` David L Stevens
@ 2015-01-29 22:15 ` David Miller
2015-01-29 22:30 ` Eric Dumazet
1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2015-01-29 22:15 UTC (permalink / raw)
To: eric.dumazet; +Cc: david.stevens, netdev, sowmini.varadhan
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 29 Jan 2015 12:39:56 -0800
> On Thu, 2015-01-29 at 14:42 -0500, David L Stevens wrote:
>> @@ -1123,6 +1123,16 @@ static inline struct sk_buff *vnet_skb_shape(struct sk_buff *skb, int ncookies)
>> skb_shinfo(nskb)->gso_size = skb_shinfo(skb)->gso_size;
>> skb_shinfo(nskb)->gso_type = skb_shinfo(skb)->gso_type;
>> }
>> + nskb->queue_mapping = skb->queue_mapping;
>> + /* Following permits correct back-pressure, for protocols
>> + * using skb_set_owner_w().
>> + * Idea is to transfer ownership from skb to nskb.
>> + */
>> + if (skb->destructor == sock_wfree) {
>
> Sorry, but you should remove this test.
>
> (TCP uses another destructor, not sock_wfree())
>
> All sent packets will support these swap() operations,
> regardless of destructor.
Then we need to fix skb_segment() too.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net] sunvnet: set queue mapping when doing packet copies
2015-01-29 21:36 ` David L Stevens
@ 2015-01-29 22:19 ` Eric Dumazet
2015-01-29 22:22 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2015-01-29 22:19 UTC (permalink / raw)
To: David L Stevens; +Cc: David Miller, netdev, Sowmini Varadhan
On Thu, 2015-01-29 at 16:36 -0500, David L Stevens wrote:
> In general the destructor should match the allocator, right? So it bothers me
> here that we'd be replacing it in an skb allocated with alloc_and_align_skb(),
> with an an arbitrary destructor from an skb NOT allocated with alloc_and_align_skb().
> If it has a different destructor that does special handling related to the allocator,
> it is the original skb, not the new one, that needs the old destructor. This TCP
> accounting has less to do with the buffer destructor than with the freeing of the
> contents of the buffer, but that isn't necessarily true for all destructors.
>
> Checking for a known, specific destructor is less troubling, so I don't want
> to remove the test entirely.
>
> Since the concern here is specifically TCP flow control, do you think it's sufficient
> to substitute tcp_wfree for the sock_wfree here?
The concern is also for UDP.
Right now a single UDP flow can flood your network, even if application
or admin cared to set a low SO_SNDBUF.
I guess you can extend the test to sock_wfree and tcp_wfree, but :
You have to EXPORT_SYMBOL(tcp_wfree)
Add an #ifdef CONFIG_INET (take a look at skb_orphan_partial())
Thanks
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net] sunvnet: set queue mapping when doing packet copies
2015-01-29 22:19 ` Eric Dumazet
@ 2015-01-29 22:22 ` Eric Dumazet
0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2015-01-29 22:22 UTC (permalink / raw)
To: David L Stevens; +Cc: David Miller, netdev, Sowmini Varadhan
On Thu, 2015-01-29 at 14:19 -0800, Eric Dumazet wrote:
> I guess you can extend the test to sock_wfree and tcp_wfree, but :
>
> You have to EXPORT_SYMBOL(tcp_wfree)
>
> Add an #ifdef CONFIG_INET (take a look at skb_orphan_partial())
Or even better, add a new helper in the same spirit than
skb_orphan_partial() : You would EXPORT_SYMBOL() it and leave
tcp_wfree() as non exported.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCHv2 net] sunvnet: set queue mapping when doing packet copies
2015-01-29 22:15 ` David Miller
@ 2015-01-29 22:30 ` Eric Dumazet
0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2015-01-29 22:30 UTC (permalink / raw)
To: David Miller; +Cc: david.stevens, netdev, sowmini.varadhan
On Thu, 2015-01-29 at 14:15 -0800, David Miller wrote:
> Then we need to fix skb_segment() too.
My concern was that skb_segment() could be used in rx paths, with quite
different destructor semantics.
While from ndo_start_xmit(), I believe sane destructors should not
depend on skb being invariant, sk 'refcount' must be using
skb->truesize.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-01-29 22:30 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-29 19:42 [PATCHv2 net] sunvnet: set queue mapping when doing packet copies David L Stevens
2015-01-29 20:39 ` Eric Dumazet
2015-01-29 21:36 ` David L Stevens
2015-01-29 22:19 ` Eric Dumazet
2015-01-29 22:22 ` Eric Dumazet
2015-01-29 22:15 ` David Miller
2015-01-29 22:30 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).