* [PATCH 0/2] xen/netback: fix issue introduced recently
From: Juergen Gross @ 2023-03-27 8:36 UTC
To: linux-kernel, netdev
Cc: Juergen Gross, Wei Liu, Paul Durrant, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, xen-devel, stable
The fix for XSA-423 introduced a bug which resulted in loss of network
connection in some configurations.
The first patch fixes the issue, while the second removes a test that
is no longer needed.
Juergen Gross (2):
xen/netback: don't do grant copy across page boundary
xen/netback: remove not needed test in xenvif_tx_build_gops()
drivers/net/xen-netback/common.h | 2 +-
drivers/net/xen-netback/netback.c | 29 +++++++++++++++++++++++------
2 files changed, 24 insertions(+), 7 deletions(-)
--
2.35.3
* [PATCH 1/2] xen/netback: don't do grant copy across page boundary
From: Juergen Gross @ 2023-03-27 8:36 UTC
To: linux-kernel, netdev
Cc: Juergen Gross, Wei Liu, Paul Durrant, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, xen-devel, stable
Fix xenvif_get_requests() not to do grant copy operations across local
page boundaries. This requires doubling the maximum number of copy
operations per queue, as each copy could now be split in two.
Make sure that struct xenvif_tx_cb doesn't grow too large.
Cc: stable@vger.kernel.org
Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
Signed-off-by: Juergen Gross <jgross@suse.com>
---
drivers/net/xen-netback/common.h | 2 +-
drivers/net/xen-netback/netback.c | 25 +++++++++++++++++++++++--
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 3dbfc8a6924e..1fcbd83f7ff2 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -166,7 +166,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
- struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
+ struct gnttab_copy tx_copy_ops[2 * MAX_PENDING_REQS];
struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
/* passed to gnttab_[un]map_refs with pages under (un)mapping */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 1b42676ca141..111c179f161b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -334,6 +334,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
struct xenvif_tx_cb {
u16 copy_pending_idx[XEN_NETBK_LEGACY_SLOTS_MAX + 1];
u8 copy_count;
+ u32 split_mask;
};
#define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
@@ -361,6 +362,8 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
struct sk_buff *skb =
alloc_skb(size + NET_SKB_PAD + NET_IP_ALIGN,
GFP_ATOMIC | __GFP_NOWARN);
+
+ BUILD_BUG_ON(sizeof(*XENVIF_TX_CB(skb)) > sizeof(skb->cb));
if (unlikely(skb == NULL))
return NULL;
@@ -396,11 +399,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
nr_slots = shinfo->nr_frags + 1;
copy_count(skb) = 0;
+ XENVIF_TX_CB(skb)->split_mask = 0;
/* Create copy ops for exactly data_len bytes into the skb head. */
__skb_put(skb, data_len);
while (data_len > 0) {
int amount = data_len > txp->size ? txp->size : data_len;
+ bool split = false;
cop->source.u.ref = txp->gref;
cop->source.domid = queue->vif->domid;
@@ -413,6 +418,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb)
- data_len);
+ /* Don't cross local page boundary! */
+ if (cop->dest.offset + amount > XEN_PAGE_SIZE) {
+ amount = XEN_PAGE_SIZE - cop->dest.offset;
+ XENVIF_TX_CB(skb)->split_mask |= 1U << copy_count(skb);
+ split = true;
+ }
+
cop->len = amount;
cop->flags = GNTCOPY_source_gref;
@@ -420,7 +432,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
pending_idx = queue->pending_ring[index];
callback_param(queue, pending_idx).ctx = NULL;
copy_pending_idx(skb, copy_count(skb)) = pending_idx;
- copy_count(skb)++;
+ if (!split)
+ copy_count(skb)++;
cop++;
data_len -= amount;
@@ -441,7 +454,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
nr_slots--;
} else {
/* The copy op partially covered the tx_request.
- * The remainder will be mapped.
+ * The remainder will be mapped or copied in the next
+ * iteration.
*/
txp->offset += amount;
txp->size -= amount;
@@ -539,6 +553,13 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
pending_idx = copy_pending_idx(skb, i);
newerr = (*gopp_copy)->status;
+
+ /* Split copies need to be handled together. */
+ if (XENVIF_TX_CB(skb)->split_mask & (1U << i)) {
+ (*gopp_copy)++;
+ if (!newerr)
+ newerr = (*gopp_copy)->status;
+ }
if (likely(!newerr)) {
/* The first frag might still have this slot mapped */
if (i < copy_count(skb) - 1 || !sharedslot)
--
2.35.3
* [PATCH 2/2] xen/netback: remove not needed test in xenvif_tx_build_gops()
From: Juergen Gross @ 2023-03-27 8:36 UTC
To: linux-kernel, netdev
Cc: Juergen Gross, Wei Liu, Paul Durrant, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, xen-devel, Jan Beulich
The test for the number of grant mapping or copy operations reaching
the array size of the operations buffer at the end of the main loop in
xenvif_tx_build_gops() isn't needed.
The loop can handle at most MAX_PENDING_REQS transfer requests, as
XEN_RING_NR_UNCONSUMED_REQUESTS() takes unsent responses into account
as well.
Remove the test.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
drivers/net/xen-netback/netback.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 111c179f161b..4943be4fd99d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1082,10 +1082,6 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
__skb_queue_tail(&queue->tx_queue, skb);
queue->tx.req_cons = idx;
-
- if ((*map_ops >= ARRAY_SIZE(queue->tx_map_ops)) ||
- (*copy_ops >= ARRAY_SIZE(queue->tx_copy_ops)))
- break;
}
return;
--
2.35.3
* Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary
From: Paul Durrant @ 2023-03-27 9:04 UTC
To: Juergen Gross, linux-kernel, netdev
Cc: Wei Liu, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, xen-devel, stable
On 27/03/2023 09:36, Juergen Gross wrote:
> Fix xenvif_get_requests() not to do grant copy operations across local
> page boundaries. This requires doubling the maximum number of copy
> operations per queue, as each copy could now be split in two.
>
> Make sure that struct xenvif_tx_cb doesn't grow too large.
>
> Cc: stable@vger.kernel.org
> Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> drivers/net/xen-netback/common.h | 2 +-
> drivers/net/xen-netback/netback.c | 25 +++++++++++++++++++++++--
> 2 files changed, 24 insertions(+), 3 deletions(-)
>
Reviewed-by: Paul Durrant <paul@xen.org>
* Re: [PATCH 2/2] xen/netback: remove not needed test in xenvif_tx_build_gops()
From: Paul Durrant @ 2023-03-27 9:04 UTC
To: Juergen Gross, linux-kernel, netdev
Cc: Wei Liu, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, xen-devel, Jan Beulich
On 27/03/2023 09:36, Juergen Gross wrote:
> The test for the number of grant mapping or copy operations reaching
> the array size of the operations buffer at the end of the main loop in
> xenvif_tx_build_gops() isn't needed.
>
> The loop can handle at most MAX_PENDING_REQS transfer requests, as
> XEN_RING_NR_UNCONSUMED_REQUESTS() takes unsent responses into account
> as well.
>
> Remove the test.
>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> drivers/net/xen-netback/netback.c | 4 ----
> 1 file changed, 4 deletions(-)
>
Reviewed-by: Paul Durrant <paul@xen.org>
* Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary
From: Jan Beulich @ 2023-03-27 9:49 UTC
To: Juergen Gross
Cc: Wei Liu, Paul Durrant, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, xen-devel, stable, linux-kernel,
netdev
On 27.03.2023 10:36, Juergen Gross wrote:
> Fix xenvif_get_requests() not to do grant copy operations across local
> page boundaries. This requires doubling the maximum number of copy
> operations per queue, as each copy could now be split in two.
>
> Make sure that struct xenvif_tx_cb doesn't grow too large.
>
> Cc: stable@vger.kernel.org
> Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> ---
> drivers/net/xen-netback/common.h | 2 +-
> drivers/net/xen-netback/netback.c | 25 +++++++++++++++++++++++--
> 2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 3dbfc8a6924e..1fcbd83f7ff2 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -166,7 +166,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
> struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
> grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
>
> - struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
> + struct gnttab_copy tx_copy_ops[2 * MAX_PENDING_REQS];
> struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
> struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
> /* passed to gnttab_[un]map_refs with pages under (un)mapping */
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 1b42676ca141..111c179f161b 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -334,6 +334,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
> struct xenvif_tx_cb {
> u16 copy_pending_idx[XEN_NETBK_LEGACY_SLOTS_MAX + 1];
> u8 copy_count;
> + u32 split_mask;
> };
>
> #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
> @@ -361,6 +362,8 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
> struct sk_buff *skb =
> alloc_skb(size + NET_SKB_PAD + NET_IP_ALIGN,
> GFP_ATOMIC | __GFP_NOWARN);
> +
> + BUILD_BUG_ON(sizeof(*XENVIF_TX_CB(skb)) > sizeof(skb->cb));
> if (unlikely(skb == NULL))
> return NULL;
>
> @@ -396,11 +399,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
> nr_slots = shinfo->nr_frags + 1;
>
> copy_count(skb) = 0;
> + XENVIF_TX_CB(skb)->split_mask = 0;
>
> /* Create copy ops for exactly data_len bytes into the skb head. */
> __skb_put(skb, data_len);
> while (data_len > 0) {
> int amount = data_len > txp->size ? txp->size : data_len;
> + bool split = false;
>
> cop->source.u.ref = txp->gref;
> cop->source.domid = queue->vif->domid;
> @@ -413,6 +418,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
> cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb)
> - data_len);
>
> + /* Don't cross local page boundary! */
> + if (cop->dest.offset + amount > XEN_PAGE_SIZE) {
> + amount = XEN_PAGE_SIZE - cop->dest.offset;
> + XENVIF_TX_CB(skb)->split_mask |= 1U << copy_count(skb);
Maybe worthwhile to add a BUILD_BUG_ON() somewhere to make sure this
shift won't grow too large a shift count. The number of slots accepted
could conceivably be grown past XEN_NETBK_LEGACY_SLOTS_MAX (i.e.
XEN_NETIF_NR_SLOTS_MIN) at some point.
> + split = true;
> + }
> +
> cop->len = amount;
> cop->flags = GNTCOPY_source_gref;
>
> @@ -420,7 +432,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
> pending_idx = queue->pending_ring[index];
> callback_param(queue, pending_idx).ctx = NULL;
> copy_pending_idx(skb, copy_count(skb)) = pending_idx;
> - copy_count(skb)++;
> + if (!split)
> + copy_count(skb)++;
>
> cop++;
> data_len -= amount;
> @@ -441,7 +454,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
> nr_slots--;
> } else {
> /* The copy op partially covered the tx_request.
> - * The remainder will be mapped.
> + * The remainder will be mapped or copied in the next
> + * iteration.
> */
> txp->offset += amount;
> txp->size -= amount;
> @@ -539,6 +553,13 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
> pending_idx = copy_pending_idx(skb, i);
>
> newerr = (*gopp_copy)->status;
> +
> + /* Split copies need to be handled together. */
> + if (XENVIF_TX_CB(skb)->split_mask & (1U << i)) {
> + (*gopp_copy)++;
> + if (!newerr)
> + newerr = (*gopp_copy)->status;
> + }
It isn't guaranteed that a slot may be split only once, is it? Assuming a
near-64k packet with all tiny non-primary slots, that'll cause those tiny
slots to all be mapped, but due to
if (ret >= XEN_NETBK_LEGACY_SLOTS_MAX - 1 && data_len < txreq.size)
data_len = txreq.size;
will, afaict, cause a lot of copying for the primary slot. Therefore I
think you need a loop here, not just an if(). Plus tx_copy_ops[]'es
dimension also looks to need further growing to accommodate this. Or
maybe not - at least the extreme example given would still be fine; more
generally packets being limited to below 64k means 2*16 slots would
suffice at one end of the scale, while 2*MAX_PENDING_REQS would at the
other end (all tiny, including the primary slot). What I haven't fully
convinced myself of is whether there might be cases in the middle which
are yet worse.
As I've been struggling with the code fragment quoted above already in
the patch originally introducing it, I'd like to see that relaxed. Can't
we avoid excessive copying by suitably growing tx_map_ops[] and then
deleting that bumping of data_len? Then there also wouldn't be the risk
multiple splits per copy anymore.
Alternatively to all of the above: Am I overlooking a check somewhere
which would also constrain the primary slot (or more precisely its
residual) to within a single page (along the lines of the check for
non-primary slots in xenvif_count_requests())?
Jan
* Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary
From: Juergen Gross @ 2023-03-27 10:07 UTC
To: Jan Beulich
Cc: Wei Liu, Paul Durrant, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, xen-devel, stable, linux-kernel,
netdev
On 27.03.23 11:49, Jan Beulich wrote:
> On 27.03.2023 10:36, Juergen Gross wrote:
>> Fix xenvif_get_requests() not to do grant copy operations across local
>> page boundaries. This requires doubling the maximum number of copy
>> operations per queue, as each copy could now be split in two.
>>
>> Make sure that struct xenvif_tx_cb doesn't grow too large.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: ad7f402ae4f4 ("xen/netback: Ensure protocol headers don't fall in the non-linear area")
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>> ---
>> drivers/net/xen-netback/common.h | 2 +-
>> drivers/net/xen-netback/netback.c | 25 +++++++++++++++++++++++--
>> 2 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index 3dbfc8a6924e..1fcbd83f7ff2 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -166,7 +166,7 @@ struct xenvif_queue { /* Per-queue data for xenvif */
>> struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
>> grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
>>
>> - struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
>> + struct gnttab_copy tx_copy_ops[2 * MAX_PENDING_REQS];
>> struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
>> struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
>> /* passed to gnttab_[un]map_refs with pages under (un)mapping */
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 1b42676ca141..111c179f161b 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -334,6 +334,7 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
>> struct xenvif_tx_cb {
>> u16 copy_pending_idx[XEN_NETBK_LEGACY_SLOTS_MAX + 1];
>> u8 copy_count;
>> + u32 split_mask;
>> };
>>
>> #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
>> @@ -361,6 +362,8 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
>> struct sk_buff *skb =
>> alloc_skb(size + NET_SKB_PAD + NET_IP_ALIGN,
>> GFP_ATOMIC | __GFP_NOWARN);
>> +
>> + BUILD_BUG_ON(sizeof(*XENVIF_TX_CB(skb)) > sizeof(skb->cb));
>> if (unlikely(skb == NULL))
>> return NULL;
>>
>> @@ -396,11 +399,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>> nr_slots = shinfo->nr_frags + 1;
>>
>> copy_count(skb) = 0;
>> + XENVIF_TX_CB(skb)->split_mask = 0;
>>
>> /* Create copy ops for exactly data_len bytes into the skb head. */
>> __skb_put(skb, data_len);
>> while (data_len > 0) {
>> int amount = data_len > txp->size ? txp->size : data_len;
>> + bool split = false;
>>
>> cop->source.u.ref = txp->gref;
>> cop->source.domid = queue->vif->domid;
>> @@ -413,6 +418,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>> cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb)
>> - data_len);
>>
>> + /* Don't cross local page boundary! */
>> + if (cop->dest.offset + amount > XEN_PAGE_SIZE) {
>> + amount = XEN_PAGE_SIZE - cop->dest.offset;
>> + XENVIF_TX_CB(skb)->split_mask |= 1U << copy_count(skb);
>
> Maybe worthwhile to add a BUILD_BUG_ON() somewhere to make sure this
> shift won't grow too large a shift count. The number of slots accepted
> could conceivably be grown past XEN_NETBK_LEGACY_SLOTS_MAX (i.e.
> XEN_NETIF_NR_SLOTS_MIN) at some point.
This is basically impossible due to the size restriction of struct
xenvif_tx_cb.
>
>> + split = true;
>> + }
>> +
>> cop->len = amount;
>> cop->flags = GNTCOPY_source_gref;
>>
>> @@ -420,7 +432,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>> pending_idx = queue->pending_ring[index];
>> callback_param(queue, pending_idx).ctx = NULL;
>> copy_pending_idx(skb, copy_count(skb)) = pending_idx;
>> - copy_count(skb)++;
>> + if (!split)
>> + copy_count(skb)++;
>>
>> cop++;
>> data_len -= amount;
>> @@ -441,7 +454,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>> nr_slots--;
>> } else {
>> /* The copy op partially covered the tx_request.
>> - * The remainder will be mapped.
>> + * The remainder will be mapped or copied in the next
>> + * iteration.
>> */
>> txp->offset += amount;
>> txp->size -= amount;
>> @@ -539,6 +553,13 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
>> pending_idx = copy_pending_idx(skb, i);
>>
>> newerr = (*gopp_copy)->status;
>> +
>> + /* Split copies need to be handled together. */
>> + if (XENVIF_TX_CB(skb)->split_mask & (1U << i)) {
>> + (*gopp_copy)++;
>> + if (!newerr)
>> + newerr = (*gopp_copy)->status;
>> + }
>
> It isn't guaranteed that a slot may be split only once, is it? Assuming a
I think it is guaranteed.
No slot can cover more than XEN_PAGE_SIZE bytes due to the grants being
restricted to that size. There is no way such a data packet could cross
2 page boundaries.
In the end the problem isn't the copies for the linear area not crossing
multiple page boundaries, but the copies for a single request slot not
doing so. And this can't happen IMO.
> near-64k packet with all tiny non-primary slots, that'll cause those tiny
> slots to all be mapped, but due to
>
> if (ret >= XEN_NETBK_LEGACY_SLOTS_MAX - 1 && data_len < txreq.size)
> data_len = txreq.size;
>
> will, afaict, cause a lot of copying for the primary slot. Therefore I
> think you need a loop here, not just an if(). Plus tx_copy_ops[]'es
> dimension also looks to need further growing to accommodate this. Or
> maybe not - at least the extreme example given would still be fine; more
> generally packets being limited to below 64k means 2*16 slots would
> suffice at one end of the scale, while 2*MAX_PENDING_REQS would at the
> other end (all tiny, including the primary slot). What I haven't fully
> convinced myself of is whether there might be cases in the middle which
> are yet worse.
See above reasoning. I think it is okay, but maybe I'm missing something.
Juergen
* Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary
From: Jan Beulich @ 2023-03-27 15:38 UTC
To: Juergen Gross
Cc: Wei Liu, Paul Durrant, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, xen-devel, stable, linux-kernel,
netdev
On 27.03.2023 12:07, Juergen Gross wrote:
> On 27.03.23 11:49, Jan Beulich wrote:
>> On 27.03.2023 10:36, Juergen Gross wrote:
>>> @@ -413,6 +418,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>>> cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb)
>>> - data_len);
>>>
>>> + /* Don't cross local page boundary! */
>>> + if (cop->dest.offset + amount > XEN_PAGE_SIZE) {
>>> + amount = XEN_PAGE_SIZE - cop->dest.offset;
>>> + XENVIF_TX_CB(skb)->split_mask |= 1U << copy_count(skb);
>>
>> Maybe worthwhile to add a BUILD_BUG_ON() somewhere to make sure this
>> shift won't grow too large a shift count. The number of slots accepted
>> could conceivably be grown past XEN_NETBK_LEGACY_SLOTS_MAX (i.e.
>> XEN_NETIF_NR_SLOTS_MIN) at some point.
>
> This is basically impossible due to the size restriction of struct
> xenvif_tx_cb.
If its size became a problem, it might simply take a level of indirection
to overcome the limitation.
>>> @@ -420,7 +432,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>>> pending_idx = queue->pending_ring[index];
>>> callback_param(queue, pending_idx).ctx = NULL;
>>> copy_pending_idx(skb, copy_count(skb)) = pending_idx;
>>> - copy_count(skb)++;
>>> + if (!split)
>>> + copy_count(skb)++;
>>>
>>> cop++;
>>> data_len -= amount;
>>> @@ -441,7 +454,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>>> nr_slots--;
>>> } else {
>>> /* The copy op partially covered the tx_request.
>>> - * The remainder will be mapped.
>>> + * The remainder will be mapped or copied in the next
>>> + * iteration.
>>> */
>>> txp->offset += amount;
>>> txp->size -= amount;
>>> @@ -539,6 +553,13 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
>>> pending_idx = copy_pending_idx(skb, i);
>>>
>>> newerr = (*gopp_copy)->status;
>>> +
>>> + /* Split copies need to be handled together. */
>>> + if (XENVIF_TX_CB(skb)->split_mask & (1U << i)) {
>>> + (*gopp_copy)++;
>>> + if (!newerr)
>>> + newerr = (*gopp_copy)->status;
>>> + }
>>
>> It isn't guaranteed that a slot may be split only once, is it? Assuming a
>
> I think it is guaranteed.
>
> No slot can cover more than XEN_PAGE_SIZE bytes due to the grants being
> restricted to that size. There is no way such a data packet could cross
> 2 page boundaries.
>
> In the end the problem isn't the copies for the linear area not crossing
> multiple page boundaries, but the copies for a single request slot not
> doing so. And this can't happen IMO.
You're thinking of only well-formed requests. What about said request
providing a large size with only tiny fragments? xenvif_get_requests()
will happily process such, creating bogus grant-copy ops. But their failure
once submitted to Xen comes only after damage may already have occurred
(from bogus updates of internal state; the logic altogether is too
involved for me to be convinced that nothing bad can happen).
Interestingly (as I realize now) the shifts you add are not at risk of
turning into UB in this case, as the shift count won't go beyond 16.
>> near-64k packet with all tiny non-primary slots, that'll cause those tiny
>> slots to all be mapped, but due to
>>
>> if (ret >= XEN_NETBK_LEGACY_SLOTS_MAX - 1 && data_len < txreq.size)
>> data_len = txreq.size;
>>
>> will, afaict, cause a lot of copying for the primary slot. Therefore I
>> think you need a loop here, not just an if(). Plus tx_copy_ops[]'es
>> dimension also looks to need further growing to accommodate this. Or
>> maybe not - at least the extreme example given would still be fine; more
>> generally packets being limited to below 64k means 2*16 slots would
>> suffice at one end of the scale, while 2*MAX_PENDING_REQS would at the
>> other end (all tiny, including the primary slot). What I haven't fully
>> convinced myself of is whether there might be cases in the middle which
>> are yet worse.
>
> See above reasoning. I think it is okay, but maybe I'm missing something.
Well, the main thing I'm missing is a "primary request fits in a page"
check, even more so with the new copying logic that the commit referenced
by Fixes: introduced into xenvif_get_requests().
Jan
* Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary
From: Juergen Gross @ 2023-03-27 16:22 UTC
To: Jan Beulich
Cc: Wei Liu, Paul Durrant, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, xen-devel, stable, linux-kernel,
netdev
On 27.03.23 17:38, Jan Beulich wrote:
> On 27.03.2023 12:07, Juergen Gross wrote:
>> On 27.03.23 11:49, Jan Beulich wrote:
>>> On 27.03.2023 10:36, Juergen Gross wrote:
>>>> @@ -413,6 +418,13 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>>>> cop->dest.u.gmfn = virt_to_gfn(skb->data + skb_headlen(skb)
>>>> - data_len);
>>>>
>>>> + /* Don't cross local page boundary! */
>>>> + if (cop->dest.offset + amount > XEN_PAGE_SIZE) {
>>>> + amount = XEN_PAGE_SIZE - cop->dest.offset;
>>>> + XENVIF_TX_CB(skb)->split_mask |= 1U << copy_count(skb);
>>>
>>> Maybe worthwhile to add a BUILD_BUG_ON() somewhere to make sure this
>>> shift won't grow too large a shift count. The number of slots accepted
>>> could conceivably be grown past XEN_NETBK_LEGACY_SLOTS_MAX (i.e.
>>> XEN_NETIF_NR_SLOTS_MIN) at some point.
>>
>> This is basically impossible due to the size restriction of struct
>> xenvif_tx_cb.
>
> If its size became a problem, it might simply take a level of indirection
> to overcome the limitation.
Maybe.
OTOH this would require some rework, which should take such problems into
consideration.
In the end I'd be fine to add such a BUILD_BUG_ON(), as the code is
complicated enough already.
>
>>>> @@ -420,7 +432,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>>>> pending_idx = queue->pending_ring[index];
>>>> callback_param(queue, pending_idx).ctx = NULL;
>>>> copy_pending_idx(skb, copy_count(skb)) = pending_idx;
>>>> - copy_count(skb)++;
>>>> + if (!split)
>>>> + copy_count(skb)++;
>>>>
>>>> cop++;
>>>> data_len -= amount;
>>>> @@ -441,7 +454,8 @@ static void xenvif_get_requests(struct xenvif_queue *queue,
>>>> nr_slots--;
>>>> } else {
>>>> /* The copy op partially covered the tx_request.
>>>> - * The remainder will be mapped.
>>>> + * The remainder will be mapped or copied in the next
>>>> + * iteration.
>>>> */
>>>> txp->offset += amount;
>>>> txp->size -= amount;
>>>> @@ -539,6 +553,13 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
>>>> pending_idx = copy_pending_idx(skb, i);
>>>>
>>>> newerr = (*gopp_copy)->status;
>>>> +
>>>> + /* Split copies need to be handled together. */
>>>> + if (XENVIF_TX_CB(skb)->split_mask & (1U << i)) {
>>>> + (*gopp_copy)++;
>>>> + if (!newerr)
>>>> + newerr = (*gopp_copy)->status;
>>>> + }
>>>
>>> It isn't guaranteed that a slot may be split only once, is it? Assuming a
>>
>> I think it is guaranteed.
>>
>> No slot can cover more than XEN_PAGE_SIZE bytes due to the grants being
>> restricted to that size. There is no way such a data packet could cross
>> 2 page boundaries.
>>
>> In the end the problem isn't the copies for the linear area not crossing
>> multiple page boundaries, but the copies for a single request slot not
>> doing so. And this can't happen IMO.
>
> You're thinking of only well-formed requests. What about said request
> providing a large size with only tiny fragments? xenvif_get_requests()
> will happily process such, creating bogus grant-copy ops. But their failure
> once submitted to Xen comes only after damage may already have occurred
> (from bogus updates of internal state; the logic altogether is too
> involved for me to be convinced that nothing bad can happen).
There are sanity checks after each relevant RING_COPY_REQUEST() call, which
will bail out if "(txp->offset + txp->size) > XEN_PAGE_SIZE" (the first one
is after the call of xenvif_count_requests(), as this call will decrease the
size of the request, the other check is in xenvif_count_requests()).
> Interestingly (as I realize now) the shifts you add are not at risk of
> turning into UB in this case, as the shift count won't go beyond 16.
>
>>> near-64k packet with all tiny non-primary slots, that'll cause those tiny
>>> slots to all be mapped, but due to
>>>
>>> if (ret >= XEN_NETBK_LEGACY_SLOTS_MAX - 1 && data_len < txreq.size)
>>> data_len = txreq.size;
>>>
>>> will, afaict, cause a lot of copying for the primary slot. Therefore I
>>> think you need a loop here, not just an if(). Plus tx_copy_ops[]'es
>>> dimension also looks to need further growing to accommodate this. Or
>>> maybe not - at least the extreme example given would still be fine; more
>>> generally packets being limited to below 64k means 2*16 slots would
>>> suffice at one end of the scale, while 2*MAX_PENDING_REQS would at the
>>> other end (all tiny, including the primary slot). What I haven't fully
>>> convinced myself of is whether there might be cases in the middle which
>>> are yet worse.
>>
>> See above reasoning. I think it is okay, but maybe I'm missing something.
>
> Well, the main thing I'm missing is a "primary request fits in a page"
> check, even more so with the new copying logic that the commit referenced
> by Fixes: introduced into xenvif_get_requests().
When xenvif_get_requests() gets called, all requests are sanity checked
already (note that xenvif_get_requests() is working on the local copies of
the requests).
Juergen
* Re: [PATCH 1/2] xen/netback: don't do grant copy across page boundary
2023-03-27 16:22 ` Juergen Gross
@ 2023-03-28 7:50 ` Jan Beulich
0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2023-03-28 7:50 UTC (permalink / raw)
To: Juergen Gross
Cc: Wei Liu, Paul Durrant, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, xen-devel, stable, linux-kernel,
netdev
On 27.03.2023 18:22, Juergen Gross wrote:
> On 27.03.23 17:38, Jan Beulich wrote:
>> On 27.03.2023 12:07, Juergen Gross wrote:
>>> On 27.03.23 11:49, Jan Beulich wrote:
>>>> On 27.03.2023 10:36, Juergen Gross wrote:
>>>>> @@ -539,6 +553,13 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
>>>>> pending_idx = copy_pending_idx(skb, i);
>>>>>
>>>>> newerr = (*gopp_copy)->status;
>>>>> +
>>>>> + /* Split copies need to be handled together. */
>>>>> + if (XENVIF_TX_CB(skb)->split_mask & (1U << i)) {
>>>>> + (*gopp_copy)++;
>>>>> + if (!newerr)
>>>>> + newerr = (*gopp_copy)->status;
>>>>> + }
>>>>
>>>> It isn't guaranteed that a slot may be split only once, is it? Assuming a
>>>
>>> I think it is guaranteed.
>>>
>>> No slot can cover more than XEN_PAGE_SIZE bytes due to the grants being
>>> restricted to that size. There is no way such a data packet could cross
>>> 2 page boundaries.
>>>
>>> In the end the problem isn't the copies for the linear area not crossing
>>> multiple page boundaries, but the copies for a single request slot not
>>> doing so. And this can't happen IMO.
>>
>> You're thinking of only well-formed requests. What about said request
>> providing a large size with only tiny fragments? xenvif_get_requests()
>> will happily process such, creating bogus grant-copy ops. But them failing
>> once submitted to Xen will be only after damage may already have occurred
>> (from bogus updates of internal state; the logic altogether is too
>> involved for me to be convinced that nothing bad can happen).
>
> There are sanity checks after each relevant RING_COPY_REQUEST() call, which
> will bail out if "(txp->offset + txp->size) > XEN_PAGE_SIZE" (the first check
> comes after the call of xenvif_count_requests(), as that call will decrease
> the size of the request; the other check is in xenvif_count_requests() itself).
Oh, indeed - that's the check I've been overlooking. (The messages logged
there could do with also mentioning "Cross page boundary", like the one
in xenvif_count_requests() does.)
Jan
* Re: [PATCH 0/2] xen/netback: fix issue introduced recently
2023-03-27 8:36 [PATCH 0/2] xen/netback: fix issue introduced recently Juergen Gross
2023-03-27 8:36 ` [PATCH 1/2] xen/netback: don't do grant copy across page boundary Juergen Gross
2023-03-27 8:36 ` [PATCH 2/2] xen/netback: remove not needed test in xenvif_tx_build_gops() Juergen Gross
@ 2023-03-28 13:30 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-03-28 13:30 UTC (permalink / raw)
To: Juergen Gross
Cc: linux-kernel, netdev, wei.liu, paul, davem, edumazet, kuba,
pabeni, xen-devel, stable
Hello:
This series was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Mon, 27 Mar 2023 10:36:44 +0200 you wrote:
> The fix for XSA-423 introduced a bug which resulted in loss of network
> connection in some configurations.
>
> The first patch is fixing the issue, while the second one is removing
> a test which isn't needed.
>
> Juergen Gross (2):
> xen/netback: don't do grant copy across page boundary
> xen/netback: remove not needed test in xenvif_tx_build_gops()
>
> [...]
Here is the summary with links:
- [1/2] xen/netback: don't do grant copy across page boundary
https://git.kernel.org/netdev/net/c/05310f31ca74
- [2/2] xen/netback: remove not needed test in xenvif_tx_build_gops()
https://git.kernel.org/netdev/net/c/8fb8ebf94877
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
end of thread, other threads:[~2023-03-28 13:31 UTC | newest]
Thread overview: 11+ messages
2023-03-27 8:36 [PATCH 0/2] xen/netback: fix issue introduced recently Juergen Gross
2023-03-27 8:36 ` [PATCH 1/2] xen/netback: don't do grant copy across page boundary Juergen Gross
2023-03-27 9:04 ` Paul Durrant
2023-03-27 9:49 ` Jan Beulich
2023-03-27 10:07 ` Juergen Gross
2023-03-27 15:38 ` Jan Beulich
2023-03-27 16:22 ` Juergen Gross
2023-03-28 7:50 ` Jan Beulich
2023-03-27 8:36 ` [PATCH 2/2] xen/netback: remove not needed test in xenvif_tx_build_gops() Juergen Gross
2023-03-27 9:04 ` Paul Durrant
2023-03-28 13:30 ` [PATCH 0/2] xen/netback: fix issue introduced recently patchwork-bot+netdevbpf