* [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers()
@ 2025-11-03 17:58 Peter Maydell
2025-11-03 17:58 ` [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors Peter Maydell
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Peter Maydell @ 2025-11-03 17:58 UTC (permalink / raw)
To: qemu-devel; +Cc: Dmitry Fleytman, Akihiko Odaki, Jason Wang
This patchset fixes assertion failures in
e1000e_write_payload_frag_to_rx_buffers(). There turn out to be two
ways this assert could be triggered.
The first is the one reported in
https://gitlab.com/qemu-project/qemu/-/issues/537 where a malicious
guest could set up the device into loopback mode and then send a
carefully sized packet that trips a bug in the logic in
e1000e_write_packet_to_guest() that tries to calculate how much data
it is going to put into an RX descriptor, causing us to hit the
assertion.
The second is one I spotted by code inspection, where the assert is
slightly over-eager and could assert in one valid case of a just
exactly large enough packet.
Patch 1 fixes what I believe to be an incorrect-behaviour bug
for descriptors with NULL buffer addresses, largely because
that makes the following patch to fix the assertion less awkward.
thanks
-- PMM
Peter Maydell (3):
hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX
descriptors
hw/net/e1000e_core: Correct rx oversize packet checks
hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers()
assert
hw/net/e1000e_core.c | 85 ++++++++++++++++++++++++++++++--------------
1 file changed, 59 insertions(+), 26 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors
2025-11-03 17:58 [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Peter Maydell
@ 2025-11-03 17:58 ` Peter Maydell
2025-11-04 6:11 ` Akihiko Odaki
2025-11-04 16:44 ` Philippe Mathieu-Daudé
2025-11-03 17:58 ` [PATCH 2/3] hw/net/e1000e_core: Correct rx oversize packet checks Peter Maydell
` (2 subsequent siblings)
3 siblings, 2 replies; 10+ messages in thread
From: Peter Maydell @ 2025-11-03 17:58 UTC (permalink / raw)
To: qemu-devel; +Cc: Dmitry Fleytman, Akihiko Odaki, Jason Wang
In e1000e_write_packet_to_guest() we don't write data for RX descriptors
where the buffer address is NULL (as required by the i82574 datasheet
section 7.1.7.2). However, when we do this we still update desc_offset
by the amount of data we would have written to the RX descriptor if
it had a valid buffer pointer, resulting in our dropping that data
entirely. The data sheet is not 100% clear on the subject, but this
seems unlikely to be the correct behaviour.
Rearrange the null-descriptor logic so that we don't treat these
do-nothing descriptors as if we'd really written the data.
This both fixes a bug and also is a prerequisite to cleaning up
the size calculation logic in the next patch.
(Cc to stable largely because it will be needed for the next patch,
which fixes a more serious bug.)
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/net/e1000e_core.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 8fef598b498..ba77cb6011f 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1481,7 +1481,6 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
PCIDevice *d = core->owner;
dma_addr_t base;
union e1000_rx_desc_union desc;
- size_t desc_size;
size_t desc_offset = 0;
size_t iov_ofs = 0;
@@ -1500,12 +1499,6 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
E1000EBAState bastate = { { 0 } };
bool is_last = false;
- desc_size = total_size - desc_offset;
-
- if (desc_size > core->rx_desc_buf_size) {
- desc_size = core->rx_desc_buf_size;
- }
-
if (e1000e_ring_empty(core, rxi)) {
return;
}
@@ -1519,6 +1512,12 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
e1000e_read_rx_descr(core, &desc, ba);
if (ba[0]) {
+ size_t desc_size = total_size - desc_offset;
+
+ if (desc_size > core->rx_desc_buf_size) {
+ desc_size = core->rx_desc_buf_size;
+ }
+
if (desc_offset < size) {
static const uint32_t fcs_pad;
size_t iov_copy;
@@ -1582,13 +1581,13 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
(const char *) &fcs_pad, e1000x_fcs_len(core->mac));
}
}
+ desc_offset += desc_size;
+ if (desc_offset >= total_size) {
+ is_last = true;
+ }
} else { /* as per intel docs; skip descriptors with null buf addr */
trace_e1000e_rx_null_descriptor();
}
- desc_offset += desc_size;
- if (desc_offset >= total_size) {
- is_last = true;
- }
e1000e_write_rx_descr(core, &desc, is_last ? core->rx_pkt : NULL,
rss_info, do_ps ? ps_hdr_len : 0, &bastate.written);
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] hw/net/e1000e_core: Correct rx oversize packet checks
2025-11-03 17:58 [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Peter Maydell
2025-11-03 17:58 ` [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors Peter Maydell
@ 2025-11-03 17:58 ` Peter Maydell
2025-11-07 3:50 ` Jason Wang
2025-11-03 17:58 ` [PATCH 3/3] hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers() assert Peter Maydell
2025-11-05 6:21 ` [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Akihiko Odaki
3 siblings, 1 reply; 10+ messages in thread
From: Peter Maydell @ 2025-11-03 17:58 UTC (permalink / raw)
To: qemu-devel; +Cc: Dmitry Fleytman, Akihiko Odaki, Jason Wang
In e1000e_write_packet_to_guest() we attempt to ensure that we don't
write more of a packet to a descriptor than will fit in the guest
configured receive buffers. However, this code does not allow for
the "packet split" feature. When packet splitting is enabled, the
first of up to 4 buffers in the descriptor is used for the packet
header only, with the payload going into buffers 2, 3 and 4. Our
length check only checks against the total sizes of all 4 buffers,
which meant that if an incoming packet was large enough to fit in (1
+ 2 + 3 + 4) but not into (2 + 3 + 4) and packet splitting was
enabled, we would run into the assertion in
e1000e_write_hdr_frag_to_rx_buffers() that we had enough buffers for
the data:
qemu-system-i386: ../../hw/net/e1000e_core.c:1418: void e1000e_write_payload_frag_to_rx_buffers(E1000ECore *, hwaddr *, E1000EBAState *, const char *, dma_addr_t): Assertion `bastate->cur_idx < MAX_PS_BUFFERS' failed.
A malicious guest could provoke this assertion by configuring the
device into loopback mode, and then sending itself a suitably sized
packet into a suitably arrange rx descriptor.
The code also fails to deal with the possibility that the descriptor
buffers are sized such that the trailing checksum word does not fit
into the last descriptor which has actual data, which might also
trigger this assertion.
Rework the length handling to use two variables:
* desc_size is the total amount of data DMA'd to the guest
for the descriptor being processed in this iteration of the loop
* rx_desc_buf_size is the total amount of space left in it
As we copy data to the guest (packet header, payload, checksum),
update these two variables. (Previously we attempted to calculate
desc_size once at the top of the loop, but this is too difficult to
do correctly.) Then we can use the variables to ensure that we clamp
the amount of copied payload data to the remaining space in the
descriptor's buffers, even if we've used one of the buffers up in the
packet-split code, and we can tell whether we have enough space for
the full checksum word in this descriptor or whether we're going to
need to split that to the following descriptor.
I have included comments that hopefully help to make the loop
logic a little clearer.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/537
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
hw/net/e1000e_core.c | 61 ++++++++++++++++++++++++++++++++++----------
1 file changed, 47 insertions(+), 14 deletions(-)
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index ba77cb6011f..471c3ed20b8 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1495,6 +1495,13 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
rxi = rxr->i;
do {
+ /*
+ * Loop processing descriptors while we have packet data to
+ * DMA to the guest. desc_offset tracks how much data we have
+ * sent to the guest in total over all descriptors, and goes
+ * from 0 up to total_size (the size of everything to send to
+ * the guest including possible trailing 4 bytes of CRC data).
+ */
hwaddr ba[MAX_PS_BUFFERS];
E1000EBAState bastate = { { 0 } };
bool is_last = false;
@@ -1512,23 +1519,27 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
e1000e_read_rx_descr(core, &desc, ba);
if (ba[0]) {
- size_t desc_size = total_size - desc_offset;
-
- if (desc_size > core->rx_desc_buf_size) {
- desc_size = core->rx_desc_buf_size;
- }
+ /* Total amount of data DMA'd to the guest in this iteration */
+ size_t desc_size = 0;
+ /*
+ * Total space available in this descriptor (we will update
+ * this as we use it up)
+ */
+ size_t rx_desc_buf_size = core->rx_desc_buf_size;
if (desc_offset < size) {
- static const uint32_t fcs_pad;
size_t iov_copy;
+ /* Amount of data to copy from the incoming packet */
size_t copy_size = size - desc_offset;
- if (copy_size > core->rx_desc_buf_size) {
- copy_size = core->rx_desc_buf_size;
- }
/* For PS mode copy the packet header first */
if (do_ps) {
if (is_first) {
+ /*
+ * e1000e_do_ps() guarantees that buffer 0 has enough
+ * space for the header; otherwise we will not split
+ * the packet (i.e. do_ps is false).
+ */
size_t ps_hdr_copied = 0;
do {
iov_copy = MIN(ps_hdr_len - ps_hdr_copied,
@@ -1550,14 +1561,26 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
} while (ps_hdr_copied < ps_hdr_len);
is_first = false;
+ desc_size += ps_hdr_len;
} else {
/* Leave buffer 0 of each descriptor except first */
/* empty as per spec 7.1.5.1 */
e1000e_write_hdr_frag_to_rx_buffers(core, ba, &bastate,
NULL, 0);
}
+ rx_desc_buf_size -= core->rxbuf_sizes[0];
}
+ /*
+ * Clamp the amount of packet data we copy into what will fit
+ * into the remaining buffers in the descriptor.
+ */
+ if (copy_size > rx_desc_buf_size) {
+ copy_size = rx_desc_buf_size;
+ }
+ desc_size += copy_size;
+ rx_desc_buf_size -= copy_size;
+
/* Copy packet payload */
while (copy_size) {
iov_copy = MIN(copy_size, iov->iov_len - iov_ofs);
@@ -1574,12 +1597,22 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
iov_ofs = 0;
}
}
+ }
- if (desc_offset + desc_size >= total_size) {
- /* Simulate FCS checksum presence in the last descriptor */
- e1000e_write_payload_frag_to_rx_buffers(core, ba, &bastate,
- (const char *) &fcs_pad, e1000x_fcs_len(core->mac));
- }
+ if (rx_desc_buf_size &&
+ desc_offset >= size && desc_offset < total_size) {
+ /*
+ * We are in the last 4 bytes corresponding to the FCS checksum.
+ * We only ever write zeroes here (unlike the hardware).
+ */
+ static const uint32_t fcs_pad;
+ /* Amount of space for the trailing checksum */
+ size_t fcs_len = MIN(rx_desc_buf_size,
+ total_size - desc_offset);
+ e1000e_write_payload_frag_to_rx_buffers(core, ba, &bastate,
+ (const char *)&fcs_pad,
+ fcs_len);
+ desc_size += fcs_len;
}
desc_offset += desc_size;
if (desc_offset >= total_size) {
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers() assert
2025-11-03 17:58 [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Peter Maydell
2025-11-03 17:58 ` [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors Peter Maydell
2025-11-03 17:58 ` [PATCH 2/3] hw/net/e1000e_core: Correct rx oversize packet checks Peter Maydell
@ 2025-11-03 17:58 ` Peter Maydell
2025-11-05 6:21 ` [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Akihiko Odaki
3 siblings, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2025-11-03 17:58 UTC (permalink / raw)
To: qemu-devel; +Cc: Dmitry Fleytman, Akihiko Odaki, Jason Wang
An assertion in e1000e_write_payload_frag_to_rx_buffers() attempts to
guard against the calling code accidentally trying to write too much
data to a single RX descriptor, such that the E1000EBAState::cur_idx
indexes off the end of the EB1000BAState::written[] array.
Unfortunately it is overzealous: it asserts that cur_idx is in
range after it has been incremented. This will fire incorrectly
for the case where the guest configures four buffers and exactly
enough bytes are written to fill all four of them.
The only places where we use cur_idx and index in to the written[]
array are the functions e1000e_write_hdr_frag_to_rx_buffers() and
e1000e_write_payload_frag_to_rx_buffers(), so we can rewrite this to
assert before doing the array dereference, rather than asserting
after updating cur_idx.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
This is just moving the assert to the top, but the patch is
a little more than that to satisfy our "declarations at top
of block" rule.
I haven't seen this assert trip because of this problem (the issue
537 test case does not rely on it), so this is just a "seen on code
inspection" change.
---
hw/net/e1000e_core.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 471c3ed20b8..46e156a5ddc 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1392,10 +1392,13 @@ e1000e_write_payload_frag_to_rx_buffers(E1000ECore *core,
dma_addr_t data_len)
{
while (data_len > 0) {
- uint32_t cur_buf_len = core->rxbuf_sizes[bastate->cur_idx];
- uint32_t cur_buf_bytes_left = cur_buf_len -
- bastate->written[bastate->cur_idx];
- uint32_t bytes_to_write = MIN(data_len, cur_buf_bytes_left);
+ uint32_t cur_buf_len, cur_buf_bytes_left, bytes_to_write;
+
+ assert(bastate->cur_idx < MAX_PS_BUFFERS);
+
+ cur_buf_len = core->rxbuf_sizes[bastate->cur_idx];
+ cur_buf_bytes_left = cur_buf_len - bastate->written[bastate->cur_idx];
+ bytes_to_write = MIN(data_len, cur_buf_bytes_left);
trace_e1000e_rx_desc_buff_write(bastate->cur_idx,
ba[bastate->cur_idx],
@@ -1414,8 +1417,6 @@ e1000e_write_payload_frag_to_rx_buffers(E1000ECore *core,
if (bastate->written[bastate->cur_idx] == cur_buf_len) {
bastate->cur_idx++;
}
-
- assert(bastate->cur_idx < MAX_PS_BUFFERS);
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors
2025-11-03 17:58 ` [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors Peter Maydell
@ 2025-11-04 6:11 ` Akihiko Odaki
2025-11-04 9:55 ` Peter Maydell
2025-11-04 16:44 ` Philippe Mathieu-Daudé
1 sibling, 1 reply; 10+ messages in thread
From: Akihiko Odaki @ 2025-11-04 6:11 UTC (permalink / raw)
To: Peter Maydell, qemu-devel; +Cc: Dmitry Fleytman, Jason Wang
On 2025/11/04 2:58, Peter Maydell wrote:
> In e1000e_write_packet_to_guest() we don't write data for RX descriptors
> where the buffer address is NULL (as required by the i82574 datasheet
> section 7.1.7.2). However, when we do this we still update desc_offset
> by the amount of data we would have written to the RX descriptor if
> it had a valid buffer pointer, resulting in our dropping that data
> entirely. The data sheet is not 100% clear on the subject, but this
> seems unlikely to be the correct behaviour.
>
> Rearrange the null-descriptor logic so that we don't treat these
> do-nothing descriptors as if we'd really written the data.
Please make a corresponding change for igb too so that the code of these
two devices will not diverge further.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors
2025-11-04 6:11 ` Akihiko Odaki
@ 2025-11-04 9:55 ` Peter Maydell
0 siblings, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2025-11-04 9:55 UTC (permalink / raw)
To: Akihiko Odaki; +Cc: qemu-devel, Dmitry Fleytman, Jason Wang
On Tue, 4 Nov 2025 at 06:11, Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> wrote:
>
> On 2025/11/04 2:58, Peter Maydell wrote:
> > In e1000e_write_packet_to_guest() we don't write data for RX descriptors
> > where the buffer address is NULL (as required by the i82574 datasheet
> > section 7.1.7.2). However, when we do this we still update desc_offset
> > by the amount of data we would have written to the RX descriptor if
> > it had a valid buffer pointer, resulting in our dropping that data
> > entirely. The data sheet is not 100% clear on the subject, but this
> > seems unlikely to be the correct behaviour.
> >
> > Rearrange the null-descriptor logic so that we don't treat these
> > do-nothing descriptors as if we'd really written the data.
>
> Please make a corresponding change for igb too so that the code of these
> two devices will not diverge further.
The igb_core.c version of this function seems to be
rather different (and rather easier to read). It has a
kind of related bug where igb_write_packet_to_guest()
calls igb_write_to_rx_buffers() and assumes that that
function has correctly set pdma_st.desc_size to the
amount of data written to that descriptor, but the
early-return cases from igb_write_to_rx_buffers()
don't actually update that field so it will be whatever
junk was present from the previous iteration.
So it looks to me like there are similar bugs but the
code in these two devices is already pretty different
and the fixes don't transpose one-to-one. (For instance,
the problem with the assert location fixed in patch 3
here doesn't seem to be present in igb, which already
does the assert() at the top of its loop in
igb_write_payload_frag_to_rx_buffers().)
In any case, I don't think it makes sense to look at
fixing igb before this patchset has been reviewed
and we're confident the fixes are actually right :-)
thanks
-- PMM
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors
2025-11-03 17:58 ` [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors Peter Maydell
2025-11-04 6:11 ` Akihiko Odaki
@ 2025-11-04 16:44 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 10+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-11-04 16:44 UTC (permalink / raw)
To: Peter Maydell, qemu-devel; +Cc: Dmitry Fleytman, Akihiko Odaki, Jason Wang
On 3/11/25 18:58, Peter Maydell wrote:
> In e1000e_write_packet_to_guest() we don't write data for RX descriptors
> where the buffer address is NULL (as required by the i82574 datasheet
> section 7.1.7.2). However, when we do this we still update desc_offset
> by the amount of data we would have written to the RX descriptor if
> it had a valid buffer pointer, resulting in our dropping that data
> entirely. The data sheet is not 100% clear on the subject, but this
> seems unlikely to be the correct behaviour.
>
> Rearrange the null-descriptor logic so that we don't treat these
> do-nothing descriptors as if we'd really written the data.
>
> This both fixes a bug and also is a prerequisite to cleaning up
> the size calculation logic in the next patch.
>
> (Cc to stable largely because it will be needed for the next patch,
> which fixes a more serious bug.)
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> hw/net/e1000e_core.c | 21 ++++++++++-----------
> 1 file changed, 10 insertions(+), 11 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers()
2025-11-03 17:58 [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Peter Maydell
` (2 preceding siblings ...)
2025-11-03 17:58 ` [PATCH 3/3] hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers() assert Peter Maydell
@ 2025-11-05 6:21 ` Akihiko Odaki
3 siblings, 0 replies; 10+ messages in thread
From: Akihiko Odaki @ 2025-11-05 6:21 UTC (permalink / raw)
To: Peter Maydell, qemu-devel; +Cc: Dmitry Fleytman, Jason Wang
On 2025/11/04 2:58, Peter Maydell wrote:
> This patchset fixes assertion failures in
> e1000e_write_payload_frag_to_rx_buffers(). There turn out to be two
> ways this assert could be triggered.
>
> The first is the one reported in
> https://gitlab.com/qemu-project/qemu/-/issues/537 where a malicious
> guest could set up the device into loopback mode and then send a
> carefully sized packet that trips a bug in the logic in
> e1000e_write_packet_to_guest() that tries to calculate how much data
> it is going to put into an RX descriptor, causing us to hit the
> assertion.
>
> The second is one I spotted by code inspection, where the assert is
> slightly over-eager and could assert in one valid case of a just
> exactly large enough packet.
>
> Patch 1 fixes what I believe to be an incorrect-behaviour bug
> for descriptors with NULL buffer addresses, largely because
> that makes the following patch to fix the assertion less awkward.
>
> thanks
> -- PMM
>
> Peter Maydell (3):
> hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX
> descriptors
> hw/net/e1000e_core: Correct rx oversize packet checks
> hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers()
> assert
>
> hw/net/e1000e_core.c | 85 ++++++++++++++++++++++++++++++--------------
> 1 file changed, 59 insertions(+), 26 deletions(-)
>
For the whole series:
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] hw/net/e1000e_core: Correct rx oversize packet checks
2025-11-03 17:58 ` [PATCH 2/3] hw/net/e1000e_core: Correct rx oversize packet checks Peter Maydell
@ 2025-11-07 3:50 ` Jason Wang
2025-11-07 10:06 ` Peter Maydell
0 siblings, 1 reply; 10+ messages in thread
From: Jason Wang @ 2025-11-07 3:50 UTC (permalink / raw)
To: Peter Maydell; +Cc: qemu-devel, Dmitry Fleytman, Akihiko Odaki
On Tue, Nov 4, 2025 at 1:59 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> In e1000e_write_packet_to_guest() we attempt to ensure that we don't
> write more of a packet to a descriptor than will fit in the guest
> configured receive buffers. However, this code does not allow for
> the "packet split" feature. When packet splitting is enabled, the
> first of up to 4 buffers in the descriptor is used for the packet
> header only, with the payload going into buffers 2, 3 and 4. Our
> length check only checks against the total sizes of all 4 buffers,
> which meant that if an incoming packet was large enough to fit in (1
> + 2 + 3 + 4) but not into (2 + 3 + 4) and packet splitting was
> enabled, we would run into the assertion in
> e1000e_write_hdr_frag_to_rx_buffers() that we had enough buffers for
> the data:
>
> qemu-system-i386: ../../hw/net/e1000e_core.c:1418: void e1000e_write_payload_frag_to_rx_buffers(E1000ECore *, hwaddr *, E1000EBAState *, const char *, dma_addr_t): Assertion `bastate->cur_idx < MAX_PS_BUFFERS' failed.
>
> A malicious guest could provoke this assertion by configuring the
> device into loopback mode, and then sending itself a suitably sized
> packet into a suitably arrange rx descriptor.
>
> The code also fails to deal with the possibility that the descriptor
> buffers are sized such that the trailing checksum word does not fit
> into the last descriptor which has actual data, which might also
> trigger this assertion.
>
> Rework the length handling to use two variables:
> * desc_size is the total amount of data DMA'd to the guest
> for the descriptor being processed in this iteration of the loop
> * rx_desc_buf_size is the total amount of space left in it
>
> As we copy data to the guest (packet header, payload, checksum),
> update these two variables. (Previously we attempted to calculate
> desc_size once at the top of the loop, but this is too difficult to
> do correctly.) Then we can use the variables to ensure that we clamp
> the amount of copied payload data to the remaining space in the
> descriptor's buffers, even if we've used one of the buffers up in the
> packet-split code, and we can tell whether we have enough space for
> the full checksum word in this descriptor or whether we're going to
> need to split that to the following descriptor.
>
> I have included comments that hopefully help to make the loop
> logic a little clearer.
>
> Cc: qemu-stable@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/537
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> hw/net/e1000e_core.c | 61 ++++++++++++++++++++++++++++++++++----------
> 1 file changed, 47 insertions(+), 14 deletions(-)
>
> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> index ba77cb6011f..471c3ed20b8 100644
> --- a/hw/net/e1000e_core.c
> +++ b/hw/net/e1000e_core.c
> @@ -1495,6 +1495,13 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
> rxi = rxr->i;
>
> do {
> + /*
> + * Loop processing descriptors while we have packet data to
> + * DMA to the guest. desc_offset tracks how much data we have
> + * sent to the guest in total over all descriptors, and goes
> + * from 0 up to total_size (the size of everything to send to
> + * the guest including possible trailing 4 bytes of CRC data).
> + */
> hwaddr ba[MAX_PS_BUFFERS];
> E1000EBAState bastate = { { 0 } };
> bool is_last = false;
> @@ -1512,23 +1519,27 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
> e1000e_read_rx_descr(core, &desc, ba);
>
> if (ba[0]) {
> - size_t desc_size = total_size - desc_offset;
> -
> - if (desc_size > core->rx_desc_buf_size) {
> - desc_size = core->rx_desc_buf_size;
> - }
> + /* Total amount of data DMA'd to the guest in this iteration */
> + size_t desc_size = 0;
> + /*
> + * Total space available in this descriptor (we will update
> + * this as we use it up)
> + */
> + size_t rx_desc_buf_size = core->rx_desc_buf_size;
>
> if (desc_offset < size) {
> - static const uint32_t fcs_pad;
> size_t iov_copy;
> + /* Amount of data to copy from the incoming packet */
> size_t copy_size = size - desc_offset;
> - if (copy_size > core->rx_desc_buf_size) {
> - copy_size = core->rx_desc_buf_size;
> - }
>
> /* For PS mode copy the packet header first */
> if (do_ps) {
> if (is_first) {
> + /*
> + * e1000e_do_ps() guarantees that buffer 0 has enough
> + * space for the header; otherwise we will not split
> + * the packet (i.e. do_ps is false).
> + */
> size_t ps_hdr_copied = 0;
> do {
> iov_copy = MIN(ps_hdr_len - ps_hdr_copied,
> @@ -1550,14 +1561,26 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
> } while (ps_hdr_copied < ps_hdr_len);
>
> is_first = false;
> + desc_size += ps_hdr_len;
> } else {
> /* Leave buffer 0 of each descriptor except first */
> /* empty as per spec 7.1.5.1 */
> e1000e_write_hdr_frag_to_rx_buffers(core, ba, &bastate,
> NULL, 0);
> }
> + rx_desc_buf_size -= core->rxbuf_sizes[0];
> }
>
> + /*
> + * Clamp the amount of packet data we copy into what will fit
> + * into the remaining buffers in the descriptor.
> + */
> + if (copy_size > rx_desc_buf_size) {
> + copy_size = rx_desc_buf_size;
> + }
> + desc_size += copy_size;
> + rx_desc_buf_size -= copy_size;
> +
> /* Copy packet payload */
> while (copy_size) {
> iov_copy = MIN(copy_size, iov->iov_len - iov_ofs);
> @@ -1574,12 +1597,22 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
> iov_ofs = 0;
> }
> }
> + }
>
> - if (desc_offset + desc_size >= total_size) {
> - /* Simulate FCS checksum presence in the last descriptor */
> - e1000e_write_payload_frag_to_rx_buffers(core, ba, &bastate,
> - (const char *) &fcs_pad, e1000x_fcs_len(core->mac));
> - }
> + if (rx_desc_buf_size &&
> + desc_offset >= size && desc_offset < total_size) {
> + /*
> + * We are in the last 4 bytes corresponding to the FCS checksum.
> + * We only ever write zeroes here (unlike the hardware).
> + */
> + static const uint32_t fcs_pad;
> + /* Amount of space for the trailing checksum */
> + size_t fcs_len = MIN(rx_desc_buf_size,
> + total_size - desc_offset);
> + e1000e_write_payload_frag_to_rx_buffers(core, ba, &bastate,
> + (const char *)&fcs_pad,
> + fcs_len);
> + desc_size += fcs_len;
> }
> desc_offset += desc_size;
This should be done before the if (rx_desc_bufs_size && ... ?
> if (desc_offset >= total_size) {
> --
> 2.43.0
>
Thanks
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] hw/net/e1000e_core: Correct rx oversize packet checks
2025-11-07 3:50 ` Jason Wang
@ 2025-11-07 10:06 ` Peter Maydell
0 siblings, 0 replies; 10+ messages in thread
From: Peter Maydell @ 2025-11-07 10:06 UTC (permalink / raw)
To: Jason Wang; +Cc: qemu-devel, Dmitry Fleytman, Akihiko Odaki
On Fri, 7 Nov 2025 at 03:50, Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Nov 4, 2025 at 1:59 AM Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > do {
> > + /*
> > + * Loop processing descriptors while we have packet data to
> > + * DMA to the guest. desc_offset tracks how much data we have
> > + * sent to the guest in total over all descriptors, and goes
> > + * from 0 up to total_size (the size of everything to send to
> > + * the guest including possible trailing 4 bytes of CRC data).
> > + */
> > hwaddr ba[MAX_PS_BUFFERS];
> > E1000EBAState bastate = { { 0 } };
> > bool is_last = false;
> > @@ -1512,23 +1519,27 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
> > e1000e_read_rx_descr(core, &desc, ba);
> >
> > if (ba[0]) {
> > - size_t desc_size = total_size - desc_offset;
> > -
> > - if (desc_size > core->rx_desc_buf_size) {
> > - desc_size = core->rx_desc_buf_size;
> > - }
> > + /* Total amount of data DMA'd to the guest in this iteration */
> > + size_t desc_size = 0;
> > + /*
> > + * Total space available in this descriptor (we will update
> > + * this as we use it up)
> > + */
> > + size_t rx_desc_buf_size = core->rx_desc_buf_size;
> >
> > if (desc_offset < size) {
> > - static const uint32_t fcs_pad;
> > size_t iov_copy;
> > + /* Amount of data to copy from the incoming packet */
> > size_t copy_size = size - desc_offset;
> > - if (copy_size > core->rx_desc_buf_size) {
> > - copy_size = core->rx_desc_buf_size;
> > - }
> >
> > /* For PS mode copy the packet header first */
> > if (do_ps) {
> > if (is_first) {
> > + /*
> > + * e1000e_do_ps() guarantees that buffer 0 has enough
> > + * space for the header; otherwise we will not split
> > + * the packet (i.e. do_ps is false).
> > + */
> > size_t ps_hdr_copied = 0;
> > do {
> > iov_copy = MIN(ps_hdr_len - ps_hdr_copied,
> > @@ -1550,14 +1561,26 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
> > } while (ps_hdr_copied < ps_hdr_len);
> >
> > is_first = false;
> > + desc_size += ps_hdr_len;
> > } else {
> > /* Leave buffer 0 of each descriptor except first */
> > /* empty as per spec 7.1.5.1 */
> > e1000e_write_hdr_frag_to_rx_buffers(core, ba, &bastate,
> > NULL, 0);
> > }
> > + rx_desc_buf_size -= core->rxbuf_sizes[0];
> > }
> >
> > + /*
> > + * Clamp the amount of packet data we copy into what will fit
> > + * into the remaining buffers in the descriptor.
> > + */
> > + if (copy_size > rx_desc_buf_size) {
> > + copy_size = rx_desc_buf_size;
> > + }
> > + desc_size += copy_size;
> > + rx_desc_buf_size -= copy_size;
> > +
> > /* Copy packet payload */
> > while (copy_size) {
> > iov_copy = MIN(copy_size, iov->iov_len - iov_ofs);
> > @@ -1574,12 +1597,22 @@ e1000e_write_packet_to_guest(E1000ECore *core, struct NetRxPkt *pkt,
> > iov_ofs = 0;
> > }
> > }
> > + }
> >
> > - if (desc_offset + desc_size >= total_size) {
> > - /* Simulate FCS checksum presence in the last descriptor */
> > - e1000e_write_payload_frag_to_rx_buffers(core, ba, &bastate,
> > - (const char *) &fcs_pad, e1000x_fcs_len(core->mac));
> > - }
> > + if (rx_desc_buf_size &&
> > + desc_offset >= size && desc_offset < total_size) {
> > + /*
> > + * We are in the last 4 bytes corresponding to the FCS checksum.
> > + * We only ever write zeroes here (unlike the hardware).
> > + */
> > + static const uint32_t fcs_pad;
> > + /* Amount of space for the trailing checksum */
> > + size_t fcs_len = MIN(rx_desc_buf_size,
> > + total_size - desc_offset);
> > + e1000e_write_payload_frag_to_rx_buffers(core, ba, &bastate,
> > + (const char *)&fcs_pad,
> > + fcs_len);
> > + desc_size += fcs_len;
> > }
> > desc_offset += desc_size;
>
> This should be done before the if (rx_desc_bufs_size && ... ?
No. That if() block deals with adding (possibly part of) the 4
checksum bytes. Those are part of the total data we DMA to
the guest, and so the if() block adds fcs_len to desc_size,
and those extra bytes need to be added to desc_offset.
> > if (desc_offset >= total_size) {
Otherwise conditions like this and the loop termination condition
would not fire, because desc_offset would not get up to total_size.
thanks
-- PMM
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-11-07 10:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-03 17:58 [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Peter Maydell
2025-11-03 17:58 ` [PATCH 1/3] hw/net/e1000e_core: Don't advance desc_offset for NULL buffer RX descriptors Peter Maydell
2025-11-04 6:11 ` Akihiko Odaki
2025-11-04 9:55 ` Peter Maydell
2025-11-04 16:44 ` Philippe Mathieu-Daudé
2025-11-03 17:58 ` [PATCH 2/3] hw/net/e1000e_core: Correct rx oversize packet checks Peter Maydell
2025-11-07 3:50 ` Jason Wang
2025-11-07 10:06 ` Peter Maydell
2025-11-03 17:58 ` [PATCH 3/3] hw/net/e1000e_core: Adjust e1000e_write_payload_frag_to_rx_buffers() assert Peter Maydell
2025-11-05 6:21 ` [PATCH 0/3] hw/net/e1000e: Fix assertion failures in e1000e_write_payload_frag_to_rx_buffers() Akihiko Odaki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).