From: Ed Swierk <eswierk@skyportsystems.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Ed Swierk <eswierk@skyportsystems.com>,
Dmitry Fleytman <dmitry@daynix.com>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC v2] e1000: Faulty tx checksum offload corrupts packets
Date: Thu, 26 Oct 2017 19:31:33 -0700
Message-ID: <1509071493-82242-1-git-send-email-eswierk@skyportsystems.com>
In-Reply-To: <5e031fcf-9f97-7d32-2e41-6868f75d284b@redhat.com>
On Mon, Oct 23, 2017 at 8:28 PM, Jason Wang <jasowang@redhat.com> wrote:
>
> On 2017-10-24 08:22, Ed Swierk wrote:
> > (Another layer of icing on the cake is that QEMU ignores the
> > requirement that a UDP checksum computed as zero be sent as 0xffff,
> > since zero is a special value meaning no checksum. So even when QEMU
> > doesn't corrupt the packet data, the packet sometimes leaves the box
> > with no checksum at all.)
>
> Please submit another patch for this.
Will do.
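
For reference, the UDP rule mentioned in the quoted paragraph (RFC 768: a computed checksum of zero is transmitted as all ones) can be sketched as below. This is a minimal illustration, not QEMU code: `udp_csum_finish` is a hypothetical helper name, and it assumes the caller has already folded the one's-complement sum over the pseudo-header and datagram down to 16 bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): turn a folded 16-bit
 * one's-complement sum into the value stored in the UDP checksum field.
 * A result of 0 would mean "no checksum" on the wire, so it is sent
 * as 0xffff instead, which is the same value in one's-complement terms.
 */
static uint16_t udp_csum_finish(uint16_t folded_sum)
{
    uint16_t csum = (uint16_t)~folded_sum;

    return csum ? csum : 0xffff;
}
```

TCP has no such special case, so a fix along these lines would need to apply only when the packet is known to be UDP.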
> > I have instrumented QEMU and reproduced this behavior with a Windows
> > 10 guest, rather easily with a TCP iperf and a UDP iperf running in
> > parallel. I have also attempted a fix, which is below in very rough
> > form.
>
> How do you instrument qemu? Can this be reproduced without this?

I can reproduce the bug with just the patchlet below. It would be even
better to devise a test that detects the corruption without modifying
QEMU, as that could be used as a regression test after the bug itself
is fixed. I'll have to ponder that.

> > One puzzle is what to do about e1000e: it shares some data
> > structures and a bit of code with e1000, but little else, which is
> > surprising given how similar they are (or should be). The e1000e's
> > handling of TCP segmentation offload and checksum offload is totally
> > different, and problematic for other reasons (it totally ignores most
> > of the context parameters provided by the driver and basically does
> > what it thinks is best by digging into the packet data). Is this
> > divergence intentional?
>
> Somehow, and if we can find a way to unify the codes, it would be better.
>
> > Is there a reason not to change e1000e as long
> > as I'm trying to make e1000 more datasheet-conformant?
>
> Please fix them individually.

I went ahead and reimplemented e1000 as a variant of e1000e. It's just
a proof of concept, but hopefully a step towards eliminating the
redundancy and maintaining just one codebase rather than two. Please
see the patch series I'm sending separately.

Anyway, here's how I catch the tx checksum offload bug, with a Windows
guest running one TCP iperf and one UDP iperf simultaneously through
the same e1000 interface.

--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -534,6 +534,30 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
 }
 
 static void
+debug_csum(struct e1000_tx *tp, uint16_t oldsum)
+{
+    e1000x_txd_props *props = &tp->props;
+    uint8_t proto = tp->data[14 + 9];
+    uint32_t sumoff = props->tucso - props->tucss;
+
+    if ((proto == 17 && sumoff != 6) ||
+        (proto == 6 && sumoff != 16)) {
+        DBGOUT(TXERR, "txsum bug! ver %d src %08x dst %08x len %d proto %d "
+               "cptse %d sum_needed %x oldsum %x newsum %x realsum %x\n",
+               tp->data[14] >> 4,
+               ldl_be_p(tp->data + 14 + 12),
+               ldl_be_p(tp->data + 14 + 16),
+               lduw_be_p(tp->data + 14 + 2),
+               proto,
+               props->cptse,
+               props->sum_needed,
+               oldsum,
+               lduw_be_p(tp->data + props->tucso),
+               lduw_be_p(tp->data + props->tucss + (proto == 6 ? 16 : 6)));
+    }
+}
+
+static void
 xmit_seg(E1000State *s)
 {
     uint16_t len;
@@ -577,8 +601,10 @@ xmit_seg(E1000State *s)
     }
 
     if (tp->props.sum_needed & E1000_TXD_POPTS_TXSM) {
+        uint16_t oldsum = lduw_be_p(tp->data + tp->props.tucso);
         putsum(tp->data, tp->size, tp->props.tucso,
                tp->props.tucss, tp->props.tucse);
+        debug_csum(tp, oldsum);
     }
     if (tp->props.sum_needed & E1000_TXD_POPTS_IXSM) {
         putsum(tp->data, tp->size, tp->props.ipcso,
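
The check in debug_csum() can also be read on its own: the checksum field sits 6 bytes into a UDP header and 16 bytes into a TCP header, so a context whose tucso - tucss does not match the packet's IP protocol was most likely programmed for a different flow. Below is a standalone sketch of that test, under the same assumptions the patch appears to make (untagged Ethernet, IPv4, protocol byte at offset 9 of the IP header). `struct csum_ctx` and `csum_ctx_mismatch` are hypothetical names for illustration, not QEMU types.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_HLEN        14  /* untagged Ethernet header (assumption) */
#define IP_PROTO_OFF     9  /* protocol byte within the IPv4 header */
#define PROTO_TCP        6
#define PROTO_UDP       17
#define TCP_CSUM_OFF    16  /* checksum field offset in the TCP header */
#define UDP_CSUM_OFF     6  /* checksum field offset in the UDP header */

/* Hypothetical stand-in for the tucss/tucso context fields. */
struct csum_ctx {
    uint8_t tucss;  /* frame offset where L4 checksumming starts */
    uint8_t tucso;  /* frame offset where the checksum is stored */
};

/*
 * Return true when the context's checksum offset does not match the
 * L4 protocol found in the frame. Protocols other than TCP and UDP
 * are never flagged.
 */
static bool csum_ctx_mismatch(const uint8_t *frame, const struct csum_ctx *ctx)
{
    uint8_t proto = frame[ETH_HLEN + IP_PROTO_OFF];
    uint32_t sumoff = (uint32_t)ctx->tucso - ctx->tucss;

    return (proto == PROTO_UDP && sumoff != UDP_CSUM_OFF) ||
           (proto == PROTO_TCP && sumoff != TCP_CSUM_OFF);
}
```

For example, with a 20-byte IPv4 header tucss is 34, so a UDP packet should come with tucso 40 and a TCP packet with tucso 50; a UDP packet paired with tucso 50 is the mismatch the debug message reports.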