Re: [Qemu-devel] [RFC v2] e1000: Faulty tx checksum offload corrupts packets

From: Ed Swierk <eswierk@skyportsystems.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Ed Swierk <eswierk@skyportsystems.com>,
	Dmitry Fleytman <dmitry@daynix.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC v2] e1000: Faulty tx checksum offload corrupts packets
Date: Thu, 26 Oct 2017 19:31:33 -0700
Message-ID: <1509071493-82242-1-git-send-email-eswierk@skyportsystems.com>
In-Reply-To: <5e031fcf-9f97-7d32-2e41-6868f75d284b@redhat.com>

On Mon, Oct 23, 2017 at 8:28 PM, Jason Wang <jasowang@redhat.com> wrote:
> 
> On 2017-10-24 08:22, Ed Swierk wrote:
> > (Another layer of icing on the cake is that QEMU ignores the
> > requirement that a UDP checksum computed as zero be sent as 0xffff,
> > since zero is a special value meaning no checksum. So even when QEMU
> > doesn't corrupt the packet data, the packet sometimes leaves the box
> > with no checksum at all.)
> 
> Please submit another patch for this.

Will do.
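
For reference, the UDP rule mentioned above (RFC 768: a checksum that
computes to zero is transmitted as 0xffff, because zero on the wire
means "no checksum") can be sketched as below. This is a standalone
illustration, not QEMU code: csum_add, csum_finish and udp_csum_on_wire
are hypothetical names chosen for this example, and the pseudo-header
handling is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate a one's-complement sum over big-endian 16-bit words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (i < len) {
        /* Odd trailing byte is treated as if padded with a zero byte. */
        sum += (uint32_t)data[i] << 8;
    }
    return sum;
}

/* Fold the carries back in and return the complemented 16-bit result. */
static uint16_t csum_finish(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/*
 * UDP only: a computed checksum of zero must go out as 0xffff.
 * (Assumption for this sketch: the caller applies this to UDP
 * packets and not to TCP, where zero is an ordinary value.)
 */
static uint16_t udp_csum_on_wire(uint16_t computed)
{
    return computed == 0 ? 0xffff : computed;
}
```

A caller would seed csum_add with the pseudo-header sum, run it over
the UDP header and payload, and pass the result of csum_finish through
udp_csum_on_wire before storing it in the packet.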

> > I have instrumented QEMU and reproduced this behavior with a Windows
> > 10 guest, rather easily with a TCP iperf and a UDP iperf running in
> > parallel. I have also attempted a fix, which is below in very rough
> > form.
> 
> How do you instrument qemu? Can this be reproduced without this?

I can reproduce the bug with just the patchlet below. It would be even
better to devise a test that detects the corruption without modifying
QEMU, as that could be used as a regression test after the bug itself
is fixed. I'll have to ponder that.

> > One puzzle is what to do about e1000e: it shares some data
> > structures and a bit of code with e1000, but little else, which is
> > surprising given how similar they are (or should be). The e1000e's
> > handling of TCP segmentation offload and checksum offload is totally
> > different, and problematic for other reasons (it totally ignores most
> > of the context parameters provided by the driver and basically does
> > what it thinks is best by digging into the packet data). Is this
> > divergence intentional?
> 
> Somehow, and if we can find a way to unify the codes, it would be better.
> 
> > Is there a reason not to change e1000e as long
> > as I'm trying to make e1000 more datasheet-conformant?
> 
> Please fix them individually.

I went ahead and reimplemented e1000 as a variant of e1000e. It's just
a proof of concept, but hopefully a step towards eliminating the
redundancy and maintaining just one codebase rather than two. Please
see the patch series I'm sending separately.

Anyway, here's how I catch the tx checksum offload bug, with a Windows
guest running one TCP iperf and one UDP iperf simultaneously through
the same e1000 interface.

 --- a/hw/net/e1000.c
 +++ b/hw/net/e1000.c
 @@ -534,6 +534,30 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
  }
  
  static void
 +debug_csum(struct e1000_tx *tp, uint16_t oldsum)
 +{
 +    e1000x_txd_props *props = &tp->props;
 +    uint8_t proto = tp->data[14 + 9];
 +    uint32_t sumoff = props->tucso - props->tucss;
 +
 +    if ((proto == 17 && sumoff != 6) ||
 +        (proto == 6 && sumoff != 16)) {
 +        DBGOUT(TXERR, "txsum bug! ver %d src %08x dst %08x len %d proto %d "
 +               "cptse %d sum_needed %x oldsum %x newsum %x realsum %x\n",
 +               tp->data[14] >> 4,
 +               ldl_be_p(tp->data + 14 + 12),
 +               ldl_be_p(tp->data + 14 + 16),
 +               lduw_be_p(tp->data + 14 + 2),
 +               proto,
 +               props->cptse,
 +               props->sum_needed,
 +               oldsum,
 +               lduw_be_p(tp->data + props->tucso),
 +               lduw_be_p(tp->data + props->tucss + (proto == 6 ? 16 : 6)));
 +    }
 +}
 +
 +static void
  xmit_seg(E1000State *s)
  {
      uint16_t len;
 @@ -577,8 +601,10 @@ xmit_seg(E1000State *s)
      }
  
      if (tp->props.sum_needed & E1000_TXD_POPTS_TXSM) {
 +        uint16_t oldsum = lduw_be_p(tp->data + tp->props.tucso);
          putsum(tp->data, tp->size, tp->props.tucso,
                 tp->props.tucss, tp->props.tucse);
 +        debug_csum(tp, oldsum);
      }
      if (tp->props.sum_needed & E1000_TXD_POPTS_IXSM) {
          putsum(tp->data, tp->size, tp->props.ipcso,
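
The condition that debug_csum() tests can be isolated as a small
standalone predicate. The sketch below assumes, as the patch does, an
untagged Ethernet frame carrying IPv4, and it relies on the standard
header layouts: the checksum field sits 16 bytes into a TCP header and
6 bytes into a UDP header. The name tx_csum_offset_matches and the
PROTO_* constants are hypothetical, introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    PROTO_TCP = 6,   /* IPv4 protocol number for TCP */
    PROTO_UDP = 17   /* IPv4 protocol number for UDP */
};

/*
 * Return true if the checksum offset programmed in the tx context
 * (tucso, relative to the checksum start tucss) is where the
 * checksum field of the given L4 protocol actually lives.
 * Protocols other than TCP and UDP are not checked, mirroring
 * the debug patch.
 */
static bool tx_csum_offset_matches(uint8_t proto, uint8_t tucss,
                                   uint8_t tucso)
{
    uint32_t sumoff = (uint32_t)tucso - tucss;

    switch (proto) {
    case PROTO_TCP:
        return sumoff == 16;
    case PROTO_UDP:
        return sumoff == 6;
    default:
        return true;
    }
}
```

With a 14-byte Ethernet header and a 20-byte IPv4 header, tucss is 34,
so a TCP context has tucso 50 and a UDP context has tucso 40. A UDP
packet sent with tucso 50 fails the check, which is the kind of
mismatch the debug message reports.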

Thread overview: 8+ messages
2017-10-12 23:59 [Qemu-devel] [RFC] e1000: Faulty tx checksum offload corrupts packets Ed Swierk
2017-10-13  0:04 ` no-reply
2017-10-13  0:17 ` no-reply
2017-10-13  0:17 ` no-reply
2017-10-23 20:47 ` [Qemu-devel] [RFC v2] " Ed Swierk
2017-10-24  0:22 ` Ed Swierk
2017-10-24  3:28   ` Jason Wang
2017-10-27  2:31     ` Ed Swierk [this message]
