qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Ed Swierk <eswierk@skyportsystems.com>,
	"Daniel P . Berrange" <berrange@redhat.com>,
	Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 0/2] e1000: Correct TX offload context handling
Date: Fri, 17 Nov 2017 11:04:25 +0800	[thread overview]
Message-ID: <e9f7aaa0-6b5d-e494-8da0-26bf3f4f6929@redhat.com> (raw)
In-Reply-To: <1510701814-52973-1-git-send-email-eswierk@skyportsystems.com>



On 2017年11月15日 07:23, Ed Swierk via Qemu-devel wrote:
> The transmit offload implementation in QEMU's e1000 device is
> deficient and causes packet data corruption in some situations.
>
> According to the Intel 8254x software developer's manual[1], the
> device maintains two separate contexts: the TCP segmentation offload
> context includes parameters for both segmentation offload and checksum
> offload, and the normal (checksum-offload-only) context includes only
> checksum offload parameters. These parameters specify over which
> packet data to compute the checksum, and where in the packet to store
> the computed checksum(s).
>
> [1] https://www.intel.com/content/dam/doc/manual/pci-pci-x-family-gbe-controllers-software-dev-manual.pdf
>
> The e1000 driver can update either of these contexts by sending a
> transmit context descriptor. The TSE bit in the TUCMD field controls
> which context is modified by the descriptor. Crucially, a transmit
> context descriptor with TSE=1 changes only the TSO context, leaving
> the non-TSO context unchanged; with TSE=0 the opposite is true.
>
> Fields in the transmit data descriptor determine which (if either) of
> these two contexts the device uses when actually transmitting some
> data:
>
> - If the TSE bit in the DCMD field is set, then the device performs
>    TCP segmentation offload using the parameters previously set in the
>    TSO context. In addition, if TXSM and/or IXSM is set in the POPTS
>    field, the device performs the appropriate checksum offloads using
>    the parameters in the same (TSO) context.
>
> - Otherwise, if the TSE bit in the DCMD field is clear, then there is
>    no TCP segmentation offload. If TXSM and/or IXSM is set in the POPTS
>    field, the device performs the appropriate checksum offloads using
>    the parameters in the non-TSO context.
>
> The e1000 driver is free to set up the TSO and non-TSO contexts and
> then transmit a mixture of data, with each data descriptor using a
> different (or neither) context. This is what the e1000 driver for
> Windows (Intel(R) PRO/1000 MT Network Connection, aka E1G6023E.sys)
> does in certain cases. Sometimes with quite undesirable results, since
> the QEMU e1000 device doesn't work as described above.
>
> Instead, the QEMU e1000 device maintains only one context in its state
> structure. When it receives a transmit context descriptor from the
> driver, it overwrites the context parameters regardless of the TSE bit
> in the TUCMD field.
>
> To see why this is wrong, suppose the driver first sets up a non-TSO
> context with UDP checksum offload parameters (say, TUCSO pointing to
> the appropriate offset for a UDP checksum, 6 bytes into the header),
> and then sets up a TSO context with TCP checksum offload parameters
> (TUCSO pointing to the appropriate offset for a TCP checksum, 16 bytes
> into the header). The driver then sends a transmit data descriptor
> with TSO=0 and TXSM=1 along with a UDP datagram. The QEMU e1000 device
> computes the checksum using the last set of checksum offload
> parameters, and writes the checksum to offset 16, stomping on two
> bytes of UDP data, and leaving the wrong checksum in the UDP checksum
> field.
>
> To make matters worse, if the network stack on the host running QEMU
> treats data transmitted from a VM as locally originated, it may do its
> own UDP checksum computation, "correcting" it to match the corrupt
> data before sending it on the wire. Now the corrupt UDP packet makes
> its way all the way to the peer application.
>
> I have reproduced this behavior with a Windows 10 guest, rather easily
> with a TCP iperf and a UDP iperf running in parallel. With the
> patchlet below, you'll see an error message whenever the bug is
> triggered.
>
>
>   --- a/hw/net/e1000.c
>   +++ b/hw/net/e1000.c
>   @@ -534,6 +534,30 @@ e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
>    }
>    
>    static void
>   +debug_csum(struct e1000_tx *tp, uint16_t oldsum)
>   +{
>   +    e1000x_txd_props *props = &tp->props;
>   +    uint8_t proto = tp->data[14 + 9];
>   +    uint32_t sumoff = props->tucso - props->tucss;
>   +
>   +    if ((proto == 17 && sumoff != 6) ||
>   +        (proto == 6 && sumoff != 16)) {
>   +        DBGOUT(TXERR, "txsum bug! ver %d src %08x dst %08x len %d proto %d "
>   +               "cptse %d sum_needed %x oldsum %x newsum %x realsum %x\n",
>   +               tp->data[14] >> 4,
>   +               ldl_be_p(tp->data + 14 + 12),
>   +               ldl_be_p(tp->data + 14 + 16),
>   +               lduw_be_p(tp->data + 14 + 2),
>   +               proto,
>   +               props->cptse,
>   +               props->sum_needed,
>   +               oldsum,
>   +               lduw_be_p(tp->data + props->tucso),
>   +               lduw_be_p(tp->data + props->tucss + (proto == 6 ? 16 : 6)));
>   +    }
>   +}
>   +
>   +static void
>    xmit_seg(E1000State *s)
>    {
>        uint16_t len;
>   @@ -577,8 +601,10 @@ xmit_seg(E1000State *s)
>        }
>    
>        if (tp->props.sum_needed & E1000_TXD_POPTS_TXSM) {
>   +        uint16_t oldsum = lduw_be_p(tp->data + tp->props.tucso);
>            putsum(tp->data, tp->size, tp->props.tucso,
>                   tp->props.tucss, tp->props.tucse);
>   +        debug_csum(tp, oldsum); /* FIXME: remove */
>        }
>        if (tp->props.sum_needed & E1000_TXD_POPTS_IXSM) {
>            putsum(tp->data, tp->size, tp->props.ipcso,
>
>
> This patch series first moves per-TX packet flags out of the context
> struct, which is used by both e1000 and e1000e. ("Used" is generous,
> as e1000e ignores most of the context parameters supplied by the guest
> and does its own thing, which is only sometimes correct. Taming e1000e
> is a project for another day. The change to e1000e here is a baby step
> in that direction.)
>
> Then we separate the e1000 TSO and non-TSO contexts, which fixes the
> UDP TX corruption bug.
>
> Ed Swierk (2):
>    e1000, e1000e: Move per-packet TX offload flags out of context state
>    e1000: Separate TSO and non-TSO contexts, fixing UDP TX corruption
>
>   hw/net/e1000.c         | 92 ++++++++++++++++++++++++++++----------------------
>   hw/net/e1000e.c        |  4 +--
>   hw/net/e1000e_core.c   | 16 ++++-----
>   hw/net/e1000e_core.h   |  2 ++
>   hw/net/e1000x_common.h |  2 --
>   5 files changed, 64 insertions(+), 52 deletions(-)
>

Consider the risk for rc1, I've queued this for 2.12.

Thanks

      parent reply	other threads:[~2017-11-17  3:04 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-14 23:23 [Qemu-devel] [PATCH 0/2] e1000: Correct TX offload context handling Ed Swierk
2017-11-14 23:23 ` [Qemu-devel] [PATCH 1/2] e1000, e1000e: Move per-packet TX offload flags out of context state Ed Swierk
2017-11-14 23:23 ` [Qemu-devel] [PATCH 2/2] e1000: Separate TSO and non-TSO contexts, fixing UDP TX corruption Ed Swierk
2017-11-17  3:04 ` Jason Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e9f7aaa0-6b5d-e494-8da0-26bf3f4f6929@redhat.com \
    --to=jasowang@redhat.com \
    --cc=berrange@redhat.com \
    --cc=eswierk@skyportsystems.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).