Re: netif.h clarifications

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Paul Durrant <Paul.Durrant@citrix.com>
To: Roger Pau Monne <roger.pau@citrix.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Cc: Wei Liu <wei.liu2@citrix.com>, David Vrabel <david.vrabel@citrix.com>
Subject: Re: netif.h clarifications
Date: Fri, 20 May 2016 09:09:43 +0000	[thread overview]
Message-ID: <3d18e0cafe0648bfa8f8cbd1673e1031@AMSPEX02CL03.citrite.net> (raw)
In-Reply-To: <20160519162750.qfdns6uzhcg5v3ts@mac>

> -----Original Message-----
> From: Roger Pau Monné [mailto:roger.pau@citrix.com]
> Sent: 19 May 2016 17:28
> To: xen-devel@lists.xenproject.org
> Cc: Wei Liu; David Vrabel; Paul Durrant
> Subject: netif.h clarifications
> 
> Hello,
> 
> While trying to solve a FreeBSD netfront bug [0] I came across a couple
> of netif.h dark spots that I think should be documented in the netif.h
> header. I'm willing to make those changes, but I want to make sure my
> understanding is right.
> 
> Regarding checksum offloading, I had a hard time figuring out what the
> different flags actually mean:
> 
> /* Packet data has been validated against protocol checksum. */
> #define _NETRXF_data_validated (0)
> #define  NETRXF_data_validated (1U<<_NETRXF_data_validated)
> 
> /* Protocol checksum field is blank in the packet (hardware offload)? */
> #define _NETRXF_csum_blank     (1)
> #define  NETRXF_csum_blank     (1U<<_NETRXF_csum_blank)
> 
> (Same applies to the TX flags, I'm not copying them there because they are
> the same)
> 
> First of all, I assume "protocol" here refers to Layer 3 and Layer 4
> protocol, so that would be IP and TCP/UDP/SCTP checksum offloading? In
> any
> case this needs clarification and proper wording.
> 
> Then, I have some questions regarding the meaning of the flags themselves
> and the content of the checksum field in all the possible scenarios.
> 
> On RX path:
> 
>  - NETRXF_data_validated only: data has been validated, but what's the state
>    of the checksum field itself? If the data is validated again, would it
>    match against the checksum?
>  - NETRXF_csum_blank only: I don't think this makes much sense, data is in
>    unknown state and checksum is not present, so there's no way to validate
>    it. Packet should be dropped?

Yes, in practice it's not used on its own. As you say, I don't think it makes any sense.

>  - NETRXF_data_validated | NETRXF_csum_blank: this combination seems to
> be
>    the one that makes more sense to me, data is valid, but checksum is not
>    there. This matches what some real NICs already do, that is to provide
>    the result of the checksum check _without_ actually providing the
>    checksum itself on the RX path.
> 

In Linux netback this is set if the checksum info is partial, which I take to mean that the packet has a valid pseudo-header checksum. I think packets coming from NICs are more likely to be 'checksum unnecessary' which results in NETRXF_data_validated only, which I take to mean that the checksum has been verified but may have been trashed in the process.

> On TX path:
> 
>  - NETTXF_data_validated only: I don't think this makes any sense, data is
>    always valid from the senders point of view.
>  - NETTXF_csum_blank only: checksum calculation offload, it should be
>    performed by the other end.
>  - NETTXF_data_validated | NETTXF_csum_blank: again, I don't think it makes
>    much sense, data is always valid from the senders point of view, or else
>    why bother sending it?
> 

In Linux netback, the code goes:

		if (txp->flags & XEN_NETTXF_csum_blank)
			skb->ip_summed = CHECKSUM_PARTIAL;
		else if (txp->flags & XEN_NETTXF_data_validated)
			skb->ip_summed = CHECKSUM_UNNECESSARY;

So, csum_blank with or without data_validated means that it's assumed that the packet contains a valid pseudo-header checksum, but if csum_blank is not set then data_validated means that the data is good but the checksum is in an unknown state which is ok if the packet is then forwarded to another vif, and I assume 'unnecessary' is ignored by NIC drivers on their TX side (I guess they would only be interesting in 'partial').

> So it looks to me like we could get away with just two flags, one on the RX
> side that signals that the packet doesn't have a checksum but that the
> checksum validation has already been performed, and another one on the TX
> side to signal that the packet doesn't have a calculated checksum
> (typical checksum offload).
> 

On the TX side it would be useful to have flags which indicate:

- Full checksum present
- Pseudo-header checksum present
- State of checksum is unknown

On the RX side it would be useful to have

- Data validated, checksum state unknown
- Data validated, checksum correct
- Data not validated

> And then I've also seen some issues with TSO/LRO (GSO in Linux
> terminology)
> when using packet forwarding inside of a FreeBSD DomU. For example in the
> following scenario:
> 
>                                    +
>                                    |
>    +---------+           +--------------------+           +----------+
>    |         |A         B|       router       |C         D|          |
>    | Guest 1 +-----------+         +          +-----------+ Guest 2  |
>    |         |  bridge0  |         |          |  bridge1  |          |
>    +---------+           +--------------------+           +----------+
>    172.16.1.67          172.16.1.66|   10.0.1.1           10.0.1.2
>                                    |
>              +--------------------------------------------->
>               ssh 10.0.1.2         |
>                                    |
>                                    |
>                                    |
>                                    +
> 
> All those VMs are inside of the same host, and one of them acts as a gateway
> between them because they are on two different subnets. In this case I'm
> seeing issues because even though I disable TSO/LRO on the "router" at
> runtime, the backend doesn't watch the xenstore feature flag, and never
> disables it from the vif on the Dom0 bridge. This causes LRO packets
> (non-fragmented) to be received at point 'C', and then when the gateway
> tries to inject them into the other NIC it fails because the size is greater
> than the MTU, and the "no fragment" bit is set.
> 

Yes, GSO cannot be disabled/enabled dynamically on the netback tx side (i.e. guest rx side) so you can't turn it off. The Windows PV driver leave sit on all the time and does the fragmentation itself if the stack doesn't want GRO. Doing the fragmentation in the frontend makes more sense anyway since the cpu cycles are burned by the VM rather than dom0 and so it scales better.

> How does Linux deal with this situation? Does it simply ignore the no
> fragment flag and fragments the packet? Does it simply inject the packet to
> the other end ignoring the MTU and propagating the GSO flag?
> 

I've not looked at the netfront rx code but I assume that the large packet that is passed from netback is just marked as GSO and makes its way to wherever it's going (being fragmented by the stack if it's forwarded to an interface that doesn't have the TSO flag set).

  Paul

> Roger.
> 
> [0] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=188261

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

next prev parent reply	other threads:[~2016-05-20  9:15 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-19 16:27 netif.h clarifications Roger Pau Monné
2016-05-20  9:09 ` Paul Durrant [this message]
2016-05-20 10:46   ` Roger Pau Monne
2016-05-20 11:55     ` Paul Durrant
2016-05-20 12:34       ` Roger Pau Monne
2016-05-20 12:49         ` Paul Durrant

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d18e0cafe0648bfa8f8cbd1673e1031@AMSPEX02CL03.citrite.net \
    --to=paul.durrant@citrix.com \
    --cc=david.vrabel@citrix.com \
    --cc=roger.pau@citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).