From: Alex Sidorenko <alexandre.sidorenko@hpe.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org,
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Subject: Re: Receive offloads, small RCVBUF and zero TCP window
Date: Wed, 30 Nov 2016 10:10:45 -0500
Message-ID: <3885970.YN3y5yqlgB@zbook>
In-Reply-To: <15427752.RvQkb5CQdb@zbook>
On Monday, November 28, 2016 4:14:04 PM EST Alex Sidorenko wrote:
> On Monday, November 28, 2016 3:54:59 PM EST David Miller wrote:
> > From: Alex Sidorenko <alexandre.sidorenko@hpe.com>
> > Date: Mon, 28 Nov 2016 15:49:26 -0500
> >
> > > Now the question is whether it is OK to have icsk->icsk_ack.rcv_mss
> > > larger than MTU.
> >
> > It absolutely is not OK.
> >
> > If VMWare wants to receive large frames for batching purposes it must
> > use GRO or similar to achieve that, not just send vanilla frames into
> > the stack which are larger than the device MTU.
> >
>
> As VMWare's vmxnet3 driver is open-sourced and part of the generic kernel, do you think the problem is in that driver or elsewhere? I looked at the vmxnet3 sources and see that it uses LRO/GRO subroutines. Unfortunately, I don't understand its logic well enough to see whether it is doing anything incorrectly.

I think this has already been fixed in recent versions of the vmxnet3 driver (but not in RHEL6). VMWare/ESX can indeed pass us large aggregated SKBs (> MTU) if LRO is enabled, but the driver takes care of that in vmxnet3_rq_rx_complete():

	} else if (segCnt != 0 || skb->len > mtu) {
		u32 hlen;

		hlen = vmxnet3_get_hdr_len(adapter, skb,
			(union Vmxnet3_GenericDesc *)rcd);
		if (hlen == 0)
			goto not_lro;

		skb_shinfo(skb)->gso_type =
			rcd->v4 ? SKB_GSO_TCPV4 : SKB_GSO_TCPV6;
		if (segCnt != 0) {
			skb_shinfo(skb)->gso_segs = segCnt;
			skb_shinfo(skb)->gso_size =
				DIV_ROUND_UP(skb->len -
					hlen, segCnt);
		} else {
			skb_shinfo(skb)->gso_size = mtu - hlen;
		}
	}

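So if packets have been aggregated (segCnt != 0), where

	u8 segCnt; /* Number of aggregated packets */

we compute gso_size by dividing the payload length (skb->len - hlen) by that count, rounding up. If segCnt is zero but the skb is still larger than the MTU, gso_size falls back to mtu - hlen.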
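
To make the arithmetic concrete, here is a minimal userspace sketch of that calculation. The helper name lro_gso_size() and the numbers in the note below are my own illustration and are not taken from the driver; DIV_ROUND_UP is redefined locally with the usual round-up semantics.

```c
#include <stdint.h>

/* Local copy of the usual round-up division, for userspace use. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper, not driver code: returns the gso_size that the
 * branch quoted above would choose for a large received skb.
 *   len     - total skb length, headers included
 *   hlen    - header length (assumed nonzero, smaller than len and mtu)
 *   seg_cnt - number of aggregated packets reported by the device, or 0
 *   mtu     - device MTU
 */
static uint32_t lro_gso_size(uint32_t len, uint32_t hlen,
			     uint32_t seg_cnt, uint32_t mtu)
{
	if (seg_cnt != 0)
		return DIV_ROUND_UP(len - hlen, seg_cnt);

	/* No segment count available: fall back to an MTU-sized estimate. */
	return mtu - hlen;
}
```

For instance, assuming a 14532-byte skb with 52 bytes of headers and segCnt = 10, this gives a gso_size of 1448, the same value the fallback yields for a 1500-byte MTU.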

I still like Marcelo's idea of printing a warning when icsk->icsk_ack.rcv_mss looks unreasonable; it should really help with detecting buggy drivers.
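
As a rough illustration of what such a check might look at, here is a userspace sketch. It assumes the receive path estimates the segment size from gso_size when the driver has set it and from the full skb length otherwise, and it treats anything above the device MTU as suspicious. Both function names are hypothetical, and this is not a proposed patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the length used for the rcv_mss estimate:
 * gso_size if the driver filled it in, otherwise the whole skb length.
 */
static uint32_t measured_seg_len(uint32_t gso_size, uint32_t skb_len)
{
	return gso_size ? gso_size : skb_len;
}

/*
 * Hypothetical sanity check: a single segment cannot be larger than
 * the device MTU, so a larger estimate points at a driver that passed
 * up an aggregated skb without setting gso_size.
 */
static bool rcv_mss_looks_bogus(uint32_t rcv_mss, uint32_t mtu)
{
	return rcv_mss > mtu;
}
```

With a 1500-byte MTU, an aggregated 14532-byte skb that arrives with gso_size left at 0 would be flagged, while the same skb with gso_size set to 1448 would not.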