Re: [RFC 1/3] lro: Generic LRO for TCP traffic

From: Jan-Bernd Themann <ossthema@de.ibm.com>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Thomas Klein <tklein@de.ibm.com>,
	Jan-Bernd Themann <themann@de.ibm.com>,
	netdev <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-ppc <linuxppc-dev@ozlabs.org>,
	Christoph Raisch <raisch@de.ibm.com>,
	Marcus Eder <meder@de.ibm.com>,
	Stefan Roscher <stefan.roscher@de.ibm.com>
Subject: Re: [RFC 1/3] lro: Generic LRO for TCP traffic
Date: Thu, 12 Jul 2007 13:54:42 +0200
Message-ID: <200707121354.43407.ossthema@de.ibm.com>
In-Reply-To: <20070712080137.GA25699@2ka.mipt.ru>

Hi Evgeniy,

On Thursday 12 July 2007 10:01, Evgeniy Polyakov wrote:

> > +
> > +	if (tcph->cwr || tcph->ece || tcph->urg || !tcph->ack || tcph->psh
> > +	    || tcph->rst || tcph->syn || tcph->fin)
> > +		return -1;
> 
> I think you do not want to break the lro frame because of the push flag - it is
> a pretty common flag, which does not break processing (and I'm not sure if
> it has any special meaning these days).
> 
> > +	if (INET_ECN_is_ce(ipv4_get_dsfield(iph)))
> > +		return -1;
> > +
> > +	if (tcph->doff != TCPH_LEN_WO_OPTIONS
> > +	    && tcph->doff != TCPH_LEN_W_TIMESTAMP)
> > +		return -1;
> > +
> > +	/* check tcp options (only timestamp allowed) */
> > +	if (tcph->doff == TCPH_LEN_W_TIMESTAMP) {
> > +		u32 *topt = (u32 *)(tcph + 1);
> > +
> > +		if (*topt != htonl((TCPOPT_NOP << 24) | (TCPOPT_NOP << 16)
> > +				   | (TCPOPT_TIMESTAMP << 8)
> > +				   | TCPOLEN_TIMESTAMP))
> > +			return -1;
> > +
> > +		/* timestamp should be in right order */
> > +		topt++;
> > +		if (lro_desc && (ntohl(lro_desc->tcp_rcv_tsval) > ntohl(*topt)))
> > +			return -1;
> 
> This still does not handle wrapping over 32 bits.
> What about
> if (lro_desc && after(ntohl(lro_desc->tcp_rcv_tsval), ntohl(*topt)))
> 	return -1;

Looks good, I will change that.
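
To illustrate why the plain ">" comparison fails once the 32-bit timestamp
wraps, here is a minimal userspace sketch. ts_after() only mirrors the logic
of the kernel's after() helper; ts_greater(), ts_check() and ts_wrap_demo()
are hypothetical names made up for this example, and the values are assumed
to be in host byte order already (the patch applies ntohl() first).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's after() helper (assumption of this
 * sketch, not the kernel code itself). Returns nonzero when a is later
 * than b in modulo-2^32 arithmetic. Relies on the usual two's-complement
 * conversion from uint32_t to int32_t.
 */
static int ts_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Plain comparison as used in the RFC patch, shown for contrast. */
static int ts_greater(uint32_t a, uint32_t b)
{
	return a > b;
}

/*
 * Hypothetical helper: returns -1 if the stored timestamp is later than
 * the new one (packet must not be merged), 0 otherwise.
 */
static int ts_check(uint32_t stored_tsval, uint32_t new_tsval)
{
	return ts_after(stored_tsval, new_tsval) ? -1 : 0;
}

/* The new timestamp has wrapped past zero but is still the later one. */
static void ts_wrap_demo(void)
{
	uint32_t stored = 0xFFFFFFF0u;
	uint32_t wrapped = 0x00000010u;

	assert(ts_greater(stored, wrapped));   /* plain compare would reject */
	assert(ts_check(stored, wrapped) == 0); /* wrap-safe compare accepts */
}
```

With the plain comparison a connection would drop out of LRO each time the
timestamp wraps; the signed-difference form keeps accepting in-order
timestamps as long as the two values are less than 2^31 apart.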

> 
> > +		/* timestamp reply should not be zero */
> > +		topt++;
> > +		if (*topt == 0)
> > +			return -1;
> > +	}
> > +
> > +	return 0;
> > +}
> 
> > +static struct net_lro_desc *lro_get_desc(struct net_lro_mgr *mgr,
> > +					 struct net_lro_desc *lro_arr,
> > +					 struct iphdr *iph,
> > +					 struct tcphdr *tcph)
> > +{
> > +	struct net_lro_desc *lro_desc = NULL;
> > +	struct net_lro_desc *tmp;
> > +	int max_desc = mgr->max_desc;
> > +	int i;
> > +
> > +	for (i = 0; i < max_desc; i++) {
> > +		tmp = &lro_arr[i];
> > +		if (tmp->active)
> > +			if (!lro_check_tcp_conn(tmp, iph, tcph)) {
> > +				lro_desc = tmp;
> > +				goto out;
> > +			}
> > +	}
> 
> Ugh... What about tree structure or hash here?

Our initial version is based on the following assumptions (keep in mind that
we use only 8 elements):

- Given a quota of 64 packets, it makes no sense to use huge arrays for LRO descriptors
  (in our case 8 elements seem to work fine).
- A small array lets us benefit from caching effects and branch prediction,
  and we can use the built-in cacheline prefetch.
I guess the array mechanism can be improved (e.g. finding free entries), but for
an initial version, meant to show how the rest of the stack behaves with LRO,
it might be OK this way.

Do you think a tree or hash would improve on this with existing
caching designs (for a small number of elements)?
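
One possible improvement to the array approach, sketched here under
assumptions: a single pass that returns the matching active descriptor and
remembers the first free entry along the way, instead of two separate loops.
The struct and function names (lro_desc_sketch, same_flow, lro_lookup_sketch)
and the simplified 4-tuple fields are hypothetical and are not taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LRO_MAX_DESC 8	/* the 8 elements discussed above */

/* Hypothetical, simplified descriptor: only what the lookup needs. */
struct lro_desc_sketch {
	int active;
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Nonzero if the descriptor tracks the given TCP/IPv4 4-tuple. */
static int same_flow(const struct lro_desc_sketch *d,
		     uint32_t saddr, uint32_t daddr,
		     uint16_t sport, uint16_t dport)
{
	return d->saddr == saddr && d->daddr == daddr &&
	       d->sport == sport && d->dport == dport;
}

/*
 * Single pass over the array: return the active descriptor of this flow
 * if there is one, otherwise the first inactive entry, otherwise NULL.
 */
static struct lro_desc_sketch *
lro_lookup_sketch(struct lro_desc_sketch *arr, size_t n,
		  uint32_t saddr, uint32_t daddr,
		  uint16_t sport, uint16_t dport)
{
	struct lro_desc_sketch *free_slot = NULL;
	size_t i;

	assert(n <= LRO_MAX_DESC);

	for (i = 0; i < n; i++) {
		if (arr[i].active) {
			if (same_flow(&arr[i], saddr, daddr, sport, dport))
				return &arr[i];
		} else if (!free_slot) {
			free_slot = &arr[i];
		}
	}

	return free_slot;
}
```

With 8 entries the whole array spans only a few cachelines, so a linear scan
like this is likely to be competitive with a hash for such a small table; that
is an expectation, not a measurement.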

> 
> > +	for (i = 0; i < max_desc; i++) {
> > +		if(!lro_arr[i].active) {
> > +			lro_desc = &lro_arr[i];
> > +			goto out;
> > +		}
> > +	}
> > +
> > +out:
> > +	return lro_desc;
> > +}
> 
> > +int __lro_proc_skb(struct net_lro_mgr *lro_mgr, struct sk_buff *skb,
> > +		   struct vlan_group *vgrp, u16 vlan_tag, void *priv)
> > +{
> > +	struct net_lro_desc *lro_desc;
> > +        struct iphdr *iph;
> > +        struct tcphdr *tcph;
> > +
> > +	if (!lro_mgr->get_ip_tcp_hdr
> > +	    || lro_mgr->get_ip_tcp_hdr(skb, &iph, &tcph, priv))
> > +		goto out;
> > +
> > +	lro_desc = lro_get_desc(lro_mgr, lro_mgr->lro_arr, iph, tcph);
> > +	if (!lro_desc)
> > +		goto out;
> 
> There is no protection of the descriptor array from accessing from
> different CPUs. Is it forbidden to share net_lro_mgr structure?
> 

Currently we assume that netpoll runs for a given device on only one CPU
at a time, and that if there are multiple receive queues that can be processed
in parallel, the traffic is usually sorted per receive queue. It would make
sense to use a separate LRO "manager" for each queue (which would also speed
up the lookup).
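
A rough sketch of the per-queue idea, assuming each receive queue is only ever
polled on one CPU at a time so that its manager needs no locking. All names
here (port_sketch, lro_mgr_sketch, NUM_RX_QUEUES, ...) are hypothetical and
only illustrate the layout; they are not the structures from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_RX_QUEUES 4	/* hypothetical number of receive queues */
#define LRO_MAX_DESC  8

/* Hypothetical, stripped-down descriptor and manager. */
struct lro_desc_sketch {
	int active;
};

struct lro_mgr_sketch {
	struct lro_desc_sketch lro_arr[LRO_MAX_DESC];
	int max_desc;
};

/* One manager per receive queue, so no descriptor array is shared. */
struct port_sketch {
	struct lro_mgr_sketch lro_mgr[NUM_RX_QUEUES];
};

static void port_init_sketch(struct port_sketch *port)
{
	size_t q;

	memset(port, 0, sizeof(*port));
	for (q = 0; q < NUM_RX_QUEUES; q++)
		port->lro_mgr[q].max_desc = LRO_MAX_DESC;
}

/* The poll routine of queue q would only ever touch this manager. */
static struct lro_mgr_sketch *lro_mgr_for_queue(struct port_sketch *port,
						size_t queue)
{
	assert(queue < NUM_RX_QUEUES);
	return &port->lro_mgr[queue];
}
```

Each queue then searches only its own 8 descriptors, which is where the
lookup speed-up would come from.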

Regards,
Jan-Bernd

