public inbox for netdev@vger.kernel.org
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Stefani Seibold <stefani@seibold.net>
Cc: linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
	davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH] UDPCP Communication Protocol
Date: Fri, 31 Dec 2010 12:54:18 +0100
Message-ID: <1293796458.2973.59.camel@edumazet-laptop>
In-Reply-To: <1293794589.5285.16.camel@wall-e>

On Friday, 31 December 2010 at 12:23 +0100, Stefani Seibold wrote:
> On Friday, 31 December 2010 at 11:41 +0100, Eric Dumazet wrote:
> > On Friday, 31 December 2010 at 11:22 +0100, Stefani Seibold wrote:
> > > On Friday, 31 December 2010 at 11:00 +0100, Eric Dumazet wrote:
> > > > On Friday, 31 December 2010 at 10:29 +0100, stefani@seibold.net
> > > > wrote:
> > > > > From: Stefani Seibold <stefani@seibold.net>
> > > > > 
> > > > >  
> > > > >  /*
> > > > >   *	Handle MSG_ERRQUEUE
> > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > > > index 2d3ded4..f9890a2 100644
> > > > > --- a/net/ipv4/udp.c
> > > > > +++ b/net/ipv4/udp.c
> > > > > @@ -1310,7 +1310,7 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> > > > >  	if (inet_sk(sk)->inet_daddr)
> > > > >  		sock_rps_save_rxhash(sk, skb->rxhash);
> > > > >  
> > > > > -	rc = ip_queue_rcv_skb(sk, skb);
> > > > > +	rc = sock_queue_rcv_skb(sk, skb);
> > > > 
> > > > Ouch... Care to explain why you changed this part ???
> > > > 
> > > > You just destroyed the intent of commit f84af32cbca70a without a word
> > > > in your changelog. Making UDP slower while others are trying to speed
> > > > it up must be explained and advertised.
> > > >  
> > > > In general, we prefer a preliminary patch introducing all the changes
> > > > to the current stack, followed by another one adding the new protocol.
> > > > 
> > > 
> > > I reverted this for two reasons:
> > > 
> > > First, ip_queue_rcv_skb drops the dst entry, which breaks user-land
> > > applications that expect packet info after a
> > > 
> > > setsockopt(handle, IPPROTO_IP, IP_PKTINFO, &const_int_1, sizeof(int));
> > > 
> > > But for packets already in the queue, this information will be lost,
> > > so there is a potential race condition.
> > > 
> > 
> > Exactly the same race exists with packet filters.
> > 
> > If your life depends on that, you must flush the incoming queue _after_
> > issuing setsockopt(handle, IPPROTO_IP, IP_PKTINFO, &const_int_1,
> > sizeof(int)), so that all following packets carry the needed information.
> > 
> > 
> 
> I always thought that the Linux kernel never breaks user land. This is a
> breakage!
> 

It breaks only if user land is buggy. Where is your user-land code, so
that I can show you the bug?
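What follows is a user-space sketch of the approach Eric describes. The function names (`enable_pktinfo_and_flush`, `pktinfo_from_msg`, `recv_with_pktinfo`) are hypothetical, and the socket calls are the standard Linux ones. The sketch enables IP_PKTINFO, flushes whatever was queued before the option took effect, and then reads `struct in_pktinfo` from each datagram's ancillary data. A missing value is treated as a normal case rather than assumed away. It assumes an IPv4 UDP socket, and it assumes that discarding the flushed datagrams is acceptable, for example because the protocol retransmits.

```c
#define _GNU_SOURCE /* may be needed for struct in_pktinfo on older glibc */
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Enable IP_PKTINFO, then discard every datagram already queued: those may
 * have been queued before the option took effect and so lack the ancillary
 * data. Returns the number of datagrams discarded, or -1 on error. */
int enable_pktinfo_and_flush(int fd)
{
    const int one = 1;
    if (setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &one, sizeof(one)) < 0)
        return -1;

    int dropped = 0;
    char scratch[1]; /* longer datagrams are truncated; they are discarded anyway */
    for (;;) {
        ssize_t n = recv(fd, scratch, sizeof(scratch), MSG_DONTWAIT);
        if (n >= 0) {
            dropped++;
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return dropped; /* receive queue is now empty */
        } else if (errno != EINTR) {
            return -1;
        }
    }
}

/* Look for IP_PKTINFO in the ancillary data of a received message.
 * Returns 0 and fills *out if present, -1 if the datagram carries none. */
int pktinfo_from_msg(struct msghdr *msg, struct in_pktinfo *out)
{
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 0;
        }
    }
    return -1;
}

/* Receive one datagram; *have_info is set to 1 only if IP_PKTINFO came with
 * it, so the caller can handle a missing value instead of assuming it. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
                          struct in_pktinfo *info, int *have_info)
{
    union { /* the union keeps the control buffer aligned for struct cmsghdr */
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    ssize_t n = recvmsg(fd, &msg, 0);
    *have_info = 0;
    if (n >= 0)
        *have_info = (pktinfo_from_msg(&msg, info) == 0);
    return n;
}
```

Call `enable_pktinfo_and_flush()` once, before the protocol relies on the information. After that, use `recv_with_pktinfo()` for every datagram. On a socket that receives continuous traffic, the flush loop keeps running as long as new datagrams arrive, so a real implementation may want to bound it.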

This dst refcount avoidance is absolutely crucial and we worked hard on
it.

> > 
> > > Second, it breaks my UDPCP communication protocol stack module, which
> > > worked very well up to 2.6.35. I need this information in the
> > > data_ready() function to generate an ACK.
> > > 
> > > 
> > 
> > Do you see now why you should not proceed like that?
> > 
> > You know _perfectly well_ there is a problem, but you prefer to keep it
> > to yourself and hope this bit will go unnoticed?
> > 
> 
> Stop accusing me. A feature was gone, and it took me six hours to figure
> out what was going wrong. I did not see, and still do not see, a real
> problem with this patch. It looked to me like an easy and clean
> solution. It was never my intention to trick anybody, especially you.
> 

Silently doing a revert is not an option. How else must I tell you this?


> > This is really not how things are dealt with in Linux.
> > 
> > You'll have to find a way so that things work well for everybody, not
> > only for you.
> > 
> > I guess you must fix the UDPCP protocol stack, not 'fix Linux'.
> > 
> 
> I cannot fix it, because the information is still lost, and I need it.
> 

You can fix it. Really. If not, you can pay me and I'll fix it for you.

> In my opinion, it was a very bad idea to throw away important
> information. I checked, and Linux had handled this the same way since 2.6.0.
> 
> It would be better to work on a solution than to accuse.
> 

Where do you see an accusation? Because you tried to silently "fix" the
thing without telling us how the damn thing was broken? Come on!

> Question: how much performance gain does the early drop give? Are there
> benchmark results?

That's pretty simple. The dst refcount was the only contention point in
the UDP stack. Yes, it's not a joke.

Reintroducing an atomic_inc() for each incoming packet, and an
atomic_dec() each time the user process dequeues the packet, can have a
huge impact.

One order of magnitude, actually. Depending on the number of CPUs
fighting over this cache line, the slowdown ranges from 20% to 4000%.

Some people handle thousands of UDP sockets on one machine. Your UDPCP
apparently handles very few sockets (you have one central linked list),
so your use case probably doesn't care about performance.
