* [RFC] Yield in netlink_broadcast when congested
@ 2004-10-16 11:30 Herbert Xu
2004-10-16 23:51 ` Thomas Graf
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Herbert Xu @ 2004-10-16 11:30 UTC (permalink / raw)
To: Pablo Neira, hadi, David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 1816 bytes --]
Hi:
Before we start please bear in mind that netlink is fundamentally
an *unreliable* protocol. This is the price we pay in order to use
it in all the contexts that we do. So what we're looking for here
is not how to make netlink 100% reliable, but what we can do to
improve the quality of its implementation.
I have a proposal for the specific case of overruns with netlink
broadcast messages generated in a synchronous context typified
by the ifconfig example that Jamal gave.
In such contexts it is possible for the sender to sleep. However, we
don't want to delay them indefinitely since the system must progress
even in the presence of idle multicast listeners. I also have strong
reservations about introducing any additional queues since all the
ones I've seen don't deliver anything over and above what you can
achieve by increasing the receive queue of the listener itself.
Now I noticed that on SMP machines Jamal's case works successfully.
That is, ip monitor is able to keep up with the flood generated by
ifconfig.
In fact, what's happening on UP is that in the time slice given to
the sender (ifconfig), the kernel is able to generate far more
messages than the average netlink receive queue can accommodate.
So here is my proposal: if we detect signs of impending congestion
in netlink_broadcast(), and that we're in a sleepable context, then
we yield().
This gives the receivers a chance to pull down the messages without
having the sender spinning indefinitely. I've tested it on my UP
machine and it does resolve the problem for ip monitor.
Comments anyone?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p --]
[-- Type: text/plain, Size: 1142 bytes --]
===== net/netlink/af_netlink.c 1.56 vs edited =====
--- 1.56/net/netlink/af_netlink.c	2004-09-29 07:47:22 +10:00
+++ edited/net/netlink/af_netlink.c	2004-10-16 21:23:29 +10:00
@@ -593,7 +593,7 @@
 		skb_set_owner_r(skb, sk);
 		skb_queue_tail(&sk->sk_receive_queue, skb);
 		sk->sk_data_ready(sk, skb->len);
-		return 0;
+		return atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf;
 	}
 	return -1;
 }
@@ -606,6 +606,8 @@
 	struct sk_buff *skb2 = NULL;
 	int protocol = ssk->sk_protocol;
 	int failure = 0, delivered = 0;
+	int congested = 0;
+	int val;
 
 	netlink_trim(skb, allocation);
 
@@ -640,9 +642,10 @@
 			netlink_overrun(sk);
 			/* Clone failed. Notify ALL listeners. */
 			failure = 1;
-		} else if (netlink_broadcast_deliver(sk, skb2)) {
+		} else if ((val = netlink_broadcast_deliver(sk, skb2)) < 0) {
 			netlink_overrun(sk);
 		} else {
+			congested |= val;
 			delivered = 1;
 			skb2 = NULL;
 		}
@@ -655,8 +658,11 @@
 	kfree_skb(skb2);
 	kfree_skb(skb);
 
-	if (delivered)
+	if (delivered) {
+		if (congested && (allocation & __GFP_WAIT))
+			yield();
 		return 0;
+	}
 	if (failure)
 		return -ENOBUFS;
 	return -ESRCH;
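Condensed, the patched flow reads roughly as follows. This is a sketch
pieced together from the hunks above; the guard in the deliver function
and the loop context are paraphrased, only the +/- lines are verbatim:

/* Delivery now reports back-pressure instead of a bare 0:
 *   -1  queue full, message dropped for this listener
 *    0  delivered with room to spare
 *   >0  delivered, but the queue is now past sk_rcvbuf
 */
static inline int netlink_broadcast_deliver(struct sock *sk,
					    struct sk_buff *skb)
{
	if (atomic_read(&sk->sk_rmem_alloc) <= sk->sk_rcvbuf) {
		skb_set_owner_r(skb, sk);
		skb_queue_tail(&sk->sk_receive_queue, skb);
		sk->sk_data_ready(sk, skb->len);
		return atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf;
	}
	return -1;
}

/* In netlink_broadcast(), inside the loop over all listeners: */
	if (skb2 == NULL) {
		netlink_overrun(sk);
		failure = 1;		/* clone failed: notify everyone */
	} else if ((val = netlink_broadcast_deliver(sk, skb2)) < 0) {
		netlink_overrun(sk);	/* this listener's queue was full */
	} else {
		congested |= val;	/* remember any sign of pressure */
		delivered = 1;
		skb2 = NULL;
	}

/* ...and after the loop, before returning success: */
	if (delivered) {
		/* Congested and allowed to sleep: step aside so the
		 * receivers get a chance to drain their queues. */
		if (congested && (allocation & __GFP_WAIT))
			yield();
		return 0;
	}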
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-16 11:30 [RFC] Yield in netlink_broadcast when congested Herbert Xu
@ 2004-10-16 23:51 ` Thomas Graf
2004-10-17 7:39 ` Herbert Xu
2004-10-20 5:28 ` David S. Miller
2004-10-24 15:22 ` jamal
2 siblings, 1 reply; 9+ messages in thread
From: Thomas Graf @ 2004-10-16 23:51 UTC (permalink / raw)
To: Herbert Xu; +Cc: Pablo Neira, hadi, David S. Miller, netdev
* Herbert Xu <20041016113006.GA12843@gondor.apana.org.au> 2004-10-16 21:30
> So here is my proposal: if we detect signs of impending congestion
> in netlink_broadcast(), and that we're in a sleepable context, then
> we yield().
>
> This gives the receivers a chance to pull down the messages without
> having the sender spinning indefinitely. I've tested it on my UP
> machine and it does resolve the problem for ip monitor.
Looks good.
Up to how many receivers does that work? We would still see the
effect if too many receivers are registered, right?
I haven't had the time to read through the previous conversation, so
I don't know if this has come up so far, but we could request an ACK
from each receiver and congest based on the following strategy:
  SS := last serial sent
  SR := last serial received
  LL := lower congestion limit, to ignore short bursts
  UL := upper congestion limit, to avoid infinite congestion

Congestion map for (SS - SR):
    0..LL:  NOP
   LL..UL:  do congestion handling
   UL..MAX: NOP
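A minimal sketch of how that map could read in code (hypothetical: the
kernel keeps no such per-listener serial counters, and all names here
are made up):

enum cong_action { CONG_NOP, CONG_THROTTLE };

/* ss = last serial sent, sr = last serial acked by the receiver;
 * unsigned subtraction copes with serial-number wraparound. */
static enum cong_action cong_check(unsigned int ss, unsigned int sr,
				   unsigned int ll, unsigned int ul)
{
	unsigned int outstanding = ss - sr;

	if (outstanding <= ll)
		return CONG_NOP;	/* ignore short bursts */
	if (outstanding <= ul)
		return CONG_THROTTLE;	/* apply congestion handling */
	return CONG_NOP;		/* past ul: give up rather than
					 * congest forever on a silent
					 * listener */
}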
The biggest problem is that I don't know of any application
handling ACK requests properly yet.
Just a wild idea anyway; I'm not even sure it's worth the effort.
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-16 23:51 ` Thomas Graf
@ 2004-10-17 7:39 ` Herbert Xu
2004-10-17 11:08 ` Thomas Graf
0 siblings, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2004-10-17 7:39 UTC (permalink / raw)
To: Thomas Graf; +Cc: Pablo Neira, hadi, David S. Miller, netdev
On Sun, Oct 17, 2004 at 01:51:37AM +0200, Thomas Graf wrote:
> > So here is my proposal: if we detect signs of impending congestion
> > in netlink_broadcast(), and that we're in a sleepable context, then
> > we yield().
>
> Up to how many receivers does that work? We would still see the
> effect if too many receivers are registered, right?
Assuming the scheduler is fair, every listener should get its
time slice to receive the messages.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-17 7:39 ` Herbert Xu
@ 2004-10-17 11:08 ` Thomas Graf
2004-10-17 12:42 ` Herbert Xu
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Graf @ 2004-10-17 11:08 UTC (permalink / raw)
To: Herbert Xu; +Cc: Pablo Neira, hadi, David S. Miller, netdev
* Herbert Xu <20041017073957.GA21632@gondor.apana.org.au> 2004-10-17 17:39
> On Sun, Oct 17, 2004 at 01:51:37AM +0200, Thomas Graf wrote:
> > > So here is my proposal: if we detect signs of impending congestion
> > > in netlink_broadcast(), and that we're in a sleepable context, then
> > > we yield().
> >
> > Up to how many receivers does that work? We would still see the
> > effect if too many receivers are registered, right?
>
> Assuming the scheduler is fair, every listener should get its
> time slice to receive the messages.
Assuming there are only a few listeners per process. I tried it out and
my UP system could handle 7 listeners in the same process but would
sporadically overrun above that. I couldn't reproduce the overruns with
1 listener per process.
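(A setup along these lines reproduces that test; a hypothetical sketch,
not Thomas's actual program: several rtnetlink sockets in one process,
all subscribed to link events, with ENOBUFS on recv() signalling an
overrun.)

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#define NLISTENERS 7

int main(void)
{
	struct sockaddr_nl addr;
	int fds[NLISTENERS];
	char buf[8192];

	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	addr.nl_groups = RTMGRP_LINK;	/* link add/remove/rename events */

	for (int i = 0; i < NLISTENERS; i++) {
		fds[i] = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
		if (fds[i] < 0 ||
		    bind(fds[i], (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			perror("listener setup");
			return 1;
		}
	}

	/* Drain every socket; a real test would poll() rather than spin. */
	for (;;)
		for (int i = 0; i < NLISTENERS; i++)
			if (recv(fds[i], buf, sizeof(buf), MSG_DONTWAIT) < 0 &&
			    errno == ENOBUFS)
				fprintf(stderr, "listener %d: overrun\n", i);
}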
Therefore I guess this is fine for now; the problem might appear again
if someone finally writes a netlink daemon to solve the locking
problems.
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-17 11:08 ` Thomas Graf
@ 2004-10-17 12:42 ` Herbert Xu
2004-10-17 13:08 ` Thomas Graf
0 siblings, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2004-10-17 12:42 UTC (permalink / raw)
To: Thomas Graf; +Cc: herbert, pablo, hadi, davem, netdev
Thomas Graf <tgraf@suug.ch> wrote:
>
> Assuming there are only a few listeners per process. I tried it out and
> my UP system could handle 7 listeners in the same process but would
> sporadically overrun above that. I couldn't reproduce the overruns with
> 1 listener per process.
Please recall what I said in my original message:
: Before we start please bear in mind that netlink is fundamentally
: an *unreliable* protocol. This is the price we pay in order to use
: it in all the contexts that we do. So what we're looking for here
: is not how to make netlink 100% reliable, but what we can do to
: improve the quality of its implementation.
Having 7 listeners in the same process isn't really the killer application
I was looking for :)
> Therefore I guess this is fine for now; the problem might appear again
> if someone finally writes a netlink daemon to solve the locking
> problems.
I don't think I understand what you are referring to. Could you
please elaborate?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-17 12:42 ` Herbert Xu
@ 2004-10-17 13:08 ` Thomas Graf
2004-10-17 21:31 ` Herbert Xu
0 siblings, 1 reply; 9+ messages in thread
From: Thomas Graf @ 2004-10-17 13:08 UTC (permalink / raw)
To: Herbert Xu; +Cc: pablo, hadi, davem, netdev
> Having 7 listeners in the same process isn't really the killer application
> I was looking for :)
Agreed, my statement wasn't meant as an offense; I'm totally fine with
your patch and I think it suits us well.
> > Therefore I guess this is fine for now; the problem might appear again
> > if someone finally writes a netlink daemon to solve the locking
> > problems.
>
> I don't think I understand what you are referring to. Could you
> please elaborate?
Although NLM_F_ATOMIC exists, it's not a good way to lock certain
sets of netlink requests which must be atomic. Currently there is no
locking implemented in any application as far as I know; e.g. iproute2
looks up an ifindex and uses it without locking, so the link could
be renamed or removed in between. If we ever implement the IETF
netconf protocol we would also need to provide locking functionality.
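The iproute2 example, sketched (hypothetical code, not the actual
iproute2 source):

#include <net/if.h>
#include <stdio.h>

/* The name->ifindex lookup and the use of the index are separate
 * steps, and nothing pins the device in between. */
static void operate_on_link(const char *name)
{
	unsigned int idx = if_nametoindex(name);

	if (idx == 0)
		return;		/* no such device */
	/* Window: the link can be renamed or removed right here, so
	 * the request below may hit the wrong device or get ENODEV. */
	printf("would send an RTM_* request for ifindex %u\n", idx);
}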
For that reason, the idea of a netlink daemon has been mentioned a few
times; it would act as a gate and implement the required locking
in userspace. This daemon could easily lead to multiple listeners
in a single process. Nevertheless, I guess we can ignore it for now as
it's hypothetical.
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-17 13:08 ` Thomas Graf
@ 2004-10-17 21:31 ` Herbert Xu
0 siblings, 0 replies; 9+ messages in thread
From: Herbert Xu @ 2004-10-17 21:31 UTC (permalink / raw)
To: Thomas Graf; +Cc: herbert, pablo, hadi, davem, netdev
Thomas Graf <tgraf@suug.ch> wrote:
>
> locking implemented in any application as far as I know; e.g. iproute2
> looks up an ifindex and uses it without locking, so the link could
> be renamed or removed in between. If we ever implement the IETF
IMHO that particular case should be dealt with by providing a way for
the application to bind to a device and thus hold a reference on it.
This reference should then ensure that the interface name/ifindex doesn't
go away until the application relinquishes it or dies (i.e., the socket
where the binding resides is closed).
> in userspace. This daemon could easily lead to multiple listeners
> in a single process. Nevertheless, I guess we can ignore it for now as
Multiple listeners in a single process should be fine as long as they're
listening to different groups. It's only when you listen to the same
thing multiple times that you're going to overrun your time-slice and
then your receive queue.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-16 11:30 [RFC] Yield in netlink_broadcast when congested Herbert Xu
2004-10-16 23:51 ` Thomas Graf
@ 2004-10-20 5:28 ` David S. Miller
2004-10-24 15:22 ` jamal
2 siblings, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-10-20 5:28 UTC (permalink / raw)
To: Herbert Xu; +Cc: pablo, hadi, netdev
On Sat, 16 Oct 2004 21:30:06 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:
> So here is my proposal: if we detect signs of impending congestion
> in netlink_broadcast(), and that we're in a sleepable context, then
> we yield().
>
> This gives the receivers a chance to pull down the messages without
> having the sender spinning indefinitely. I've tested it on my UP
> machine and it does resolve the problem for ip monitor.
>
> Comments anyone?
This looks great, patch applied.
Thanks Herbert.
* Re: [RFC] Yield in netlink_broadcast when congested
2004-10-16 11:30 [RFC] Yield in netlink_broadcast when congested Herbert Xu
2004-10-16 23:51 ` Thomas Graf
2004-10-20 5:28 ` David S. Miller
@ 2004-10-24 15:22 ` jamal
2 siblings, 0 replies; 9+ messages in thread
From: jamal @ 2004-10-24 15:22 UTC (permalink / raw)
To: Herbert Xu; +Cc: Pablo Neira, David S. Miller, netdev
Some feedback:
I just tested this and it does appear to improve things. I have to
think a little about how else to break it, but overall this seems to
be a good change to make.
cheers,
jamal
On Sat, 2004-10-16 at 07:30, Herbert Xu wrote:
> Hi:
>
> Before we start please bear in mind that netlink is fundamentally
> an *unreliable* protocol. This is the price we pay in order to use
> it in all the contexts that we do. So what we're looking for here
> is not how to make netlink 100% reliable, but what we can do to
> improve the quality of its implementation.
>
> I have a proposal for the specific case of overruns with netlink
> broadcast messages generated in a synchronous context typified
> by the ifconfig example that Jamal gave.
>
> In such contexts it is possible for the sender to sleep. However, we
> don't want to delay them indefinitely since the system must progress
> even in the presence of idle multicast listeners. I also have strong
> reservations about introducing any additional queues since all the
> ones I've seen don't deliver anything over and above what you can
> achieve by increasing the receive queue of the listener itself.
>
> Now I noticed that on SMP machines Jamal's case works successfully.
> That is, ip monitor is able to keep up with the flood generated by
> ifconfig.
>
> In fact, what's happening on UP is that in the time slice given to
> the sender (ifconfig), the kernel is able to generate far more
> messages than the average netlink receive queue can accommodate.
>
> So here is my proposal: if we detect signs of impending congestion
> in netlink_broadcast(), and that we're in a sleepable context, then
> we yield().
>
> This gives the receivers a chance to pull down the messages without
> having the sender spinning indefinitely. I've tested it on my UP
> machine and it does resolve the problem for ip monitor.
>
> Comments anyone?
>
> Cheers,