kernelnewbies.kernelnewbies.org archive mirror
* net_device: limit rate of tx packets
@ 2013-04-13 12:11 christian+kn at wwad.de
  2013-04-14  6:15 ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 1 reply; 6+ messages in thread
From: christian+kn at wwad.de @ 2013-04-13 12:11 UTC (permalink / raw)
  To: kernelnewbies

Hi All,

Can someone please explain to me how the kernel handles the different
transfer rates of different net_devices? Or in other words: how does
the system call send() know when to block?

An example:
  cat /dev/zero | pv | nc -u <someip>
will show different throughput speeds depending on the network device
the packets are sent over (wlan0 will be slower than eth0).

 - Can someone point out the location in the Linux kernel source where
   this is handled?

 - If I register a net_device, how do I signal to the upper
   network layers that my driver can only accept packets at a
   certain rate? I tried stopping the egress queue by calling
   netif_stop_queue() (see the sketch below), but this only has the
   effect that the queue overruns.
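
For reference, my xmit routine does roughly the following (simplified
sketch; hw_queue_full() and send_to_hw() are stand-ins for my
driver's own code):

  static netdev_tx_t my_start_xmit(struct sk_buff *skb,
                                   struct net_device *dev)
  {
          if (hw_queue_full(dev)) {
                  netif_stop_queue(dev);  /* ask upper layers to stop */
                  return NETDEV_TX_BUSY;  /* qdisc requeues the skb */
          }
          send_to_hw(dev, skb);           /* hand the packet to hardware */
          if (hw_queue_full(dev))
                  netif_stop_queue(dev);  /* stop before the next overrun */
          return NETDEV_TX_OK;
  }

...and I call netif_wake_queue(dev) from the tx-completion interrupt
once there is room again.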


I have the feeling that I'm missing a vital point on how the
kernel's networking subsystem works. Unfortunately, Understanding Linux
Network Internals couldn't help me out here.


Thanks,
Christian


* net_device: limit rate of tx packets
  2013-04-13 12:11 net_device: limit rate of tx packets christian+kn at wwad.de
@ 2013-04-14  6:15 ` michi1 at michaelblizek.twilightparadox.com
  2013-04-14  7:45   ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 6+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2013-04-14  6:15 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 14:11 Sat 13 Apr     , christian+kn at wwad.de wrote:
> Hi All,
> 
> Can someone please explain to me how the kernel handles the different
> transfer rates of different net_devices? Or in other words: how does
> the system call send() know when to block?
> 
> An example:
>   cat /dev/zero | pv | nc -u <someip>
> will show different throughput speeds depending on the network device
> the packets are sent over (wlan0 will be slower than eth0).
> 
>  - Can someone point out the location in the Linux kernel source where
>    this is handled?
> 
>  - If I register a net_device, how do I signal to the upper
>    network layers that my driver can only accept packets at a
>    certain rate? I tried stopping the egress queue by calling
>    netif_stop_queue(), but this only has the effect that the queue
>    overruns.
> 
> 
> I have the feeling that I'm missing a vital point on how the
> kernel's networking subsystem works. Unfortunately, Understanding Linux
> Network Internals couldn't help me out here.

I think this is what causes the behaviour:

1) If the network device is congested, packets will queue up in the qdisc

2) Socket memory gets used up; sendmsg will probably sleep here:
udp_sendmsg
ip_make_skb
__ip_append_data
sock_alloc_send_skb
sock_alloc_send_pskb
sock_wait_for_wmem

3) When packets are sent, the socket destructor will be called and wake up
the sender:
sock_wfree()
sk->sk_write_space(sk); (which is sock_def_write_space)
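
In sketch form (simplified pseudocode; the real code is
sock_wait_for_wmem() and sock_wfree() in net/core/sock.c):

  /* sender side, sketch of sock_wait_for_wmem(): */
  while (atomic_read(&sk->sk_wmem_alloc) >= sk->sk_sndbuf) {
          if (!timeo || signal_pending(current))
                  break;                    /* non-blocking or signal */
          timeo = schedule_timeout(timeo);  /* sleeps on sk_sleep(sk) */
  }

  /* completion side, sketch of sock_wfree(), called when the driver
   * frees a transmitted skb: */
  atomic_sub(skb->truesize, &sk->sk_wmem_alloc);
  sk->sk_write_space(sk);  /* sock_def_write_space(): wakes the sender */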


It would be interesting to see what will happen if the qdisc is smaller than
the socket memory...

	-Michi
-- 
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com


* net_device: limit rate of tx packets
  2013-04-14  6:15 ` michi1 at michaelblizek.twilightparadox.com
@ 2013-04-14  7:45   ` Valdis.Kletnieks at vt.edu
  2013-04-14  8:09     ` michi1 at michaelblizek.twilightparadox.com
  0 siblings, 1 reply; 6+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-04-14  7:45 UTC (permalink / raw)
  To: kernelnewbies

On Sun, 14 Apr 2013 08:15:21 +0200, michi1 at michaelblizek.twilightparadox.com said:

> It would be interesting to see what will happen if the qdisc is smaller than
> the socket memory...

As long as the qdisc is able to send at least one MSS at a time, it will eventually
clear out the backlog (assuming more isn't added in the meantime).  This of
course requires you to be using a well-behaved qdisc.

But if you're using a broken one, any resulting issues are arguably self-inflicted.


* net_device: limit rate of tx packets
  2013-04-14  7:45   ` Valdis.Kletnieks at vt.edu
@ 2013-04-14  8:09     ` michi1 at michaelblizek.twilightparadox.com
  2013-04-14 14:35       ` Valdis.Kletnieks at vt.edu
  0 siblings, 1 reply; 6+ messages in thread
From: michi1 at michaelblizek.twilightparadox.com @ 2013-04-14  8:09 UTC (permalink / raw)
  To: kernelnewbies

Hi!

On 03:45 Sun 14 Apr     , Valdis.Kletnieks at vt.edu wrote:
> On Sun, 14 Apr 2013 08:15:21 +0200, michi1 at michaelblizek.twilightparadox.com said:
> 
> > It would be interesting to see what will happen if the qdisc is smaller than
> > the socket memory...
> 
> As long as the qdisc is able to send at least one MSS at a time, it will eventually
> clear out the backlog (assuming more isn't added in the meantime).  This of
> course requires you to be using a well-behaved qdisc.
> 
> But if you're using a broken one, any resulting issues are arguably self-inflicted.

This is not what I meant. When the qdisc has a size of, say, 256KB and the
socket memory is, say, 128KB, the socket memory limit will be reached before
the qdisc limit and the socket will sleep. But when the socket memory limit
is greater than the qdisc limit, it will be interesting to see whether the
socket still sleeps or starts dropping packets.

man send:
       ENOBUFS
              The  output queue for a network interface was full.  This generally indicates that the interface has stopped sending, but may
              be caused by transient congestion.  (Normally, this does not occur in Linux.  Packets are just silently dropped when a device
              queue overflows.)
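
The drop the man page is talking about happens in the qdisc's enqueue
function; roughly like this (bfifo_enqueue() in net/sched/sch_fifo.c,
paraphrased, not verbatim):

  static int bfifo_enqueue(struct sk_buff *skb, struct Qdisc *sch)
  {
          /* under the byte limit: queue the packet */
          if (likely(sch->qstats.backlog + qdisc_pkt_len(skb) <= sch->limit))
                  return qdisc_enqueue_tail(skb, sch);

          /* over the limit: the skb is freed, not queued */
          return qdisc_reshape_fail(skb, sch);
  }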


Long live the principle of least surprise...

	-Michi
-- 
programming a layer 3+4 network protocol for mesh networks
see http://michaelblizek.twilightparadox.com


* net_device: limit rate of tx packets
  2013-04-14  8:09     ` michi1 at michaelblizek.twilightparadox.com
@ 2013-04-14 14:35       ` Valdis.Kletnieks at vt.edu
  2013-04-17 14:30         ` christian+kn at wwad.de
  0 siblings, 1 reply; 6+ messages in thread
From: Valdis.Kletnieks at vt.edu @ 2013-04-14 14:35 UTC (permalink / raw)
  To: kernelnewbies

On Sun, 14 Apr 2013 10:09:54 +0200, michi1 at michaelblizek.twilightparadox.com said:

> This is not what I meant. When the qdisc has a size of, say, 256KB and the
> socket memory is, say, 128KB, the socket memory limit will be reached before
> the qdisc limit and the socket will sleep. But when the socket memory limit
> is greater than the qdisc limit, it will be interesting to see whether the
> socket still sleeps or starts dropping packets.

How to figure this out for yourself:

Look at net/sched/sch_plug.c, which is a pretty simple qdisc (transmit packets
until a plug request is received, then queue until unplugged). In particular,
look at plug_enqueue() to see what happens when q->limit is exceeded, and
plug_init() to see where q->limit is set.

Then look at the definition of qdisc_reshape_fail() in
include/net/sch_generic.h to figure out what the qdisc returns if q->limit is
exceeded.

Then go look at net/core/dev.c, in function __dev_xmit_skb(), and
watch the variable 'rc'.

Now go find the caller of __dev_xmit_skb() and see what it does with
that return value.
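
If you want to check your answer afterwards, the punch line looks
roughly like this (paraphrased from the sources, not verbatim):

  /* qdisc_reshape_fail(), include/net/sch_generic.h, in essence: */
  sch->qstats.drops++;
  kfree_skb(skb);        /* the packet is gone... */
  return NET_XMIT_DROP;  /* ...and 'rc' in __dev_xmit_skb() says so */

dev_queue_xmit() hands that value back to the protocol layer, which
for UDP amounts to the silent local drop that send(2) warns about.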

Hope that helps...



* net_device: limit rate of tx packets
  2013-04-14 14:35       ` Valdis.Kletnieks at vt.edu
@ 2013-04-17 14:30         ` christian+kn at wwad.de
  0 siblings, 0 replies; 6+ messages in thread
From: christian+kn at wwad.de @ 2013-04-17 14:30 UTC (permalink / raw)
  To: kernelnewbies

> 
> How to figure this out for yourself:
> 
> Look at net/sched/sch_plug.c, which is a pretty simple qdisc
> (transmit packets until a plug request is received, then queue until
> unplugged). In particular, look at plug_enqueue() to see what happens
> when q->limit is exceeded, and plug_init() to see where q->limit is
> set.
> 
> Then look at the definition of qdisc_reshape_fail() in
> include/net/sch_generic.h to figure out what the qdisc returns if
> q->limit is exceeded.
> 
> Then go look at net/core/dev.c, in function __dev_xmit_skb(), and
> watch the variable 'rc'.
> 
> Now go find the caller of __dev_xmit_skb() and see what it does with
> that return value.
> 
> Hope that helps...
> 

Thank you very much, Michi and Valdis!
With your help, I could figure it out.

I tried it out, and it is as Michi suspected:
  If the tx_queue_len is small, send() will not sleep; the qdisc
  drops packets before the socket send buffer fills up.
  If the tx_queue_len is large, send() will sleep at some point,
  once the send buffer limit is reached first.

