poll broken (for can) (was: Re: Multiple programs trying to access the socket)


* poll broken (for can) (was: Re: Multiple programs trying to access the socket)
       [not found]       ` <4D90AF67.1080405-5Yr1BZd7O62+XT7JhA+gdA@public.gmane.org>
@ 2011-03-28 16:13         ` Marc Kleine-Budde
       [not found]           ` <4D90B3B0.2010401-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Kleine-Budde @ 2011-03-28 16:13 UTC
  To: Wolfgang Grandegger
  Cc: Netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-users-0fE9KPoRgkgATYTw5x5z8w



On 03/28/2011 05:55 PM, Wolfgang Grandegger wrote:
>> BTW: I figured out why poll() wakes you up but the next write will fail
>> with -ENOBUFS again.
> 
> Ah, I'm curious? I also did realize that poll does burn CPU cycles
> (instead of waiting).

The poll callback checks whether the used memory is less than half of the
per-socket snd buffer (IIRC ~60K). See:

datagram_poll (http://lxr.linux.no/linux+v2.6.38/net/core/datagram.c#L737)
sock_writeable (http://lxr.linux.no/linux+v2.6.38/include/net/sock.h#L1618)

Because a CAN frame (+ the skb overhead) is much smaller than an
Ethernet frame (+ overhead), the default value for the snd buffer is
too big for CAN.

We get the -ENOBUFS from write() if the tx_queue_len (default 10) is
exceeded.

http://lxr.linux.no/linux+v2.6.38/drivers/net/can/dev.c#L435
http://lxr.linux.no/linux+v2.6.38/net/can/af_can.c#L268
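
The mismatch between the two limits can be sketched with a small toy
model in Python. It only mirrors the two checks named above: "writable
while used memory is below half the snd buffer" and "-ENOBUFS once
tx_queue_len frames are waiting". It is not kernel code. SKB_COST (the
bytes charged to the socket per queued frame) is an assumed placeholder,
not a measured value, and all function names are hypothetical.

```python
# Toy model of the two independent limits; not kernel code.
SNDBUF = 60 * 1024      # per-socket snd buffer, "IIRC ~60K" from the mail
TX_QUEUE_LEN = 10       # default device queue length for CAN
SKB_COST = 512          # ASSUMPTION: bytes charged per queued frame


def poll_says_writable(queued_frames: int) -> bool:
    """Mirror of the sock_writeable() idea: used memory < sndbuf / 2."""
    return queued_frames * SKB_COST < SNDBUF // 2


def write_fails_enobufs(queued_frames: int) -> bool:
    """The device queue rejects a frame once tx_queue_len frames wait."""
    return queued_frames >= TX_QUEUE_LEN


def frames_until_not_writable() -> int:
    """Smallest queue depth at which poll would stop reporting POLLOUT."""
    return -(-(SNDBUF // 2) // SKB_COST)  # ceiling division


if __name__ == "__main__":
    q = TX_QUEUE_LEN
    print("queue full, poll writable:", poll_says_writable(q))
    print("queue full, write fails:", write_fails_enobufs(q))
    print("frames before poll blocks:", frames_until_not_writable())
```

Under these assumed numbers the device queue is full long before the
socket stops looking writable, which would explain why poll() returns
immediately and the following write() fails again.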

cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



_______________________________________________
Socketcan-users mailing list
Socketcan-users-0fE9KPoRgkgATYTw5x5z8w@public.gmane.org
https://lists.berlios.de/mailman/listinfo/socketcan-users


* Re: poll broken (for can)
       [not found]           ` <4D90B3B0.2010401-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
@ 2011-03-28 17:53             ` Oliver Hartkopp
       [not found]               ` <4D90CB17.4030205-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Oliver Hartkopp @ 2011-03-28 17:53 UTC
  To: Marc Kleine-Budde
  Cc: Netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-users-0fE9KPoRgkgATYTw5x5z8w, Wolfgang Grandegger

On 28.03.2011 18:13, Marc Kleine-Budde wrote:
> On 03/28/2011 05:55 PM, Wolfgang Grandegger wrote:
>>> BTW: I figured out why poll() wakes you up but the next write will fail
>>> with -ENOBUFS again.
>>
>> Ah, I'm curious? I also did realize that poll does burn CPU cycles
>> (instead of waiting).
> 
> The poll callback checks whether the used memory is less than half of the
> per-socket snd buffer (IIRC ~60K). See:
>
> datagram_poll (http://lxr.linux.no/linux+v2.6.38/net/core/datagram.c#L737)
> sock_writeable (http://lxr.linux.no/linux+v2.6.38/include/net/sock.h#L1618)
>
> Because a CAN frame (+ the skb overhead) is much smaller than an
> Ethernet frame (+ overhead), the default value for the snd buffer is
> too big for CAN.
>
> We get the -ENOBUFS from write() if the tx_queue_len (default 10) is
> exceeded.
> 
> http://lxr.linux.no/linux+v2.6.38/drivers/net/can/dev.c#L435
> http://lxr.linux.no/linux+v2.6.38/net/can/af_can.c#L268
> 

What would be your suggestion? Decreasing the socket send buffer for CAN by
default?

Regards,
Oliver


* Re: poll broken (for can)
       [not found]               ` <4D90CB17.4030205-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
@ 2011-03-28 19:32                 ` Marc Kleine-Budde
       [not found]                   ` <4D90E262.1090201-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Marc Kleine-Budde @ 2011-03-28 19:32 UTC
  To: Oliver Hartkopp
  Cc: Netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-users-0fE9KPoRgkgATYTw5x5z8w, Wolfgang Grandegger


On 03/28/2011 07:53 PM, Oliver Hartkopp wrote:
> On 28.03.2011 18:13, Marc Kleine-Budde wrote:
>> On 03/28/2011 05:55 PM, Wolfgang Grandegger wrote:
>>>> BTW: I figured out why poll() wakes you up but the next write will fail
>>>> with -ENOBUFS again.
>>>
>>> Ah, I'm curious? I also did realize that poll does burn CPU cycles
>>> (instead of waiting).
>>
>> The poll callback checks whether the used memory is less than half of the
>> per-socket snd buffer (IIRC ~60K). See:
>>
>> datagram_poll (http://lxr.linux.no/linux+v2.6.38/net/core/datagram.c#L737)
>> sock_writeable (http://lxr.linux.no/linux+v2.6.38/include/net/sock.h#L1618)
>>
>> Because a CAN frame (+ the skb overhead) is much smaller than an
>> Ethernet frame (+ overhead), the default value for the snd buffer is
>> too big for CAN.
>>
>> We get the -ENOBUFS from write() if the tx_queue_len (default 10) is
>> exceeded.
>>
>> http://lxr.linux.no/linux+v2.6.38/drivers/net/can/dev.c#L435
>> http://lxr.linux.no/linux+v2.6.38/net/can/af_can.c#L268
>>
> 
> What would be your suggestion? Decreasing the socket send buffer for CAN by
> default?

I haven't done any testing.....As far as I understand the code, we can
a) increase the default tx_queue_len and/or
b) decrease the default snd buffer size.

Note: a) is a per device setting whereas b) is a per socket setting.

With the current settings the -ENOBUFS is triggered if we have X unsent
CAN frames (per device), where X equals the tx_queue_len. This means that
with 5 applications, it's about 2 queued (i.e. unsent) frames per app and
device.

If we increase the tx_queue_len to a high value (via ifconfig), so that
the snd buffer is fully used before the tx_queue_len is exceeded, the
write system call will block (or return -EAGAIN if opened
non-blocking). At least it did the last time I tried this.

I think solution b) would lead to a similar behavioural change.
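
Option b) can already be tried per socket from user space with the
standard SO_SNDBUF socket option, without touching any kernel default.
A minimal sketch in Python follows. The helper name is hypothetical, and
a UDP socket stands in for a CAN_RAW socket so that the snippet runs
without CAN hardware. Linux documents that it doubles the requested
value and enforces a minimum, so the value read back will usually differ
from the request.

```python
import socket


def shrink_send_buffer(sock: socket.socket, nbytes: int) -> int:
    """Request a smaller snd buffer; return what the kernel reports."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    # Stand-in socket. For CAN one would instead open
    # socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW).
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        before = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        after = shrink_send_buffer(s, 4096)
        print("SO_SNDBUF before:", before, "after:", after)
```

Whether a given value makes write() block before the device queue
overflows depends on the real per-frame memory cost, which this sketch
does not try to determine.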

What do we really want to specify?

Something like: queue up to X frames per socket and queue only Y frames
per device. Where Y = X * n and n is "I don't know yet"?

Y is simple, it's the tx_queue_len. But X is more complicated. The CAN
frames have non-constant length (i.e. dlc) and I'm not sure what the
netdev people will say if we misuse sock_alloc_send_pskb() for our
tx-flow-control :)
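
To make the X/Y idea concrete, here is a toy model in Python of a
per-socket limit X in front of a per-device limit Y. It is only a sketch
of the proposal, not existing kernel behaviour: the class and function
names are hypothetical, frames are counted regardless of dlc, and the
bus is assumed to be stalled so that nothing ever leaves the queue.

```python
# Toy model: X frames per socket, Y frames per device (Y = tx_queue_len).
class Device:
    def __init__(self, y_limit: int):
        self.y_limit = y_limit
        self.queued = 0


class Sock:
    def __init__(self, dev: Device, x_limit: int):
        self.dev = dev
        self.x_limit = x_limit
        self.in_flight = 0

    def send(self) -> str:
        """Return 'ok', 'EAGAIN' (socket limit) or 'ENOBUFS' (device)."""
        if self.in_flight >= self.x_limit:
            return "EAGAIN"      # flow control: caller could wait in poll()
        if self.dev.queued >= self.dev.y_limit:
            return "ENOBUFS"     # device queue overflow: frame is dropped
        self.in_flight += 1
        self.dev.queued += 1
        return "ok"


def flood(n_socks: int, x_limit: int, y_limit: int) -> set:
    """Let every socket send until refused; collect the refusal kinds."""
    dev = Device(y_limit)
    results = set()
    for sock in [Sock(dev, x_limit) for _ in range(n_socks)]:
        while (r := sock.send()) == "ok":
            pass
        results.add(r)
    return results


if __name__ == "__main__":
    print(flood(n_socks=5, x_limit=2, y_limit=10))
    print(flood(n_socks=5, x_limit=3, y_limit=10))
```

In this model no socket ever sees ENOBUFS as long as n * X <= Y, which
is one way to read the "Y = X * n" relation above.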

Cheers, Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


* Re: poll broken (for can)
       [not found]                   ` <4D90E262.1090201-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
@ 2011-03-29 20:03                     ` Oliver Hartkopp
  2011-03-30  7:05                       ` David Miller
                                         ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Oliver Hartkopp @ 2011-03-29 20:03 UTC
  To: Marc Kleine-Budde
  Cc: Netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-users-0fE9KPoRgkgATYTw5x5z8w, Wolfgang Grandegger

On 28.03.2011 21:32, Marc Kleine-Budde wrote:
> On 03/28/2011 07:53 PM, Oliver Hartkopp wrote:
>> On 28.03.2011 18:13, Marc Kleine-Budde wrote:
>>> On 03/28/2011 05:55 PM, Wolfgang Grandegger wrote:
>>>>> BTW: I figured out why poll() wakes you up but the next write will fail
>>>>> with -ENOBUFS again.
>>>>
>>>> Ah, I'm curious? I also did realize that poll does burn CPU cycles
>>>> (instead of waiting).
>>>
>>> The poll callback checks whether the used memory is less than half of the
>>> per-socket snd buffer (IIRC ~60K). See:
>>>
>>> datagram_poll (http://lxr.linux.no/linux+v2.6.38/net/core/datagram.c#L737)
>>> sock_writeable (http://lxr.linux.no/linux+v2.6.38/include/net/sock.h#L1618)
>>>
>>> Because a CAN frame (+ the skb overhead) is much smaller than an
>>> Ethernet frame (+ overhead), the default value for the snd buffer is
>>> too big for CAN.
>>>
>>> We get the -ENOBUFS from write() if the tx_queue_len (default 10) is
>>> exceeded.
>>>
>>> http://lxr.linux.no/linux+v2.6.38/drivers/net/can/dev.c#L435
>>> http://lxr.linux.no/linux+v2.6.38/net/can/af_can.c#L268
>>>
>>
>> What would be your suggestion? Decreasing the socket send buffer for CAN by
>> default?
> 
> I haven't done any testing.....As far as I understand the code, we can
> a) increase the default tx_queue_len and/or
> b) decrease the default snd buffer size.
> 
> Note: a) is a per device setting whereas b) is a per socket setting.
> 
> With the current settings the -ENOBUFS is triggered if we have X unsent
> CAN frames (per device), where X equals the tx_queue_len. This means that
> with 5 applications, it's about 2 queued (i.e. unsent) frames per app and
> device.
>
> If we increase the tx_queue_len to a high value (via ifconfig), so that
> the snd buffer is fully used before the tx_queue_len is exceeded, the
> write system call will block (or return -EAGAIN if opened
> non-blocking). At least it did the last time I tried this.
> 
> I think solution b) would lead to a similar behavioural change.
> 
> What do we really want to specify?

Hm - the problem could be that people expect their frames to be sent 'in
time', so if we increase the tx_queue_len, it's not transparent when the
frames are potentially leaving the system - and if the application data is
already out-dated when hitting the medium.

What about having up to three CAN frames in each CAN_RAW socket send buffer
and e.g. 50 frames in the tx_queue_len of the netdevice as a starting point?
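
To put rough numbers on the 'in time' concern, here is a small Python
sketch that estimates how long a full device queue takes to drain. It
uses the commonly cited worst-case length of a standard (11-bit
identifier) CAN frame including bit stuffing and the 3-bit interframe
space. The bitrate of 500 kbit/s is an assumption for illustration only,
and the bus is assumed to be otherwise idle; the function names are
hypothetical.

```python
def worst_case_bits(dlc: int) -> int:
    """Worst-case bits on the wire for a standard CAN frame (with stuffing)."""
    return 8 * dlc + 47 + (34 + 8 * dlc - 1) // 4


def queue_drain_time(frames: int, bitrate: int, dlc: int = 8) -> float:
    """Seconds until the last of `frames` queued frames has been sent."""
    return frames * worst_case_bits(dlc) / bitrate


if __name__ == "__main__":
    BITRATE = 500_000  # ASSUMPTION: 500 kbit/s, not taken from the thread
    for qlen in (10, 50, 100):
        ms = queue_drain_time(qlen, BITRATE) * 1000
        print(f"tx_queue_len {qlen:3d}: up to {ms:.1f} ms queueing delay")
```

Frames from other nodes with higher-priority identifiers would add to
these figures, so they are a lower bound on the worst case rather than a
guarantee.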


> 
> Something like: queue up to X frames per socket and queue only Y frames
> per device. Where Y = X * n and n is "I don't know yet"?
> 
> Y is simple, it's the tx_queue_len. But X is more complicated. The CAN
> frames have non-constant length (i.e. dlc) and I'm not sure what the
> netdev people will say if we misuse sock_alloc_send_pskb() for our
> tx-flow-control :)

I would propose to count the CAN frames independently from the can_dlc. AFAIK
the tx_queue_len is dealing with skb's - and the skb->len for the socket send
buffer is also the size of struct can_frame, right?
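
The fixed size is easy to check from user space. The sketch below packs
a classic CAN frame in Python using the usual struct can_frame layout
(32-bit can_id, 8-bit can_dlc, 3 padding bytes, 8 data bytes). The
helper name is hypothetical; the point is only that the packed size does
not depend on the dlc.

```python
import struct

# struct can_frame: 32-bit can_id, 8-bit can_dlc, 3 pad bytes, 8 data bytes
CAN_FRAME_FMT = "=IB3x8s"


def pack_can_frame(can_id: int, data: bytes) -> bytes:
    """Pack a classic CAN frame; short payloads are zero-padded to 8 bytes."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data)


if __name__ == "__main__":
    for payload in (b"", b"\x01", b"\x01\x02\x03\x04\x05\x06\x07\x08"):
        frame = pack_can_frame(0x123, payload)
        print(f"dlc={len(payload)} -> {len(frame)} bytes")
```

So a per-socket limit expressed in frames maps onto a constant number of
payload bytes per frame; the skb overhead on top of that is a separate
question.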

Regards,
Oliver


* Re: poll broken (for can)
  2011-03-29 20:03                     ` Oliver Hartkopp
@ 2011-03-30  7:05                       ` David Miller
  2011-03-30  7:46                       ` [Socketcan-users] " Kurt Van Dijck
       [not found]                       ` <4D923B00.20500-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
  2 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2011-03-30  7:05 UTC
  To: socketcan; +Cc: mkl, wg, socketcan-users, netdev

From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Tue, 29 Mar 2011 22:03:12 +0200

> Hm - the problem could be that people expect their frames to be sent 'in
> time', so if we increase the tx_queue_len, it's not transparent when the
> frames are potentially leaving the system - and if the application data is
> already out-dated when hitting the medium.
> 
> What about having up to three CAN frames in each CAN_RAW socket send buffer
> and e.g. 50 frames in the tx_queue_len of the netdevice as a starting point?

Setting tx_queue_len low is bound to cause all kinds of problems.

This poll() peculiarity is just one such problem.

I would suggest increasing it for CAN devices to at least 100.
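
Before changing anything it helps to check what a device is currently
using. The queue length is exposed in sysfs as
/sys/class/net/<ifname>/tx_queue_len, and it can be changed with e.g.
"ip link set can0 txqueuelen 100" (or ifconfig, as mentioned earlier in
the thread). A minimal Python sketch for reading it follows; the helper
name, its sysfs_root parameter and the interface name can0 are
assumptions for illustration.

```python
from pathlib import Path


def read_tx_queue_len(ifname: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the current tx_queue_len of a network interface from sysfs."""
    path = Path(sysfs_root) / ifname / "tx_queue_len"
    return int(path.read_text().strip())


if __name__ == "__main__":
    ifname = "can0"  # ASSUMPTION: name of the CAN interface on this machine
    try:
        print(ifname, "tx_queue_len =", read_tx_queue_len(ifname))
    except FileNotFoundError:
        print("no such interface:", ifname)
```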


* Re: [Socketcan-users] poll broken (for can)
  2011-03-29 20:03                     ` Oliver Hartkopp
  2011-03-30  7:05                       ` David Miller
@ 2011-03-30  7:46                       ` Kurt Van Dijck
       [not found]                       ` <4D923B00.20500-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
  2 siblings, 0 replies; 7+ messages in thread
From: Kurt Van Dijck @ 2011-03-30  7:46 UTC
  To: Oliver Hartkopp
  Cc: Marc Kleine-Budde, Netdev, socketcan-users, Wolfgang Grandegger

On Tue, Mar 29, 2011 at 10:03:12PM +0200, Oliver Hartkopp wrote:
> On 28.03.2011 21:32, Marc Kleine-Budde wrote:
> > On 03/28/2011 07:53 PM, Oliver Hartkopp wrote:
> >> On 28.03.2011 18:13, Marc Kleine-Budde wrote:
> >>> On 03/28/2011 05:55 PM, Wolfgang Grandegger wrote:

> > What do we really want to specify?
> 
> Hm - the problem could be that people expect their frames to be sent 'in
> time', so if we increase the tx_queue_len, it's not transparent when the
> frames are potentially leaving the system - and if the application data is
> already out-dated when hitting the medium.
> 
> What about having up to three CAN frames in each CAN_RAW socket send buffer
> and e.g. 50 frames in the tx_queue_len of the netdevice as a starting point?
50 looks good, but 3 is a bit small IMHO.
> 
> 
> > 
> > Something like: queue up to X frames per socket and queue only Y frames
> > per device. Where Y = X * n and n is "I don't know yet"?
> > 
> > Y is simple, it's the tx_queue_len. But X is more complicated. The CAN
> > frames have non-constant length (i.e. dlc) and I'm not sure what the
> > netdev people will say if we misuse sock_alloc_send_pskb() for our
> > tx-flow-control :)
> 
> I would propose to count the CAN frames independently from the can_dlc.
ack.
> AFAIK
> the tx_queue_len is dealing with skb's - and the skb->len for the socket send
> buffer is also the size of struct can_frame, right?
ack.
> 
> Regards,
> Oliver
> 

* Re: poll broken (for can)
       [not found]                       ` <4D923B00.20500-fJ+pQTUTwRTk1uMJSBkQmQ@public.gmane.org>
@ 2011-03-30  8:53                         ` Marc Kleine-Budde
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Kleine-Budde @ 2011-03-30  8:53 UTC
  To: Oliver Hartkopp
  Cc: Netdev-u79uwXL29TY76Z2rM5mHXA,
	socketcan-users-0fE9KPoRgkgATYTw5x5z8w, Wolfgang Grandegger



On 03/29/2011 10:03 PM, Oliver Hartkopp wrote:
> On 28.03.2011 21:32, Marc Kleine-Budde wrote:
>> On 03/28/2011 07:53 PM, Oliver Hartkopp wrote:
>>> On 28.03.2011 18:13, Marc Kleine-Budde wrote:
>>>> On 03/28/2011 05:55 PM, Wolfgang Grandegger wrote:
>>>>>> BTW: I figured out why poll() wakes you up but the next write will fail
>>>>>> with -ENOBUFS again.
>>>>>
>>>>> Ah, I'm curious? I also did realize that poll does burn CPU cycles
>>>>> (instead of waiting).
>>>>
>>>> The poll callback checks whether the used memory is less than half of the
>>>> per-socket snd buffer (IIRC ~60K). See:
>>>>
>>>> datagram_poll (http://lxr.linux.no/linux+v2.6.38/net/core/datagram.c#L737)
>>>> sock_writeable (http://lxr.linux.no/linux+v2.6.38/include/net/sock.h#L1618)
>>>>
>>>> Because a CAN frame (+ the skb overhead) is much smaller than an
>>>> Ethernet frame (+ overhead), the default value for the snd buffer is
>>>> too big for CAN.
>>>>
>>>> We get the -ENOBUFS from write() if the tx_queue_len (default 10) is
>>>> exceeded.
>>>>
>>>> http://lxr.linux.no/linux+v2.6.38/drivers/net/can/dev.c#L435
>>>> http://lxr.linux.no/linux+v2.6.38/net/can/af_can.c#L268
>>>>
>>>
>>> What would be your suggestion? Decreasing the socket send buffer for CAN by
>>> default?
>>
>> I haven't done any testing.....As far as I understand the code, we can
>> a) increase the default tx_queue_len and/or
>> b) decrease the default snd buffer size.
>>
>> Note: a) is a per device setting whereas b) is a per socket setting.
>>
>> With the current settings the -ENOBUFS is triggered if we have X unsent
>> CAN frames (per device), where X equals the tx_queue_len. This means that
>> with 5 applications, it's about 2 queued (i.e. unsent) frames per app and
>> device.
>>
>> If we increase the tx_queue_len to a high value (via ifconfig), so that
>> the snd buffer is fully used before the tx_queue_len is exceeded, the
>> write system call will block (or return -EAGAIN if opened
>> non-blocking). At least it did the last time I tried this.
>>
>> I think solution b) would lead to a similar behavioural change.
>>
>> What do we really want to specify?
> 
> Hm - the problem could be that people expect their frames to be sent 'in
> time', so if we increase the tx_queue_len, it's not transparent when the
> frames are potentially leaving the system - and if the application data is
> already out-dated when hitting the medium.
> 
> What about having up to three CAN frames in each CAN_RAW socket send buffer
> and e.g. 50 frames in the tx_queue_len of the netdevice as a starting point?

With the current tx_queue_len of 10, it's 10 frames in a single-application
scenario. But I don't have any real-world CAN experience.

>> Something like: queue up to X frames per socket and queue only Y frames
>> per device. Where Y = X * n and n is "I don't know yet"?
>>
>> Y is simple, it's the tx_queue_len. But X is more complicated. The CAN
>> frames have non-constant length (i.e. dlc) and I'm not sure what the
>> netdev people will say if we misuse sock_alloc_send_pskb() for our
>> tx-flow-control :)

> I would propose to count the CAN frames independently from the can_dlc. AFAIK

Sure - That's what we want. The code has to be written, though :)

> the tx_queue_len is dealing with skb's - and the skb->len for the socket send
> buffer is also the size of struct can_frame, right?

I don't know the code in depth, but I think the skb holding the outgoing
can frame is allocated (or at least accounted) from the snd buffer.
Maybe even more data structures.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


