netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
@ 2012-02-14 22:05 Piergiorgio Beruto
  2012-02-15  6:25 ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Piergiorgio Beruto @ 2012-02-14 22:05 UTC (permalink / raw)
  To: davem; +Cc: netdev

Hello,
I would like to have your opinion regarding what I think could be a
misbehaviour of the SIOCINQ ioctl on AF_UNIX domain sockets.

I was implementing an IPC user-space library which makes use of a
variety of sockets for sending data all around.
When dealing with datagram sockets (i.e. SOCK_DGRAM, SOCK_RDM and
SOCK_SEQPACKET) I used to implement the following paradigm:

1) issue the SIOCINQ (aka FIONREAD) ioctl and get the "size of the
first queued datagram"
2) allocate memory of that size and fill it reading out data from the socket

This works very fine for UDP and TIPC (including SOCK_SEQPACKET)
sockets but when using AF_UNIX (SEQPACKET) sockets the same ioctl
returns instead the number of bytes queued (sum of all pending
datagrams) .

In short, there is no way (as far as I know) to work out how much
memory to allocate in userspace before reading out the datagram.
The workaround for this is to use a fixed swap buffer of the max
possible size of your packets (assuming you know / can retrieve it
somehow) and make a copy of your data (which is bad for performance
anyway).

Looking at the kernel code I've found this in the
linux/net/unix/af_unix.c file (my comment with ////)

case SIOCINQ:
	{
	        struct sk_buff *skb;
	
	        if (sk->sk_state == TCP_LISTEN) {
	                err = -EINVAL;
	                break;
	        }
	
	        spin_lock(&sk->sk_receive_queue.lock);
	        if (sk->sk_type == SOCK_STREAM ||
	            sk->sk_type == SOCK_SEQPACKET) {           //// <--------
treats SEQPACKET as SOCK_STREAM
	                skb_queue_walk(&sk->sk_receive_queue, skb)
	                        amount += skb->len;
	        } else {
	                skb = skb_peek(&sk->sk_receive_queue);
	                if (skb)
	                        amount = skb->len;
	        }
	        spin_unlock(&sk->sk_receive_queue.lock);
	        err = put_user(amount, (int __user *)arg);
	        break;
	}

In my opinion SOCK_SEQPACKETS are similar to SOCK_STREAM because they
are "connection oriented" sockets but from the I/O point of view are
more similar to datagram ones.

My proposal is to remove the above condition from the test and let the
function process SOCK_SEQPACKETS as if they were SOCK_DGRAM for this
particular ioctl.

Thank you for your time,
Regards,

PIergiorgio

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
  2012-02-14 22:05 Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets Piergiorgio Beruto
@ 2012-02-15  6:25 ` Eric Dumazet
  2012-02-15 10:43   ` Piergiorgio Beruto
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2012-02-15  6:25 UTC (permalink / raw)
  To: Piergiorgio Beruto; +Cc: davem, netdev

Le mardi 14 février 2012 à 23:05 +0100, Piergiorgio Beruto a écrit :
> Hello,
> I would like to have your opinion regarding what I think could be a
> misbehaviour of the SIOCINQ ioctl on AF_UNIX domain sockets.
> 
> I was implementing an IPC user-space library which makes use of a
> variety of sockets for sending data all around.
> When dealing with datagram sockets (i.e. SOCK_DGRAM, SOCK_RDM and
> SOCK_SEQPACKET) I used to implement the following paradigm:
> 
> 1) issue the SIOCINQ (aka FIONREAD) ioctl and get the "size of the
> first queued datagram"
> 2) allocate memory of that size and fill it reading out data from the socket
> 
> This works very fine for UDP and TIPC (including SOCK_SEQPACKET)
> sockets but when using AF_UNIX (SEQPACKET) sockets the same ioctl
> returns instead the number of bytes queued (sum of all pending
> datagrams) .
> 
> In short, there is no way (as far as I know) to work out how much
> memory to allocate in userspace before reading out the datagram.
> The workaround for this is to use a fixed swap buffer of the max
> possible size of your packets (assuming you know / can retrieve it
> somehow) and make a copy of your data (which is bad for performance
> anyway).
> 

I dont understand.... SIOCINQ returns an upper limit to you.

So your app can malloc() a big enough buffer.

Yes, SEQPACKET preserves message boundaries so your recvmsg() can fill
part of the application buffer.

> Looking at the kernel code I've found this in the
> linux/net/unix/af_unix.c file (my comment with ////)
> 
> case SIOCINQ:
> 	{
> 	        struct sk_buff *skb;
> 	
> 	        if (sk->sk_state == TCP_LISTEN) {
> 	                err = -EINVAL;
> 	                break;
> 	        }
> 	
> 	        spin_lock(&sk->sk_receive_queue.lock);
> 	        if (sk->sk_type == SOCK_STREAM ||
> 	            sk->sk_type == SOCK_SEQPACKET) {           //// <--------
> treats SEQPACKET as SOCK_STREAM
> 	                skb_queue_walk(&sk->sk_receive_queue, skb)
> 	                        amount += skb->len;
> 	        } else {
> 	                skb = skb_peek(&sk->sk_receive_queue);
> 	                if (skb)
> 	                        amount = skb->len;
> 	        }
> 	        spin_unlock(&sk->sk_receive_queue.lock);
> 	        err = put_user(amount, (int __user *)arg);
> 	        break;
> 	}
> 
> In my opinion SOCK_SEQPACKETS are similar to SOCK_STREAM because they
> are "connection oriented" sockets but from the I/O point of view are
> more similar to datagram ones.
> 
> My proposal is to remove the above condition from the test and let the
> function process SOCK_SEQPACKETS as if they were SOCK_DGRAM for this
> particular ioctl.
> 

This behavior is undocumented in "man unix", so it is "implementation
dependant"

This change could break some applications.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
  2012-02-15  6:25 ` Eric Dumazet
@ 2012-02-15 10:43   ` Piergiorgio Beruto
  2012-02-15 12:42     ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Piergiorgio Beruto @ 2012-02-15 10:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev

Il 15 febbraio 2012 07:25, Eric Dumazet <eric.dumazet@gmail.com> ha scritto:
> Le mardi 14 février 2012 à 23:05 +0100, Piergiorgio Beruto a écrit :
>> Hello,
>> I would like to have your opinion regarding what I think could be a
>> misbehaviour of the SIOCINQ ioctl on AF_UNIX domain sockets.
>>
>> I was implementing an IPC user-space library which makes use of a
>> variety of sockets for sending data all around.
>> When dealing with datagram sockets (i.e. SOCK_DGRAM, SOCK_RDM and
>> SOCK_SEQPACKET) I used to implement the following paradigm:
>>
>> 1) issue the SIOCINQ (aka FIONREAD) ioctl and get the "size of the
>> first queued datagram"
>> 2) allocate memory of that size and fill it reading out data from the socket
>>
>> This works very fine for UDP and TIPC (including SOCK_SEQPACKET)
>> sockets but when using AF_UNIX (SEQPACKET) sockets the same ioctl
>> returns instead the number of bytes queued (sum of all pending
>> datagrams) .
>>
>> In short, there is no way (as far as I know) to work out how much
>> memory to allocate in userspace before reading out the datagram.
>> The workaround for this is to use a fixed swap buffer of the max
>> possible size of your packets (assuming you know / can retrieve it
>> somehow) and make a copy of your data (which is bad for performance
>> anyway).
>>
>
> I dont understand.... SIOCINQ returns an upper limit to you.
Yes, there's nothing that "doesn't work", it's a matter of performance
(I am working on strong embedded so I'm quite concerned about both
memory usage and "speed").

The problem is that when the socket queue is filled with short sized
packets, dequeue operation would allocate a lot of "big" chunks of
memory, progressively smaller (first one the size of the queue, second
one = queue size - first packet size and so on).

Besides the waste of memory, you get less perfromance as malloc()
would use the heap or mmap() depending on the size of the chunk
(usually 64 bytes) and  the use of mmap is more memory efficient but
quite slower.

>
> So your app can malloc() a big enough buffer.
>
> Yes, SEQPACKET preserves message boundaries so your recvmsg() can fill
> part of the application buffer.
>
>> Looking at the kernel code I've found this in the
>> linux/net/unix/af_unix.c file (my comment with ////)
>>
>> case SIOCINQ:
>>       {
>>               struct sk_buff *skb;
>>
>>               if (sk->sk_state == TCP_LISTEN) {
>>                       err = -EINVAL;
>>                       break;
>>               }
>>
>>               spin_lock(&sk->sk_receive_queue.lock);
>>               if (sk->sk_type == SOCK_STREAM ||
>>                   sk->sk_type == SOCK_SEQPACKET) {           //// <--------
>> treats SEQPACKET as SOCK_STREAM
>>                       skb_queue_walk(&sk->sk_receive_queue, skb)
>>                               amount += skb->len;
>>               } else {
>>                       skb = skb_peek(&sk->sk_receive_queue);
>>                       if (skb)
>>                               amount = skb->len;
>>               }
>>               spin_unlock(&sk->sk_receive_queue.lock);
>>               err = put_user(amount, (int __user *)arg);
>>               break;
>>       }
>>
>> In my opinion SOCK_SEQPACKETS are similar to SOCK_STREAM because they
>> are "connection oriented" sockets but from the I/O point of view are
>> more similar to datagram ones.
>>
>> My proposal is to remove the above condition from the test and let the
>> function process SOCK_SEQPACKETS as if they were SOCK_DGRAM for this
>> particular ioctl.
>>
>
> This behavior is undocumented in "man unix", so it is "implementation
> dependant"
>
> This change could break some applications.
Ok, that's why I asked :) But if you agree with my objection regarding
performance, what about adding a brand new (linux only) ioctl which
implements the other behaviour? So the code would look something like
this:

case SIOCINQ:
case SIOCPSZ:     //// new packet size ioctl
       {
               struct sk_buff *skb;

               if (sk->sk_state == TCP_LISTEN) {
                       err = -EINVAL;
                       break;
               }

               spin_lock(&sk->sk_receive_queue.lock);
               if ((sk->sk_type == SOCK_STREAM ||
                   sk->sk_type == SOCK_SEQPACKET) &&
                   ioctl_code != SIOCPSZ) {       //// have SIOCPSZ
behave as for datagram sockets
                       skb_queue_walk(&sk->sk_receive_queue, skb)
                               amount += skb->len;
               } else {
                       skb = skb_peek(&sk->sk_receive_queue);
                       if (skb)
                               amount = skb->len;
               }
               spin_unlock(&sk->sk_receive_queue.lock);
               err = put_user(amount, (int __user *)arg);
               break;
       }

Thank you again,
Piergiorgio

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
  2012-02-15 10:43   ` Piergiorgio Beruto
@ 2012-02-15 12:42     ` Eric Dumazet
  2012-02-15 17:36       ` Piergiorgio Beruto
  2012-02-15 19:55       ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-02-15 12:42 UTC (permalink / raw)
  To: Piergiorgio Beruto; +Cc: davem, netdev

Le mercredi 15 février 2012 à 11:43 +0100, Piergiorgio Beruto a écrit :

> Yes, there's nothing that "doesn't work", it's a matter of performance
> (I am working on strong embedded so I'm quite concerned about both
> memory usage and "speed").
> 
> The problem is that when the socket queue is filled with short sized
> packets, dequeue operation would allocate a lot of "big" chunks of
> memory, progressively smaller (first one the size of the queue, second
> one = queue size - first packet size and so on).
> 
> Besides the waste of memory, you get less perfromance as malloc()
> would use the heap or mmap() depending on the size of the chunk
> (usually 64 bytes) and  the use of mmap is more memory efficient but
> quite slower.

I see

> Ok, that's why I asked :) But if you agree with my objection regarding
> performance, what about adding a brand new (linux only) ioctl which
> implements the other behaviour? So the code would look something like
> this:
> 
> case SIOCINQ:
> case SIOCPSZ:     //// new packet size ioctl
>        {
>                struct sk_buff *skb;
> 
>                if (sk->sk_state == TCP_LISTEN) {
>                        err = -EINVAL;
>                        break;
>                }
> 
>                spin_lock(&sk->sk_receive_queue.lock);
>                if ((sk->sk_type == SOCK_STREAM ||
>                    sk->sk_type == SOCK_SEQPACKET) &&
>                    ioctl_code != SIOCPSZ) {       //// have SIOCPSZ
> behave as for datagram sockets
>                        skb_queue_walk(&sk->sk_receive_queue, skb)
>                                amount += skb->len;
>                } else {
>                        skb = skb_peek(&sk->sk_receive_queue);
>                        if (skb)
>                                amount = skb->len;
>                }
>                spin_unlock(&sk->sk_receive_queue.lock);
>                err = put_user(amount, (int __user *)arg);
>                break;
>        }

ioctl() are deprecated, and such a change has ramification (for example
on strace tool)

You could use a recvmsg( ... MSG_PEEK|MSG_TRUNC) to get size of next
packet (passing a small buffer)

Ah... it seems af_unix doesnt handle MSG_TRUNC semantic as
UDP/RAW/NETLINK sockets do.


diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 85d3bb7..70d9414 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1824,7 +1824,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 		if (UNIXCB(skb).fp)
 			siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
 	}
-	err = size;
+	err = (flags & MSG_TRUNC) ? skb->len : size;
 
 	scm_recv(sock, msg, siocb->scm, flags);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
  2012-02-15 12:42     ` Eric Dumazet
@ 2012-02-15 17:36       ` Piergiorgio Beruto
  2012-02-15 18:16         ` Eric Dumazet
  2012-02-15 19:55       ` David Miller
  1 sibling, 1 reply; 11+ messages in thread
From: Piergiorgio Beruto @ 2012-02-15 17:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev

Ok, I take the recvmsg( ... MSG_PEEK|MSG_TRUNC) *USUALLY* returns the
size of the next packet (which would solve my problem without any ugly
& deprecated ioctl) when passing a "too small buffer" but as you said
AF_UNIX sockets do not implement this behavior...

Are you proposing to implement such patch in the kernel to fix this?
Do you want me to test it?


Il 15 febbraio 2012 13:42, Eric Dumazet <eric.dumazet@gmail.com> ha scritto:
> Le mercredi 15 février 2012 à 11:43 +0100, Piergiorgio Beruto a écrit :
>
>> Yes, there's nothing that "doesn't work", it's a matter of performance
>> (I am working on strong embedded so I'm quite concerned about both
>> memory usage and "speed").
>>
>> The problem is that when the socket queue is filled with short sized
>> packets, dequeue operation would allocate a lot of "big" chunks of
>> memory, progressively smaller (first one the size of the queue, second
>> one = queue size - first packet size and so on).
>>
>> Besides the waste of memory, you get less perfromance as malloc()
>> would use the heap or mmap() depending on the size of the chunk
>> (usually 64 bytes) and  the use of mmap is more memory efficient but
>> quite slower.
>
> I see
>
>> Ok, that's why I asked :) But if you agree with my objection regarding
>> performance, what about adding a brand new (linux only) ioctl which
>> implements the other behaviour? So the code would look something like
>> this:
>>
>> case SIOCINQ:
>> case SIOCPSZ:     //// new packet size ioctl
>>        {
>>                struct sk_buff *skb;
>>
>>                if (sk->sk_state == TCP_LISTEN) {
>>                        err = -EINVAL;
>>                        break;
>>                }
>>
>>                spin_lock(&sk->sk_receive_queue.lock);
>>                if ((sk->sk_type == SOCK_STREAM ||
>>                    sk->sk_type == SOCK_SEQPACKET) &&
>>                    ioctl_code != SIOCPSZ) {       //// have SIOCPSZ
>> behave as for datagram sockets
>>                        skb_queue_walk(&sk->sk_receive_queue, skb)
>>                                amount += skb->len;
>>                } else {
>>                        skb = skb_peek(&sk->sk_receive_queue);
>>                        if (skb)
>>                                amount = skb->len;
>>                }
>>                spin_unlock(&sk->sk_receive_queue.lock);
>>                err = put_user(amount, (int __user *)arg);
>>                break;
>>        }
>
> ioctl() are deprecated, and such a change has ramification (for example
> on strace tool)
>
> You could use a recvmsg( ... MSG_PEEK|MSG_TRUNC) to get size of next
> packet (passing a small buffer)
>
> Ah... it seems af_unix doesnt handle MSG_TRUNC semantic as
> UDP/RAW/NETLINK sockets do.
>
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 85d3bb7..70d9414 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -1824,7 +1824,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
>                if (UNIXCB(skb).fp)
>                        siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
>        }
> -       err = size;
> +       err = (flags & MSG_TRUNC) ? skb->len : size;
>
>        scm_recv(sock, msg, siocb->scm, flags);
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
  2012-02-15 17:36       ` Piergiorgio Beruto
@ 2012-02-15 18:16         ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-02-15 18:16 UTC (permalink / raw)
  To: Piergiorgio Beruto; +Cc: davem, netdev

Le mercredi 15 février 2012 à 18:36 +0100, Piergiorgio Beruto a écrit :
> Ok, I take the recvmsg( ... MSG_PEEK|MSG_TRUNC) *USUALLY* returns the
> size of the next packet (which would solve my problem without any ugly
> & deprecated ioctl) when passing a "too small buffer" but as you said
> AF_UNIX sockets do not implement this behavior...
> 
> Are you proposing to implement such patch in the kernel to fix this?
> Do you want me to test it?
> 

Yes please test it and I'll submit it.

Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
  2012-02-15 12:42     ` Eric Dumazet
  2012-02-15 17:36       ` Piergiorgio Beruto
@ 2012-02-15 19:55       ` David Miller
  2012-02-18 21:23         ` Piergiorgio Beruto
  2012-02-22  9:24         ` [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets Eric Dumazet
  1 sibling, 2 replies; 11+ messages in thread
From: David Miller @ 2012-02-15 19:55 UTC (permalink / raw)
  To: eric.dumazet; +Cc: piergiorgio.beruto, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 15 Feb 2012 13:42:07 +0100

> @@ -1824,7 +1824,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
>  		if (UNIXCB(skb).fp)
>  			siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
>  	}
> -	err = size;
> +	err = (flags & MSG_TRUNC) ? skb->len : size;
>  
>  	scm_recv(sock, msg, siocb->scm, flags);

I'm ok with this, but if it is found to break even one userland application
I'm reverting with impunity.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets
  2012-02-15 19:55       ` David Miller
@ 2012-02-18 21:23         ` Piergiorgio Beruto
  2012-02-22  9:24         ` [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets Eric Dumazet
  1 sibling, 0 replies; 11+ messages in thread
From: Piergiorgio Beruto @ 2012-02-18 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

Hi,
I've tested the patch on my Linux box running kernel 3.2.6 and rich of
userspace apps and everything seems to work fine.

Then I used the following test program to verify the patch effects:

// ---------------------------------------------------

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <sys/un.h>
#include <stdio.h>
#include <errno.h>

int main()
{
   int sv[2];
   if(socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) != 0)
   {
      printf("socketpair() error %d\n", errno);
      return 1;
   }

   // write two messages of size 3 and 2
   const char msg[] = "abc";
   send(sv[1], msg, 2, 0);
   send(sv[1], msg, 3, 0);

   int sz = 0;
   if(ioctl(sv[0], FIONREAD, &sz) != 0)
   {
      printf("ioctl() error %d\n", errno);
      return 1;
   }

   printf("ioctl(SIOCINQ) returned %d\n", sz);

   char dummy;
   sz = recv(sv[0], &dummy, 0, MSG_PEEK | MSG_TRUNC);
   printf("recv(MSG_TRUNC) returned %d\n", sz);

   return 0;
}

// ----------------------------------------------------------

the output of this program with the patched kernel is, as expected:
ioctl(SIOCINQ) returned 5
recv(MSG_TRUNC) returned 2

which validates the test.

Furthermore, I found this in the man page of recv:

MSG_TRUNC (since Linux 2.2)
              For raw (AF_PACKET), Internet datagram (since Linux
2.4.27/2.6.8), and netlink (since Linux 2.6.22) sockets: return
              the  real  length  of  the packet or datagram, even when
it was longer than the passed buffer.  Not implemented for
              UNIX domain (unix(7)) sockets.

The "not implemented for unix domain sockets" sounds to me such as no
userland applications should have used this flag, but I wonder why it
was not implemented since kernel 2.2 (?)

Thank you again,
Regards,

Piergiorgio


Il 15 febbraio 2012 20:55, David Miller <davem@davemloft.net> ha scritto:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 15 Feb 2012 13:42:07 +0100
>
>> @@ -1824,7 +1824,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
>>               if (UNIXCB(skb).fp)
>>                       siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
>>       }
>> -     err = size;
>> +     err = (flags & MSG_TRUNC) ? skb->len : size;
>>
>>       scm_recv(sock, msg, siocb->scm, flags);
>
> I'm ok with this, but if it is found to break even one userland application
> I'm reverting with impunity.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets
  2012-02-15 19:55       ` David Miller
  2012-02-18 21:23         ` Piergiorgio Beruto
@ 2012-02-22  9:24         ` Eric Dumazet
  2012-02-22 19:50           ` David Miller
  2012-12-20 16:50           ` Michael Kerrisk (man-pages)
  1 sibling, 2 replies; 11+ messages in thread
From: Eric Dumazet @ 2012-02-22  9:24 UTC (permalink / raw)
  To: David Miller; +Cc: piergiorgio.beruto, netdev, Michael Kerrisk

Piergiorgio Beruto expressed the need to fetch size of first datagram in
queue for AF_UNIX sockets and suggested a patch against SIOCINQ ioctl.

I suggested instead to implement MSG_TRUNC support as a recv() input
flag, as already done for RAW, UDP & NETLINK sockets.

len = recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC);

MSG_TRUNC asks recv() to return the real length of the packet, even when
is was longer than the passed buffer.

There is risk that a userland application used MSG_TRUNC by accident
(since it had no effect on af_unix sockets) and this might break after
this patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
---
 net/unix/af_unix.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0be4d24..8ee85aa 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1845,7 +1845,7 @@ static int unix_dgram_recvmsg(struct kiocb *iocb, struct socket *sock,
 		if (UNIXCB(skb).fp)
 			siocb->scm->fp = scm_fp_dup(UNIXCB(skb).fp);
 	}
-	err = size;
+	err = (flags & MSG_TRUNC) ? skb->len - skip : size;
 
 	scm_recv(sock, msg, siocb->scm, flags);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets
  2012-02-22  9:24         ` [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets Eric Dumazet
@ 2012-02-22 19:50           ` David Miller
  2012-12-20 16:50           ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2012-02-22 19:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: piergiorgio.beruto, netdev, mtk.manpages

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 22 Feb 2012 10:24:55 +0100

> Piergiorgio Beruto expressed the need to fetch size of first datagram in
> queue for AF_UNIX sockets and suggested a patch against SIOCINQ ioctl.
> 
> I suggested instead to implement MSG_TRUNC support as a recv() input
> flag, as already done for RAW, UDP & NETLINK sockets.
> 
> len = recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC);
> 
> MSG_TRUNC asks recv() to return the real length of the packet, even when
> is was longer than the passed buffer.
> 
> There is risk that a userland application used MSG_TRUNC by accident
> (since it had no effect on af_unix sockets) and this might break after
> this patch.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> CC: Michael Kerrisk <mtk.manpages@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets
  2012-02-22  9:24         ` [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets Eric Dumazet
  2012-02-22 19:50           ` David Miller
@ 2012-12-20 16:50           ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-20 16:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, piergiorgio.beruto, netdev

On Wed, Feb 22, 2012 at 10:24 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Piergiorgio Beruto expressed the need to fetch size of first datagram in
> queue for AF_UNIX sockets and suggested a patch against SIOCINQ ioctl.
>
> I suggested instead to implement MSG_TRUNC support as a recv() input
> flag, as already done for RAW, UDP & NETLINK sockets.
>
> len = recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC);
>
> MSG_TRUNC asks recv() to return the real length of the packet, even when
> is was longer than the passed buffer.
>
> There is risk that a userland application used MSG_TRUNC by accident
> (since it had no effect on af_unix sockets) and this might break after
> this patch.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com>
> CC: Michael Kerrisk <mtk.manpages@gmail.com>

Patch below to man-pages applied.

Thanks for CCing me, Eric.

Cheers,

Michael

--- a/man2/recv.2
+++ b/man2/recv.2
@@ -264,7 +264,7 @@ subsequent receive call will return the same data.
 For raw
 .RB ( AF_PACKET ),
 Internet datagram (since Linux 2.4.27/2.6.8),
-and netlink (since Linux 2.6.22) sockets:
+netlink (since Linux 2.6.22) and UNIX datagram (since Linux 3.4) sockets:
 return the real length of the packet or datagram,
 even when it was longer than the passed buffer.
 Not implemented for UNIX domain

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-12-20 16:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-02-14 22:05 Possible bugfix for AF_UNIX, SOCK_SEQPACKET sockets Piergiorgio Beruto
2012-02-15  6:25 ` Eric Dumazet
2012-02-15 10:43   ` Piergiorgio Beruto
2012-02-15 12:42     ` Eric Dumazet
2012-02-15 17:36       ` Piergiorgio Beruto
2012-02-15 18:16         ` Eric Dumazet
2012-02-15 19:55       ` David Miller
2012-02-18 21:23         ` Piergiorgio Beruto
2012-02-22  9:24         ` [PATCH net-next] af_unix: MSG_TRUNC support for dgram sockets Eric Dumazet
2012-02-22 19:50           ` David Miller
2012-12-20 16:50           ` Michael Kerrisk (man-pages)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).