All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Dawson <matthew@mjdsystems.ca>
To: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, "Macieira,
	Thiago" <thiago.macieira@intel.com>,
	willemdebruijn.kernel@gmail.com
Subject: Re: [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs
Date: Mon, 14 Aug 2017 10:05:08 -0400	[thread overview]
Message-ID: <1846443.VTclhqQinN@ring00> (raw)
In-Reply-To: <1502702846.8411.25.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4132 bytes --]

On Monday, August 14, 2017 5:27:26 AM EDT Paolo Abeni wrote:
> On Mon, 2017-08-14 at 01:52 -0400, Matthew Dawson wrote:
> > Due to commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac ("udp: remove
> > headers from UDP packets before queueing"), when udp packets are being
> > peeked the requested extra offset is always 0 as there is no need to skip
> > the udp header.  However, when the offset is 0 and the next skb is
> > of length 0, it is only returned once.  The behaviour can be seen with
> > the following python script:
> > 
> > #!/usr/bin/env python3
> > from socket import *;
> > f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
> > g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
> > f.bind(('::', 0));
> > addr=('::1', f.getsockname()[1]);
> > g.sendto(b'', addr)
> > g.sendto(b'b', addr)
> > print(f.recvfrom(10, MSG_PEEK));
> > print(f.recvfrom(10, MSG_PEEK));
> > 
> > Where the expected output should be the empty string twice.
> > 
> > Instead, make sk_peek_offset return negative values, and pass those values
> > to __skb_try_recv_datagram/__skb_try_recv_from_queue.  If the passed
> > offset
> > to __skb_try_recv_from_queue is negative, the checked skb is never
> > skipped.
> > After the call, the offset is set to 0 if negative to ensure all further
> > calculations are correct.
> > 
> > Also simplify the if condition in __skb_try_recv_from_queue.  If _off is
> > greater then 0, and off is greater then or equal to skb->len, then (_off
> > ||
> > skb->len) must always be true assuming skb->len >= 0 is always true.
> > 
> > Also remove a redundant check around a call to sk_peek_offset in
> > af_unix.c,
> > as it double checked if MSG_PEEK was set in the flags.
> > 
> > Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
> > ---
> > 
> >  include/net/sock.h  | 4 +---
> >  net/core/datagram.c | 4 ++--
> >  net/ipv4/udp.c      | 4 ++++
> >  net/ipv6/udp.c      | 4 ++++
> >  net/unix/af_unix.c  | 8 +++++---
> >  5 files changed, 16 insertions(+), 8 deletions(-)
> > 
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 7c0632c7e870..aeeec62992ca 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -507,9 +507,7 @@ int sk_set_peek_off(struct sock *sk, int val);
> > 
> >  static inline int sk_peek_offset(struct sock *sk, int flags)
> >  {
> >  
> >  	if (unlikely(flags & MSG_PEEK)) {
> > 
> > -		s32 off = READ_ONCE(sk->sk_peek_off);
> > -		if (off >= 0)
> > -			return off;
> > +		return READ_ONCE(sk->sk_peek_off);
> > 
> >  	}
> >  	
> >  	return 0;
> 
> You probably want/must also update sk_set_peek_off() to allow negative
> values, elsewhere this will break as soon as the user will do
> SOL_SOCKET/SO_PEEK_OFF.
I'm happy to do this either in this patch or as a second patch and make this a 
series, whatever the networking community prefers.

I'm actually surprised that only unix sockets can have negative values.  Is 
there a reason for that?  I had assumed that sk_set_peek_off would allow 
negative values as the code already has to support negative values due to what 
the initial value is.

> I'm wondering adding an explicit SOCK_PEEK_OFF/MSG_PEEK_OFF socket flag
> would help simplyifing the code: no need for negative offset; set such
> flag when SOL_SOCKET/SO_PEEK_OFF with a non negative value is called
> (and clear it when a negative value is used), forward such flag to
> __skb_try_recv_datagram/__skb_try_recv_from_queue and use it to select
> the proper peek behaviour.
The negative value is documented and supported for unix sockets, so I don't 
think we can just reject all negative values.  However, I do see a way to 
implement that while supporting user space sending negative values.  If that 
is preferred let me know and I'll see what I can make.

I assume that would make this patch target net-next?  Would it be possible to 
pull this into net so it can fix this regression for the next kernel release, 
while I work on getting the better solution finished?  I'm happy to to make 
__skb_try_recv_datagram/__skb_try_recv_from_queue accept the extra parameter 
so they don't see an offset less then 0 in this patch.

-- 
Matthew

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2017-08-14 14:05 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-14  5:52 [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs Matthew Dawson
2017-08-14  9:27 ` Paolo Abeni
2017-08-14 14:05   ` Matthew Dawson [this message]
2017-08-14 15:03     ` Willem de Bruijn
2017-08-14 15:31       ` Paolo Abeni
2017-08-15  1:35         ` Willem de Bruijn
2017-08-15 15:40           ` Paolo Abeni
2017-08-15 16:45             ` Willem de Bruijn
2017-08-15 17:00               ` Willem de Bruijn
2017-08-16  9:28                 ` Paolo Abeni
2017-08-16 15:18                   ` Willem de Bruijn
2017-08-16 20:20                     ` Paolo Abeni
2017-08-16 23:27                       ` Willem de Bruijn
2017-08-16 23:40                         ` Willem de Bruijn
2017-08-16 23:55                         ` Thiago Macieira
2017-08-17  0:10                           ` Willem de Bruijn
2017-08-17  9:15                         ` David Laight
2017-08-17 14:37                           ` Willem de Bruijn
2017-08-17 15:47                         ` Matthew Dawson
2017-08-17 16:45                           ` Willem de Bruijn
2017-08-15 18:17               ` Paolo Abeni
2017-08-14 16:06       ` Thiago Macieira
2017-08-14 16:33         ` Willem de Bruijn
2017-08-14 17:02           ` Thiago Macieira
2017-08-14 18:25             ` Willem de Bruijn
2017-08-14 18:33               ` Thiago Macieira
2017-08-14 18:46                 ` Willem de Bruijn
2017-08-14 18:58                   ` Thiago Macieira
2017-08-14 19:03                     ` Willem de Bruijn
2017-08-14 19:15                       ` Thiago Macieira
2017-08-14 19:39                         ` Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1846443.VTclhqQinN@ring00 \
    --to=matthew@mjdsystems.ca \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=thiago.macieira@intel.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.