From: Paolo Abeni <pabeni@redhat.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Matthew Dawson <matthew@mjdsystems.ca>,
Network Development <netdev@vger.kernel.org>,
"Macieira, Thiago" <thiago.macieira@intel.com>
Subject: Re: [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs
Date: Wed, 16 Aug 2017 11:28:42 +0200
Message-ID: <1502875722.3548.11.camel@redhat.com>
In-Reply-To: <CAF=yD-+jBim==ktwYUNAFkn-N4eabqgNME8NXM77v9-U_bfWPg@mail.gmail.com>
On Tue, 2017-08-15 at 13:00 -0400, Willem de Bruijn wrote:
> > There is another difference between reading sk_peek_offset in the
> > caller or in __skb_try_recv_from_queue. The latter is called repeatedly
> > when it returns NULL. Each call can modify *off. I believe that it needs
> > to restart with _off at sk->sk_peek_off each time, as it restarts from the
> > head of the queue each time.
>
> I made a mistake here. *off is not updated when returning NULL.
>
> In that case, it is better to read sk_peek_offset once, than to read
> it each time __skb_try_recv_from_queue is entered.
If I read the above correctly, you are arguing in favor of the
additional flag version, right?
Regarding the MSG flag exhaustion, there are a bunch of flags that are
defined but have apparently been unused for a long time (MSG_FIN,
MSG_SYN, MSG_RST), if I'm not too low on coffee.
We can shadow one of them (or possibly drop the above defines, if they
are really unused).
I think that MSG_PEEK_OFF should be explicitly cleared in
sk_peek_offset() when sk_peek_off is negative, to avoid being
fooled by stray bits passed by userspace, as in the following:
---
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 8b13db5163cc..3b7f53b9cc08 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -286,6 +286,7 @@ struct ucred {
#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
#define MSG_BATCH 0x40000 /* sendmmsg(): more messages coming */
#define MSG_EOF MSG_FIN
+#define MSG_PEEK_OFF MSG_FIN /* Peeking with offset for datagram sockets */
#define MSG_FASTOPEN 0x20000000 /* Send data in TCP SYN */
#define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exec for file
diff --git a/include/net/sock.h b/include/net/sock.h
index 7c0632c7e870..452f9aac2e6a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -504,12 +504,17 @@ enum sk_pacing {
int sk_set_peek_off(struct sock *sk, int val);
-static inline int sk_peek_offset(struct sock *sk, int flags)
+static inline int sk_peek_offset(struct sock *sk, int *flags)
{
- if (unlikely(flags & MSG_PEEK)) {
+ if (unlikely(*flags & MSG_PEEK)) {
s32 off = READ_ONCE(sk->sk_peek_off);
- if (off >= 0)
+ if (off >= 0) {
+ *flags |= MSG_PEEK_OFF;
return off;
+ }
+
+ /* clear any stray bits in the user-provided bitmask */
+ *flags &= ~MSG_PEEK_OFF;
}
return 0;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index ee5647bd91b3..91e1d014d64c 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -175,8 +175,8 @@ struct sk_buff *__skb_try_recv_from_queue(struct sock *sk,
*last = queue->prev;
skb_queue_walk(queue, skb) {
if (flags & MSG_PEEK) {
- if (_off >= skb->len && (skb->len || _off ||
- skb->peeked)) {
+ if (flags & MSG_PEEK_OFF && _off >= skb->len &&
+ (skb->len || _off || skb->peeked)) {
_off -= skb->len;
continue;
}
diff --git a/net/core/sock.c b/net/core/sock.c
index ac2a404c73eb..0c123540b02f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2408,9 +2408,7 @@ EXPORT_SYMBOL(__sk_mem_reclaim);
int sk_set_peek_off(struct sock *sk, int val)
{
- if (val < 0)
- return -EINVAL;
-
+ /* a negative value will disable peeking with offset */
sk->sk_peek_off = val;
return 0;
}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index a7c804f73990..7e1bcd3500f4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1574,7 +1574,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
return ip_recv_error(sk, msg, len, addr_len);
try_again:
- peeking = off = sk_peek_offset(sk, flags);
+ peeking = off = sk_peek_offset(sk, &flags);
skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
if (!skb)
return err;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 578142b7ca3e..86fb4ff8934c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -362,7 +362,7 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
return ipv6_recv_rxpmtu(sk, msg, len, addr_len);
try_again:
- peeking = off = sk_peek_offset(sk, flags);
+ peeking = off = sk_peek_offset(sk, &flags);
skb = __skb_recv_udp(sk, flags, noblock, &peeked, &off, &err);
if (!skb)
return err;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7b52a380d710..06c740939d9d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2124,7 +2124,7 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
do {
mutex_lock(&u->iolock);
- skip = sk_peek_offset(sk, flags);
+ skip = sk_peek_offset(sk, &flags);
skb = __skb_try_recv_datagram(sk, flags, NULL, &peeked, &skip,
&err, &last);
if (skb)
@@ -2304,10 +2304,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
*/
mutex_lock(&u->iolock);
- if (flags & MSG_PEEK)
- skip = sk_peek_offset(sk, flags);
- else
- skip = 0;
+ skip = sk_peek_offset(sk, &flags);
do {
int chunk;
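To make the intent of the flag handling above easier to check, here is a
minimal userspace sketch of the patched sk_peek_offset() and of the skip
test in __skb_try_recv_from_queue(). It is only a model, not kernel code:
struct demo_sock, the demo_* functions and the DEMO_* constants are
hypothetical names, and the numeric flag values are assumptions chosen for
illustration (in the patch, MSG_PEEK_OFF simply aliases MSG_FIN).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only; the real definitions live in
 * include/linux/socket.h. */
#define DEMO_MSG_PEEK      0x2
#define DEMO_MSG_PEEK_OFF  0x200	/* stands in for the MSG_FIN alias */

/* Hypothetical stand-in for the single struct sock field used here. */
struct demo_sock {
	int32_t peek_off;	/* negative: peeking with offset disabled */
};

/* Models the patched sk_peek_offset(): returns the offset to skip and
 * sets or clears DEMO_MSG_PEEK_OFF in *flags to match. */
static int demo_peek_offset(const struct demo_sock *sk, int *flags)
{
	if (*flags & DEMO_MSG_PEEK) {
		int32_t off = sk->peek_off;

		if (off >= 0) {
			*flags |= DEMO_MSG_PEEK_OFF;
			return off;
		}
		/* drop a stray bit supplied by the caller */
		*flags &= ~DEMO_MSG_PEEK_OFF;
	}
	return 0;
}

/* Models the skip condition applied to each queued skb: skipping only
 * happens when peeking with an explicitly enabled offset. */
static int demo_should_skip(int flags, int off, unsigned int len, int peeked)
{
	return (flags & DEMO_MSG_PEEK) && (flags & DEMO_MSG_PEEK_OFF) &&
	       off >= 0 && (unsigned int)off >= len &&
	       (len || off || peeked);
}

/* Small self-check covering the case discussed in this thread. */
static void demo_selfcheck(void)
{
	struct demo_sock sk = { .peek_off = -1 };
	int flags = DEMO_MSG_PEEK | DEMO_MSG_PEEK_OFF;	/* stray bit set */
	int off = demo_peek_offset(&sk, &flags);

	/* offset disabled: an empty, already-peeked skb is not skipped */
	assert(!demo_should_skip(flags, off, 0, 1));
}
```

With peek_off = -1 and a stray DEMO_MSG_PEEK_OFF bit in the caller's flags,
demo_peek_offset() returns 0 and clears the bit, so demo_should_skip() stays
false for an empty, already-peeked skb. With peek_off = 0 the bit is set and
the same skb is skipped, which matches the pre-patch behaviour for sockets
that did enable an offset.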