* [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
2021-11-08 10:57 [PATCH mptcp-next 0/3] support for setsockopt(TCP_INQ) Florian Westphal
@ 2021-11-08 10:57 ` Florian Westphal
2021-11-08 17:08 ` Matthieu Baerts
2021-11-10 8:48 ` Paolo Abeni
0 siblings, 2 replies; 8+ messages in thread
From: Florian Westphal @ 2021-11-08 10:57 UTC (permalink / raw)
To: mptcp; +Cc: Florian Westphal
Allows to query in-sequence data ready for read(), total bytes in
write queue and total bytes in write queue that have not yet been sent.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/mptcp/protocol.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b4263bf821ac..29b6f57b917e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -22,6 +22,7 @@
#endif
#include <net/mptcp.h>
#include <net/xfrm.h>
+#include <asm/ioctls.h>
#include "protocol.h"
#include "mib.h"
@@ -3260,6 +3261,56 @@ static int mptcp_forward_alloc_get(const struct sock *sk)
return sk->sk_forward_alloc + mptcp_sk(sk)->rmem_fwd_alloc;
}
+static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
+{
+ const struct sock *sk = (void *)msk;
+ u64 delta;
+
+ if (sk->sk_state == TCP_LISTEN)
+ return -EINVAL;
+
+ if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
+ return 0;
+
+ delta = READ_ONCE(msk->write_seq) - v;
+ if (delta > INT_MAX)
+ delta = INT_MAX;
+
+ return (int)delta;
+}
+
+static int mptcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ int answ;
+ bool slow;
+
+ switch (cmd) {
+ case SIOCINQ:
+ if (sk->sk_state == TCP_LISTEN)
+ return -EINVAL;
+
+ slow = lock_sock_fast(sk);
+ answ = mptcp_inq_hint(sk);
+ unlock_sock_fast(sk, slow);
+ break;
+ case SIOCOUTQ:
+ slow = lock_sock_fast(sk);
+ answ = mptcp_ioctl_outq(msk, READ_ONCE(msk->snd_una));
+ unlock_sock_fast(sk, slow);
+ break;
+ case SIOCOUTQNSD:
+ slow = lock_sock_fast(sk);
+ answ = mptcp_ioctl_outq(msk, READ_ONCE(msk->snd_nxt));
+ unlock_sock_fast(sk, slow);
+ break;
+ default:
+ return -ENOIOCTLCMD;
+ }
+
+ return put_user(answ, (int __user *)arg);
+}
+
static struct proto mptcp_prot = {
.name = "MPTCP",
.owner = THIS_MODULE,
@@ -3272,6 +3323,7 @@ static struct proto mptcp_prot = {
.shutdown = mptcp_shutdown,
.destroy = mptcp_destroy,
.sendmsg = mptcp_sendmsg,
+ .ioctl = mptcp_ioctl,
.recvmsg = mptcp_recvmsg,
.release_cb = mptcp_release_cb,
.hash = mptcp_hash,
--
2.32.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
2021-11-08 10:57 ` [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls Florian Westphal
@ 2021-11-08 17:08 ` Matthieu Baerts
2021-11-10 8:48 ` Paolo Abeni
1 sibling, 0 replies; 8+ messages in thread
From: Matthieu Baerts @ 2021-11-08 17:08 UTC (permalink / raw)
To: Florian Westphal; +Cc: mptcp
Hi Florian,
On 08/11/2021 11:57, Florian Westphal wrote:
> Allows to query in-sequence data ready for read(), total bytes in
> write queue and total bytes in write queue that have not yet been sent.
Thank you for looking at that!
(...)
> +static int mptcp_ioctl(struct sock *sk, int cmd, unsigned long arg)
> +{
> + struct mptcp_sock *msk = mptcp_sk(sk);
> + int answ;
> + bool slow;
Very very small detail: we should swap the two lines.
I can do that when applying the patches if no v2 is needed.
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
2021-11-08 10:57 ` [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls Florian Westphal
2021-11-08 17:08 ` Matthieu Baerts
@ 2021-11-10 8:48 ` Paolo Abeni
2021-11-10 8:53 ` Florian Westphal
1 sibling, 1 reply; 8+ messages in thread
From: Paolo Abeni @ 2021-11-10 8:48 UTC (permalink / raw)
To: Florian Westphal, mptcp
Hello,
On Mon, 2021-11-08 at 11:57 +0100, Florian Westphal wrote:
> Allows to query in-sequence data ready for read(), total bytes in
> write queue and total bytes in write queue that have not yet been sent.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/mptcp/protocol.c | 52 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 52 insertions(+)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index b4263bf821ac..29b6f57b917e 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -22,6 +22,7 @@
> #endif
> #include <net/mptcp.h>
> #include <net/xfrm.h>
> +#include <asm/ioctls.h>
> #include "protocol.h"
> #include "mib.h"
>
> @@ -3260,6 +3261,56 @@ static int mptcp_forward_alloc_get(const struct sock *sk)
> return sk->sk_forward_alloc + mptcp_sk(sk)->rmem_fwd_alloc;
> }
>
> +static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
> +{
> + const struct sock *sk = (void *)msk;
> + u64 delta;
> +
> + if (sk->sk_state == TCP_LISTEN)
> + return -EINVAL;
> +
> + if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
> + return 0;
> +
> + delta = READ_ONCE(msk->write_seq) - v;
This is under the msk socket lock and write_seq is protected by the
full/plain msk socket lock so READ_ONCE should not be necessary, I
think. The same for 'snd_nxt' below.
'snd_una' instead is updated under the msk data lock, so the read once
there should be required.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
2021-11-10 8:48 ` Paolo Abeni
@ 2021-11-10 8:53 ` Florian Westphal
0 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2021-11-10 8:53 UTC (permalink / raw)
To: Paolo Abeni; +Cc: Florian Westphal, mptcp
Paolo Abeni <pabeni@redhat.com> wrote:
> > + if (sk->sk_state == TCP_LISTEN)
> > + return -EINVAL;
> > +
> > + if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
> > + return 0;
> > +
> > + delta = READ_ONCE(msk->write_seq) - v;
>
> This is under the msk socket lock and write_seq is protected by the
> full/plain msk socket lock so READ_ONCE should not be necessary, I
> think. The same for 'snd_nxt' below.
Then why is write_seq updated with WRITE_ONCE in lots of places?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
[not found] <7bfe33f96550ffd6efaa3266f6027a8d7c500b70.camel@redhat.com>
@ 2021-11-10 11:02 ` Paolo Abeni
2021-11-10 11:37 ` Florian Westphal
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Abeni @ 2021-11-10 11:02 UTC (permalink / raw)
To: mptcp
Whoops, I omitted the ML in my reply, sorry...
@Florian: sorry for the dups! please reply to this one, if needed, to
re-include properly the ML into the thread.
On Wed, 2021-11-10 at 09:53 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > > + if (sk->sk_state == TCP_LISTEN)
> > > + return -EINVAL;
> > > +
> > > + if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
> > > + return 0;
> > > +
> > > + delta = READ_ONCE(msk->write_seq) - v;
> >
> > This is under the msk socket lock and write_seq is protected by the
> > full/plain msk socket lock so READ_ONCE should not be necessary, I
> > think. The same for 'snd_nxt' below.
>
> Then why is write_seq updated with WRITE_ONCE in lots of places?
'write_seq' is read outside the msk socket lock scope in a few places.
My understanding is that to avoid reces and ubsan splat we need to mark
each write with WRITE_ONCE, and read outside the socket lock with
READ_ONCE.
If all the write happens under the socket lock, read under the socket
lock don't need the _ONCE annotation. My reference is:
https://marc.info/?l=linux-netdev&m=161198643714502&w=2
Feel free to correct me if I'm wrong!
BTW this always reports '1' on fallback sockets after shutdown, as
after shutdown write_seq is incremented by 1, but snd_una/snd_nxt are
not incremented when receiving the TCP fin. I guess the fix not
strictly related to this patch.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
2021-11-10 11:02 ` [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls Paolo Abeni
@ 2021-11-10 11:37 ` Florian Westphal
2021-11-10 11:45 ` Paolo Abeni
0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2021-11-10 11:37 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Paolo Abeni <pabeni@redhat.com> wrote:
> Whoops, I omitted the ML in my reply, sorry...
>
> @Florian: sorry for the dups! please reply to this one, if needed, to
> re-include properly the ML into the thread.
>
> On Wed, 2021-11-10 at 09:53 +0100, Florian Westphal wrote:
> > Paolo Abeni <pabeni@redhat.com> wrote:
> > > > + if (sk->sk_state == TCP_LISTEN)
> > > > + return -EINVAL;
> > > > +
> > > > + if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
> > > > + return 0;
> > > > +
> > > > + delta = READ_ONCE(msk->write_seq) - v;
> > >
> > > This is under the msk socket lock and write_seq is protected by the
> > > full/plain msk socket lock so READ_ONCE should not be necessary, I
> > > think. The same for 'snd_nxt' below.
> >
> > Then why is write_seq updated with WRITE_ONCE in lots of places?
>
> 'write_seq' is read outside the msk socket lock scope in a few places.
Bah. You are right. TBH I do not understand READ|WRITE_ONCE usage in
mptcp anymore. Maybe its correct, who knows.
The different locking schemes are not easy to follow for me.
> BTW this always reports '1' on fallback sockets after shutdown, as
> after shutdown write_seq is incremented by 1, but snd_una/snd_nxt are
> not incremented when receiving the TCP fin. I guess the fix not
> strictly related to this patch.
That should be ok, we should return nonzero inq hint when next read()
would indicate EOF.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
2021-11-10 11:37 ` Florian Westphal
@ 2021-11-10 11:45 ` Paolo Abeni
2021-11-10 11:49 ` Florian Westphal
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Abeni @ 2021-11-10 11:45 UTC (permalink / raw)
To: Florian Westphal; +Cc: mptcp
On Wed, 2021-11-10 at 12:37 +0100, Florian Westphal wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
> > Whoops, I omitted the ML in my reply, sorry...
> >
> > @Florian: sorry for the dups! please reply to this one, if needed, to
> > re-include properly the ML into the thread.
> >
> > On Wed, 2021-11-10 at 09:53 +0100, Florian Westphal wrote:
> > > Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > + if (sk->sk_state == TCP_LISTEN)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV))
> > > > > + return 0;
> > > > > +
> > > > > + delta = READ_ONCE(msk->write_seq) - v;
> > > >
> > > > This is under the msk socket lock and write_seq is protected by the
> > > > full/plain msk socket lock so READ_ONCE should not be necessary, I
> > > > think. The same for 'snd_nxt' below.
> > >
> > > Then why is write_seq updated with WRITE_ONCE in lots of places?
> >
> > 'write_seq' is read outside the msk socket lock scope in a few places.
>
> Bah. You are right. TBH I do not understand READ|WRITE_ONCE usage in
> mptcp anymore. Maybe its correct, who knows.
>
> The different locking schemes are not easy to follow for me.
Indeed I agreed!
I *think* (mostly guess) some _ONCE() annotation are now obsoleted/made
unnecessary by the several re-factors.
For sure we need to document clarely the locking schema (where? how?
would comments on the protectect fields suffice?) and possibly/likely
review the current annotations.
> > BTW this always reports '1' on fallback sockets after shutdown, as
> > after shutdown write_seq is incremented by 1, but snd_una/snd_nxt are
> > not incremented when receiving the TCP fin. I guess the fix not
> > strictly related to this patch.
>
> That should be ok, we should return nonzero inq hint when next read()
> would indicate EOF.
This is for the outq ?!? I think it should be 0 after all outstanding
data is acked ?!?
Thanks!
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls
2021-11-10 11:45 ` Paolo Abeni
@ 2021-11-10 11:49 ` Florian Westphal
0 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2021-11-10 11:49 UTC (permalink / raw)
To: Paolo Abeni; +Cc: Florian Westphal, mptcp
Paolo Abeni <pabeni@redhat.com> wrote:
> > Bah. You are right. TBH I do not understand READ|WRITE_ONCE usage in
> > mptcp anymore. Maybe its correct, who knows.
> >
> > The different locking schemes are not easy to follow for me.
>
> Indeed I agreed!
>
> I *think* (mostly guess) some _ONCE() annotation are now obsoleted/made
> unnecessary by the several re-factors.
>
> For sure we need to document clarely the locking schema (where? how?
> would comments on the protectect fields suffice?) and possibly/likely
> review the current annotations.
I think adding comments in the mptcp_sock struct would be a great start.
> > > BTW this always reports '1' on fallback sockets after shutdown, as
> > > after shutdown write_seq is incremented by 1, but snd_una/snd_nxt are
> > > not incremented when receiving the TCP fin. I guess the fix not
> > > strictly related to this patch.
> >
> > That should be ok, we should return nonzero inq hint when next read()
> > would indicate EOF.
>
> This is for the outq ?!? I think it should be 0 after all outstanding
> data is acked ?!?
Oh, I misread it as inq.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-11-10 11:49 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <7bfe33f96550ffd6efaa3266f6027a8d7c500b70.camel@redhat.com>
2021-11-10 11:02 ` [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls Paolo Abeni
2021-11-10 11:37 ` Florian Westphal
2021-11-10 11:45 ` Paolo Abeni
2021-11-10 11:49 ` Florian Westphal
2021-11-08 10:57 [PATCH mptcp-next 0/3] support for setsockopt(TCP_INQ) Florian Westphal
2021-11-08 10:57 ` [PATCH mptcp-next 3/3] mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctls Florian Westphal
2021-11-08 17:08 ` Matthieu Baerts
2021-11-10 8:48 ` Paolo Abeni
2021-11-10 8:53 ` Florian Westphal
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.