netdev.vger.kernel.org archive mirror

* [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
@ 2016-04-08 19:41 Marcelo Ricardo Leitner
  2016-04-08 19:41 ` [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock Marcelo Ricardo Leitner
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-04-08 19:41 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp, David Laight,
	Jakub Sitnicki

1st patch is a preparation for the 2nd. The idea is to not call
->sk_data_ready() for every data chunk processed while processing
packets but only once before releasing the socket.

v2: patchset re-checked, small changelog fixes
v3: on patch 2, make use of local vars to make it more readable

Marcelo Ricardo Leitner (2):
  sctp: compress bit-wide flags to a bitfield on sctp_sock
  sctp: delay calls to sk_data_ready() as much as possible

 include/net/sctp/structs.h | 13 +++++++------
 net/sctp/sm_sideeffect.c   |  7 +++++++
 net/sctp/ulpqueue.c        |  4 ++--
 3 files changed, 16 insertions(+), 8 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock
  2016-04-08 19:41 [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible Marcelo Ricardo Leitner
@ 2016-04-08 19:41 ` Marcelo Ricardo Leitner
  2016-04-12 19:50   ` Neil Horman
  2016-04-08 19:41 ` [PATCH v3 2/2] sctp: delay calls to sk_data_ready() as much as possible Marcelo Ricardo Leitner
  2016-04-14  3:05 ` [PATCH v3 0/2] " David Miller
  2 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-04-08 19:41 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp, David Laight,
	Jakub Sitnicki

It wastes space and gets worse as we add new flags, so convert bit-wide
flags to a bitfield.

Currently it already saves 4 bytes in sctp_sock, which are left as holes
in it for now. The whole struct needs packing, which should be done in
another patch.

Note that do_auto_asconf cannot be merged, as explained in the comment
before it.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/structs.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 6df1ce7a411c548bda4163840a90578b6e1b4cfe..1a6a626904bba4223b7921bbb4be41c2550271a7 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -210,14 +210,14 @@ struct sctp_sock {
 	int user_frag;
 
 	__u32 autoclose;
-	__u8 nodelay;
-	__u8 disable_fragments;
-	__u8 v4mapped;
-	__u8 frag_interleave;
 	__u32 adaptation_ind;
 	__u32 pd_point;
-	__u8 recvrcvinfo;
-	__u8 recvnxtinfo;
+	__u16	nodelay:1,
+		disable_fragments:1,
+		v4mapped:1,
+		frag_interleave:1,
+		recvrcvinfo:1,
+		recvnxtinfo:1;
 
 	atomic_t pd_mode;
 	/* Receive to here while partial delivery is in effect. */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread
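
For illustration, here is a minimal userspace sketch of the layout change in the patch above. It is not the kernel's struct sctp_sock: the struct names, the helper functions, and the use of int in place of atomic_t are assumptions made for the example, and only the members visible in the diff are reproduced. A uint16_t bit-field is accepted by GCC and Clang, but it is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the old layout: one byte per on/off flag,
 * interleaved with 32-bit members as in the removed lines of the diff. */
struct flags_as_bytes {
	int      user_frag;
	uint32_t autoclose;
	uint8_t  nodelay;
	uint8_t  disable_fragments;
	uint8_t  v4mapped;
	uint8_t  frag_interleave;
	uint32_t adaptation_ind;
	uint32_t pd_point;
	uint8_t  recvrcvinfo;
	uint8_t  recvnxtinfo;
	int      pd_mode;		/* stands in for atomic_t */
};

/* Hypothetical stand-in for the new layout: all six flags share one
 * 16-bit storage unit placed after the 32-bit members. */
struct flags_as_bitfield {
	int      user_frag;
	uint32_t autoclose;
	uint32_t adaptation_ind;
	uint32_t pd_point;
	uint16_t nodelay:1,
		 disable_fragments:1,
		 v4mapped:1,
		 frag_interleave:1,
		 recvrcvinfo:1,
		 recvnxtinfo:1;
	int      pd_mode;		/* stands in for atomic_t */
};

/* Size difference between the two layouts on this compiler and ABI. */
static long bytes_saved(void)
{
	return (long)sizeof(struct flags_as_bytes) -
	       (long)sizeof(struct flags_as_bitfield);
}

/* Setting one flag must not disturb its neighbours in the bit-field. */
static void check_flag_roundtrip(void)
{
	struct flags_as_bitfield s = { 0 };

	s.v4mapped = 1;
	assert(s.v4mapped == 1);
	assert(s.nodelay == 0);
}
```

The exact value returned by bytes_saved() depends on the ABI's padding rules, so the sketch computes it rather than hard-coding it. One trade-off of the bit-field form is that a flag can no longer have its address taken, and writes to neighbouring flags share one storage unit.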

* Re: [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock
  2016-04-08 19:41 ` [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock Marcelo Ricardo Leitner
@ 2016-04-12 19:50   ` Neil Horman
  0 siblings, 0 replies; 16+ messages in thread
From: Neil Horman @ 2016-04-12 19:50 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: netdev, Vlad Yasevich, linux-sctp, David Laight, Jakub Sitnicki

On Fri, Apr 08, 2016 at 04:41:27PM -0300, Marcelo Ricardo Leitner wrote:
> It wastes space and gets worse as we add new flags, so convert bit-wide
> flags to a bitfield.
> 
> Currently it already saves 4 bytes in sctp_sock, which are left as holes
> in it for now. The whole struct needs packing, which should be done in
> another patch.
> 
> Note that do_auto_asconf cannot be merged, as explained in the comment
> before it.
> 
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>  include/net/sctp/structs.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 6df1ce7a411c548bda4163840a90578b6e1b4cfe..1a6a626904bba4223b7921bbb4be41c2550271a7 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -210,14 +210,14 @@ struct sctp_sock {
>  	int user_frag;
>  
>  	__u32 autoclose;
> -	__u8 nodelay;
> -	__u8 disable_fragments;
> -	__u8 v4mapped;
> -	__u8 frag_interleave;
>  	__u32 adaptation_ind;
>  	__u32 pd_point;
> -	__u8 recvrcvinfo;
> -	__u8 recvnxtinfo;
> +	__u16	nodelay:1,
> +		disable_fragments:1,
> +		v4mapped:1,
> +		frag_interleave:1,
> +		recvrcvinfo:1,
> +		recvnxtinfo:1;
>  
>  	atomic_t pd_mode;
>  	/* Receive to here while partial delivery is in effect. */
> -- 
> 2.5.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

I've not run it myself, but this series looks reasonable

Acked-by: Neil Horman <nhorman@tuxdriver.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 2/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-08 19:41 [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible Marcelo Ricardo Leitner
  2016-04-08 19:41 ` [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock Marcelo Ricardo Leitner
@ 2016-04-08 19:41 ` Marcelo Ricardo Leitner
  2016-04-14  3:05 ` [PATCH v3 0/2] " David Miller
  2 siblings, 0 replies; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-04-08 19:41 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp, David Laight,
	Jakub Sitnicki

Currently, processing multiple chunks in a single SCTP packet leads to
multiple calls to sk_data_ready, causing multiple wake-up signals, which
are costly and don't make the reader wake up any faster.

With this patch, the code notes that a wake-up is pending and performs it
just before leaving the state machine interpreter, the latest place where
it can be done reliably and cleanly.

Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.

v2: series re-checked
v3: use local vars to clean up the code, suggested by Jakub Sitnicki
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 include/net/sctp/structs.h | 3 ++-
 net/sctp/sm_sideeffect.c   | 7 +++++++
 net/sctp/ulpqueue.c        | 4 ++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 1a6a626904bba4223b7921bbb4be41c2550271a7..21cb11107e378b4da1e7efde22fab4349496e35a 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -217,7 +217,8 @@ struct sctp_sock {
 		v4mapped:1,
 		frag_interleave:1,
 		recvrcvinfo:1,
-		recvnxtinfo:1;
+		recvnxtinfo:1,
+		pending_data_ready:1;
 
 	atomic_t pd_mode;
 	/* Receive to here while partial delivery is in effect. */
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 7fe56d0acabf66cfd8fe29dfdb45f7620b470ac7..d06317de873090be359ce768fe291224ee50658f 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1222,6 +1222,8 @@ static int sctp_cmd_interpreter(sctp_event_t event_type,
 				sctp_cmd_seq_t *commands,
 				gfp_t gfp)
 {
+	struct sock *sk = ep->base.sk;
+	struct sctp_sock *sp = sctp_sk(sk);
 	int error = 0;
 	int force;
 	sctp_cmd_t *cmd;
@@ -1742,6 +1744,11 @@ out:
 			error = sctp_outq_uncork(&asoc->outqueue, gfp);
 	} else if (local_cork)
 		error = sctp_outq_uncork(&asoc->outqueue, gfp);
+
+	if (sp->pending_data_ready) {
+		sk->sk_data_ready(sk);
+		sp->pending_data_ready = 0;
+	}
 	return error;
 nomem:
 	error = -ENOMEM;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index ce469d648ffbe166f9ae1c5650f481256f31a7f8..72e5b3e41cddf9d79371de8ab01484e4601b97b6 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -264,7 +264,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 		sctp_ulpq_clear_pd(ulpq);
 
 	if (queue == &sk->sk_receive_queue)
-		sk->sk_data_ready(sk);
+		sctp_sk(sk)->pending_data_ready = 1;
 	return 1;
 
 out_free:
@@ -1140,5 +1140,5 @@ void sctp_ulpq_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 
 	/* If there is data waiting, send it up the socket now. */
 	if (sctp_ulpq_clear_pd(ulpq) || ev)
-		sk->sk_data_ready(sk);
+		sctp_sk(sk)->pending_data_ready = 1;
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread
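
The deferred wake-up in the patch above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not kernel code: struct defer_sock and all function names are hypothetical, a counter stands in for the real sk_data_ready() callback, and a single call to defer_run_interpreter() stands in for one pass through sctp_cmd_interpreter() that enqueues several events.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the socket state touched by the patch. */
struct defer_sock {
	bool pending_data_ready;	/* mirrors the new bit-field flag */
	int  queued;			/* events placed on the receive queue */
	int  wakeups;			/* times the reader was signalled */
};

/* Stand-in for sk->sk_data_ready(sk): only counts invocations. */
static void defer_data_ready(struct defer_sock *sk)
{
	sk->wakeups++;
}

/* Models the patched enqueue path: queue the event and only note that
 * a wake-up is owed, instead of signalling right away. */
static void defer_enqueue_event(struct defer_sock *sk)
{
	sk->queued++;
	sk->pending_data_ready = true;
}

/* Models the block added at the end of the interpreter: signal once
 * if anything was queued, then clear the flag. */
static void defer_interpreter_exit(struct defer_sock *sk)
{
	if (sk->pending_data_ready) {
		defer_data_ready(sk);
		sk->pending_data_ready = false;
	}
}

/* One interpreter run that enqueues nevents events; returns the total
 * number of wake-ups issued on this socket so far. */
static int defer_run_interpreter(struct defer_sock *sk, int nevents)
{
	for (int i = 0; i < nevents; i++)
		defer_enqueue_event(sk);
	defer_interpreter_exit(sk);
	return sk->wakeups;
}
```

In this model a run that enqueues five events produces a single wake-up, and a run that enqueues nothing produces none. The model says nothing about how long the reader waits for that wake-up.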

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-08 19:41 [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible Marcelo Ricardo Leitner
  2016-04-08 19:41 ` [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock Marcelo Ricardo Leitner
  2016-04-08 19:41 ` [PATCH v3 2/2] sctp: delay calls to sk_data_ready() as much as possible Marcelo Ricardo Leitner
@ 2016-04-14  3:05 ` David Miller
  2016-04-14 13:03   ` Neil Horman
  2 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2016-04-14  3:05 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: netdev, vyasevich, nhorman, linux-sctp, David.Laight, jkbs

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Fri,  8 Apr 2016 16:41:26 -0300

> 1st patch is a preparation for the 2nd. The idea is to not call
> ->sk_data_ready() for every data chunk processed while processing
> packets but only once before releasing the socket.
> 
> v2: patchset re-checked, small changelog fixes
> v3: on patch 2, make use of local vars to make it more readable

Applied to net-next, but isn't this reduced overhead coming at the
expense of latency?  What if that lower latency is important to the
application and/or consumer?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-14  3:05 ` [PATCH v3 0/2] " David Miller
@ 2016-04-14 13:03   ` Neil Horman
  2016-04-14 17:00     ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Neil Horman @ 2016-04-14 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: marcelo.leitner, netdev, vyasevich, linux-sctp, David.Laight,
	jkbs

On Wed, Apr 13, 2016 at 11:05:32PM -0400, David Miller wrote:
> Applied to net-next, but isn't this reduced overhead coming at the
> expense of latency?  What if that lower latency is important to the
> application and/or consumer?
That's a fair point, but I'd make the counterargument that, as it currently
stands, any latency introduced (or removed) is an artifact of our
implementation rather than a designed feature of it.  That is to say, we make no
guarantees at the application level regarding how long it takes to signal data
readiness from the time we get data off the wire, so I would rather see our
throughput raised if we can, as that's been sctp's more pressing Achilles' heel.

That's not to say I wouldn't like to enable lower latency, but I'd rather have
this now, and start pondering how to design that in.  Perhaps we can convert the
pending flag to a counter to count the number of events we enqueue, and call
sk_data_ready every time we reach a sysctl-defined threshold.

Neil

^ permalink raw reply	[flat|nested] 16+ messages in thread
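
The counter idea floated in the message above could look roughly like this in a userspace model. Everything here is an assumption for illustration: the struct, the function names, and the threshold field (which stands in for the suggested sysctl) do not exist in the kernel, and the flush at interpreter exit is an added assumption so that no queued event is left unsignalled.

```c
#include <assert.h>

/* Hypothetical model: a counter replaces the one-bit pending flag. */
struct thr_sock {
	unsigned int pending_events;	/* enqueued since the last wake-up */
	unsigned int threshold;		/* stands in for the proposed sysctl */
	int          wakeups;		/* times the reader was signalled */
};

/* Stand-in for sk->sk_data_ready(sk): counts and resets the counter. */
static void thr_data_ready(struct thr_sock *sk)
{
	sk->wakeups++;
	sk->pending_events = 0;
}

/* Enqueue path: signal each time the counter reaches the threshold. */
static void thr_enqueue_event(struct thr_sock *sk)
{
	if (++sk->pending_events >= sk->threshold)
		thr_data_ready(sk);
}

/* Interpreter exit: flush any remainder below the threshold
 * (assumption, see the lead-in). */
static void thr_interpreter_exit(struct thr_sock *sk)
{
	if (sk->pending_events)
		thr_data_ready(sk);
}

/* One interpreter run that enqueues nevents events; returns the total
 * number of wake-ups issued on this socket so far. */
static int thr_run_interpreter(struct thr_sock *sk, unsigned int nevents)
{
	for (unsigned int i = 0; i < nevents; i++)
		thr_enqueue_event(sk);
	thr_interpreter_exit(sk);
	return sk->wakeups;
}
```

With a threshold of 1 the model degenerates to signalling on every event, which is the behaviour before the series; a very large threshold gives the single deferred wake-up of patch 2.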

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-14 13:03   ` Neil Horman
@ 2016-04-14 17:00     ` Marcelo Ricardo Leitner
  2016-04-14 18:59       ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Marcelo Ricardo Leitner @ 2016-04-14 17:00 UTC (permalink / raw)
  To: Neil Horman, David Miller
  Cc: netdev, vyasevich, linux-sctp, David.Laight, jkbs

On 14-04-2016 10:03, Neil Horman wrote:
> That's a fair point, but I'd make the counterargument that, as it currently
> stands, any latency introduced (or removed) is an artifact of our
> implementation rather than a designed feature of it.  That is to say, we make no
> guarantees at the application level regarding how long it takes to signal data
> readiness from the time we get data off the wire, so I would rather see our
> throughput raised if we can, as that's been sctp's more pressing Achilles' heel.
>
> That's not to say I wouldn't like to enable lower latency, but I'd rather have
> this now, and start pondering how to design that in.  Perhaps we can convert the
> pending flag to a counter to count the number of events we enqueue, and call
> sk_data_ready every time we reach a sysctl-defined threshold.

That, and also there is no chance of the application reading the
first chunks before all current to-dos for that packet are performed by
either the bh or backlog handlers. The socket lock won't be cycled in
between chunks, so the application is going to wait for all the
processing one way or another.

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-14 17:00     ` Marcelo Ricardo Leitner
@ 2016-04-14 18:59       ` David Miller
  2016-04-14 19:33         ` marcelo.leitner
  2016-04-14 20:03         ` Neil Horman
  0 siblings, 2 replies; 16+ messages in thread
From: David Miller @ 2016-04-14 18:59 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: nhorman, netdev, vyasevich, linux-sctp, David.Laight, jkbs

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Thu, 14 Apr 2016 14:00:49 -0300

> That, and also there is no chance of the application reading the
> first chunks before all current to-dos for that packet are performed by
> either the bh or backlog handlers. The socket lock won't be cycled in
> between chunks, so the application is going to wait for all the
> processing one way or another.

But it takes time to signal the wakeup to the remote cpu the process
was running on, schedule out the current process on that cpu (if it
has in fact lost its timeslice), and then finally look at the socket
queue.

Of course this is all assuming the process was sleeping in the first
place, either in recv or more likely poll.

I really think signalling early helps performance.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-14 18:59       ` David Miller
@ 2016-04-14 19:33         ` marcelo.leitner
  2016-04-14 20:03         ` Neil Horman
  1 sibling, 0 replies; 16+ messages in thread
From: marcelo.leitner @ 2016-04-14 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: nhorman, netdev, vyasevich, linux-sctp, David.Laight, jkbs

On Thu, Apr 14, 2016 at 02:59:16PM -0400, David Miller wrote:
> But it takes time to signal the wakeup to the remote cpu the process
> was running on, schedule out the current process on that cpu (if it
> has in fact lost its timeslice), and then finally look at the socket
> queue.
> 
> Of course this is all assuming the process was sleeping in the first
> place, either in recv or more likely poll.
> 
> I really think signalling early helps performance.

I see. Okay, I'll revisit this, thanks.

  Marcelo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-14 18:59       ` David Miller
  2016-04-14 19:33         ` marcelo.leitner
@ 2016-04-14 20:03         ` Neil Horman
  2016-04-14 20:19           ` marcelo.leitner
  1 sibling, 1 reply; 16+ messages in thread
From: Neil Horman @ 2016-04-14 20:03 UTC (permalink / raw)
  To: David Miller
  Cc: marcelo.leitner, netdev, vyasevich, linux-sctp, David.Laight,
	jkbs

On Thu, Apr 14, 2016 at 02:59:16PM -0400, David Miller wrote:
> But it takes time to signal the wakeup to the remote cpu the process
> was running on, schedule out the current process on that cpu (if it
> has in fact lost its timeslice), and then finally look at the socket
> queue.
> 
> Of course this is all assuming the process was sleeping in the first
> place, either in recv or more likely poll.
> 
> I really think signalling early helps performance.
> 

Early, yes; often, not so much :).  Perhaps what would be advantageous would be
to signal at the start of a set of enqueues, rather than at the end.  That would
be equivalent in terms of not signaling more than needed, but would eliminate
the signaling on every chunk.  Perhaps what you could do, Marcelo, would be to
change the sense of the signal_ready flag to be a has_signaled flag, e.g. call
sk_data_ready in ulp_event_tail like we used to, but only if the has_signaled
flag isn't set, then set the flag, and clear it at the end of the command
interpreter.

That would be a best-of-both-worlds solution, as long as there's no chance of a
race with user space reading from the socket before we are done enqueuing (i.e.
you have to guarantee that the socket lock stays held, which I think we do).

Neil

^ permalink raw reply	[flat|nested] 16+ messages in thread
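
A minimal userspace sketch of the "signal early, but only once" variant described in the message above. All names are hypothetical except data_ready_signalled, which is borrowed from the follow-up diff later in this thread; the counter stands in for the real sk_data_ready() callback, and the model ignores locking entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the socket state for the inverted flag. */
struct early_sock {
	bool data_ready_signalled;	/* set once the reader was woken */
	int  queued;			/* events placed on the receive queue */
	int  wakeups;			/* times the reader was signalled */
	int  queued_at_last_wakeup;	/* queue depth when last signalled */
};

/* Enqueue path: wake the reader on the first event of a run only. */
static void early_enqueue_event(struct early_sock *sk)
{
	sk->queued++;
	if (!sk->data_ready_signalled) {
		sk->data_ready_signalled = true;
		sk->wakeups++;
		sk->queued_at_last_wakeup = sk->queued;
	}
}

/* Interpreter exit: re-arm so the next run can signal again. */
static void early_interpreter_exit(struct early_sock *sk)
{
	sk->data_ready_signalled = false;
}

/* One interpreter run that enqueues nevents events; returns the total
 * number of wake-ups issued on this socket so far. */
static int early_run_interpreter(struct early_sock *sk, int nevents)
{
	for (int i = 0; i < nevents; i++)
		early_enqueue_event(sk);
	early_interpreter_exit(sk);
	return sk->wakeups;
}
```

Compared with deferring the call to the end of the run, the number of wake-ups per run is the same, but the signal is issued when the first event lands on the queue, so the reader's wake-up can overlap with the remaining processing.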

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-14 20:03         ` Neil Horman
@ 2016-04-14 20:19           ` marcelo.leitner
  2016-04-28 20:46             ` marcelo.leitner
  0 siblings, 1 reply; 16+ messages in thread
From: marcelo.leitner @ 2016-04-14 20:19 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, vyasevich, linux-sctp, David.Laight, jkbs

On Thu, Apr 14, 2016 at 04:03:51PM -0400, Neil Horman wrote:
> Early, yes; often, not so much :).  Perhaps what would be advantageous would be
> to signal at the start of a set of enqueues, rather than at the end.  That would
> be equivalent in terms of not signaling more than needed, but would eliminate
> the signaling on every chunk.  Perhaps what you could do, Marcelo, would be to
> change the sense of the signal_ready flag to be a has_signaled flag, e.g. call
> sk_data_ready in ulp_event_tail like we used to, but only if the has_signaled
> flag isn't set, then set the flag, and clear it at the end of the command
> interpreter.
> 
> That would be a best-of-both-worlds solution, as long as there's no chance of a
> race with user space reading from the socket before we are done enqueuing (i.e.
> you have to guarantee that the socket lock stays held, which I think we do).

That is my feeling too. Will work on it. Thanks :-)

  Marcelo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-14 20:19           ` marcelo.leitner
@ 2016-04-28 20:46             ` marcelo.leitner
  2016-04-29 13:36               ` Neil Horman
  0 siblings, 1 reply; 16+ messages in thread
From: marcelo.leitner @ 2016-04-28 20:46 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, vyasevich, linux-sctp, David.Laight, jkbs

On Thu, Apr 14, 2016 at 05:19:00PM -0300, marcelo.leitner@gmail.com wrote:
> That is my feeling too. Will work on it. Thanks :-)

I made the change and tested it on real machines tuned for performance.
I couldn't spot any difference between the two implementations.

I set RSS and the queue IRQ affinity to one CPU, and used taskset to pin
netperf and another app I wrote to a different CPU. It hits the socket
backlog quite often but still does direct processing every now and then.

With the current state (netperf, scenario above), these are the results
of perf sched record for the CPUs in use, as reported by perf sched latency:

  Task            |  Runtime ms   | Switches | Average delay ms | Maximum delay ms | Maximum delay at
  netserver:3205  |   9999.490 ms |       10 | avg:    0.003 ms | max:    0.004 ms | max at:  69087.753356 s

another run:
  netserver:3483  |   9999.412 ms |       15 | avg:    0.003 ms | max:    0.004 ms | max at:  69194.749814 s

With the patch below, same test:
  netserver:2643  |  10000.110 ms |       14 | avg:    0.003 ms | max:    0.004 ms | max at:    172.006315 s

another run:
  netserver:2698  |  10000.049 ms |       15 | avg:    0.003 ms | max:    0.004 ms | max at:    368.061672 s

I'll be happy to do more tests if you have any suggestions on how/what
to test.

---8<---
 
 include/net/sctp/structs.h |  2 +-
 net/sctp/sm_sideeffect.c   |  7 +++----
 net/sctp/ulpqueue.c        | 25 ++++++++++++++++---------
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 558bae3cbe0d5107d52c8cb31b324cfd5479def0..16b013a6191cf1c416e4dd1aeb1707a8569ea49b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -218,7 +218,7 @@ struct sctp_sock {
 		frag_interleave:1,
 		recvrcvinfo:1,
 		recvnxtinfo:1,
-		pending_data_ready:1;
+		data_ready_signalled:1;
 
 	atomic_t pd_mode;
 	/* Receive to here while partial delivery is in effect. */
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index e8f0112f9b28472c39c4c91dcb28576373c858e7..aa37122593684d8501fdca15983fbd8620fabe07 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1741,10 +1741,9 @@ out:
 	} else if (local_cork)
 		error = sctp_outq_uncork(&asoc->outqueue, gfp);
 
-	if (sp->pending_data_ready) {
-		sk->sk_data_ready(sk);
-		sp->pending_data_ready = 0;
-	}
+	if (sp->data_ready_signalled)
+		sp->data_ready_signalled = 0;
+
 	return error;
 nomem:
 	error = -ENOMEM;
diff --git a/net/sctp/ulpqueue.c b/net/sctp/ulpqueue.c
index ec12a8920e5fd7a0f26d19f1695bc2feeae41518..ec166d2bd2d95d9aa69369da2ead9437da4ce8ed 100644
--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -194,6 +194,7 @@ static int sctp_ulpq_clear_pd(struct sctp_ulpq *ulpq)
 int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 {
 	struct sock *sk = ulpq->asoc->base.sk;
+	struct sctp_sock *sp = sctp_sk(sk);
 	struct sk_buff_head *queue, *skb_list;
 	struct sk_buff *skb = sctp_event2skb(event);
 	int clear_pd = 0;
@@ -211,7 +212,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 		sk_incoming_cpu_update(sk);
 	}
 	/* Check if the user wishes to receive this event.  */
-	if (!sctp_ulpevent_is_enabled(event, &sctp_sk(sk)->subscribe))
+	if (!sctp_ulpevent_is_enabled(event, &sp->subscribe))
 		goto out_free;
 
 	/* If we are in partial delivery mode, post to the lobby until
@@ -219,7 +220,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 	 * the association the cause of the partial delivery.
 	 */
 
-	if (atomic_read(&sctp_sk(sk)->pd_mode) == 0) {
+	if (atomic_read(&sp->pd_mode) == 0) {
 		queue = &sk->sk_receive_queue;
 	} else {
 		if (ulpq->pd_mode) {
@@ -231,7 +232,7 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 			if ((event->msg_flags & MSG_NOTIFICATION) ||
 			    (SCTP_DATA_NOT_FRAG ==
 				    (event->msg_flags & SCTP_DATA_FRAG_MASK)))
-				queue = &sctp_sk(sk)->pd_lobby;
+				queue = &sp->pd_lobby;
 			else {
 				clear_pd = event->msg_flags & MSG_EOR;
 				queue = &sk->sk_receive_queue;
@@ -242,10 +243,10 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 			 * can queue this to the receive queue instead
 			 * of the lobby.
 			 */
-			if (sctp_sk(sk)->frag_interleave)
+			if (sp->frag_interleave)
 				queue = &sk->sk_receive_queue;
 			else
-				queue = &sctp_sk(sk)->pd_lobby;
+				queue = &sp->pd_lobby;
 		}
 	}
 
@@ -264,8 +265,10 @@ int sctp_ulpq_tail_event(struct sctp_ulpq *ulpq, struct sctp_ulpevent *event)
 	if (clear_pd)
 		sctp_ulpq_clear_pd(ulpq);
 
-	if (queue == &sk->sk_receive_queue)
-		sctp_sk(sk)->pending_data_ready = 1;
+	if (queue == &sk->sk_receive_queue && !sp->data_ready_signalled) {
+		sp->data_ready_signalled = 1;
+		sk->sk_data_ready(sk);
+	}
 	return 1;
 
 out_free:
@@ -1126,11 +1129,13 @@ void sctp_ulpq_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 {
 	struct sctp_ulpevent *ev = NULL;
 	struct sock *sk;
+	struct sctp_sock *sp;
 
 	if (!ulpq->pd_mode)
 		return;
 
 	sk = ulpq->asoc->base.sk;
+	sp = sctp_sk(sk);
 	if (sctp_ulpevent_type_enabled(SCTP_PARTIAL_DELIVERY_EVENT,
 				       &sctp_sk(sk)->subscribe))
 		ev = sctp_ulpevent_make_pdapi(ulpq->asoc,
@@ -1140,6 +1145,8 @@ void sctp_ulpq_abort_pd(struct sctp_ulpq *ulpq, gfp_t gfp)
 		__skb_queue_tail(&sk->sk_receive_queue, sctp_event2skb(ev));
 
 	/* If there is data waiting, send it up the socket now. */
-	if (sctp_ulpq_clear_pd(ulpq) || ev)
-		sctp_sk(sk)->pending_data_ready = 1;
+	if ((sctp_ulpq_clear_pd(ulpq) || ev) && !sp->data_ready_signalled) {
+		sp->data_ready_signalled = 1;
+		sk->sk_data_ready(sk);
+	}
 }
-- 
2.5.0
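
As a rough illustration of the flag logic in the diff above, here is a
minimal userspace sketch in C. It is not kernel code: struct model_sock,
model_data_ready() and the other model_* names are hypothetical stand-ins
for sctp_sock, sk->sk_data_ready() and the exit path of the command
interpreter, and it assumes, as discussed in the thread, that the socket
lock stays held for the whole batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the receive side of a socket. */
struct model_sock {
	bool data_ready_signalled;	/* mirrors the flag in the patch */
	unsigned int queued;		/* events on the receive queue */
	unsigned int wakeups;		/* times the reader was signalled */
};

/* Stand-in for sk->sk_data_ready(sk): only counts the wakeup. */
static void model_data_ready(struct model_sock *sk)
{
	sk->wakeups++;
}

/* Enqueue one event; signal only for the first event of a batch. */
static void model_tail_event(struct model_sock *sk)
{
	assert(sk != NULL);
	sk->queued++;
	if (!sk->data_ready_signalled) {
		sk->data_ready_signalled = true;
		model_data_ready(sk);
	}
}

/* End of the batch: re-arm the flag for the next packet. */
static void model_batch_done(struct model_sock *sk)
{
	assert(sk != NULL);
	sk->data_ready_signalled = false;
}

/* Process one packet carrying nchunks chunks; return total wakeups. */
static unsigned int model_process_packet(struct model_sock *sk,
					 unsigned int nchunks)
{
	for (unsigned int i = 0; i < nchunks; i++)
		model_tail_event(sk);
	model_batch_done(sk);
	return sk->wakeups;
}
```

Running a five-chunk packet through model_process_packet() bumps wakeups
once rather than five times, and that single wakeup is issued on the first
enqueue instead of after the last one.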

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-28 20:46             ` marcelo.leitner
@ 2016-04-29 13:36               ` Neil Horman
  2016-04-29 13:47                 ` marcelo.leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Neil Horman @ 2016-04-29 13:36 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: David Miller, netdev, vyasevich, linux-sctp, David.Laight, jkbs

On Thu, Apr 28, 2016 at 05:46:59PM -0300, marcelo.leitner@gmail.com wrote:
> I made the change and tested it on real machines tuned for performance.
> I couldn't spot any difference between the two implementations.
I think this looks reasonable, but can you post it properly please, as a patch
against the head of the net-next tree, rather than a diff from your previous
work (which wasn't committed)?

Thanks!
Neil

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-29 13:36               ` Neil Horman
@ 2016-04-29 13:47                 ` marcelo.leitner
  2016-04-29 16:10                   ` Neil Horman
  0 siblings, 1 reply; 16+ messages in thread
From: marcelo.leitner @ 2016-04-29 13:47 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, vyasevich, linux-sctp, David.Laight, jkbs

On Fri, Apr 29, 2016 at 09:36:37AM -0400, Neil Horman wrote:
> I think this looks reasonable, but can you post it properly please, as a patch
> against the head of the net-next tree, rather than a diff from your previous
> work (which wasn't committed)?

The idea was not to post it officially yet, but rather to share it as a
reference, because I can't see any gains from it. That is the only reason
I'm reluctant; I have no strong opinion one way or the other.

If you think it's better anyway to signal it early, I'll properly repost
it.

Thanks,
Marcelo

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-29 13:47                 ` marcelo.leitner
@ 2016-04-29 16:10                   ` Neil Horman
  2016-04-29 16:28                     ` marcelo.leitner
  0 siblings, 1 reply; 16+ messages in thread
From: Neil Horman @ 2016-04-29 16:10 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: David Miller, netdev, vyasevich, linux-sctp, David.Laight, jkbs

On Fri, Apr 29, 2016 at 10:47:25AM -0300, marcelo.leitner@gmail.com wrote:
> The idea was not to post it officially yet, but rather to share it as a
> reference, because I can't see any gains from it. That is the only reason
> I'm reluctant; I have no strong opinion one way or the other.
> 
> If you think it's better anyway to signal it early, I'll properly repost
> it.
> 
Yeah, your results seem to me to indicate that, for your test at least, signaling
early vs. late doesn't make a lot of difference, but Dave, I think, made a point in
principle: allowing processes to wake up when we start enqueuing can be
better in some situations.  So, all other things being equal, I'd say go with the
method that you have here.

Best
Neil

* Re: [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible
  2016-04-29 16:10                   ` Neil Horman
@ 2016-04-29 16:28                     ` marcelo.leitner
  0 siblings, 0 replies; 16+ messages in thread
From: marcelo.leitner @ 2016-04-29 16:28 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, vyasevich, linux-sctp, David.Laight, jkbs

On Fri, Apr 29, 2016 at 12:10:31PM -0400, Neil Horman wrote:
> Yeah, your results seem to me to indicate that, for your test at least, signaling
> early vs. late doesn't make a lot of difference, but I think Dave made a point in
> principle, in that allowing processes to wake up when we start enqueuing can be
> better in some situations.  So all other things being equal, I'd say go with the
> method that you have here.

Okay, I'll rebase the patch and post it properly. Thanks Neil!

  Marcelo
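
For readers following the early vs. late signaling discussion above, here is a minimal userspace sketch of the three strategies mentioned in this thread: signaling for every chunk, signaling once after the whole batch is enqueued, and signaling once when enqueuing starts. It is a toy model only, not the kernel code or the posted patch. The struct toy_sock, its fields and all toy_* function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a socket; NOT the kernel's struct sock. */
struct toy_sock {
	int queued;      /* chunks enqueued for the reader */
	int wakeups;     /* how many times the reader was signaled */
	bool signalled;  /* already signaled during the current batch? */
};

/* Stand-in for ->sk_data_ready(): here it only counts wakeups. */
static void toy_data_ready(struct toy_sock *sk)
{
	sk->wakeups++;
}

/* Strategy 1: signal for every chunk processed. */
static void toy_enqueue_per_chunk(struct toy_sock *sk, int nchunks)
{
	assert(nchunks >= 0);
	for (int i = 0; i < nchunks; i++) {
		sk->queued++;
		toy_data_ready(sk);
	}
}

/* Strategy 2: remember that data is pending, signal once at the end. */
static void toy_enqueue_signal_late(struct toy_sock *sk, int nchunks)
{
	bool pending = false;

	assert(nchunks >= 0);
	for (int i = 0; i < nchunks; i++) {
		sk->queued++;
		pending = true;
	}
	if (pending)
		toy_data_ready(sk);
}

/* Strategy 3: signal on the first chunk, suppress the rest of the batch. */
static void toy_enqueue_signal_early(struct toy_sock *sk, int nchunks)
{
	assert(nchunks >= 0);
	for (int i = 0; i < nchunks; i++) {
		sk->queued++;
		if (!sk->signalled) {
			sk->signalled = true;
			toy_data_ready(sk);
		}
	}
	sk->signalled = false; /* batch finished: re-arm for the next one */
}
```

Calling each function with a batch of 8 chunks on a zero-initialized struct toy_sock leaves 8 chunks queued in all three cases, with 8, 1 and 1 wakeups respectively. The model has no locking and no concurrent reader, so it says nothing about the race discussed above; it only shows that both single-signal variants issue the same number of wakeups and differ in when the reader is woken.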


end of thread, other threads:[~2016-04-29 16:28 UTC | newest]

Thread overview: 16+ messages
2016-04-08 19:41 [PATCH v3 0/2] sctp: delay calls to sk_data_ready() as much as possible Marcelo Ricardo Leitner
2016-04-08 19:41 ` [PATCH v3 1/2] sctp: compress bit-wide flags to a bitfield on sctp_sock Marcelo Ricardo Leitner
2016-04-12 19:50   ` Neil Horman
2016-04-08 19:41 ` [PATCH v3 2/2] sctp: delay calls to sk_data_ready() as much as possible Marcelo Ricardo Leitner
2016-04-14  3:05 ` [PATCH v3 0/2] " David Miller
2016-04-14 13:03   ` Neil Horman
2016-04-14 17:00     ` Marcelo Ricardo Leitner
2016-04-14 18:59       ` David Miller
2016-04-14 19:33         ` marcelo.leitner
2016-04-14 20:03         ` Neil Horman
2016-04-14 20:19           ` marcelo.leitner
2016-04-28 20:46             ` marcelo.leitner
2016-04-29 13:36               ` Neil Horman
2016-04-29 13:47                 ` marcelo.leitner
2016-04-29 16:10                   ` Neil Horman
2016-04-29 16:28                     ` marcelo.leitner
