DCCP protocol discussions
 help / color / mirror / Atom feed
* [PATCH 15/25]: Disable softirq when sending and notifying CCID
@ 2007-03-21 18:45 Gerrit Renker
  2007-03-26  3:16 ` Ian McDonald
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Gerrit Renker @ 2007-03-21 18:45 UTC (permalink / raw)
  To: dccp

[DCCP]: Disable softirq when sending and notifying CCID

This disables softirqs when performing the CCID-specific send operation
in dccp_write_xmit, so that actual sending, and calling the CCID post-send
routine becomes an atomic unit.

Why this needs to be done:
--------------------------
 The function dccp_write_xmit can be called both in user context (via
 dccp_sendmsg) and via timer softirq (dccp_write_xmit_timer). It does 

  1. call the CCID-specific `pre-send' routine ccid_hc_tx_send_packet()
  2. ship the skb via dccp_transmit_skb
  3. call the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().

 The last one does e.g. accounting by updating data records (as in CCID 3).

 The transition from 2 ... 3 should be atomic and not be interrupted
 by softirqs.  The reason is that the TX and RX halves of the CCID modules
 share data structures and both halves change state. If the sending process is
 allowed to be interrupted by the reception of a DCCP packet via softirq
 handler, then state and data structures of the sender can become corrupted.

 Here is an actual example whose effects were observed and lead to this patch:
 in CCID 3 the sender records a timestamp when ccid_hc_tx_packet_sent() is called.
 If the application is sending via dccp_sendmsg, it may be interrupted and run a
 little while later. Suppose that such interruption happens between steps (2) and
 (3) above: the packet has been sent, and immediately afterwards dccp_sendmsg is
 interrupted. Meanwhile the transmitted skb reaches the other side, and an Ack
 comes back; this Ack is processed via softirq (which is allowed to interrupt 
 dccp_sendmsg); only then step (3) is performed, but too late: the timestamp
 taken in ccid3_hc_tx_packet_sent is now /after/ the Ack has come in.

 In the observed case, negative RTT samples (i.e. Acks arriving before the
 sent packet was registered) were the result.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/output.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -250,11 +250,18 @@ void dccp_write_xmit(struct sock *sk, in
 			else
 				dcb->dccpd_type = DCCP_PKT_DATA;
 
+			/*
+			 * Transmission and calling the post-send CCID operation
+			 * must not be interrupted by other processing (e.g.
+			 * packet reception), otherwise strange errors result.
+			 */
+			local_bh_disable();
 			err = dccp_transmit_skb(sk, skb);
 			ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, 0, len);
-			if (err)
-				DCCP_BUG("err=%d after ccid_hc_tx_packet_sent",
-					 err);
+			local_bh_enable();
+
+			if (unlikely(err))
+				DCCP_BUG("dccp_transmit_skb returned %d", err);
 		} else {
 			dccp_pr_debug("packet discarded due to err=%d\n", err);
 			kfree_skb(skb);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 15/25]: Disable softirq when sending and notifying CCID
  2007-03-21 18:45 [PATCH 15/25]: Disable softirq when sending and notifying CCID Gerrit Renker
@ 2007-03-26  3:16 ` Ian McDonald
  2007-04-13 12:14 ` Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Ian McDonald @ 2007-03-26  3:16 UTC (permalink / raw)
  To: dccp

On 3/22/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> [DCCP]: Disable softirq when sending and notifying CCID
>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
-- 
Web: http://wand.net.nz/~iam4
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 15/25]: Disable softirq when sending and notifying CCID
  2007-03-21 18:45 [PATCH 15/25]: Disable softirq when sending and notifying CCID Gerrit Renker
  2007-03-26  3:16 ` Ian McDonald
@ 2007-04-13 12:14 ` Arnaldo Carvalho de Melo
  2007-04-13 12:39 ` Gerrit Renker
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-04-13 12:14 UTC (permalink / raw)
  To: dccp

On 3/21/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> [DCCP]: Disable softirq when sending and notifying CCID
>
> This disables softirqs when performing the CCID-specific send operation
> in dccp_write_xmit, so that actual sending, and calling the CCID post-send
> routine becomes an atomic unit.
>
> Why this needs to be done:
> --------------------------
>  The function dccp_write_xmit can be called both in user context (via
>  dccp_sendmsg) and via timer softirq (dccp_write_xmit_timer). It does
>
>   1. call the CCID-specific `pre-send' routine ccid_hc_tx_send_packet()
>   2. ship the skb via dccp_transmit_skb
>   3. call the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().
>
>  The last one does e.g. accounting by updating data records (as in CCID 3).
>
>  The transition from 2 ... 3 should be atomic and not be interrupted
>  by softirqs.  The reason is that the TX and RX halves of the CCID modules
>  share data structures and both halves change state. If the sending process is
>  allowed to be interrupted by the reception of a DCCP packet via softirq
>  handler, then state and data structures of the sender can become corrupted.
>
>  Here is an actual example whose effects were observed and lead to this patch:
>  in CCID 3 the sender records a timestamp when ccid_hc_tx_packet_sent() is called.
>  If the application is sending via dccp_sendmsg, it may be interrupted and run a
>  little while later. Suppose that such interruption happens between steps (2) and
>  (3) above: the packet has been sent, and immediately afterwards dccp_sendmsg is
>  interrupted. Meanwhile the transmitted skb reaches the other side, and an Ack
>  comes back; this Ack is processed via softirq (which is allowed to interrupt
>  dccp_sendmsg); only then step (3) is performed, but too late: the timestamp
>  taken in ccid3_hc_tx_packet_sent is now /after/ the Ack has come in.
>
>  In the observed case, negative RTT samples (i.e. Acks arriving before the
>  sent packet was registered) were the result.

Gerrit,

     I looked at this one now and there and I noticed that we already
take a timestamp in the dccp_insert_options code path, so I think that
the right fix is to avoid taking two timestamps and using the one
taken at insert option time, i.e. when it gets to
dccp_hc_tx_packet_sent time we already have a timestamp in the packet,
i.e. avoid taking the timestamp after the packet is sent, just like
you did for the REQUEST case in dccp_insert_options, makes sense?

- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 15/25]: Disable softirq when sending and notifying CCID
  2007-03-21 18:45 [PATCH 15/25]: Disable softirq when sending and notifying CCID Gerrit Renker
  2007-03-26  3:16 ` Ian McDonald
  2007-04-13 12:14 ` Arnaldo Carvalho de Melo
@ 2007-04-13 12:39 ` Gerrit Renker
  2007-04-13 13:04 ` Arnaldo Carvalho de Melo
  2007-04-13 14:06 ` Gerrit Renker
  4 siblings, 0 replies; 6+ messages in thread
From: Gerrit Renker @ 2007-04-13 12:39 UTC (permalink / raw)
  To: dccp

Quoting Arnaldo Carvalho de Melo:
|  On 3/21/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
|  > [DCCP]: Disable softirq when sending and notifying CCID
|  >
|  > This disables softirqs when performing the CCID-specific send operation
|  > in dccp_write_xmit, so that actual sending, and calling the CCID post-send
|  > routine becomes an atomic unit.
|  >
|  > Why this needs to be done:
|  > --------------------------
|  >  The function dccp_write_xmit can be called both in user context (via
|  >  dccp_sendmsg) and via timer softirq (dccp_write_xmit_timer). It does
|  >
|  >   1. call the CCID-specific `pre-send' routine ccid_hc_tx_send_packet()
|  >   2. ship the skb via dccp_transmit_skb
|  >   3. call the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().
|  >
|  >  The last one does e.g. accounting by updating data records (as in CCID 3).
|  >
|  >  The transition from 2 ... 3 should be atomic and not be interrupted
|  >  by softirqs.  The reason is that the TX and RX halves of the CCID modules
|  >  share data structures and both halves change state. If the sending process is
|  >  allowed to be interrupted by the reception of a DCCP packet via softirq
|  >  handler, then state and data structures of the sender can become corrupted.
|  >
|  >  Here is an actual example whose effects were observed and lead to this patch:
|  >  in CCID 3 the sender records a timestamp when ccid_hc_tx_packet_sent() is called.
|  >  If the application is sending via dccp_sendmsg, it may be interrupted and run a
|  >  little while later. Suppose that such interruption happens between steps (2) and
|  >  (3) above: the packet has been sent, and immediately afterwards dccp_sendmsg is
|  >  interrupted. Meanwhile the transmitted skb reaches the other side, and an Ack
|  >  comes back; this Ack is processed via softirq (which is allowed to interrupt
|  >  dccp_sendmsg); only then step (3) is performed, but too late: the timestamp
|  >  taken in ccid3_hc_tx_packet_sent is now /after/ the Ack has come in.
|  >
|  >  In the observed case, negative RTT samples (i.e. Acks arriving before the
|  >  sent packet was registered) were the result.
|  
|  Gerrit,
|  
|       I looked at this one now and there and I noticed that we already
|  take a timestamp in the dccp_insert_options code path, so I think that
|  the right fix is to avoid taking two timestamps and using the one
|  taken at insert option time, i.e. when it gets to
|  dccp_hc_tx_packet_sent time we already have a timestamp in the packet,
|  i.e. avoid taking the timestamp after the packet is sent, just like
|  you did for the REQUEST case in dccp_insert_options, makes sense?
|  
I think you are making a good point - there is potential to integrate the use of
timestamps (I was thinking of the icsk_ack.lrcvtime field to keep a 32-bit timestamp).

I have a problem here - where in the code path do you mean? The only place dccp_insert_option_timestamp
is called is in ccid3_hc_rx_insert_options. Maybe I am confusing something here.

Is your intention to re-enable softirqs? I am quite a bit uneasy about it. When observing the
negative timestamps, I triggered a dump_stack. This kind of problem always happened when release_sock
called the backlog handler. 
There is so much interplay going on between CCID3 receiver and transmitter side, several timers, two
different history structures, that I think it is safer to disable softirqs. What I haven't tested yet
but which I think will bring further complications is when we use e.g. CCID3 for the transmitter path,
and CCID2 (or CCID4) for the reverse path. 
Also with the bidirectional communication - it is sometimes like a little firework of timers going
of everywhere (nofeedback timer, delack timer, response timer and all this for two independent data
paths).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 15/25]: Disable softirq when sending and notifying CCID
  2007-03-21 18:45 [PATCH 15/25]: Disable softirq when sending and notifying CCID Gerrit Renker
                   ` (2 preceding siblings ...)
  2007-04-13 12:39 ` Gerrit Renker
@ 2007-04-13 13:04 ` Arnaldo Carvalho de Melo
  2007-04-13 14:06 ` Gerrit Renker
  4 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-04-13 13:04 UTC (permalink / raw)
  To: dccp

On 4/13/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> Quoting Arnaldo Carvalho de Melo:
> |  On 3/21/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> |  > [DCCP]: Disable softirq when sending and notifying CCID
> |  >
> |  > This disables softirqs when performing the CCID-specific send operation
> |  > in dccp_write_xmit, so that actual sending, and calling the CCID post-send
> |  > routine becomes an atomic unit.
> |  >
> |  > Why this needs to be done:
> |  > --------------------------
> |  >  The function dccp_write_xmit can be called both in user context (via
> |  >  dccp_sendmsg) and via timer softirq (dccp_write_xmit_timer). It does
> |  >
> |  >   1. call the CCID-specific `pre-send' routine ccid_hc_tx_send_packet()
> |  >   2. ship the skb via dccp_transmit_skb
> |  >   3. call the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().
> |  >
> |  >  The last one does e.g. accounting by updating data records (as in CCID 3).
> |  >
> |  >  The transition from 2 ... 3 should be atomic and not be interrupted
> |  >  by softirqs.  The reason is that the TX and RX halves of the CCID modules
> |  >  share data structures and both halves change state. If the sending process is
> |  >  allowed to be interrupted by the reception of a DCCP packet via softirq
> |  >  handler, then state and data structures of the sender can become corrupted.
> |  >
> |  >  Here is an actual example whose effects were observed and lead to this patch:
> |  >  in CCID 3 the sender records a timestamp when ccid_hc_tx_packet_sent() is called.
> |  >  If the application is sending via dccp_sendmsg, it may be interrupted and run a
> |  >  little while later. Suppose that such interruption happens between steps (2) and
> |  >  (3) above: the packet has been sent, and immediately afterwards dccp_sendmsg is
> |  >  interrupted. Meanwhile the transmitted skb reaches the other side, and an Ack
> |  >  comes back; this Ack is processed via softirq (which is allowed to interrupt
> |  >  dccp_sendmsg); only then step (3) is performed, but too late: the timestamp
> |  >  taken in ccid3_hc_tx_packet_sent is now /after/ the Ack has come in.
> |  >
> |  >  In the observed case, negative RTT samples (i.e. Acks arriving before the
> |  >  sent packet was registered) were the result.
> |
> |  Gerrit,
> |
> |       I looked at this one now and there and I noticed that we already
> |  take a timestamp in the dccp_insert_options code path, so I think that
> |  the right fix is to avoid taking two timestamps and using the one
> |  taken at insert option time, i.e. when it gets to
> |  dccp_hc_tx_packet_sent time we already have a timestamp in the packet,
> |  i.e. avoid taking the timestamp after the packet is sent, just like
> |  you did for the REQUEST case in dccp_insert_options, makes sense?
> |
> I think you are making a good point - there is potential to integrate the use of
> timestamps (I was thinking of the icsk_ack.lrcvtime field to keep a 32-bit timestamp).

icsk is per socket, we need something per packet, that will be put
into the DCCP_OPT_TIMESTAMP and used in the packet history.

> I have a problem here - where in the code path do you mean? The only place dccp_insert_option_timestamp
> is called is in ccid3_hc_rx_insert_options. Maybe I am confusing something here.

dccp_transmit_skb
  dccp_insert_options
      dccp_insert_timestamp (now just for the REQUEST packet, but we
can change this)

> Is your intention to re-enable softirqs?

Intention is to reduce the window when softirqs are disabled, if they
need to be disabled at all.

> I am quite a bit uneasy about it. When observing the
> negative timestamps, I triggered a dump_stack. This kind of problem always happened when release_sock
> called the backlog handler.

The problem you mentioned is that we want to put it into the TX
history only when we know for sure that the packet was sent, and after
we send the packet and before we put it into TX history we could be
softirq interrupted and process an ACK for this exact packet (which
_seems_ unlikely), so we would need to disable softirqs to make the
sequence:

    err = icsk->icsk_af_ops->queue_xmit(skb, 0);
    dccp_tx_packet_sent()

Uninterrupted by softirq, so it seems we need to disable softirq just
on this smaller window, no?

But how could that happen? If we receive the ACK before the packet was
put into TX history we will not find it in the TX history in the first
place, no?

I.e. if we are interrupted before putting the packet into the TX
history and the ACK is for another packet and this other packet is in
the TX history we won't get negative RTTs.

Please bear with me, I'm just trying to get back at Net/DCCP
development, may have said something stupid as I don't have that much
time right now for checking all this in detail.

> There is so much interplay going on between CCID3 receiver and transmitter side, several timers, two
> different history structures, that I think it is safer to disable softirqs. What I haven't tested yet
> but which I think will bring further complications is when we use e.g. CCID3 for the transmitter path,
> and CCID2 (or CCID4) for the reverse path.
> Also with the bidirectional communication - it is sometimes like a little firework of timers going
> of everywhere (nofeedback timer, delack timer, response timer and all this for two independent data
> paths).

Will have to look at this in more detail, perhaps in the weekend.

- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 15/25]: Disable softirq when sending and notifying CCID
  2007-03-21 18:45 [PATCH 15/25]: Disable softirq when sending and notifying CCID Gerrit Renker
                   ` (3 preceding siblings ...)
  2007-04-13 13:04 ` Arnaldo Carvalho de Melo
@ 2007-04-13 14:06 ` Gerrit Renker
  4 siblings, 0 replies; 6+ messages in thread
From: Gerrit Renker @ 2007-04-13 14:06 UTC (permalink / raw)
  To: dccp

|  > I think you are making a good point - there is potential to integrate the use of
|  > timestamps (I was thinking of the icsk_ack.lrcvtime field to keep a 32-bit timestamp).
|  
|  icsk is per socket, we need something per packet, that will be put
|  into the DCCP_OPT_TIMESTAMP and used in the packet history.
My idea was to use this field always for the last-received timestamp from the other end,
so that it would have the meaning `last time a data packet with timestamp was received'.

|  > I have a problem here - where in the code path do you mean? The only place dccp_insert_option_timestamp
|  > is called is in ccid3_hc_rx_insert_options. Maybe I am confusing something here.
|  
|  dccp_transmit_skb
|    dccp_insert_options
|        dccp_insert_timestamp (now just for the REQUEST packet, but we
|  can change this)
Ah I see - but that means a timestamp per every packet sent. CCID2 for instance does not need
timestamps. I think that the specific problem of negative RTTs will no longer apply since now
the timestamp is taken after everything else has happened - including socket (sk) lock times.



|  > Is your intention to re-enable softirqs?
|  
|  Intention is to reduce the window when softirqs are disabled, if they
|  need to be disabled at all.
|  
|  > I am quite a bit uneasy about it. When observing the
|  > negative timestamps, I triggered a dump_stack. This kind of problem always happened when release_sock
|  > called the backlog handler.
|  
|  The problem you mentioned is that we want to put it into the TX
|  history only when we know for sure that the packet was sent, and after
|  we send the packet and before we put it into TX history we could be
|  softirq interrupted and process an ACK for this exact packet (which
|  _seems_ unlikely), so we would need to disable softirqs to make the
|  sequence:
|  
|      err = icsk->icsk_af_ops->queue_xmit(skb, 0);
|      dccp_tx_packet_sent()
|  
|  Uninterrupted by softirq, so it seems we need to disable softirq just
|  on this smaller window, no?
I think you mention a second problem, which is independent of the synchronisation problem that this 
patch aims at tackling. So if I understand you correctly, do you mean the following in dccp_write_xmit?
	err = icsk->icsk_af_ops->queue_xmit(skb, 0);   
	if (net_xmit_eval(err) = 0)
		dccp_tx_packet_sent();
The problem I found with this second issue is that at this time skb_dequeue(&sk->sk_write_queue) has
already been called, so for all accounting purposes the packet is written off as `sent'. The CCID 
infrastructure had also given green light via ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb),
so telling it not to consider the packet as sent would give it the wrong information. The problem is
benign, I think, since it will come up as loss; so that at the end of the day the CCID infrastructure
will register it. (Also the case seems very rare under normal conditions).

 
|  But how could that happen? If we receive the ACK before the packet was
|  put into TX history we will not find it in the TX history in the first
|  place, no?
My admittedly modest understanding of NAPI and the way the packets are processed in the receive path
is that the simple idea of 'A sends Data/DataAck and B sends an Ack back' never happens in reality as
described.  More than one packets are dequeued from the hardware ring buffer of the interface. 

So when we allow softirq processing to happen in parallel with the send operation then I am concerned that
this will mess up the interaction of receive-CCID and send-CCID.
Can the case that sending happens on one active CPU and concurrent receive happens on another CPU in softirq
be dismissed?

Since the tx queue can be long, the time that the sk lock is held can potentially be quite long.
Especially when dccp_wait_for_ccid() always returns 0; then the sk lock is held until the entire queue
has been emptied.

I had initially put the local_bh_disable() earlier, the setting used in the patch was the smallest possible
time window I could find. 

|  I.e. if we are interrupted before putting the packet into the TX
|  history and the ACK is for another packet and this other packet is in
|  the TX history we won't get negative RTTs.
|  
|  Please bear with me, I'm just trying to get back at Net/DCCP
|  development, may have said something stupid as I don't have that much
|  time right now for checking all this in detail.
Part of the patch is a precaution, so if we can be 100% sure that sending and receiving a burst of 
packets never happens in parallel, then it is probably safe to do without disabling softirqs here.
But if it is not, then this would allow nasty timing problems.
I think this is more of a SMP / NAPI problem.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-04-13 14:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-21 18:45 [PATCH 15/25]: Disable softirq when sending and notifying CCID Gerrit Renker
2007-03-26  3:16 ` Ian McDonald
2007-04-13 12:14 ` Arnaldo Carvalho de Melo
2007-04-13 12:39 ` Gerrit Renker
2007-04-13 13:04 ` Arnaldo Carvalho de Melo
2007-04-13 14:06 ` Gerrit Renker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox