* [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status
@ 2008-02-25 5:19 Kapil Juneja
2008-02-25 12:11 ` Patrick McHardy
0 siblings, 1 reply; 6+ messages in thread
From: Kapil Juneja @ 2008-02-25 5:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: Kapil Juneja, Emil Medve
Currently NF_CONNTRACK assumes that a running timer is present before refreshing
the connection or destroying it. This may not be the case when, for example,
another forwarding engine hooks up to it to listen to new connections
but disables the NF_CONNTRACK timer in order to have more control.
In such a scenario, only control packets may be terminated to NF_CONNTRACK for
it to decode and update the connection status. It will not impact the present
scenario of kernel forwarding without the aid of any forwarding engine.
Signed-off-by: Kapil Juneja <Kapil.Juneja@freescale.com>
Signed-off-by: Emil Medve <Emilian.Medve@freescale.com>
---
net/netfilter/nf_conntrack_core.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index a4d5cde..2d1f83c 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -791,10 +791,14 @@ void __nf_ct_refresh_acct(struct nf_conn *ct,
/* Only update the timeout if the new timeout is at least
HZ jiffies from the old timeout. Need del_timer for race
avoidance (may already be dying). */
- if (newtime - ct->timeout.expires >= HZ
- && del_timer(&ct->timeout)) {
- ct->timeout.expires = newtime;
- add_timer(&ct->timeout);
+ if (newtime - ct->timeout.expires >= HZ) {
+ /*
+ * The timer could have already been deleted
+ * while still alive (for example connection
+ * offloaded to a forwarding module other than
+ * the kernel stack).
+ */
+ mod_timer(&ct->timeout, newtime);
event = IPCT_REFRESH;
}
}
@@ -928,8 +932,8 @@ nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data), void *data)
while ((ct = get_next_corpse(iter, data, &bucket)) != NULL) {
/* Time to push up daises... */
- if (del_timer(&ct->timeout))
- death_by_timeout((unsigned long)ct);
+ del_timer(&ct->timeout);
+ death_by_timeout((unsigned long)ct);
/* ... else the timer will get him soon. */
nf_ct_put(ct);
--
1.5.3.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status
2008-02-25 5:19 [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status Kapil Juneja
@ 2008-02-25 12:11 ` Patrick McHardy
2008-02-26 7:39 ` Juneja Kapil
0 siblings, 1 reply; 6+ messages in thread
From: Patrick McHardy @ 2008-02-25 12:11 UTC (permalink / raw)
To: Kapil Juneja; +Cc: netfilter-devel, Emil Medve
Kapil Juneja wrote:
> Currently NF_CONNTRACK assumes that a running timer is present before refreshing
> the connection or destroying it. This may not be the case when, for example,
> another forwarding engine hooks up to it to listen to new connections
> but disables the NF_CONNTRACK timer in order to have more control.
> In such a scenario, only control packets may be terminated to NF_CONNTRACK for
> it to decode and update the connection status. It will not impact the present
> scenario of kernel forwarding without the aid of any forwarding engine.
Do you have a pointer to the code you're talking about?
> + if (newtime - ct->timeout.expires >= HZ) {
> + /*
> + * The timer could have already been deleted
> + * while still alive (for example connection
> + * offloaded to a forwarding module other than
> + * the kernel stack).
> + */
> + mod_timer(&ct->timeout, newtime);
> event = IPCT_REFRESH;
This adds a race, we don't want to update the timer if it already
went off this that means the connection is already destroyed.
Same problem with the other chunk.
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status
2008-02-25 12:11 ` Patrick McHardy
@ 2008-02-26 7:39 ` Juneja Kapil
2008-02-27 13:00 ` Patrick McHardy
0 siblings, 1 reply; 6+ messages in thread
From: Juneja Kapil @ 2008-02-26 7:39 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel, Medve Emilian
Hi,
My responses are inlined..
> -----Original Message-----
> From: Patrick McHardy [mailto:kaber@trash.net]
> Sent: Monday, February 25, 2008 5:41 PM
> To: Juneja Kapil
> Cc: netfilter-devel@vger.kernel.org; Medve Emilian
> Subject: Re: [PATCH] nf_conntrack_core: Updated nf_conntrack
> to destroy/refresh conn irrespective of del_timer status
>
> Kapil Juneja wrote:
> > Currently NF_CONNTRACK assumes that a running timer is
> present before refreshing
> > the connection or destroying it. This may not be the case
> when, for example,
> > another forwarding engine hooks up to it to listen to new
> connections
> > but disables the NF_CONNTRACK timer in order to have more control.
> > In such a scenario, only control packets may be terminated
> to NF_CONNTRACK for
> > it to decode and update the connection status. It will not
> impact the present
> > scenario of kernel forwarding without the aid of any
> forwarding engine.
>
> Do you have a pointer to the code you're talking about?
The forwarding engine concept is same as the Grand Unified Flow Cache
idea mooted by Rusty Russel some time back:
http://lwn.net/Articles/194443/
Our architecture runs on three components - Linux NF_CONNTRACK, a
Control (Linux) Module(CM), and a Forwarding Engine or Flow Caching
system (FC):
1) The CM registers itself to the NF_CONNTRACK notifier chain. Whenever
an assured connection notifier event is received, it extracts all the
relevant tuple parameters (Src IP, Dst IP, Protocol, Src Port, Dst Port
etc.) and caches them to the FC. Due to reasons mentioned subsequently,
it also disables the timer for the said conntrack object. The object,
however, remains in the conntrack list as long as it is not destroyed.
3) The FC, sitting at the ethernet driver level, sends all the data
packets belonging to the cached connection directly to the outbound port
(as identified in step 2), bypassing the Linux stack altogether. All the
packets not belonging to either of the cached tuples are terminated to
the Linux stack. Also TCP control packets FIN/RST/SYN are terminated
irrespective of wether the connection is cached or not.
4) With assistance from the FC, the CM also runs aging on the cached
connections (hence the requirement of deleting the NF_CONNTRACK timer in
step 2)
5) Cached connections can be terminated (i.e, removed from cache) in two
ways:
i) Aging out by the CM: In this scenario, the CM removes the
connection tuple from FC as well as NF_CONNTRACK by calling the
corresponding timer destroy function directly.
ii) Destroy via TCP control packet: All the FIN-ACK, RST,
RST-ACK packets are send to conntrack irrespective of the fact that they
match a cached tuple. They are picked up by the TCP conntrack module
which restarts the accounting and refreshes the connection state. It is
at this point that the first chunk of this patch comes into picture.
6) When the NF_CONNTRACK module is removed, it iterates through the list
to destroy the detected connections. Currently, it does not remove those
connections whose timers have gone off (which is the case with
connections cached to FC). This is fixed by the second chunk of the
patch.
>
> > + if (newtime - ct->timeout.expires >= HZ) {
> > + /*
> > + * The timer could have already been deleted
> > + * while still alive (for example connection
> > + * offloaded to a forwarding module other than
> > + * the kernel stack).
> > + */
> > + mod_timer(&ct->timeout, newtime);
> > event = IPCT_REFRESH;
>
> This adds a race, we don't want to update the timer if it already
> went off this that means the connection is already destroyed.
> Same problem with the other chunk.
>
A timer call would have invalidated the conntrack by a call to
'death_by_timeout' (or similar such routine), thereby rendering this
check redundant. Theoretically, I think the check is irrelevant unless
a hypothetical timeout doesn't really invalidate the conntrack. Can you
describe the race scenario mentioned by you?
The other chunk for iterative cleaning of list during module removal
holds true on similar reasoning.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status
2008-02-26 7:39 ` Juneja Kapil
@ 2008-02-27 13:00 ` Patrick McHardy
2008-02-29 9:26 ` Juneja Kapil
0 siblings, 1 reply; 6+ messages in thread
From: Patrick McHardy @ 2008-02-27 13:00 UTC (permalink / raw)
To: Juneja Kapil; +Cc: netfilter-devel, Medve Emilian
Juneja Kapil wrote:
>> -----Original Message-----
>> From: Patrick McHardy [mailto:kaber@trash.net]
>> Sent: Monday, February 25, 2008 5:41 PM
>> To: Juneja Kapil
>> Cc: netfilter-devel@vger.kernel.org; Medve Emilian
>> Subject: Re: [PATCH] nf_conntrack_core: Updated nf_conntrack
>> to destroy/refresh conn irrespective of del_timer status
>>
>> Kapil Juneja wrote:
>>> Currently NF_CONNTRACK assumes that a running timer is
>> present before refreshing
>>> the connection or destroying it. This may not be the case
>> when, for example,
>>> another forwarding engine hooks up to it to listen to new
>> connections
>>> but disables the NF_CONNTRACK timer in order to have more control.
>>> In such a scenario, only control packets may be terminated
>> to NF_CONNTRACK for
>>> it to decode and update the connection status. It will not
>> impact the present
>>> scenario of kernel forwarding without the aid of any
>> forwarding engine.
>>
>> Do you have a pointer to the code you're talking about?
>
> The forwarding engine concept is same as the Grand Unified Flow Cache
> idea mooted by Rusty Russel some time back:
> http://lwn.net/Articles/194443/
>
> Our architecture runs on three components - Linux NF_CONNTRACK, a
> Control (Linux) Module(CM), and a Forwarding Engine or Flow Caching
> system (FC):
> 1) The CM registers itself to the NF_CONNTRACK notifier chain. Whenever
> an assured connection notifier event is received, it extracts all the
> relevant tuple parameters (Src IP, Dst IP, Protocol, Src Port, Dst Port
> etc.) and caches them to the FC. Due to reasons mentioned subsequently,
> it also disables the timer for the said conntrack object. The object,
> however, remains in the conntrack list as long as it is not destroyed.
> 3) The FC, sitting at the ethernet driver level, sends all the data
> packets belonging to the cached connection directly to the outbound port
> (as identified in step 2), bypassing the Linux stack altogether. All the
> packets not belonging to either of the cached tuples are terminated to
> the Linux stack. Also TCP control packets FIN/RST/SYN are terminated
> irrespective of wether the connection is cached or not.
> 4) With assistance from the FC, the CM also runs aging on the cached
> connections (hence the requirement of deleting the NF_CONNTRACK timer in
> step 2)
> 5) Cached connections can be terminated (i.e, removed from cache) in two
> ways:
> i) Aging out by the CM: In this scenario, the CM removes the
> connection tuple from FC as well as NF_CONNTRACK by calling the
> corresponding timer destroy function directly.
> ii) Destroy via TCP control packet: All the FIN-ACK, RST,
> RST-ACK packets are send to conntrack irrespective of the fact that they
> match a cached tuple. They are picked up by the TCP conntrack module
> which restarts the accounting and refreshes the connection state. It is
> at this point that the first chunk of this patch comes into picture.
> 6) When the NF_CONNTRACK module is removed, it iterates through the list
> to destroy the detected connections. Currently, it does not remove those
> connections whose timers have gone off (which is the case with
> connections cached to FC). This is fixed by the second chunk of the
> patch.
That sounds pretty reasonable. Is that code available somewhere?
>>> + if (newtime - ct->timeout.expires >= HZ) {
>>> + /*
>>> + * The timer could have already been deleted
>>> + * while still alive (for example connection
>>> + * offloaded to a forwarding module other than
>>> + * the kernel stack).
>>> + */
>>> + mod_timer(&ct->timeout, newtime);
>>> event = IPCT_REFRESH;
>> This adds a race, we don't want to update the timer if it already
>> went off this that means the connection is already destroyed.
>> Same problem with the other chunk.
>>
>
> A timer call would have invalidated the conntrack by a call to
> 'death_by_timeout' (or similar such routine), thereby rendering this
> check redundant. Theoretically, I think the check is irrelevant unless
> a hypothetical timeout doesn't really invalidate the conntrack. Can you
> describe the race scenario mentioned by you?
Very simple:
CPU0 CPU1
timer goes off
refresh_timer: mod_timer, rearm death_by_timeout()
timer goes off again
Using del_timer prevents us from rearming the timer if it
already went off.
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status
2008-02-27 13:00 ` Patrick McHardy
@ 2008-02-29 9:26 ` Juneja Kapil
2008-02-29 12:23 ` Patrick McHardy
0 siblings, 1 reply; 6+ messages in thread
From: Juneja Kapil @ 2008-02-29 9:26 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel, Medve Emilian
Hi Patrick,
> -----Original Message-----
> From: netfilter-devel-owner@vger.kernel.org
> [mailto:netfilter-devel-owner@vger.kernel.org] On Behalf Of
> Patrick McHardy
> Sent: Wednesday, February 27, 2008 6:31 PM
> To: Juneja Kapil
> Cc: netfilter-devel@vger.kernel.org; Medve Emilian
> Subject: Re: [PATCH] nf_conntrack_core: Updated nf_conntrack
> to destroy/refresh conn irrespective of del_timer status
>
> Juneja Kapil wrote:
> >> -----Original Message-----
> >> From: Patrick McHardy [mailto:kaber@trash.net]
> >> Sent: Monday, February 25, 2008 5:41 PM
> >> To: Juneja Kapil
> >> Cc: netfilter-devel@vger.kernel.org; Medve Emilian
> >> Subject: Re: [PATCH] nf_conntrack_core: Updated nf_conntrack to
> >> destroy/refresh conn irrespective of del_timer status
> >>
> >> Kapil Juneja wrote:
> >>> Currently NF_CONNTRACK assumes that a running timer is
> >> present before refreshing
> >>> the connection or destroying it. This may not be the case
> >> when, for example,
> >>> another forwarding engine hooks up to it to listen to new
> >> connections
> >>> but disables the NF_CONNTRACK timer in order to have more control.
> >>> In such a scenario, only control packets may be terminated
> >> to NF_CONNTRACK for
> >>> it to decode and update the connection status. It will not
> >> impact the present
> >>> scenario of kernel forwarding without the aid of any
> >> forwarding engine.
> >>
> >> Do you have a pointer to the code you're talking about?
> >
> > The forwarding engine concept is same as the Grand Unified
> Flow Cache
> > idea mooted by Rusty Russel some time back:
> > http://lwn.net/Articles/194443/
> >
> > Our architecture runs on three components - Linux NF_CONNTRACK, a
> > Control (Linux) Module(CM), and a Forwarding Engine or Flow Caching
> > system (FC):
> > 1) The CM registers itself to the NF_CONNTRACK notifier chain.
> > Whenever an assured connection notifier event is received,
> it extracts
> > all the relevant tuple parameters (Src IP, Dst IP,
> Protocol, Src Port,
> > Dst Port
> > etc.) and caches them to the FC. Due to reasons mentioned
> > subsequently, it also disables the timer for the said conntrack
> > object. The object, however, remains in the conntrack list
> as long as it is not destroyed.
> > 3) The FC, sitting at the ethernet driver level, sends all the data
> > packets belonging to the cached connection directly to the outbound
> > port (as identified in step 2), bypassing the Linux stack
> altogether.
> > All the packets not belonging to either of the cached tuples are
> > terminated to the Linux stack. Also TCP control packets FIN/RST/SYN
> > are terminated irrespective of wether the connection is
> cached or not.
> > 4) With assistance from the FC, the CM also runs aging on
> the cached
> > connections (hence the requirement of deleting the
> NF_CONNTRACK timer
> > in step 2)
> > 5) Cached connections can be terminated (i.e, removed from
> cache) in
> > two
> > ways:
> > i) Aging out by the CM: In this scenario, the CM removes the
> > connection tuple from FC as well as NF_CONNTRACK by calling the
> > corresponding timer destroy function directly.
> > ii) Destroy via TCP control packet: All the FIN-ACK,
> RST, RST-ACK
> > packets are send to conntrack irrespective of the fact that
> they match
> > a cached tuple. They are picked up by the TCP conntrack
> module which
> > restarts the accounting and refreshes the connection state.
> It is at
> > this point that the first chunk of this patch comes into picture.
> > 6) When the NF_CONNTRACK module is removed, it iterates through the
> > list to destroy the detected connections. Currently, it does not
> > remove those connections whose timers have gone off (which
> is the case
> > with connections cached to FC). This is fixed by the second
> chunk of
> > the patch.
>
>
> That sounds pretty reasonable. Is that code available somewhere?
We are working on licensing aspects and will be glad to share the code
as and when we get approval.
>
> >>> + if (newtime - ct->timeout.expires >= HZ) {
> >>> + /*
> >>> + * The timer could have already been deleted
> >>> + * while still alive (for example connection
> >>> + * offloaded to a forwarding module other than
> >>> + * the kernel stack).
> >>> + */
> >>> + mod_timer(&ct->timeout, newtime);
> >>> event = IPCT_REFRESH;
> >> This adds a race, we don't want to update the timer if it already
> >> went off this that means the connection is already destroyed.
> >> Same problem with the other chunk.
> >>
> >
> > A timer call would have invalidated the conntrack by a call to
> > 'death_by_timeout' (or similar such routine), thereby
> rendering this
> > check redundant. Theoretically, I think the check is irrelevant
> > unless a hypothetical timeout doesn't really invalidate the
> conntrack.
> > Can you describe the race scenario mentioned by you?
>
> Very simple:
>
> CPU0 CPU1
> timer goes off
> refresh_timer: mod_timer, rearm death_by_timeout()
>
> timer goes off again
>
> Using del_timer prevents us from rearming the timer if it
> already went off.
Thanks for the explanation. I was probably only thinking of the non-SMP
scenarios. However, I feel that if this cannot be done this simple way,
then we are in a bit trouble because our need is to make
nf_ct_refresh_acct work independent of the existing timer being dead or
alive.
While we think about a possible alternatives on respin of patch/control
module, can you provide some insight into any other alternatives.
> -
> To unsubscribe from this list: send the line "unsubscribe
> netfilter-devel" in the body of a message to
> majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status
2008-02-29 9:26 ` Juneja Kapil
@ 2008-02-29 12:23 ` Patrick McHardy
0 siblings, 0 replies; 6+ messages in thread
From: Patrick McHardy @ 2008-02-29 12:23 UTC (permalink / raw)
To: Juneja Kapil; +Cc: netfilter-devel, Medve Emilian
Juneja Kapil wrote:
> Hi Patrick,
>
>> That sounds pretty reasonable. Is that code available somewhere?
>
> We are working on licensing aspects and will be glad to share the code
> as and when we get approval.
>
OK thanks.
>>> Can you describe the race scenario mentioned by you?
>>
>> Very simple:
>>
>> CPU0 CPU1
>> timer goes off
>> refresh_timer: mod_timer, rearm death_by_timeout()
>>
>> timer goes off again
>>
>> Using del_timer prevents us from rearming the timer if it
>> already went off.
>
> Thanks for the explanation. I was probably only thinking of the non-SMP
> scenarios. However, I feel that if this cannot be done this simple way,
> then we are in a bit trouble because our need is to make
> nf_ct_refresh_acct work independent of the existing timer being dead or
> alive.
> While we think about a possible alternatives on respin of patch/control
> module, can you provide some insight into any other alternatives.
The IPS_FIXED_TIMEOUT_BIT should work I guess. It skips
timer updates, but still does accounting (at least in
the current kernel version).
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-02-29 12:23 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-25 5:19 [PATCH] nf_conntrack_core: Updated nf_conntrack to destroy/refresh conn irrespective of del_timer status Kapil Juneja
2008-02-25 12:11 ` Patrick McHardy
2008-02-26 7:39 ` Juneja Kapil
2008-02-27 13:00 ` Patrick McHardy
2008-02-29 9:26 ` Juneja Kapil
2008-02-29 12:23 ` Patrick McHardy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).