* what's the lockingrules for ip_conntrack_expect_list? @ 2002-10-10 21:40 Martin Josefsson 2002-10-11 13:06 ` Jozsef Kadlecsik 0 siblings, 1 reply; 19+ messages in thread From: Martin Josefsson @ 2002-10-10 21:40 UTC (permalink / raw) To: Netfilter-devel; +Cc: Harald Welte Hi, I've been going through the latest bugreport about failed ASSERT's... It's the usual suspects, exp_for_packet that calls ip_ct_find_proto instead of __find_proto, this isn't a bug, it's just that the debug-macros can't handle it, we've been over this before... But here's the real question... What's the lockingrules for ip_conntrack_expect_list? I've gone through 2.4.20-pre8aa2 which was the kernel the bugreport was from. And I've found some inconsistencies. here's a list of all functions that deals with ip_conntrack_expect_list and which locks they are holding when they do so, and the specific operation they perform on the list. ip_conntrack_expect_find_get READ_LOCK(&ip_conntrack_lock); READ_LOCK(&ip_conntrack_expect_tuple_lock); __ip_ct_expect_find MUST_BE_READ_LOCKED(&ip_conntrack_lock); MUST_BE_READ_LOCKED(&ip_conntrack_expect_tuple_lock); LIST_FIND death_by_timeout WRITE_LOCK(&ip_conntrack_lock); clean_from_lists MUST_BE_WRITE_LOCKED(&ip_conntrack_lock); remove_expectations list_inlist init_conntrack first: WRITE_LOCK(&ip_conntrack_lock); READ_LOCK(&ip_conntrack_expect_tuple_lock); LIST_FIND later: WRITE_LOCK(&ip_conntrack_lock); LIST_DELETE ip_conntrack_expect_related WRITE_LOCK(&ip_conntrack_lock); LIST_FIND LIST_FIND list_prepend ip_conntrack_change_expect MUST_BE_READ_LOCKED(&ip_conntrack_lock); LIST_FIND FAILS!! called from nat-helpers which only hold their own locks if any at all. So what's the story? sometimes we hold only ip_conntrack_lock and sometimes we hold that plus ip_conntrack_expect_tuple_lock and now I've found out that sometimes we never hold any lock. Give me a hint and I'll write up a patch to fix it. -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-10 21:40 what's the lockingrules for ip_conntrack_expect_list? Martin Josefsson @ 2002-10-11 13:06 ` Jozsef Kadlecsik 2002-10-11 13:56 ` Martin Josefsson 2002-10-12 12:17 ` Harald Welte 0 siblings, 2 replies; 19+ messages in thread From: Jozsef Kadlecsik @ 2002-10-11 13:06 UTC (permalink / raw) To: Martin Josefsson; +Cc: Netfilter-devel, Harald Welte Hi Martin, On 10 Oct 2002, Martin Josefsson wrote: > What's the lockingrules for ip_conntrack_expect_list? As far as I remember, the guiding rules were the following: ip_conntrack_lock is always held when we do something with the expectations. If it's write-locked, then there is no need for additional locking. If it's read-locked, then we use the additional ip_conntrack_expect_tuple_lock to protect the expectation lists. > I've gone through 2.4.20-pre8aa2 which was the kernel the bugreport was > from. And I've found some inconsistencies. > init_conntrack > first: > WRITE_LOCK(&ip_conntrack_lock); > READ_LOCK(&ip_conntrack_expect_tuple_lock); > LIST_FIND Read-locking ip_conntrack_expect_tuple_lock is unnecessary here then. > later: > WRITE_LOCK(&ip_conntrack_lock); > LIST_DELETE > > ip_conntrack_expect_related > WRITE_LOCK(&ip_conntrack_lock); > LIST_FIND > LIST_FIND > list_prepend > > ip_conntrack_change_expect > MUST_BE_READ_LOCKED(&ip_conntrack_lock); > LIST_FIND > > FAILS!! called from nat-helpers which only hold their own locks if any > at all. Well spotted! It's a locking bug then. Regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-11 13:06 ` Jozsef Kadlecsik @ 2002-10-11 13:56 ` Martin Josefsson 2002-10-11 14:07 ` Jozsef Kadlecsik 2002-10-12 12:17 ` Harald Welte 1 sibling, 1 reply; 19+ messages in thread From: Martin Josefsson @ 2002-10-11 13:56 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: Netfilter-devel, Harald Welte On Fri, 2002-10-11 at 15:06, Jozsef Kadlecsik wrote: > Hi Martin, Hi Jozsef, > As far as I remember, the guiding rules were the following: > ip_conntrack_lock is always held when we do something with the > expectations. If it's write-locked, then there is no need for additional > locking. If it's read-locked, then we use the additional > ip_conntrack_expect_tuple_lock to protect the expectation lists. Ahh that makes sense. But is it really neccessary to hold ip_conntrack_lock when we are only going to change an expectation and not touch anything else? If we enforce that we always hold a readlock or writelock on ip_conntrack_expect_tuple_lock when we touch it we should be safe? I think Rusty has changed the lockingrules in his conntrack-optimization patch to only require that we hold the ip_conntrack_expect_tuple_lock. I'll look at Rusty's patch and make two versions of this fix, one for current conntrack and one for his patch (if the bug exists there). > > ip_conntrack_change_expect > > MUST_BE_READ_LOCKED(&ip_conntrack_lock); > > LIST_FIND > > > > FAILS!! called from nat-helpers which only hold their own locks if any > > at all. > > Well spotted! It's a locking bug then. I just examined a bugreport :) Thanks for the explanation. -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-11 13:56 ` Martin Josefsson @ 2002-10-11 14:07 ` Jozsef Kadlecsik 2002-10-11 18:42 ` [PATCH] " Martin Josefsson 0 siblings, 1 reply; 19+ messages in thread From: Jozsef Kadlecsik @ 2002-10-11 14:07 UTC (permalink / raw) To: Martin Josefsson; +Cc: Netfilter-devel, Harald Welte On 11 Oct 2002, Martin Josefsson wrote: > But is it really neccessary to hold ip_conntrack_lock when we are only > going to change an expectation and not touch anything else? > If we enforce that we always hold a readlock or writelock on > ip_conntrack_expect_tuple_lock when we touch it we should be safe? We wanted to avoid possible cross-locking bugs by keeping a strict hierarchy of the locks. Sometimes too complicated functions are called while a lock is held. > I think Rusty has changed the lockingrules in his conntrack-optimization > patch to only require that we hold the ip_conntrack_expect_tuple_lock. The other way around: he removed ip_conntrack_expect_tuple_lock completely. I think that is the proper way, but it implies that later the fine-grained locking is introduced in order to remove the dependency of everything on the single ip_conntrack_lock. Regards, Jozsef - E-mail : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt Address : KFKI Research Institute for Particle and Nuclear Physics H-1525 Budapest 114, POB. 49, Hungary ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-11 14:07 ` Jozsef Kadlecsik @ 2002-10-11 18:42 ` Martin Josefsson 2002-10-12 12:25 ` Harald Welte 0 siblings, 1 reply; 19+ messages in thread From: Martin Josefsson @ 2002-10-11 18:42 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: Netfilter-devel, Harald Welte [-- Attachment #1: Type: text/plain, Size: 1725 bytes --] On Fri, 2002-10-11 at 16:07, Jozsef Kadlecsik wrote: > On 11 Oct 2002, Martin Josefsson wrote: > > > But is it really neccessary to hold ip_conntrack_lock when we are only > > going to change an expectation and not touch anything else? > > If we enforce that we always hold a readlock or writelock on > > ip_conntrack_expect_tuple_lock when we touch it we should be safe? > > We wanted to avoid possible cross-locking bugs by keeping a strict > hierarchy of the locks. Sometimes too complicated functions are called > while a lock is held. Ok. Patch for ip_conntrack_change_expect is attached. Harald, is this something you would accept and forward for inclusion? And while we're at it, I still think we should change exp_for_packet to use __find_proto instead of ip_ct_find_proto to avoid the ASSERT's that people are complaining about. Sure they are harmless but users don't know that and send reports. (this problem goes away with Rustys patches for 2.5, exp_for_packet is completely removed) > > I think Rusty has changed the lockingrules in his conntrack-optimization > > patch to only require that we hold the ip_conntrack_expect_tuple_lock. > > The other way around: he removed ip_conntrack_expect_tuple_lock > completely. I think that is the proper way, but it implies that later > the fine-grained locking is introduced in order to remove the dependency > of everything on the single ip_conntrack_lock. Yes I saw that when I read the patch. That change combined with the speedup patch will be nice. I found two small locking-bugs in those patches aswell, I'll send Rusty some patches in a short while. -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. [-- Attachment #2: ip_conntrack_change_expect-locking.diff --] [-- Type: text/plain, Size: 2161 bytes --] --- linux-2.4.20-pre10.orig/net/ipv4/netfilter/ip_conntrack_core.c 2002-10-11 15:48:28.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_core.c 2002-10-11 20:04:42.000000000 +0200 @@ -695,10 +695,8 @@ WRITE_LOCK(&ip_conntrack_lock); /* Need finding and deleting of expected ONLY if we win race */ - READ_LOCK(&ip_conntrack_expect_tuple_lock); expected = LIST_FIND(&ip_conntrack_expect_list, expect_cmp, struct ip_conntrack_expect *, tuple); - READ_UNLOCK(&ip_conntrack_expect_tuple_lock); /* Look up the conntrack helper for master connections only */ if (!expected) @@ -1060,7 +1058,7 @@ int ip_conntrack_change_expect(struct ip_conntrack_expect *expect, struct ip_conntrack_tuple *newtuple) { - MUST_BE_READ_LOCKED(&ip_conntrack_lock); + int ret; DEBUGP("change_expect:\n"); DEBUGP("exp tuple: "); DUMP_TUPLE(&expect->tuple); @@ -1069,30 +1067,32 @@ if (expect->ct_tuple.dst.protonum == 0) { /* Never seen before */ DEBUGP("change expect: never seen before\n"); + READ_LOCK(&ip_conntrack_lock); + WRITE_LOCK(&ip_conntrack_expect_tuple_lock); if (!ip_ct_tuple_equal(&expect->tuple, newtuple) && LIST_FIND(&ip_conntrack_expect_list, expect_clash, struct ip_conntrack_expect *, newtuple, &expect->mask)) { /* Force NAT to find an unused tuple */ - return -1; + ret = -1; } else { - WRITE_LOCK(&ip_conntrack_expect_tuple_lock); memcpy(&expect->ct_tuple, &expect->tuple, sizeof(expect->tuple)); memcpy(&expect->tuple, newtuple, sizeof(expect->tuple)); - WRITE_UNLOCK(&ip_conntrack_expect_tuple_lock); - return 0; + ret = 0; } + WRITE_UNLOCK(&ip_conntrack_expect_tuple_lock); + READ_UNLOCK(&ip_conntrack_lock); } else { /* Resent packet */ DEBUGP("change expect: resent packet\n"); if (ip_ct_tuple_equal(&expect->tuple, newtuple)) { - return 0; + ret = 0; } else { /* Force NAT to choose again the same port */ - return -1; + ret = -1; } } - return -1; + return ret; } /* Alter reply tuple (maybe alter helper). If it's already taken, ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-11 18:42 ` [PATCH] " Martin Josefsson @ 2002-10-12 12:25 ` Harald Welte 2002-10-12 13:11 ` Martin Josefsson 0 siblings, 1 reply; 19+ messages in thread From: Harald Welte @ 2002-10-12 12:25 UTC (permalink / raw) To: Martin Josefsson; +Cc: Jozsef Kadlecsik, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1337 bytes --] On Fri, Oct 11, 2002 at 08:42:51PM +0200, Martin Josefsson wrote: > Harald, is this something you would accept and forward for inclusion? See my other mail, I don't agree with everything said in the thread before. Sorry for my late comments, I should have sent that mail two days earlier. > And while we're at it, I still think we should change exp_for_packet to > use __find_proto instead of ip_ct_find_proto to avoid the ASSERT's that > people are complaining about. It's always ugly to export a non-locking __foo function as API to another module. > Sure they are harmless but users don't know that and send reports. (this > problem goes away with Rustys patches for 2.5, exp_for_packet is completely > removed) Yes I agree with the problem, but not with the style of the solution. *sigh*. __find_proto should be inlined anyway. So if we put it into a header file, which is included by conntrack and nat (ip_conntrack.h), we would avoid exporting it as a symbol. > -- > /Martin -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 12:25 ` Harald Welte @ 2002-10-12 13:11 ` Martin Josefsson 0 siblings, 0 replies; 19+ messages in thread From: Martin Josefsson @ 2002-10-12 13:11 UTC (permalink / raw) To: Harald Welte; +Cc: Jozsef Kadlecsik, Netfilter-devel On Sat, 2002-10-12 at 14:25, Harald Welte wrote: > See my other mail, I don't agree with everything said in the thread before. > Sorry for my late comments, I should have sent that mail two days earlier. No problem, I'm happy that you cleared things up. > It's always ugly to export a non-locking __foo function as API to another > module. Yes I know, but we need to do something about this. > > Sure they are harmless but users don't know that and send reports. (this > > problem goes away with Rustys patches for 2.5, exp_for_packet is completely > > removed) > > Yes I agree with the problem, but not with the style of the solution. > *sigh*. > > __find_proto should be inlined anyway. So if we put it into a header file, > which is included by conntrack and nat (ip_conntrack.h), we would avoid > exporting it as a symbol. I'll whip up a patch. -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-11 13:06 ` Jozsef Kadlecsik 2002-10-11 13:56 ` Martin Josefsson @ 2002-10-12 12:17 ` Harald Welte 2002-10-12 13:09 ` Martin Josefsson 1 sibling, 1 reply; 19+ messages in thread From: Harald Welte @ 2002-10-12 12:17 UTC (permalink / raw) To: Jozsef Kadlecsik; +Cc: Martin Josefsson, Netfilter-devel [-- Attachment #1.1: Type: text/plain, Size: 3657 bytes --] On Fri, Oct 11, 2002 at 03:06:59PM +0200, Jozsef Kadlecsik wrote: > Hi Martin, > > On 10 Oct 2002, Martin Josefsson wrote: > > > What's the lockingrules for ip_conntrack_expect_list? > > As far as I remember, the guiding rules were the following: > ip_conntrack_lock is always held when we do something with the > expectations. If it's write-locked, then there is no need for additional > locking. If it's read-locked, then we use the additional > ip_conntrack_expect_tuple_lock to protect the expectation lists. Unfortunately I (the inventor of this second lock) don't even remember the exact purpose anymore. Sometimes I feel really stupid. so let's go read the code and my personal notes again... I think the idea of the expect_tuple_lock() was to protect the tuple/mask inside a struct ip_conntrack_expect. This way we can implement ip_conntrack_change_expect() without the need to grab a writelock on ip_conntrack_lock. This is not jut an optimization, but absolutely necessarry for the NAT helpers: We grab a readlock on ip_conntrack_lock and traverse the list of siblings. For every sibling we call the nat helper function, which in turn wants to alter the expectation by calling ip_conntrack_change_expect(). Altering the expectation needs some kind of locking. If we only had one lock, we would need a writelock on ip_conntrack_lock. Since there is no primitive for atomically upgrading a lock from readlock to writelock, we would need to grab a writelock before traversing the sibling list. And this sucks, since nobody else can do anything inside connection tracking for a very long time. > > I've gone through 2.4.20-pre8aa2 which was the kernel the bugreport was > > from. And I've found some inconsistencies. > > > init_conntrack > > first: > > WRITE_LOCK(&ip_conntrack_lock); > > READ_LOCK(&ip_conntrack_expect_tuple_lock); > > LIST_FIND > > Read-locking ip_conntrack_expect_tuple_lock is unnecessary here then. Well, this is correct. But it's an optimiziation based on the assumption that there is only one place where we WRITE_LOCK(&expect_tuple_lock), which is in turn protected by READ_LOCK(&ip_conntrack_lock). While this is true now, I'm not sure that this is a good idea in general. > > later: > > WRITE_LOCK(&ip_conntrack_lock); > > LIST_DELETE This is OK, since list_delete doesn't read the tuple/mask inside this expect. > > ip_conntrack_expect_related > > WRITE_LOCK(&ip_conntrack_lock); > > LIST_FIND > > LIST_FIND > > list_prepend This is clearly a bug, but works according to the optimization described above. I think we should either grab the tuple_lock or at least make a verbose comment in the code describing why we don't grab it. > > ip_conntrack_change_expect > > MUST_BE_READ_LOCKED(&ip_conntrack_lock); > > LIST_FIND > > > > FAILS!! called from nat-helpers which only hold their own locks > > if any at all. No. The helper's help functions are called from do_bindings, which grab a readlock on ip_conntrack_lock before traversing the list of expectations matching this packet. See also my comment above. Please correct me if I'm overlooking something. > Regards, > Jozsef Just for your info, I'm describing a textfiles about my initial thoughs on newnat locking.. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #1.2: newnat-problems --] [-- Type: text/plain, Size: 4350 bytes --] Problems with locking and the current multirel/newnat code The general problem is that somebody, either the nat core or the nat helper have to traverse the sibling_list of ip_conntrack and possibly modify this list while traversing it. While this is not unsolvable, it is always ugly... maybe somebody has some ideas. Possible ways of implementation: a) traverse sibling_list and do not alter it while traversal - grab readlock before traversal - ip_ct_unexpect_related() just 'notes' the to-be-deleted expectations - release readlock - grab writelock - delete all 'noted' expectations b) let the helper do it by itself - call helper once for every packet (like current code) - helper grabs writelock before traversal - helper can alter his list exclusively while traversing (he has to know what he is doing, if for example deleting while traversing) - helper releases writelock after traversal Either way we go, there is some locking involved. 'traditionally' we would use the ip_conntrack_lock. However, as described below, this has some problems and using a seperate lock seems interesting. In any case, it looks unclean. Comments? 1. General locking problems if we would use a seperate sibling_lock: While iterating over the sibling list, the helper has to grab the sibling_lock. Then, while iterating, it calls expect_related/unexpect_related functions. Those functions need to grab the sibling_lock as well, as they have to add/ remove items from the list. Of course they can not grab the lock, as it is already grabbed. So what we could do, is provide non-locking variants of expect_related and unexpect_related, which are to be called in this case. However, expect_related and unexpect_related have to grab the ip_conntrack_lock in addition to the sibling_lock, because they have to possibly modify the global expect_list. destroy_expectations: Then there is destroy_expectations, which is called with an already locked ip_conntrack_lock, and wishes to grab sibling_lock as well, because it also has to delete expectations. -> possibility of deadlock. destroy_conntrack: Has to grab ip_conntrack_lock as well as sibling_lock, because it has to remove itself from the sibling_list. 2. Using only one lock If we only use one lock, the already existing ip_conntrack_lock, it would have to be grabbed for a long time. And there is no functionality for atomically changing a read lock into a write lock. So we would have to grab a write lock before we call the helper / traverse the list, and could release this only after the whole job is done. The problem with expect_related / unexpect_related still persists, because they still want to grab an already taken lock. Solution like above, provide non-locking-counterparts. destroy_expectations and/or destroy_conntrack shouldn't raise any additional problems in this setup.. Disadvantage: Most of the time, the nat helper will be called without having some actual work to do. Most of the time, the traffic will not raise any expectations, but still we have to grab the write lock ind advance, thus hurting SMP scalability :( === sat oct 12, 2002: Unfortunately I (the inventor of this second lock) don't even remember the exact purpose anymore. Sometimes I feel really stupid. so let's go read the code and my personal notes again... I think the idea of the expect_tuple_lock() was to protect the tuple/mask inside a struct ip_conntrack_expect. This way we can implement ip_conntrack_change_expect() without the need to grab a writelock on ip_conntrack_lock. This is not jut an optimization, but absolutely necessarry for the NAT helpers: We grab a readlock on ip_conntrack_lock and traverse the list of siblings. For every sibling we call the nat helper function, which in turn wants to alter the expectation by calling ip_conntrack_change_expect().Altering the expectation needs some kind of locking. If we only had one lock, we would need a writelock on ip_conntrack_lock. Since there is no primitive for atomically upgrading a lock from readlock to writelock, we would need to grab a writelock before traversing the sibling list. And this sucks, since nobody else can do anything inside connection tracking for a very long time. [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 12:17 ` Harald Welte @ 2002-10-12 13:09 ` Martin Josefsson 2002-10-12 13:20 ` Martin Josefsson ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Martin Josefsson @ 2002-10-12 13:09 UTC (permalink / raw) To: Harald Welte; +Cc: Jozsef Kadlecsik, Netfilter-devel On Sat, 2002-10-12 at 14:17, Harald Welte wrote: > > > init_conntrack > > > first: > > > WRITE_LOCK(&ip_conntrack_lock); > > > READ_LOCK(&ip_conntrack_expect_tuple_lock); > > > LIST_FIND > > > > Read-locking ip_conntrack_expect_tuple_lock is unnecessary here then. > > Well, this is correct. But it's an optimiziation based on the assumption > that there is only one place where we WRITE_LOCK(&expect_tuple_lock), > which is in turn protected by READ_LOCK(&ip_conntrack_lock). While this > is true now, I'm not sure that this is a good idea in general. Ok then that stays. > > > later: > > > WRITE_LOCK(&ip_conntrack_lock); > > > LIST_DELETE > > This is OK, since list_delete doesn't read the tuple/mask inside this expect. Ok. > > > ip_conntrack_expect_related > > > WRITE_LOCK(&ip_conntrack_lock); > > > LIST_FIND > > > LIST_FIND > > > list_prepend > > This is clearly a bug, but works according to the optimization described > above. I think we should either grab the tuple_lock or at least make > a verbose comment in the code describing why we don't grab it. I'll add a read-lock on the tuple_lock here just to mark that it should be taken. All modifications of the liststructure are done under a write-locked ip_conntrack_lock, the only things that happen under a read-locked ip_conntrack_lock are a find and a change, nothing that changes the structure. see below. > > > ip_conntrack_change_expect > > > MUST_BE_READ_LOCKED(&ip_conntrack_lock); > > > LIST_FIND > > > > > > FAILS!! called from nat-helpers which only hold their own locks > > > if any at all. > > No. The helper's help functions are called from do_bindings, which grab > a readlock on ip_conntrack_lock before traversing the list of expectations > matching this packet. See also my comment above. Ahh I missed this, the helper is called after exp_for_packet which has already screwed up the lock-debug. But we'll need to hold at least a read-lock on the tuple_lock during that LIST_FIND since we only have a read-lock on ip_conntrack_lock. and we modify tupe/mask inside this read-locked ip_conntrack_lock but within a write-lock'd tuple_lock. And since we'll probably need to take that write-lock anyway we might just as well take it from the beginning. > Please correct me if I'm overlooking something. I think you got everything covered. So the lockingrules comes down to this if I've understood things correctly: 1. All accesses to the expect-list are to be performed with ip_conntrack_lock held. 2. All list-structure changes are done under a write-locked ip_conntrack_lock and tuple_lock is not needed. 3. All other accesses are performed under at least a read-locked ip_conntrack_lock. 4. All read-accesses to tuple/mask data inside the list are performed under at least a read-locked tuple_lock. 5. All write-accesses to tuple/mask data inside the list are performed under a write-locked tuple_lock. Does that sound ok to you? -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 13:09 ` Martin Josefsson @ 2002-10-12 13:20 ` Martin Josefsson 2002-10-12 13:29 ` [PATCH 2] " Martin Josefsson 2002-10-12 14:34 ` what's the lockingrules for ip_conntrack_expect_list? Harald Welte 2 siblings, 0 replies; 19+ messages in thread From: Martin Josefsson @ 2002-10-12 13:20 UTC (permalink / raw) To: Harald Welte; +Cc: Jozsef Kadlecsik, Netfilter-devel On Sat, 2002-10-12 at 15:09, Martin Josefsson wrote: > > > > ip_conntrack_expect_related > > > > WRITE_LOCK(&ip_conntrack_lock); > > > > LIST_FIND > > > > LIST_FIND > > > > list_prepend > > > > This is clearly a bug, but works according to the optimization described > > above. I think we should either grab the tuple_lock or at least make > > a verbose comment in the code describing why we don't grab it. > > I'll add a read-lock on the tuple_lock here just to mark that it should > be taken. Oops, spoke too soon, there already exists such a comment: /* Because of the write lock, no reader can walk the lists, * so there is no need to use the tuple lock too */ I'll leave it at that. -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 13:09 ` Martin Josefsson 2002-10-12 13:20 ` Martin Josefsson @ 2002-10-12 13:29 ` Martin Josefsson 2002-10-12 14:36 ` Harald Welte 2002-10-12 14:34 ` what's the lockingrules for ip_conntrack_expect_list? Harald Welte 2 siblings, 1 reply; 19+ messages in thread From: Martin Josefsson @ 2002-10-12 13:29 UTC (permalink / raw) To: Harald Welte; +Cc: Jozsef Kadlecsik, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1217 bytes --] On Sat, 2002-10-12 at 15:09, Martin Josefsson wrote: > On Sat, 2002-10-12 at 14:17, Harald Welte wrote: > > > > ip_conntrack_change_expect > > > > MUST_BE_READ_LOCKED(&ip_conntrack_lock); > > > > LIST_FIND > > > > > > > > FAILS!! called from nat-helpers which only hold their own locks > > > > if any at all. > > > > No. The helper's help functions are called from do_bindings, which grab > > a readlock on ip_conntrack_lock before traversing the list of expectations > > matching this packet. See also my comment above. > > Ahh I missed this, the helper is called after exp_for_packet which has > already screwed up the lock-debug. > > But we'll need to hold at least a read-lock on the tuple_lock during > that LIST_FIND since we only have a read-lock on ip_conntrack_lock. > and we modify tupe/mask inside this read-locked ip_conntrack_lock but > within a write-lock'd tuple_lock. And since we'll probably need to take > that write-lock anyway we might just as well take it from the beginning. Attached patch changes the write-locked tuple_lock to cover the entire function. Is this ok with you? -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. [-- Attachment #2: ip_conntrack_change_expect-lockfix.diff --] [-- Type: text/plain, Size: 1494 bytes --] --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_core.c.mjufs 2002-10-12 15:23:22.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_core.c 2002-10-12 15:23:00.000000000 +0200 @@ -1060,7 +1060,10 @@ int ip_conntrack_change_expect(struct ip_conntrack_expect *expect, struct ip_conntrack_tuple *newtuple) { + int ret; + MUST_BE_READ_LOCKED(&ip_conntrack_lock); + WRITE_LOCK(&ip_conntrack_expect_tuple_lock); DEBUGP("change_expect:\n"); DEBUGP("exp tuple: "); DUMP_TUPLE(&expect->tuple); @@ -1073,26 +1076,25 @@ && LIST_FIND(&ip_conntrack_expect_list, expect_clash, struct ip_conntrack_expect *, newtuple, &expect->mask)) { /* Force NAT to find an unused tuple */ - return -1; + ret = -1; } else { - WRITE_LOCK(&ip_conntrack_expect_tuple_lock); memcpy(&expect->ct_tuple, &expect->tuple, sizeof(expect->tuple)); memcpy(&expect->tuple, newtuple, sizeof(expect->tuple)); - WRITE_UNLOCK(&ip_conntrack_expect_tuple_lock); - return 0; + ret = 0; } } else { /* Resent packet */ DEBUGP("change expect: resent packet\n"); if (ip_ct_tuple_equal(&expect->tuple, newtuple)) { - return 0; + ret = 0; } else { /* Force NAT to choose again the same port */ - return -1; + ret = -1; } } + WRITE_UNLOCK(&ip_conntrack_expect_tuple_lock); - return -1; + return ret; } /* Alter reply tuple (maybe alter helper). If it's already taken, ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 13:29 ` [PATCH 2] " Martin Josefsson @ 2002-10-12 14:36 ` Harald Welte 2002-10-12 14:59 ` Min Li 2002-10-12 15:32 ` Martin Josefsson 0 siblings, 2 replies; 19+ messages in thread From: Harald Welte @ 2002-10-12 14:36 UTC (permalink / raw) To: Martin Josefsson; +Cc: Jozsef Kadlecsik, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 616 bytes --] On Sat, Oct 12, 2002 at 03:29:12PM +0200, Martin Josefsson wrote: > Attached patch changes the write-locked tuple_lock to cover the entire > function. > Is this ok with you? looks fine. Now we only need the find_proto() change, and I'll submit the two combined to the kernel. > /Martin -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 14:36 ` Harald Welte @ 2002-10-12 14:59 ` Min Li 2002-10-12 15:32 ` Martin Josefsson 1 sibling, 0 replies; 19+ messages in thread From: Min Li @ 2002-10-12 14:59 UTC (permalink / raw) Cc: Netfilter-devel Dear All, I have to implement one senario which is to differentiate traffic coming through the router. For any new connections forwarded by the router, I have to change the source address of those packets. Also, the new source addresses are known at user space only. So, by simply using iptables NAT table, I won't be able to do that. In my application, I plan to inform the kernel about the new source addresses, and then create a address mapping table in the kernel. Then add a new module in iptables to take care of specially my scenario. So, in the kernel, I will do all the address translation based on the connection tracking results. However, that means I need to change iptables tool as well as adding a new module. I have never done linux kernel module programming before. Will it be possible for you to give me some advice? Thanks a lot. Min ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 14:36 ` Harald Welte 2002-10-12 14:59 ` Min Li @ 2002-10-12 15:32 ` Martin Josefsson 2002-10-14 11:33 ` Harald Welte 2002-10-23 15:16 ` how you use nfnetlink-ctnetlink-0.11.patch marian stagarescu 1 sibling, 2 replies; 19+ messages in thread From: Martin Josefsson @ 2002-10-12 15:32 UTC (permalink / raw) To: Harald Welte; +Cc: Jozsef Kadlecsik, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 972 bytes --] On Sat, 2002-10-12 at 16:36, Harald Welte wrote: > On Sat, Oct 12, 2002 at 03:29:12PM +0200, Martin Josefsson wrote: > > Attached patch changes the write-locked tuple_lock to cover the entire > > function. > > Is this ok with you? > > looks fine. Now we only need the find_proto() change, and I'll submit > the two combined to the kernel. The find_proto() change is a little trickier... it depends on ip_conntrack_protocol.h. I suggest that we move find_proto() to ip_conntrack_protocol.h. And then find_proto() also depends on listhelp.h... include-dependencies isn't fun. Making find_proto() an inline adds two exported symbols, protocol_list and ip_conntrack_generic_protocol. I've attached a patch that at least compiles, I havn't tried booting it yet. Does this look even remotely ok to you? If it does I'll try booting it and send you a combined patch. -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. [-- Attachment #2: find_proto-inline.diff --] [-- Type: text/plain, Size: 10383 bytes --] --- linux-2.4.20-pre10/include/linux/netfilter_ipv4/ip_conntrack_core.h.meep 2002-10-12 16:54:30.000000000 +0200 +++ linux-2.4.20-pre10/include/linux/netfilter_ipv4/ip_conntrack_core.h 2002-10-12 17:17:16.000000000 +0200 @@ -16,9 +16,6 @@ struct ip_conntrack_protocol; extern struct ip_conntrack_protocol *ip_ct_find_proto(u_int8_t protocol); -/* Like above, but you already have conntrack read lock. */ -extern struct ip_conntrack_protocol *__find_proto(u_int8_t protocol); -extern struct list_head protocol_list; /* Returns conntrack if it dealt with ICMP, and filled in skb->nfct */ extern struct ip_conntrack *icmp_error_track(struct sk_buff *skb, --- linux-2.4.20-pre10/include/linux/netfilter_ipv4/ip_conntrack_protocol.h.meep 2002-10-12 16:55:28.000000000 +0200 +++ linux-2.4.20-pre10/include/linux/netfilter_ipv4/ip_conntrack_protocol.h 2002-10-12 17:14:40.000000000 +0200 @@ -62,4 +62,28 @@ extern struct ip_conntrack_protocol ip_conntrack_protocol_udp; extern struct ip_conntrack_protocol ip_conntrack_protocol_icmp; extern int ip_conntrack_protocol_tcp_init(void); + +extern struct list_head protocol_list; +extern struct ip_conntrack_protocol ip_conntrack_generic_protocol; + +static inline int proto_cmpfn(const struct ip_conntrack_protocol *curr, + u_int8_t protocol) +{ + return protocol == curr->proto; +} + +static inline struct ip_conntrack_protocol *__find_proto(u_int8_t protocol) +{ + struct ip_conntrack_protocol *p; + + MUST_BE_READ_LOCKED(&ip_conntrack_lock); + p = LIST_FIND(&protocol_list, proto_cmpfn, + struct ip_conntrack_protocol *, protocol); + if (!p) + p = &ip_conntrack_generic_protocol; + + return p; +} + + #endif /*_IP_CONNTRACK_PROTOCOL_H*/ --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_core.c.meep 2002-10-12 15:29:47.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_core.c 2002-10-12 17:09:32.000000000 +0200 @@ -39,11 +39,11 @@ #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_conntrack_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_conntrack_lock) +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack.h> #include <linux/netfilter_ipv4/ip_conntrack_protocol.h> #include <linux/netfilter_ipv4/ip_conntrack_helper.h> #include <linux/netfilter_ipv4/ip_conntrack_core.h> -#include <linux/netfilter_ipv4/listhelp.h> #define IP_CONNTRACK_VERSION "2.1" @@ -66,27 +66,6 @@ struct list_head *ip_conntrack_hash; static kmem_cache_t *ip_conntrack_cachep; -extern struct ip_conntrack_protocol ip_conntrack_generic_protocol; - -static inline int proto_cmpfn(const struct ip_conntrack_protocol *curr, - u_int8_t protocol) -{ - return protocol == curr->proto; -} - -struct ip_conntrack_protocol *__find_proto(u_int8_t protocol) -{ - struct ip_conntrack_protocol *p; - - MUST_BE_READ_LOCKED(&ip_conntrack_lock); - p = LIST_FIND(&protocol_list, proto_cmpfn, - struct ip_conntrack_protocol *, protocol); - if (!p) - p = &ip_conntrack_generic_protocol; - - return p; -} - struct ip_conntrack_protocol *ip_ct_find_proto(u_int8_t protocol) { struct ip_conntrack_protocol *p; --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_core.c.meep 2002-10-12 15:37:02.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_core.c 2002-10-12 17:30:38.000000000 +0200 @@ -21,6 +21,7 @@ #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_nat_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_nat_lock) +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack.h> #include <linux/netfilter_ipv4/ip_conntrack_core.h> #include <linux/netfilter_ipv4/ip_conntrack_protocol.h> @@ -29,7 +30,6 @@ #include <linux/netfilter_ipv4/ip_nat_core.h> #include <linux/netfilter_ipv4/ip_nat_helper.h> #include <linux/netfilter_ipv4/ip_conntrack_helper.h> -#include <linux/netfilter_ipv4/listhelp.h> #if 0 #define DEBUGP printk @@ -739,7 +739,7 @@ int ret = 1; MUST_BE_READ_LOCKED(&ip_conntrack_lock); - proto = ip_ct_find_proto((*pskb)->nh.iph->protocol); + proto = __find_proto((*pskb)->nh.iph->protocol); if (proto->exp_matches_pkt) ret = proto->exp_matches_pkt(exp, pskb); --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_standalone.c.meep 2002-10-12 17:00:17.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_standalone.c 2002-10-12 17:25:42.000000000 +0200 @@ -21,11 +21,11 @@ #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_conntrack_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_conntrack_lock) +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack.h> #include <linux/netfilter_ipv4/ip_conntrack_protocol.h> #include <linux/netfilter_ipv4/ip_conntrack_core.h> #include <linux/netfilter_ipv4/ip_conntrack_helper.h> -#include <linux/netfilter_ipv4/listhelp.h> #if 0 #define DEBUGP printk @@ -361,6 +361,8 @@ EXPORT_SYMBOL(ip_ct_selective_cleanup); EXPORT_SYMBOL(ip_ct_refresh); EXPORT_SYMBOL(ip_ct_find_proto); +EXPORT_SYMBOL(protocol_list); +EXPORT_SYMBOL(ip_conntrack_generic_protocol); EXPORT_SYMBOL(ip_ct_find_helper); EXPORT_SYMBOL(ip_conntrack_expect_related); EXPORT_SYMBOL(ip_conntrack_change_expect); --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_generic.c.meep 2002-10-12 17:09:55.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_generic.c 2002-10-12 17:11:35.000000000 +0200 @@ -1,7 +1,12 @@ #include <linux/types.h> #include <linux/sched.h> #include <linux/timer.h> + +#define ASSERT_READ_LOCK(x) +#define ASSERT_WRITE_LOCK(x) + #include <linux/netfilter.h> +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack_protocol.h> #define GENERIC_TIMEOUT (600*HZ) --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_tcp.c.meep 2002-10-12 17:11:56.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 2002-10-12 17:12:24.000000000 +0200 @@ -11,6 +11,10 @@ #include <net/tcp.h> +#define ASSERT_READ_LOCK(x) +#define ASSERT_WRITE_LOCK(x) + +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack.h> #include <linux/netfilter_ipv4/ip_conntrack_protocol.h> #include <linux/netfilter_ipv4/lockhelp.h> --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_udp.c.meep 2002-10-12 17:12:34.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_udp.c 2002-10-12 17:12:55.000000000 +0200 @@ -4,6 +4,11 @@ #include <linux/netfilter.h> #include <linux/in.h> #include <linux/udp.h> + +#define ASSERT_READ_LOCK(x) +#define ASSERT_WRITE_LOCK(x) + +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack_protocol.h> #define UDP_TIMEOUT (30*HZ) --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_icmp.c.meep 2002-10-12 17:13:10.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_proto_icmp.c 2002-10-12 17:13:22.000000000 +0200 @@ -4,6 +4,11 @@ #include <linux/netfilter.h> #include <linux/in.h> #include <linux/icmp.h> + +#define ASSERT_READ_LOCK(x) +#define ASSERT_WRITE_LOCK(x) + +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack_protocol.h> #define ICMP_TIMEOUT (30*HZ) --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_helper.c.meep 2002-10-12 17:15:10.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_helper.c 2002-10-12 17:15:19.000000000 +0200 @@ -26,13 +26,13 @@ #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_nat_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_nat_lock) +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack.h> #include <linux/netfilter_ipv4/ip_conntrack_helper.h> #include <linux/netfilter_ipv4/ip_nat.h> #include <linux/netfilter_ipv4/ip_nat_protocol.h> #include <linux/netfilter_ipv4/ip_nat_core.h> #include <linux/netfilter_ipv4/ip_nat_helper.h> -#include <linux/netfilter_ipv4/listhelp.h> #if 0 #define DEBUGP printk --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_rule.c.meep 2002-10-12 17:15:28.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_rule.c 2002-10-12 17:15:45.000000000 +0200 @@ -15,11 +15,11 @@ #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_nat_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_nat_lock) +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_tables.h> #include <linux/netfilter_ipv4/ip_nat.h> #include <linux/netfilter_ipv4/ip_nat_core.h> #include <linux/netfilter_ipv4/ip_nat_rule.h> -#include <linux/netfilter_ipv4/listhelp.h> #if 0 #define DEBUGP printk --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_standalone.c.meep 2002-10-12 17:15:53.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_nat_standalone.c 2002-10-12 17:16:01.000000000 +0200 @@ -28,6 +28,7 @@ #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_nat_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_nat_lock) +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_nat.h> #include <linux/netfilter_ipv4/ip_nat_rule.h> #include <linux/netfilter_ipv4/ip_nat_protocol.h> @@ -35,7 +36,6 @@ #include <linux/netfilter_ipv4/ip_nat_helper.h> #include <linux/netfilter_ipv4/ip_tables.h> #include <linux/netfilter_ipv4/ip_conntrack_core.h> -#include <linux/netfilter_ipv4/listhelp.h> #if 0 #define DEBUGP printk --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_fw_compat_masq.c.meep 2002-10-12 17:20:08.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_fw_compat_masq.c 2002-10-12 17:20:35.000000000 +0200 @@ -20,11 +20,11 @@ #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip_conntrack_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip_conntrack_lock) +#include <linux/netfilter_ipv4/listhelp.h> #include <linux/netfilter_ipv4/ip_conntrack.h> #include <linux/netfilter_ipv4/ip_conntrack_core.h> #include <linux/netfilter_ipv4/ip_nat.h> #include <linux/netfilter_ipv4/ip_nat_core.h> -#include <linux/netfilter_ipv4/listhelp.h> #if 0 #define DEBUGP printk ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 15:32 ` Martin Josefsson @ 2002-10-14 11:33 ` Harald Welte 2002-10-14 13:26 ` Martin Josefsson 2002-10-23 15:16 ` how you use nfnetlink-ctnetlink-0.11.patch marian stagarescu 1 sibling, 1 reply; 19+ messages in thread From: Harald Welte @ 2002-10-14 11:33 UTC (permalink / raw) To: Martin Josefsson; +Cc: Jozsef Kadlecsik, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 1244 bytes --] On Sat, Oct 12, 2002 at 05:32:55PM +0200, Martin Josefsson wrote: > On Sat, 2002-10-12 at 16:36, Harald Welte wrote: > > On Sat, Oct 12, 2002 at 03:29:12PM +0200, Martin Josefsson wrote: > > > Attached patch changes the write-locked tuple_lock to cover the entire > > > function. > > > Is this ok with you? > > > > looks fine. Now we only need the find_proto() change, and I'll submit > > the two combined to the kernel. > > The find_proto() change is a little trickier... it depends on > ip_conntrack_protocol.h. I suggest that we move find_proto() to > ip_conntrack_protocol.h. And then find_proto() also depends on > listhelp.h... include-dependencies isn't fun. > > Making find_proto() an inline adds two exported symbols, protocol_list > and ip_conntrack_generic_protocol. I didn't think about this, sorry. So I think we now go for the previous solution... export an __ function :( > /Martin -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2] Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-14 11:33 ` Harald Welte @ 2002-10-14 13:26 ` Martin Josefsson 0 siblings, 0 replies; 19+ messages in thread From: Martin Josefsson @ 2002-10-14 13:26 UTC (permalink / raw) To: Harald Welte; +Cc: Jozsef Kadlecsik, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 674 bytes --] On Mon, 2002-10-14 at 13:33, Harald Welte wrote: > > Making find_proto() an inline adds two exported symbols, protocol_list > > and ip_conntrack_generic_protocol. > > I didn't think about this, sorry. So I think we now go for the previous > solution... export an __ function :( You can't think of everything :) I saw that you submitted the __ patch, I havn't had time to test the ip_conntrack_change_expect patch but it looks correct to me, and it compiles. If you feel dangerous you can submit it. I've attached it again. (it applies cleanly to 2.4.20-pre10) -- /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience. [-- Attachment #2: ip_conntrack_change_expect-lockfix.diff --] [-- Type: text/plain, Size: 1494 bytes --] --- linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_core.c.mjufs 2002-10-12 15:23:22.000000000 +0200 +++ linux-2.4.20-pre10/net/ipv4/netfilter/ip_conntrack_core.c 2002-10-12 15:23:00.000000000 +0200 @@ -1060,7 +1060,10 @@ int ip_conntrack_change_expect(struct ip_conntrack_expect *expect, struct ip_conntrack_tuple *newtuple) { + int ret; + MUST_BE_READ_LOCKED(&ip_conntrack_lock); + WRITE_LOCK(&ip_conntrack_expect_tuple_lock); DEBUGP("change_expect:\n"); DEBUGP("exp tuple: "); DUMP_TUPLE(&expect->tuple); @@ -1073,26 +1076,25 @@ && LIST_FIND(&ip_conntrack_expect_list, expect_clash, struct ip_conntrack_expect *, newtuple, &expect->mask)) { /* Force NAT to find an unused tuple */ - return -1; + ret = -1; } else { - WRITE_LOCK(&ip_conntrack_expect_tuple_lock); memcpy(&expect->ct_tuple, &expect->tuple, sizeof(expect->tuple)); memcpy(&expect->tuple, newtuple, sizeof(expect->tuple)); - WRITE_UNLOCK(&ip_conntrack_expect_tuple_lock); - return 0; + ret = 0; } } else { /* Resent packet */ DEBUGP("change expect: resent packet\n"); if (ip_ct_tuple_equal(&expect->tuple, newtuple)) { - return 0; + ret = 0; } else { /* Force NAT to choose again the same port */ - return -1; + ret = -1; } } + WRITE_UNLOCK(&ip_conntrack_expect_tuple_lock); - return -1; + return ret; } /* Alter reply tuple (maybe alter helper). If it's already taken, ^ permalink raw reply [flat|nested] 19+ messages in thread
* how you use nfnetlink-ctnetlink-0.11.patch 2002-10-12 15:32 ` Martin Josefsson 2002-10-14 11:33 ` Harald Welte @ 2002-10-23 15:16 ` marian stagarescu 2002-10-23 15:52 ` Harald Welte 1 sibling, 1 reply; 19+ messages in thread From: marian stagarescu @ 2002-10-23 15:16 UTC (permalink / raw) To: laforge, jschlst; +Cc: Netfilter-devel can someone give me some input on this patch: stability and how to use it ? sometime ago i was looking for a way to flush individual conntrack entries. i think this patch may let me access conntrack entries and, for example, set their TTL to 0 so that it can be flushed. but i'm not user how to use it: does it have a corresponding userland iptables part ? thanks, marian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: how you use nfnetlink-ctnetlink-0.11.patch 2002-10-23 15:16 ` how you use nfnetlink-ctnetlink-0.11.patch marian stagarescu @ 2002-10-23 15:52 ` Harald Welte 0 siblings, 0 replies; 19+ messages in thread From: Harald Welte @ 2002-10-23 15:52 UTC (permalink / raw) To: marian stagarescu; +Cc: jschlst, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 994 bytes --] On Wed, Oct 23, 2002 at 11:16:36AM -0400, marian stagarescu wrote: > can someone give me some input on this patch: > stability and how to use it ? > sometime ago i was looking for a way to flush individual conntrack > entries. i think this patch may let me access conntrack entries and, for > example, set their TTL to 0 so that it can be flushed. but i'm not user > how to use it: does it have a corresponding userland iptables part ? It has a userspace library, called libctnetlink (see our cvs tree or http://cvs.netfilter.org/cgi-bin/cvsweb/netfilter/iptables2/libctnetlink/ This library has an example program, called 'ctnltest.c' > thanks, > marian -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: what's the lockingrules for ip_conntrack_expect_list? 2002-10-12 13:09 ` Martin Josefsson 2002-10-12 13:20 ` Martin Josefsson 2002-10-12 13:29 ` [PATCH 2] " Martin Josefsson @ 2002-10-12 14:34 ` Harald Welte 2 siblings, 0 replies; 19+ messages in thread From: Harald Welte @ 2002-10-12 14:34 UTC (permalink / raw) To: Martin Josefsson; +Cc: Jozsef Kadlecsik, Netfilter-devel [-- Attachment #1: Type: text/plain, Size: 2201 bytes --] On Sat, Oct 12, 2002 at 03:09:10PM +0200, Martin Josefsson wrote: > > > > ip_conntrack_change_expect > > > > MUST_BE_READ_LOCKED(&ip_conntrack_lock); > > > > LIST_FIND > > > > > > > > FAILS!! called from nat-helpers which only hold their own locks > > > > if any at all. > > > > No. The helper's help functions are called from do_bindings, which grab > > a readlock on ip_conntrack_lock before traversing the list of expectations > > matching this packet. See also my comment above. > > Ahh I missed this, the helper is called after exp_for_packet which has > already screwed up the lock-debug. > > But we'll need to hold at least a read-lock on the tuple_lock during > that LIST_FIND since we only have a read-lock on ip_conntrack_lock. > and we modify tupe/mask inside this read-locked ip_conntrack_lock but > within a write-lock'd tuple_lock. And since we'll probably need to take > that write-lock anyway we might just as well take it from the beginning. this is correct, we should grab a write-tuple-lock around the whole block, just as your patch did. Could you prepare (and test) a patch which does only this (and the ip_ct_find_proto() change?). > So the lockingrules comes down to this if I've understood things > correctly: > > 1. All accesses to the expect-list are to be performed with > ip_conntrack_lock held. > > 2. All list-structure changes are done under a write-locked > ip_conntrack_lock and tuple_lock is not needed. > > 3. All other accesses are performed under at least a read-locked > ip_conntrack_lock. > > 4. All read-accesses to tuple/mask data inside the list are performed > under at least a read-locked tuple_lock. > > 5. All write-accesses to tuple/mask data inside the list are performed > under a write-locked tuple_lock. > > Does that sound ok to you? sounds correct. > /Martin -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org/ ============================================================================ "If this were a dictatorship, it'd be a heck of a lot easier, just so long as I'm the dictator." -- George W. Bush Dec 18, 2000 [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2002-10-23 15:52 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-10-10 21:40 what's the lockingrules for ip_conntrack_expect_list? Martin Josefsson 2002-10-11 13:06 ` Jozsef Kadlecsik 2002-10-11 13:56 ` Martin Josefsson 2002-10-11 14:07 ` Jozsef Kadlecsik 2002-10-11 18:42 ` [PATCH] " Martin Josefsson 2002-10-12 12:25 ` Harald Welte 2002-10-12 13:11 ` Martin Josefsson 2002-10-12 12:17 ` Harald Welte 2002-10-12 13:09 ` Martin Josefsson 2002-10-12 13:20 ` Martin Josefsson 2002-10-12 13:29 ` [PATCH 2] " Martin Josefsson 2002-10-12 14:36 ` Harald Welte 2002-10-12 14:59 ` Min Li 2002-10-12 15:32 ` Martin Josefsson 2002-10-14 11:33 ` Harald Welte 2002-10-14 13:26 ` Martin Josefsson 2002-10-23 15:16 ` how you use nfnetlink-ctnetlink-0.11.patch marian stagarescu 2002-10-23 15:52 ` Harald Welte 2002-10-12 14:34 ` what's the lockingrules for ip_conntrack_expect_list? Harald Welte
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.