* Re: IPSEC: on behavior of acquire
       [not found] <1112405303.1096.37.camel@jzny.localdomain>
@ 2005-04-02  7:10 ` Aidas Kasparas
  2005-04-02 12:25   ` [Ipsec-tools-devel] " Zilvinas Valinskas
  2005-04-02 21:28   ` jamal
  0 siblings, 2 replies; 16+ messages in thread
From: Aidas Kasparas @ 2005-04-02  7:10 UTC (permalink / raw)
  To: hadi; +Cc: ipsec-tools-devel, netdev, nakam

jamal wrote:
> test1)on one window run setkey -x:
>
> ping -c 1 someDST
>
> -1) packet arrives towards outbound
> 0) Larval state created
> 1) one acquire sent.
> 2) timeout.
> 3) packet dropped. -ESRCH returned.
> 4) larval state deleted
>
> So question 1): Shouldnt the return code be -ERESTART to ask
> the app to retry?
> question 2) Why is there a hardcoding of 1 try only?

Re 1 try only. There is little sense in doing more tries. If there is no
daemon listening to pfkey messages, then no connection will be made no
matter how many retries you do. If the daemon/link/peer is slow and the SA
was not established before the timeout expired, then a repeated acquire will
simply be ignored (the daemon will find that a negotiation is already in
progress, so there is no reason to start another one, and it will drop that
acquire request). The only situation where repeated acquires may help is
when pfkey messages are lost. But pfkey was not designed to survive message
losses, so you should not operate your boxes in a mode where lost pfkey
messages are the rule, not the exception. On the other hand, occasional
pfkey message losses can be worked around by an application/user retry.

Re error code returned. Error codes returned by pfkey were never perfect.
But your experiment is not perfect either. You sent pings with no KE daemon
running. The pfkey code found that nothing was receiving acquire
messages => there is no chance that any process will set up the required
SAs, and it tried to report that (I agree, the return code is not very
informative, at least until you learn the reasons why it is such).
If you had racoon (or another pfkey-based ISAKMP daemon) running, you
would get "resource temporarily unavailable" (I don't know which error code
corresponds to that message), which IMHO is OK (if it is not, please
explain).

Re netlink behaviour: I cannot comment, as I don't use it for ipsec
purposes, but I would like to read a similar explanation. The reason is that
the idea that ipsec-tools could one day support operation via netlink is not
ruled out. Yet, afaik, nobody is working on it at the moment.

-- 
Aidas Kasparas
IT administrator
GM Consult Group, UAB

^ permalink raw reply	[flat|nested] 16+ messages in thread
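On the open question of which error code corresponds to "resource temporarily unavailable", here is a minimal sketch for looking an errno up by its message text. It assumes a machine with Python installed; the helper name `errno_names_for` is made up for illustration. On Linux this message normally belongs to EAGAIN, which agrees with the -EAGAIN that is traced through xfrm_lookup() later in this thread.

```python
import errno
import os


def errno_names_for(message):
    """Return the errno names whose strerror() text equals `message`.

    The comparison ignores case and surrounding whitespace. The result
    depends on the C library of the machine this runs on.
    """
    wanted = message.strip().lower()
    return sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code).lower() == wanted
    )


if __name__ == "__main__":
    # Print whatever the local C library maps this message to.
    print(errno_names_for("resource temporarily unavailable"))
```

Running it on the box where the ping failed avoids guessing between EAGAIN, EBUSY and similar codes from memory.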
* Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire 2005-04-02 7:10 ` IPSEC: on behavior of acquire Aidas Kasparas @ 2005-04-02 12:25 ` Zilvinas Valinskas 2005-04-02 21:28 ` jamal 1 sibling, 0 replies; 16+ messages in thread From: Zilvinas Valinskas @ 2005-04-02 12:25 UTC (permalink / raw) To: Aidas Kasparas; +Cc: hadi, ipsec-tools-devel, netdev, nakam On Sat, Apr 02, 2005 at 10:10:05AM +0300, Aidas Kasparas wrote: > > > jamal wrote: > >test1)on one window run setkey -x: > > > >ping -c 1 someDST > > > >-1) packet arrives towards outbound > >0) Larval state created > >1) one acquire sent. > >2) timeout. > >3) packet dropped. -ESRCH returned. > >4) larval state deleted > > > >So question 1): Shouldnt the return code be -ERESTART to ask > >the app to retry? > >question 2) Why is there a hardcoding of 1 try only? > > Re 1 try only. There is little sense to do more tries. If there is no > deamon listening to pfkey messages, then no connection will be made no > matter how many retries you'll do. If deamon/link/peer is slow and SA > was not established before timeout expired, then repeated acquire will > be simply ignored (deamon will find out that negotiation is already in > progress, there is no reason to start another negotiation and therefore > will drop that acquire request). And the only situation where repeated > acquires may help is when pfkey messages are lost. But pfkey was not > designed to survive message loses, therefore you should not operate your > boxes in mode when lost pfkey messages are a rule, not an exception. And > on the other hand, occasional pfkey message loses can be worked around > by applications/user retry. > > Re error code returned. Error codes returned by pfkey never were > perfect. But your experiment is not perfect too. You sent pings with no > KE deamon running. 
pfkey code found that there is nothing receiving
> acquire messages => there is no chance that any process will setup
> required SAs and tried to inform about that (I agree, return code is not
> very informative, at least until you learn about reasons why it is
> such). If you would have racoon (or other pfkey based ISAKMP daemon)
> running, you would get "resource temporarily unavailable" (don't know
> which error code corresponds to that message), which IMHO is ok (if it
> is not, please explain).

EBUSY I think it is.

I am not entirely sure it is OK to return such an error; some applications
do not cope nicely with it. Perhaps ECONNREFUSED is more reasonable, as it
doesn't break old apps' assumption (a connection cannot be established, no
matter whether that is due to routing, the IPsec SPD, or anything else).

Although it is quite simple to fix applications to handle EBUSY and
retry ... I thought it was annoying that applications quit because of EBUSY
when I first tried IPsec. Now I think it is quite handy, especially from
scripts: I can be sure that if something goes wrong, ping (or another
application) won't block ...

>
> Re netlink behaviour I can not comment as I don't use it for ipsec
> purposes, but would like to read similar explanation. Reason for that -
> idea that ipsec-tools one day could support operation via netlink is not
> ruled out of our minds. Yet, afaik nobody is working on it at the moment.
>
>
> --
> Aidas Kasparas
> IT administrator
> GM Consult Group, UAB
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: IPSEC: on behavior of acquire 2005-04-02 7:10 ` IPSEC: on behavior of acquire Aidas Kasparas 2005-04-02 12:25 ` [Ipsec-tools-devel] " Zilvinas Valinskas @ 2005-04-02 21:28 ` jamal 2005-04-03 8:28 ` Aidas Kasparas 1 sibling, 1 reply; 16+ messages in thread From: jamal @ 2005-04-02 21:28 UTC (permalink / raw) To: Aidas Kasparas; +Cc: ipsec-tools-devel, netdev, nakam On Sat, 2005-04-02 at 02:10, Aidas Kasparas wrote: > > Re 1 try only. There is little sense to do more tries. If there is no > deamon listening to pfkey messages, then no connection will be made no > matter how many retries you'll do. If deamon/link/peer is slow and SA > was not established before timeout expired, then repeated acquire will > be simply ignored (deamon will find out that negotiation is already in > progress, there is no reason to start another negotiation and therefore > will drop that acquire request). And the only situation where repeated > acquires may help is when pfkey messages are lost. Exactly what i was trying to emulate - lost messages. I would expect it to be the rule to loose messages - but given theres no guarantee of delivery, messages could be lost. > But pfkey was not > designed to survive message loses, therefore you should not operate your > boxes in mode when lost pfkey messages are a rule, not an exception. And > on the other hand, occasional pfkey message loses can be worked around > by applications/user retry. > I think its more than just pfkey (or netlink) - rather the ipsec framework itself. One could look at the acquire as part of the "connection" setup (for lack of better description). Without the acquire succeeding, theres no connection..(assuming that to be a policy). Therefore if acquire is not supposed to be delivered with some certainty (read: retries) then theres some resiliciency issues IMO. Note: Sometimes theres no app. Example a packet coming into a gateway. > Re error code returned. Error codes returned by pfkey never were > perfect. 
But your experiment is not perfect too. You sent pings with no > KE deamon running. Note what my goals were. > pfkey code found that there is nothing receiving > acquire messages => there is no chance that any process will setup > required SAs and tried to inform about that (I agree, return code is not > very informative, at least until you learn about reasons why it is > such). If you would have racoon (or other pfkey based ISAKMP daemon) > running, you would get "resource temporarily unavailable" (don't know > which error code corresponds to that message), which IMHO is ok (if it > is not, please explain). > Havent tried that - the reason i said restart was the right signal was mainly that an app could translate that to mean "try again". In other words even in the case of ping -c1 the ping app could have reattempted. On Sat, 2005-04-02 at 07:25, Zilvinas Valinskas wrote: > EBUSY I think it is. > > I am not entirely sure it is ok to return such error, some applications are > not coping nicely with it. Perhaps ECONNREFUSED is more reasonable - as it > doesn't brake old apps assumption (connection cannot be established, > doesn't matter if that is due to routing or IPsec SPD or anything else). > What about ERESTART the way netlink does it right now? ECONNREFUSED is probably not a bad idea. ping was clearly dumb and didnt do anything with the info. Overall, I think the errors are unfortunately not descriptive at all. cheers, jamal ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: IPSEC: on behavior of acquire
  2005-04-02 21:28   ` jamal
@ 2005-04-03  8:28     ` Aidas Kasparas
  2005-04-03 14:29       ` jamal
  0 siblings, 1 reply; 16+ messages in thread
From: Aidas Kasparas @ 2005-04-03  8:28 UTC (permalink / raw)
  To: hadi; +Cc: ipsec-tools-devel, netdev, nakam

jamal wrote:
> On Sat, 2005-04-02 at 02:10, Aidas Kasparas wrote:
>
>
>>Re 1 try only. There is little sense to do more tries. If there is no
>>deamon listening to pfkey messages, then no connection will be made no
>>matter how many retries you'll do. If deamon/link/peer is slow and SA
>>was not established before timeout expired, then repeated acquire will
>>be simply ignored (deamon will find out that negotiation is already in
>>progress, there is no reason to start another negotiation and therefore
>>will drop that acquire request). And the only situation where repeated
>>acquires may help is when pfkey messages are lost.
>
>
> Exactly what i was trying to emulate - lost messages.

Your emulation was not correct. It would have been more correct to start
the KE daemon, let it fully initialize (open a pfkey socket, inform the
kernel that it is interested in acquire messages), then stop it (via a
debugger or kill -STOP), and only then send pings or other traffic and see
what happens. This is because there are different paths in xfrm+pfkey for
the cases 1) when there is no KE daemon and 2) when there is a daemon but
for some reason it does not establish an SA, and therefore the reaction to
traffic is different.

In the first case it's
xfrm_lookup()
 ->xfrm_tmpl_resolve()
  ->xfrm_state_find()
   ->xfrm_state.c:km_query()
    ->pfkey_send_acquire()
     ->pfkey_broadcast()
      ->return -ESRCH.

This error code goes unchanged back to xfrm_state_find(), where it is
remapped into itself (the other possible values are -EAGAIN and -ENOMEM).
Then this error code goes back to the application.

In the second case it's
xfrm_lookup()
 ->xfrm_tmpl_resolve()
  ->xfrm_state_find()
   ->xfrm_state.c:km_query()
    ->pfkey_send_acquire()
     ->pfkey_broadcast()
      ->pfkey_broadcast_one()
       -> return 0

which is also passed unchanged back to xfrm_state_find(), where the SA is
put into state XFRM_STATE_ACQ. xfrm_tmpl_resolve() returns -EAGAIN.
xfrm_lookup() then sets up a timeout and, if the state has not changed
after that timeout, returns -EAGAIN to the application.

On the other hand, the analysis above shows that the return code is chosen
by the xfrm framework; therefore, if the error code has to be changed, it
should be changed in xfrm, not in the pfkey or netlink code.

> I would expect it
> to be the rule to loose messages - but given theres no guarantee of
> delivery, messages could be lost.
>
>
>>But pfkey was not
>>designed to survive message loses, therefore you should not operate your
>>boxes in mode when lost pfkey messages are a rule, not an exception. And
>>on the other hand, occasional pfkey message loses can be worked around
>>by applications/user retry.
>>
>
>
> I think its more than just pfkey (or netlink) - rather the ipsec
> framework itself.
>
> One could look at the acquire as part of the "connection" setup
> (for lack of better description). Without the acquire succeeding, theres
> no connection..(assuming that to be a policy).
> Therefore if acquire is not supposed to be delivered with some certainty
> (read: retries) then theres some resiliciency issues IMO.

OK. To avoid speaking about apples and oranges, let's first find out where
you see the problem. In the ipsec framework there are the following players
(I'm speaking about the pfkey case; netlink may be a little different):

xfrm <-> pfkey <-> KE daemon <-> remote peer

xfrm-pfkey communication is based on function calls. For them to fail,
something really weird has to happen with your kernel.

KE daemon - remote peer communications are done on UDP/500 and UDP/4500
according to internet standards.
Packet retransmissions are implemented the way the standards require,
therefore it is not a fatal condition if some packet is lost on the way.
And there is no 1:1 correspondence between packets sent over the internet
and those sent over the pfkey socket. These communications are performed
relatively independently. There is no need to receive an extra acquire
pfkey message in order to retransmit the packet which initiates SA setup
with the remote peer.

pfkey - KE daemon communication is performed over a message socket. All the
communication is performed within a single box. Moreover, only the kernel
and a userspace process are involved. Therefore I see only the following
cases in which a message can fail to be delivered:
1) the message is too big to fit into the socket's buffer;
2) the kernel decides to drop that socket buffer and reuse the memory for
something else;
3) the KE daemon does not get [enough] CPU time to handle messages;
4) a bug in the KE daemon prevents it from reading messages.
If you know of another case, please let me know.

(1) does happen when there is a big SPD/SAD and setkey/racoon requests a
dump of it all. It is a known pfkey architectural limitation. Acquire
messages are small, therefore this can happen only when such a call is made
right after the response to a big DUMP was generated. In racoon's case an
SPD dump is performed only on daemon startup (and even then it is possible
that it is not strictly necessary). An extra acquire message may make sense
only if it is sent after some timeout. But again, a KE daemon start is more
the exception than the rule, and applications can be started only after
some delay after the KE daemon has started.

I'm not sure how realistic (2) is. But it and (3) are clear resource
shortage cases. Under no circumstances should they be allowed. And in case
(3) an extra acquire message definitely won't help the situation.

In case (4) it is the KE daemon that is guilty, not pfkey. An extra message
will not cure this case either.

>
> Note: Sometimes theres no app. Example a packet coming into a gateway.
>

What do you have in mind?

If it is an ISAKMP negotiation from a remote peer, then it comes over
UDP/500 or UDP/4500 over an IP socket and not via an acquire message via
the pfkey socket.

If it is an ESP/AH packet with an unknown SPI, then the kernel simply drops
it and does not send any acquire messages.

If it is something else, please explain.

>> pfkey code found that there is nothing receiving
>>acquire messages => there is no chance that any process will setup
>>required SAs and tried to inform about that (I agree, return code is not
>>very informative, at least until you learn about reasons why it is
>>such). If you would have racoon (or other pfkey based ISAKMP daemon)
>>running, you would get "resource temporarily unavailable" (don't know
>>which error code corresponds to that message), which IMHO is ok (if it
>>is not, please explain).
>>
>
>
> Havent tried that - the reason i said restart was the right signal was
> mainly that an app could translate that to mean "try again".
> In other words even in the case of ping -c1 the ping app could have
> reattempted.

If there is a security policy which is not satisfied and there is nobody
who could make it satisfied, then why should we give the application false
hope that things will change on retry?

>
> On Sat, 2005-04-02 at 07:25, Zilvinas Valinskas wrote:
>
>>EBUSY I think it is.
>>
>>I am not entirely sure it is ok to return such error, some applications are
>>not coping nicely with it. Perhaps ECONNREFUSED is more reasonable - as it
>>doesn't brake old apps assumption (connection cannot be established,
>>doesn't matter if that is due to routing or IPsec SPD or anything else).
>>
>
>
> What about ERESTART the way netlink does it right now?

I suspect that ERESTART is generated not by netlink, but by the
xfrm_lookup() function when signal_pending(current) is true. Why that
function returns true in the netlink case but not in the pfkey case I don't
know. IMHO, xfrm_lookup() returns correct error codes in that case.

> ECONNREFUSED is probably not a bad idea.
> ping was clearly dumb and didnt do anything with the info.
> Overall, I think the errors are unfortunately not descriptive at all.

I don't like ECONNREFUSED in this place. As a user, if I received an
ECONNREFUSED message I would turn to the application server admin or the
remote host admin to resolve the problem. But the problem is in the network
setup, and therefore the person responsible for networks should be
contacted. Therefore, I would prefer ENETUNREACH or EHOSTUNREACH.

P.S. For the analysis, the kernel source from the Debian distribution was
used (v.2.6.9).

-- 
Aidas Kasparas
IT administrator
GM Consult Group, UAB

^ permalink raw reply	[flat|nested] 16+ messages in thread
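The two call paths traced in the message above can be condensed into a toy model. This is only a sketch of the described outcomes, not kernel code: the function name `xfrm_lookup_model` and both parameters are invented for illustration, and the success return of 0 when the SA appears before the timeout is an assumption (the message only spells out the two failure results).

```python
import errno


def xfrm_lookup_model(key_manager_listening, sa_ready_before_timeout=False):
    """Toy model of the two paths described above.

    Returns 0 or a negative errno value, mimicking kernel convention.
    """
    if not key_manager_listening:
        # Path 1: pfkey_broadcast() finds nobody to deliver the acquire to
        # and returns -ESRCH, which travels back to the application unchanged.
        return -errno.ESRCH

    # Path 2: the acquire was delivered (pfkey_broadcast_one() returned 0),
    # the SA is placed in XFRM_STATE_ACQ and xfrm_lookup() waits on a timeout.
    if sa_ready_before_timeout:
        # Assumed success case: the daemon installed the SA in time.
        return 0

    # The timeout expired and the state did not change.
    return -errno.EAGAIN


if __name__ == "__main__":
    for listening in (False, True):
        code = xfrm_lookup_model(listening)
        print(listening, errno.errorcode[-code])
```

The model makes the point of the analysis visible: the error seen by ping depends on whether any key manager had registered, not on how many acquires were sent.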
* Re: IPSEC: on behavior of acquire 2005-04-03 8:28 ` Aidas Kasparas @ 2005-04-03 14:29 ` jamal 2005-04-03 22:02 ` Aidas Kasparas 0 siblings, 1 reply; 16+ messages in thread From: jamal @ 2005-04-03 14:29 UTC (permalink / raw) To: Aidas Kasparas; +Cc: ipsec-tools-devel, netdev, nakam On Sun, 2005-04-03 at 04:28, Aidas Kasparas wrote: > jamal wrote: > > Exactly what i was trying to emulate - lost messages. > > Your emulation was not correct. More correct would have been to start KE > daemon, let it fully initialize (open pfkey socket, inform kernel that > it is interested in acquire messages), then stop it (via debugger or > kill -STOP) and only then send pings or other traffic and see what will > happen. This is because there are different paths in xfrm+pfkey for > cases 1) when there is no KE daemon and 2) when daemon is, but for some > reason it does not establish a SA and therefore reaction to traffic is > different. > I dont think that would work. To summarize what happens in the kernel: everything leads to km_query() as you have indicated in your text. If the kernel finds someone/thing has either a pfkey or netlink socket open it sends a acquire to them. In the code you are probably looking at (before i created the patch) - the first user/daemon the kernel sees (either pfkey or netlink based) that has a socket open will receive an acquire and the kernel will give up after that. As an example, if the first pfkey user was just doing "setkey -x" and the second was infact pluto, then pluto will never see the acquire. This is what got me looking at it to begin with. Look at the earlier postings on the subject. So in other words, just killing the ike server as you propose would mean the kernel has no open sockets and will therefore never bother to send an acquire. Still all this is moot and is distracting us from the main discussion. Lets define "lost" simply as the case where an acquire never got to the server (which may be sitting elsewhere on the network). 
In that case what i did is sufficient. i.e. The methods to create this are not the issue. The issue at stake is the behavior of the kernel in generating the acquires. [..] > On the other hand, analysis above shows that return code is choosen by > xfrm framework, therefore if error code has to be changed, it should be > changed in xfrm, not in pfkey or netlink code. The control for both is under generic code. The end return code - you are right, thats user behavior and should match. > > One could look at the acquire as part of the "connection" setup > > (for lack of better description). Without the acquire succeeding, theres > > no connection..(assuming that to be a policy). > > Therefore if acquire is not supposed to be delivered with some certainty > > (read: retries) then theres some resiliciency issues IMO. > > OK, To avoid speaking about apples and oranges let's first find out > where you see the problem. In the ipsec framework there are the > following players (I'm speaking about pfkey case; netlink may be little > different): > > xfrm <-> pfkey <-> KE daemon <-> remote peer > > xfrm-pfkey communication is based on function calls. For them to fail > something really weird has to happen with your kernel. > > KE deamon - remote peer communications are done on UDP/500, UDP/4500 > according to internet standards. Packet retransmissions are implemented > the way standards require, therefore it is not a fatal condition if some > packet will be lost on the way. Please refer to my earlier definition of what "lost" means. It doesnt matter where the breakage happens really. Think of everything to the right of "xfrm" in your diagram as a black box (i.e that second thing could be pfkey or netlink - thats not the issue). Think of some message that is supposed to reach the KE daemon (make it interesting and say it is remote KE) then think of that message never making it because something in the blackbox swallowed it. 
If that packet is the first one and it needs to do so for the sake of setup for subsequent packets - then the desire to have it reach its destination is very imprtant. There is no progress for it or subsequent packets if it doesnt make it. The solution being proposed for Linux to treat that xfrm piece in the same fashion as ARP is correct. Read the email from Alexey. Imagine if ARP was only issued once(as does pfkey) or forever(as does netlink). I believe this is an issue with ipsec architecture itself - someone needs to write an IETF draft on it. > > > > > Note: Sometimes theres no app. Example a packet coming into a gateway. > > > > What do you have in mind? > > If it is ISAKMP negotiation from remote peer, then it comes over UDP/500 > or UDP/4500 over IP socket and not via acquire message via pfkey socket. > > If it is ESP/AH packet with unknown SPI, then kernel simply drops it and > do not send any acquire messages. > I was thinking more of this second scenario with incoming from clear text domain and gateway encrypting assuming proper policy setup. I would have to go and reread the "opportunistic" encryption draft closely to make sense. > > Havent tried that - the reason i said restart was the right signal was > > mainly that an app could translate that to mean "try again". > > In other words even in the case of ping -c1 the ping app could have > > reattempted. > > If there is security policy which is not satisfied and there is nobody > which could make it satisfied, then why should we give application false > hope that on retry things will change? > In the case of knowing it is the policy that is not satisfied i think it would make sense to not to tell the app to retry. > > > > What about ERESTART the way netlink does it right now? > > I suspect that ERESTART is generated not by netlink, but by > xfrm_lookup() function when signal_pending(current) is true. Why that > function returns true in netlink case but not in pfkey case I don't > know. 
IMHO, xfrm_lookup() returns correct error codes in that case. > yes, you are correct. > > ECONNREFUSED is probably not a bad idea. > > ping was clearly dumb and didnt do anything with the info. > > Overall, I think the errors are unfortunately not descriptive at all. > > I don't like ECONNREFUSED in this place. As a user if I would receive > ECONNREFUSED message then I would address application server admin or > remote host admin to resolve the problem. But the problem is in network > setup and therefore person responsible for networks should be contacted. > Therefore, I would like more ENETUNREACH or EHOSTUNREACH. > Agreed to this as well. I think this is what would happen in the case of ARP failure as well. ECONNREFUSED would make sense in the case where the policy rejected progress. cheers, jamal ^ permalink raw reply [flat|nested] 16+ messages in thread
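Since much of this exchange is about what a user actually sees for each candidate error code, a small sketch that prints the local message text for the codes named in the thread may help. The list `CANDIDATES` and the helper `describe` are illustrative names, and the exact strings depend on the C library in use. ERESTART is left out because it is a Linux-specific value that is not available on every platform.

```python
import errno
import os

# Error codes that were proposed or observed in this thread.
CANDIDATES = (
    "ESRCH",
    "EAGAIN",
    "EBUSY",
    "ECONNREFUSED",
    "ENETUNREACH",
    "EHOSTUNREACH",
)


def describe(names=CANDIDATES):
    """Map each errno name to the message text the local C library gives it."""
    return {name: os.strerror(getattr(errno, name)) for name in names}


if __name__ == "__main__":
    for name, text in describe().items():
        print(f"{name:13} {text}")
```

Reading the messages side by side shows why the choice matters for whom the user contacts: a "connection refused" text points at the remote service, while an "unreachable" text points at the network setup.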
* Re: IPSEC: on behavior of acquire 2005-04-03 14:29 ` jamal @ 2005-04-03 22:02 ` Aidas Kasparas 2005-04-04 12:33 ` [Ipsec-tools-devel] " jamal 0 siblings, 1 reply; 16+ messages in thread From: Aidas Kasparas @ 2005-04-03 22:02 UTC (permalink / raw) To: hadi; +Cc: ipsec-tools-devel, netdev, nakam jamal wrote: > On Sun, 2005-04-03 at 04:28, Aidas Kasparas wrote: > >>jamal wrote: > > >>>Exactly what i was trying to emulate - lost messages. >> >>Your emulation was not correct. More correct would have been to start KE >>daemon, let it fully initialize (open pfkey socket, inform kernel that >>it is interested in acquire messages), then stop it (via debugger or >>kill -STOP) and only then send pings or other traffic and see what will >>happen. This is because there are different paths in xfrm+pfkey for >>cases 1) when there is no KE daemon and 2) when daemon is, but for some >>reason it does not establish a SA and therefore reaction to traffic is >>different. >> > > > I dont think that would work. > To summarize what happens in the kernel: everything leads to km_query() > as you have indicated in your text. > If the kernel finds someone/thing has either a pfkey or netlink socket > open it sends a acquire to them. In the code you are probably looking at > (before i created the patch) - the first user/daemon the kernel sees > (either pfkey or netlink based) that has a socket open > will receive an acquire and the kernel will give up after that. > > As an example, if the first pfkey user was just doing "setkey -x" and > the second was infact pluto, then pluto will never see the > acquire. This is what got me looking at it to begin with. Look at the > earlier postings on the subject. While I agree that code before your patch would not allow to cooperate tools using different ways to manage SAD/SPD (pfkey vs netlink), I have one setup in production where two instances of racoon runs simultaneously and both gets required pfkey-messages. 
> So in other words, just killing the ike server as you propose would mean > the kernel has no open sockets and will therefore never bother to send > an acquire. I proposed to stop KE server, not to kill it. > > Still all this is moot and is distracting us from the main discussion. > Lets define "lost" simply as the case where an acquire never got to the > server (which may be sitting elsewhere on the network). ACQUIREs _never_ _leaves_ _the box_ they are generated. It is allways kernel-to-userspace_process communication. It could be made reliable. And present situation IS sufficiently reliable. In that case > what i did is sufficient. i.e. The methods to create this are not the > issue. The issue at stake is the behavior of the kernel in generating > the acquires. > See below. > > Please refer to my earlier definition of what "lost" means. It doesnt > matter where the breakage happens really. > Think of everything to the right of "xfrm" in your diagram as a black > box (i.e that second thing could be pfkey or netlink - thats not the > issue). > Think of some message that is supposed to reach the KE daemon > (make it interesting and say it is remote KE) then think of that message > never making it because something in the blackbox swallowed it. > If that packet is the first one and it needs to do so for the sake of > setup for subsequent packets - then the desire to have it reach its > destination is very imprtant. There is no progress for it or subsequent > packets if it doesnt make it. OK, let's talk about architecture xfrm <-> blackbox. In this architecture communication between these two elements (I do not speak about any comms in the blackbox) can be of two types: 1) reliable (messages always reach blackbox or error is reported); 2) unreliable (messages may fail even to reach blackbox). With good blackboxes good ipsec system can be built using any of comm types. 
But: a) (1) will be more reliable; b) (1) will be more simple (at least xfrm side, as it will not require retransmisions); c) (1) is implemented now (as a function call). What I want to say is xfrm-to-blackbox interface is good as it is. The problem may only be in how good the blackbox is. And here we have to look inside blackbox and start talk about particular implementations of that blackbox. Retransmitions, if they needed, needs to be inside that blackbox. > > The solution being proposed for Linux to treat that xfrm piece in the > same fashion as ARP is correct. Read the email from Alexey. Imagine if > ARP was only issued once(as does pfkey) or forever(as does netlink). > I have read email from Alexey. I think that xfrm_lookup() function implements functionality very similar to functionality which Alexey described. And I think that direct comparison of ARP messages and pfkey messages is not fair, because pfkey acquire messages goes over reliable traffic and are used only to _initiate_ the process of SA negotiation. ARP has to receive information from other boxes which send it only as a direct responce to some packet. More, ARP is designed to be used [amogst others] on networks which loose some traffic by design. > I believe this is an issue with ipsec architecture itself - someone > needs to write an IETF draft on it. > I still do not see the topic for such draft. > >>> >>>Note: Sometimes theres no app. Example a packet coming into a gateway. >>> >> >>What do you have in mind? >> >>If it is ISAKMP negotiation from remote peer, then it comes over UDP/500 >>or UDP/4500 over IP socket and not via acquire message via pfkey socket. >> >>If it is ESP/AH packet with unknown SPI, then kernel simply drops it and >>do not send any acquire messages. >> > > > I was thinking more of this second scenario with incoming from clear > text domain and gateway encrypting assuming proper policy setup. 
If you're talking about network behind security gateway communicating to host or network for which there is security policy configured on gateway, then acquire message will be generated on that security gateway, when that packet will be considered for forwarding. Again, that acquire messages never will leave security gateway. > I would have to go and reread the "opportunistic" encryption draft > closely to make sense. > Speaking of "opportunistic" encryption. I never understood it. Ipsec-tools do not implement it. And in the year or so when I'm involved with it, I don't remember anybody even asking or mentioning about this feature. Therefore, I don't care about it -- users do not need it. -- Aidas Kasparas IT administrator GM Consult Group, UAB ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire
From: jamal @ 2005-04-04 12:33 UTC
To: Aidas Kasparas; +Cc: ipsec-tools-devel, netdev, nakam

On Sun, 2005-04-03 at 18:02, Aidas Kasparas wrote:
> jamal wrote:
[..]
> > As an example, if the first pfkey user was just doing "setkey -x" and
> > the second was in fact pluto, then pluto will never see the
> > acquire. This is what got me looking at it to begin with. Look at the
> > earlier postings on the subject.
>
> While I agree that the code before your patch would not allow tools
> using different ways to manage SAD/SPD (pfkey vs netlink) to cooperate,
> I have one setup in production where two instances of racoon run
> simultaneously and both get the required pfkey messages.
>

Yes, multiple instances of the same socket type would work.
Try running "ip xfrm mon" and your two racoon instances and see what
happens ;->
Anyway, this will be fixed in upcoming kernels.

> > So in other words, just killing the ike server as you propose would mean
> > the kernel has no open sockets and will therefore never bother to send
> > an acquire.
>
> I proposed to stop the KE server, not to kill it.
>

The goal is: an acquire that the kernel thinks it sent successfully, in
order to update an SA's larval state, never made it. To simulate this, it
doesn't matter whether it happened at the kernel/user-space boundary or
afterwards. The simple observation to make is: the kernel thinks the
desired objective has been reached when it was not, and from the little
investigation I did, I conclude the kernel did not try to reliably deliver
the message.

> >
> > Still all this is moot and is distracting us from the main discussion.
> > Let's define "lost" simply as the case where an acquire never got to the
> > server (which may be sitting elsewhere on the network).
>
> ACQUIREs _never_ _leave_ _the box_ on which they are generated. It is
> always kernel-to-userspace_process communication. It could be made
> reliable. And the present situation IS sufficiently reliable.
>

I think I have made a bad case of explaining.
Yes, I know where acquires terminate. However, this is not about where
acquires terminate. It is insufficient to assume that a successful
acquire to user space equates to a successful interaction with the KE
server which will do an update.
Does that make more sense?
If you issue an acquire from the kernel, it will result in a domino
effect in the blocks to the right of xfrm in your diagram, and the end
result is that the larval SA gets an update (as a result of the acquire).
So ignore where/how the acquire gets there and imagine that the kernel sent
an acquire so you could get an SA update; then it will become clear.

> OK, let's talk about the architecture xfrm <-> blackbox. In this
> architecture communication between these two elements (I do not speak
> about any comms in the blackbox) can be of two types:
> 1) reliable (messages always reach the blackbox or an error is reported);
> 2) unreliable (messages may fail even to reach the blackbox).
>
> With good blackboxes a good ipsec system can be built using either of
> the comm types. But:
> a) (1) will be more reliable;
> b) (1) will be simpler (at least on the xfrm side, as it will not require
> retransmissions);
> c) (1) is implemented now (as a function call).
>
> What I want to say is that the xfrm-to-blackbox interface is good as it
> is. The problem may only be in how good the blackbox is. And here we have
> to look inside the blackbox and start talking about particular
> implementations of that blackbox. Retransmissions, if they are needed,
> need to be inside that blackbox.
>

I am not sure I followed what you are actually trying to say above.
Let's discuss the basics of how reliability is achieved.
If you want to have something reliably delivered after you transmit,
you do several basic things:
a) you wait for an end acknowledgement, in this case an update to
the acquire;
b) you time out within a reasonable time (30 seconds seems to be the
default for acquire); and
c) you retransmit up to a maximum number of times. This is the part that
is missing.

> >
> > The solution being proposed for Linux to treat that xfrm piece in the
> > same fashion as ARP is correct. Read the email from Alexey. Imagine if
> > ARP was only issued once (as does pfkey) or forever (as does netlink).
> >
>
> I have read the email from Alexey. I think that the xfrm_lookup() function
> implements functionality very similar to the functionality which Alexey
> described.

Absolutely not. But this is a good sign, i.e. you see the desire to do
this, you just think it's already there.

> And I think that a direct comparison of ARP messages and pfkey messages is
> not fair, because pfkey acquire messages go over a reliable channel and
> are used only to _initiate_ the process of SA negotiation. ARP has to
> receive information from other boxes, which send it only as a direct
> response to some packet. Moreover, ARP is designed to be used [amongst
> others] on networks which lose some traffic by design.
>

Please refer to my statements above as to what is missing to complete
the equation.

> > I believe this is an issue with ipsec architecture itself - someone
> > needs to write an IETF draft on it.
> >
>
> I still do not see the topic for such a draft.
>

Read again what I said above.

cheers,
jamal
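The three steps jamal lists (wait for an end acknowledgement, time out, retransmit up to a maximum number of times) can be sketched as a small loop. This is only an illustration in Python, not kernel code: `send_acquire`, `wait_for_update` and `MAX_RETRIES` are hypothetical names introduced here, and the 30-second value is simply the default mentioned in the message.

```python
from typing import Callable

ACQUIRE_TIMEOUT_S = 30.0  # the default jamal mentions for acquire
MAX_RETRIES = 3           # hypothetical knob; the thread says it is missing


def deliver_reliably(
    send_acquire: Callable[[], None],
    wait_for_update: Callable[[float], bool],
    max_retries: int = MAX_RETRIES,
    timeout_s: float = ACQUIRE_TIMEOUT_S,
) -> int:
    """Return how many acquires were sent, or raise TimeoutError.

    send_acquire    -- hypothetical callback that emits one acquire
    wait_for_update -- hypothetical callback that waits up to timeout_s and
                       returns True once the larval SA has been updated
    """
    for attempt in range(1, max_retries + 1):
        send_acquire()                  # (c) transmit, then retransmit
        if wait_for_update(timeout_s):  # (a) end acknowledgement, (b) timeout
            return attempt
    raise TimeoutError(f"no SA update after {max_retries} acquires")
```

With `max_retries=1` this degenerates to the single-try pfkey behaviour described earlier in the thread; the disagreement is about whether the loop belongs in the kernel or inside the userspace "blackbox".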
* Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire
From: Aidas Kasparas @ 2005-04-04 12:59 UTC
To: hadi; +Cc: ipsec-tools-devel, netdev, nakam

jamal wrote:
> I think I have made a bad case of explaining.
> Yes, I know where acquires terminate. However, this is not about where
> acquires terminate. It is insufficient to assume that a successful
> acquire to user space equates to a successful interaction with the KE
> server which will do an update.

Why?

--
Aidas Kasparas
IT administrator
GM Consult Group, UAB
* Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire
From: jamal @ 2005-04-04 13:09 UTC
To: Aidas Kasparas; +Cc: ipsec-tools-devel, netdev, nakam

On Mon, 2005-04-04 at 08:59, Aidas Kasparas wrote:
> jamal wrote:
> > I think I have made a bad case of explaining.
> > Yes, I know where acquires terminate. However, this is not about where
> > acquires terminate. It is insufficient to assume that a successful
> > acquire to user space equates to a successful interaction with the KE
> > server which will do an update.
>
> Why?

The reason the kernel sends an acquire is to update larval SAs it
created. The result is either an update of the SA or a rejection, for that
matter. Otherwise there's a failure in communication.

Analogy: If you are trying to send a message from one end system
to another and there are multiple hops between them, then just because
it made it to the first hop does not mean it made it to its final
destination. To know it made it to the final destination, the confirmation
has to come from the target end.
So if you say the KE is the final destination, then kernel to user
space is the first hop.
I am not sure if this is clear as an analogy.

cheers,
jamal
* Re: [Ipsec-tools-devel] Re: IPSEC: on behavior of acquire
From: Aidas Kasparas @ 2005-04-04 14:20 UTC
To: hadi; +Cc: ipsec-tools-devel, netdev, nakam

jamal wrote:
> On Mon, 2005-04-04 at 08:59, Aidas Kasparas wrote:
>
>>jamal wrote:
>>
>>>I think I have made a bad case of explaining.
>>>Yes, I know where acquires terminate. However, this is not about where
>>>acquires terminate. It is insufficient to assume that a successful
>>>acquire to user space equates to a successful interaction with the KE
>>>server which will do an update.
>>
>>Why?
>
>
> The reason the kernel sends an acquire is to update larval SAs it
> created. The result is either an update of the SA or a rejection, for that
> matter. Otherwise there's a failure in communication.
>
> Analogy: If you are trying to send a message from one end system
> to another and there are multiple hops between them, then just because
> it made it to the first hop does not mean it made it to its final
> destination. To know it made it to the final destination, the confirmation
> has to come from the target end.
> So if you say the KE is the final destination, then kernel to user
> space is the first hop.
> I am not sure if this is clear as an analogy.

OK, if you have a chain with several hops, then probably there is no
better way than a signal from the other end that it got something. The
thing we do not agree on is how this should be managed and supervised.

I would like to provide an analogy too. You have a telnet application. You
try to connect to some host:port. Your telnet application just makes the
connect(2) syscall and does not care how the kernel establishes that
connection: what MAC address to send the packet to, how and when to
retransmit the SYN packet if the ACK was not received in a timely fashion,
and so on. If the kernel does its job fine, then we have a connected
socket on which to communicate further.
If it does not, or there are some problems on the target host or on the
network in between, then we will not have that connected socket - the
syscall will return an error.

With the ipsec system the situation is quite similar, just the kernel and
userspace have swapped places. The kernel told userspace to update a
larval SA. Userspace works on that. If it has negotiated keys for that SA
with the KE at the remote site, fine, userspace will update the SA. If
there are problems and key negotiation is not possible -- the SA will not
get updated and eventually will die. But a single signal to userspace is
sufficient for that process to be performed.

Yes, the kernel can check the state of the SA every time some packet has
to use that SA. But to make noise by asking "please negotiate the SA which
you're supposed to be negotiating already" ... IMHO it is
counterproductive.

--
Aidas Kasparas
IT administrator
GM Consult Group, UAB
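Aidas's point that a repeated acquire is only noise assumes the key-exchange daemon remembers which negotiations are already in flight. A minimal sketch of that bookkeeping follows; `KeyDaemon`, `on_acquire` and `on_negotiation_done` are hypothetical names for illustration and are not taken from racoon or any other daemon.

```python
class KeyDaemon:
    """Toy key-exchange daemon that tracks negotiations in progress."""

    def __init__(self) -> None:
        self.in_progress = set()  # (src, dst) pairs being negotiated
        self.started = 0          # negotiations actually started

    def on_acquire(self, src: str, dst: str) -> bool:
        """Handle one acquire; return True if a new negotiation started."""
        key = (src, dst)
        if key in self.in_progress:
            return False          # duplicate acquire: already negotiating, drop it
        self.in_progress.add(key)
        self.started += 1
        return True

    def on_negotiation_done(self, src: str, dst: str) -> None:
        """Negotiation finished (SA installed or failed); forget it."""
        self.in_progress.discard((src, dst))
```

The sketch only covers the case where the first acquire was actually received and acted on; jamal's objection in this thread concerns the case where it was not.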
* IPSEC: on behavior of acquire
From: jamal @ 2005-04-02 1:25 UTC
To: Herbert Xu, David S. Miller, Masahide NAKAMURA
Cc: psec-tools-devel, netdev, kaber, kuznet, jmorris

Folks,

There's something wrong in the way acquire works - IMO in both pfkey and
netlink. I asked this before but didn't get a satisfactory answer.
Masahide-san and myself have had private exchanges and we are both
unsatisfied with the current situation.
There's probably a spec or known good practice documented somewhere ...

Let me provide some testcases then theorize. The idea is to simulate a
situation where the kernel thinks a km is listening (it could be there but
just non-responsive) or just a scenario where the acquire gets lost.
You need the current events patches to see this.

test1) In one window run setkey -x:

ping -c 1 someDST

-1) packet arrives towards outbound
0) Larval state created
1) one acquire sent.
2) timeout.
3) packet dropped. -ESRCH returned.
4) larval state deleted

So question 1): Shouldn't the return code be -ERESTART to ask
the app to retry?
question 2): Why is there a hardcoding of 1 try only?

ping -c2 someDST
Same as above (steps -1 to 4), repeated twice, once for each packet sent.

ping -c3 DST
Same as above, repeated 3 times.

test2) With ip x m (but not setkey).

ping -c 1 DST

-1) packet arrives
0) Larval state created
Loop:
1) one acquire sent.
2) timeout. go to loop.

So the loop has no way to break. ping hangs waiting; the only way
to break out is by hitting control-c at the prompt.
I think ping gets a -ERESTART, which I believe is the correct signal?
When you hit control-c the larval state is deleted.
Clearly this is not desirable. We want at some point to give up.
Question: Can we have a configurable max retries (sysctl settable) for
acquire - or does it already exist and is just not being used? I couldn't
find any staring at the code.
ping -c2/3 DST does not change the above behavior. Ping hangs after
the first packet, so it doesn't matter.

The conclusion we reached in our discussion is:
a) -ERESTART is the correct signal to return
b) the number of acquire retries should be configurable, preferably as a
system-wide value.

Thoughts?

cheers,
jamal
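The two test cases and the proposed knob differ only in how many acquires are sent before giving up. The toy model below makes that explicit under stated assumptions: `max_tries=1` stands for the pfkey behaviour in test1, `max_tries=None` for the netlink loop in test2, and any other value for the proposed configurable limit. `send_packet`, `interrupt_after` (a stand-in for control-c) and the result strings are hypothetical and do not correspond to kernel identifiers.

```python
from typing import Optional, Tuple


def send_packet(
    km_responds_on_try: Optional[int],
    max_tries: Optional[int],
    interrupt_after: int = 1000,
) -> Tuple[str, int]:
    """Simulate one outbound packet that needs a new SA.

    km_responds_on_try -- 1-based try on which the key manager answers,
                          or None if it never does
    max_tries          -- acquires sent before giving up; None = no limit
    interrupt_after    -- stand-in for the user hitting control-c
    Returns (result, acquires_sent).
    """
    tries = 0
    while True:
        tries += 1  # larval state exists, one acquire sent
        if km_responds_on_try is not None and tries >= km_responds_on_try:
            return ("sent", tries)         # SA updated, packet goes out
        # the timeout expired without an update
        if max_tries is not None and tries >= max_tries:
            return ("gave_up", tries)      # test1 reports -ESRCH at this point
        if tries >= interrupt_after:
            return ("interrupted", tries)  # the only exit when there is no limit
```

For example, `send_packet(None, 1)` gives up after one acquire as in test1, while `send_packet(None, None, interrupt_after=5)` only stops when interrupted, as in test2.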
* Re: IPSEC: on behavior of acquire
From: Herbert Xu @ 2005-04-02 2:12 UTC
To: jamal
Cc: David S. Miller, Masahide NAKAMURA, psec-tools-devel, netdev, kaber, kuznet, jmorris

On Fri, Apr 01, 2005 at 08:25:44PM -0500, jamal wrote:
>
> The conclusion we reached in our discussion is:
> a) -ERESTART is the correct signal to return
> b) number of acquire retries should be configurable preferably a system
> wide value.
>
> Thoughts?

Once we have the xfrm resolution stuff that Patrick is working on,
we can have knobs for these cases just like those in the neighbour
code.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: IPSEC: on behavior of acquire
From: Alexey Kuznetsov @ 2005-04-02 14:00 UTC
To: jamal
Cc: Herbert Xu, David S. Miller, Masahide NAKAMURA, psec-tools-devel, netdev, kaber, kuznet, jmorris

Hello!

> a) -ERESTART is the correct signal to return

The right behaviour is to behave like ARP. A few packets are queued,
no errors (until timeout), no blocking.

Alexey
* Re: IPSEC: on behavior of acquire
From: jamal @ 2005-04-02 21:42 UTC
To: Alexey Kuznetsov
Cc: Herbert Xu, David S. Miller, Masahide NAKAMURA, ipsec-tools-devel, netdev, kaber, jmorris

On Sat, 2005-04-02 at 09:00, Alexey Kuznetsov wrote:
> Hello!
>
> > a) -ERESTART is the correct signal to return
>
> The right behaviour is to behave like ARP. A few packets are queued,
> no errors (until timeout), no blocking.

Herbert also mentions something along the same lines in his email.
This would make a lot of sense!
Is the state machine going to look something along the same lines as
ARP, i.e. incomplete->reachable etc.?

What would be a good code to return when you queue the packet?

cheers,
jamal
* Re: IPSEC: on behavior of acquire
From: Thomas Graf @ 2005-04-02 21:52 UTC
To: jamal
Cc: Alexey Kuznetsov, Herbert Xu, David S. Miller, Masahide NAKAMURA, ipsec-tools-devel, netdev, kaber, jmorris

* jamal <1112478168.1088.337.camel@jzny.localdomain> 2005-04-02 16:42
> Herbert also mentions something along the same lines in his email.
> This would make a lot of sense!
> Is the state machine going to look something along the same lines as
> ARP? i.e incomplete->reachable etc?
>
> What would be a good code to return when you queue the packet?

EINPROGRESS?
* Re: IPSEC: on behavior of acquire
From: Patrick McHardy @ 2005-04-03 15:52 UTC
To: hadi
Cc: Alexey Kuznetsov, Herbert Xu, David S. Miller, Masahide NAKAMURA, ipsec-tools-devel, netdev, jmorris

jamal wrote:
> Herbert also mentions something along the same lines in his email.
> This would make a lot of sense!
> Is the state machine going to look something along the same lines as
> ARP? i.e incomplete->reachable etc?

Yes, from a bundle POV. In my current approach a single state
is resolved at a time and resolution is driven by
XFRM_STATE_ACQ->* state transitions.

> What would be a good code to return when you queue the packet?

It should be transparent, so 0.

Regards
Patrick
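Alexey's suggestion and Patrick's answer describe a resolver that queues a few packets while a state is being resolved, returns 0 to the sender, and reports failure only after a timeout. A rough sketch of that idea follows. `SaResolver`, `QUEUE_LEN`, the state strings and the `-1` error value are hypothetical, and dropping the oldest packet when the queue is full is a choice made for this sketch; it is not Patrick's patch.

```python
from collections import deque

QUEUE_LEN = 3  # hypothetical: "a few" packets held per unresolved state


class SaResolver:
    """Toy ARP-like resolver for a single larval SA."""

    def __init__(self, queue_len: int = QUEUE_LEN) -> None:
        self.state = "ACQ"                      # larval, cf. XFRM_STATE_ACQ
        self.pending = deque(maxlen=queue_len)  # oldest packet falls off when full
        self.transmitted = []

    def output(self, packet: bytes) -> int:
        """Non-blocking send: 0 if sent or queued, -1 once resolution failed."""
        if self.state == "VALID":
            self.transmitted.append(packet)
            return 0
        if self.state == "ACQ":
            self.pending.append(packet)         # transparent to the sender
            return 0
        return -1                               # placeholder error after timeout

    def on_resolved(self) -> int:
        """SA update arrived: flush the queue. Returns packets flushed."""
        self.state = "VALID"
        flushed = len(self.pending)
        self.transmitted.extend(self.pending)
        self.pending.clear()
        return flushed

    def on_timeout(self) -> int:
        """Resolution failed: drop queued packets. Returns packets dropped."""
        self.state = "DEAD"
        dropped = len(self.pending)
        self.pending.clear()
        return dropped
```

The sender never blocks and sees 0 while resolution is pending, which matches the "no errors (until timeout), no blocking" behaviour described above.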
Thread overview: 16+ messages
[not found] <1112405303.1096.37.camel@jzny.localdomain>
2005-04-02 7:10 ` IPSEC: on behavior of acquire Aidas Kasparas
2005-04-02 12:25 ` [Ipsec-tools-devel] " Zilvinas Valinskas
2005-04-02 21:28 ` jamal
2005-04-03 8:28 ` Aidas Kasparas
2005-04-03 14:29 ` jamal
2005-04-03 22:02 ` Aidas Kasparas
2005-04-04 12:33 ` [Ipsec-tools-devel] " jamal
2005-04-04 12:59 ` Aidas Kasparas
2005-04-04 13:09 ` jamal
2005-04-04 14:20 ` Aidas Kasparas
2005-04-02 1:25 jamal
2005-04-02 2:12 ` Herbert Xu
2005-04-02 14:00 ` Alexey Kuznetsov
2005-04-02 21:42 ` jamal
2005-04-02 21:52 ` Thomas Graf
2005-04-03 15:52 ` Patrick McHardy