* Regarding offloading IPv6 addrconf and ndisc
From: Hugo Santos @ 2006-07-27 11:25 UTC
To: yoshfuji, davem; +Cc: netdev, usagi-core
Hi all,
In the same vein as some of the recent IPv6 patches submitted for
comments, and taking into consideration RFCs such as 'SEcure Neighbor
Discovery (SEND)' (RFC 3971) and 'Cryptographically Generated Addresses
(CGA)' (RFC 3972), where the complexity of maintaining addresses and
performing neighbor discovery increases considerably: what are the
chances of including code that would allow address configuration, DAD
and neighbor discovery to be outsourced to a user-space control
application? (The first two can already be outsourced to some extent.)
Of course the final decision always depends on the patch itself, but I
would like to probe the developers about the possibility of ever
merging such code. Personally, I believe this kind of control logic
should always live in user space to allow for greater flexibility --
but I'm aware that lots of people prefer to have it in the kernel to
minimize deployment dependencies.
Comments?
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Kazunori Miyazawa @ 2006-07-27 12:25 UTC
To: yoshfuji, davem, netdev, usagi-core
Hi,
I'm interested in the approach, and I have a couple of comments.
I think DAD and ND are time-critical operations.
Can the daemons process them in conformance with the specs,
even if they are swapped out?
Can we prevent the OOM killer from killing the daemons?
Anyway, we have to consider the pros and cons of the approach.
Regards,
--
Kazunori Miyazawa
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Hugo Santos @ 2006-07-27 17:56 UTC
To: Kazunori Miyazawa; +Cc: yoshfuji, davem, netdev, usagi-core
Hi,
> I'm interested in the approach, and I have a couple of comments.
> I think DAD and ND are time-critical operations.
> Can the daemons process them in conformance with the specs,
My tests indicate that they can, even in mobility scenarios where the
expected times are shorter. It is also always possible to give the
process a high enough priority for it to work well in most usages.
> even if they are swapped out?
Depending on the implementation, you may lock all or part of the
process address space so it doesn't get swapped out.
> Can we prevent the OOM killer from killing the daemons?
I don't think we can prevent it, but we can adjust its score via
/proc/pid/oom_adj so that its likelihood of being killed is greatly reduced.
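A minimal sketch of the two mitigations mentioned above, for a hypothetical control daemon: `mlockall()` keeps the address space resident, and a write to the procfs OOM knob makes the daemon a less likely victim. The function names and the values -15 and -900 are illustrative assumptions. `/proc/self/oom_adj` is the interface referred to here; later kernels deprecate it in favour of `/proc/self/oom_score_adj` (range -1000..1000), so the sketch falls back to that.

```c
#include <stdio.h>
#include <sys/mman.h>

/* Write one integer to a procfs-style knob (e.g. /proc/self/oom_adj).
 * Returns 0 on success, -1 on failure. */
int set_oom_value(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%d\n", value) > 0;
    /* The data only reaches the kernel when the stream is flushed,
     * so a failing fclose() also means the write was rejected. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical start-up hook for an ND/addrconf control daemon.
 * Returns the number of mitigations that could not be applied. */
int harden_control_daemon(void)
{
    int failures = 0;

    /* Keep current and future pages resident so timer-driven DAD/ND work
     * never waits on a page-in. Usually needs CAP_IPC_LOCK or a large
     * enough RLIMIT_MEMLOCK. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        failures++;
    }

    /* Make the OOM killer much less likely to choose this process.
     * Lowering the value normally needs CAP_SYS_RESOURCE. */
    if (set_oom_value("/proc/self/oom_adj", -15) != 0 &&
        set_oom_value("/proc/self/oom_score_adj", -900) != 0) {
        fprintf(stderr, "could not lower the OOM score\n");
        failures++;
    }
    return failures;
}
```

A daemon would call `harden_control_daemon()` once, early in `main()`, and keep running with a warning if it returns non-zero, since neither step is required for correctness.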
> Anyway, we have to consider the pros and cons of the approach.
As long as there is no impact on the current functionality, and the
changes required from an architectural POV are acceptable to the
main developers, I don't see anything against it.
Waiting for input,
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Herbert Xu @ 2006-07-27 23:56 UTC
To: Kazunori Miyazawa; +Cc: yoshfuji, davem, netdev, usagi-core
Kazunori Miyazawa <kazunori@miyazawa.org> wrote:
>
> I'm interested in the approach, and I have a couple of comments.
> I think DAD and ND are time-critical operations.
> Can the daemons process them in conformance with the specs,
> even if they are swapped out?
> Can we prevent the OOM killer from killing the daemons?
These are valid concerns. However, if we can have things like ntpd
live in user-space without causing nuisance, then addrconf should be
fine as well.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Regarding offloading IPv6 addrconf and ndisc
From: David Miller @ 2006-07-28 1:34 UTC
To: herbert; +Cc: kazunori, yoshfuji, netdev, usagi-core
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 28 Jul 2006 09:56:42 +1000
> Kazunori Miyazawa <kazunori@miyazawa.org> wrote:
> >
> > I'm interested in the approach, and I have a couple of comments.
> > I think DAD and ND are time-critical operations.
> > Can the daemons process them in conformance with the specs,
> > even if they are swapped out?
> > Can we prevent the OOM killer from killing the daemons?
>
> These are valid concerns. However, if we can have things like ntpd
> live in user-space without causing nuisance, then addrconf should be
> fine as well.
I have severe doubts actually in this area. And I have practical
experience to back up these doubts in this specific case.
If you'll remember, quite some time ago, I tried to move all the ipv6
interface address addition and removal out of software interrupt
context. The higher level goal of this work was to move the addrconf
locking over to RCU, which would fix several races and bugs.
Just moving the ipv6 address add/delete out of software interrupt
context broke the TAHI and other ipv6 testsuites.
The reason was simple. Consider a simple test case that emits an
NDISC packet that should cause an interface address to be added, and
then it sends a packet which makes sure that host responds to that
address. We have those two packets in our queue, as packet "A" and
"B".
If we process these in sequence in software interrupt, everything
is fine. Processing of "A" will add the address, and the test
ping packet "B" will respond properly.
If you defer "A", everything breaks and the test packet "B" will
get processed first and not work.
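The ordering problem can be reproduced with a small, self-contained model. This is an illustration only, not kernel code: the interface, the packet kinds and `run_scenario()` are hypothetical, and "deferring" simply means control packets are applied after the data packets queued behind them, as would happen if they were handed to a worker thread or a user-space daemon.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified model: one interface with a small
 * address table and the two packet kinds from the test case above. */
enum pkt_kind { PKT_NDISC_ADD_ADDR, PKT_PING };

struct pkt {
    enum pkt_kind kind;
    const char *addr;   /* address to add (packet A) or to ping (packet B) */
};

struct iface {
    const char *addrs[8];
    int naddrs;
};

static bool iface_has(const struct iface *ifc, const char *addr)
{
    for (int i = 0; i < ifc->naddrs; i++)
        if (strcmp(ifc->addrs[i], addr) == 0)
            return true;
    return false;
}

/* Process one packet; returns true if a ping was answered. */
static bool process(struct iface *ifc, const struct pkt *p)
{
    if (p->kind == PKT_NDISC_ADD_ADDR) {
        if (ifc->naddrs < 8)
            ifc->addrs[ifc->naddrs++] = p->addr;
        return false;
    }
    return iface_has(ifc, p->addr);
}

/* Run packet A, then packet B. With defer_control set, control packets
 * are set aside and applied only after the data packets behind them.
 * Returns whether packet B was answered. */
bool run_scenario(bool defer_control)
{
    struct iface ifc = { .naddrs = 0 };
    const struct pkt queue[2] = {
        { PKT_NDISC_ADD_ADDR, "2001:db8::1" },  /* packet "A" */
        { PKT_PING,           "2001:db8::1" },  /* packet "B" */
    };
    const struct pkt *deferred[2];
    int ndeferred = 0;
    bool b_answered = false;

    for (int i = 0; i < 2; i++) {
        if (defer_control && queue[i].kind == PKT_NDISC_ADD_ADDR)
            deferred[ndeferred++] = &queue[i];
        else if (process(&ifc, &queue[i]))
            b_answered = true;
    }
    /* The deferred control work eventually runs, but too late for B. */
    for (int i = 0; i < ndeferred; i++)
        process(&ifc, deferred[i]);
    return b_answered;
}
```

`run_scenario(false)` answers packet "B", while `run_scenario(true)` does not, even though the address ends up configured in both cases.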
As a secondary reason not to even consider this, it's in the kernel
already and therefore it is totally impractical to try and remove it.
When considering new protocols or features, the "user vs. kernel"
argument is something to validly consider. But when it's already
there, it will have to live there basically for eternity. It is not
like some arbitrary internal kernel module symbol or interface we
can deprecate over a 6 month period or something like that.
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Hugo Santos @ 2006-07-28 1:45 UTC
To: David Miller; +Cc: herbert, kazunori, yoshfuji, netdev, usagi-core
Hi David,
> If we process these in sequence in software interrupt, everything
> is fine. Processing of "A" will add the address, and the test
> ping packet "B" will respond properly.
>
> If you defer "A", everything breaks and the test packet "B" will
> get processed first and not work.
Is it reasonable to require that control packet processing be
serialized with data packet processing? In this particular case, aren't
the tests the ones that are broken, if they don't give the host enough
time to configure the address? The standards do not specify
implementation details, so no specific processing model should be
assumed, and what you describe sounds a bit like an ideal model.
> As a secondary reason not to even consider this, it's in the kernel
> already and therefore it is totally impractical to try and remove it.
I'm not sure if it was clear from my original e-mail, but I'm not
suggesting removing any of the functionality from the kernel -- the
current implementation is clearly useful in out-of-the-box
deployments. Instead, I would like to know whether there is any
possibility of adding a bit of new functionality that would allow these
specific methods to be outsourced to a user-space control application.
By specific methods I'm referring to NDISC message processing,
"neighbor entry needs refreshing" events, etc.
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Herbert Xu @ 2006-07-28 2:22 UTC
To: David Miller; +Cc: kazunori, yoshfuji, netdev, usagi-core
On Thu, Jul 27, 2006 at 06:34:15PM -0700, David Miller wrote:
>
> I have severe doubts actually in this area. And I have practical
> experience to back up these doubts in this specific case.
OK.
> Just moving the ipv6 address add/delete out of software interrupt
> context broke the TAHI and other ipv6 testsuites.
>
> The reason was simple. Consider a simple test case that emits an
> NDISC packet that should cause an interface address to be added, and
> then it sends a packet which makes sure that host responds to that
> address. We have those two packets in our queue, as packet "A" and
> "B".
I'd like to know more about this test. On the face of it this test seems
to be broken. What if packet A was lost? Surely this shouldn't be used
as an indication that the target IPv6 stack is out-of-spec.
If we're really going to guarantee that NDISC processing is always going
to be synchronous, this imposes fairly nasty restrictions on what we can
do in future. For instance, this would rule out having the NIC distribute
flows across CPUs, as this would break the synchronicity of NDISC processing
vs. TCP processing.
> As a secondary reason not to even consider this, it's in the kernel
> already and therefore it is totally impractical to try and remove it.
> When considering new protocols or features, the "user vs. kernel"
> argument is something to validly consider. But when it's already
> there, it will have to live there basically for eternity. It is not
> like some arbitrary internal kernel module symbol or interface we
> can deprecate over a 6 month period or something like that.
Fair enough. I suppose another case in point is IPv4 autoconf which
is *still* in the kernel after all these years.
However, to draw an analogy we're kind of stuck in a bog here. So
while we can't extricate ourselves easily, we should attempt to come
up with ways of eventually lifting us out. We should also try to
avoid any actions that'll cause us to sink deeper :)
Cheers,
* Re: Regarding offloading IPv6 addrconf and ndisc
From: David Miller @ 2006-07-28 2:27 UTC
To: hsantos; +Cc: herbert, kazunori, yoshfuji, netdev, usagi-core
From: Hugo Santos <hsantos@av.it.pt>
Date: Fri, 28 Jul 2006 02:45:28 +0100
> Is it reasonable to require that control packet processing be
> serialized with data packet processing? In this particular case, aren't
> the tests the ones that are broken, if they don't give the host enough
> time to configure the address? The standards do not specify
> implementation details, so no specific processing model should be
> assumed, and what you describe sounds a bit like an ideal model.
Just like a TCP connection, packets cause state transitions.
And it is reasonable to expect that after a state transition,
the effects can be visible by subsequent packets.
I think the tests are doing valid things.
* Re: Regarding offloading IPv6 addrconf and ndisc
From: David Miller @ 2006-07-28 2:33 UTC
To: herbert; +Cc: kazunori, yoshfuji, netdev, usagi-core
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 28 Jul 2006 12:22:29 +1000
> I suppose another case in point is IPv4 autoconf which
> is *still* in the kernel after all these years.
At least in that case, H. Peter Anvin has put together a
klibc equivalent.
Klibc is one possible way out of this quagmire. But we have to wait
until everyone is able to convert over to it. This isn't like
changing the module interfaces and APIs, for example, it is a much
larger change from the userland perspective. And it took a long time
for the module bits to propagate fully, which as I said was a less
drastic case.
And klibc is only really good for bootup stuff and loading initial
drivers. It probably cannot be applied to things like ARP and
NDISC. IPv4 autoconf works as a good application of klibc because
you only need it to run from the initial ramdisk on bootup.
> However, to draw an analogy we're kind of stuck in a bog here. So
> while we can't extricate ourselves easily, we should attempt to come
> up with ways of eventually lifting us out. We should also try to
> avoid any actions that'll cause us to sink deeper :)
That's why we are very careful when evaluating new protocol
implementations that want to be in the kernel. The first question we
always ask is "can you do this reasonably in userspace?"
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Hugo Santos @ 2006-07-28 3:13 UTC
To: David Miller; +Cc: herbert, kazunori, yoshfuji, netdev, usagi-core
Hi,
On Thu, Jul 27, 2006 at 07:27:43PM -0700, David Miller wrote:
>
> Just like a TCP connection, packets cause state transitions.
> And it is reasonable to expect that after a state transition,
> the effects can be visible by subsequent packets.
Certainly, control packets cause state transitions. TCP is a mixed
bag. I think the question here is whether we can afford a stack where
the data path is fully synchronous with the control path -- considering
the amount of "time" required by a state transition (and other burdens
you've identified). It might not pose a problem with the current
signalling, but as an example, if we consider SEcure Neighbor Discovery
(SEND, RFC 3971), validating a secure prefix to derive an address from
involves checking certificate signatures (besides the
certificate-obtaining procedure), a process which may take some time.
I believe it is reasonable to be synchronous within certain limits,
specifically when the impact is local; for instance, queueing outgoing
packets during neighbor resolution (which, actually, some network
stacks don't do). However, when we consider something as global to
the stack as configuring an address, being synchronous is expensive.
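Keeping the wait local, as described above, can be sketched as a small per-neighbor queue: packets for an unresolved neighbor are held and flushed once resolution completes, while the rest of the stack carries on. The structure, the limit of 3 and the drop-oldest policy are assumptions for illustration, not a description of any particular stack.

```c
#include <stdbool.h>
#include <stddef.h>

#define PENDING_MAX 3   /* assumed per-neighbor limit, purely illustrative */

enum nstate { N_INCOMPLETE, N_REACHABLE };

struct neighbor {
    enum nstate state;
    int pending[PENDING_MAX];   /* ring buffer of packet ids awaiting resolution */
    size_t head, count;
    int sent[16];               /* packet ids actually transmitted, in order */
    size_t nsent;
};

static void transmit(struct neighbor *n, int pkt)
{
    if (n->nsent < 16)
        n->sent[n->nsent++] = pkt;
}

/* Output path: send at once if resolved, otherwise hold the packet.
 * Only this neighbor waits; nothing else in the stack is blocked. */
void neigh_output(struct neighbor *n, int pkt)
{
    if (n->state == N_REACHABLE) {
        transmit(n, pkt);
        return;
    }
    if (n->count == PENDING_MAX) {          /* queue full: drop the oldest */
        n->head = (n->head + 1) % PENDING_MAX;
        n->count--;
    }
    n->pending[(n->head + n->count) % PENDING_MAX] = pkt;
    n->count++;
}

/* Called when resolution completes (e.g. a Neighbor Advertisement). */
void neigh_resolved(struct neighbor *n)
{
    n->state = N_REACHABLE;
    while (n->count > 0) {                  /* flush in arrival order */
        transmit(n, n->pending[n->head]);
        n->head = (n->head + 1) % PENDING_MAX;
        n->count--;
    }
}
```

Queueing four packets while unresolved keeps the newest three; after `neigh_resolved()` they go out in order and later packets are sent immediately.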
I understand that any kind of impact should be well thought out, and
adding new interfaces and behaviour to the kernel just adds to
complexity and maintenance hell. But in this case, I think that adding
a bit of complexity to the kernel would save us a lot of extra
complexity in the long run when supporting these new protocols,
besides helping development.
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
From: David Miller @ 2006-07-28 3:20 UTC
To: hsantos; +Cc: herbert, kazunori, yoshfuji, netdev, usagi-core
From: Hugo Santos <hsantos@av.it.pt>
Date: Fri, 28 Jul 2006 04:13:22 +0100
> Certainly, control packets cause state transitions. TCP is a mixed
> bag. I think the question here is whether we can afford a stack where
> the data path is fully synchronous with the control path -- considering
> the amount of "time" required by a state transition (and other burdens
> you've identified). It might not pose a problem with the current
> signalling, but as an example, if we consider SEcure Neighbor Discovery
> (SEND, RFC 3971), validating a secure prefix to derive an address from
> involves checking certificate signatures (besides the
> certificate-obtaining procedure), a process which may take some time.
We check AH4 hash signatures synchronously in the softirq packet
input path. I know about async-crypto, but the point is that we
do this kind of heavy computation in the input path and it isn't
a big deal.
Now, if you're saying that in response to an NDISC packet we might
have to go out and obtain the certificate before we can process
that NDISC packet, that is a different issue. Is that how this
secure NDISC works? Or does the system obtain all the certificates
first, by some other means, and then either it can certify an NDISC
frame immediately or it can't?
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Hugo Santos @ 2006-07-28 3:31 UTC
To: David Miller; +Cc: herbert, kazunori, yoshfuji, netdev, usagi-core
On Thu, Jul 27, 2006 at 08:20:44PM -0700, David Miller wrote:
>
> Now, if you're saying that in response to an NDISC packet we might
> have to go out and obtain the certificate before we can process
> that NDISC packet, that is a different issue. Is that how this
> secure NDISC works? Or does the system obtain all the certificates
> first, by some other means, and then either it can certify an NDISC
> frame immediately or it can't?
It might happen that, upon receiving a Router Advertisement, the host
must ask the router for a Certification Path. More specifically, RFC 3971
Section 6.4.6, 'Processing Rules for Hosts', states the following:
The host SHOULD retrieve a certification path when a Router
Advertisement has been received with a public key that is not
available from a certificate in the hosts' cache, or when there is
no certification path to one of the host's trust anchors. In
these situations, the host MAY send a Certification Path
Solicitation message to retrieve the path. If there is no
response within CPS_RETRY seconds, the message should be retried.
The wait interval for each subsequent retransmission MUST
exponentially increase, doubling each time. If there is no
response after CPS_RETRY_MAX seconds, the host abandons the
certification path retrieval process. (...)
If no certification path is established, the RA must be treated as
unsecured. Since secure prefixes are given preference over non-secure
ones, this might cause problems.
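To get a feel for the delays involved, the retransmission rule quoted above can be turned into a send schedule. This is a hypothetical helper, not taken from any SEND implementation, and the example values CPS_RETRY = 1 s and CPS_RETRY_MAX = 15 s are assumptions that should be checked against the protocol constants in RFC 3971.

```c
#include <stddef.h>

/* Hypothetical helper: times (in seconds after the first send) at which a
 * Certification Path Solicitation is sent or resent under the rule quoted
 * above. The first wait is cps_retry seconds and each later wait doubles;
 * once cps_retry_max seconds have passed without a response the host
 * abandons retrieval. Writes at most max_out entries into out[] and
 * returns how many were written. */
size_t cps_schedule(unsigned cps_retry, unsigned cps_retry_max,
                    unsigned out[], size_t max_out)
{
    unsigned t = 0, wait = cps_retry;
    size_t n = 0;

    while (n < max_out && t < cps_retry_max) {
        out[n++] = t;   /* (re)send the solicitation at time t */
        t += wait;      /* wait for a Certification Path Advertisement */
        wait *= 2;      /* exponential backoff: 1, 2, 4, 8, ... */
    }
    return n;
}
```

With those assumed values, solicitations go out at 0, 1, 3 and 7 seconds and the host gives up at 15 seconds, so the RA could stay unclassified for roughly 15 seconds, far longer than any processing done synchronously in the input path.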
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Stephen Hemminger @ 2006-07-28 4:07 UTC
To: Hugo Santos; +Cc: David Miller, herbert, kazunori, yoshfuji, netdev, usagi-core
On Fri, 28 Jul 2006 04:31:32 +0100
Hugo Santos <hsantos@av.it.pt> wrote:
> On Thu, Jul 27, 2006 at 08:20:44PM -0700, David Miller wrote:
> >
> > Now, if you're saying that in response to an NDISC packet we might
> > have to go out and obtain the certificate before we can process
> > that NDISC packet, that is a different issue. Is that how this
> > secure NDISC works? Or does the system obtain all the certificates
> > first, by some other means, and then either it can certify an NDISC
> > frame immediately or it can't?
>
> It might happen that, upon receiving a Router Advertisement, the host
> must ask the router for a Certification Path. More specifically, RFC 3971
> Section 6.4.6, 'Processing Rules for Hosts', states the following:
>
> The host SHOULD retrieve a certification path when a Router
> Advertisement has been received with a public key that is not
> available from a certificate in the hosts' cache, or when there is
> no certification path to one of the host's trust anchors. In
> these situations, the host MAY send a Certification Path
> Solicitation message to retrieve the path. If there is no
> response within CPS_RETRY seconds, the message should be retried.
> The wait interval for each subsequent retransmission MUST
> exponentially increase, doubling each time. If there is no
> response after CPS_RETRY_MAX seconds, the host abandons the
> certification path retrieval process. (...)
>
> If no certification path is established, the RA must be treated as
> unsecured. Since secure prefixes are given preference over non-secure
> ones, this might cause problems.
>
> Hugo
A couple of basic questions:
1. Can we just proceed assuming it is non-secure until a later time when
the certificate path is established?
2. What if the user process dies, or gets overwhelmed?
One of the assumptions of any well-designed kernel is that the system should never
hang because some user application died or waited forever.
--
If one would give me six lines written by the hand of the most honest
man, I would find something in them to have him hanged. -- Cardinal Richelieu
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Hugo Santos @ 2006-07-28 8:34 UTC
To: Stephen Hemminger
Cc: David Miller, herbert, kazunori, yoshfuji, netdev, usagi-core
> A couple of basic questions:
> 1. Can we just proceed assuming it is non-secure until a later time when
> the certificate path is established?
This is not something the standard describes. In fact, processing the
RA without a certificate path to the router already assumes the host
is configured to do so (i.e. unverified messages are treated as normal
non-secure ones). Treating it as non-secure would allow an attacker to
temporarily receive packets from the host if the host has no secure
router to use (on the same or other interfaces). This may let the
attacker retrieve some of the user's information (think web-login
portal), and it only has to be on-link (a typical NDP attack). A
solution would be not to assume it is non-secure, but instead to cache
or drop the RA and initiate the process of obtaining a certificate
path. This, however, does not allow the kind of behaviour that Dave
described in one of his earlier e-mails, where packets are processed in order.
Also, the host cache needs to hold X.509v3 certificates, and even if
a lighter hash-based check is available (if CGAs are also used, to make
sure the packets really come from their source address), hosts will
end up having to perform RSA signature checks.
> 2. What if the user process dies, or gets overwhelmed?
> One of the assumptions of any well-designed kernel is that the system should never
> hang because some user application died or waited forever.
Of course this is a real problem. However, if the control daemon
dies, the kernel won't die. Depending on the implementation, you might
temporarily run out of addresses (if the addresses are flushed when the
control daemon dies, etc.). But just as a routing daemon is critical
to a router, this control application would be critical to the
host's connectivity, and if it dies, it needs to be restarted. The
application itself might be complex, but in the end we have moved this
complexity out of the kernel.
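The "restart it if it dies" requirement can be met by a very small supervisor. The sketch below is an illustration under stated assumptions: the function names are hypothetical, `crashing_daemon()` is only a stand-in for the real control daemon's main loop, and a real system would more likely rely on init or a dedicated supervisor with back-off between restarts.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a hypothetical ND/addrconf control daemon's main loop;
 * it "crashes" immediately so the supervisor has something to do. */
void crashing_daemon(void)
{
    _exit(1);
}

/* Run daemon_main in a child process and restart it each time it dies,
 * up to max_restarts times. Returns the number of restarts performed,
 * or -1 if fork() or waitpid() fails. */
int supervise(void (*daemon_main)(void), int max_restarts)
{
    int restarts = 0;

    for (;;) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            daemon_main();
            _exit(0);   /* never fall back into the supervisor's code */
        }

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            fprintf(stderr, "daemon exited with status %d\n", WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            fprintf(stderr, "daemon killed by signal %d\n", WTERMSIG(status));

        if (restarts == max_restarts)
            return restarts;    /* give up; the kernel keeps running regardless */
        restarts++;
    }
}
```

`supervise(crashing_daemon, 3)` starts the daemon four times (the initial run plus three restarts) and then returns 3.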
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Jamal Hadi Salim @ 2006-07-28 12:45 UTC
To: Hugo Santos
Cc: Stephen Hemminger, David Miller, herbert, kazunori, yoshfuji,
netdev, usagi-core
On Fri, 2006-28-07 at 09:34 +0100, Hugo Santos wrote:
> > 2. What if the user process dies, or gets overwhelmed?
> > One of the assumptions of any well-designed kernel is that the system should never
> > hang because some user application died or waited forever.
>
> Of course this is a real problem. However, if the control daemon
> dies, the kernel won't die. Depending on the implementation, you might
> temporarily run out of addresses (if the addresses are flushed when the
> control daemon dies, etc.). But just as a routing daemon is critical
> to a router, this control application would be critical to the
> host's connectivity, and if it dies, it needs to be restarted. The
> application itself might be complex, but in the end we have moved this
> complexity out of the kernel.
>
Hugo,
The biggest challenge you will face is the view that people hate daemons,
mostly from a usability perspective (that is the gist of the arguments I
have seen) but also because of concerns such as the one Stephen mentions
above.
I hold the same views as you on separating control from the
datapath, and to respond to Stephen's assertion about a well-designed
kernel above: it is good kernel abstraction to separate policy
management from mechanism.
The certificate issue only further validates this POV: control
tends to be feature-rich, a Swiss-army knife, i.e. more of a moving
target than the datapath. Such things typically belong in user space.
I have also seen talk of secure ARP; I wonder whether certificates may be
involved there as well. If you look at the netdev archives, you may
notice such discussions. Summary: I violently agree with you, and I
think if you address the "daemon" concerns, you will get other
folks to agree as well.
cheers,
jamal
* Re: Regarding offloading IPv6 addrconf and ndisc
From: Hugo Santos @ 2006-07-29 13:34 UTC
To: Jamal Hadi Salim
Cc: Stephen Hemminger, David Miller, herbert, kazunori, yoshfuji,
netdev, usagi-core
Hi Jamal,
Through this discussion I've identified three points: one is that
some believe control and data should be kept synchronized; another is
that some (including all of the first :-) think control should remain
inside the kernel; and finally there are you and I so far, who believe
they should be separated for increased flexibility. If we consider the
first point, I too would have difficulty accepting the second one, for
the same reasons Stephen mentioned, where an application could
possibly bork the kernel. So, first of all, we need to settle whether
data really needs to be in sync with control (given the complexity
that control is already gaining).
Personally, I think the answer is clear: the overall throughput
depends on the amount of time spent in a single state. Thus, the data
path should be well contained and depend only on the current state,
while control executes in parallel, as its state transitions take much
longer and might depend on multiple sub-transitions (communicating with
peers, etc.).
Deciding whether control should remain inside the kernel or not is
another story; as you point out, people generally don't like to
depend on daemons. I understand this point, and I think a solution that
would keep both parties happy would be to keep the current
functionality inside the kernel while at the same time allowing control
daemons to take over and support additional complex features.
I would say that generic worries such as 'how do we handle out of
memory', 'what if it crashes' or even 'what if it is overloaded'
apply both to the kernel and to a possible user-space application.
- As this control daemon is important for the proper interaction of
the host with the network, we would reduce its chances of being
OOM-killed (while at the same time implementing algorithms to
prevent DoS by state flooding);
- Regarding crashes, I think it is better for a system to have an
application crash rather than a kernel component, as the application
can be seamlessly restarted to recover. I would also say that the
complexity of the code needed to support exactly the same features is
the same (or worse, if in the kernel);
- The overloading problem also applies to the kernel, and must be
considered by either implementation.
Don't forget that having this functionality in a daemon would also
allow for easier updates (including updates without having to reboot
your machine) besides offering a greater degree of flexibility.
Let me also point to IKE daemons, where the SA/SP database is kept in
the kernel but is populated by the daemon in response to explicit
requests from the kernel (ACQUIREs). For neighbor discovery, as an
example, if configured to do so, the kernel could, instead of sending
an IPv6 NS, netlink-broadcast (or unicast to a specific controller) a
request when an entry has become STALE and is needed; the daemon would
then update the entry to REACHABLE.
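The kernel-to-daemon notification in this proposal is hypothetical, but the daemon's half, setting the entry to REACHABLE, could plausibly reuse an existing rtnetlink request: RTM_NEWNEIGH with NUD_REACHABLE, roughly what `ip -6 neigh replace ... nud reachable` sends. The sketch below only builds the message; the struct and helper names are made up, and actually sending it would need a NETLINK_ROUTE socket and CAP_NET_ADMIN.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/neighbour.h>

/* Hypothetical request buffer: netlink header, neighbor message, attributes. */
struct neigh_req {
    struct nlmsghdr nh;
    struct ndmsg nd;
    char attrs[64];
};

/* Append one rtnetlink attribute at the current (aligned) end of the message. */
static void add_attr(struct nlmsghdr *nh, unsigned short type,
                     const void *data, unsigned short len)
{
    struct rtattr *rta = (struct rtattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(len);
    memcpy(RTA_DATA(rta), data, len);
    nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

/* Build (but do not send) an RTM_NEWNEIGH request marking an IPv6 neighbor
 * on ifindex as REACHABLE at the given link-layer address.
 * Returns the total message length in bytes. */
size_t build_neigh_reachable(struct neigh_req *req, int ifindex,
                             const struct in6_addr *dst,
                             const unsigned char lladdr[6])
{
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
    req->nh.nlmsg_type = RTM_NEWNEIGH;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_REPLACE;
    req->nd.ndm_family = AF_INET6;
    req->nd.ndm_ifindex = ifindex;
    req->nd.ndm_state = NUD_REACHABLE;
    add_attr(&req->nh, NDA_DST, dst, sizeof(*dst));   /* neighbor's IPv6 address */
    add_attr(&req->nh, NDA_LLADDR, lladdr, 6);        /* its Ethernet address */
    return req->nh.nlmsg_len;
}
```

The resulting buffer is 60 bytes: 28 for the headers, 20 for the IPv6 destination attribute and 12 for the padded link-layer address attribute.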
I'm sure some of you will continue to disagree, but I would really
like to move this decision to the user (or system deployer).
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-07-29 13:34 ` Hugo Santos
@ 2006-07-30 3:28 ` Kazunori Miyazawa
2006-07-30 11:30 ` Hugo Santos
0 siblings, 1 reply; 32+ messages in thread
From: Kazunori Miyazawa @ 2006-07-30 3:28 UTC (permalink / raw)
To: Jamal Hadi Salim, Stephen Hemminger, David Miller, herbert,
kazunori, yoshfuji, netdev, usagi-core
Hi Hugo,
I don't think it is correct to refer to the IPsec architecture in this context.
It was originally designed to separate packet processing from
management. They are loosely coupled and are not always synchronized.
For example, if the daemon installs a pair of SAs that override the
SAs in the SAD, there is no problem: the IPsec stack simply uses the new SAs.
On the other hand, if an ND daemon loses synchronization, the result is
unpredictable, I guess.
BTW, we also have the option of implementing the functionality as a
module. I think that could achieve some of what you want.
Regards,
Hugo Santos wrote:
> Hi Jamal,
>
> Through this discussion I've identified three points: first, that
> some believe control and data should be kept synchronized; second,
> that some (including everyone in the first group :-) think control
> should remain inside the kernel; and finally, that you and I, so far,
> believe they should be separated for increased flexibility. If we
> consider the first point, I too would have difficulty accepting the
> second one, for the same reasons Stephen mentioned: an application
> could possibly bork the kernel. So, first of all, we need to settle
> whether data really needs to be in sync with control (already taking
> into account the complexity control is gaining).
>
> Personally I think the answer is clear; the overall throughput
> depends on the amount of time spent in a single state. Thus, the data
> path should be well contained and depend only on the current state,
> while control executes in parallel, as its state transitions are much
> longer and might depend on multiple sub-transitions (communicating with
> peers, etc).
>
> Deciding whether control should remain inside the kernel or not is
> another story; as you point out, people generally don't like to
> depend on daemons. I understand this point, and I think a solution that
> would keep both parties happy would be to keep the current
> functionality inside the kernel while at the same time allowing control
> daemons to take over and support additional complex features.
>
> I would say that general worries such as 'how do we handle out of
> memory?', 'what if it crashes?' or even 'what if it is overloaded?'
> apply equally to the kernel and to a possible user-space application.
>
> - As this control daemon is important for the proper interaction of
> the host with the network, we would reduce its chances of being
> OOM killed (while at the same time implementing algorithms to
> prevent DoS by state flooding);
> - Regarding crashes, I think it is better for a system to have an
> application crash rather than a kernel component, as the application
> might be seamlessly restarted to recover. I would also say that the
> complexity of the code needed to support exactly the same features
> is the same (or worse, if it is in the kernel);
> - The overloading problem also applies to the kernel, and would have
> to be considered by either implementation.
>
> Don't forget that having this functionality in a daemon would also
> allow for easier updates (including updates without having to reboot
> your machine) besides offering a greater degree of flexibility.
>
> Let me also point to IKE daemons, where the SA/SP database is
> kept in the kernel but is populated by the daemon in response to
> explicit requests from the kernel (ACQUIREs). Taking neighbor discovery
> as an example, if configured to do so, instead of sending an IPv6 NS
> itself, the kernel could netlink-broadcast (or unicast to a specific
> controller) a request when a STALE entry is needed again; the daemon
> would then resolve it and update the entry to REACHABLE.
>
> I'm sure some of you will continue to disagree, but I would really
> like to move this decision to the user (or system deployer).
>
> Hugo
--
Kazunori Miyazawa
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-07-30 3:28 ` Kazunori Miyazawa
@ 2006-07-30 11:30 ` Hugo Santos
2006-07-31 21:23 ` David Miller
2006-08-01 0:16 ` Kazunori Miyazawa
0 siblings, 2 replies; 32+ messages in thread
From: Hugo Santos @ 2006-07-30 11:30 UTC (permalink / raw)
To: Kazunori Miyazawa
Cc: Jamal Hadi Salim, Stephen Hemminger, David Miller, herbert,
yoshfuji, netdev, usagi-core
Hi,
> On the other hand, if an ND daemon loses synchronization, the result is
> unpredictable, I guess.
What do you mean by synchronization in this context? My idea was to
keep the ND state machine inside the kernel and instead have the
daemon be reactive. That means it would send messages on behalf of the
kernel and apply information based on received signalling (besides, ND
is resilient to message loss). Taking your example, if the kernel
is using a neighbor entry and you replace it (changing either its
state or its link-layer address), the kernel will adapt, so I believe it
is predictable. To be honest, I'm only worried about possibly lost netlink
messages; but the daemon can be implemented to handle this, re-sending
until an ACK is received, thus minimizing any possibility of
de-synchronization.
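The retransmission idea above can be sketched as follows, assuming a hypothetical lossy channel. `LossyChannel`, `send_reliably` and `max_tries` are illustrative names. Real netlink already lets a sender request an acknowledgement by setting `NLM_F_ACK`; this sketch only models the retry loop around such an ACK. A real daemon would also wait for each ACK with a timeout. A lost ACK causes a duplicate delivery, so the messages should be idempotent. A "set entry X to state S with lladdr L" update naturally is.

```python
class LossyChannel:
    """Deterministic stand-in for a netlink socket that may drop messages."""

    def __init__(self, drop_pattern):
        self.drop_pattern = list(drop_pattern)  # True means "drop this send"
        self.delivered = []

    def send(self, msg):
        """Return True if the peer received msg and acknowledged it."""
        dropped = self.drop_pattern.pop(0) if self.drop_pattern else False
        if not dropped:
            self.delivered.append(msg)
        return not dropped


def send_reliably(channel, msg, max_tries):
    """Re-send msg until it is acknowledged; return the attempt that succeeded."""
    for attempt in range(1, max_tries + 1):
        if channel.send(msg):
            return attempt
    raise TimeoutError(f"no ACK for {msg!r} after {max_tries} tries")
```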
> BTW, we also have the option of implementing the functionality as a
> module. I think that could achieve some of what you want.
Well, exporting the functionality to a module would be a first step
towards moving it out of the kernel. :-)
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-07-30 11:30 ` Hugo Santos
@ 2006-07-31 21:23 ` David Miller
2006-08-01 11:50 ` Hugo Santos
2006-08-01 0:16 ` Kazunori Miyazawa
1 sibling, 1 reply; 32+ messages in thread
From: David Miller @ 2006-07-31 21:23 UTC (permalink / raw)
To: hsantos; +Cc: kazunori, hadi, shemminger, herbert, yoshfuji, netdev, usagi-core
So all of you userland control-plane fanatics, how will you handle
things like NFS root with these daemon-required variants of NDISC and
ARP?
I know the devil's advocate responses already, so don't bother with
responses saying things like 1) "do it in the initial ramdisk, we only
need the daemon to set up the NDISC entries to talk to the NFS server"
or 2) "IPSEC's control plane is in userspace and therefore we can't do
NFS root over IPSEC, why is that ok and key'd NDISC is not?"
I think we are building systems that are gradually becoming less and
less reliable, with increasing numbers of possible points of failure.
Flexibility is overrated. There are many crucial optimizations and
simplifications we cannot perform because we've made certain aspects
of network configuration far too flexible.
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-07-30 11:30 ` Hugo Santos
2006-07-31 21:23 ` David Miller
@ 2006-08-01 0:16 ` Kazunori Miyazawa
1 sibling, 0 replies; 32+ messages in thread
From: Kazunori Miyazawa @ 2006-08-01 0:16 UTC (permalink / raw)
To: Kazunori Miyazawa, Jamal Hadi Salim, Stephen Hemminger,
David Miller, herbert, yoshfuji, netdev, usagi-core
Hello Hugo,
Hugo Santos wrote:
> Hi,
>
>> On the other hand, if an ND daemon loses synchronization, the result is
>> unpredictable, I guess.
>
> What do you mean by synchronization in this context? My idea was to
> keep the ND state machine inside the kernel and instead have the
> daemon be reactive. That means it would send messages on behalf of the
> kernel and apply information based on received signalling (besides, ND
> is resilient to message loss). Taking your example, if the kernel
> is using a neighbor entry and you replace it (changing either its
> state or its link-layer address), the kernel will adapt, so I believe it
> is predictable. To be honest, I'm only worried about possibly lost netlink
> messages; but the daemon can be implemented to handle this, re-sending
> until an ACK is received, thus minimizing any possibility of
> de-synchronization.
>
The kernel maintains the ND state by itself, and the daemon modifies
that state. I think the daemon should be aware of the state.
That is what I meant by "synchronization".
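One way to make the daemon "aware of the state" is optimistic concurrency. The kernel bumps a per-entry generation counter on every transition, and a daemon update is applied only if the generation it read is still current. The sketch below is a hypothetical Python model of that idea. `VersionedNeighTable`, `generation` and `daemon_update` are illustrative names, not part of any existing netlink interface.

```python
class Entry:
    def __init__(self, state, lladdr):
        self.state = state
        self.lladdr = lladdr
        self.generation = 0   # bumped on every change


class VersionedNeighTable:
    """Hypothetical neighbor table where daemon writes are compare-and-set."""

    def __init__(self):
        self.entries = {}

    def kernel_transition(self, addr, state, lladdr=None):
        """The in-kernel state machine changes the entry on its own."""
        e = self.entries[addr]
        e.state = state
        if lladdr is not None:
            e.lladdr = lladdr
        e.generation += 1

    def snapshot(self, addr):
        """What the daemon reads before deciding on an update."""
        e = self.entries[addr]
        return e.state, e.lladdr, e.generation

    def daemon_update(self, addr, expected_generation, state, lladdr):
        """Apply the daemon's update only if nothing changed since its read."""
        e = self.entries[addr]
        if e.generation != expected_generation:
            return False      # stale view: the daemon must re-read and retry
        e.state, e.lladdr = state, lladdr
        e.generation += 1
        return True
```

A rejected update forces the daemon to re-read the entry, so it can never silently overwrite a transition the kernel made in the meantime.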
Anyway, I do not intend to keep you from your work any longer.
I'll leave the discussion here, without having seen the code.
>> BTW, we also have the option of implementing the functionality as a
>> module. I think that could achieve some of what you want.
>
> Well, exporting the functionality to a module would be a first step
> towards moving it out of the kernel. :-)
>
> Hugo
--
Kazunori Miyazawa
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-07-28 1:34 ` David Miller
2006-07-28 1:45 ` Hugo Santos
2006-07-28 2:22 ` Herbert Xu
@ 2006-08-01 0:31 ` Andi Kleen
2006-08-01 0:46 ` David Miller
2 siblings, 1 reply; 32+ messages in thread
From: Andi Kleen @ 2006-08-01 0:31 UTC (permalink / raw)
To: David Miller; +Cc: herbert, kazunori, yoshfuji, netdev, usagi-core
> If we process these in sequence in software interrupt, everything
> is fine. Processing of "A" will add the address, and the test
> ping packet "B" will respond properly.
>
> If you defer "A", everything breaks and the test packet "B" will
> get processed first and not work.
Playing devil's advocate here: if the packets are processed on
two different CPUs then this could also happen and break the test
case.
So the test is probably a bit fragile.
Currently it is unlikely to happen because of interrupt affinity for a
single device, but in the future, with MSI-X support, it might not be.
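Andi's point can be illustrated with a toy model of David's test case. The names and the single "address configured" set are simplifications. If the control packet "A" and the test ping "B" are handled on different CPUs, nothing guarantees that "A" is processed first, and the outcome depends purely on order.

```python
def process(packets):
    """Handle packets strictly in the given order; return echo replies sent."""
    addresses = set()
    replies = []
    for kind, addr in packets:
        if kind == "ADD_ADDR":          # packet "A": causes addr to be configured
            addresses.add(addr)
        elif kind == "ECHO_REQUEST":    # packet "B": the test ping
            if addr in addresses:
                replies.append(addr)
    return replies


A = ("ADD_ADDR", "2001:db8::1")
B = ("ECHO_REQUEST", "2001:db8::1")
# In order, process([A, B]) answers the ping; reordered, process([B, A]) does not.
```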
I generally agree it's better to keep this in the kernel, though.
-Andi
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 0:31 ` Andi Kleen
@ 2006-08-01 0:46 ` David Miller
2006-08-01 0:49 ` Roland Dreier
2006-08-01 12:00 ` Hugo Santos
0 siblings, 2 replies; 32+ messages in thread
From: David Miller @ 2006-08-01 0:46 UTC (permalink / raw)
To: ak; +Cc: herbert, kazunori, yoshfuji, netdev, usagi-core
From: Andi Kleen <ak@suse.de>
Date: Tue, 1 Aug 2006 02:31:58 +0200
> Playing devil's advocate here: if the packets are processed on
> two different CPUs then this could also happen and break the test
> case.
>
> So the test is probably a bit fragile.
Good point.
> I generally agree it's better to keep this in the kernel, though.
To drive this home even more, I do not believe that the people who
advocate pushing NDISC and ARP policy into userspace would be very
happy if something like the RAID transformations were moved into
userspace and they were not able to access their disks if the RAID
transformer process in userspace died.
Why is this a relevant analogy? Well, you have physical hard-disks in
your computer today, but at some point that device becomes largely
superfluous. It makes more sense to have just a cpu with a 10-gigabit
ethernet interface incorporated onto the cpu die, and the majority if
not all of your disk access is remote.
At that point, network access equals disk access. It would be amusing
to need to restart such an NDISC/ARP daemon if it were to live on a
remote volume. :-)
I understand full well that on special purpose network devices this
control vs. data plane separation into userspace might make a lot of
sense. But for a general purpose operating system, such as Linux, the
greater concern is resiliency to failures and each piece of core
functionality you move to userspace is a new potential point of
failure.
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 0:46 ` David Miller
@ 2006-08-01 0:49 ` Roland Dreier
2006-08-01 1:24 ` Jamal Hadi Salim
2006-08-01 12:00 ` Hugo Santos
1 sibling, 1 reply; 32+ messages in thread
From: Roland Dreier @ 2006-08-01 0:49 UTC (permalink / raw)
To: David Miller; +Cc: ak, herbert, kazunori, yoshfuji, netdev, usagi-core
David> Why is this a relevant analogy? Well, you have physical
David> hard-disks in your computer today, but at some point that
David> device becomes largely superfluous. It makes more sense to
David> have just a cpu with a 10-gigabit ethernet interface
David> incorporated onto the cpu die, and the majority if not all
David> of your disk access is remote.
Isn't most of the iSCSI control plane in userspace right now?
- R.
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 0:49 ` Roland Dreier
@ 2006-08-01 1:24 ` Jamal Hadi Salim
2006-08-01 1:30 ` Herbert Xu
0 siblings, 1 reply; 32+ messages in thread
From: Jamal Hadi Salim @ 2006-08-01 1:24 UTC (permalink / raw)
To: Roland Dreier
Cc: David Miller, ak, herbert, kazunori, yoshfuji, netdev, usagi-core
On Mon, 2006-31-07 at 17:49 -0700, Roland Dreier wrote:
> David> Why is this a relevant analogy? Well, you have physical
> David> hard-disks in your computer today, but at some point that
> David> device becomes largely superfluous. It makes more sense to
> David> have just a cpu with a 10-gigabit ethernet interface
> David> incorporated onto the cpu die, and the majority if not all
> David> of your disk access is remote.
>
> Isn't most of the iSCSI control plane in userspace right now?
I know iSCSI is supposed to integrate with IPsec as well (and SLP for
discovery); does that happen in user space as well?
Dave (I am on a heavy flu dose, so I may be incoherent ;->), here's a
devil's advocate bit for you:
TCP FIN/SYN are just control packets, so move the connection
setup/teardown out to user space ;->. You can then add all sorts of funky
DoS detection/prevention schemes as needed, and it makes experimenting easy.
Actually, move the slow path as well, SACK processing etc. (I know it is in
process context today, but that's still in the kernel). Just leave VJ's fast
path in the kernel. Extend the user space bit to be the new VJ (channels
stuff, but just for control), with asynchronous notification carrying the
control/slow path packets to user space.
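To make the proposed split concrete, here is a hypothetical Python classifier. It decides from the TCP header flags whether a segment would go to the user-space control path or stay on the in-kernel fast path. The flag values are the standard TCP header bits from RFC 793. Two parts are assumptions layered on top of the FIN/SYN example above: treating RST as control, and using a `has_sack` hint to route segments to the slow path.

```python
# TCP header flag bits (RFC 793).
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

# Connection setup/teardown: handed to user space under the proposed split.
CONTROL_FLAGS = FIN | SYN | RST


def classify(flags, has_sack=False):
    """Return 'control', 'slow' or 'fast' for a segment (hypothetical policy)."""
    if flags & CONTROL_FLAGS:
        return "control"   # async notification to the user-space controller
    if has_sack:
        return "slow"      # SACK processing, also moved out under the proposal
    return "fast"          # VJ-style fast path stays in the kernel
```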
In regards to ARP/NDISC being in user space: note people are talking
about secure DHCP or some form of initial pre-layer2 addressing over EAP
or something along those lines; i.e. if you are not securely validated at
the L2 level you are not even getting an IP address.
In regards to reliability: the things that really fsck people using
daemons, from what I have seen, are the OOM killer policies and the lack of
correlation by apps. I just watched quagga die horribly on a 256M
machine on Friday once we hit around 100K routes and a lot of route
cache hits. So apps like that may need a total rewrite. I am not looking
forward to trying to get racoon to do 50K SAs and 100K SPDs on the same
machine ;->
I think I like what Hugo is saying ;-> I just hope he has time and
resources to produce code.
cheers,
jamal
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 1:24 ` Jamal Hadi Salim
@ 2006-08-01 1:30 ` Herbert Xu
2006-08-01 1:47 ` Jamal Hadi Salim
0 siblings, 1 reply; 32+ messages in thread
From: Herbert Xu @ 2006-08-01 1:30 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Roland Dreier, David Miller, ak, kazunori, yoshfuji, netdev,
usagi-core
On Mon, Jul 31, 2006 at 09:24:27PM -0400, Jamal Hadi Salim wrote:
>
> In regards to reliability: the things that really fsck people using
> daemons, from what I have seen, are the OOM killer policies and the lack of
> correlation by apps. I just watched quagga die horribly on a 256M
> machine on Friday once we hit around 100K routes and a lot of route
> cache hits. So apps like that may need a total rewrite. I am not looking
> forward to trying to get racoon to do 50K SAs and 100K SPDs on the same
> machine ;->
You can now disable the OOM killer on a per-process basis by
echo -17 > /proc/<pid>/oom_adj
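A small helper that does the same thing can be sketched in Python. It writes `-17` (the `OOM_DISABLE` value for `oom_adj`). On kernels that have it, it prefers the newer `oom_score_adj` file (added in 2.6.36), where `-1000` has the same effect. `protect_from_oom` and its `proc_root` parameter are illustrative names. Lowering these values normally requires root privileges (CAP_SYS_RESOURCE).

```python
import os


def protect_from_oom(pid, proc_root="/proc"):
    """Exempt pid from the OOM killer; return the (path, value) written."""
    base = os.path.join(proc_root, str(pid))
    score_adj = os.path.join(base, "oom_score_adj")
    if os.path.exists(score_adj):
        path, value = score_adj, "-1000"                    # OOM_SCORE_ADJ_MIN
    else:
        path, value = os.path.join(base, "oom_adj"), "-17"  # OOM_DISABLE
    with open(path, "w") as f:
        f.write(value + "\n")
    return path, value
```

A daemon could call `protect_from_oom(os.getpid())` on itself at startup. The `proc_root` parameter exists only so the helper can be exercised against a scratch directory.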
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 1:30 ` Herbert Xu
@ 2006-08-01 1:47 ` Jamal Hadi Salim
2006-08-01 12:13 ` Hugo Santos
0 siblings, 1 reply; 32+ messages in thread
From: Jamal Hadi Salim @ 2006-08-01 1:47 UTC (permalink / raw)
To: Herbert Xu
Cc: usagi-core, netdev, yoshfuji, kazunori, ak, David Miller,
Roland Dreier
On Tue, 2006-01-08 at 11:30 +1000, Herbert Xu wrote:
>
> You can now disable the OOM killer on a per-process basis by
>
> echo -17 > /proc/<pid>/oom_adj
>
Nice to know ;-> At least you can protect some apps if you need to.
Only racoon and quagga are important for me.
But what happens then if you have a beast that just chews memory
forever? I suppose other poor apps will just get shot.
My plan was just to write a simple daemon that uses the genetlink API
that Shailabh (IBM) and company wrote, and to restart the app if I see
it disappear.
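The restart idea can be sketched in Python. Note that this swaps in a simpler technique than the one described above. Instead of listening for process-exit notifications over genetlink, the supervisor is the daemon's parent and simply waits for it to exit. `supervise`, `max_restarts` and `backoff` are illustrative names.

```python
import subprocess
import time


def supervise(argv, max_restarts, backoff=0.5):
    """Run argv in the foreground and restart it each time it exits.

    Gives up after max_restarts restarts; returns every exit code observed.
    """
    exit_codes = []
    while True:
        child = subprocess.Popen(argv)
        exit_codes.append(child.wait())    # blocks until the daemon dies
        if len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(backoff)                # crude guard against crash loops
```

This only works if the supervised daemon stays in the foreground. A daemon that forks and detaches would look to the supervisor as if it had exited immediately.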
cheers,
jamal
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-07-31 21:23 ` David Miller
@ 2006-08-01 11:50 ` Hugo Santos
2006-08-01 21:54 ` David Miller
0 siblings, 1 reply; 32+ messages in thread
From: Hugo Santos @ 2006-08-01 11:50 UTC (permalink / raw)
To: David Miller
Cc: kazunori, hadi, shemminger, herbert, yoshfuji, netdev, usagi-core
David,
> So all of you userland control-plane fanatics, how will you handle
> things like NFS root with these daemon-required variants of NDISC and
> ARP?
Do it in the initial ramdisk, we only need the daemon to set up the
NDISC entries to talk to the NFS server. :-)
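Here is a rough illustration of what pinning the NFS server's neighbor entry from an initial ramdisk could look like. The Python sketch builds an iproute2 command that creates a permanent IPv6 neighbor entry. `static_neigh_argv` is a hypothetical helper, while the `ip -6 neigh replace ADDR lladdr LLADDR dev DEV nud permanent` syntax is standard iproute2. A real initramfs would more likely use a shell script; Python is used here only to keep the examples in one language. A permanent entry bypasses ND entirely, so it gives up exactly the flexibility under discussion.

```python
import ipaddress
import re

MAC_RE = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}")


def static_neigh_argv(addr, lladdr, dev):
    """Build argv for pinning a permanent IPv6 neighbor entry with iproute2."""
    ip = ipaddress.IPv6Address(addr)          # raises ValueError if not IPv6
    if not MAC_RE.fullmatch(lladdr):
        raise ValueError(f"bad link-layer address: {lladdr!r}")
    return ["ip", "-6", "neigh", "replace", str(ip),
            "lladdr", lladdr.lower(), "dev", dev, "nud", "permanent"]
```

The resulting list could be passed to `subprocess.run(argv, check=True)`, running as root, before the NFS root is mounted.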
There is obviously a cost associated with this: a deployment cost.
But there are additional factors we must consider. In a later e-mail
you state that Linux is a general purpose operating system; how many
users need to boot from an NFS root (besides myself :-)? I think that we
must take into consideration that Linux is currently used in lots of
distinct environments: not only desktop computers and servers, but
also smaller devices. The balance between configuration/flexibility and
optimization varies a lot depending on the deployment you are talking
about, and in most of my scenarios, a small mobile device isn't
currently required to push 100Mbps (optimization) but must be
capable of verifying its peers and maintaining secure connections
(flexibility). So, let's be generic?
I might have some cycles during the month to code up something in
this direction, at least for an initial review; I'll try to do so.
Also, the reliability of a system depends on a lot of things, but
please, let's not use the assumption that because everything sits in
the kernel, it will be stable as the number of 'points of failure' is
smaller; this is only true as long as people work to have stable
components -- and this is independent of where the components sit. A
few kernel versions ago (2.6.8 if I remember correctly) I couldn't even
safely remove an in-use network interface from the system without hanging
the network stack. It is possible to have stable user-space code, if
the people developing it work to make sure it is stable.
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 0:46 ` David Miller
2006-08-01 0:49 ` Roland Dreier
@ 2006-08-01 12:00 ` Hugo Santos
2006-08-01 21:57 ` David Miller
1 sibling, 1 reply; 32+ messages in thread
From: Hugo Santos @ 2006-08-01 12:00 UTC (permalink / raw)
To: David Miller; +Cc: ak, herbert, kazunori, yoshfuji, netdev, usagi-core
David,
> To drive this home even more, I do not believe that the people who
> advocate pushing NDISC and ARP policy into userspace would be very
> happy if something like the RAID transformations were moved into
> userspace and they were not able to access their disks if the RAID
> transformer process in userspace died.
How would you restart the RAID controller daemon if it's stored
in the RAID itself? Also, assuming the same code quality (and ignoring
the OOM killer for a moment), if the RAID controller daemon dies, the
same code running in the kernel could just as well crash the whole kernel.
> At that point, network access equals disk access. It would be amusing
> to need to restart such an NDISC/ARP daemon if it were to live on a
> remote volume. :-)
What you are saying is that, well, the NDISC handling is already in
the host's memory (kernel text), so the connection to the remote storage
facility could be re-established. So, to be fair, let's assume that
somehow the NDISC daemon would be available locally?
> I understand full well that on special purpose network devices this
> control vs. data plane separation into userspace might make a lot of
> sense. But for a general purpose operating system, such as Linux, the
> greater concern is resiliency to failures and each piece of core
> functionality you move to userspace is a new potential point of
> failure.
I think 100% of Linux's users want stability. Resiliency to failure
is not something that depends on the kernel. If the code in question is
in the kernel, and it crashes, how will you recover?
Please note that I'm not making this a monolithic vs. microkernel
discussion (I wouldn't want Linus to step in and kick me to hell), but
if we can avoid having _complex_ interactions within the kernel, we
make the kernel itself more resilient to failure.
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 1:47 ` Jamal Hadi Salim
@ 2006-08-01 12:13 ` Hugo Santos
0 siblings, 0 replies; 32+ messages in thread
From: Hugo Santos @ 2006-08-01 12:13 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Herbert Xu, netdev, ak, David Miller, Roland Dreier
Jamal,
> Nice to know ;-> At least you can protect some apps if you need to.
> Only racoon and quagga are important for me.
> But what happens then if you have a beast that just chews memory
> forever? I suppose other poor apps will just get shot.
You should push QoS and differentiation into the memory subsystem :-)
Give a priority flag to mmap(). It's not simple to degrade existing
allocations, but taking into consideration the OOM killer...
Hugo
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 11:50 ` Hugo Santos
@ 2006-08-01 21:54 ` David Miller
0 siblings, 0 replies; 32+ messages in thread
From: David Miller @ 2006-08-01 21:54 UTC (permalink / raw)
To: hsantos; +Cc: kazunori, hadi, shemminger, herbert, yoshfuji, netdev, usagi-core
From: Hugo Santos <hsantos@av.it.pt>
Date: Tue, 1 Aug 2006 12:50:02 +0100
> I might have some cycles during the month to code up something in
> this direction, at least for an initial review; I'll try to do so.
Great. I prefer to talk about code anyways :)
> Also, the reliability of a system depends on a lot of things, but
> please, let's not use the assumption that because everything sits in
> the kernel, it will be stable as the number of 'points of failure' is
> smaller; this is only true as long as people work to have stable
> components -- and this is independent of where the components sit.
This disagrees with my experience. Things in the kernel tend to get
noticed fast and fixed, whereas things in userspace can stay broken
for a long period of time.
Everything is about momentum, and the kernel is where all the
development momentum is. It's not in these userland components.
People are running semantic checkers on the kernel constantly, and
the kernel has all sorts of automatic locking, memory allocation,
et al. verifications and assertions.
A particular userland component might get this treatment and these checks,
but the kernel has them going all the time and people are looking at
the output of these tools and checks constantly. You cannot get the
kind of coverage the kernel gets.
As Andrew Morton says, userland is just a testsuite for the kernel.
:-)
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 12:00 ` Hugo Santos
@ 2006-08-01 21:57 ` David Miller
2006-08-03 13:28 ` Ingo Oeser
0 siblings, 1 reply; 32+ messages in thread
From: David Miller @ 2006-08-01 21:57 UTC (permalink / raw)
To: hsantos; +Cc: ak, herbert, kazunori, yoshfuji, netdev, usagi-core
From: Hugo Santos <hsantos@av.it.pt>
Date: Tue, 1 Aug 2006 13:00:03 +0100
> Resiliency to failure is not something that depends on the
> kernel. If the code in question is in the kernel, and it crashes,
> how will you recover?
Developer momentum means that the kernel is likely to get fixed
whereas the userland component will more likely rot and not get
fixed.
So in this sense resiliency does depend upon something being in
the kernel or not.
* Re: Regarding offloading IPv6 addrconf and ndisc
2006-08-01 21:57 ` David Miller
@ 2006-08-03 13:28 ` Ingo Oeser
0 siblings, 0 replies; 32+ messages in thread
From: Ingo Oeser @ 2006-08-03 13:28 UTC (permalink / raw)
To: David Miller; +Cc: hsantos, ak, herbert, kazunori, yoshfuji, netdev, usagi-core
Hi,
David Miller wrote:
> Developer momentum means that the kernel is likely to get fixed
> whereas the userland component will more likely rot and not get
> fixed.
>
> So in this sense resiliency does depend upon something being in
> the kernel or not.
I can only agree here. Lots of users run their own kernels instead
of distribution kernels. Far fewer users replace core software from
their distribution.
The big binary called bzImage/vmlinux whatever is a huge usability
advantage here :-)
Regards
Ingo Oeser
end of thread
Thread overview: 32+ messages
2006-07-27 11:25 Regarding offloading IPv6 addrconf and ndisc Hugo Santos
2006-07-27 12:25 ` Kazunori Miyazawa
2006-07-27 17:56 ` Hugo Santos
2006-07-27 23:56 ` Herbert Xu
2006-07-28 1:34 ` David Miller
2006-07-28 1:45 ` Hugo Santos
2006-07-28 2:27 ` David Miller
2006-07-28 3:13 ` Hugo Santos
2006-07-28 3:20 ` David Miller
2006-07-28 3:31 ` Hugo Santos
2006-07-28 4:07 ` Stephen Hemminger
2006-07-28 8:34 ` Hugo Santos
2006-07-28 12:45 ` Jamal Hadi Salim
2006-07-29 13:34 ` Hugo Santos
2006-07-30 3:28 ` Kazunori Miyazawa
2006-07-30 11:30 ` Hugo Santos
2006-07-31 21:23 ` David Miller
2006-08-01 11:50 ` Hugo Santos
2006-08-01 21:54 ` David Miller
2006-08-01 0:16 ` Kazunori Miyazawa
2006-07-28 2:22 ` Herbert Xu
2006-07-28 2:33 ` David Miller
2006-08-01 0:31 ` Andi Kleen
2006-08-01 0:46 ` David Miller
2006-08-01 0:49 ` Roland Dreier
2006-08-01 1:24 ` Jamal Hadi Salim
2006-08-01 1:30 ` Herbert Xu
2006-08-01 1:47 ` Jamal Hadi Salim
2006-08-01 12:13 ` Hugo Santos
2006-08-01 12:00 ` Hugo Santos
2006-08-01 21:57 ` David Miller
2006-08-03 13:28 ` Ingo Oeser