* [PATCH, untested] Support for PPPOE on SMP
@ 2003-06-25 7:24 Rusty Russell
2003-06-25 11:19 ` Jamal Hadi
` (2 more replies)
0 siblings, 3 replies; 35+ messages in thread
From: Rusty Russell @ 2003-06-25 7:24 UTC (permalink / raw)
To: davem, paulus; +Cc: netdev
Paul Mackerras says PPPoE relies on receiving packets in wire order,
and he has bug reports caused by packet reordering.
This is icky. Example code below:
1) Extract core queuing part of netif_rx into __netif_rx.
2) If the protocol requires serialization, packets are put on a
global "serial" queue instead of the local queue. (Which protocols
require this is currently hardcoded.)
3) One cpu (boot cpu as it happens) drains this serial queue, so it
stays ordered.
4) Fix bug in cpu_raise_softirq: need to wake softirqd if it's a
different cpu.
Another option would simply be to stamp a serialization number into
the skb if the proto needs serialization, and drop packets if serial
number goes backwards. But since this is actually happening to
people, that would suck, too.
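A minimal userspace sketch of that stamp-and-drop alternative, for comparison with the queue approach. It is not kernel code: `struct serial_state`, `serial_stamp()` and `serial_should_drop()` are hypothetical names, and it assumes one counter per serialized protocol, stamped at the point where packets are still in wire order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-protocol state; not an existing kernel structure. */
struct serial_state {
	uint32_t next_stamp;     /* stamp given to the next arriving packet */
	uint32_t last_delivered; /* newest stamp handed to the protocol */
	bool     seen_any;       /* false until the first delivery */
};

/* Call where packets are still in wire order (e.g. on arrival). */
static uint32_t serial_stamp(struct serial_state *st)
{
	assert(st != NULL);
	return st->next_stamp++;
}

/*
 * Call just before delivery. Returns true if the packet's stamp is not
 * newer than the last one delivered, i.e. the serial number went
 * backwards and the packet would be dropped.
 */
static bool serial_should_drop(struct serial_state *st, uint32_t stamp)
{
	assert(st != NULL);
	/* Signed difference keeps the comparison sane across wraparound. */
	if (st->seen_any && (int32_t)(stamp - st->last_delivered) <= 0)
		return true;
	st->last_delivered = stamp;
	st->seen_any = true;
	return false;
}
```

The cost is visible in the sketch: any packet overtaken by a later one is simply lost, which is why this only turns reordering into packet loss rather than fixing it.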
I don't understand the unbalanced dev_put in net_rx_action(), BTW.
Cheers,
Rusty.
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.72-bk2/kernel/softirq.c working-2.5.72-bk2-serial-protocols/kernel/softirq.c
--- linux-2.5.72-bk2/kernel/softirq.c 2003-06-25 17:17:19.000000000 +1000
+++ working-2.5.72-bk2-serial-protocols/kernel/softirq.c 2003-06-25 14:55:15.000000000 +1000
@@ -130,7 +130,7 @@ inline void cpu_raise_softirq(unsigned i
* Otherwise we wake up ksoftirqd to make sure we
* schedule the softirq soon.
*/
- if (!in_interrupt())
+ if (!in_interrupt() || cpu != smp_processor_id())
wakeup_softirqd(cpu);
}
diff -urpN --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.5.72-bk2/net/core/dev.c working-2.5.72-bk2-serial-protocols/net/core/dev.c
--- linux-2.5.72-bk2/net/core/dev.c 2003-06-20 11:53:36.000000000 +1000
+++ working-2.5.72-bk2-serial-protocols/net/core/dev.c 2003-06-25 17:11:36.000000000 +1000
@@ -1323,42 +1323,11 @@ static void sample_queue(unsigned long d
}
#endif
-
-/**
- * netif_rx - post buffer to the network code
- * @skb: buffer to post
- *
- * This function receives a packet from a device driver and queues it for
- * the upper (protocol) levels to process. It always succeeds. The buffer
- * may be dropped during processing for congestion control or by the
- * protocol layers.
- *
- * return values:
- * NET_RX_SUCCESS (no congestion)
- * NET_RX_CN_LOW (low congestion)
- * NET_RX_CN_MOD (moderate congestion)
- * NET_RX_CN_HIGH (high congestion)
- * NET_RX_DROP (packet was dropped)
- *
- */
-
-int netif_rx(struct sk_buff *skb)
+/* Called with IRQs disabled. */
+static inline int __netif_rx(int this_cpu,
+ struct softnet_data *queue,
+ struct sk_buff *skb)
{
- int this_cpu;
- struct softnet_data *queue;
- unsigned long flags;
-
- if (!skb->stamp.tv_sec)
- do_gettimeofday(&skb->stamp);
-
- /*
- * The code is rearranged so that the path is the most
- * short when CPU is congested, but is still operating.
- */
- local_irq_save(flags);
- this_cpu = smp_processor_id();
- queue = &softnet_data[this_cpu];
-
netdev_rx_stat[this_cpu].total++;
if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
if (queue->input_pkt_queue.qlen) {
@@ -1371,7 +1340,6 @@ enqueue:
#ifndef OFFLINE_SAMPLE
get_sample_stats(this_cpu);
#endif
- local_irq_restore(flags);
return queue->cng_level;
}
@@ -1397,12 +1365,116 @@ enqueue:
drop:
netdev_rx_stat[this_cpu].dropped++;
- local_irq_restore(flags);
kfree_skb(skb);
return NET_RX_DROP;
}
+#ifdef CONFIG_SMP
+/* Queue for serial protocols (eg PPPoe). All handled by one CPU. */
+static spinlock_t serial_queue_lock = SPIN_LOCK_UNLOCKED;
+static struct softnet_data serial_queue;
+
+/* Which cpu does serial queue. */
+static int serial_cpu;
+
+static inline int net_proto_serialize(struct sk_buff *skb,
+ int this_cpu,
+ int *ret)
+{
+ if (likely(skb->protocol != htons(ETH_P_PPP_DISC)
+ && skb->protocol != htons(ETH_P_PPP_SES)))
+ return 0;
+
+ spin_lock(&serial_queue_lock);
+ *ret = __netif_rx(this_cpu, &serial_queue, skb);
+ spin_unlock(&serial_queue_lock);
+ if (this_cpu != serial_cpu)
+ cpu_raise_softirq(serial_cpu, NET_RX_SOFTIRQ);
+ return 1;
+}
+
+static void init_queue(struct softnet_data *queue);
+
+static void init_serial(void)
+{
+ init_queue(&serial_queue);
+ serial_cpu = smp_processor_id();
+}
+
+static inline void drain_serial_queue(int this_cpu)
+{
+ if (this_cpu != serial_cpu)
+ return;
+
+ spin_lock(&serial_queue_lock);
+ while (!list_empty(&serial_queue.poll_list)) {
+ struct net_device *dev;
+
+ dev = list_entry(serial_queue.poll_list.next,
+ struct net_device, poll_list);
+
+ list_del(&dev->poll_list);
+ list_add_tail(&dev->poll_list, &serial_queue.poll_list);
+ }
+ spin_unlock(&serial_queue_lock);
+}
+#else
+static inline int net_proto_serialize(struct sk_buff *skb,
+ int this_cpu,
+ int *ret)
+{
+ return 0;
+}
+
+static void init_serial(void)
+{
+}
+
+static inline void drain_serial_queue(int this_cpu)
+{
+}
+#endif /* CONFIG_SMP */
+
+/**
+ * netif_rx - post buffer to the network code
+ * @skb: buffer to post
+ *
+ * This function receives a packet from a device driver and queues it for
+ * the upper (protocol) levels to process. It always succeeds. The buffer
+ * may be dropped during processing for congestion control or by the
+ * protocol layers.
+ *
+ * return values:
+ * NET_RX_SUCCESS (no congestion)
+ * NET_RX_CN_LOW (low congestion)
+ * NET_RX_CN_MOD (moderate congestion)
+ * NET_RX_CN_HIGH (high congestion)
+ * NET_RX_DROP (packet was dropped)
+ *
+ */
+
+int netif_rx(struct sk_buff *skb)
+{
+ int ret, this_cpu;
+ unsigned long flags;
+
+ if (!skb->stamp.tv_sec)
+ do_gettimeofday(&skb->stamp);
+
+ /*
+ * The code is rearranged so that the path is the most
+ * short when CPU is congested, but is still operating.
+ */
+ local_irq_save(flags);
+ this_cpu = smp_processor_id();
+
+ if (!net_proto_serialize(skb, this_cpu, &ret))
+ ret = __netif_rx(this_cpu, &softnet_data[this_cpu], skb);
+ local_irq_restore(flags);
+ return ret;
+}
+
/* Deliver skb to an old protocol, which is not threaded well
or which do not understand shared skbs.
*/
@@ -1705,6 +1777,8 @@ static void net_rx_action(struct softirq
local_irq_disable();
}
}
+
+ drain_serial_queue(this_cpu);
out:
local_irq_enable();
preempt_enable();
@@ -2944,6 +3018,20 @@ int unregister_netdevice(struct net_devi
}
+static void init_queue(struct softnet_data *queue)
+{
+ skb_queue_head_init(&queue->input_pkt_queue);
+ queue->throttle = 0;
+ queue->cng_level = 0;
+ queue->avg_blog = 10; /* arbitrary non-zero */
+ queue->completion_queue = NULL;
+ INIT_LIST_HEAD(&queue->poll_list);
+ set_bit(__LINK_STATE_START, &queue->backlog_dev.state);
+ queue->backlog_dev.weight = weight_p;
+ queue->backlog_dev.poll = process_backlog;
+ atomic_set(&queue->backlog_dev.refcnt, 1);
+}
+
/*
* Initialize the DEV module. At boot time this walks the device list and
* unhooks any devices that fail to initialise (normally hardware not
@@ -2976,21 +3064,9 @@ static int __init net_dev_init(void)
* Initialise the packet receive queues.
*/
- for (i = 0; i < NR_CPUS; i++) {
- struct softnet_data *queue;
-
- queue = &softnet_data[i];
- skb_queue_head_init(&queue->input_pkt_queue);
- queue->throttle = 0;
- queue->cng_level = 0;
- queue->avg_blog = 10; /* arbitrary non-zero */
- queue->completion_queue = NULL;
- INIT_LIST_HEAD(&queue->poll_list);
- set_bit(__LINK_STATE_START, &queue->backlog_dev.state);
- queue->backlog_dev.weight = weight_p;
- queue->backlog_dev.poll = process_backlog;
- atomic_set(&queue->backlog_dev.refcnt, 1);
- }
+ for (i = 0; i < NR_CPUS; i++)
+ init_queue(&softnet_data[i]);
+ init_serial();
#ifdef CONFIG_NET_PROFILE
net_profile_init();
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 7:24 [PATCH, untested] Support for PPPOE on SMP Rusty Russell
@ 2003-06-25 11:19 ` Jamal Hadi
2003-06-25 13:21 ` Michal Ostrowski
2003-06-25 16:01 ` Jason Lunz
2 siblings, 0 replies; 35+ messages in thread
From: Jamal Hadi @ 2003-06-25 11:19 UTC (permalink / raw)
To: Rusty Russell; +Cc: davem, paulus, netdev
On Wed, 25 Jun 2003, Rusty Russell wrote:
> Paul Mackerras says PPPoE relies on receiving packets in wire order,
> and he has bug reports caused by packet reordering.
>
I don't know of any ordering dependencies with pppoe. Is this a bug
in the ppp code?
> This is icky.
Yes it is ;->
The effects of your patch could be achieved in two ways:
a) tie the pppoe related ethernet card to a processor.
b) use a NAPI capable ethernet card.
Now, if there is a real need to have a serialization queue (I don't see
one), you really don't need to tie it to a processor. Just have a single queue
shared by all processors; everyone grabs a lock on it.
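A rough userspace sketch of such a shared, locked queue. It uses a C11 `atomic_flag` as a stand-in for a kernel spinlock, and every name in it (`struct shared_queue`, `shared_queue_push()`, `shared_queue_pop()`, `SHARED_Q_CAP`) is made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SHARED_Q_CAP 64 /* arbitrary capacity for the sketch */

/* One FIFO shared by all CPUs, protected by a single lock. */
struct shared_queue {
	atomic_flag lock;
	int         items[SHARED_Q_CAP];
	size_t      head, len;
};

static void shared_queue_init(struct shared_queue *q)
{
	assert(q != NULL);
	atomic_flag_clear(&q->lock);
	q->head = q->len = 0;
}

static void sq_lock(struct shared_queue *q)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

static void sq_unlock(struct shared_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Any CPU may enqueue. Returns 0 on success, -1 if the queue is full. */
static int shared_queue_push(struct shared_queue *q, int pkt)
{
	int ret = -1;

	sq_lock(q);
	if (q->len < SHARED_Q_CAP) {
		q->items[(q->head + q->len) % SHARED_Q_CAP] = pkt;
		q->len++;
		ret = 0;
	}
	sq_unlock(q);
	return ret;
}

/* Any CPU may dequeue. Returns 0 and fills *pkt, or -1 if empty. */
static int shared_queue_pop(struct shared_queue *q, int *pkt)
{
	int ret = -1;

	assert(pkt != NULL);
	sq_lock(q);
	if (q->len > 0) {
		*pkt = q->items[q->head];
		q->head = (q->head + 1) % SHARED_Q_CAP;
		q->len--;
		ret = 0;
	}
	sq_unlock(q);
	return ret;
}
```

Note that the lock only orders the enqueue and dequeue operations. If two CPUs each pop a packet and then process it outside the lock, the handlers can still finish in either order, so the protocol handler would presumably have to run with the lock held (or be otherwise serialized) for ordering to survive.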
cheers,
jamal
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 7:24 [PATCH, untested] Support for PPPOE on SMP Rusty Russell
2003-06-25 11:19 ` Jamal Hadi
@ 2003-06-25 13:21 ` Michal Ostrowski
2003-06-25 13:42 ` Michal Ostrowski
` (2 more replies)
2003-06-25 16:01 ` Jason Lunz
2 siblings, 3 replies; 35+ messages in thread
From: Michal Ostrowski @ 2003-06-25 13:21 UTC (permalink / raw)
To: Rusty Russell
Cc: David S. Miller, Paul MacKerras, netdev, fcusack, David F. Skoll,
James Carlson
First some background for those new to this discussion (I was going to post
the original discussion that started this to this list, but the summary
here should get everyone up to speed).
A user has observed a race condition where the last packet of PPPoE
discovery arrives just before the first payload packet. The discovery
packet carries the session id and pppd needs to take this session id and
create a PPPoE socket which will then pick up all packets matching the
given session id. The race is between the arrival of the first payload
packet and pppd's creation of the socket that is to receive PPPoE
payload. If the packet wins the race, the payload packet is lost. This
problem was noticed only because the ISP in this case configured their
systems to use a longer, non-standard (but legal) retransmit timeout
thus causing noticeable delays in PPP negotiation.
About the patch: Do we have any guarantees that no drivers will break
this? From the few drivers I've looked at, this will not be a problem
since they lock to ensure that we can't have races in submitting packets
to netif_rx. My concern here would be that it appears that there is no
explicit requirement that this be so; we may be safe in this regard only
by accident. (I can think of a device and driver design where this need
not be so.)
> +
> +static inline int net_proto_serialize(struct sk_buff *skb,
> + int this_cpu,
> + int *ret)
> +{
> + if (likely(skb->protocol != htons(ETH_P_PPP_DISC)
> + && skb->protocol != htons(ETH_P_PPP_SES)))
> + return 0;
I believe there are concerns with other protocols as well (SNA, spanning
tree - I'm just the messenger on this). If this is so, then I have two
concerns:
1. Some protocols may have no in-kernel implementation, we'd have to
ensure that raw sockets get packets in the right order (perhaps
even regardless of what packet type we receive).
2. There are two issues with PPPoE: there's the creation race
described above which requires correct ordering of packets of two
different packet types (discovery is 0x8863, payload is 0x8864),
and payload packets must also be ordered to handle Paul's concerns
regarding compression.
The patch as is is adequate for 2), but I'm concerned it would get ugly if
we need to do 1) (and in the process of doing 1) we may break 2) if we
can't synchronize between two different packet types).
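To make the two packet types concrete, here is a small userspace sketch that classifies a raw Ethernet frame by its type field. The values 0x8863 and 0x8864 are the ones given above; the names `PPPOE_DISCOVERY`, `PPPOE_SESSION`, `frame_ethertype_be()` and `is_pppoe_ethertype()` are local to the sketch. It assumes the type field is compared in network byte order, as it sits in the frame, so the constants go through `htons()` first.

```c
#include <arpa/inet.h> /* htons */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PPPOE_DISCOVERY 0x8863 /* discovery stage */
#define PPPOE_SESSION   0x8864 /* session (payload) stage */
#define ETH_TYPE_OFFSET 12     /* after two 6-byte MAC addresses */

/* Returns the type field exactly as stored in the frame (network order). */
static uint16_t frame_ethertype_be(const uint8_t *frame)
{
	uint16_t proto_be;

	assert(frame != NULL);
	memcpy(&proto_be, frame + ETH_TYPE_OFFSET, sizeof(proto_be));
	return proto_be;
}

/* True for either PPPoE type; both would need to share one ordered path. */
static bool is_pppoe_ethertype(uint16_t proto_be)
{
	return proto_be == htons(PPPOE_DISCOVERY) ||
	       proto_be == htons(PPPOE_SESSION);
}
```

Treating both types as one class matters for the creation race: ordering discovery and payload packets separately would not stop the first payload packet from overtaking the last discovery packet.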
I think we can fix the race condition I've described up top without such
core infrastructure changes (delay dropping unmatched payload packets,
give pppd a chance to make the socket). This however doesn't solve the
other ordering problems.
--
Michal Ostrowski <mostrows@watson.ibm.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 13:21 ` Michal Ostrowski
@ 2003-06-25 13:42 ` Michal Ostrowski
2003-06-25 15:45 ` Jamal Hadi
2003-06-25 16:15 ` Stephen Hemminger
2003-06-25 21:33 ` David S. Miller
2 siblings, 1 reply; 35+ messages in thread
From: Michal Ostrowski @ 2003-06-25 13:42 UTC (permalink / raw)
To: Rusty Russell
Cc: David S. Miller, Paul MacKerras, netdev, fcusack, David F. Skoll,
James Carlson
Perhaps instead of using a special queue that keeps packets ordered, we
add a tag to each skb as it comes off the card and let higher level
protocols use this to re-order things themselves? (And add some option
for AF_PACKET sockets to optionally enforce this ordering in presenting
packets to apps, or not.)
This may require modifying all drivers, but it does provide for an
explicit mechanism that can be made mandatory for drivers, avoids
special casing, avoids dumping work onto a single CPU and leaves it up
to the higher-level code to figure out ordering, if it wants to.
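One way a higher-level protocol might consume such a tag, sketched in userspace C. This assumes the tag is a per-device counter that increases by one per received packet, which the proposal above does not specify; `struct reorder_buf`, `reorder_insert()`, `reorder_pop()` and `REORDER_WINDOW` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REORDER_WINDOW 8 /* arbitrary: how far ahead we agree to buffer */

struct reorder_buf {
	uint32_t next;                    /* next tag to deliver */
	bool     present[REORDER_WINDOW]; /* slot holds an undelivered packet */
	int      payload[REORDER_WINDOW]; /* stand-in for the skb */
};

/*
 * Store a packet by tag. Returns false if the tag is stale or too far
 * ahead of the next expected one; the caller would then drop it.
 */
static bool reorder_insert(struct reorder_buf *rb, uint32_t tag, int payload)
{
	uint32_t ahead;

	assert(rb != NULL);
	ahead = tag - rb->next; /* unsigned wrap makes stale tags huge */
	if (ahead >= REORDER_WINDOW)
		return false;
	rb->present[tag % REORDER_WINDOW] = true;
	rb->payload[tag % REORDER_WINDOW] = payload;
	return true;
}

/* Deliver the next in-order packet, or return false if it has not arrived. */
static bool reorder_pop(struct reorder_buf *rb, int *payload)
{
	size_t slot;

	assert(rb != NULL && payload != NULL);
	slot = rb->next % REORDER_WINDOW;
	if (!rb->present[slot])
		return false;
	*payload = rb->payload[slot];
	rb->present[slot] = false;
	rb->next++;
	return true;
}
```

A real version would also need a timeout so that one lost packet does not stall delivery forever; the sketch leaves that out.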
--
Michal Ostrowski <mostrows@watson.ibm.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 13:42 ` Michal Ostrowski
@ 2003-06-25 15:45 ` Jamal Hadi
2003-06-25 17:27 ` Michal Ostrowski
0 siblings, 1 reply; 35+ messages in thread
From: Jamal Hadi @ 2003-06-25 15:45 UTC (permalink / raw)
To: Michal Ostrowski
Cc: Rusty Russell, David S. Miller, Paul MacKerras, netdev, fcusack,
David F. Skoll, James Carlson
Have you tested the case where the ethernet card is tied to only
one CPU in SMP? That guarantees ordering.
Ordering per protocol should really be that protocol's problem to
solve. If you can't solve it you have a bug.
cheers,
jamal
On Wed, 25 Jun 2003, Michal Ostrowski wrote:
>
> Perhaps instead of using a special queue that keeps packets ordered, we
> add a tag to each skb as it comes off the card and let higher level
> protocols use this to re-order things themselves? (And add some option
> for AF_PACKET sockets to optionally enforce this ordering in presenting
> packets to apps, or not.)
>
> This may require modifying all drivers, but it does provide for an
> explicit mechanism that can be made mandatory for drivers, avoids
> special casing, avoids dumping work onto a single CPU and leaves it up
> to the higher-level code to figure out ordering, if it wants to.
>
> --
> Michal Ostrowski <mostrows@watson.ibm.com>
>
>
>
>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 7:24 [PATCH, untested] Support for PPPOE on SMP Rusty Russell
2003-06-25 11:19 ` Jamal Hadi
2003-06-25 13:21 ` Michal Ostrowski
@ 2003-06-25 16:01 ` Jason Lunz
2 siblings, 0 replies; 35+ messages in thread
From: Jason Lunz @ 2003-06-25 16:01 UTC (permalink / raw)
To: netdev
rusty@rustcorp.com.au said:
> I don't understand the unbalanced dev_put in net_rx_action(), BTW.
It's tricky. There are two paths an skb can take into net_rx_action(),
napi and non-napi. The non-napi path uses dev_hold/dev_put on both
skb->dev and a virtual per-cpu struct net_device, the backlog_dev.
In a non-napi skb receive, the driver uses netif_rx() to hand the skb up
to the net core. netif_rx does a dev_hold on skb->dev, puts the skb on
the current cpu's softnet_data queue, and uses netif_rx_schedule to
schedule that softnet-data's ->backlog_dev to be polled. In the
process, __netif_rx_schedule does a dev_hold(backlog_dev).
So the queue of ready net_devices processed by net_rx_action may contain
actual struct net_devices (napi) or the virtual ->backlog_dev
net_device. In the former case, net_rx_action's dev_put balances the
dev_hold done when the driver called __netif_rx_schedule(). In the
latter case, net_rx_action's dev_put balances the dev_hold of the
backlog_dev done when netif_rx called __netif_rx_schedule().
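The pairing can be modelled with a toy reference count. This is only a sketch of the balance as described here, not the kernel code: `struct toy_dev`, `toy_schedule()` and `toy_rx_action()` are invented stand-ins for the net_device, __netif_rx_schedule() and net_rx_action().

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct net_device: just a reference count. */
struct toy_dev {
	int refcnt;
};

static void toy_hold(struct toy_dev *d)
{
	assert(d != NULL);
	d->refcnt++;
}

static void toy_put(struct toy_dev *d)
{
	assert(d != NULL && d->refcnt > 0);
	d->refcnt--;
}

/*
 * Models the scheduling step: whatever goes on the poll list (a real
 * NAPI device or the per-cpu backlog_dev) gets a reference taken.
 */
static struct toy_dev *toy_schedule(struct toy_dev *d)
{
	toy_hold(d);
	return d;
}

/*
 * Models the end of the receive softirq: the put here looks unbalanced
 * in isolation, but it pairs with the hold taken at schedule time.
 */
static void toy_rx_action(struct toy_dev *polled)
{
	/* ... the device would be polled here ... */
	toy_put(polled);
}
```

In both the napi and non-napi cases the count returns to its starting value after one schedule/poll cycle, which is the sense in which the dev_put is balanced.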
I hope that makes some kind of sense. It took a while to figure out, but
I saved my notes. :)
Jason
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 13:21 ` Michal Ostrowski
2003-06-25 13:42 ` Michal Ostrowski
@ 2003-06-25 16:15 ` Stephen Hemminger
2003-06-25 16:22 ` Jamal Hadi
2003-06-25 21:33 ` David S. Miller
2 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2003-06-25 16:15 UTC (permalink / raw)
To: Michal Ostrowski; +Cc: rusty, davem, paulus, netdev, fcusack, dfs, carlson
On 25 Jun 2003 09:21:02 -0400
Michal Ostrowski <mostrows@watson.ibm.com> wrote:
> First some background for those new to this discussion (I was going to post
> the original discussion that started this to this list, but the summary
> here should get everyone up to speed).
>
> A user has observed a race condition where the last packet of PPPoE
> discovery arrives just before the first payload packet. The discovery
> packet carries the session id and pppd needs to take this session id and
> create a PPPoE socket which will then pick up all packets matching the
> given session id. The race is between the arrival of the first payload
> packet and pppd's creation of the socket that is to receive PPPoE
> payload. If the packet wins the race, the payload packet is lost. This
> problem was noticed only because the ISP in this case configured their
> systems to use a longer, non-standard (but legal) retransmit timeout
> thus causing noticeable delays in PPP negotiation.
>
Also, you only need the ordering dependency until the session is set up, not
after it is established. Imagine a large ISP with many PPPoE sessions; it
makes no sense to serialize traffic just for this session establishment case.
In the long run, the right answer probably is to push the session management
out of the daemon and into the kernel. Today the PPPoE code in the kernel
is only half-brained, it needs pppd to survive.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 16:15 ` Stephen Hemminger
@ 2003-06-25 16:22 ` Jamal Hadi
2003-06-25 16:39 ` Stephen Hemminger
0 siblings, 1 reply; 35+ messages in thread
From: Jamal Hadi @ 2003-06-25 16:22 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Michal Ostrowski, rusty, davem, paulus, netdev, fcusack, dfs,
carlson
On Wed, 25 Jun 2003, Stephen Hemminger wrote:
> In the long run, the right answer probably is to push the session management
> out of the daemon and into the kernel. Today the PPPoE code in the kernel
> is only half-brained, it needs pppd to survive.
>
I would think pppd is the half-brained portion ;->
Placing control protocols in the kernel is plain wrong.
cheers,
jamal
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 16:22 ` Jamal Hadi
@ 2003-06-25 16:39 ` Stephen Hemminger
2003-06-25 17:07 ` Jamal Hadi
0 siblings, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2003-06-25 16:39 UTC (permalink / raw)
To: Jamal Hadi; +Cc: mostrows, rusty, davem, paulus, netdev, fcusack, dfs, carlson
On Wed, 25 Jun 2003 12:22:35 -0400 (EDT)
Jamal Hadi <hadi@shell.cyberus.ca> wrote:
>
>
> On Wed, 25 Jun 2003, Stephen Hemminger wrote:
>
> > In the long run, the right answer probably is to push the session management
> > out of the daemon and into the kernel. Today the PPPoE code in the kernel
> > is only half-brained, it needs pppd to survive.
> >
>
> I would think pppd is the half-brained portion ;->
>
> Placing control protocols in the kernel is plain wrong.
What about arp, TCP, IP, routing protocols. The problem is that state management
needs to be done in one place.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 16:39 ` Stephen Hemminger
@ 2003-06-25 17:07 ` Jamal Hadi
2003-06-25 17:40 ` Stephen Hemminger
2003-06-25 22:22 ` Paul Mackerras
0 siblings, 2 replies; 35+ messages in thread
From: Jamal Hadi @ 2003-06-25 17:07 UTC (permalink / raw)
To: Stephen Hemminger
Cc: mostrows, rusty, davem, paulus, netdev, fcusack, dfs, carlson
On Wed, 25 Jun 2003, Stephen Hemminger wrote:
> On Wed, 25 Jun 2003 12:22:35 -0400 (EDT)
> Jamal Hadi <hadi@shell.cyberus.ca> wrote:
>
> > Placing control protocols in the kernel is plain wrong.
>
> What about arp, TCP, IP, routing protocols.
ARP should really be ripped out of the kernel. I mentioned to you once
the same in regards to STP and iirc you agreed.
I wouldn't call TCP or IP control protocols.
>The problem is that state management needs to be done in one place.
a protocol or implementation which wishes to do state maintenance
properly oughta be able to do the synchronization on its own.
Separation between policy and mechanism has been the strength of unix.
A clean separation between control and a data path is very important.
Control protocols tend to be very rich environments which are
constantly changing. Take STP, there are so many features that could be
added to STP that are much harder to add because it is in the kernel.
Maybe what needs to be looked at is the design of pppoe or ppp.
The patch from Rusty is just a bandaid.
cheers,
jamal
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 15:45 ` Jamal Hadi
@ 2003-06-25 17:27 ` Michal Ostrowski
2003-06-25 22:17 ` Paul Mackerras
0 siblings, 1 reply; 35+ messages in thread
From: Michal Ostrowski @ 2003-06-25 17:27 UTC (permalink / raw)
To: Jamal Hadi
Cc: Rusty Russell, David S. Miller, Paul MacKerras, netdev, fcusack,
David F. Skoll, James Carlson
Paul: you made an assertion to me in an earlier e-mail that you were
concerned about packet ordering for the sake of vj and compression.
IIRC the PPPoE spec prohibits compression, probably for this very
reason. Is there any other reason we'd be worried about re-ordering in
the PPP data stream?
On Wed, 2003-06-25 at 11:45, Jamal Hadi wrote:
>
> Have you tested the case where the ethernet card is tied to only
> one CPU in SMP? That guarantees ordering.
Agreed, this does guarantee ordering. But there are cases where I don't
have this guarantee and those are the issues Rusty's patch attempts to
solve.
> Ordering per protocol should really be that protocol's problem to
> solve. If you can't solve it you have a bug.
>
The session initiation race I described earlier is brought about
independently by several problems:
1. PPPoE negotiation is done in user space and thus there is a window
between completion of this negotiation and the creation of the PPPoE
socket during which a payload packet may arrive and be dropped (SMP
and UP).
2. Re-ordering by softIRQ handling on SMP may cause the same problem.
There's also the question of whether other protocols (perhaps not
implemented in the kernel, but relying on AF_PACKET) may be
affected by this (#2).
We can fix #1 without any patches to core networking code. If the SMP
softIRQ re-ordering issue is handled, then we may have some better
options for fixing #1. But note that even if #1 is fixed and #2 isn't,
then we're not any better off.
--
Michal Ostrowski <mostrows@watson.ibm.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 17:07 ` Jamal Hadi
@ 2003-06-25 17:40 ` Stephen Hemminger
2003-06-25 18:00 ` Michal Ostrowski
2003-06-25 22:22 ` Paul Mackerras
1 sibling, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2003-06-25 17:40 UTC (permalink / raw)
To: Jamal Hadi; +Cc: mostrows, rusty, davem, paulus, netdev, fcusack, dfs, carlson
On Wed, 25 Jun 2003 13:07:46 -0400 (EDT)
Jamal Hadi <hadi@shell.cyberus.ca> wrote:
>
>
> On Wed, 25 Jun 2003, Stephen Hemminger wrote:
>
> > On Wed, 25 Jun 2003 12:22:35 -0400 (EDT)
> > Jamal Hadi <hadi@shell.cyberus.ca> wrote:
> >
> > > Placing control protocols in the kernel is plain wrong.
> >
> > What about arp, TCP, IP, routing protocols.
>
> ARP should really be ripped out of the kernel. I mentioned to you once
> the same in regards to STP and iirc you agreed.
> I wouldn't call TCP or IP control protocols.
>
> >The problem is that state management needs to be done in one place.
>
> a protocol or implementation which wishes to do state maintenance
> properly oughta be able to do the synchronization on its own.
> Separation between policy and mechanism has been the strength of unix.
> A clean separation between control and a data path is very important.
> Control protocols tend to be very rich environments which are
> constantly changing. Take STP, there are so many features that could be
> added to STP that are much harder to add because it is in the kernel.
Rather than take an architectural approach about what is right and wrong,
I take the practical point of view. If the protocol is small, and the
policy can be done in the kernel fine; if the implementation gets messy
and the right information is not there, then it belongs in user space.
For PPPoE, the session management needs to be in kernel space, with the policy
in user space. What if the kernel initialized the session when it saw
the discovery and notified pppd? The session would not be established
until pppd accepted the connection. This would be more like a socket
protocol without auto-accept like TCP. Any data for the session would
then stay queued until it was accepted or rejected.
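The proposal amounts to a small per-session state machine. A sketch under that reading, with every name (`enum sess_state`, `session_discovered()`, `session_decide()`, `session_rx()`) invented for illustration rather than taken from the PPPoE driver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sess_state { SESS_NONE, SESS_PENDING, SESS_ESTABLISHED };
enum rx_verdict { RX_DROP, RX_QUEUE, RX_DELIVER };

struct session {
	enum sess_state state;
};

/* Kernel saw the end of discovery: create the session, but do not accept it. */
static void session_discovered(struct session *s)
{
	assert(s != NULL);
	if (s->state == SESS_NONE)
		s->state = SESS_PENDING;
}

/* pppd's answer: accept establishes the session, reject discards it. */
static void session_decide(struct session *s, bool accept)
{
	assert(s != NULL);
	if (s->state == SESS_PENDING)
		s->state = accept ? SESS_ESTABLISHED : SESS_NONE;
}

/* What to do with a payload packet that arrives for this session. */
static enum rx_verdict session_rx(const struct session *s)
{
	assert(s != NULL);
	switch (s->state) {
	case SESS_PENDING:
		return RX_QUEUE;   /* hold until pppd accepts or rejects */
	case SESS_ESTABLISHED:
		return RX_DELIVER;
	default:
		return RX_DROP;    /* no such session */
	}
}
```

The point of the pending state is that a payload packet arriving before pppd has acted gets `RX_QUEUE` instead of being dropped.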
Having special non-SMP receive logic is bogus; and probably won't work
anyway with preempt and other races.
There is already work in moving STP out of the kernel, but even that
has shown that the problem is how to have the proper management hooks
to do the job. That is why it hasn't been a simple slam dunk.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 17:40 ` Stephen Hemminger
@ 2003-06-25 18:00 ` Michal Ostrowski
0 siblings, 0 replies; 35+ messages in thread
From: Michal Ostrowski @ 2003-06-25 18:00 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Jamal Hadi, rusty, David S. Miller, Paul MacKerras, netdev,
fcusack, carlson
On Wed, 2003-06-25 at 13:40, Stephen Hemminger wrote:
> For PPPoE, the session management needs to be in kernel space, with the policy
> in user space. What if the kernel initialized the session when it saw
> the discovery and notified pppd? The session would not be established
> until pppd accepted the connection. This would be more like a socket
> protocol without auto-accept like TCP. Any data for the session would
> then stay queued until it was accepted or rejected.
>
Regardless of the solution taken for the session-initiation race, any
solution would fall apart if SMP softIRQs can reorder packets (that is,
it would result in dropped packets). Only once there is a solution for
this reordering problem does it make sense to consider the options for
handling this race.
PPPoE also doesn't fit cleanly into the standard
bind()/listen()/accept()/connect() mould. I've been convinced that
negotiation/discovery belongs in pppd and so would like to avoid adding
connection detection logic into the kernel.
Finally, please keep in mind that with PPPoE when we do hit this problem
the effect is that PPP session establishment takes a bit longer since we
have to wait for an LCP timeout and retransmit. I am much more curious
about how other protocols may be affected by packet reordering.
--
Michal Ostrowski <mostrows@watson.ibm.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 13:21 ` Michal Ostrowski
2003-06-25 13:42 ` Michal Ostrowski
2003-06-25 16:15 ` Stephen Hemminger
@ 2003-06-25 21:33 ` David S. Miller
2003-06-25 22:06 ` Michal Ostrowski
2003-06-26 3:57 ` Rusty Russell
2 siblings, 2 replies; 35+ messages in thread
From: David S. Miller @ 2003-06-25 21:33 UTC (permalink / raw)
To: mostrows; +Cc: rusty, paulus, netdev, fcusack, dfs, carlson
Why don't you just queue the payload packets in a "resolution queue"
until the socket is created? Just make the resolution queue packets
timeout using a value that will easily exceed any reasonable PPP
negotiation time.
All this ordered packet arrival shit is just beyond stupid.
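A userspace sketch of what such a resolution queue could look like. It assumes packets are keyed by PPPoE session id and that the caller supplies the clock; `struct resolution_queue`, `resq_hold()`, `resq_release()` and `RESQ_CAP` are hypothetical names, and an `int` stands in for the skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RESQ_CAP 16 /* arbitrary bound on held packets */

struct held_pkt {
	uint16_t session_id;
	uint64_t expires; /* absolute time, in whatever unit the caller uses */
	int      payload; /* stand-in for the skb */
};

struct resolution_queue {
	struct held_pkt pkts[RESQ_CAP]; /* kept in arrival order */
	size_t          len;
};

/* Hold a payload packet whose session has no socket yet. */
static bool resq_hold(struct resolution_queue *q, uint16_t sid, int payload,
		      uint64_t now, uint64_t timeout)
{
	assert(q != NULL);
	if (q->len == RESQ_CAP)
		return false; /* full: caller drops, as it does today */
	q->pkts[q->len++] = (struct held_pkt){
		.session_id = sid, .expires = now + timeout, .payload = payload,
	};
	return true;
}

/*
 * Called once the socket for sid exists. Copies up to max unexpired
 * packets for sid into out[] in arrival order, discards expired packets
 * of any session, and keeps the rest. Returns the number delivered.
 */
static size_t resq_release(struct resolution_queue *q, uint16_t sid,
			   uint64_t now, int *out, size_t max)
{
	size_t kept = 0, delivered = 0;

	assert(q != NULL && (out != NULL || max == 0));
	for (size_t i = 0; i < q->len; i++) {
		struct held_pkt *p = &q->pkts[i];

		if (now >= p->expires)
			continue; /* timed out: drop */
		if (p->session_id == sid && delivered < max) {
			out[delivered++] = p->payload;
			continue;
		}
		q->pkts[kept++] = *p; /* still waiting for its socket */
	}
	q->len = kept;
	return delivered;
}
```

With this in place the order in which the last discovery packet and the first payload packet are processed no longer matters for session setup, as long as the socket appears before the timeout.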
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 21:33 ` David S. Miller
@ 2003-06-25 22:06 ` Michal Ostrowski
2003-06-26 1:04 ` David S. Miller
2003-06-26 3:57 ` Rusty Russell
1 sibling, 1 reply; 35+ messages in thread
From: Michal Ostrowski @ 2003-06-25 22:06 UTC (permalink / raw)
To: David S. Miller; +Cc: rusty, Paul MacKerras, netdev, fcusack, carlson
On Wed, 2003-06-25 at 17:33, David S. Miller wrote:
> Why don't you just queue the payload packets in a "resolution queue"
> until the socket is created? Just make the resolution queue packets
> timeout using a value that will easily exceed any reasonable PPP
> negotiation time.
>
> All this ordered packet arrival shit is just beyond stupid.
Exactly this mechanism is what I had in mind.
The open question remaining is if there are any protocols which can be
affected by packets being processed out of order. Some people have
suggested that there are. If not, then there's not much to discuss. Can
anyone comment on this decisively, either way?
--
Michal Ostrowski <mostrows@watson.ibm.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 17:27 ` Michal Ostrowski
@ 2003-06-25 22:17 ` Paul Mackerras
2003-06-25 22:56 ` Michal Ostrowski
0 siblings, 1 reply; 35+ messages in thread
From: Paul Mackerras @ 2003-06-25 22:17 UTC (permalink / raw)
To: Michal Ostrowski
Cc: Jamal Hadi, Rusty Russell, David S. Miller, netdev,
David F. Skoll, James Carlson
Michal Ostrowski writes:
> Paul: you made an assertion to me in an earlier e-mail that you were
> concerned about packet ordering for the sake of vj and compression.
> IIRC the PPPoE spec prohibits compression, probably for this very
> reason. Is there any other reason we'd be worried about re-ordering in
> the PPP data stream?
Reordering would stop you doing multilink, for instance. Generally,
PPP protocols assume ordering where it is helpful since most
point-to-point links don't reorder packets. IMO the PPPoE protocol
itself should have included a sequence number, but we can't change
what's deployed.
James might be able to comment better than me on what will happen if
packets get reordered during the negotiation phase of a PPP
connection. I think the worst is that some packets will have to be
retransmitted and thus the negotiation will take several seconds
longer than it needs to.
Paul.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 17:07 ` Jamal Hadi
2003-06-25 17:40 ` Stephen Hemminger
@ 2003-06-25 22:22 ` Paul Mackerras
2003-06-25 22:53 ` Ben Greear
1 sibling, 1 reply; 35+ messages in thread
From: Paul Mackerras @ 2003-06-25 22:22 UTC (permalink / raw)
To: Jamal Hadi
Cc: Stephen Hemminger, mostrows, rusty, davem, netdev, dfs, carlson
Jamal Hadi writes:
> a protocol or implementation which wishes to do state maintenance
> properly oughta be able to do the synchronization on its own.
> Separation between policy and mechanism has been the strength of unix.
> A clean separation between control and a data path is very important.
> Control protocols tend to be very rich environments which are
> constantly changing. Take STP, there are so many features that could be
> added to STP that are much harder to add because it is in the kernel.
>
> Maybe what needs to be looked at is the design of pppoe or ppp.
OK, now that we have had our little flight of fancy about what things
will be like once we get to heaven, can we talk about this bastard
protocol called PPPoE? :)
Or are you going to go personally to each ISP in the world and tell
them they shouldn't use PPPoE? :)
In any case the problem isn't strictly with PPPoE, since ethernet
doesn't reorder packets on the wire. The problem is that the lower
parts of the Linux network stack lose information.
Paul.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 22:22 ` Paul Mackerras
@ 2003-06-25 22:53 ` Ben Greear
0 siblings, 0 replies; 35+ messages in thread
From: Ben Greear @ 2003-06-25 22:53 UTC (permalink / raw)
To: Paul Mackerras
Cc: Jamal Hadi, Stephen Hemminger, mostrows, rusty, davem, netdev,
dfs, carlson
Paul Mackerras wrote:
> Jamal Hadi writes:
>
>
>>a protocol or implementation which wishes to do state maintenance
>>properly oughta be able to do the synchronization on its own.
>>Separation between policy and mechanism has been the strength of unix.
>>A clean separation between control and a data path is very important.
>>Control protocols tend to be very rich environments which are
>>constantly changing. Take STP, there are so many features that could be
>>added to STP that are much harder to add because it is in the kernel.
>>
>>Maybe what needs to be looked at is the design of pppoe or ppp.
>
>
> OK, now that we have had our little flight of fancy about what things
> will be like once we get to heaven, can we talk about this bastard
> protocol called PPPoE? :)
>
> Or are you going to go personally to each ISP in the world and tell
> them they shouldn't use PPPoE? :)
>
> In any case the problem isn't strictly with PPPoE, since ethernet
> doesn't reorder packets on the wire. The problem is that the lower
> parts of the Linux network stack lose information.
>
> Paul.
Nothing is guaranteed, but you may be right at least most of
the time. Btw, if you want a proprietary tool that
will emulate an ethernet network that reorders packets, I write
such a thing and will give it to you. It could help you
with testing perhaps.
Also, if you have a PCMCIA Xircom NIC, it seems to reorder packets
just for the hell of it (and no, I'm not using a dual-cpu laptop :))
I don't know of any other protocols that can't handle reordering,
since most of them seem to be designed to run over the real internet,
where reordering/drop/duplication is a part of life.
Ben
>
--
Ben Greear <greearb@candelatech.com> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 22:17 ` Paul Mackerras
@ 2003-06-25 22:56 ` Michal Ostrowski
0 siblings, 0 replies; 35+ messages in thread
From: Michal Ostrowski @ 2003-06-25 22:56 UTC (permalink / raw)
To: Paul MacKerras
Cc: Jamal Hadi, Rusty Russell, David S. Miller, netdev, carlson
On Wed, 2003-06-25 at 18:17, Paul Mackerras wrote:
> James might be able to comment better than me on what will happen if
> packets get reordered during the negotiation phase of a PPP
> connection. I think the worst is that some packets will have to be
> retransmitted and thus the negotiation will take several seconds
> longer than it needs to.
This is exactly what we're dealing with in the current "bug"; the worst
case effect is a delay. I don't think heroic measures are called for
for the sake of this PPPoE issue alone.
--
Michal Ostrowski <mostrows@watson.ibm.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 22:06 ` Michal Ostrowski
@ 2003-06-26 1:04 ` David S. Miller
0 siblings, 0 replies; 35+ messages in thread
From: David S. Miller @ 2003-06-26 1:04 UTC (permalink / raw)
To: mostrows; +Cc: rusty, paulus, netdev, fcusack, carlson
From: Michal Ostrowski <mostrows@watson.ibm.com>
Date: 25 Jun 2003 18:06:54 -0400
Exactly this mechanism is what I had in mind.
Great.
The open question remaining is if there are any protocols which can be
affected by packets being processed out of order. Some people have
suggested that there are. If not, then there's not much to discuss. Can
anyone comment on this decisively, either way?
TCP, as one example, is able to cope very well. It is even able to
distinguish reordering from true packet loss.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-25 21:33 ` David S. Miller
2003-06-25 22:06 ` Michal Ostrowski
@ 2003-06-26 3:57 ` Rusty Russell
2003-06-26 3:59 ` David S. Miller
2003-06-26 11:37 ` Michal Ostrowski
1 sibling, 2 replies; 35+ messages in thread
From: Rusty Russell @ 2003-06-26 3:57 UTC (permalink / raw)
To: David S. Miller; +Cc: paulus, netdev, fcusack, carlson
In message <20030625.143334.85380461.davem@redhat.com> you write:
>
> Why don't you just queue the payload packets in a "resolution queue"
> until the socket is created? Just make the resolution queue packets
> timeout using a value that will easily exceed any reasonable PPP
> negotiation time.
Sure, that works in this case, where you know when you get the packet
that it's out of order. But I wanted to see how ugly it got to do it
generally: a protocol where you can't tell until later that things
were in the wrong order can't use this technique. Paul tells me that
multilink PPP assumes this (moral: don't do multilink PPPoE).
Anyway, my patch is fundamentally flawed: you can't do
cpu_raise_softirq() on another CPU, it's racy (*bad* *bad* interface).
> All this ordered packet arrival shit is just beyond stupid.
I want to know how often this is happening (Michal?), because if
protocols need ordering and can't tell, it becomes effectively a
packet drop somewhere down in the protocol. If it's 1 in a million,
OK. If it's 1 in a thousand, that's bad.
Frankly, I'm amazed anyone sees reordering in real life...
Thanks,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 3:57 ` Rusty Russell
@ 2003-06-26 3:59 ` David S. Miller
2003-06-26 8:17 ` Rusty Russell
2003-06-26 10:51 ` James Carlson
2003-06-26 11:37 ` Michal Ostrowski
1 sibling, 2 replies; 35+ messages in thread
From: David S. Miller @ 2003-06-26 3:59 UTC (permalink / raw)
To: rusty; +Cc: paulus, netdev, fcusack, carlson
From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 26 Jun 2003 13:57:09 +1000
Frankly, I'm amazed anyone sees reordering in real life...
Many paths on the internet are quite reordered, this is
the first thing. In fact, I claim that any TCP stack that
doesn't do reordering detection is busted performance wise.
The second thing is that network cards can and do reorder packets.
Some PCMCIA cards do this just for fun. And ethernet _DOES NOT_
guarantee non-reordering. At a minimum, a card can use QoS values to
reorder receive of a given packet, it can also use this to reorder
transmit. Our packet schedulers do this on a software level.
If you need ordering, you need sequence numbers in your
protocol if you wish to operate over these mediums.
The case where SMP causes out-of-order packet delivery is just
academic compared to the non-local sources of reordering
mentioned above.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 3:59 ` David S. Miller
@ 2003-06-26 8:17 ` Rusty Russell
2003-06-26 8:55 ` David S. Miller
2003-06-26 10:51 ` James Carlson
1 sibling, 1 reply; 35+ messages in thread
From: Rusty Russell @ 2003-06-26 8:17 UTC (permalink / raw)
To: David S. Miller; +Cc: paulus, netdev, fcusack, carlson
In message <20030625.205941.41631020.davem@redhat.com> you write:
> From: Rusty Russell <rusty@rustcorp.com.au>
> Date: Thu, 26 Jun 2003 13:57:09 +1000
>
> Frankly, I'm amazed anyone sees reordering in real life...
>
> Many paths on the internet are quite reordered, this is
> the first thing. In fact, I claim that any TCP stack that
> doesn't do reordering detection is busted performance wise.
Sure, but I was assuming that the packets arrived in order and got
processed out of order. If the first one isn't happening, this patch
is doubly useless 8)
Thanks,
Rusty.
--
Anyone who quotes me in their sig is an idiot. -- Rusty Russell.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 8:17 ` Rusty Russell
@ 2003-06-26 8:55 ` David S. Miller
2003-06-26 10:47 ` James Carlson
0 siblings, 1 reply; 35+ messages in thread
From: David S. Miller @ 2003-06-26 8:55 UTC (permalink / raw)
To: rusty; +Cc: paulus, netdev, fcusack, carlson
From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 26 Jun 2003 18:17:45 +1000
If the first one isn't happening, this patch is doubly useless 8)
It does happen, but so does reordering on ethernet itself.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 8:55 ` David S. Miller
@ 2003-06-26 10:47 ` James Carlson
0 siblings, 0 replies; 35+ messages in thread
From: James Carlson @ 2003-06-26 10:47 UTC (permalink / raw)
To: David S. Miller; +Cc: rusty, paulus, netdev, fcusack
David S. Miller writes:
> From: Rusty Russell <rusty@rustcorp.com.au>
> Date: Thu, 26 Jun 2003 18:17:45 +1000
>
> If the first one isn't happening, this patch is doubly useless 8)
>
> It does happen, but so does reordering on ethernet itself.
That's nonsense, and it's directly counter to all of the 802
standards. Please explain how reordering on a single wire could ever
take place.
Reordering *requires* that you have something like a router in the
path. It's an issue for a network or transport layer protocol to
consider, but it's not a link-layer issue.
If you have Ethernet interfaces that reorder packets between a given
pair of stations, then those interfaces are just simply broken.
From IEEE Std 802.1D, 1998:
6.3.3 Frame misordering
The MAC Service does not permit the reordering of frames with a
given user priority for a given combination of destination address
and source address. MA_UNITDATA.indication service primitives
corresponding to MA_UNITDATA.request primitives, with the same
requested priority and for the same combination of destination and
source addresses, are received in the same order as the request
primitives were processed.
Here are some excerpts from IEEE Std 802.3-2002:
1.4.94 Conversation: A set of MAC frames transmitted from one end
station to another, where all of the MAC frames form an ordered
sequence, and where the communicating end stations require the
ordering to be maintained among the set of MAC frames
exchanged. (See IEEE 802.3 Clause 43.)
43.2.1 Principles of Link Aggregation
Link Aggregation allows a MAC Client to treat a set of one or more
ports as if it were a single port. In doing so, it employs the
following principles and concepts:
[...]
f) Frame ordering must be maintained for certain sequences of frame
exchanges between MAC Clients (known as conversations, see
1.4). The Distributor ensures that all frames of a given
conversation are passed to a single port. For any given port, the
Collector is required to pass frames to the MAC Client in the order
that they are received from that port. The Collector is otherwise
free to select frames received from the aggregated ports in any
order. Since there are no means for frames to be misordered on a
single link, this guarantees that frame ordering is maintained for
any conversation.
[...]
43.2.3 Frame Collector
A Frame Collector is responsible for receiving incoming frames
(i.e., AggMuxN:MA_DATA.indications) from the set of individual
links that form the Link Aggregation Group (through each link's
associated Aggregator Parser/Multiplexer) and delivering them to
the MAC Client. Frames received from a given port are delivered
to the MAC Client in the order that they are received by the
Frame Collector. Since the Frame Distributor is responsible for
maintaining any frame ordering constraints, there is no
requirement for the Frame Collector to perform any reordering of
frames received from multiple links.
[...]
Annex 43A
a) Frame duplication is not permitted.
b) Frame ordering must be preserved in aggregated links. Strictly, the
MAC service specification (ISO/IEC 15802-1) states that order must be
preserved for frames with a given SA, DA, and priority; however,
this is a tighter constraint than is absolutely necessary. There
may be multiple, logically independent conversations in progress
between a given SA-DA pair at a given priority; the real
requirement is to maintain ordering within a conversation, though
not necessarily between conversations.
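
As an illustration of principle f) in the 43.2.1 excerpt above, here is a minimal sketch of a distributor that keeps every frame of a conversation on one port. The function name pick_port, the use of the (SA, DA) pair as the conversation identity, and the FNV-1a hash are all assumptions made for this example; nothing in the quoted text prescribes a particular distribution algorithm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical distributor: map a conversation (SA, DA) to one port.
 * FNV-1a is an arbitrary choice; any deterministic function of the
 * conversation identity keeps that conversation on a single link. */
unsigned pick_port(const uint8_t sa[6], const uint8_t da[6], unsigned nports)
{
    uint32_t h = 2166136261u;   /* FNV-1a 32-bit offset basis */
    size_t i;

    if (nports == 0)
        return 0;               /* no ports: nothing sensible to pick */
    for (i = 0; i < 6; i++) {
        h ^= sa[i];
        h *= 16777619u;         /* FNV-1a 32-bit prime */
    }
    for (i = 0; i < 6; i++) {
        h ^= da[i];
        h *= 16777619u;
    }
    return h % nports;
}
```

Because the port depends only on the address pair, two frames of the same conversation can never race each other across different links, which is why the Collector needs no reordering logic.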
--
James Carlson <carlson@workingcode.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 3:59 ` David S. Miller
2003-06-26 8:17 ` Rusty Russell
@ 2003-06-26 10:51 ` James Carlson
2003-06-26 23:18 ` Jamal Hadi
1 sibling, 1 reply; 35+ messages in thread
From: James Carlson @ 2003-06-26 10:51 UTC (permalink / raw)
To: David S. Miller; +Cc: rusty, paulus, netdev, fcusack
David S. Miller writes:
> From: Rusty Russell <rusty@rustcorp.com.au>
> Date: Thu, 26 Jun 2003 13:57:09 +1000
>
> Frankly, I'm amazed anyone sees reordering in real life...
>
> Many paths on the internet are quite reordered, this is
> the first thing. In fact, I claim that any TCP stack that
> doesn't do reordering detection is busted performance wise.
Nobody's disputing that. That's certainly true. However, reordering
on a given wire does not happen.
> The second thing is that network cards can and do reorder packets.
> Some PCMCIA cards do this just for fun.
If so, then that needs to be taken up with the manufacturer. That's a
rather severe design flaw that will prevent such a card from ever
being used for anything other than IP -- many other protocols *ASSUME*
that packets on a single wire cannot be reordered, including SNA, PPP
(!), and link aggregation, among others.
> And ethernet _DOES NOT_
> guarantee non-reordering.
Please provide references. 802.1 MAC says otherwise.
> At a minimum, a card can use QoS values to
> reorder receive of a given packet, it can also use this to reorder
> transmit. Our packet schedulers do this on a software level.
Sure. *If* QoS is present, then reordering between priority levels is
permissible. However, reordering L2 frames at a given priority level
isn't.
> If you need ordering, you need sequence numbers in your
> protocol if you wish to operate over these mediums.
>
> The case where SMP causes out-of-order packet delivery is just
> academic compared to the non-local sources of reordering
> mentioned above.
Not where it affects the correctness of the defined protocols.
--
James Carlson <carlson@workingcode.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 3:57 ` Rusty Russell
2003-06-26 3:59 ` David S. Miller
@ 2003-06-26 11:37 ` Michal Ostrowski
1 sibling, 0 replies; 35+ messages in thread
From: Michal Ostrowski @ 2003-06-26 11:37 UTC (permalink / raw)
To: Rusty Russell; +Cc: David S. Miller, Paul MacKerras, netdev, fcusack, carlson
On Wed, 2003-06-25 at 23:57, Rusty Russell wrote:
> In message <20030625.143334.85380461.davem@redhat.com> you write:
> >
> > Why don't you just queue the payload packets in a "resolution queue"
> > until the socket is created? Just make the resolution queue packets
> > timeout using a value that will easily exceed any reasonable PPP
> > negotiation time.
>
> Sure, that works in this case, where you know when you get the packet
> that it's out of order. But I wanted to see how ugly it got to do it
> generally: a protocol where you can't tell until later that things
> were in the wrong order can't use this technique. Paul tells me that
> multilink PPP assumes this (moral: don't do multilink PPPoE).
>
> Anyway, my patch is fundamentally flawed: you can't do
> cpu_raise_softirq() on another CPU, it's racy (*bad* *bad* interface).
>
> > All this ordered packet arrival shit is just beyond stupid.
>
> I want to know how often this is happening (Michal?), because if
> protocols need ordering and can't tell, it becomes effectively a
> packet drop somewhere down in the protocol. If it's 1 in a million,
> OK. If it's 1 in a thousand, that's bad.
>
> Frankly, I'm amazed anyone sees reordering in real life...
I have observed (very, very rarely) a situation where interrupt
sequences for two CPUs allowed this to happen (but not that it did
necessarily happen). When these races do occur, it probably hits TCP
traffic which deals with it, otherwise any hiccups it causes are
probably lost in the noise.
For PPPoE (non multilink) the worst case scenario would appear to be a
packet drop with a retransmit delay imposed on or by higher-level
protocols. That being said, I don't think PPPoE provides any
justification for any modifications to the core networking code to deal
with this.
Continuing on with PPPoE, I would like to get people's opinions on
whether or not mechanisms should be put in (as outlined in David's
suggestion above) to handle races between payload packets and socket
creation. These races are, I think, quite rare and at worst may impose
a delay of a couple of seconds on session creation. I'm not entirely
comfortable with the idea of saving incoming packets that I can't match
to existing sessions in case a matching session comes into existence in
the near future (DOS), especially if not handling this case is
non-fatal. I'd like to get a consensus on this "policy" issue.
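
For reference, a rough userspace sketch of the "resolution queue" idea discussed above, with a hard cap added to limit the DoS exposure. All names here (hold_queue, hold_packet, hold_expire, hold_release, HOLD_MAX) are hypothetical, packets are stood in for by plain integers, and this is not code from the PPPoE driver.

```c
#include <stddef.h>

#define HOLD_MAX 8  /* hard cap so unmatched traffic cannot consume memory */

struct held_pkt {
    unsigned session;        /* session ID the packet claims to belong to */
    unsigned long deadline;  /* tick at which the packet is discarded */
    int pkt;                 /* stand-in for a real packet buffer */
};

struct hold_queue {
    struct held_pkt slot[HOLD_MAX];  /* kept in arrival order */
    size_t count;
};

/* Park a packet for a session that does not exist yet.
 * Returns 1 if queued, 0 if the queue is full and the packet is dropped. */
int hold_packet(struct hold_queue *q, unsigned session, int pkt,
                unsigned long now, unsigned long timeout)
{
    if (q->count == HOLD_MAX)
        return 0;
    q->slot[q->count].session = session;
    q->slot[q->count].deadline = now + timeout;
    q->slot[q->count].pkt = pkt;
    q->count++;
    return 1;
}

/* Discard entries whose deadline has passed; returns how many were dropped. */
size_t hold_expire(struct hold_queue *q, unsigned long now)
{
    size_t i, kept = 0, dropped;

    for (i = 0; i < q->count; i++)
        if (q->slot[i].deadline > now)
            q->slot[kept++] = q->slot[i];
    dropped = q->count - kept;
    q->count = kept;
    return dropped;
}

/* The session now exists: copy its held packets to out[] in arrival
 * order and remove them from the queue. Returns the number copied. */
size_t hold_release(struct hold_queue *q, unsigned session,
                    int *out, size_t out_len)
{
    size_t i, kept = 0, n = 0;

    for (i = 0; i < q->count; i++) {
        if (q->slot[i].session == session && n < out_len)
            out[n++] = q->slot[i].pkt;
        else
            q->slot[kept++] = q->slot[i];
    }
    q->count = kept;
    return n;
}
```

The fixed cap and the deadline are the two knobs that bound the cost of holding packets for sessions that never appear; whether that bound is tight enough is exactly the policy question raised above.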
--
Michal Ostrowski <mostrows@watson.ibm.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 10:51 ` James Carlson
@ 2003-06-26 23:18 ` Jamal Hadi
2003-06-27 11:39 ` James Carlson
0 siblings, 1 reply; 35+ messages in thread
From: Jamal Hadi @ 2003-06-26 23:18 UTC (permalink / raw)
To: James Carlson; +Cc: David S. Miller, rusty, paulus, netdev, fcusack
On Thu, 26 Jun 2003, James Carlson wrote:
> David S. Miller writes:
> If so, then that needs to be taken up with the manufacturer. That's a
> rather severe design flaw that will prevent such a card from ever
> being used for anything other than IP -- many other protocols *ASSUME*
> that packets on a single wire cannot be reordered, including SNA, PPP
> (!), and link aggregation, among others.
>
So what about packets being lost? Wouldn't that ensure reordering?
And there is no such thing as a lossless wire.
cheers,
jamal
PS: Paulus, I wasn't preaching getting rid of ppp/pppoe, although it's
a nice thought. More like fix Linux pppd and pppoe ;->
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-26 23:18 ` Jamal Hadi
@ 2003-06-27 11:39 ` James Carlson
2003-06-27 12:12 ` Paul Mackerras
2003-06-28 2:21 ` Jamal Hadi
0 siblings, 2 replies; 35+ messages in thread
From: James Carlson @ 2003-06-27 11:39 UTC (permalink / raw)
To: Jamal Hadi; +Cc: David S. Miller, rusty, paulus, netdev, fcusack
Jamal Hadi writes:
> So what about packets being lost? Wouldn't that ensure reordering?
Please explain. What pattern of loss possibly results in one packet
being inserted in the stream ahead of another?
Here's loss: 1 2 4 5 6
Here's reordering: 1 2 4 3 5 6
Loss preserves ordering. To get misordering, you have to
intentionally hold onto a message and reinsert it later. What I've
been pointing out is that the 802 MAC layer *does not* permit
misordering (or duplication, for that matter). Loss, reordering, and
duplication are all separate errors.
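
To make the distinction concrete, here is a small sketch that classifies a received stream of frame numbers into the three error types. The struct seq_errors, the function classify, and the MAX_SEQ limit are names invented for this illustration.

```c
#include <stddef.h>

#define MAX_SEQ 64  /* illustration only: largest frame number tracked */

/* Hypothetical tally of the three independent error classes. */
struct seq_errors {
    unsigned lost;        /* sent but never received */
    unsigned duplicated;  /* received more than once */
    unsigned reordered;   /* received after a higher-numbered frame */
};

/* Classify a received stream, given that frames 1..sent were transmitted. */
struct seq_errors classify(const unsigned *rx, size_t n, unsigned sent)
{
    struct seq_errors e = {0, 0, 0};
    unsigned seen[MAX_SEQ + 1] = {0};
    unsigned highest = 0;
    unsigned s;
    size_t i;

    if (sent > MAX_SEQ)
        sent = MAX_SEQ;
    for (i = 0; i < n; i++) {
        s = rx[i];
        if (s == 0 || s > sent)
            continue;               /* not a frame we sent; ignore */
        if (seen[s]++) {
            e.duplicated++;         /* second or later copy */
            continue;
        }
        if (s < highest)
            e.reordered++;          /* arrived behind a later frame */
        else
            highest = s;
    }
    for (s = 1; s <= sent; s++)
        if (!seen[s])
            e.lost++;               /* never showed up at all */
    return e;
}
```

Fed the two streams above, the first counts one lost frame and nothing reordered, while the second counts one reordered frame and nothing lost.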
> And there is no such thing as a lossless wire.
True, but not relevant. When you put packets onto a wire, you must do
so in a particular order -- it's not possible to put more than one
packet at a given time on a single wire. It's also not possible for
the receiver to get them in a different order than you sent them.
They're essentially "single file" on that wire.
PPP relies on this fact (albeit for serial wires) as part of its
protocol definition (RFC 1661):
1. Introduction
The Point-to-Point Protocol is designed for simple links which
transport packets between two peers. These links provide full-duplex
simultaneous bi-directional operation, and are assumed to deliver
packets in order. It is intended that PPP provide a common solution
[...]
In addition, the 802 MAC layer cannot reorder packets, so there is no
conflict here. Although there are many design mistakes in PPPoE, this
just is not one of them.
There is a design problem here, but it's not PPPoE's.
> PS: Paulus, I wasn't preaching getting rid of ppp/pppoe, although it's
> a nice thought. More like fix Linux pppd and pppoe ;->
Believe me, the IETF working group didn't want PPPoE, either. It
dropped from outer space. The only reason it was published as
"Informational" is that it had already been deployed (before anyone
bothered to talk to the folks who are responsible for the PPP
standards), and thus somebody might want to know about it.
If we could have killed it, we would have.
--
James Carlson <carlson@workingcode.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-27 11:39 ` James Carlson
@ 2003-06-27 12:12 ` Paul Mackerras
2003-06-27 13:19 ` James Carlson
2003-06-27 14:59 ` Stephen Hemminger
2003-06-28 2:21 ` Jamal Hadi
1 sibling, 2 replies; 35+ messages in thread
From: Paul Mackerras @ 2003-06-27 12:12 UTC (permalink / raw)
To: James Carlson; +Cc: Jamal Hadi, David S. Miller, rusty, netdev, fcusack
James Carlson writes:
> Jamal Hadi writes:
> > So what about packets being lost? Wouldn't that ensure reordering?
>
> Please explain. What pattern of loss possibly results in one packet
> being inserted in the stream ahead of another?
Rusty asked me today what protocols there were that coped with packet
loss but couldn't cope with reordering. I couldn't think of any. Do
you know of any examples?
Regards,
Paul.
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-27 12:12 ` Paul Mackerras
@ 2003-06-27 13:19 ` James Carlson
2003-06-27 14:59 ` Stephen Hemminger
1 sibling, 0 replies; 35+ messages in thread
From: James Carlson @ 2003-06-27 13:19 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Jamal Hadi, David S. Miller, rusty, netdev, fcusack
Paul Mackerras writes:
> Rusty asked me today what protocols there were that coped with packet
> loss but couldn't cope with reordering. I couldn't think of any. Do
> you know of any examples?
Sure. We've already pointed out MP (RFC 1990) and VJ (RFC 1144) --
both of which handle loss just fine, but fail miserably if packets are
misordered -- MP locks up and VJ will produce silent data corruption.
Regular PPP negotiation itself can have trouble with misordering.
Consider this example:
Peer A Peer B
Req-1 -->
<-- Req-a
Req-2 -->
<-- Ack-2
<-- Ack-1
<-- Req-a
Ack-a -->
In this case, Peer A sent Configure-Request ID 1 with some set of
options (we'll call this set "1"). Peer B then sent Configure-Request
ID a with its own set of options. Based on that Configure-Request,
peer A decided to start over (e.g., peer A originally offered ACCM 0
and then saw ACCM 0xa0000 from the peer and decided that, since the
peer may well be an idiot, changing peer A's ACCM to 0xa0000 would be
prudent) and it sends Configure-Request ID 2.
Because of reordering, Peer B sees Configure-Request ID 2 first. It
responds. Peer A sees the Ack and goes on to AckRcvd state. Peer B
sees ID 1 next. It discards the options it saw in ID 2 and keeps the
options from ID 1, and sends an Ack for that. Peer A discards this
bogus Ack -- it doesn't match the current ID number.
Peer A finally gets Req-a and sends an Ack.
Now we're in a very bad state. Peer A believes it has negotiated its
option set "2" with Peer B, and Peer B believes it has agreed to
option set "1." Oops.
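
The exchange above can be replayed with a deliberately simplified model. This is not the RFC 1661 state machine: the structs, function names, and the rule that Peer B simply adopts the options of the most recent request it sees are assumptions made to mirror the scenario as described.

```c
#include <stdbool.h>
#include <stddef.h>

struct peer_a { int cur_id; unsigned cur_opts; bool ack_rcvd; };
struct peer_b { unsigned opts_from_a; };

/* Peer B, simplified: adopt the options of whatever request arrives and
 * acknowledge it by echoing its ID. A lock-step protocol has no way to
 * tell that a request is stale, since only one should be outstanding. */
static int b_handle_request(struct peer_b *b, int id, unsigned opts)
{
    b->opts_from_a = opts;
    return id;
}

/* Peer A: accept an Ack only if it matches the current request ID. */
static void a_handle_ack(struct peer_a *a, int ack_id)
{
    if (ack_id == a->cur_id)
        a->ack_rcvd = true;
}

/* A has sent Req-1 (ACCM 0) and then Req-2 (ACCM 0xa0000). `arrival`
 * lists the request IDs in the order B receives them. Returns true if
 * both peers end up believing in the same option set. */
bool negotiation_consistent(const int *arrival, size_t n)
{
    static const unsigned opts_by_id[3] = { 0, 0x0, 0xa0000 };
    struct peer_a a = { 2, 0xa0000, false };  /* A's latest request is ID 2 */
    struct peer_b b = { 0xffffffffu };        /* nothing agreed yet */
    size_t i;

    for (i = 0; i < n; i++) {
        int id = arrival[i];

        if (id < 1 || id > 2)
            continue;
        a_handle_ack(&a, b_handle_request(&b, id, opts_by_id[id]));
    }
    return a.ack_rcvd && a.cur_opts == b.opts_from_a;
}
```

In this model, delivering the requests in order, or losing Req-1 entirely, leaves both sides agreeing on set "2"; only the swapped arrival order produces the silent disagreement.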
Some others that are known to be sensitive to ordering (cited in
802.1w) are LAT, LLC2, and NETBEUI.
Another is SNA. Reordering SNA packets will cause the link to
reinitialize and cause a fault at the application level. Doing this
causes everyone to have a bad day.
Still another is GARP. This isn't a big deal if you're worrying about
bridging -- GARP doesn't get forwarded -- but it does matter if the
Ethernet driver itself mangles packet order. If you can't maintain
order inside your own host, then GARP is dead.
Yet another is EAP.
There are probably others that haven't occurred to me. The L2-wire-
preserves-order assumption turns out to be very easy to build into a
protocol. All you have to do is pretend (as PPP does) that there's
only one outstanding request at a time, and treat all others as
invalid. That builds in the ordering dependency -- and it's how many
lock-step protocols are designed. In order to be tolerant of
misordering (and duplication), a protocol has to define some sort of
ID numbering window (as is done, for example, in the L2TP control
connection) with a logical sequence of ID numbers, so that the peer
can determine which received ID numbers are "before" or "after" a
given number.
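
A minimal sketch of such a "before"/"after" test, using modular arithmetic on a wrapping ID. The 16-bit width, the half-space rule, and the names seq_before and rx_check are assumptions chosen for the example rather than a description of any particular protocol's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a is logically "before" b in a 16-bit wrapping ID space.
 * IDs up to half the space ahead of a count as "after" it. */
static inline bool seq_before(uint16_t a, uint16_t b)
{
    uint16_t d = (uint16_t)(b - a);  /* distance from a forward to b */

    return d != 0 && d < 0x8000;
}

enum rx_verdict {
    RX_DELIVER,  /* exactly the ID we were waiting for */
    RX_OLD,      /* duplicate or late arrival: safe to discard */
    RX_AHEAD     /* something earlier is still missing */
};

/* Receiver side: classify an incoming ID against the next expected one,
 * advancing the expectation only on in-order delivery. */
enum rx_verdict rx_check(uint16_t *expected, uint16_t seq)
{
    if (seq == *expected) {
        (*expected)++;               /* wraps from 65535 back to 0 */
        return RX_DELIVER;
    }
    if (seq_before(seq, *expected))
        return RX_OLD;
    return RX_AHEAD;
}
```

With this in place a stale or duplicated message is recognizable as such and can be dropped, instead of being mistaken for the one outstanding request.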
Then finally there's ISO/IEC 15802-1, Clause 9.2 (MAC) that permits
only a "negligible" amount of reordering. (The rate is exactly zero
on normal networks but could be nonzero for a "magically healed"
bridge connection -- if you don't know what that is, don't sweat it.
It requires accidents to occur.)
--
James Carlson <carlson@workingcode.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-27 12:12 ` Paul Mackerras
2003-06-27 13:19 ` James Carlson
@ 2003-06-27 14:59 ` Stephen Hemminger
2003-06-27 15:27 ` James Carlson
1 sibling, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2003-06-27 14:59 UTC (permalink / raw)
To: Paul Mackerras; +Cc: carlson, hadi, davem, rusty, netdev, fcusack
On Fri, 27 Jun 2003 22:12:13 +1000 (EST)
Paul Mackerras <paulus@samba.org> wrote:
> James Carlson writes:
> > Jamal Hadi writes:
> > > So what about packets being lost? Wouldn't that ensure reordering?
> >
> > Please explain. What pattern of loss possibly results in one packet
> > being inserted in the stream ahead of another?
>
> Rusty asked me today what protocols there were that coped with packet
> loss but couldn't cope with reordering. I couldn't think of any. Do
> you know of any examples?
>
Does LLC allow for re-ordering?
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-27 14:59 ` Stephen Hemminger
@ 2003-06-27 15:27 ` James Carlson
0 siblings, 0 replies; 35+ messages in thread
From: James Carlson @ 2003-06-27 15:27 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Paul Mackerras, hadi, davem, rusty, netdev, fcusack
Stephen Hemminger writes:
> Does LLC allow for re-ordering?
ANSI/IEEE Std 802.2, 1998, section 8.5.2.2 describes a common LLC type
3 simplification that relies on MAC ordering. This is used on media
(such as Ethernet) that don't reorder.
I believe that LLC type 2 ought to be able to handle misordering and
duplication, at least that's the intent of I-mode frames. I don't
know if this actually works in all implementations (after all,
Ethernet doesn't reorder, so it's not as if anyone's really had to
test it), but I can check one or two if someone cares.
I'm not sure about LLC type 1. It appears to expose the client to the
ordering guarantees of the underlying MAC layer, and thus it's very
likely the case that LLC type 1 clients make assumptions about known
MAC types.
--
James Carlson <carlson@workingcode.com>
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-27 11:39 ` James Carlson
2003-06-27 12:12 ` Paul Mackerras
@ 2003-06-28 2:21 ` Jamal Hadi
2003-06-28 22:51 ` Frank Cusack
1 sibling, 1 reply; 35+ messages in thread
From: Jamal Hadi @ 2003-06-28 2:21 UTC (permalink / raw)
To: James Carlson; +Cc: David S. Miller, rusty, paulus, netdev, fcusack
On Fri, 27 Jun 2003, James Carlson wrote:
> Jamal Hadi writes:
> > So what about packets being lost? Wouldn't that ensure reordering?
>
> Please explain. What pattern of loss possibly results in one packet
> being inserted in the stream ahead of another?
>
> Here's loss: 1 2 4 5 6
>
> Here's reordering: 1 2 4 3 5 6
>
> Loss preserves ordering. To get misordering, you have to
> intentionally hold onto a message and reinsert it later. What I've
And that's what I was implying.
In your above example:
1 2 4 5 6
If the entity above the wire cared about packet 3 there will be a
retransmit. so it becomes:
1 2 4 5 6 3
I suppose you can ensure ordering with a retransmit by having a window
of size 1 clocked by ACKs.
cheers,
jamal
* Re: [PATCH, untested] Support for PPPOE on SMP
2003-06-28 2:21 ` Jamal Hadi
@ 2003-06-28 22:51 ` Frank Cusack
0 siblings, 0 replies; 35+ messages in thread
From: Frank Cusack @ 2003-06-28 22:51 UTC (permalink / raw)
To: Jamal Hadi; +Cc: James Carlson, David S. Miller, rusty, paulus, netdev, fcusack
On Fri, Jun 27, 2003 at 10:21:21PM -0400, Jamal Hadi wrote:
> On Fri, 27 Jun 2003, James Carlson wrote:
> >
> > Loss preserves ordering. To get misordering, you have to
> > intentionally hold onto a message and reinsert it later. What I've
>
> And that's what I was implying.
> In your above example:
>
> 1 2 4 5 6
> If the entity above the wire cared about packet 3 there will be a
> retransmit. so it becomes:
>
> 1 2 4 5 6 3
Higher layer entities doing retransmits is not reordering. To the lower
layer, it's just the next message.
/fc
end of thread, other threads:[~2003-06-28 22:51 UTC | newest]
Thread overview: 35+ messages
2003-06-25 7:24 [PATCH, untested] Support for PPPOE on SMP Rusty Russell
2003-06-25 11:19 ` Jamal Hadi
2003-06-25 13:21 ` Michal Ostrowski
2003-06-25 13:42 ` Michal Ostrowski
2003-06-25 15:45 ` Jamal Hadi
2003-06-25 17:27 ` Michal Ostrowski
2003-06-25 22:17 ` Paul Mackerras
2003-06-25 22:56 ` Michal Ostrowski
2003-06-25 16:15 ` Stephen Hemminger
2003-06-25 16:22 ` Jamal Hadi
2003-06-25 16:39 ` Stephen Hemminger
2003-06-25 17:07 ` Jamal Hadi
2003-06-25 17:40 ` Stephen Hemminger
2003-06-25 18:00 ` Michal Ostrowski
2003-06-25 22:22 ` Paul Mackerras
2003-06-25 22:53 ` Ben Greear
2003-06-25 21:33 ` David S. Miller
2003-06-25 22:06 ` Michal Ostrowski
2003-06-26 1:04 ` David S. Miller
2003-06-26 3:57 ` Rusty Russell
2003-06-26 3:59 ` David S. Miller
2003-06-26 8:17 ` Rusty Russell
2003-06-26 8:55 ` David S. Miller
2003-06-26 10:47 ` James Carlson
2003-06-26 10:51 ` James Carlson
2003-06-26 23:18 ` Jamal Hadi
2003-06-27 11:39 ` James Carlson
2003-06-27 12:12 ` Paul Mackerras
2003-06-27 13:19 ` James Carlson
2003-06-27 14:59 ` Stephen Hemminger
2003-06-27 15:27 ` James Carlson
2003-06-28 2:21 ` Jamal Hadi
2003-06-28 22:51 ` Frank Cusack
2003-06-26 11:37 ` Michal Ostrowski
2003-06-25 16:01 ` Jason Lunz