* [PATCH] mac80211: Use correct originator sequence number in a Path Reply
From: Qasim Javed @ 2012-05-25 5:02 UTC (permalink / raw)
To: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
devel-ZwoEplunGu1xMJw8dq7oimD2FQJk+8+b
Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, ravip-DNmUmOh1Rg72fBVCVOL8/A
Hi,
I have been doing some experiments using the 802.11s functionality in the mac80211 stack. Today I stumbled across something which I believe is a critical bug in the usage of originator sequence number for a Path Reply message upon the reception of a Path Request message.
Consider the following topology:
        +---+
        | S |
        +---+
        /   \
       /     \
   +---+     +---+
   | A |     | B |
   +---+     +---+
       \     /
        \   /
        +---+
        | D |
        +---+
Node S is the source node and D the destination. There are two possible paths from S to D, namely S->A->D and S->B->D. When S wants to communicate with D, it broadcasts a Path Request (PREQ) in which the originator is S and the target is D. On receiving the PREQ, both A and B rebroadcast it, so it reaches D. Let us assume that the aggregate value of the metric for path D->B->S, denoted by cost(DBS), is greater than cost(DAS). Notice that according to HWMP operation, when the PREQ is propagating from S to D, the cost on the "reverse" path is aggregated; that is why I used cost(DB) + cost(BS) for cost(DBS) and did not consider cost(SBD). Suppose also that a smaller metric is better, which is the case for the default airtime link metric used by the 802.11s stack.
Let us suppose that the PREQ which passes through B arrives first at D and, as mentioned earlier, has a larger (worse) metric than the soon-to-be-received PREQ through the intermediate hop A. When D receives the PREQ from B, since it has not received any other PREQ, it generates a Path Reply (PREP). More specifically, the function hwmp_preq_frame_process generates the PREP. The PREQ contains originator and target sequence numbers which are used to avoid loops and ascertain the freshness of route information. On receiving a PREQ at D, the above-mentioned function checks whether dot11MeshHWMPnetDiameterTraversalTime has elapsed since the last sequence number update (stored in ifmsh->last_sn_update). Suppose this is true when the first PREQ via B is received at D. In this case the originator sequence number in the PREP is incremented (that is, it becomes one more than the target sequence number in the PREQ).
Let us look at an example at this point. Suppose the originator sequence number in the PREQ is 1 and the target sequence number is 2. When this PREQ is received at D via B, and considering that dot11MeshHWMPnetDiameterTraversalTime has passed since the last sequence number update, we increment the target sequence number, which now becomes 3. Now for the PREP, the originator sequence number of the PREQ, 1 in this case, becomes the target sequence number of the PREP, and the target sequence number of the PREQ (which has been updated and is now 3) becomes the originator sequence number of the PREP.
As this PREQ which was received at D via B has a larger metric, we know that when the PREQ from S is received via A, it will have a lower (better) metric, hence we will also generate a PREP for that PREQ. Suppose the second PREQ via A is received at D within dot11MeshHWMPnetDiameterTraversalTime (currently 50ms) of the first. This is a reasonable assumption since the PREQ is broadcast by S and further broadcast by A and B in some order. It is very unlikely that the time difference between the PREQs from A and B would be greater than 50ms, since this is a lot of time in 802.11 terms, where nodes typically contend for the channel on the order of hundreds of microseconds. In short, it is very likely (confirmed through experiments) that this difference is less than 50ms.
So, when the PREQ from S arrives at D via A, since this most likely happens within 50ms of the PREQ via B, the target sequence number will not be updated. Therefore, the originator sequence number stays at 1 and the target sequence number remains 2. It is very important to note that the code in hwmp_preq_frame_process just "swaps" the originator and target sequence numbers for use in the PREP. More specifically, as mentioned earlier, the second PREP will have an originator sequence number of 2 and a target sequence number of 1.
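The sequence number handling described above can be sketched in a few lines of userspace C. This is only a simplified model, not the kernel code: struct mesh_state, struct prep_sns, build_prep() and the millisecond clock are hypothetical names made up for illustration, and jiffies wraparound is ignored. The "patched" flag models the else branch proposed in the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NET_TRAVERSAL_MS 50 /* stands in for dot11MeshHWMPnetDiameterTraversalTime */

/* Hypothetical, simplified stand-in for the mesh interface state at D. */
struct mesh_state {
	uint32_t sn;             /* D's own HWMP sequence number */
	uint64_t last_sn_update; /* time of the last increment, in ms */
};

/* Sequence numbers placed in the generated PREP. */
struct prep_sns {
	uint32_t orig_sn;   /* PREP originator SN (D's SN) */
	uint32_t target_sn; /* PREP target SN (S's SN, taken from the PREQ) */
};

/*
 * Model of PREP generation for a PREQ addressed to this node.
 * Without the patch, target_sn keeps the value carried in the PREQ when
 * the traversal time has not elapsed; with the patch it is refreshed
 * from the node's current sequence number.
 */
static struct prep_sns build_prep(struct mesh_state *m, uint64_t now_ms,
				  uint32_t preq_orig_sn,
				  uint32_t preq_target_sn, bool patched)
{
	uint32_t target_sn = preq_target_sn;

	assert(m != NULL);
	if (now_ms > m->last_sn_update + NET_TRAVERSAL_MS) {
		target_sn = ++m->sn;
		m->last_sn_update = now_ms;
	} else if (patched) {
		target_sn = m->sn;
	}

	/* The PREP swaps the roles: D's SN becomes the originator SN. */
	return (struct prep_sns){ .orig_sn = target_sn,
				  .target_sn = preq_orig_sn };
}
```

Starting from sn = 2, calling build_prep() at t = 100 ms and again at t = 110 ms with the PREQ values (1, 2) yields originator sequence numbers 3 and 2 without the patch, and 3 and 3 with it.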
At this point, we have two PREPs in flight, one via B and one via A.
PREP via B: originator sequence number = 3, target sequence number = 1
PREP via A: originator sequence number = 2, target sequence number = 1
The net effect is that when these PREPs reach S, irrespective of the order in which they arrive, the PREP via A will be ignored! This is very wrong since the reason we sent the PREP via A in the first place was that it had a better metric (albeit on the reverse path).
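To make the effect at S concrete, here is a small model of how a node might decide whether a received PREP replaces its current path. It assumes the usual HWMP freshness rule (a newer sequence number wins; with equal sequence numbers only a strictly better metric wins) and ignores sequence number wraparound. struct mesh_path and prep_update() are hypothetical names, not the mac80211 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical path-table entry at S for destination D. */
struct mesh_path {
	bool valid;
	uint32_t sn;     /* sequence number of the accepted PREP */
	uint32_t metric; /* lower is better */
	char next_hop;   /* 'A' or 'B' in the topology above */
};

/*
 * Apply a received PREP to the path entry. Returns true if the PREP
 * was accepted, false if it was considered stale or not better.
 */
static bool prep_update(struct mesh_path *p, uint32_t sn, uint32_t metric,
			char via)
{
	assert(p != NULL);
	if (p->valid &&
	    (p->sn > sn || (p->sn == sn && metric >= p->metric)))
		return false;

	p->valid = true;
	p->sn = sn;
	p->metric = metric;
	p->next_hop = via;
	return true;
}
```

With the sequence numbers listed above (3 via B, 2 via A), S ends up routing through B in either arrival order: if B's PREP arrives first, A's looks stale; if A's arrives first, B's looks newer and replaces it. If both PREPs carried the same originator sequence number, the better metric via A would win.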
I have not tested the patch yet. This is more of a heads-up email to let everyone know.
Signed-off-by: Qasim Javed <qasimj-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
net/mac80211/mesh_hwmp.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index 70ac7d1..a13b593 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -543,6 +543,8 @@ static void hwmp_preq_frame_process(struct ieee80211_sub_if_data *sdata,
time_before(jiffies, ifmsh->last_sn_update)) {
target_sn = ++ifmsh->sn;
ifmsh->last_sn_update = jiffies;
+ } else {
+ target_sn = ifmsh->sn;
}
} else {
rcu_read_lock();
--
1.7.1
* Re: [RFC:kvm] export host NUMA info to guest & make emulated device NUMA attr
From: Liu ping fan @ 2012-05-25 4:05 UTC (permalink / raw)
To: Andrew Theurer
Cc: Shirley Ma, kvm, netdev, linux-kernel, qemu-devel, Avi Kivity,
Michael S. Tsirkin, Srivatsa Vaddagiri, Rusty Russell,
Anthony Liguori, Ryan Harper, Shirley Ma, Krishna Kumar,
Tom Lendacky
In-Reply-To: <4FBCF99F.4070409@linux.vnet.ibm.com>
On Wed, May 23, 2012 at 10:52 PM, Andrew Theurer
<habanero@linux.vnet.ibm.com> wrote:
> On 05/22/2012 04:28 AM, Liu ping fan wrote:
>>
>> On Sat, May 19, 2012 at 12:14 AM, Shirley Ma<mashirle@us.ibm.com> wrote:
>>>
>>> On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote:
>>>>
>>>> Currently, the guest can not know the NUMA info of the vcpu, which
>>>> will
>>>> result in performance drawback.
>>>>
>>>> This is the discovered and experiment by
>>>> Shirley Ma<xma@us.ibm.com>
>>>> Krishna Kumar<krkumar2@in.ibm.com>
>>>> Tom Lendacky<toml@us.ibm.com>
>>>> Refer to -
>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html
>>>> we can see the big perfermance gap between NUMA aware and unaware.
>>>>
>>>> Enlightened by their discovery, I think, we can do more work -- that
>>>> is to
>>>> export NUMA info of host to guest.
>>>
>>>
>>> There three problems we've found:
>>>
>>> 1. KVM doesn't support NUMA load balancer. Even there are no other
>>> workloads in the system, and the number of vcpus on the guest is smaller
>>> than the number of cpus per node, the vcpus could be scheduled on
>>> different nodes.
>>>
>>> Someone is working on in-kernel solution. Andrew Theurer has a working
>>> user-space NUMA aware VM balancer, it requires libvirt and cgroups
>>> (which is default for RHEL6 systems).
>>>
>> Interesting, and I found that "sched/numa: Introduce
>> sys_numa_{t,m}bind()" committed by Peter and Ingo may help.
>> But I think from the guest view, it can not tell whether the two vcpus
>> are on the same host node. For example,
>> vcpu-a in node-A is not vcpu-b in node-B, the guest lb will be more
>> expensive if it pull_task from vcpu-a and
>> choose vcpu-b to push. And my idea is to export such info to guest,
>> still working on it.
>
>
> The long term solution is to two-fold:
> 1) Guests that are quite large (in that they cannot fit in a host NUMA node)
> must have static mulit-node NUMA topology implemented by Qemu. That is here
> today, but we do not do it automatically, which is probably going to be a VM
> management responsibility.
> 2) Host scheduler and NUMA code must be enhanced to get better placement of
> Qemu memory and threads. For single-node vNUMA guests, this is easy, put it
> all in one node. For mulit-node vNUMA guests, the host must understand that
> some Qemu memory belongs with certain vCPU threads (which make up one of the
> guests vNUMA nodes), and then place that memory/threads in a specific host
> node (and continue for other memory/threads for each Qemu vNUMA node).
>
> Note that even if a guest's memory/threads for a vNUMA node are relocated to
> another host node (which will be necessary) the NUMA characteristics of
> guest are still maintained (as all those vCPUs and memory are still "close"
> to each other).
>
Yeah, I see Peter's work on tip/sched/numa
> The problem with exposing the host's NUMA info directly to the guest is that
> (1) vCPUs will get relocated, so their topology info in the guest will have
> to change over time. IMO that is a bad idea. We have a hard enough time
> getting applications to work with a static NUMA info. To get applications
I originally thought that vCPUs would get relocated only on user demand,
for example on hotplug, which does not happen frequently; otherwise the
user has to accept the drawback.
But forget it, Peter has said no to dynamic NUMA.
> to react to changing NUMA topology is not going to turn out well. (2) Every
> single guest would have to have the same number of NUMA nodes defined as the
> host. That is overkill, especially for small guests.
>
Thanks for your comment
pingfan
>>
>>
>>> 2. The host scheduler is not aware the relationship between guest vCPUs
>>> and vhost. So it's possible for host scheduler to schedule per-device
>>> vhost thread on the same cpu on which the vCPU kick a TX packet, or
>>> schecule vhost thread on different node than the vCPU for; For RX packet
>>> it's possible for vhost delivers RX packet on the vCPU running on
>>> different node too.
>>>
>> Yes. I notice this point in your original patch.
>>
>>> 3. per-device vhost thread is not scaled.
>>>
>> What about the scale-ability of per-vm * host_NUMA_NODE? When we make
>> advantage of multi-core, we produce mulit vcpu threads for one VM.
>> So what about the emulated device? Is it acceptable to scale to take
>> advantage of host NUMA attr. After all, how many nodes on which the
>> VM
>> can be run on are the user's control. It is a balance of
>> scale-ability and performance.
>>
>>> So the problems are in host scheduling and vhost thread scalability. I
>>> am not sure how much help from exposing NUMA info from host to guest.
>>>
>>> Have you tested these patched? How much performance gain here?
>>>
>> Sorry, not yet. As you have mentioned, the vhost thread scalability
>> is a big problem. So I want to see others' opinion before going on.
>>
>> Thanks and regards,
>> pingfan
>>
>>
>>> Thanks
>>> Shirley
>>>
>>>> So here comes the idea:
>>>> 1. export host numa info through guest's sched domain to its scheduler
>>>> Export vcpu's NUMA info to guest scheduler(I think mem NUMA problem
>>>> has been handled by host). So the guest's lb will consider the
>>>> cost.
>>>> I am still working on this, and my original idea is to export these
>>>> info
>>>> through "static struct sched_domain_topology_level
>>>> *sched_domain_topology"
>>>> to guest.
>>>>
>>>> 2. Do a better emulation of virt mach exported to guest.
>>>> In real world, the devices are limited by kinds of reasons to own
>>>> the NUMA
>>>> property. But as to Qemu, the device is emulated by thread, which
>>>> inherit
>>>> the NUMA attr in nature. We can implement the device as components
>>>> of many
>>>> logic units, each of the unit is backed by a thread in different
>>>> host node.
>>>> Currently, I want to start the work on vhost. But I think, maybe in
>>>> future, the iothread in Qemu can also has such attr.
>>>>
>>>>
>>>> Forgive me, for the limited time, I can not have more better
>>>> understand of
>>>> vhost/virtio_net drivers. These patches are just draft, _FAR_, _FAR_
>>>> from work.
>>>> I will do more detail work for them in future.
>>>>
>>>> To easy the review, the following is the sum up of the 2nd point of
>>>> the idea.
>>>> As for the 1st point of the idea, it is not reflected in the patches.
>>>>
>>>> --spread/shrink the vhost_workers over the host nodes as demanded from
>>>> Qemu.
>>>> And we can consider each vhost_worker as an independent net logic
>>>> device
>>>> embeded in physical device "vhost_net". At the meanwhile, we spread
>>>> vcpu
>>>> threads over the host node.
>>>> The vrings on guest are allocated PAGE_SIZE align separately, so
>>>> they can
>>>> will only be mapped into different host node, so vhost_worker in the
>>>> same
>>>> node can access it with the least cost. So does the vq on guest.
>>>>
>>>> --virtio_net driver will changes and talk with the logic device. And
>>>> which
>>>> logic device it will talk to is determined by on which vcpu it is
>>>> scheduled.
>>>>
>>>> --the binding of vcpus and vhost_worker is implemented by:
>>>> for call direction, vq-a in the node-A will have a dedicated irq-a.
>>>> And
>>>> we set the irq-a's affinity to vcpus in node-A.
>>>> for kick direction, kick register-b trigger different eventfd-b
>>>> which wake up
>>>> vhost_worker-b.
>>>>
> -Andrew Theurer
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
* Re: [RFC:kvm] export host NUMA info to guest & make emulated device NUMA attr
From: Liu ping fan @ 2012-05-25 3:29 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Krishna Kumar, Andrew Theurer, Rusty Russell, Shirley Ma, kvm,
netdev, Shirley Ma, qemu-devel, linux-kernel, Tom Lendacky,
Ryan Harper, Avi Kivity, Anthony Liguori, Srivatsa Vaddagiri
In-Reply-To: <20120523151604.GB30542@redhat.com>
On Wed, May 23, 2012 at 11:16 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, May 23, 2012 at 09:52:15AM -0500, Andrew Theurer wrote:
>> On 05/22/2012 04:28 AM, Liu ping fan wrote:
>> >On Sat, May 19, 2012 at 12:14 AM, Shirley Ma<mashirle@us.ibm.com> wrote:
>> >>On Thu, 2012-05-17 at 17:20 +0800, Liu Ping Fan wrote:
>> >>>Currently, the guest can not know the NUMA info of the vcpu, which
>> >>>will
>> >>>result in performance drawback.
>> >>>
>> >>>This is the discovered and experiment by
>> >>> Shirley Ma<xma@us.ibm.com>
>> >>> Krishna Kumar<krkumar2@in.ibm.com>
>> >>> Tom Lendacky<toml@us.ibm.com>
>> >>>Refer to -
>> >>>http://www.mail-archive.com/kvm@vger.kernel.org/msg69868.html
>> >>>we can see the big perfermance gap between NUMA aware and unaware.
>> >>>
>> >>>Enlightened by their discovery, I think, we can do more work -- that
>> >>>is to
>> >>>export NUMA info of host to guest.
>> >>
>> >>There three problems we've found:
>> >>
>> >>1. KVM doesn't support NUMA load balancer. Even there are no other
>> >>workloads in the system, and the number of vcpus on the guest is smaller
>> >>than the number of cpus per node, the vcpus could be scheduled on
>> >>different nodes.
>> >>
>> >>Someone is working on in-kernel solution. Andrew Theurer has a working
>> >>user-space NUMA aware VM balancer, it requires libvirt and cgroups
>> >>(which is default for RHEL6 systems).
>> >>
>> >Interesting, and I found that "sched/numa: Introduce
>> >sys_numa_{t,m}bind()" committed by Peter and Ingo may help.
>> >But I think from the guest view, it can not tell whether the two vcpus
>> >are on the same host node. For example,
>> >vcpu-a in node-A is not vcpu-b in node-B, the guest lb will be more
>> >expensive if it pull_task from vcpu-a and
>> >choose vcpu-b to push. And my idea is to export such info to guest,
>> >still working on it.
>>
>> The long term solution is to two-fold:
>> 1) Guests that are quite large (in that they cannot fit in a host
>> NUMA node) must have static mulit-node NUMA topology implemented by
>> Qemu. That is here today, but we do not do it automatically, which
>> is probably going to be a VM management responsibility.
>> 2) Host scheduler and NUMA code must be enhanced to get better
>> placement of Qemu memory and threads. For single-node vNUMA guests,
>> this is easy, put it all in one node. For mulit-node vNUMA guests,
>> the host must understand that some Qemu memory belongs with certain
>> vCPU threads (which make up one of the guests vNUMA nodes), and then
>> place that memory/threads in a specific host node (and continue for
>> other memory/threads for each Qemu vNUMA node).
>
> And for IO, we need multiqueue devices such that each
> node can have its own queue in its local memory.
>
Yes, my patches include such a solution. Independent device sub-logic
units sit in different NUMA nodes; "subdev" in the patches stands for
such a logic unit, and each of them is backed by a vhost thread. On
the other hand, for the virtio guest, the vqs (including vrings) are
allocated aligned to PAGE_SIZE, so their NUMA problem will be resolved
automatically by KVM (maybe a little more effort is needed here).
I had thought of exporting the real host NUMA info to the virtio layer
(not the scheduler, that is another topic), so that we can create
exactly as many logic units as needed.
We could even increase or decrease the number of logic units.
But what makes me hesitate to move on is this: is it acceptable to
create an independent vhost thread for each node on the user's demand?
The scalability is then per-VM * demand_node_num. Any objections?
Thanks,
pingfan
> --
> MST
* Re: [PATCH 05/17] netfilter: add namespace support for l4proto_tcp
From: Pablo Neira Ayuso @ 2012-05-25 3:00 UTC (permalink / raw)
To: Gao feng; +Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano
In-Reply-To: <1336985547-31960-6-git-send-email-gaofeng@cn.fujitsu.com>
Hi Gao,
While having a look at this again, I have two new requests:
On Mon, May 14, 2012 at 04:52:15PM +0800, Gao feng wrote:
[...]
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index 4dfbfa8..dd19350 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
[...]
> @@ -1549,10 +1532,80 @@ static struct ctl_table tcp_compat_sysctl_table[] = {
> #endif /* CONFIG_NF_CONNTRACK_PROC_COMPAT */
> #endif /* CONFIG_SYSCTL */
>
> +static int tcp_init_net(struct net *net, u_int8_t compat)
> +{
> + int i;
> + struct nf_tcp_net *tn = tcp_pernet(net);
> + struct nf_proto_net *pn = (struct nf_proto_net *)tn;
> +#ifdef CONFIG_SYSCTL
> +#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> + if (compat) {
> + pn->ctl_compat_table = kmemdup(tcp_compat_sysctl_table,
> + sizeof(tcp_compat_sysctl_table),
> + GFP_KERNEL);
> + if (!pn->ctl_compat_table)
> + return -ENOMEM;
> +
> + pn->ctl_compat_table[0].data = &tn->timeouts[TCP_CONNTRACK_SYN_SENT];
> + pn->ctl_compat_table[1].data = &tn->timeouts[TCP_CONNTRACK_SYN_SENT2];
> + pn->ctl_compat_table[2].data = &tn->timeouts[TCP_CONNTRACK_SYN_RECV];
> + pn->ctl_compat_table[3].data = &tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> + pn->ctl_compat_table[4].data = &tn->timeouts[TCP_CONNTRACK_FIN_WAIT];
> + pn->ctl_compat_table[5].data = &tn->timeouts[TCP_CONNTRACK_CLOSE_WAIT];
> + pn->ctl_compat_table[6].data = &tn->timeouts[TCP_CONNTRACK_LAST_ACK];
> + pn->ctl_compat_table[7].data = &tn->timeouts[TCP_CONNTRACK_TIME_WAIT];
> + pn->ctl_compat_table[8].data = &tn->timeouts[TCP_CONNTRACK_CLOSE];
> + pn->ctl_compat_table[9].data = &tn->timeouts[TCP_CONNTRACK_RETRANS];
> + pn->ctl_compat_table[10].data = &tn->tcp_loose;
> + pn->ctl_compat_table[11].data = &tn->tcp_be_liberal;
> + pn->ctl_compat_table[12].data = &tn->tcp_max_retrans;
You can make a generic function to set the ctl_data that you can
reuse for this code above and the one below.
> + }
> +#endif
> + if (!pn->ctl_table) {
> +#else
> + if (!pn->user++) {
> +#endif
> + for (i = 0; i < TCP_CONNTRACK_TIMEOUT_MAX; i++)
> + tn->timeouts[i] = tcp_timeouts[i];
> + tn->tcp_loose = nf_ct_tcp_loose;
> + tn->tcp_be_liberal = nf_ct_tcp_be_liberal;
> + tn->tcp_max_retrans = nf_ct_tcp_max_retrans;
> +#ifdef CONFIG_SYSCTL
> + pn->ctl_table = kmemdup(tcp_sysctl_table,
> + sizeof(tcp_sysctl_table),
> + GFP_KERNEL);
> + if (!pn->ctl_table) {
> +#ifdef CONFIG_NF_CONNTRACK_PROC_COMPAT
> + if (compat) {
> + kfree(pn->ctl_compat_table);
> + pn->ctl_compat_table = NULL;
> + }
> +#endif
> + return -ENOMEM;
> + }
> + pn->ctl_table[0].data = &tn->timeouts[TCP_CONNTRACK_SYN_SENT];
> + pn->ctl_table[1].data = &tn->timeouts[TCP_CONNTRACK_SYN_RECV];
> + pn->ctl_table[2].data = &tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> + pn->ctl_table[3].data = &tn->timeouts[TCP_CONNTRACK_FIN_WAIT];
> + pn->ctl_table[4].data = &tn->timeouts[TCP_CONNTRACK_CLOSE_WAIT];
> + pn->ctl_table[5].data = &tn->timeouts[TCP_CONNTRACK_LAST_ACK];
> + pn->ctl_table[6].data = &tn->timeouts[TCP_CONNTRACK_TIME_WAIT];
> + pn->ctl_table[7].data = &tn->timeouts[TCP_CONNTRACK_CLOSE];
> + pn->ctl_table[8].data = &tn->timeouts[TCP_CONNTRACK_RETRANS];
> + pn->ctl_table[9].data = &tn->timeouts[TCP_CONNTRACK_UNACK];
> + pn->ctl_table[10].data = &tn->tcp_loose;
> + pn->ctl_table[11].data = &tn->tcp_be_liberal;
> + pn->ctl_table[12].data = &tn->tcp_max_retrans;
> +#endif
I have bad experience with code that has lots of #ifdef's.
Please, split all *_init_net into smaller functions.
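As a rough illustration of the first request, a generic helper could take the table and an array of pointers to the per-net values and fill in the .data fields in one loop. This is only a sketch: struct ctl_table_sketch is a minimal stand-in for the kernel's struct ctl_table, and ctl_table_set_data() is a hypothetical name, not an existing kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct ctl_table; only .data matters here. */
struct ctl_table_sketch {
	const char *procname;
	void *data;
};

/*
 * Hypothetical helper: point table[i].data at slots[i] for the first
 * n entries. The caller decides the order of the slots, so the same
 * helper can serve both the regular and the compat table.
 */
static void ctl_table_set_data(struct ctl_table_sketch *table,
			       void *const *slots, size_t n)
{
	size_t i;

	assert(table != NULL && slots != NULL);
	for (i = 0; i < n; i++)
		table[i].data = slots[i];
}
```

Each *_init_net function would then build one slots array per table (the compat table and the regular table list the timeouts in a different order) and call the helper once for each.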
* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Pablo Neira Ayuso @ 2012-05-25 2:54 UTC (permalink / raw)
To: Gao feng
Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
Gao feng
In-Reply-To: <4FBEDADE.8040905@cn.fujitsu.com>
On Fri, May 25, 2012 at 09:05:34AM +0800, Gao feng wrote:
> On 2012-05-24 22:38, Pablo Neira Ayuso wrote:
> > On Thu, May 24, 2012 at 06:54:42PM +0800, Gao feng wrote:
> > [...]
> >>>>> I don't see why we need this new field.
> >>>>>
> >>>>> It seems to be set to 1 in each structure that has set:
> >>>>>
> >>>>> .ctl_compat_table
> >>>>>
> >>>>> to non-NULL. So, it's redundant.
> >>>>>
> >>>>> Moreover, you already know from the protocol tracker itself if you
> >>>>> have to allocate the compat ctl table or not.
> >>>>>
> >>>>> In other words: You set compat to 1 for nf_conntrack_l4proto_generic.
> >>>>> Then, you pass that compat value to generic_init_net via ->inet_net
> >>>>> again, but this information (that determines if the compat has to be
> >>>>> done or not) is already in the scope of the protocol tracker.
> >>>>>
> >>>>
> >>>> because some protocols such l4proto_tcp6 and l4proto_tcp use the same init_net
> >>>> function. the l4proto_tcp6 doesn't need compat sysctl, so we should use this new
> >>>> field to identify if we should kmemdup compat_sysctl_table.
> >>>
> >>> Then, could you use two init_net functions? one for TCP for IPv4 and another
> >>> for TCP for IPv6?
> >>
> >> Of cause, if you prefer to impletment it in this way.
> >
> > If this removes the .compat field that you added, then use two
> > init_net functions, yes.
>
> Sorry, I missed something.
>
> nf_ct_l4proto_unregister_sysctl also uses .compat to decide whether we
> can unregister the compat sysctl.
>
> If we register both l4proto_tcp and l4proto_tcp6 and drop .compat,
> unregistering l4proto_tcp6 will unregister the compat sysctl too.
>
> So maybe we have to keep .compat.
Could you resolve this by checking pn->ctl_compat_header != NULL ?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH] mac80211: Use correct originator sequence number in a Path Reply
From: Julian Calaby @ 2012-05-25 2:24 UTC (permalink / raw)
To: Qasim Javed; +Cc: linux-wireless, devel, netdev, linux-kernel, ravip
In-Reply-To: <1337922135-27846-1-git-send-email-qasimj@gmail.com>
Hi Qasim,
On Fri, May 25, 2012 at 3:02 PM, Qasim Javed <qasimj@gmail.com> wrote:
> Hi,
>
> I have not tested the patch yet. This is more of a heads up email to let everyone.
Just so you know, the usual practice when doing this is to mark the
patch as [RFC] rather than [PATCH] when you're asking for comments /
letting people know.
Thanks,
--
Julian Calaby
Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby/
* Re: [PATCH 01/17] netfilter: add struct nf_proto_net for register l4proto sysctl
From: Gao feng @ 2012-05-25 1:05 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: netfilter-devel, netdev, serge.hallyn, ebiederm, dlezcano,
Gao feng
In-Reply-To: <20120524143854.GA15898@1984>
On 2012-05-24 22:38, Pablo Neira Ayuso wrote:
> On Thu, May 24, 2012 at 06:54:42PM +0800, Gao feng wrote:
> [...]
>>>>> I don't see why we need this new field.
>>>>>
>>>>> It seems to be set to 1 in each structure that has set:
>>>>>
>>>>> .ctl_compat_table
>>>>>
>>>>> to non-NULL. So, it's redundant.
>>>>>
>>>>> Moreover, you already know from the protocol tracker itself if you
>>>>> have to allocate the compat ctl table or not.
>>>>>
>>>>> In other words: You set compat to 1 for nf_conntrack_l4proto_generic.
>>>>> Then, you pass that compat value to generic_init_net via ->inet_net
>>>>> again, but this information (that determines if the compat has to be
>>>>> done or not) is already in the scope of the protocol tracker.
>>>>>
>>>>
>>>> because some protocols such l4proto_tcp6 and l4proto_tcp use the same init_net
>>>> function. the l4proto_tcp6 doesn't need compat sysctl, so we should use this new
>>>> field to identify if we should kmemdup compat_sysctl_table.
>>>
>>> Then, could you use two init_net functions? one for TCP for IPv4 and another
>>> for TCP for IPv6?
>>
>> Of cause, if you prefer to impletment it in this way.
>
> If this removes the .compat field that you added, then use two
> init_net functions, yes.
Sorry, I missed something.
nf_ct_l4proto_unregister_sysctl also uses .compat to decide whether we
can unregister the compat sysctl.
If we register both l4proto_tcp and l4proto_tcp6 and drop .compat,
unregistering l4proto_tcp6 will unregister the compat sysctl too.
So maybe we have to keep .compat.
* Re: [PATCH 03/21] odp-util: Add tun_key to parse_odp_key_attr()
From: Simon Horman @ 2012-05-25 0:01 UTC (permalink / raw)
To: Ben Pfaff; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20120524162911.GD26173-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
On Thu, May 24, 2012 at 09:29:11AM -0700, Ben Pfaff wrote:
> On Thu, May 24, 2012 at 06:08:56PM +0900, Simon Horman wrote:
> > Cc: Kyle Mestery <kmestery-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
>
> But I don't see him CCed?
Strange. I asked git send-email to CC him explicitly.
> > + ovs_be32 ipv4_src;
> > + ovs_be32 ipv4_dst;
> > + unsigned long long tun_flags;
> > + int ipv4_tos;
> > + int ipv4_ttl;
> > + int n = -1;
> > +
> > + if (sscanf(s, "ipv4_tunnel(tun_id=%31[x0123456789abcdefABCDEF]"
> > + ",flags=%llx,src="IP_SCAN_FMT",dst="IP_SCAN_FMT
> > + ",tos=%i,ttl=%i)%n",
> > + tun_id_s, &tun_flags,
> > + IP_SCAN_ARGS(&ipv4_src), IP_SCAN_ARGS(&ipv4_dst),
> > + &ipv4_tos, &ipv4_ttl, &n) > 0
> > + && n > 0) {
>
> Does this compile? I don't see a declaration of tun_id_s.
>
> In the ODP printer and parser, we usually require fields that are
> hexadecimal to be written with an explicit "0x" on output (using
> something like "0x%x" or "%#x" on output), and then use "%i" on input,
> so that it is always unambiguous at a glance whether a number is
> decimal or hexadecimal. I'd appreciate it if we could maintain that
> here (I didn't look over at the printer code to see if it writes 0x,
> but I'd like it to).
>
> Otherwise, this looks good, thank you.
Sorry, perhaps this is not the latest revision, somehow.
I did have it compiling, and I'll update the patch accordingly.
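For reference, the convention Ben describes (always print hexadecimal fields with an explicit 0x, and parse them with %i so both decimal and 0x-prefixed input are accepted) can be sketched as below. format_flags() and parse_flags() are hypothetical helpers for illustration only, and they use a plain unsigned int rather than the unsigned long long tun_flags in the patch, since %i reads an int.

```c
#include <assert.h>
#include <stdio.h>

/* Write the field with an explicit 0x prefix, e.g. "flags=0x1f". */
static int format_flags(char *buf, size_t size, unsigned int flags)
{
	assert(buf != NULL);
	return snprintf(buf, size, "flags=%#x", flags);
}

/*
 * Parse "flags=<number>" with %i, which accepts decimal, 0x-prefixed
 * hexadecimal and 0-prefixed octal. Returns 1 on success, 0 otherwise.
 */
static int parse_flags(const char *s, unsigned int *flags)
{
	int value;
	int n = -1;

	if (sscanf(s, "flags=%i%n", &value, &n) < 1 || n < 0 || s[n] != '\0')
		return 0;

	*flags = (unsigned int)value;
	return 1;
}
```

One caveat with "%#x" is that a value of zero is printed as a bare "0" without the prefix; "%i" still parses that correctly.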
* Re: [ovs-dev] [PATCH 04/21] vswitchd: Add iface_parse_tunnel
From: Simon Horman @ 2012-05-24 23:59 UTC (permalink / raw)
To: Ben Pfaff; +Cc: dev, netdev
In-Reply-To: <20120524164738.GE26173@nicira.com>
On Thu, May 24, 2012 at 09:47:38AM -0700, Ben Pfaff wrote:
> The concept seems OK to me here. I have only a few minor comments.
>
> On Thu, May 24, 2012 at 06:08:57PM +0900, Simon Horman wrote:
> > +#define TNL_F_CSUM (1 << 0) /* Checksum packets. */
> > +#define TNL_F_TOS_INHERIT (1 << 1) /* Inherit ToS from inner packet. */
> > +#define TNL_F_TTL_INHERIT (1 << 2) /* Inherit TTL from inner packet. */
> > +#define TNL_F_DF_INHERIT (1 << 3) /* Inherit DF bit from inner packet. */
> > +#define TNL_F_DF_DEFAULT (1 << 4) /* Set DF bit if inherit off or
> > + * not IP. */
> > +#define TNL_F_PMTUD (1 << 5) /* Enable path MTU discovery. */
> > +#define TNL_F_HDR_CACHE (1 << 6) /* Enable tunnel header caching. */
> > +#define TNL_F_IPSEC (1 << 7) /* Traffic is IPsec encrypted. */
> > +#define TNL_F_IN_KEY (1 << 8) /* Tunnel port has input key. */
> > +#define TNL_F_OUT_KEY (1 << 9) /* Tunnel port has output key. */
>
> Some of the above definitions use all spaces, others use tabs. It's
> OVS userspace code so it's better to use all spaces, I think.
Sorry about that. I have a bit of trouble remembering to switch
tabbing modes in my editor depending on if I am in user-space or the
datapath.
> > + if (is_ipsec) {
> > + char *file_name = xasprintf("%s/%s", ovs_rundir(),
> > + "ovs-monitor-ipsec.pid");
> > + pid_t pid = read_pidfile(file_name);
> > + free(file_name);
> > + if (pid < 0) {
> > + VLOG_ERR("%s: IPsec requires the ovs-monitor-ipsec daemon",
> > + iface_cfg->name);
> > + goto err;
> > + }
>
> I just noticed that we re-read this pidfile every time we parse an
> IPsec tunnel. I guess that would be a big waste of time if we have a
> lot of IPsec tunnels. I'll make a note to consider fixing this
> separately (it's not your problem).
I guess that it should be easy enough to set a flag if any of the parsed
configurations use ipsec and perform the pid check if so.
As it is, I wouldn't be at all surprised if my series breaks ipsec as
I haven't tested it (with or without my changes).
* Re: [PATCH IPROUTE2] tc-codel: Update usage text
From: Stephen Hemminger @ 2012-05-24 22:02 UTC (permalink / raw)
To: Vijay Subramanian; +Cc: netdev, Eric Dumazet, Dave Taht
In-Reply-To: <1337885287-31354-1-git-send-email-subramanian.vijay@gmail.com>
On Thu, 24 May 2012 11:48:07 -0700
Vijay Subramanian <subramanian.vijay@gmail.com> wrote:
> codel can take 'noecn' as an option. This also makes it consistent with the
> manpage.
>
> Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
> ---
> tc/q_codel.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/tc/q_codel.c b/tc/q_codel.c
> index 826285a..dc4b3f6 100644
> --- a/tc/q_codel.c
> +++ b/tc/q_codel.c
> @@ -54,7 +54,7 @@
> static void explain(void)
> {
> fprintf(stderr, "Usage: ... codel [ limit PACKETS ] [ target TIME]\n");
> - fprintf(stderr, " [ interval TIME ] [ ecn ]\n");
> + fprintf(stderr, " [ interval TIME ] [ ecn | noecn ]\n");
> }
>
> static int codel_parse_opt(struct qdisc_util *qu, int argc, char **argv,
Applied, thanks.
* [PATCH v4] xfrm: take net hdr len into account for esp payload size calculation
From: Benjamin Poirier @ 2012-05-24 21:32 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, linux-kernel,
Steffen Klassert, Diego Beltrami
In-Reply-To: <20120517.200509.2290282427866555176.davem@davemloft.net>
Corrects the function that determines the esp payload size. The calculations
done in esp{4,6}_get_mtu() lead to overlength frames in transport mode for
certain mtu values and suboptimal frames for others.
According to what is done, mainly in esp{,6}_output() and tcp_mtu_to_mss(),
net_header_len must be taken into account before doing the alignment
calculation.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
---
Changes since v3:
* also fix ipv6
Changes since v2:
* rename l3_adj to net_adj
* fix indentation
Changes since v1:
* introduce l3_adj to preserve the same returned value as before for tunnel
mode
For example:
* on ipv4 with md5 AH and 3des ESP (transport mode):
mtu = 1499 leads to FRAGFAILS
mtu = 1500 the addition of padding in the esp header could be avoided
* on ipv6 with md5 AH and twofish-sha1 ESP (transport mode):
mtu = 1491 leads to Ip6FragFails
mtu = 1499 padding can be avoided
For details on how the formula is established, see
https://lkml.org/lkml/2012/5/10/597
Tested with
* transport mode E
* transport mode EA
* transport mode E + ah
* tunnel mode E
Not tested with BEET, but it should be the same as transport mode
draft-nikander-esp-beet-mode-03.txt Section 5.2:
"The wire packet format is identical to the ESP transport mode"
---
net/ipv4/esp4.c | 24 +++++++++---------------
net/ipv6/esp6.c | 18 +++++++-----------
2 files changed, 16 insertions(+), 26 deletions(-)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 89a47b3..cb982a6 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -459,28 +459,22 @@ static u32 esp4_get_mtu(struct xfrm_state *x, int mtu)
struct esp_data *esp = x->data;
u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
u32 align = max_t(u32, blksize, esp->padlen);
- u32 rem;
-
- mtu -= x->props.header_len + crypto_aead_authsize(esp->aead);
- rem = mtu & (align - 1);
- mtu &= ~(align - 1);
+ unsigned int net_adj;
switch (x->props.mode) {
- case XFRM_MODE_TUNNEL:
- break;
- default:
case XFRM_MODE_TRANSPORT:
- /* The worst case */
- mtu -= blksize - 4;
- mtu += min_t(u32, blksize - 4, rem);
- break;
case XFRM_MODE_BEET:
- /* The worst case. */
- mtu += min_t(u32, IPV4_BEET_PHMAXLEN, rem);
+ net_adj = sizeof(struct iphdr);
break;
+ case XFRM_MODE_TUNNEL:
+ net_adj = 0;
+ break;
+ default:
+ BUG();
}
- return mtu - 2;
+ return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
+ net_adj) & ~(align - 1)) + (net_adj - 2);
}
static void esp4_err(struct sk_buff *skb, u32 info)
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 1e62b75..db1521f 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -413,19 +413,15 @@ static u32 esp6_get_mtu(struct xfrm_state *x, int mtu)
struct esp_data *esp = x->data;
u32 blksize = ALIGN(crypto_aead_blocksize(esp->aead), 4);
u32 align = max_t(u32, blksize, esp->padlen);
- u32 rem;
+ unsigned int net_adj;
- mtu -= x->props.header_len + crypto_aead_authsize(esp->aead);
- rem = mtu & (align - 1);
- mtu &= ~(align - 1);
-
- if (x->props.mode != XFRM_MODE_TUNNEL) {
- u32 padsize = ((blksize - 1) & 7) + 1;
- mtu -= blksize - padsize;
- mtu += min_t(u32, blksize - padsize, rem);
- }
+ if (x->props.mode != XFRM_MODE_TUNNEL)
+ net_adj = sizeof(struct ipv6hdr);
+ else
+ net_adj = 0;
- return mtu - 2;
+ return ((mtu - x->props.header_len - crypto_aead_authsize(esp->aead) -
+ net_adj) & ~(align - 1)) + (net_adj - 2);
}
static void esp6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
--
1.7.7
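
The arithmetic in the new return statement can be tried outside the kernel.
The following is only a sketch of the formula from the patch above, not the
kernel code itself: the function names, the parameter list and the
esp_wire_len() model (network header + ESP header + padded payload + ICV) are
assumptions made for illustration, and the sample values used with it
(header_len 16, authsize 12, align 8) are made up rather than taken from a
real SA.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest packet size that can be handed to ESP for a given link mtu,
 * following the formula in the patch: subtract all fixed overhead
 * (including the network header, net_adj, in transport/BEET mode),
 * round down to the cipher alignment, then add net_adj back and
 * subtract the 2 trailer bytes (pad length + next header).
 */
static uint32_t esp_payload_size(uint32_t mtu, uint32_t header_len,
				 uint32_t authsize, uint32_t align,
				 uint32_t net_adj)
{
	/* align must be a power of two for the mask below to work */
	assert(align != 0 && (align & (align - 1)) == 0);

	uint32_t room = mtu - header_len - authsize - net_adj;

	return (room & ~(align - 1)) + net_adj - 2;
}

/*
 * Simplified (assumed) model of the resulting frame length, used only
 * to check that the value returned above does not exceed the mtu.
 */
static uint32_t esp_wire_len(uint32_t payload, uint32_t header_len,
			     uint32_t authsize, uint32_t align,
			     uint32_t net_adj)
{
	/* payload past the network header plus the 2 trailer bytes, padded */
	uint32_t padded = (payload - net_adj + 2 + align - 1) & ~(align - 1);

	return net_adj + header_len + padded + authsize;
}
```

With those sample values and net_adj = 20 (sizeof(struct iphdr)),
esp_payload_size() gives 1466 for both mtu 1500 and mtu 1499, and
esp_wire_len() puts the resulting frame at 1496 bytes, so it fits in either
case. With net_adj = 0 (tunnel mode) the result for mtu 1500 is 1470.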
^ permalink raw reply related
* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: David Miller @ 2012-05-24 20:28 UTC (permalink / raw)
To: joe; +Cc: anirban.chakraborty, netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1337891078.5070.36.camel@joe2Laptop>
From: Joe Perches <joe@perches.com>
Date: Thu, 24 May 2012 13:24:38 -0700
> On Thu, 2012-05-24 at 16:06 -0400, David Miller wrote:
>> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>> Date: Thu, 24 May 2012 14:06:54 -0400
>>
>> > Please apply to net-next.
>>
>> As I've stated at least 10 times this week, net-next is not open
>> and therefore submitting patches for net-next is not appropriate.
>
>> If people are not going to even read my announcements and
>> notifications of the states of the various GIT trees, I might as well
>> not make them at all.
>
> Perhaps set up a patchwork bot to autoreply to the
> sender only that these won't be looked at until
> after the merge window closes and train yourself
> to ignore the patchwork queue until then?
Sorry, people simply need to learn when it's appropriate to
submit patches.
Forcing them to resend at the appropriate time will train
their minds to take such things into consideration.
And if it's too bothersome to get them to resubmit, perhaps
they don't consider their patch important enough after all.
That's why we always handle situations like this by dropping things
and asking for a resend.
^ permalink raw reply
* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: Anirban Chakraborty @ 2012-05-24 20:24 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Dept-NX Linux NIC Driver
In-Reply-To: <20120524.160659.834400122540802357.davem@davemloft.net>
On 5/24/12 1:06 PM, "David Miller" <davem@davemloft.net> wrote:
>From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
>Date: Thu, 24 May 2012 14:06:54 -0400
>
>> Please apply to net-next.
>
>As I've stated at least 10 times this week, net-next is not open
>and therefore submitting patches for net-next is not appropriate.
>
>If people are not going to even read my announcements and
>notifications of the states of the various GIT trees, I might as well
>not make them at all.
My mistake, will resend it when the window opens. Sorry for the trouble.
-Anirban
^ permalink raw reply
* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: Joe Perches @ 2012-05-24 20:24 UTC (permalink / raw)
To: David Miller; +Cc: anirban.chakraborty, netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <20120524.160659.834400122540802357.davem@davemloft.net>
On Thu, 2012-05-24 at 16:06 -0400, David Miller wrote:
> From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
> Date: Thu, 24 May 2012 14:06:54 -0400
>
> > Please apply to net-next.
>
> As I've stated at least 10 times this week, net-next is not open
> and therefore submitting patches for net-next is not appropriate.
> If people are not going to even read my announcements and
> notifications of the states of the various GIT trees, I might as well
> not make them at all.
Perhaps set up a patchwork bot to autoreply to the
sender only that these won't be looked at until
after the merge window closes and train yourself
to ignore the patchwork queue until then?
^ permalink raw reply
* Re: [PATCH] solos-pci: Fix DMA support
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
To: dwmw2; +Cc: netdev, nathan
In-Reply-To: <1337871507.26314.132.camel@shinybook.infradead.org>
From: David Woodhouse <dwmw2@infradead.org>
Date: Thu, 24 May 2012 15:58:27 +0100
> DMA support has finally made its way to the top of the TODO list, having
> realised that a Geode using MMIO can't keep up with two ADSL2+ lines
> each running at 21Mb/s.
>
> This patch fixes a couple of bugs in the DMA support in the driver, so
> once the corresponding FPGA update is complete and tested everything
> should work properly.
>
> We weren't storing the currently-transmitting skb, so we were never
> unmapping it and never freeing/popping it when the TX was done.
> And the addition of pci_set_master() is fairly self-explanatory.
>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Applied.
^ permalink raw reply
* Re: [PATCH] net: qmi_wwan: Add Sierra Wireless device IDs
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
To: bjorn; +Cc: netdev, linux-usb
In-Reply-To: <1337851172-28549-1-git-send-email-bjorn@mork.no>
From: Bjørn Mork <bjorn@mork.no>
Date: Thu, 24 May 2012 11:19:32 +0200
> Some additional Gobi3K IDs found in the BSD/GPL licensed
> out-of-tree GobiNet driver from Sierra Wireless.
>
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
Applied.
^ permalink raw reply
* Re: [PATCH] net/wanrouter: Deprecate and schedule for removal
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
To: joe; +Cc: shemminger, greearb, jan.ceuleers, netdev
In-Reply-To: <1337879610.5070.17.camel@joe2Laptop>
From: Joe Perches <joe@perches.com>
Date: Thu, 24 May 2012 10:13:30 -0700
> No one uses this on current kernels anymore.
>
> Let it be known it's going to be removed eventually.
>
> Signed-off-by: Joe Perches <joe@perches.com>
Applied.
^ permalink raw reply
* Re: [PATCH] xen/netback: Calculate the number of SKB slots required correctly
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
To: simon.graham
Cc: Ian.Campbell, konrad.wilk, xen-devel, netdev, bhutchings,
adnan.misherfi
In-Reply-To: <1337876767-16041-1-git-send-email-simon.graham@citrix.com>
From: Simon Graham <simon.graham@citrix.com>
Date: Thu, 24 May 2012 12:26:07 -0400
> When calculating the number of slots required for a packet header, the code
> was reserving too many slots if the header crossed a page boundary. Since
> netbk_gop_skb copies the header to the start of the page, the count of
> slots required for the header should be based solely on the header size.
>
> This problem is easy to reproduce if a VIF is bridged to a USB 3G modem
> device as the skb->data value always starts near the end of the first page.
>
> Signed-off-by: Simon Graham <simon.graham@citrix.com>
Applied.
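
The difference described in the quoted changelog can be illustrated with a
small sketch. This is not the netback code; the two helper names and the
page_size parameter are hypothetical, and the sketch only contrasts counting
the pages a header spans from its current offset with counting the pages
needed once the header has been copied to the start of a page.

```c
#include <assert.h>

/* Pages touched by len bytes starting at the given offset. */
static unsigned int slots_spanned(unsigned int offset, unsigned int len,
				  unsigned int page_size)
{
	assert(page_size != 0);

	return ((offset % page_size) + len + page_size - 1) / page_size;
}

/* Pages needed when the same bytes are copied to the start of a page. */
static unsigned int slots_for_size(unsigned int len, unsigned int page_size)
{
	assert(page_size != 0);

	return (len + page_size - 1) / page_size;
}
```

For a 200-byte header that starts 4000 bytes into a 4096-byte page, the first
helper counts 2 slots while the second counts 1, which matches the case in the
changelog where skb->data starts near the end of the first page.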
^ permalink raw reply
* Re: [PATCH] MAINTAINERS
From: David Miller @ 2012-05-24 20:21 UTC (permalink / raw)
To: jhs, hadi; +Cc: netdev
In-Reply-To: <1337863502.3513.15.camel@mojatatu>
From: jamal <hadi@cyberus.ca>
Date: Thu, 24 May 2012 08:45:02 -0400
> After about two decades, I am giving up on cyberus.
> Nabwaga Manyanga.
>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Applied.
^ permalink raw reply
* Re: pch_gbe: backport fails to start sending
From: David Miller @ 2012-05-24 20:13 UTC (permalink / raw)
To: andy.cress; +Cc: netdev
In-Reply-To: <40680C535D6FE6498883F1640FACD44DEA6714@ka-exchange-1.kontronamerica.local>
From: "Andy Cress" <andy.cress@us.kontron.com>
Date: Thu, 24 May 2012 13:05:05 -0700
> I have backported the pch_gbe git head (v1.00) to kernel 2.6.32
> (RHEL6.2) and when it loads, after the open completes and the link is
> up, it fails to start sending and receiving.
Nobody here is going to help you with a vendor kernel backport,
sorry. You're on your own.
^ permalink raw reply
* Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label
From: David Miller @ 2012-05-24 20:12 UTC (permalink / raw)
To: paul.gortmaker; +Cc: jon.maloy, netdev, tipc-discussion, allan.stephens, maloy
In-Reply-To: <20120524195816.GA6487@windriver.com>
From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Thu, 24 May 2012 15:58:16 -0400
> But for new TIPC development features, future direction, and things like
> that -- making the right call requires intimate understanding of TIPC
> and its users, which is something that a maintainer should have but
> something I know I don't have. (A man has to know his limitations.)
>
> In this context, I'm not talking about these three trivial patches; but
> more complicated stuff that I imagine will be floated in the future.
>
> To that end, I can still review and call out issues in a crap patch when
> I see them. But I'd like to see new stuff sent to netdev, so that folks
> smarter than me have a chance to catch when a patch appears generally OK
> but is architecturally the wrong direction etc.
For maintainership, taste is more important than deep knowledge of the
specific technology. Worst case you ask the submitter to explain the
background of their change more thoroughly and that information is an
absolute requirement in the commit message and code comments
anyways.
^ permalink raw reply
* Re: [PATCH net-next 0/2] qlcnic: Bug fixes
From: David Miller @ 2012-05-24 20:06 UTC (permalink / raw)
To: anirban.chakraborty; +Cc: netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1337882816-2097-1-git-send-email-anirban.chakraborty@qlogic.com>
From: Anirban Chakraborty <anirban.chakraborty@qlogic.com>
Date: Thu, 24 May 2012 14:06:54 -0400
> Please apply to net-next.
As I've stated at least 10 times this week, net-next is not open
and therefore submitting patches for net-next is not appropriate.
If people are not going to even read my announcements and
notifications of the states of the various GIT trees, I might as well
not make them at all.
^ permalink raw reply
* pch_gbe: backport fails to start sending
From: Andy Cress @ 2012-05-24 20:05 UTC (permalink / raw)
To: netdev
Folks,
I now have a different case where the pch_gbe driver needs help.
I have backported the pch_gbe git head (v1.00) to kernel 2.6.32
(RHEL6.2) and when it loads, after the open completes and the link is
up, it fails to start sending and receiving.
I also took the pch_gbe source which runs fine on 2.6.38 (Fedora 15) and
backported it, and get the same results.
The much bulkier pch_gbe 0.91-NAPI driver does run on 2.6.32 (RHEL6.2)
but is not maintained.
I also tried backporting the mii.c from the 2.6.38 kernel, but that
didn't help, got the same symptoms.
After adding -DDEBUG to the 1.00 driver, I can see that it seems to get
just DMA Complete interrupts when it should be getting Transmit
complete, etc. I'm not sure why it gets stuck there.
Any ideas/input is welcome.
Andy
https://sendfile.kontron.com/message/KJKmrf171EhsuvbyhjnXpe
Attached at this link are two files:
pch_gbe-100a.tar.gz = the backported pch_gbe 1.00 head source, includes
patches that were applied.
dmesg.tar.gz = Some dmesg output from test cases with debug:
dmesg-pch10a-kern2632-bad.txt = backported 1.00 from git head on
kernel 2.6.32, fails
dmesg-pch10-kern2632-bad.txt = backported 1.00 from Fedora 15 on
kernel 2.6.32, fails
dmesg-pch10-kern2638-good.txt = same 1.00 source from Fedora 15 on
kernel 2.6.38, works
^ permalink raw reply
* Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label
From: Paul Gortmaker @ 2012-05-24 19:58 UTC (permalink / raw)
To: David Miller; +Cc: jon.maloy, netdev, tipc-discussion, allan.stephens, maloy
In-Reply-To: <20120521.023926.548567931208958037.davem@davemloft.net>
[Re: [PATCH 1/3] TIPC: Removing EXPERIMENTAL label] On 21/05/2012 (Mon 02:39) David Miller wrote:
> From: Jon Maloy <jon.maloy@ericsson.com>
> Date: Mon, 21 May 2012 01:59:12 -0400
>
> > With the latest series of patches from Paul Gortmaker and Allan
> > Stephens TIPC is now functionally mature and stable enough to
> > justify removal of the EXPERIMENTAL label.
> >
> > Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>
> I'll let Paul Gortmaker decide whether this is warranted or
> not.
The EXPERIMENTAL thing has always been rather subjective, but
I'd like to see some level of confidence that a crafted, bogus
TIPC message can't be used to DoS a machine with active TIPC
connections before removing EXPERIMENTAL. Maybe the current code
is OK as-is in this respect but I'd feel better knowing that it
had been audited with this exact kind of thing in mind.
>
> I don't really want to all of a sudden start seeing patches from
> people like you and the windriver folks, who effectively wrote off
> upstream and left poor Paul Gortmaker holding the bag and having to
> take care of EVERYTHING.
To be fair, I should note that Al did a lot of work in the background
getting commits onto a modern baseline and answering all my questions
since the out of tree sourceforge mess was highlighted here on netdev.
>
> You can't just do nothing for years, end up making someone else
> do it, then say "Hey here I am, I feel like submitting upstream
> patches now" after I've spent this entire time starting to trust
> Paul for TIPC patches.
I've been thinking about this off and on, and I'm wondering what to
suggest going forward. Dealing with the backlog was largely going over
maintenance and bugfix type patches and sanitizing them for integration
upstream. It largely boiled down to being able to tell a crap patch
from a good one that matched upstream expectations. I figured I could
manage to not screw that up too badly, hence why I volunteered to assist
with the backlog.
But for new TIPC development features, future direction, and things like
that -- making the right call requires intimate understanding of TIPC
and its users, which is something that a maintainer should have but
something I know I don't have. (A man has to know his limitations.)
In this context, I'm not talking about these three trivial patches; but
more complicated stuff that I imagine will be floated in the future.
To that end, I can still review and call out issues in a crap patch when
I see them. But I'd like to see new stuff sent to netdev, so that folks
smarter than me have a chance to catch when a patch appears generally OK
but is architecturally the wrong direction etc.
Paul.
^ permalink raw reply
* Re: [PATCH v6 0/3] netdev/of/phy: MDIO bus multiplexer support.
From: David Daney @ 2012-05-24 19:19 UTC (permalink / raw)
To: Timur Tabi
Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
In-Reply-To: <4FBE8605.2020507-KZfg59tc24xl57MIdRCFDg@public.gmane.org>
On 05/24/2012 12:03 PM, Timur Tabi wrote:
> David Daney wrote:
>
>> Well, the MDIO bus must have an associated device tree node.
>>
>> For my OCTEON code, the MDIO bus device is created as a result of the
>> call to of_platform_bus_probe(), which takes care of filling in all the
>> device tree nodes of the devices it finds and creates.
>
> Ok, let me give you some background. We actually already have MDIO muxing
> code in-house, but it's different from yours. So now I'm rewriting it to
> use your design instead.
>
> So our current code looks for "virtual MDIO nodes", and we call
> mdiobus_alloc() and then of_mdiobus_register(). I think this is what I'm
> missing now.
>
> I just don't know what to do next.
You will have to debug it and find out why the device match is failing,
then fix it.
David Daney
^ permalink raw reply