* Re: [PATCH] SUNRPC: Allow one callback request to be received from two sk_buff
From: Trond Myklebust @ 2014-01-21 15:35 UTC (permalink / raw)
To: shaobingqing
Cc: Dr Fields James Bruce, Miller David S., linuxnfs,
netdev-u79uwXL29TY76Z2rM5mHXA, Linux Kernel Mailing List
In-Reply-To: <CALrKORrZ3Kcuqc1RajQKkZcot0yiswh4VR_WuXHqfRTjn9oGQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Jan 21, 2014, at 3:08, shaobingqing <shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org> wrote:
> 2014/1/21 Trond Myklebust <trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>:
>> On Mon, 2014-01-20 at 14:59 +0800, shaobingqing wrote:
>>> In current code, there only one struct rpc_rqst is prealloced. If one
>>> callback request is received from two sk_buff, the xprt_alloc_bc_request
>>> would be execute two times with the same transport->xid. The first time
>>> xprt_alloc_bc_request will alloc one struct rpc_rqst and the TCP_RCV_COPY_DATA
>>> bit of transport->tcp_flags will not be cleared. The second time
>>> xprt_alloc_bc_request could not alloc struct rpc_rqst any more and NULL
>>> pointer will be returned, then xprt_force_disconnect occur. I think one
>>> callback request can be allowed to be received from two sk_buff.
>>>
>>> Signed-off-by: shaobingqing <shaobingqing-Gb3srWounXyPt1CcHtbs0g@public.gmane.org>
>>> ---
>>> net/sunrpc/xprtsock.c | 11 +++++++++--
>>> 1 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>> index ee03d35..606950d 100644
>>> --- a/net/sunrpc/xprtsock.c
>>> +++ b/net/sunrpc/xprtsock.c
>>> @@ -1271,8 +1271,13 @@ static inline int xs_tcp_read_callback(struct rpc_xprt *xprt,
>>> struct sock_xprt *transport =
>>> container_of(xprt, struct sock_xprt, xprt);
>>> struct rpc_rqst *req;
>>> + static struct rpc_rqst *req_partial;
>>> +
>>> + if (req_partial == NULL)
>>> + req = xprt_alloc_bc_request(xprt);
>>> + else if (req_partial->rq_xid == transport->tcp_xid)
>>> + req = req_partial;
>>
>> What happens here if req_partial->rq_xid != transport->tcp_xid? AFAICS,
>> req will be undefined. Either way, you cannot use a static variable for
>> storage here: that isn't re-entrant.
>
> Because metadata sever only have one slot for backchannel request,
> req_partial->rq_xid == transport->tcp_xid always happens, if the callback
> request just being splited in two sk_buffs. But req_partial->rq_xid !=
> transport->tcp_xid may also happens in some special cases, such as
> retransmission occurs?
If the server retransmits, then it is broken. The NFSv4.1 protocol does not allow it to retransmit unless the connection breaks.
> If one callback request is splited in two sk_buffs, xs_tcp_read_callback
> will be execute two times. The req_partial should be a static variable,
> because the second execution of xs_tcp_read_callback should use
> the rpc_rqst allocated for the first execution, which saves information
> copies from the first sk_buff.
No! This is a multi-threaded/process environment which can support multiple connection. It is a bug to use a static variable.
--
Trond Myklebust
Linux NFS client maintainer
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next 00/25] bonding: introduce new option API
From: Dan Williams @ 2014-01-21 15:59 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: netdev, Andy Gospodarek, Jay Vosburgh, Veaceslav Falico,
Scott Feldman, David S. Miller
In-Reply-To: <1390316114-17815-1-git-send-email-nikolay@redhat.com>
On Tue, 2014-01-21 at 15:54 +0100, Nikolay Aleksandrov wrote:
> Hi,
> This patchset's goal is to introduce a new option API which should be used
> to properly describe the bonding options with their mode dependcies and
> requirements. With this patchset applied we get centralized option
> manipulation, automatic RTNL acquire per option setting, automatic option
> range checking, mode dependcy checking and other various flags which are
> described in detail in patch 01's commit message and comments.
> Also the parameter passing is changed to use a specialized structure which
> is initialized to a value depending on the needs.
> The main exported functions are:
> __bond_opt_set() - set an option (RTNL should be acquired prior)
> bond_opt_init(val|str) - init a bond_opt_value struct for value or string
> parameter passing
> bond_opt_tryset_rtnl() - function which tries to acquire rtnl, mainly used
> for sysfs
> bond_opt_parse - used to parse or check for valid values
> bond_opt_get - retrieve a pointer to bond_option struct for some option
> bond_opt_get_val - retrieve a pointer to a bond_opt_value struct for
> some value
>
> The same functions are used to set an option via sysfs and netlink, just
> the parameter that's passed is usually initialized in a different way.
> The converted options have multiple style fixes, there're some longer
> lines but they looked either ugly or were strings/pr_warnings, if you
> think some line would be better broken just let me know :-) there're
> also a few sscanf false-positive warnings.
> I decided to keep the "unsuppmodes" way of mode dep checking since it's
> straight forward, if we make a more general way for checking dependencies
> it'll be easy to change it.
>
> Future plans for this work include:
> - Automatic sysfs generation from the bond_opts[].
> - Use of the API in bond_check_params() and thus cleaning it up (this has
> actually started, I'll take care of the rest in a separate patch)
> - Clean up all option-unrelated files of option definitions and functions
>
> I've tried to leave as much documentation as possible, if there's anything
> unclear please let me know. One more thing, I haven't moved all
> option-related functions from bonding.h to the new bond_options.h, this
> will be done in a separate patch, it's in my todo list.
>
> This patchset has been tested by setting each converted option via sysfs
> and netlink to a couple of wrong values, a couple of correct values and
> some random values, also for the opts that have flags they have been
> tested as well.
Currently userspace has to encode a lot of the same logic the kernel has
for option validation, for example when creating a user interface for
this stuff, you have to know that miimon and arp are incompatible, and
that certain options are only relevant with certain bond modes, and it's
a mess. And this also sometimes changes when new kernel options or
capabilities are added.
So, is there any good way to describe the valid value ranges or
capabilities for userspace? One idea is to send a package of bond
options to the kernel and see if they validate before actually applying
them, though this only works when actually configuring the interface.
An additional idea would be to somehow describe the available options
(eg, value ranges for numeric values, list-of-strings for bond modes,
etc) that the kernel supports before any bonds are even created (thus
probably through netlink, not sysfs).
Thoughts?
Dan
> Best regards,
> Nikolay Aleksandrov
>
> CC: Andy Gospodarek <andy@greyhouse.net>
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Veaceslav Falico <vfalico@redhat.com>
> CC: Scott Feldman <sfeldma@cumulusnetworks.com>
> CC: David S. Miller <davem@davemloft.net>
>
> Nikolay Aleksandrov (25):
> bonding: add infrastructure for an option API
> bonding: convert mode setting to use the new option API
> bonding: convert packets_per_slave to use the new option API
> bonding: convert xmit_hash_policy to use the new option API
> bonding: convert arp_validate to use the new option API
> bonding: convert arp_all_targets to use the new option API
> bonding: convert fail_over_mac to use the new option API
> bonding: convert arp_interval to use the new option API
> bonding: convert arp_ip_target to use the new option API
> bonding: convert downdelay to use the new option API
> bonding: convert updelay to use the new option API
> bonding: convert lacp_rate to use the new option API
> bonding: convert min_links to use the new option API
> bonding: convert ad_select to use the new option API
> bonding: convert num_peer_notif to use the new option API
> bonding: convert miimon to use the new option API
> bonding: convert primary to use the new option API
> bonding: convert primary_reselect to use the new option API
> bonding: convert use_carrier to use the new option API
> bonding: convert active_slave to use the new option API
> bonding: convert queue_id to use the new option API
> bonding: convert all_slaves_active to use the new option API
> bonding: convert resend_igmp to use the new option API
> bonding: convert lp_interval to use the new option API
> bonding: convert slaves to use the new option API
>
> drivers/net/bonding/bond_main.c | 179 +++---
> drivers/net/bonding/bond_netlink.c | 87 ++-
> drivers/net/bonding/bond_options.c | 1086 +++++++++++++++++++++++++++---------
> drivers/net/bonding/bond_options.h | 170 ++++++
> drivers/net/bonding/bond_procfs.c | 25 +-
> drivers/net/bonding/bond_sysfs.c | 519 +++--------------
> drivers/net/bonding/bonding.h | 29 +-
> 7 files changed, 1214 insertions(+), 881 deletions(-)
> create mode 100644 drivers/net/bonding/bond_options.h
>
^ permalink raw reply
* Re: [PATCH net-next 00/25] bonding: introduce new option API
From: Nikolay Aleksandrov @ 2014-01-21 16:06 UTC (permalink / raw)
To: Dan Williams
Cc: netdev, Andy Gospodarek, Jay Vosburgh, Veaceslav Falico,
Scott Feldman, David S. Miller
In-Reply-To: <1390319983.26902.4.camel@dcbw.local>
On 01/21/2014 04:59 PM, Dan Williams wrote:
> On Tue, 2014-01-21 at 15:54 +0100, Nikolay Aleksandrov wrote:
>> Hi,
>> This patchset's goal is to introduce a new option API which should be used
>> to properly describe the bonding options with their mode dependcies and
>> requirements. With this patchset applied we get centralized option
>> manipulation, automatic RTNL acquire per option setting, automatic option
>> range checking, mode dependcy checking and other various flags which are
>> described in detail in patch 01's commit message and comments.
>> Also the parameter passing is changed to use a specialized structure which
>> is initialized to a value depending on the needs.
>> The main exported functions are:
>> __bond_opt_set() - set an option (RTNL should be acquired prior)
>> bond_opt_init(val|str) - init a bond_opt_value struct for value or string
>> parameter passing
>> bond_opt_tryset_rtnl() - function which tries to acquire rtnl, mainly used
>> for sysfs
>> bond_opt_parse - used to parse or check for valid values
>> bond_opt_get - retrieve a pointer to bond_option struct for some option
>> bond_opt_get_val - retrieve a pointer to a bond_opt_value struct for
>> some value
>>
>> The same functions are used to set an option via sysfs and netlink, just
>> the parameter that's passed is usually initialized in a different way.
>> The converted options have multiple style fixes, there're some longer
>> lines but they looked either ugly or were strings/pr_warnings, if you
>> think some line would be better broken just let me know :-) there're
>> also a few sscanf false-positive warnings.
>> I decided to keep the "unsuppmodes" way of mode dep checking since it's
>> straight forward, if we make a more general way for checking dependencies
>> it'll be easy to change it.
>>
>> Future plans for this work include:
>> - Automatic sysfs generation from the bond_opts[].
>> - Use of the API in bond_check_params() and thus cleaning it up (this has
>> actually started, I'll take care of the rest in a separate patch)
>> - Clean up all option-unrelated files of option definitions and functions
>>
>> I've tried to leave as much documentation as possible, if there's anything
>> unclear please let me know. One more thing, I haven't moved all
>> option-related functions from bonding.h to the new bond_options.h, this
>> will be done in a separate patch, it's in my todo list.
>>
>> This patchset has been tested by setting each converted option via sysfs
>> and netlink to a couple of wrong values, a couple of correct values and
>> some random values, also for the opts that have flags they have been
>> tested as well.
>
> Currently userspace has to encode a lot of the same logic the kernel has
> for option validation, for example when creating a user interface for
> this stuff, you have to know that miimon and arp are incompatible, and
> that certain options are only relevant with certain bond modes, and it's
> a mess. And this also sometimes changes when new kernel options or
> capabilities are added.
>
> So, is there any good way to describe the valid value ranges or
> capabilities for userspace? One idea is to send a package of bond
> options to the kernel and see if they validate before actually applying
> them, though this only works when actually configuring the interface.
> An additional idea would be to somehow describe the available options
> (eg, value ranges for numeric values, list-of-strings for bond modes,
> etc) that the kernel supports before any bonds are even created (thus
> probably through netlink, not sysfs).
>
> Thoughts?
>
> Dan
>
Hi Dan,
I noticed the same thing while looking at the new ip link code. With this
change though you can pass anything to the option and it'll get validated,
so even if you pass a wrong value - not a problem you'll get the
appropriate error, so I don't see a problem for ip link to accept any
input, maybe the problem comes from the dual-personality options (accepting
both string and integers), we can always switch to the way sysfs does it
with either passing the argument as a string so __bond_opt_set would parse
it or the other option would be to export the option value tables via some
interface so ip can do its own validation.
This is a nice task to add to the todo list, just have to agree on a
solution :-)
Thanks for bringing it up.
Cheers,
Nik
>> Best regards,
>> Nikolay Aleksandrov
>>
>> CC: Andy Gospodarek <andy@greyhouse.net>
>> CC: Jay Vosburgh <fubar@us.ibm.com>
>> CC: Veaceslav Falico <vfalico@redhat.com>
>> CC: Scott Feldman <sfeldma@cumulusnetworks.com>
>> CC: David S. Miller <davem@davemloft.net>
>>
>> Nikolay Aleksandrov (25):
>> bonding: add infrastructure for an option API
>> bonding: convert mode setting to use the new option API
>> bonding: convert packets_per_slave to use the new option API
>> bonding: convert xmit_hash_policy to use the new option API
>> bonding: convert arp_validate to use the new option API
>> bonding: convert arp_all_targets to use the new option API
>> bonding: convert fail_over_mac to use the new option API
>> bonding: convert arp_interval to use the new option API
>> bonding: convert arp_ip_target to use the new option API
>> bonding: convert downdelay to use the new option API
>> bonding: convert updelay to use the new option API
>> bonding: convert lacp_rate to use the new option API
>> bonding: convert min_links to use the new option API
>> bonding: convert ad_select to use the new option API
>> bonding: convert num_peer_notif to use the new option API
>> bonding: convert miimon to use the new option API
>> bonding: convert primary to use the new option API
>> bonding: convert primary_reselect to use the new option API
>> bonding: convert use_carrier to use the new option API
>> bonding: convert active_slave to use the new option API
>> bonding: convert queue_id to use the new option API
>> bonding: convert all_slaves_active to use the new option API
>> bonding: convert resend_igmp to use the new option API
>> bonding: convert lp_interval to use the new option API
>> bonding: convert slaves to use the new option API
>>
>> drivers/net/bonding/bond_main.c | 179 +++---
>> drivers/net/bonding/bond_netlink.c | 87 ++-
>> drivers/net/bonding/bond_options.c | 1086 +++++++++++++++++++++++++++---------
>> drivers/net/bonding/bond_options.h | 170 ++++++
>> drivers/net/bonding/bond_procfs.c | 25 +-
>> drivers/net/bonding/bond_sysfs.c | 519 +++--------------
>> drivers/net/bonding/bonding.h | 29 +-
>> 7 files changed, 1214 insertions(+), 881 deletions(-)
>> create mode 100644 drivers/net/bonding/bond_options.h
>>
>
>
^ permalink raw reply
* [PATCH net-next v2] ipv6: enable anycast addresses as source addresses for datagrams
From: Francois-Xavier Le Bail @ 2014-01-21 16:01 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa, David Stevens, David Miller
This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.
- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
on given interface or is global.
- Uses it in ip6_datagram_send_ctl().
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
---
v2: Consideration of Hannes's review:
- Same behaviour for anycast and unicast.
- No need for a new socket option.
Typical usage :
A server uses IPV6_RECVPKTINFO socket option to get ancillary data with
recvmsg() and can use sendmsg() to reply with anycast adress as source address
in the same way it does for unicast.
include/net/addrconf.h | 5 +++--
net/ipv6/anycast.c | 11 +++++++++++
net/ipv6/datagram.c | 4 +++-
3 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 66c4a44..50e39a8 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -205,8 +205,9 @@ void ipv6_sock_ac_close(struct sock *sk);
int ipv6_dev_ac_inc(struct net_device *dev, const struct in6_addr *addr);
int __ipv6_dev_ac_dec(struct inet6_dev *idev, const struct in6_addr *addr);
bool ipv6_chk_acast_addr(struct net *net, struct net_device *dev,
- const struct in6_addr *addr);
-
+ const struct in6_addr *addr);
+bool ipv6_chk_acast_addr_src(struct net *net, struct net_device *dev,
+ const struct in6_addr *addr);
/* Device notifier */
int register_inet6addr_notifier(struct notifier_block *nb);
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 5a80f15..d3a6e2d 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -383,6 +383,17 @@ bool ipv6_chk_acast_addr(struct net *net, struct net_device *dev,
return found;
}
+/* check if this anycast address is link-local on given interface or
+ * is global
+ */
+bool ipv6_chk_acast_addr_src(struct net *net, struct net_device *dev,
+ const struct in6_addr *addr)
+{
+ if (ipv6_addr_type(addr) & IPV6_ADDR_LINKLOCAL)
+ return ipv6_chk_acast_dev(dev, addr);
+ else
+ return ipv6_chk_acast_addr(net, NULL, addr);
+}
#ifdef CONFIG_PROC_FS
struct ac6_iter_state {
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index cd8699b..2615197 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -689,7 +689,9 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;
if (!(inet_sk(sk)->freebind || inet_sk(sk)->transparent) &&
!ipv6_chk_addr(net, &src_info->ipi6_addr,
- strict ? dev : NULL, 0))
+ strict ? dev : NULL, 0) &&
+ !ipv6_chk_acast_addr_src(net, dev,
+ &src_info->ipi6_addr))
err = -EINVAL;
else
fl6->saddr = src_info->ipi6_addr;
^ permalink raw reply related
* Re: issues with vxlan RX checksum offload under OVS datapath
From: Pravin Shelar @ 2014-01-21 17:30 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Joseph Gasparakis, netdev
In-Reply-To: <52DC4C1B.5000309@mellanox.com>
On Sun, Jan 19, 2014 at 2:05 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> Hi Pravin,
>
> While testing the gro udp patches over a setup with openvswitch I noted that
> the RX checksum offload support introduced by Joseph's commit 0afb166
> "vxlan: Add capability of Rx checksum offload for inner packet" works fine
> when you use a setup made of
>
> NIC --> IP stack --> vxlan device --> bridge --> tap
>
> but not when its
>
> NIC --> IP stack --> ovs vxlan port --> OVS DP --> tap
>
> I narrowed it down to the fact the when going the OVS pathskb->encapsulation
> remains true also after the decap is done. Basically, this is the original
> hunk
>
>
>> @@ -607,7 +607,17 @@ static int vxlan_udp_encap_recv(struct sock *sk,
>> struct sk_buff *skb)
>>
>> __skb_tunnel_rx(skb, vxlan->dev);
>> skb_reset_network_header(skb);
>> - skb->ip_summed = CHECKSUM_NONE;
>> +
>> + /* If the NIC driver gave us an encapsulated packet with
>> + * CHECKSUM_UNNECESSARY and Rx checksum feature is enabled,
>> + * leave the CHECKSUM_UNNECESSARY, the device checksummed it
>> + * for us. Otherwise force the upper layers to verify it.
>> + */
>> + if (skb->ip_summed != CHECKSUM_UNNECESSARY || !skb->encapsulation
>> ||
>> + !(vxlan->dev->features & NETIF_F_RXCSUM))
>> + skb->ip_summed = CHECKSUM_NONE;
>> +
>> + skb->encapsulation = 0;
>
>
> which was moved by your commits
>
> 5cfccc5 vxlan: Add vxlan recv demux.
> 7ce0475 vxlan: Restructure vxlan receive.
>
> to a code which isn't shared by the vxlan driver and ovs vxlan datapath
> code, and I was thinking lets just move it back to vxlan_udp_encap_recv.
> However, we can't really make the "vxlan->dev->features & NETIF_F_RXCSUM"
> check - since there's no vxlan device for ovs datapath, any idea how to
> overcome this.
>
> Moving this to shared code (while removing the check for
> vxlan->dev->features) made things to work on my setup, but this misses one
> of the original conditions, ideas?
>
I kept csum check in vxlan-device recv path for same reason. As of now
there is no efficient way to get ovs-dev features.
May be we can cache device features in struct datapath from datapath-netdev.
> I believe its too late for 3.13 unless there is going to be -rc9, but lets
> first see what's the right fix.
>
> Or.
^ permalink raw reply
* Re: issues with vxlan RX checksum offload under OVS datapath
From: Jesse Gross @ 2014-01-21 17:37 UTC (permalink / raw)
To: Pravin Shelar; +Cc: Or Gerlitz, Joseph Gasparakis, netdev
In-Reply-To: <CALnjE+rUTejbU_d39G7WSn=k0aY2xELCE3UxjDr_Zdq3iwHbNA@mail.gmail.com>
On Tue, Jan 21, 2014 at 9:30 AM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Sun, Jan 19, 2014 at 2:05 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> Hi Pravin,
>>
>> While testing the gro udp patches over a setup with openvswitch I noted that
>> the RX checksum offload support introduced by Joseph's commit 0afb166
>> "vxlan: Add capability of Rx checksum offload for inner packet" works fine
>> when you use a setup made of
>>
>> NIC --> IP stack --> vxlan device --> bridge --> tap
>>
>> but not when its
>>
>> NIC --> IP stack --> ovs vxlan port --> OVS DP --> tap
>>
>> I narrowed it down to the fact the when going the OVS pathskb->encapsulation
>> remains true also after the decap is done. Basically, this is the original
>> hunk
>>
>>
>>> @@ -607,7 +607,17 @@ static int vxlan_udp_encap_recv(struct sock *sk,
>>> struct sk_buff *skb)
>>>
>>> __skb_tunnel_rx(skb, vxlan->dev);
>>> skb_reset_network_header(skb);
>>> - skb->ip_summed = CHECKSUM_NONE;
>>> +
>>> + /* If the NIC driver gave us an encapsulated packet with
>>> + * CHECKSUM_UNNECESSARY and Rx checksum feature is enabled,
>>> + * leave the CHECKSUM_UNNECESSARY, the device checksummed it
>>> + * for us. Otherwise force the upper layers to verify it.
>>> + */
>>> + if (skb->ip_summed != CHECKSUM_UNNECESSARY || !skb->encapsulation
>>> ||
>>> + !(vxlan->dev->features & NETIF_F_RXCSUM))
>>> + skb->ip_summed = CHECKSUM_NONE;
>>> +
>>> + skb->encapsulation = 0;
>>
>>
>> which was moved by your commits
>>
>> 5cfccc5 vxlan: Add vxlan recv demux.
>> 7ce0475 vxlan: Restructure vxlan receive.
>>
>> to a code which isn't shared by the vxlan driver and ovs vxlan datapath
>> code, and I was thinking lets just move it back to vxlan_udp_encap_recv.
>> However, we can't really make the "vxlan->dev->features & NETIF_F_RXCSUM"
>> check - since there's no vxlan device for ovs datapath, any idea how to
>> overcome this.
>>
>> Moving this to shared code (while removing the check for
>> vxlan->dev->features) made things to work on my setup, but this misses one
>> of the original conditions, ideas?
>>
> I kept csum check in vxlan-device recv path for same reason. As of now
> there is no efficient way to get ovs-dev features.
> May be we can cache device features in struct datapath from datapath-netdev.\
I don't think that getting the features from the internal device makes
sense - a packet could be destined to a VM instead.
I think that controlling RX checksum on the device that physically
received it should be enough to be able to check for broken devices.
It's not clear what the benefit to turning it off at the VXLAN layer
is - if you really don't want it for some reason, you can always throw
away the checksum later.
^ permalink raw reply
* Re: FEC performance degradation with certain packet sizes
From: Marek Vasut @ 2014-01-21 17:49 UTC (permalink / raw)
To: fugang.duan@freescale.com
Cc: Hector Palacios, netdev@vger.kernel.org,
Fabio.Estevam@freescale.com, shawn.guo@linaro.org,
l.stach@pengutronix.de, Frank.Li@freescale.com,
bhutchings@solarflare.com, davem@davemloft.net
In-Reply-To: <45804b55ce704164bc054d4275801153@BLUPR03MB373.namprd03.prod.outlook.com>
On Monday, December 23, 2013 at 03:52:20 AM, fugang.duan@freescale.com wrote:
> From: Hector Palacios <hector.palacios@digi.com>
> Sent: Friday, December 20, 2013 11:02 PM
>
> >To: Duan Fugang-B38611; Marek Vasut; netdev@vger.kernel.org
> >Cc: Estevam Fabio-R49496; shawn.guo@linaro.org; l.stach@pengutronix.de; Li
> >Frank-B20596; bhutchings@solarflare.com; davem@davemloft.net
> >Subject: Re: FEC performance degradation with certain packet sizes
> >
> >Dear Andy,
> >
> >On 12/20/2013 04:35 AM, fugang.duan@freescale.com wrote:
> >> [...]
> >>
> >> I can reproduce the issue on imx6q/dl platform with freescale internal
> >> kernel
> >
> >tree.
> >
> >> This issue must be related to cpufreq, when set scaling_governor to
> >
> >performance:
> >> echo performance >
> >> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> >>
> >> And then do NPtcp test, the result as below:
> >> 24: 99 bytes 5 times --> 9.89 Mbps in 76.40 usec
> >> 25: 125 bytes 5 times --> 12.10 Mbps in 78.80 usec
> >> 26: 128 bytes 5 times --> 12.27 Mbps in 79.60 usec
> >> 27: 131 bytes 5 times --> 12.80 Mbps in 78.10 usec
> >> 28: 189 bytes 5 times --> 18.00 Mbps in 80.10 usec
> >> 29: 192 bytes 5 times --> 18.31 Mbps in 80.00 usec
> >> 30: 195 bytes 5 times --> 18.41 Mbps in 80.80 usec
> >> 31: 253 bytes 5 times --> 23.34 Mbps in 82.70 usec
> >> 32: 256 bytes 5 times --> 23.91 Mbps in 81.70 usec
> >> 33: 259 bytes 5 times --> 24.19 Mbps in 81.70 usec
> >> 34: 381 bytes 5 times --> 33.18 Mbps in 87.60 usec
> >> 35: 384 bytes 5 times --> 33.87 Mbps in 86.50 usec
> >> 36: 387 bytes 5 times --> 34.41 Mbps in 85.80 usec
> >> 37: 509 bytes 5 times --> 42.72 Mbps in 90.90 usec
> >> 38: 512 bytes 5 times --> 42.60 Mbps in 91.70 usec
> >> 39: 515 bytes 5 times --> 42.80 Mbps in 91.80 usec
> >> 40: 765 bytes 5 times --> 56.45 Mbps in 103.40 usec
> >> 41: 768 bytes 5 times --> 57.11 Mbps in 102.60 usec
> >> 42: 771 bytes 5 times --> 57.22 Mbps in 102.80 usec
> >> 43: 1021 bytes 5 times --> 70.69 Mbps in 110.20 usec
> >> 44: 1024 bytes 5 times --> 70.70 Mbps in 110.50 usec
> >> 45: 1027 bytes 5 times --> 69.59 Mbps in 112.60 usec
> >> 46: 1533 bytes 5 times --> 73.56 Mbps in 159.00 usec
> >> 47: 1536 bytes 5 times --> 72.92 Mbps in 160.70 usec
> >> 48: 1539 bytes 5 times --> 73.80 Mbps in 159.10 usec
> >> 49: 2045 bytes 5 times --> 93.59 Mbps in 166.70 usec
> >> 50: 2048 bytes 5 times --> 94.07 Mbps in 166.10 usec
> >> 51: 2051 bytes 5 times --> 92.92 Mbps in 168.40 usec
> >> 52: 3069 bytes 5 times --> 123.43 Mbps in 189.70 usec
> >> 53: 3072 bytes 5 times --> 123.68 Mbps in 189.50 usec
> >
> >You are right. Unfortunately, this does not work on i.MX28 (at least for
> >me). Couldn't it be that the cpufreq is masking the problem on the i.MX6?
> >
> >Best regards,
> >--
> >Hector Palacios
>
> I reproduce the issue on imx28 evk platform, imx28 has no specific cpufreq
> driver. In kernel 3.13, the ethernet driver is almost the same for imx28
> and imx6 since they use the Same enet IP, but imx6 enet IP have some
> evolution.
>
> Now I don't know the cause. When I am free, I will dig out it.
Hi! Is there any progress on this issue ? Did you manage to find anything out
please?
Best regards,
Marek Vasut
^ permalink raw reply
* Re: [PATCH net-next 0/2] Bug Fixes for SFC driver
From: Ben Hutchings @ 2014-01-21 18:02 UTC (permalink / raw)
To: David Miller; +Cc: netdev, linux-net-drivers, Shradha Shah
In-Reply-To: <52D98860.4020402@solarflare.com>
On Fri, 2014-01-17 at 19:45 +0000, Shradha Shah wrote:
> I am taking over the upstream patch submission work for the
> sfc driver from Ben Hutchings.
>
> These patches are bug fixes to the sfc driver involving
> replacement of the PORT RESET MC command and fixing transposed
> ptp_{under,over}size_sync_window_statistics
>
> The PORT_RESET bug fix is needed for all versions supporting EF10
> i.e all versions including and after 3.12.
>
> Ben Hutchings (1):
> sfc: Fix transposed ptp_{under,over}size_sync_windows statistics
>
> Jon Cooper (1):
> sfc: Change efx_mcdi_reset_port to use ENTITY_RESET MC command.
>
> drivers/net/ethernet/sfc/ef10.c | 31 ++++++++++++++++++++++++++-----
> drivers/net/ethernet/sfc/mcdi.c | 14 +++++++++++---
> drivers/net/ethernet/sfc/ptp.c | 4 ++--
> 3 files changed, 39 insertions(+), 10 deletions(-)
David, I don't know if you're waiting for my ack on these, but in case
you are:
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Shradha will send a MAINTAINERS update in the next patch series.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: [PATCH] net: gre: don't pull skb if dealing with icmp message
From: Pravin Shelar @ 2014-01-21 18:27 UTC (permalink / raw)
To: Duan Jiong; +Cc: David Miller, Daniel Borkmann, netdev
In-Reply-To: <52DC7DE4.1050108@cn.fujitsu.com>
On Sun, Jan 19, 2014 at 5:37 PM, Duan Jiong <duanj.fnst@cn.fujitsu.com> wrote:
> 于 2014年01月20日 05:08, Pravin Shelar 写道:
>> On Sun, Jan 19, 2014 at 12:35 AM, Duan Jiong <duanj.fnst@cn.fujitsu.com> wrote:
>>>
>>> When dealing with icmp messages, the skb->data points the
>>> ip header that triggered the sending of the icmp message.
>>>
>>> In gre_cisco_err(), the parse_gre_header() is called, and the
>>> iptunnel_pull_header() is called to pull the skb at the end of
>>> the parse_gre_header(). Unfortunately, the ipgre_err still needs
>>> the skb->data points the ip header following the icmp layer,
>>> and those ip addresses in ip header will be used to look up
>>> tunnel by ip_tunnel_lookup().
>>>
>> This looks like bug.
>> Can you use ip_hdr() rather than skb->data in ipgre_err().
>> Same is done in ipgre_rcv().
>
> That's maybe not a good idea. The ip_hdr() will the return the outmost
> ip header, but we need inner ip header following the icmp layer.
>
Why do you want to use inner ip header?
If you want to really look at original packet header, you can
calculate it from outer iph header rather than changing fast path
parse_gre_header() function.
^ permalink raw reply
* Re: [PATCH] ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is called
From: Pravin Shelar @ 2014-01-21 18:33 UTC (permalink / raw)
To: Duan Jiong; +Cc: David Miller, Daniel Petre, Eric Dumazet, netdev
In-Reply-To: <52DE1F60.2060109@cn.fujitsu.com>
On Mon, Jan 20, 2014 at 11:18 PM, Duan Jiong <duanj.fnst@cn.fujitsu.com> wrote:
>
> commit a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach")
> clear IPCB in ip_tunnel_xmit() , or else skb->cb[] may contain garbage from
> GSO segmentation layer.
>
> But commit 0e6fbc5b6c621("ip_tunnels: extend iptunnel_xmit()") refactor codes,
> and it clear IPCB behind the dst_link_failure().
>
ok.
> So clear IPCB in ip_tunnel_xmit() just like commti a622260254ee48("ip_tunnel:
> fix kernel panic with icmp_dest_unreach").
>
> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
> ---
> net/ipv4/ip_tunnel.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index 90ff957..dda1e9a 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -535,6 +535,7 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
>
> inner_iph = (const struct iphdr *)skb_inner_network_header(skb);
>
> + memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
> dst = tnl_params->daddr;
> if (dst == 0) {
> /* NBMA tunnel */
Can you clear IPCB in error path for dst_link_failure(). So that we do
not clear it twice in fast path.
^ permalink raw reply
* Re: [PATCH RFC 4/6] net: rfkill: gpio: add device tree support
From: Arnd Bergmann @ 2014-01-21 18:50 UTC (permalink / raw)
To: Alexandre Courbot
Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
Linus Walleij, devicetree-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Heikki Krogerus, netdev, linux-wireless, linux-sunxi,
linux-kernel, Maxime Ripard, Chen-Yu Tsai, Johannes Berg,
Mika Westerberg, David S. Miller,
linux-gpio-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAAVeFuKFfAvSPYLmvWV5jjT-peZFJ8sJ2bbh4F=JAYoWLhjZpA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tuesday 21 January 2014, Alexandre Courbot wrote:
> >> As discussed earlier in this thread I'm not sure the con_id is
> >> suitable for labelling GPIOs. It'd be better to have a proper name
> >> specified in DT/ACPI instead.
> >
> > +1
>
> I wonder why you guys prefer to have the name defined in the GPIO
> mapping. Having the driver decide the label makes it easier to look up
> which GPIO does what in debugfs, whereas nothing prevents people to
> name GPIOs whatever inadequate name they want in the device DT node.
> What am I overlooking here?
I should have another look at the debugfs representation, but isn't
there a global namespace that gets used for all gpios? Neither the
con_id nor the name that the driver picks would be globally unique
and stable across kernel versions, so they don't make a good user
interface.
I think what we want here is a system-wide unique identifier for
each gpio line that may get represented in debugfs, and use a new
DT mechanism to communicate that.
Arnd
^ permalink raw reply
* Re: [Xen-devel] [PATCH v3] xen/grant-table: Avoid m2p_override during mapping
From: Zoltan Kiss @ 2014-01-21 19:43 UTC (permalink / raw)
To: Stefano Stabellini, David Vrabel
Cc: jonathan.davies, wei.liu2, ian.campbell, netdev, linux-kernel,
xen-devel
In-Reply-To: <alpine.DEB.2.02.1401211416420.21510@kaball.uk.xensource.com>
On 21/01/14 14:22, Stefano Stabellini wrote:
> On Tue, 21 Jan 2014, David Vrabel wrote:
>> On 21/01/14 12:26, Stefano Stabellini wrote:
>>> On Mon, 20 Jan 2014, Zoltan Kiss wrote:
>>>
>>>> - ret = m2p_add_override(mfn, pages[i], kmap_ops ?
>>>> - &kmap_ops[i] : NULL);
>>>> + if (m2p_override)
>>>> + ret = m2p_add_override(mfn, pages[i], kmap_ops ?
>>>> + &kmap_ops[i] : NULL);
>>>> + else {
>>>> + unsigned long pfn = page_to_pfn(pages[i]);
>>>> + WARN_ON(PagePrivate(pages[i]));
>>>> + SetPagePrivate(pages[i]);
>>>> + set_page_private(pages[i], mfn);
>>>> + pages[i]->index = pfn_to_mfn(pfn);
>>>> + if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
>>>> + return -ENOMEM;
>>>
>>> What happens if the page is PageHighMem?
>>>
>>> This looks like a subset of m2p_add_override, but it is missing some
>>> relevant bits, like the PageHighMem check, or the p2m(m2p(mfn)) == mfn
>>> check. Maybe we can find a way to avoid duplicating the code.
>>> We could split m2p_add_override in two functions or add yet another
>>> parameter to it.
>>
>> The PageHighMem() check isn't relevant as we're not mapping anything
>> here. Also, a page for a kernel grant mapping only cannot be highmem.
>>
>> The check for a local mfn and the additional set_phys_to_machine() is
>> only necessary if something tries an mfn_to_pfn() on the local mfn. We
>> can only omit adding an m2p override if we know thing will try
>> mfn_to_pfn(), therefore the check and set_phys_to_machine() is unnecessary.
>
> OK, you convinced me that the two checks are superfluous for this case.
>
> Can we still avoid the code duplication by removing the corresponding
> code from m2p_add_override and m2p_remove_override and doing the
> set_page_private thing uniquely in grant-table.c?
Yes, I moved these parts out from the m2p* funcions to the gntmap
functions. One change is that now we pass pfn/mfn to m2p* functions as
they are changing right before the call. Also, to avoid racing I clear
the page->private value before calling m2p_remove_override. I'll send in
the patch soon.
Zoli
^ permalink raw reply
* I Need Your Assistance.
From: Malik Wasim @ 2014-01-21 19:43 UTC (permalink / raw)
Dear Friend,
I am Dr.Malik Wasim, Manager Auditing and Accountancy Department, Bank of Africa in (B.O.A) Burkina Faso. I got your email account while searching for a business oriented personality and you happen to be the one.
I bring forth a business proposal in the tune of $12.5 million U.S. dollars to be transferred to an offshore account with your assistance acting as beneficiary and next-of-kin to the funds.
If you know you are capable of involving and partaking in this transaction, please send down the following personal details to me via dr.malik_wasim@yahoo.com
(1)Full names:........................ .....
(2)Occupation:................ ..............
(3)Age and Sex:..........................
(4)Marital………….
Status:................
(5)Private phone number:..................
(6)Current residential address:.................
Noted; that every discussion is in the detail.
Dr.Malik Wasim.
^ permalink raw reply
* Re: [PATCH] DT: net: document Ethernet bindings in one place
From: Florian Fainelli @ 2014-01-21 19:56 UTC (permalink / raw)
To: Rob Herring
Cc: Sergei Shtylyov, netdev, Rob Herring, Pawel Moll, Mark Rutland,
Ian Campbell, Kumar Gala, devicetree@vger.kernel.org, Rob Landley,
linux-doc@vger.kernel.org
In-Reply-To: <CAL_JsqJXbF1-PcPHR2VP+Vi9A1aWizdsG_r3kDvRt3itXDhCGQ@mail.gmail.com>
2014/1/20 Rob Herring <robherring2@gmail.com>:
> On Mon, Jan 20, 2014 at 3:45 PM, Sergei Shtylyov
> <sergei.shtylyov@cogentembedded.com> wrote:
>> On 01/20/2014 05:06 PM, Rob Herring wrote:
>>
>>>> This patch is an attempt to gather the Ethernet related bindings in one
>>>> file,
>>>> like it's done in the MMC and some other subsystems. It should save the
>>>> trouble
>>>> of documenting several properties over and over in each binding document.
>>
>>
>>>> I have used the Embedded Power Architecture(TM) Platform Requirements
>>>> (ePAPR)
>>>> standard as a base for the properties description, also documenting some
>>>> ad-hoc
>>>> properties that have been introduced over time despite having direct
>>>> analogs in
>>>> ePAPR.
>>
>>
>>>> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
>>
>>
>>>> ---
>>>> The patch is against DaveM's 'net-next.git' repo and the DaVinci EMAC
>>>> bindings
>>>> fix I've posted yesterday:
>>
>>
>>>> http://patchwork.ozlabs.org/patch/311854/
>>
>>
>> [...]
>
>>>> Index: net-next/Documentation/devicetree/bindings/net/ethernet.txt
>>>> ===================================================================
>>>> --- /dev/null
>>>> +++ net-next/Documentation/devicetree/bindings/net/ethernet.txt
>>>> @@ -0,0 +1,20 @@
>>>> +The following properties are common to the Ethernet controllers:
>>>> +
>>>> +- local-mac-address: array of 6 bytes, specifies the MAC address that
>>>> was
>>>> + assigned to the network device;
>>>> +- mac-address: array of 6 bytes, specifies the MAC address that was last
>>>> used by
>>>> + the boot program; should be used in cases where the MAC address
>>>> assigned to
>>>> + the device by the boot program is different from the
>>>> "local-mac-address"
>>>> + property;
>>>> +- max-speed: number, specifies maximum speed in Mbit/s supported by the
>>>> device;
>>>> +- phy-mode: string, operation mode of the PHY interface; supported
>>>> values are
>>>> + "mii", "gmii", "sgmii", "tbi", "rev-mii", "rmii", "rgmii", "rgmii-id",
>>>> + "rgmii-rxid", "rgmii-txid", "rtbi", "smii", "xgmii";
>>
>>
>>> Mark this as deprecated
>>
>>
>> That's kind of wishful thinking at this point, as that's what the
>> majority of drivers use. I'm unsure of the reasons why that was done,
>> probably people just didn't read the proper specs...
>
> Or the spec was defined after those bindings. Deprecating does not
> matter for existing bindings. It's only defining new ones that are
> affected. I was more concerned with giving people guidance on which
> one to use for new bindings.
>
>>> in favor of phy-connection-type
>>
>>
>> That one is only used by the oldish PowerPC 'gianfar' driver.
>>
>>
>>> so it's use does not spread.
>>
>>
>> I'm afraid that's too late, it has spread very far, so that
>> of_get_phy_mode() handles that property, not "phy-connection-type".
>
> Uggg, I guess this is a case of a defacto standard then if the kernel
> doesn't even support it.
Maybe I forgot to CC you on patch sent to Grant only, I sent a patch a
while ago for of_get_phy_mode() to look for both "phy-mode" and
"phy-connection-type" since the former has been a Linux invention, but
the latter is ePAPR specified.
>
>>>> +- phy-connection-type: the same as "phy-mode" property (but described in
>>>> ePAPR);
>>>> +- phy-handle: phandle, specifies a reference to a node representing a
>>>> PHY
>>>> + device (this property is described in ePAPR);
>>>> +- phy: the same as "phy-handle" property (but actually ad-hoc one).
>>
>>
>>> Mark this as deprecated in favor of phy-handle.
>>
>>
>> Here situation is more optimistic. Quite many drivers still use
>> "phy-handle", though some use even more exotic props I didn't document here.
>
> Perhaps flagging as "Not recommended for new bindings" would be nicer wording...
Ok, so what's the deal here, can't we use phy-handle? There is
currently no infrastructure in drivers/net/ or drivers/of/ to look for
the "phy-handle" nor "phy" properties as phandles to fetch an Ethernet
PHY device tree node. If we are to provide one common place where we
want to do this, we will have to look for both properties, why can't
we mandate the use of "phy-handle" since this is the ePAPR compliant
one?
>
> Rob
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Florian
^ permalink raw reply
* Re: [PATCH] DT: net: document Ethernet bindings in one place
From: Florian Fainelli @ 2014-01-21 20:05 UTC (permalink / raw)
To: Rob Herring
Cc: Sergei Shtylyov, netdev, Rob Herring, Pawel Moll, Mark Rutland,
Ian Campbell, Kumar Gala, devicetree@vger.kernel.org, Rob Landley,
linux-doc@vger.kernel.org, Grant Likely
In-Reply-To: <CAGVrzcb3X3soJNJCE5+YSpQrr+EdycCRFkptPvBCgFg4CbGJ4Q@mail.gmail.com>
2014/1/21 Florian Fainelli <f.fainelli@gmail.com>:
>>> I'm afraid that's too late, it has spread very far, so that
>>> of_get_phy_mode() handles that property, not "phy-connection-type".
>>
>> Uggg, I guess this is a case of a defacto standard then if the kernel
>> doesn't even support it.
>
> Maybe I forgot to CC you on patch sent to Grant only, I sent a patch a
> while ago for of_get_phy_mode() to look for both "phy-mode" and
> "phy-connection-type" since the former has been a Linux invention, but
> the latter is ePAPR specified.
Here is a link to the actual patch in question, not sure which tree
Grant applied it to though:
http://lkml.indiana.edu/hypermail/linux/kernel/1311.2/00048.html
>
>>
>>>>> +- phy-connection-type: the same as "phy-mode" property (but described in
>>>>> ePAPR);
>>>>> +- phy-handle: phandle, specifies a reference to a node representing a
>>>>> PHY
>>>>> + device (this property is described in ePAPR);
>>>>> +- phy: the same as "phy-handle" property (but actually ad-hoc one).
>>>
>>>
>>>> Mark this as deprecated in favor of phy-handle.
>>>
>>>
>>> Here situation is more optimistic. Quite many drivers still use
>>> "phy-handle", though some use even more exotic props I didn't document here.
>>
>> Perhaps flagging as "Not recommended for new bindings" would be nicer wording...
>
> Ok, so what's the deal here, can't we use phy-handle? There is
> currently no infrastructure in drivers/net/ or drivers/of/ to look for
> the "phy-handle" nor "phy" properties as phandles to fetch an Ethernet
> PHY device tree node. If we are to provide one common place where we
> want to do this, we will have to look for both properties, why can't
> we mandate the use of "phy-handle" since this is the ePAPR compliant
> one?
>
>>
>> Rob
>> --
>> To unsubscribe from this list: send the line "unsubscribe devicetree" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Florian
--
Florian
^ permalink raw reply
* Re: [PATCH] DT: net: document Ethernet bindings in one place
From: Sergei Shtylyov @ 2014-01-21 21:19 UTC (permalink / raw)
To: Florian Fainelli, Rob Herring
Cc: netdev, Rob Herring, Pawel Moll, Mark Rutland, Ian Campbell,
Kumar Gala, devicetree@vger.kernel.org, Rob Landley,
linux-doc@vger.kernel.org, Grant Likely
In-Reply-To: <CAGVrzcb0dfw1orZzUt5-YkShOg-HNbUYvvo2vfmsZUZXy1Aqfg@mail.gmail.com>
Hello.
On 01/21/2014 11:05 PM, Florian Fainelli wrote:
>>>> I'm afraid that's too late, it has spread very far, so that
>>>> of_get_phy_mode() handles that property, not "phy-connection-type".
>>> Uggg, I guess this is a case of a defacto standard then if the kernel
>>> doesn't even support it.
>> Maybe I forgot to CC you on patch sent to Grant only, I sent a patch a
>> while ago for of_get_phy_mode() to look for both "phy-mode" and
>> "phy-connection-type" since the former has been a Linux invention, but
>> the latter is ePAPR specified.
> Here is a link to the actual patch in question, not sure which tree
> Grant applied it to though:
> http://lkml.indiana.edu/hypermail/linux/kernel/1311.2/00048.html
It's not the patch mail, it's Grant's "applied" reply, patch is mangled in
this reply, and I couldn't follow the thread. Here's the actual patch mail:
http://marc.info/?l=devicetree&m=138449662807254
WBR, Sergei
^ permalink raw reply
* [PATCH v4] xen/grant-table: Avoid m2p_override during mapping
From: Zoltan Kiss @ 2014-01-21 20:22 UTC (permalink / raw)
To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel,
jonathan.davies
Cc: Zoltan Kiss
The grant mapping API does m2p_override unnecessarily: only gntdev needs it,
for blkback and future netback patches it just cause a lock contention, as
those pages never go to userspace. Therefore this series does the following:
- the original functions were renamed to __gnttab_[un]map_refs, with a new
parameter m2p_override
- based on m2p_override either they follow the original behaviour, or just set
the private flag and call set_phys_to_machine
- gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with
m2p_override false
- a new function gnttab_[un]map_refs_userspace provides the old behaviour
v2:
- move the storing of the old mfn in page->index to gnttab_map_refs
- move the function header update to a separate patch
v3:
- a new approach to retain old behaviour where it needed
- squash the patches into one
v4:
- move out the common bits from m2p* functions, and pass pfn/mfn as parameter
- clear page->private before doing anything with the page, so m2p_find_override
won't race with this
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Suggested-by: David Vrabel <david.vrabel@citrix.com>
---
arch/x86/include/asm/xen/page.h | 12 +++--
arch/x86/xen/p2m.c | 25 ++--------
drivers/block/xen-blkback/blkback.c | 15 +++---
drivers/xen/gntdev.c | 13 +++--
drivers/xen/grant-table.c | 90 +++++++++++++++++++++++++++++------
include/xen/grant_table.h | 8 +++-
6 files changed, 107 insertions(+), 56 deletions(-)
diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index b913915..68a1438 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -49,10 +49,14 @@ extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
extern unsigned long set_phys_range_identity(unsigned long pfn_s,
unsigned long pfn_e);
-extern int m2p_add_override(unsigned long mfn, struct page *page,
- struct gnttab_map_grant_ref *kmap_op);
+extern int m2p_add_override(unsigned long mfn,
+ struct page *page,
+ struct gnttab_map_grant_ref *kmap_op,
+ unsigned long pfn);
extern int m2p_remove_override(struct page *page,
- struct gnttab_map_grant_ref *kmap_op);
+ struct gnttab_map_grant_ref *kmap_op,
+ unsigned long pfn,
+ unsigned long mfn);
extern struct page *m2p_find_override(unsigned long mfn);
extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn);
@@ -121,7 +125,7 @@ static inline unsigned long mfn_to_pfn(unsigned long mfn)
pfn = m2p_find_override_pfn(mfn, ~0);
}
- /*
+ /*
* pfn is ~0 if there are no entries in the m2p for mfn or if the
* entry doesn't map back to the mfn and m2p_override doesn't have a
* valid entry for it.
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 2ae8699..0060178 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -872,15 +872,13 @@ static unsigned long mfn_hash(unsigned long mfn)
/* Add an MFN override for a particular page */
int m2p_add_override(unsigned long mfn, struct page *page,
- struct gnttab_map_grant_ref *kmap_op)
+ struct gnttab_map_grant_ref *kmap_op, unsigned long pfn)
{
unsigned long flags;
- unsigned long pfn;
unsigned long uninitialized_var(address);
unsigned level;
pte_t *ptep = NULL;
- pfn = page_to_pfn(page);
if (!PageHighMem(page)) {
address = (unsigned long)__va(pfn << PAGE_SHIFT);
ptep = lookup_address(address, &level);
@@ -888,13 +886,6 @@ int m2p_add_override(unsigned long mfn, struct page *page,
"m2p_add_override: pfn %lx not mapped", pfn))
return -EINVAL;
}
- WARN_ON(PagePrivate(page));
- SetPagePrivate(page);
- set_page_private(page, mfn);
- page->index = pfn_to_mfn(pfn);
-
- if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
- return -ENOMEM;
if (kmap_op != NULL) {
if (!PageHighMem(page)) {
@@ -933,20 +924,15 @@ int m2p_add_override(unsigned long mfn, struct page *page,
}
EXPORT_SYMBOL_GPL(m2p_add_override);
int m2p_remove_override(struct page *page,
- struct gnttab_map_grant_ref *kmap_op)
+ struct gnttab_map_grant_ref *kmap_op,
+ unsigned long pfn,
+ unsigned long mfn)
{
unsigned long flags;
- unsigned long mfn;
- unsigned long pfn;
unsigned long uninitialized_var(address);
unsigned level;
pte_t *ptep = NULL;
- pfn = page_to_pfn(page);
- mfn = get_phys_to_machine(pfn);
- if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT))
- return -EINVAL;
-
if (!PageHighMem(page)) {
address = (unsigned long)__va(pfn << PAGE_SHIFT);
ptep = lookup_address(address, &level);
@@ -959,10 +945,7 @@ int m2p_remove_override(struct page *page,
spin_lock_irqsave(&m2p_override_lock, flags);
list_del(&page->lru);
spin_unlock_irqrestore(&m2p_override_lock, flags);
- WARN_ON(!PagePrivate(page));
- ClearPagePrivate(page);
- set_phys_to_machine(pfn, page->index);
if (kmap_op != NULL) {
if (!PageHighMem(page)) {
struct multicall_space mcs;
diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 6620b73..875025f 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -285,8 +285,7 @@ static void free_persistent_gnts(struct xen_blkif *blkif, struct rb_root *root,
if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST ||
!rb_next(&persistent_gnt->node)) {
- ret = gnttab_unmap_refs(unmap, NULL, pages,
- segs_to_unmap);
+ ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
BUG_ON(ret);
put_free_pages(blkif, pages, segs_to_unmap);
segs_to_unmap = 0;
@@ -321,8 +320,7 @@ static void unmap_purged_grants(struct work_struct *work)
pages[segs_to_unmap] = persistent_gnt->page;
if (++segs_to_unmap == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
- ret = gnttab_unmap_refs(unmap, NULL, pages,
- segs_to_unmap);
+ ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
BUG_ON(ret);
put_free_pages(blkif, pages, segs_to_unmap);
segs_to_unmap = 0;
@@ -330,7 +328,7 @@ static void unmap_purged_grants(struct work_struct *work)
kfree(persistent_gnt);
}
if (segs_to_unmap > 0) {
- ret = gnttab_unmap_refs(unmap, NULL, pages, segs_to_unmap);
+ ret = gnttab_unmap_refs(unmap, pages, segs_to_unmap);
BUG_ON(ret);
put_free_pages(blkif, pages, segs_to_unmap);
}
@@ -670,15 +668,14 @@ static void xen_blkbk_unmap(struct xen_blkif *blkif,
GNTMAP_host_map, pages[i]->handle);
pages[i]->handle = BLKBACK_INVALID_HANDLE;
if (++invcount == BLKIF_MAX_SEGMENTS_PER_REQUEST) {
- ret = gnttab_unmap_refs(unmap, NULL, unmap_pages,
- invcount);
+ ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
BUG_ON(ret);
put_free_pages(blkif, unmap_pages, invcount);
invcount = 0;
}
}
if (invcount) {
- ret = gnttab_unmap_refs(unmap, NULL, unmap_pages, invcount);
+ ret = gnttab_unmap_refs(unmap, unmap_pages, invcount);
BUG_ON(ret);
put_free_pages(blkif, unmap_pages, invcount);
}
@@ -740,7 +737,7 @@ again:
}
if (segs_to_map) {
- ret = gnttab_map_refs(map, NULL, pages_to_gnt, segs_to_map);
+ ret = gnttab_map_refs(map, pages_to_gnt, segs_to_map);
BUG_ON(ret);
}
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index e41c79c..e652c0e 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -284,8 +284,10 @@ static int map_grant_pages(struct grant_map *map)
}
pr_debug("map %d+%d\n", map->index, map->count);
- err = gnttab_map_refs(map->map_ops, use_ptemod ? map->kmap_ops : NULL,
- map->pages, map->count);
+ err = gnttab_map_refs_userspace(map->map_ops,
+ use_ptemod ? map->kmap_ops : NULL,
+ map->pages,
+ map->count);
if (err)
return err;
@@ -315,9 +317,10 @@ static int __unmap_grant_pages(struct grant_map *map, int offset, int pages)
}
}
- err = gnttab_unmap_refs(map->unmap_ops + offset,
- use_ptemod ? map->kmap_ops + offset : NULL, map->pages + offset,
- pages);
+ err = gnttab_unmap_refs_userspace(map->unmap_ops + offset,
+ use_ptemod ? map->kmap_ops + offset : NULL,
+ map->pages + offset,
+ pages);
if (err)
return err;
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index aa846a4..1f97fa0 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -880,15 +880,17 @@ void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count)
}
EXPORT_SYMBOL_GPL(gnttab_batch_copy);
-int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
+int __gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
struct gnttab_map_grant_ref *kmap_ops,
- struct page **pages, unsigned int count)
+ struct page **pages, unsigned int count,
+ bool m2p_override)
{
int i, ret;
bool lazy = false;
pte_t *pte;
- unsigned long mfn;
+ unsigned long mfn, pfn;
+ BUG_ON(kmap_ops && !m2p_override);
ret = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, map_ops, count);
if (ret)
return ret;
@@ -907,10 +909,12 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
set_phys_to_machine(map_ops[i].host_addr >> PAGE_SHIFT,
map_ops[i].dev_bus_addr >> PAGE_SHIFT);
}
- return ret;
+ return 0;
}
- if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
+ if (m2p_override &&
+ !in_interrupt() &&
+ paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
arch_enter_lazy_mmu_mode();
lazy = true;
}
@@ -927,8 +931,18 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
} else {
mfn = PFN_DOWN(map_ops[i].dev_bus_addr);
}
- ret = m2p_add_override(mfn, pages[i], kmap_ops ?
- &kmap_ops[i] : NULL);
+ pfn = page_to_pfn(pages[i]);
+
+ WARN_ON(PagePrivate(pages[i]));
+ SetPagePrivate(pages[i]);
+ set_page_private(pages[i], mfn);
+
+ pages[i]->index = pfn_to_mfn(pfn);
+ if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn))))
+ return -ENOMEM;
+ if (m2p_override)
+ ret = m2p_add_override(mfn, pages[i], kmap_ops ?
+ &kmap_ops[i] : NULL, pfn);
if (ret)
goto out;
}
@@ -937,17 +951,34 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
if (lazy)
arch_leave_lazy_mmu_mode();
- return ret;
+ return 0;
+}
+
+int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
+ struct page **pages, unsigned int count)
+{
+ return __gnttab_map_refs(map_ops, NULL, pages, count, false);
}
EXPORT_SYMBOL_GPL(gnttab_map_refs);
-int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
+int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
+ struct gnttab_map_grant_ref *kmap_ops,
+ struct page **pages, unsigned int count)
+{
+ return __gnttab_map_refs(map_ops, kmap_ops, pages, count, true);
+}
+EXPORT_SYMBOL_GPL(gnttab_map_refs_userspace);
+
+int __gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
struct gnttab_map_grant_ref *kmap_ops,
- struct page **pages, unsigned int count)
+ struct page **pages, unsigned int count,
+ bool m2p_override)
{
int i, ret;
bool lazy = false;
+ unsigned long pfn, mfn;
+ BUG_ON(kmap_ops && !m2p_override);
ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, unmap_ops, count);
if (ret)
return ret;
@@ -958,17 +989,32 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
set_phys_to_machine(unmap_ops[i].host_addr >> PAGE_SHIFT,
INVALID_P2M_ENTRY);
}
- return ret;
+ return 0;
}
- if (!in_interrupt() && paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
+ if (m2p_override &&
+ !in_interrupt() &&
+ paravirt_get_lazy_mode() == PARAVIRT_LAZY_NONE) {
arch_enter_lazy_mmu_mode();
lazy = true;
}
for (i = 0; i < count; i++) {
- ret = m2p_remove_override(pages[i], kmap_ops ?
- &kmap_ops[i] : NULL);
+ pfn = page_to_pfn(pages[i]);
+ mfn = get_phys_to_machine(pfn);
+ if (mfn == INVALID_P2M_ENTRY || !(mfn & FOREIGN_FRAME_BIT))
+ return -EINVAL;
+
+ set_page_private(pages[i], INVALID_P2M_ENTRY);
+ WARN_ON(!PagePrivate(pages[i]));
+ ClearPagePrivate(pages[i]);
+ set_phys_to_machine(pfn, pages[i]->index);
+ if (m2p_override)
+ ret = m2p_remove_override(pages[i],
+ kmap_ops ?
+ &kmap_ops[i] : NULL,
+ pfn,
+ mfn);
if (ret)
goto out;
}
@@ -977,10 +1023,24 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
if (lazy)
arch_leave_lazy_mmu_mode();
- return ret;
+ return 0;
+}
+
+int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *map_ops,
+ struct page **pages, unsigned int count)
+{
+ return __gnttab_unmap_refs(map_ops, NULL, pages, count, false);
}
EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
+int gnttab_unmap_refs_userspace(struct gnttab_unmap_grant_ref *map_ops,
+ struct gnttab_map_grant_ref *kmap_ops,
+ struct page **pages, unsigned int count)
+{
+ return __gnttab_unmap_refs(map_ops, kmap_ops, pages, count, true);
+}
+EXPORT_SYMBOL_GPL(gnttab_unmap_refs_userspace);
+
static unsigned nr_status_frames(unsigned nr_grant_frames)
{
BUG_ON(grefs_per_grant_frame == 0);
diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
index 694dcaf..9a919b1 100644
--- a/include/xen/grant_table.h
+++ b/include/xen/grant_table.h
@@ -184,11 +184,15 @@ unsigned int gnttab_max_grant_frames(void);
#define gnttab_map_vaddr(map) ((void *)(map.host_virt_addr))
int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
- struct gnttab_map_grant_ref *kmap_ops,
struct page **pages, unsigned int count);
+int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
+ struct gnttab_map_grant_ref *kmap_ops,
+ struct page **pages, unsigned int count);
int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
- struct gnttab_map_grant_ref *kunmap_ops,
struct page **pages, unsigned int count);
+int gnttab_unmap_refs_userspace(struct gnttab_unmap_grant_ref *unmap_ops,
+ struct gnttab_map_grant_ref *kunmap_ops,
+ struct page **pages, unsigned int count);
/* Perform a batch of grant map/copy operations. Retry every batch slot
* for which the hypervisor returns GNTST_eagain. This is typically due
^ permalink raw reply related
* Kernel fails to drop packet with martian source address (subnet broadcast)
From: gregory hoggarth @ 2014-01-21 20:45 UTC (permalink / raw)
To: netdev; +Cc: davem
Hi,
This is my second mail to the list about this issue, since I didn't get much of
a response for my first one.
Earlier in the year we upgraded the linux kernel on our L3 switch firmware to
v3.6.2 which includes the "ipv4: Elidge fib_validate_source() completely when
possible" patch. Our testers have picked up a change in behaviour which I have
traced back to this patch and I'd like to get comments on whether this is a bug
or not. Here's the patch:
https://github.com/torvalds/linux/commit/7a9bc9b81a5bc6e44ebc80ef781332e4385083f2
We found that in a configuration with a device under test (DUT) that has an IP
address 192.168.0.1 / 24, when it is sent an ICMP echo request with a source IP
address of 192.168.0.255 will now be accepted by the DUT, which will send an
ICMP echo reply back to the 192.168.0.255 address with a broadcast MAC address
of FFFF.FFFF.FFFF, meaning any other devices in the L2 domain will pick up and
process the ICMP reply.
Prior to the upgrade, our device was running on linux kernel 2.6.38.2, and in
that version the DUT would reject the packet with a martian source address
error:
11:20:54 awplus kernel: RxPkt(00001) Q:1 dev:8 lport:00000a00 cpuC:154 encap:0
len:98 bufs:1
11:20:54 awplus kernel: RxPkt(00001) vid:1 port2.0.23
11:20:54 awplus kernel: <7> 0000cd29 980d0000 cd27c1eb 08004500
11:20:54 awplus kernel: <7> 00540000 40004001 b858c0a8 00ffc0a8
11:20:54 awplus kernel: <7> 00010800 a4cf588c 000552dd c9990000
11:20:54 awplus kernel: <7> f3240809 0a0b0c0d 0e0f1011 12131415
11:20:54 awplus kernel: <7> (...)
11:20:54 awplus kernel: martian source 192.168.0.1 from 192.168.0.255, on dev
vlan1
11:20:54 awplus kernel: ll header:
00:00:cd:29:98:0d:00:00:cd:27:c1:eb:08:00:45:00
Initially we were thinking that a device configured with 192.168.0.255 / 23
sending a ping to 192.168.0.1 / 24 should expect to get a reply, even though
this is somewhat of a misconfiguration in that the subnets aren't matching and
that some things may not work properly. For example ping from a device with IP
address 192.168.0.254 would behave correctly, so it seemed correct that a ping
from 192.168.0.255 should also work properly. However it appears that some
other part of the kernel is detecting that the echo reply packet sent by the
DUT has a destination IP address of the subnet broadcast, so instead of sending
it with a unicast MAC address it sends it with the broadcast MAC address, which
means all other devices on the subnet would process the ICMP echo reply. This
therefore makes it seem like the ICMP echo request should have been dropped in
the first place.
The patch added the "if (!r && !fib_num_tclassid_users)" check to
fib_validate_source, which for our devices evaluates to true, hence returning
early and the source address check inside __fib_validate_source is not carried
out.
Does anyone have comments as to what the correct behaviour should be in this
case, and whether this is an unexpected side-effect from the patch or not?
Thanks,
Greg
^ permalink raw reply
* Re: [PATCH] ixgbevf: delete unneeded call to pci_set_power_state
From: Brown, Aaron F @ 2014-01-21 20:47 UTC (permalink / raw)
To: Julia.Lawall@lip6.fr
Cc: Kirsher, Jeffrey T, Brandeburg, Jesse, Allan, Bruce W,
Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
Duyck, Alexander H, Ronciak, John,
kernel-janitors@vger.kernel.org,
e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <1389114026-27854-1-git-send-email-Julia.Lawall@lip6.fr>
On Tue, 2014-01-07 at 18:00 +0100, Julia Lawall wrote:
> From: Julia Lawall <Julia.Lawall@lip6.fr>
>
> This driver does not need to adjust the power state on suspend, so the
> call to pci_set_power_state in the resume function is a no-op. Drop it, to
> make the code more understandable.
>
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Aaron Brown <aaron.f.brown@intel.com>
> ---
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index a5d3167..5709fb0 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -3226,7 +3226,6 @@ static int ixgbevf_resume(struct pci_dev *pdev)
> struct net_device *netdev = adapter->netdev;
> u32 err;
>
> - pci_set_power_state(pdev, PCI_D0);
> pci_restore_state(pdev);
> /*
> * pci_restore_state clears dev->state_saved so call
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: issues with vxlan RX checksum offload under OVS datapath
From: Or Gerlitz @ 2014-01-21 20:55 UTC (permalink / raw)
To: Pravin Shelar; +Cc: Or Gerlitz, Joseph Gasparakis, netdev
In-Reply-To: <CALnjE+rUTejbU_d39G7WSn=k0aY2xELCE3UxjDr_Zdq3iwHbNA@mail.gmail.com>
On Tue, Jan 21, 2014 at 7:30 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Sun, Jan 19, 2014 at 2:05 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>> While testing the gro udp patches over a setup with openvswitch I noted that
>> the RX checksum offload support introduced by Joseph's commit 0afb166
>> "vxlan: Add capability of Rx checksum offload for inner packet" works fine
>> when you use a setup made of
>> NIC --> IP stack --> vxlan device --> bridge --> tap
>> but not when its
>> NIC --> IP stack --> ovs vxlan port --> OVS DP --> tap
>> I narrowed it down to the fact the when going the OVS pathskb->encapsulation
>> remains true also after the decap is done. Basically, this is the original hunk
[...]
>>> + skb->encapsulation = 0;
[...]
>> Moving this to shared code (while removing the check for
>> vxlan->dev->features) made things to work on my setup, but this misses one
>> of the original conditions, ideas?
> I kept csum check in vxlan-device recv path for same reason. As of now
> there is no efficient way to get ovs-dev features.
> May be we can cache device features in struct datapath from datapath-netdev.
To be a bit more precise/concrete here, do we agree that the both paths must do
skb->encapsulation = 0;
which is done now only by the non-ovs path
Or.
^ permalink raw reply
* Re: Fwd: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
From: Stefan Hajnoczi @ 2014-01-20 16:49 UTC (permalink / raw)
To: Jason Wang
Cc: Zhi Yong Wu, Linux Netdev List, Tom Herbert, Eric Dumazet,
David S. Miller, Zhi Yong Wu, Michael S. Tsirkin, Rusty Russell
In-Reply-To: <52D89DA0.1020804@redhat.com>
On Fri, Jan 17, 2014 at 11:04:00AM +0800, Jason Wang wrote:
> On 01/16/2014 04:52 PM, Stefan Hajnoczi wrote:
> > On Thu, Jan 16, 2014 at 04:34:10PM +0800, Zhi Yong Wu wrote:
> >> CC: stefanha, MST, Rusty Russel
> >>
> >> ---------- Forwarded message ----------
> >> From: Jason Wang <jasowang@redhat.com>
> >> Date: Thu, Jan 16, 2014 at 12:23 PM
> >> Subject: Re: [RFC PATCH net-next 0/3] virtio_net: add aRFS support
> >> To: Zhi Yong Wu <zwu.kernel@gmail.com>
> >> Cc: netdev@vger.kernel.org, therbert@google.com, edumazet@google.com,
> >> davem@davemloft.net, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
> >>
> >>
> >> On 01/15/2014 10:20 PM, Zhi Yong Wu wrote:
> >>> From: Zhi Yong Wu<wuzhy@linux.vnet.ibm.com>
> >>>
> >>> HI, folks
> >>>
> >>> The patchset is trying to integrate aRFS support to virtio_net. In this case,
> >>> aRFS will be used to select the RX queue. To make sure that it's going ahead
> >>> in the correct direction, although it is still one RFC and isn't tested, it's
> >>> post out ASAP. Any comment are appreciated, thanks.
> >>>
> >>> If anyone is interested in playing with it, you can get this patchset from my
> >>> dev git on github:
> >>> git://github.com/wuzhy/kernel.git virtnet_rfs
> >>>
> >>> Zhi Yong Wu (3):
> >>> virtio_pci: Introduce one new config api vp_get_vq_irq()
> >>> virtio_net: Introduce one dummy function virtnet_filter_rfs()
> >>> virtio-net: Add accelerated RFS support
> >>>
> >>> drivers/net/virtio_net.c | 67 ++++++++++++++++++++++++++++++++++++++++-
> >>> drivers/virtio/virtio_pci.c | 11 +++++++
> >>> include/linux/virtio_config.h | 12 +++++++
> >>> 3 files changed, 89 insertions(+), 1 deletions(-)
> >>>
> >> Please run get_maintainter.pl before sending the patch. You'd better
> >> at least cc virtio maintainer/list for this.
> >>
> >> The core aRFS method is a noop in this RFC which make this series no
> >> much sense to discuss. You should at least mention the big picture
> >> here in the cover letter. I suggest you should post a RFC which can
> >> run and has expected result or you can just raise a thread for the
> >> design discussion.
> >>
> >> And this method has been discussed before, you can search "[net-next
> >> RFC PATCH 5/5] virtio-net: flow director support" in netdev archive
> >> for a very old prototype implemented by me. It can work and looks like
> >> most of this RFC have already done there.
> >>
> >> A basic question is whether or not we need this, not all the mq cards
> >> use aRFS (see ixgbe ATR). And whether or not it can bring extra
> >> overheads? For virtio, we want to reduce the vmexits as much as
> >> possible but this aRFS seems introduce a lot of more of this. Making a
> >> complex interfaces just for an virtual device may not be good, simple
> >> method may works for most of the cases.
> >>
> >> We really should consider to offload this to real nic. VMDq and L2
> >> forwarding offload may help in this case.
> > Zhi Yong and I had an IRC chat. I wanted to post my questions on the
> > list - it's still the same concern I had in the old email thread that
> > Jason mentioned.
> >
> > In order for virtio-net aRFS to make sense there needs to be an overall
> > plan for pushing flow mapping information down to the physical NIC.
> > That's the only way to actually achieve the benefit of steering:
> > processing the packet on the CPU where the application is running.
>
> There are several places that the packet needs to be processing:
> 1) If you mean send the interrupt of virtio-net to the vcpu that the
> application is running, current virito-net has already had the ability
> with the help of flow caches of tun and the vcpu private queue
> configuration of XPS and irq affinity hint which is automatically done
> through the driver itself.
> 2) If you mean send the interrupt or doing the softirq of the host card
> to the cpu where vhost thread is running, recent RFS support of tun can
> do this.
> 3) And about the affinity of vhost thread and vcpu thread, this seems
> beyond the scope of aRFS.
I guess you're right, the vhost thread affinity is a factor if we're
trying to take the interrupt on the host CPU that will eventually
execute the vcpu. It's beyond the scope of this effort but something
that would be nice to eventually solve.
Stefan
^ permalink raw reply
* [PATCH RFC 00/73] tree-wide: clean up some no longer required #include <linux/init.h>
From: Paul Gortmaker @ 2014-01-21 21:22 UTC (permalink / raw)
To: linux-kernel
Cc: linux-arch, Paul Gortmaker, linux-alpha, linux-arm-kernel,
linux-ia64, linux-m68k, linux-mips, linuxppc-dev, linux-s390,
sparclinux, x86, netdev, kvm, sfr, rusty, gregkh, akpm, torvalds
TL;DR - We removed cpuinit and devinit, which left ~2000 instances of
include <linux/init.h> that were no longer needed. To fully enable
this removal/cleanup, we relocate module_init() from init.h into
module.h. Multi arch/multi config build testing on linux-next has
been used to find and fix any implicit header dependencies prior to
deploying the actual init.h --> module.h move, to preserve bisection.
Additional details beyond TL;DR:
module_init/module_exit and friends moved to module.h
=====================================================
Aside from enabling this init.h cleanup to extend into modular files,
it actually does make sense. For all modules will use some form of
our initfunc processing/categorization, but not all initfunc users
will be necessarily using modular functionality. So we move these
module related macros to module.h and ensure module.h sources init.h
module_init in non modular code:
================================
This series uncovered that we are enabling people to use module_init
in non-modular code. While that works fine, there are at least three
reasons why it probably should not be encouraged:
1) it makes a casual reader of the code assume the code is modular
even though it is obj-y (builtin) or controlled by a bool Kconfig.
2) it makes it too easy to add dead code in a function that is handed
to module_exit() -- [more on that below]
3) it breaks our ability to use priority sorted initcalls properly
[more on that below.]
After this change, a new coder who tries to make use of module_init in
non modular code would find themselves also needing to include the
module.h header. At which point the odds are greater that they would
ask themselves "Am I doing this right? I shouldn't need this."
Note that existing non-modular code that already includes module.h and
uses module_init doesn't get fixed here, since they already build w/o
errors triggered by this change; we'll have to hunt them down later.
module_init and initcall ordering:
==================================
We have a group of about ten priority sorted initcalls, that are
called in init/main.c after most of the hard coded/direct calls
have been processed. These serve the purpose of avoiding everyone
poking at init/main.c to hook in their init sequence. The bins are:
pure_initcall 0
core_initcall 1
postcore_initcall 2
arch_initcall 3
subsys_initcall 4
fs_initcall 5
device_initcall 6
late_initcall 7
These are meant to eventually replace users of the non specific
priority "__initcall" which currently maps onto device_initcall.
This is of interest, because in non-modular code, cpp does this:
module_init --> __initcall --> device_initcall
So all module_init() land in the device_initcall bucket, rather late
in the sequence. That makes sense, since if it was a module, the init
could be real late (days, weeks after boot). But now imagine you add
support for some non-modular bus/arch/infrastructure (say for e.g. PCI)
and you use module_init for it. That means anybody else who wants
to use your subsystem can't do so if they use an initcall of 0 --> 5
priority. For a real world example of this, see patch #1 in this series:
https://lkml.org/lkml/2014/1/14/809
We don't want to force code that is clearly arch or subsys or fs
specific to have to use the device_initcall just because something
else has been mistakenly put (or left) in that bucket. So a couple of
changes do actually change the initcall level where it is inevitably
appropriate to do so. Those are called out explicitly in their
respective commit logs.
module_exit and dead code
=========================
Built in code will never have an opportunity to call functions that
are registered with module_exit(), so any cases of that uncovered in
this series delete that dead code. Note that any built-in code that
was already including module.h and using module_exit won't have shown
up as breakage on the build coverage of this series, so we'll have to
find those independently later. It looks like there may be quite a
few that are invisibly created via module_platform_driver -- a macro
that creates module_init and module_exit automatically. We may want
to consider relocating module_platform_driver into module.h later...
cpuinit
=======
To finalize the removal of cpuinit, which was done several releases
ago, we remove the remaining stub functions from init.h in this
series. We've seen one or two "users" try to creep back in, so this
will close the door on that chapter and prevent creep.
When, what and where?
=====================
When: Ideally, barring any objections or massive oversights on my
part, this will go in at or around rc1, i.e. in about 2wks. In the
meantime I will continue daily re-test on linux-next across ~10 different
arch, using allyesconfig, allmodconfig and arch specific defconfigs
for things like mips/arm/powerpc; as I have been doing for a while.
Where: This work exists as a queue of patches that I apply to
linux-next; since the changes are fixing some things that currently
can only be found there. The patch series can be found at:
http://git.kernel.org/cgit/linux/kernel/git/paulg/init.git
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/init.git
The patches are not in strict chronological order, since when I've
found a header change causes a build regression that is due to an
implicit dependency/inclusion, I place the dependency fix before the
header change that caused it, so that bisection is preserved.
I've avoided annoying Stephen with another queue of patches for
linux-next while the development content was in flux, but now that
the merge window has opened, and new additions are fewer, perhaps he
wouldn't mind tacking it on the end... Stephen?
In order to reduce the size of the overall queue here, I have already
put some dependency-free changes through maintainer trees after
re-testing them on whatever their development baseline was. That made
sense for the larger ones (drivers/[net,usb,input] some arch trees...)
and for the kernel/ mm/ and fs/ ones where the changes were less
trivial and an earlier review was desired. But that independent treatment
doesn't scale for handling all the commits -- hence ~1400 of the
full ~2k of init.h removals remain here in this series.
What: The audit for removal of extra init.h lines has covered
drivers/, all of the main architectures (and some of the more fringe
ones), and core dirs like mm/ fs/ and kernel/ too. The removals from
include/ itself are probably the most valuable, in terms of reducing
the amount of stuff we needlessly feed CPP. There is probably more
fringe ones to be found, but this covers the majority of them.
Additional ones can be fed in later (through the trivial tree perhaps)
as desired.
Build coverage (from memory) has included, but is not limited to:
allyesconfig, allmodconfig:
x86, x86_64, ia64, s390, arm, mips, sparc, powerpc
arch specifc arch/<name>/config/*config files:
arm, mips, powerpc
defconfig:
(all of the above), c6x, parisc, uml, tile, c6x, blackfin, ...
and it will continue to take place for the next ~2wks, until I can
reliably apply the queue to master and submit a pull request.
Thanks for reading this far, and thanks to those who have merged init.h
cleanup commits already! Additional comments, reviews and acks welcomed.
Paul.
---
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-mips@linux-mips.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: x86@kernel.org
Cc: netdev@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: sfr@canb.auug.org.au
Cc: rusty@rustcorp.com.au
Cc: gregkh@linuxfoundation.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Paul Gortmaker (73):
init: delete the __cpuinit related stubs
mm: replace module_init usages with subsys_initcall in nommu.c
fs/notify: don't use module_init for non-modular inotify_user code
netfilter: don't use module_init/exit in core IPV4 code
x86: don't use module_init in non-modular intel_mid_vrtc.c
x86: don't use module_init for non-modular core bootflag code
x86: replace __init_or_module with __init in non-modular vsmp_64.c
drivers/tty/hvc: don't use module_init in non-modular hyp. console code
staging: don't use module_init in non-modular ion_dummy_driver.c
powerpc: use device_initcall for registering rtc devices
powerpc: book3s KVM can be modular so it should use module.h
powerpc: kvm e500/44x is not modular, so don't use module_init
powerpc: use subsys_initcall for Freescale Local Bus
powerpc: don't use module_init for non-modular core hugetlb code
powerpc: don't use module_init in non-modular 83xx suspend code
arm: include module.h in drivers/bus/omap_l3_smx.c
arm: fix implicit module.h use in mach-at91 gpio.h
arm: fix implicit #include <linux/init.h> in entry asm.
arm: mach-s3c64xx mach-crag6410-module.c is not modular
arm: use subsys_initcall in non-modular pl320 IPC code
arm: don't use module_init in non-modular mach-vexpress/spc.c code
alpha: don't use module_init for non-modular core code
sparc: don't use module_init in non-modular pci.c code
m68k: don't use module_init in non-modular mvme16x/rtc.c code
ia64: don't use module_init for non-modular core kernel/mca.c code
ia64: don't use module_init in non-modular sim/simscsi.c code
drivers/clk: don't use module_init in clk-nomadik.c which is non-modular
cpuidle: don't use modular platform register in non-modular ARM drivers
drivers/platform: don't use modular register in non-modular pdev_bus.c
drivers/i2c: busses/i2c-acorn.c is tristate and should use module.h
module: relocate module_init from init.h to module.h
logo: emit "#include <linux/init.h> in autogenerated C file
arm: delete non-required instances of include <linux/init.h>
mips: delete non-required instances of include <linux/init.h>
sparc: delete non-required instances of include <linux/init.h>
s390: delete non-required instances of include <linux/init.h>
alpha: delete non-required instances of <linux/init.h>
blackfin: delete non-required instances of <linux/init.h>
powerpc: delete another unrequired instance of <linux/init.h>
watchdog: delete non-required instances of include <linux/init.h>
video: delete non-required instances of include <linux/init.h>
rtc: delete non-required instances of include <linux/init.h>
scsi: delete non-required instances of include <linux/init.h>
spi: delete non-required instances of include <linux/init.h>
acpi: delete non-required instances of include <linux/init.h>
drivers/power: delete non-required instances of include <linux/init.h>
drivers/media: delete non-required instances of include <linux/init.h>
drivers/ata: delete non-required instances of include <linux/init.h>
drivers/mtd: delete non-required instances of include <linux/init.h>
drivers/hwmon: delete non-required instances of include <linux/init.h>
drivers/i2c: delete non-required instances of include <linux/init.h>
drivers/pinctrl: delete non-required instances of include <linux/init.h>
drivers/isdn: delete non-required instances of include <linux/init.h>
drivers/leds: delete non-required instances of include <linux/init.h>
drivers/pcmcia: delete non-required instances of include <linux/init.h>
drivers/char: delete non-required instances of include <linux/init.h>
drivers/infiniband: delete non-required instances of include <linux/init.h>
drivers/mfd: delete non-required instances of include <linux/init.h>
drivers/gpio: delete non-required instances of include <linux/init.h>
drivers/bluetooth: delete non-required instances of include <linux/init.h>
drivers/mmc: delete non-required instances of include <linux/init.h>
drivers/crypto: delete non-required instances of include <linux/init.h>
drivers/platform: delete non-required instances of include <linux/init.h>
drivers/misc: delete non-required instances of include <linux/init.h>
drivers/edac: delete non-required instances of include <linux/init.h>
drivers/macintosh: delete non-required instances of include <linux/init.h>
drivers/base: delete non-required instances of include <linux/init.h>
drivers/cpufreq: delete non-required instances of <linux/init.h>
drivers/pci: delete non-required instances of <linux/init.h>
drivers/dma: delete non-required instances of <linux/init.h>
drivers/gpu: delete non-required instances of <linux/init.h>
drivers: delete remaining non-required instances of <linux/init.h>
include: remove needless instances of <linux/init.h>
arch/alpha/kernel/err_ev6.c | 1 -
arch/alpha/kernel/irq.c | 1 -
arch/alpha/kernel/srmcons.c | 3 +-
arch/alpha/kernel/traps.c | 1 -
arch/alpha/oprofile/op_model_ev4.c | 1 -
arch/alpha/oprofile/op_model_ev5.c | 1 -
arch/alpha/oprofile/op_model_ev6.c | 1 -
arch/alpha/oprofile/op_model_ev67.c | 1 -
arch/arm/common/dmabounce.c | 1 -
arch/arm/firmware/trusted_foundations.c | 1 -
arch/arm/include/asm/arch_timer.h | 1 -
arch/arm/kernel/entry-armv.S | 2 +
arch/arm/kernel/entry-header.S | 1 -
arch/arm/kernel/hyp-stub.S | 1 -
arch/arm/kernel/suspend.c | 1 -
arch/arm/kernel/unwind.c | 1 -
arch/arm/mach-at91/include/mach/gpio.h | 1 +
arch/arm/mach-cns3xxx/pm.c | 1 -
arch/arm/mach-exynos/headsmp.S | 1 -
arch/arm/mach-footbridge/personal.c | 1 -
arch/arm/mach-imx/headsmp.S | 1 -
arch/arm/mach-imx/iomux-v3.c | 1 -
[.... snip ~1300 lines ...]
drivers/watchdog/stmp3xxx_rtc_wdt.c | 1 -
drivers/watchdog/wdt_pci.c | 1 -
drivers/xen/xen-stub.c | 1 -
fs/notify/inotify/inotify_user.c | 4 +-
include/drm/drmP.h | 2 +-
include/linux/fb.h | 1 -
include/linux/ide.h | 1 -
include/linux/init.h | 77 ----------------------
include/linux/kdb.h | 1 -
include/linux/linux_logo.h | 3 -
include/linux/lsm_audit.h | 1 -
include/linux/module.h | 72 ++++++++++++++++++++
include/linux/moduleparam.h | 1 -
include/linux/netfilter.h | 1 -
include/linux/nls.h | 2 +-
include/linux/percpu_ida.h | 1 -
include/linux/profile.h | 1 -
include/linux/pstore_ram.h | 1 -
include/linux/usb/gadget.h | 1 -
include/linux/zorro.h | 1 -
include/xen/xenbus.h | 1 -
mm/nommu.c | 4 +-
net/ipv4/netfilter.c | 9 +--
scripts/pnmtologo.c | 1 +
scripts/tags.sh | 2 +-
1254 files changed, 131 insertions(+), 1431 deletions(-)
mode change 100755 => 100644 scripts/tags.sh
--
1.8.4.1
^ permalink raw reply
* [PATCH 53/73] drivers/isdn: delete non-required instances of include <linux/init.h>
From: Paul Gortmaker @ 2014-01-21 21:22 UTC (permalink / raw)
To: linux-kernel
Cc: linux-arch, Paul Gortmaker, Karsten Keil, Armin Schindler, netdev
In-Reply-To: <1390339396-3479-1-git-send-email-paul.gortmaker@windriver.com>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>. Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Armin Schindler <mac@melware.de> (maintainer:ISDN SUBSYSTEM
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
drivers/isdn/hardware/avm/avm_cs.c | 1 -
drivers/isdn/hardware/eicon/platform.h | 1 -
drivers/isdn/hisax/amd7930_fn.c | 1 -
drivers/isdn/hisax/asuscom.c | 1 -
drivers/isdn/hisax/avm_a1.c | 1 -
drivers/isdn/hisax/avm_a1p.c | 1 -
drivers/isdn/hisax/avma1_cs.c | 1 -
drivers/isdn/hisax/bkm_a4t.c | 1 -
drivers/isdn/hisax/bkm_a8.c | 1 -
drivers/isdn/hisax/diva.c | 1 -
drivers/isdn/hisax/elsa.c | 1 -
drivers/isdn/hisax/elsa_cs.c | 1 -
drivers/isdn/hisax/enternow_pci.c | 1 -
drivers/isdn/hisax/fsm.c | 1 -
drivers/isdn/hisax/gazel.c | 1 -
drivers/isdn/hisax/hfc_2bds0.c | 1 -
drivers/isdn/hisax/hfc_2bs0.c | 1 -
drivers/isdn/hisax/hfc_pci.c | 1 -
drivers/isdn/hisax/hfc_sx.c | 1 -
drivers/isdn/hisax/hfcscard.c | 1 -
drivers/isdn/hisax/hscx.c | 1 -
drivers/isdn/hisax/icc.c | 1 -
drivers/isdn/hisax/ipacx.c | 1 -
drivers/isdn/hisax/isac.c | 1 -
drivers/isdn/hisax/isar.c | 1 -
drivers/isdn/hisax/isurf.c | 1 -
drivers/isdn/hisax/ix1_micro.c | 1 -
drivers/isdn/hisax/jade.c | 1 -
drivers/isdn/hisax/mic.c | 1 -
drivers/isdn/hisax/netjet.c | 1 -
drivers/isdn/hisax/niccy.c | 1 -
drivers/isdn/hisax/nj_s.c | 1 -
drivers/isdn/hisax/nj_u.c | 1 -
drivers/isdn/hisax/s0box.c | 1 -
drivers/isdn/hisax/saphir.c | 1 -
drivers/isdn/hisax/sedlbauer.c | 1 -
drivers/isdn/hisax/sedlbauer_cs.c | 1 -
drivers/isdn/hisax/sportster.c | 1 -
drivers/isdn/hisax/st5481_b.c | 1 -
drivers/isdn/hisax/st5481_usb.c | 1 -
drivers/isdn/hisax/teleint.c | 1 -
drivers/isdn/hisax/teles0.c | 1 -
drivers/isdn/hisax/teles3.c | 1 -
drivers/isdn/hisax/teles_cs.c | 1 -
drivers/isdn/hisax/telespci.c | 1 -
drivers/isdn/hisax/w6692.c | 1 -
drivers/isdn/i4l/isdnhdlc.c | 1 -
47 files changed, 47 deletions(-)
diff --git a/drivers/isdn/hardware/avm/avm_cs.c b/drivers/isdn/hardware/avm/avm_cs.c
index 62b8030..4332bd8 100644
--- a/drivers/isdn/hardware/avm/avm_cs.c
+++ b/drivers/isdn/hardware/avm/avm_cs.c
@@ -11,7 +11,6 @@
#include <linux/module.h>
#include <linux/kernel.h>
-#include <linux/init.h>
#include <linux/ptrace.h>
#include <linux/string.h>
#include <linux/tty.h>
diff --git a/drivers/isdn/hardware/eicon/platform.h b/drivers/isdn/hardware/eicon/platform.h
index b2edb75..b034ac1 100644
--- a/drivers/isdn/hardware/eicon/platform.h
+++ b/drivers/isdn/hardware/eicon/platform.h
@@ -19,7 +19,6 @@
#endif
#include <linux/module.h>
-#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/skbuff.h>
diff --git a/drivers/isdn/hisax/amd7930_fn.c b/drivers/isdn/hisax/amd7930_fn.c
index 36817e0..4d437f2 100644
--- a/drivers/isdn/hisax/amd7930_fn.c
+++ b/drivers/isdn/hisax/amd7930_fn.c
@@ -58,7 +58,6 @@
#include "isac.h"
#include "amd7930_fn.h"
#include <linux/interrupt.h>
-#include <linux/init.h>
#include <linux/gfp.h>
static void Amd7930_new_ph(struct IsdnCardState *cs);
diff --git a/drivers/isdn/hisax/asuscom.c b/drivers/isdn/hisax/asuscom.c
index 62f9c43..cdbbec7 100644
--- a/drivers/isdn/hisax/asuscom.c
+++ b/drivers/isdn/hisax/asuscom.c
@@ -12,7 +12,6 @@
*
*/
-#include <linux/init.h>
#include <linux/isapnp.h>
#include "hisax.h"
#include "isac.h"
diff --git a/drivers/isdn/hisax/avm_a1.c b/drivers/isdn/hisax/avm_a1.c
index 7dd7408..d3dbf02 100644
--- a/drivers/isdn/hisax/avm_a1.c
+++ b/drivers/isdn/hisax/avm_a1.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/avm_a1p.c b/drivers/isdn/hisax/avm_a1p.c
index bc52d54..50f1b1a 100644
--- a/drivers/isdn/hisax/avm_a1p.c
+++ b/drivers/isdn/hisax/avm_a1p.c
@@ -13,7 +13,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/avma1_cs.c b/drivers/isdn/hisax/avma1_cs.c
index baad94e..cf305b5 100644
--- a/drivers/isdn/hisax/avma1_cs.c
+++ b/drivers/isdn/hisax/avma1_cs.c
@@ -13,7 +13,6 @@
#include <linux/kernel.h>
-#include <linux/init.h>
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/string.h>
diff --git a/drivers/isdn/hisax/bkm_a4t.c b/drivers/isdn/hisax/bkm_a4t.c
index c360164..8ac171b 100644
--- a/drivers/isdn/hisax/bkm_a4t.c
+++ b/drivers/isdn/hisax/bkm_a4t.c
@@ -11,7 +11,6 @@
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/bkm_a8.c b/drivers/isdn/hisax/bkm_a8.c
index dd663ea..20ad1f7 100644
--- a/drivers/isdn/hisax/bkm_a8.c
+++ b/drivers/isdn/hisax/bkm_a8.c
@@ -11,7 +11,6 @@
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "ipac.h"
diff --git a/drivers/isdn/hisax/diva.c b/drivers/isdn/hisax/diva.c
index 4fc90de..c8d2682 100644
--- a/drivers/isdn/hisax/diva.c
+++ b/drivers/isdn/hisax/diva.c
@@ -15,7 +15,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/elsa.c b/drivers/isdn/hisax/elsa.c
index 2be1c8a..bdf952b 100644
--- a/drivers/isdn/hisax/elsa.c
+++ b/drivers/isdn/hisax/elsa.c
@@ -18,7 +18,6 @@
*
*/
-#include <linux/init.h>
#include <linux/slab.h>
#include "hisax.h"
#include "arcofi.h"
diff --git a/drivers/isdn/hisax/elsa_cs.c b/drivers/isdn/hisax/elsa_cs.c
index 40f6fad..ca35784 100644
--- a/drivers/isdn/hisax/elsa_cs.c
+++ b/drivers/isdn/hisax/elsa_cs.c
@@ -37,7 +37,6 @@
#include <linux/module.h>
#include <linux/kernel.h>
-#include <linux/init.h>
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/string.h>
diff --git a/drivers/isdn/hisax/enternow_pci.c b/drivers/isdn/hisax/enternow_pci.c
index e8d431a..2fd3886 100644
--- a/drivers/isdn/hisax/enternow_pci.c
+++ b/drivers/isdn/hisax/enternow_pci.c
@@ -67,7 +67,6 @@
#include <linux/interrupt.h>
#include <linux/ppp_defs.h>
#include <linux/pci.h>
-#include <linux/init.h>
#include "netjet.h"
diff --git a/drivers/isdn/hisax/fsm.c b/drivers/isdn/hisax/fsm.c
index c7a9471..61db6fa 100644
--- a/drivers/isdn/hisax/fsm.c
+++ b/drivers/isdn/hisax/fsm.c
@@ -16,7 +16,6 @@
#include <linux/module.h>
#include <linux/slab.h>
-#include <linux/init.h>
#include "hisax.h"
#define FSM_TIMER_DEBUG 0
diff --git a/drivers/isdn/hisax/gazel.c b/drivers/isdn/hisax/gazel.c
index 35c6df6..5157ac7 100644
--- a/drivers/isdn/hisax/gazel.c
+++ b/drivers/isdn/hisax/gazel.c
@@ -11,7 +11,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/hfc_2bds0.c b/drivers/isdn/hisax/hfc_2bds0.c
index a756e5c..17fcbbd 100644
--- a/drivers/isdn/hisax/hfc_2bds0.c
+++ b/drivers/isdn/hisax/hfc_2bds0.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include "hisax.h"
diff --git a/drivers/isdn/hisax/hfc_2bs0.c b/drivers/isdn/hisax/hfc_2bs0.c
index 838531b..2e2c319 100644
--- a/drivers/isdn/hisax/hfc_2bs0.c
+++ b/drivers/isdn/hisax/hfc_2bs0.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "hfc_2bs0.h"
#include "isac.h"
diff --git a/drivers/isdn/hisax/hfc_pci.c b/drivers/isdn/hisax/hfc_pci.c
index 4a48255..5d8a1c7 100644
--- a/drivers/isdn/hisax/hfc_pci.c
+++ b/drivers/isdn/hisax/hfc_pci.c
@@ -15,7 +15,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "hfc_pci.h"
#include "isdnl1.h"
diff --git a/drivers/isdn/hisax/hfc_sx.c b/drivers/isdn/hisax/hfc_sx.c
index fa1fefd..99c7f09 100644
--- a/drivers/isdn/hisax/hfc_sx.c
+++ b/drivers/isdn/hisax/hfc_sx.c
@@ -11,7 +11,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "hfc_sx.h"
#include "isdnl1.h"
diff --git a/drivers/isdn/hisax/hfcscard.c b/drivers/isdn/hisax/hfcscard.c
index 394da64..1b69f0c 100644
--- a/drivers/isdn/hisax/hfcscard.c
+++ b/drivers/isdn/hisax/hfcscard.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include <linux/isapnp.h>
#include "hisax.h"
#include "hfc_2bds0.h"
diff --git a/drivers/isdn/hisax/hscx.c b/drivers/isdn/hisax/hscx.c
index 3e305fe..8c237dc 100644
--- a/drivers/isdn/hisax/hscx.c
+++ b/drivers/isdn/hisax/hscx.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "hscx.h"
#include "isac.h"
diff --git a/drivers/isdn/hisax/icc.c b/drivers/isdn/hisax/icc.c
index 51dae91..575700f 100644
--- a/drivers/isdn/hisax/icc.c
+++ b/drivers/isdn/hisax/icc.c
@@ -14,7 +14,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "icc.h"
// #include "arcofi.h"
diff --git a/drivers/isdn/hisax/ipacx.c b/drivers/isdn/hisax/ipacx.c
index 5faa5de..d522dfb 100644
--- a/drivers/isdn/hisax/ipacx.c
+++ b/drivers/isdn/hisax/ipacx.c
@@ -11,7 +11,6 @@
*/
#include <linux/kernel.h>
#include <linux/slab.h>
-#include <linux/init.h>
#include "hisax_if.h"
#include "hisax.h"
#include "isdnl1.h"
diff --git a/drivers/isdn/hisax/isac.c b/drivers/isdn/hisax/isac.c
index 7fdf78f..e671728 100644
--- a/drivers/isdn/hisax/isac.c
+++ b/drivers/isdn/hisax/isac.c
@@ -19,7 +19,6 @@
#include "isdnl1.h"
#include <linux/interrupt.h>
#include <linux/slab.h>
-#include <linux/init.h>
#define DBUSY_TIMER_VALUE 80
#define ARCOFI_USE 1
diff --git a/drivers/isdn/hisax/isar.c b/drivers/isdn/hisax/isar.c
index f4956c7..993c6f3 100644
--- a/drivers/isdn/hisax/isar.c
+++ b/drivers/isdn/hisax/isar.c
@@ -8,7 +8,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isar.h"
#include "isdnl1.h"
diff --git a/drivers/isdn/hisax/isurf.c b/drivers/isdn/hisax/isurf.c
index 1399ddd..0d5849f 100644
--- a/drivers/isdn/hisax/isurf.c
+++ b/drivers/isdn/hisax/isurf.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "isar.h"
diff --git a/drivers/isdn/hisax/ix1_micro.c b/drivers/isdn/hisax/ix1_micro.c
index 7ae39f5..7d3b35c 100644
--- a/drivers/isdn/hisax/ix1_micro.c
+++ b/drivers/isdn/hisax/ix1_micro.c
@@ -17,7 +17,6 @@
* Germany
*/
-#include <linux/init.h>
#include <linux/isapnp.h>
#include "hisax.h"
#include "isac.h"
diff --git a/drivers/isdn/hisax/jade.c b/drivers/isdn/hisax/jade.c
index e2ae787..f2c24f7 100644
--- a/drivers/isdn/hisax/jade.c
+++ b/drivers/isdn/hisax/jade.c
@@ -11,7 +11,6 @@
*/
-#include <linux/init.h>
#include "hisax.h"
#include "hscx.h"
#include "jade.h"
diff --git a/drivers/isdn/hisax/mic.c b/drivers/isdn/hisax/mic.c
index 9339867..6cf1367 100644
--- a/drivers/isdn/hisax/mic.c
+++ b/drivers/isdn/hisax/mic.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/netjet.c b/drivers/isdn/hisax/netjet.c
index 233e432..93b10a5 100644
--- a/drivers/isdn/hisax/netjet.c
+++ b/drivers/isdn/hisax/netjet.c
@@ -14,7 +14,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/niccy.c b/drivers/isdn/hisax/niccy.c
index e4c33cf..480cc66 100644
--- a/drivers/isdn/hisax/niccy.c
+++ b/drivers/isdn/hisax/niccy.c
@@ -13,7 +13,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/nj_s.c b/drivers/isdn/hisax/nj_s.c
index 32b4bbd..3c43f52 100644
--- a/drivers/isdn/hisax/nj_s.c
+++ b/drivers/isdn/hisax/nj_s.c
@@ -5,7 +5,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "isdnl1.h"
diff --git a/drivers/isdn/hisax/nj_u.c b/drivers/isdn/hisax/nj_u.c
index 4e8adbe..9f6eaab 100644
--- a/drivers/isdn/hisax/nj_u.c
+++ b/drivers/isdn/hisax/nj_u.c
@@ -5,7 +5,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "icc.h"
#include "isdnl1.h"
diff --git a/drivers/isdn/hisax/s0box.c b/drivers/isdn/hisax/s0box.c
index 4e7d0aa..3749b09 100644
--- a/drivers/isdn/hisax/s0box.c
+++ b/drivers/isdn/hisax/s0box.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/saphir.c b/drivers/isdn/hisax/saphir.c
index 6b2d0ec..dc36783 100644
--- a/drivers/isdn/hisax/saphir.c
+++ b/drivers/isdn/hisax/saphir.c
@@ -12,7 +12,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/sedlbauer.c b/drivers/isdn/hisax/sedlbauer.c
index f16a47b..1b9459f 100644
--- a/drivers/isdn/hisax/sedlbauer.c
+++ b/drivers/isdn/hisax/sedlbauer.c
@@ -38,7 +38,6 @@
* For example: hisaxctrl <DriverID> 9 ISAR.BIN
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "ipac.h"
diff --git a/drivers/isdn/hisax/sedlbauer_cs.c b/drivers/isdn/hisax/sedlbauer_cs.c
index 92ef62d..f738f26 100644
--- a/drivers/isdn/hisax/sedlbauer_cs.c
+++ b/drivers/isdn/hisax/sedlbauer_cs.c
@@ -37,7 +37,6 @@
#include <linux/kernel.h>
#include <linux/module.h>
-#include <linux/init.h>
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/string.h>
diff --git a/drivers/isdn/hisax/sportster.c b/drivers/isdn/hisax/sportster.c
index 18cee63..943a186 100644
--- a/drivers/isdn/hisax/sportster.c
+++ b/drivers/isdn/hisax/sportster.c
@@ -12,7 +12,6 @@
*
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/st5481_b.c b/drivers/isdn/hisax/st5481_b.c
index 4098491..ca141d6 100644
--- a/drivers/isdn/hisax/st5481_b.c
+++ b/drivers/isdn/hisax/st5481_b.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include <linux/gfp.h>
#include <linux/usb.h>
#include <linux/netdevice.h>
diff --git a/drivers/isdn/hisax/st5481_usb.c b/drivers/isdn/hisax/st5481_usb.c
index ead0a4f..ee78014 100644
--- a/drivers/isdn/hisax/st5481_usb.c
+++ b/drivers/isdn/hisax/st5481_usb.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include <linux/usb.h>
#include <linux/slab.h>
#include "st5481.h"
diff --git a/drivers/isdn/hisax/teleint.c b/drivers/isdn/hisax/teleint.c
index bf64754..d947abc 100644
--- a/drivers/isdn/hisax/teleint.c
+++ b/drivers/isdn/hisax/teleint.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hfc_2bs0.h"
diff --git a/drivers/isdn/hisax/teles0.c b/drivers/isdn/hisax/teles0.c
index ce9eabd..bd439a9 100644
--- a/drivers/isdn/hisax/teles0.c
+++ b/drivers/isdn/hisax/teles0.c
@@ -15,7 +15,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isdnl1.h"
#include "isac.h"
diff --git a/drivers/isdn/hisax/teles3.c b/drivers/isdn/hisax/teles3.c
index 38fb2c1..c608d80 100644
--- a/drivers/isdn/hisax/teles3.c
+++ b/drivers/isdn/hisax/teles3.c
@@ -13,7 +13,6 @@
* Beat Doebeli
*
*/
-#include <linux/init.h>
#include <linux/isapnp.h>
#include "hisax.h"
#include "isac.h"
diff --git a/drivers/isdn/hisax/teles_cs.c b/drivers/isdn/hisax/teles_cs.c
index b8dd149..c0e532e 100644
--- a/drivers/isdn/hisax/teles_cs.c
+++ b/drivers/isdn/hisax/teles_cs.c
@@ -18,7 +18,6 @@
#include <linux/kernel.h>
#include <linux/module.h>
-#include <linux/init.h>
#include <linux/ptrace.h>
#include <linux/slab.h>
#include <linux/string.h>
diff --git a/drivers/isdn/hisax/telespci.c b/drivers/isdn/hisax/telespci.c
index 33eeb46..f9b2169 100644
--- a/drivers/isdn/hisax/telespci.c
+++ b/drivers/isdn/hisax/telespci.c
@@ -12,7 +12,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "isac.h"
#include "hscx.h"
diff --git a/drivers/isdn/hisax/w6692.c b/drivers/isdn/hisax/w6692.c
index a858955..7e0c60c 100644
--- a/drivers/isdn/hisax/w6692.c
+++ b/drivers/isdn/hisax/w6692.c
@@ -10,7 +10,6 @@
*
*/
-#include <linux/init.h>
#include "hisax.h"
#include "w6692.h"
#include "isdnl1.h"
diff --git a/drivers/isdn/i4l/isdnhdlc.c b/drivers/isdn/i4l/isdnhdlc.c
index 027d1c5..5daace1 100644
--- a/drivers/isdn/i4l/isdnhdlc.c
+++ b/drivers/isdn/i4l/isdnhdlc.c
@@ -23,7 +23,6 @@
*/
#include <linux/module.h>
-#include <linux/init.h>
#include <linux/crc-ccitt.h>
#include <linux/isdn/hdlc.h>
#include <linux/bitrev.h>
--
1.8.4.1
^ permalink raw reply related
* [PATCH 04/73] netfilter: don't use module_init/exit in core IPV4 code
From: Paul Gortmaker @ 2014-01-21 21:22 UTC (permalink / raw)
To: linux-kernel
Cc: linux-arch, Paul Gortmaker, Pablo Neira Ayuso, Patrick McHardy,
Jozsef Kadlecsik, David S. Miller, netfilter-devel, netdev
In-Reply-To: <1390339396-3479-1-git-send-email-paul.gortmaker@windriver.com>
The file net/ipv4/netfilter.o is created based on whether
CONFIG_NETFILTER is set. However that is defined as a bool, and
hence this file with the core netfilter hooks will never be
modular. So using module_init as an alias for __initcall can be
somewhat misleading.
Fix this up now, so that we can relocate module_init from
init.h into module.h in the future. If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing. Also add an inclusion of init.h, as
that was previously implicit here in the netfilter.c file.
Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups. As __initcall gets
mapped onto device_initcall, our use of subsys_initcall (which
seems to make sense for netfilter code) will thus change this
registration from level 6-device to level 4-subsys (i.e. slightly
earlier). However no observable impact of that small difference
has been observed during testing, or is expected. (i.e. the
location of the netfilter messages in dmesg remains unchanged
with respect to all the other surrounding messages.)
As for the module_exit, rather than replace it with __exitcall,
we simply remove it, since it appears only UML does anything
with those, and even for UML, there is no relevant cleanup
to be done here.
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netfilter-devel@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
net/ipv4/netfilter.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index c3e0ade..31abf96 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -197,11 +197,4 @@ static int __init ipv4_netfilter_init(void)
{
return nf_register_afinfo(&nf_ip_afinfo);
}
-
-static void __exit ipv4_netfilter_fini(void)
-{
- nf_unregister_afinfo(&nf_ip_afinfo);
-}
-
-module_init(ipv4_netfilter_init);
-module_exit(ipv4_netfilter_fini);
+device_initcall(ipv4_netfilter_init);
--
1.8.4.1
^ permalink raw reply related
* Re: issues with vxlan RX checksum offload under OVS datapath
From: Joseph Gasparakis @ 2014-01-21 21:47 UTC (permalink / raw)
To: Or Gerlitz; +Cc: Pravin Shelar, Or Gerlitz, Joseph Gasparakis, netdev
In-Reply-To: <CAJZOPZJ6xj=fOAY9rkXZZs99ONkHwcW8RMXb+H2FjJzV3eoAPg@mail.gmail.com>
On Tue, 21 Jan 2014, Or Gerlitz wrote:
> On Tue, Jan 21, 2014 at 7:30 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> > On Sun, Jan 19, 2014 at 2:05 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
>
> >> While testing the gro udp patches over a setup with openvswitch I noted that
> >> the RX checksum offload support introduced by Joseph's commit 0afb166
> >> "vxlan: Add capability of Rx checksum offload for inner packet" works fine
> >> when you use a setup made of
> >> NIC --> IP stack --> vxlan device --> bridge --> tap
> >> but not when its
> >> NIC --> IP stack --> ovs vxlan port --> OVS DP --> tap
> >> I narrowed it down to the fact the when going the OVS pathskb->encapsulation
> >> remains true also after the decap is done. Basically, this is the original hunk
> [...]
> >>> + skb->encapsulation = 0;
> [...]
>
> >> Moving this to shared code (while removing the check for
> >> vxlan->dev->features) made things to work on my setup, but this misses one
> >> of the original conditions, ideas?
>
> > I kept csum check in vxlan-device recv path for same reason. As of now
> > there is no efficient way to get ovs-dev features.
> > May be we can cache device features in struct datapath from datapath-netdev.
>
> To be a bit more precise/concrete here, do we agree that the both paths must do
>
> skb->encapsulation = 0;
>
> which is done now only by the non-ovs path
Originally skb->encapsulation had (and still has) the meaning of "does
this skb have outer *and* (valid) inner headers? If so, it is an
encapsulated packet".
So based on this skb->encapsulation should be set as soon an (inner)
packet gets encapsulated and unset when decapsulation takes place, and
ideally this should happen in the ovs path too. Together with
skb->encapsulation the inner headers should be correctly set.
>
> Or.
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox