* Re: [PATCH 1/5 v2] mv643xx_eth: add Device Tree bindings
From: Florian Fainelli @ 2013-04-05 14:23 UTC (permalink / raw)
To: Sebastian Hesselbarth
Cc: thomas.petazzoni, moinejf, Simon Baatz, andrew, netdev,
devicetree-discuss, rob.herring, grant.likely, jogo,
linux-arm-kernel, jm, davem, buytenh, jason
In-Reply-To: <CABJ1b_TdzE_Q9qLsdSGZy8byFpCTtVBUy5FtxiZAx7QqgG-rVg@mail.gmail.com>
Hello Sebastian,
On 04/05/13 15:58, Sebastian Hesselbarth wrote:
> On Fri, Apr 5, 2013 at 11:56 AM, Florian Fainelli <florian@openwrt.org> wrote:
>> [snip]
>
> Florian,
>
> took me a while to try your patches out on Dove, but now I fixed all
> issues. I will comment on all related patches, but first I want to comment here.
>
> One general note for Dove-related patches: you didn't remove the registration of
> the ge platform_device from mach-dove/board-dt.c. That will lead to double
> registration of mdio and mv643xx_eth/shared, so you'll never be sure whether DT
> or non-DT code is executed. I haven't checked mach-kirkwood/board-dt.c or
> orion5x code.
This was intentional: this patchset is only preparatory, in the sense
that it does not convert the existing users of the mv643xx_eth
platform driver over to DT (I have some patches for that, though). I wanted to
resume the discussion on these bindings first, then proceed with the
conversion.
>
>>>> if (!mv643xx_eth_version_printed++)
>>>> pr_notice("MV-643xx 10/100/1000 ethernet driver version
>>>> %s\n",
>>>
>>>
>>> This is not related to your change, but there is a problem in this
>>> function that has already been discussed in the past if I remember
>>> correctly: The respective clock needs to be enabled here (at least
>>> on Kirkwood), since accesses to the hardware are done below.
>>> Enabling the clock only in mv643xx_eth_probe() is too late.
>>>
>>> As said, this is not a problem introduced by your changes (and which
>>> is currently circumvented by enabling the respective clocks in
>>> kirkwood_legacy_clk_init() and kirkwood_ge0x_init()), but we might
>>> want to fix this now to get rid of unconditionally enabling the GE
>>> clocks in the DT case.
>>
>>
>> I think there may have been some confusion between the "ethernet-group"
>> clock and the actual Ethernet port inside the "ethernet-group". The
>> mv643xx_eth driver assumes we have a per-port clock gating scheme, while I
>> think we have a per "ethernet-group" clock gating scheme instead. Like you
>> said, I think this should be addressed separately.
>
> IMHO, there should be a clocks property wherever you try to access registers,
> i.e. in all three "parts": mv643xx_eth_shared (group), mv643xx_eth (port) and mdio.
> Since port depends on shared, it would be OK to have it per group, but that may
> collide with SoCs other than Dove/Kirkwood that have per-port clocks.
Ok, which means that we should also teach mv643xx_eth_shared_probe()
about it, as well as the orion-mdio driver. I don't have any particular
objections since it should just make things safer with respect to clocking.
>
> Is that separation (group/port) really required for any SoC?
Probably not; it was not clear when I looked at mv78xx0 whether it uses two
ports per group or four groups with one port each. Anyway, since we are re-using
the existing Device Tree binding definition and the hardware
presents itself as Ethernet groups and ports, I don't see any problem
with keeping that distinction, since it allows for a fine-grained
representation of the hardware.
>
>>
>> [snip]
>>>
>>> You don't change the clk initialization here:
>>>
>>> #if defined(CONFIG_HAVE_CLK)
>>> mp->clk = clk_get(&pdev->dev, (pdev->id ? "1" : "0"));
>>> if (!IS_ERR(mp->clk)) {
>>> clk_prepare_enable(mp->clk);
>>> mp->t_clk = clk_get_rate(mp->clk);
>>> }
>>> #endif
>>>
>>> Which, if I understand correctly, works in the DT case because you
>>> assign "clock-names" to the clocks in the DTS. However, I wonder
>>> whether this works for any but the first Ethernet device.
>
> Yes, it does. Assigned clocks from clocks property get a clock alias for
> that device name (node name). Using anything else than NULL here is
> IMHO just wrong. We should rather provide proper clock aliases for non-DT case.
>
>>> In the old platform device setup, the pdev->id was set when
>>> initializing the platform_device structure in common.c. Where is
>>> this done in the DT case?
>>
>> Looks like you are right, in the DT case, I assume that we should lookup the
>> clock using NULL instead of "1" or "0" so we match any clock instead of a
>> specific one.
>
> Yes.
>
>> [snip]
>>>
>>>
>>> In phy_scan(), the phy is searched like this:
>>>
>>> snprintf(phy_id, sizeof(phy_id), PHY_ID_FMT,
>>> "orion-mdio-mii", addr);
>>>
>>> phydev = phy_connect(mp->dev, phy_id,
>>> mv643xx_eth_adjust_link,
>>> PHY_INTERFACE_MODE_GMII);
>>>
>>> But "orion-mdio-mii:xx" is the name of the PHY if MDIO is setup via a
>>> platform_device. I could not get this to work if the MDIO device is
>>> setup via DT. Am I doing something wrong?
>>
>> I just missed updating this part of the code to probe for PHYs. The board I
>> tested with uses a "PHY_NONE" configuration. I will add the missing bits for
>> of_phy_connect() to be called here.
>
> I don't think that the ethernet controller should probe the PHY's on mdio-bus
> at all. At least not for DT enabled platforms. I had a look at DT and non-DT
> mdio-bus sources, and realized that there is a bus scan for non-DT only.
> of_mdiobus_register requires you to set (and know) the PHY address.
One reason for the Ethernet controller to do the probing is the case where
we need to apply quirks (e.g. using phydev->flags). This
can be done even after the MDIO bus driver has probed the PHY devices, though.
>
> I prepared a patch for of_mdiobus_register that will allow you to probe mdio and
> assign phy addresses to each node found. Currently, the heuristic for probing
> is: assign each phy node the next probed phy_addr, starting with 0. But that
> will not allow you to, e.g., set some PHY addresses and probe the rest.
OK, we just need to make sure that this does not break any specific use
case. I don't think it does, since it seems to be at least as accurate as
having the Ethernet driver do the probing.
>
> We had a similar discussion whether to probe or not for DT nodes, and I guess
> there also will be some discussion about the above patch. OTOH we could just
> (again) ask users of every kirkwood/orion5x/dove board to tell their
> phy addresses and fail to probe the phy for new boards...
>
> I will prepare a proper patch soon and post it on the corresponding lists.
Cool, thanks!
>
>>> Additionally, in phy_scan() there is this:
>>>
>>> if (phy_addr == MV643XX_ETH_PHY_ADDR_DEFAULT) {
>>> start = phy_addr_get(mp) & 0x1f;
>>> num = 32;
>>> } else {
>>> ...
>>>
>>> MV643XX_ETH_PHY_ADDR_DEFAULT is defined as 0. However, many Kirkwood
>>> devices use "MV643XX_ETH_PHY_ADDR(0)". If the module probe is
>>> deferred in mv643xx_eth because the MDIO driver is not yet loaded,
>>> all 32 PHY addresses are scanned without success. This is not needed
>>> and clutters the log.
>>
>>
>> Ok, I am not sure how we can circumvent the log cluttering that happens,
>> what would be your suggestion?
>
> My suggestion is to change MV643XX_ETH_PHY_ADDR_DEFAULT from a valid
> phy address (0) to something invalid (32). I understand that using 0 helps
> if you don't want to set it in mv643xx's platform_data, but it is always
> difficult to rely on that if 0 is a valid number.
>
> Changing the above to 32 should just work because most (all?) boards
> using phy_scan should also already use MV643XX_ETH_PHY_ADDR_DEFAULT. I also
> suggest renaming the current define to a better name, e.g.
> MV643XX_ETH_PHY_ADDR_AUTOSCAN.
Sounds good to me.
--
Florian
^ permalink raw reply
* RE: hv_netvsc: WARNING in softirq.c
From: Haiyang Zhang @ 2013-04-05 14:13 UTC (permalink / raw)
To: Richard Genoud
Cc: KY Srinivasan, devel@linuxdriverproject.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <CACQ1gAgh7QX=cDYsrw9J5qYNKYpgQ841QijA145TDO_Ae5EsQQ@mail.gmail.com>
> -----Original Message-----
> From: Richard Genoud [mailto:richard.genoud@gmail.com]
> Sent: Friday, April 05, 2013 7:00 AM
> To: Haiyang Zhang
> Cc: KY Srinivasan; devel@linuxdriverproject.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org
> Subject: Re: hv_netvsc: WARNING in softirq.c
>
> Ok, it has been a little bit more than 2 weeks now, and I did not see this
> warning any more, just the
> "hv_vmbus: child device vmbus_0_8 unregistered" sometimes.
>
> so it seems to be fixed !
>
> Thanks !
>
>
> Reported-by: Richard Genoud <richard.genoud@gmail.com>
> Tested-by: Richard Genoud <richard.genoud@gmail.com>
>
Thank you! I will submit the patch soon.
- Haiyang
^ permalink raw reply
* Re: [PATCH 1/5 v2] mv643xx_eth: add Device Tree bindings
From: Sebastian Hesselbarth @ 2013-04-05 13:58 UTC (permalink / raw)
To: Florian Fainelli
Cc: thomas.petazzoni, moinejf, Simon Baatz, andrew, netdev,
devicetree-discuss, rob.herring, grant.likely, jogo,
linux-arm-kernel, jm, davem, buytenh, jason
In-Reply-To: <515E9FE0.8050402@openwrt.org>
On Fri, Apr 5, 2013 at 11:56 AM, Florian Fainelli <florian@openwrt.org> wrote:
> [snip]
Florian,
took me a while to try your patches out on Dove, but now I fixed all
issues. I will comment on all related patches, but first I want to comment here.
One general note for Dove-related patches: you didn't remove the registration of
the ge platform_device from mach-dove/board-dt.c. That will lead to double
registration of mdio and mv643xx_eth/shared, so you'll never be sure whether DT
or non-DT code is executed. I haven't checked mach-kirkwood/board-dt.c or
orion5x code.
>>> if (!mv643xx_eth_version_printed++)
>>> pr_notice("MV-643xx 10/100/1000 ethernet driver version
>>> %s\n",
>>
>>
>> This is not related to your change, but there is a problem in this
>> function that has already been discussed in the past if I remember
>> correctly: The respective clock needs to be enabled here (at least
>> on Kirkwood), since accesses to the hardware are done below.
>> Enabling the clock only in mv643xx_eth_probe() is too late.
>>
>> As said, this is not a problem introduced by your changes (and which
>> is currently circumvented by enabling the respective clocks in
>> kirkwood_legacy_clk_init() and kirkwood_ge0x_init()), but we might
>> want to fix this now to get rid of unconditionally enabling the GE
>> clocks in the DT case.
>
>
> I think there may have been some confusion between the "ethernet-group"
> clock and the actual Ethernet port inside the "ethernet-group". The
> mv643xx_eth driver assumes we have a per-port clock gating scheme, while I
> think we have a per "ethernet-group" clock gating scheme instead. Like you
> said, I think this should be addressed separately.
IMHO, there should be a clocks property wherever you try to access registers,
i.e. in all three "parts": mv643xx_eth_shared (group), mv643xx_eth (port) and mdio.
Since port depends on shared, it would be OK to have it per group, but that may
collide with SoCs other than Dove/Kirkwood that have per-port clocks.
Is that separation (group/port) really required for any SoC?
>
> [snip]
>>
>> You don't change the clk initialization here:
>>
>> #if defined(CONFIG_HAVE_CLK)
>> mp->clk = clk_get(&pdev->dev, (pdev->id ? "1" : "0"));
>> if (!IS_ERR(mp->clk)) {
>> clk_prepare_enable(mp->clk);
>> mp->t_clk = clk_get_rate(mp->clk);
>> }
>> #endif
>>
>> Which, if I understand correctly, works in the DT case because you
>> assign "clock-names" to the clocks in the DTS. However, I wonder
>> whether this works for any but the first Ethernet device.
Yes, it does. Assigned clocks from clocks property get a clock alias for
that device name (node name). Using anything else than NULL here is
IMHO just wrong. We should rather provide proper clock aliases for non-DT case.
>> In the old platform device setup, the pdev->id was set when
>> initializing the platform_device structure in common.c. Where is
>> this done in the DT case?
>
> Looks like you are right, in the DT case, I assume that we should lookup the
> clock using NULL instead of "1" or "0" so we match any clock instead of a
> specific one.
Yes.
> [snip]
>>
>>
>> In phy_scan(), the phy is searched like this:
>>
>> snprintf(phy_id, sizeof(phy_id), PHY_ID_FMT,
>> "orion-mdio-mii", addr);
>>
>> phydev = phy_connect(mp->dev, phy_id,
>> mv643xx_eth_adjust_link,
>> PHY_INTERFACE_MODE_GMII);
>>
>> But "orion-mdio-mii:xx" is the name of the PHY if MDIO is setup via a
>> platform_device. I could not get this to work if the MDIO device is
>> setup via DT. Am I doing something wrong?
>
> I just missed updating this part of the code to probe for PHYs. The board I
> tested with uses a "PHY_NONE" configuration. I will add the missing bits for
> of_phy_connect() to be called here.
I don't think that the ethernet controller should probe the PHY's on mdio-bus
at all. At least not for DT enabled platforms. I had a look at DT and non-DT
mdio-bus sources, and realized that there is a bus scan for non-DT only.
of_mdiobus_register requires you to set (and know) the PHY address.
I prepared a patch for of_mdiobus_register that will allow you to probe mdio and
assign phy addresses to each node found. Currently, the heuristic for probing
is: assign each phy node the next probed phy_addr, starting with 0. But that
will not allow you to, e.g., set some PHY addresses and probe the rest.
We had a similar discussion whether to probe or not for DT nodes, and I guess
there also will be some discussion about the above patch. OTOH we could just
(again) ask users of every kirkwood/orion5x/dove board to tell their
phy addresses and fail to probe the phy for new boards...
I will prepare a proper patch soon and post it on the corresponding lists.
>> Additionally, in phy_scan() there is this:
>>
>> if (phy_addr == MV643XX_ETH_PHY_ADDR_DEFAULT) {
>> start = phy_addr_get(mp) & 0x1f;
>> num = 32;
>> } else {
>> ...
>>
>> MV643XX_ETH_PHY_ADDR_DEFAULT is defined as 0. However, many Kirkwood
>> devices use "MV643XX_ETH_PHY_ADDR(0)". If the module probe is
>> deferred in mv643xx_eth because the MDIO driver is not yet loaded,
>> all 32 PHY addresses are scanned without success. This is not needed
>> and clutters the log.
>
>
> Ok, I am not sure how we can circumvent the log cluttering that happens,
> what would be your suggestion?
My suggestion is to change MV643XX_ETH_PHY_ADDR_DEFAULT from a valid
phy address (0) to something invalid (32). I understand that using 0 helps
if you don't want to set it in mv643xx's platform_data, but it is always
difficult to rely on that if 0 is a valid number.
Changing the above to 32 should just work because most (all?) boards
using phy_scan should also already use MV643XX_ETH_PHY_ADDR_DEFAULT. I also
suggest renaming the current define to a better name, e.g.
MV643XX_ETH_PHY_ADDR_AUTOSCAN.
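To make the intent concrete, here is a minimal userspace sketch of the address selection with an out-of-range sentinel. It is not driver code: phy_scan_range() is a hypothetical helper, hw_default stands in for whatever phy_addr_get(mp) returns, and the else branch (elided in the quote above) is assumed to probe exactly one address.

```c
#include <assert.h>

/* Assumed value from the suggestion: 32 lies outside the valid 0..31 range. */
#define MV643XX_ETH_PHY_ADDR_AUTOSCAN 32

/*
 * Hypothetical helper modelling the start/num selection in phy_scan().
 * hw_default stands in for the value phy_addr_get(mp) would return.
 */
static void phy_scan_range(int phy_addr, int hw_default, int *start, int *num)
{
	assert(start && num);

	if (phy_addr == MV643XX_ETH_PHY_ADDR_AUTOSCAN) {
		/* No address configured: scan all 32, starting at the HW default. */
		*start = hw_default & 0x1f;
		*num = 32;
	} else {
		/* Explicit address, including 0: probe only that one (assumption). */
		*start = phy_addr & 0x1f;
		*num = 1;
	}
}
```

With the sentinel moved out of the valid range, a board that really has its PHY at address 0 no longer triggers the full 32-address scan.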
Sebastian
^ permalink raw reply
* Re: [Patch net-next v2 3/4] vxlan: add ipv6 support
From: David Stevens @ 2013-04-05 13:33 UTC (permalink / raw)
To: Cong Wang; +Cc: Cong Wang, David S. Miller, netdev, Stephen Hemminger
In-Reply-To: <1365164186-21719-3-git-send-email-amwang@redhat.com>
Cong Wang <amwang@redhat.com> wrote on 04/05/2013 08:16:24 AM:
> +struct vxlan_ip {
> + union {
> + struct sockaddr_in sin;
> +#if IS_ENABLED(CONFIG_IPV6)
> + struct sockaddr_in6 sin6;
> +#endif
> + struct sockaddr sa;
> + } u;
> +#define ip4 u.sin.sin_addr.s_addr
> +#define ip6 u.sin6.sin6_addr
> +#define proto u.sa.sa_family
> +};
> +
I'd prefer:
1) all #defines at the same level in the union/struct -- you
have a) a field within an in_addr struct, b) an in6_addr
struct and c) a field within a sockaddr -- all at a different
level in the union/struct
2) the union members carrying the odd names, and the #defines indicating
the type while not being something likely to appear as a variable name
(i.e., maybe sun_sin6 for the union member and vip_sin6 for the #define)
3) a family (e.g. AF_INET6) is not a proto (e.g. IPPROTO_IPV6)
4) "vxlan_ip" sounds like an ipv4 type to me -- maybe "vxlan_addr"?
5) I'd leave out the #ifdef; I believe sockaddr_in6 is defined without
CONFIG_IPV6. I may be wrong on that, and it makes the struct bigger for
v4-only configs, but fewer #ifdefs make maintenance easier and
there's an advantage in it always being the same size; this one
is more a matter of taste.
So:
struct vxlan_addr {
        union {
                struct sockaddr_in      sun_sin;
                struct sockaddr_in6     sun_sin6;
                struct sockaddr         sun_sa;
        } sun;
};
#define vip_sin         sun.sun_sin
#define vip_sin6        sun.sun_sin6
#define vip_sa          sun.sun_sa
As it is, you're trying to hide that it's a sockaddr, which
makes the code more difficult to read in context. "vip_sa.sa_family"
is obviously the sa_family field of a sockaddr, "proto" could be
lots of things.
> +#if IS_ENABLED(CONFIG_IPV6)
> +static inline bool vxlan_ip_equal(const struct vxlan_ip *a, const
> struct vxlan_ip *b)
> +{
> + if (a->proto != b->proto)
> + return false;
> + switch (a->proto) {
> + case AF_INET:
> + return a->ip4 == b->ip4;
> + case AF_INET6:
> + return ipv6_addr_equal(&a->ip6, &b->ip6);
> + }
> + return false;
> +}
> +
> +static inline bool vxlan_ip_any(const struct vxlan_ip *ipa)
> +{
> + if (ipa->proto == AF_INET)
> + return ipa->ip4 == htonl(INADDR_ANY);
> + else
> + return ipv6_addr_any(&ipa->ip6);
> +}
I think these functions shouldn't be under the #ifdef; if
CONFIG_IPV6 is not set, you'll never have AF_INET6. Also, if you
zero everything on allocation, you can use "memcmp" on the whole
thing and don't need to check family and addresses separately or
explicitly, or use ipv6_addr_equal() with a CONFIG_IPV6 dependency.
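As a rough userspace illustration of the memcmp() idea (a sketch only, using the libc sockaddr types rather than the kernel ones; the vxlan_addr_set_v4()/vxlan_addr_set_v6() helpers are made up for the example), assuming every address is zeroed before it is filled in:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Userspace stand-in for the suggested layout. */
struct vxlan_addr {
	union {
		struct sockaddr_in  sun_sin;
		struct sockaddr_in6 sun_sin6;
		struct sockaddr     sun_sa;
	} sun;
};
#define vip_sin  sun.sun_sin
#define vip_sin6 sun.sun_sin6
#define vip_sa   sun.sun_sa

/* Zero the whole union first so padding and unused bytes compare equal. */
static void vxlan_addr_set_v4(struct vxlan_addr *a, const char *text)
{
	int ok;

	assert(a && text);
	memset(a, 0, sizeof(*a));
	a->vip_sin.sin_family = AF_INET;
	ok = inet_pton(AF_INET, text, &a->vip_sin.sin_addr);
	assert(ok == 1);
	(void)ok;
}

static void vxlan_addr_set_v6(struct vxlan_addr *a, const char *text)
{
	int ok;

	assert(a && text);
	memset(a, 0, sizeof(*a));
	a->vip_sin6.sin6_family = AF_INET6;
	ok = inet_pton(AF_INET6, text, &a->vip_sin6.sin6_addr);
	assert(ok == 1);
	(void)ok;
}

/* One comparison covers family and address; no per-family switch needed. */
static bool vxlan_addr_equal(const struct vxlan_addr *a,
			     const struct vxlan_addr *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

The whole approach depends on the zeroing: if any path fills an address without clearing it first, stale bytes make equal addresses compare as different.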
> +
> +static int vxlan_nla_get_addr(struct vxlan_ip *ip, struct nlattr *nla)
> +{
> + if (nla_len(nla) == sizeof(__be32)) {
> + ip->ip4 = nla_get_be32(nla);
> + ip->proto = AF_INET;
> + } else if (nla_len(nla) == sizeof(struct in6_addr)) {
> + nla_memcpy(&ip->ip6, nla, sizeof(struct in6_addr));
> + ip->proto = AF_INET6;
> + } else
> + return -EAFNOSUPPORT;
> + return 0;
> +}
netlink messages use padding -- I'm not sure nla_len() will
be correct in all cases here. I think using a sockaddr here too
would be better (then the family tells you the address type), as
long as that doesn't create include issues Dave mentioned within
the "ip" and "bridge" commands. Otherwise, maybe use a different
NL type for v6.
Another reason to do this is that link-local v6 addresses
require a scope_id and matching addresses are only equal if the
interfaces are equal. You need to get that from user level and
compare it in address matching, too, but it is in the sockaddr_in6
so if you set it there, memcmp() will cover that as well.
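A small userspace sketch of the scope_id point, under the same zero-before-fill assumption (make_ll() and sin6_equal() are hypothetical helpers, and the interface indexes are arbitrary numbers):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill a zeroed sockaddr_in6 with an address bound to one interface index. */
static void make_ll(struct sockaddr_in6 *sin6, const char *text,
		    unsigned int ifindex)
{
	int ok;

	assert(sin6 && text);
	memset(sin6, 0, sizeof(*sin6));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_scope_id = ifindex;
	ok = inet_pton(AF_INET6, text, &sin6->sin6_addr);
	assert(ok == 1);
	(void)ok;
}

/* memcmp() compares the scope id along with family and address. */
static bool sin6_equal(const struct sockaddr_in6 *a,
		       const struct sockaddr_in6 *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Here fe80::1 on interface 2 and fe80::1 on interface 3 compare as different, which is the behaviour wanted for link-local addresses.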
> +static inline bool vxlan_ip_equal(const struct vxlan_ip *a, const
> struct vxlan_ip *b)
> +{
> + return a->ip4 == b->ip4;
> +}
Again, shouldn't have entirely separate function for v4;
either use memcmp() on the whole thing, or just #ifdef the v6
case in the switch.
> +
> +static inline bool vxlan_ip_any(const struct vxlan_ip *ipa)
> +{
> + return ipa->ip4 == htonl(INADDR_ANY);
> +}
ditto. And also for nla_get/put funcs.
> @@ -617,7 +708,14 @@ static int vxlan_join_group(struct net_device *dev)
> /* Need to drop RTNL to call multicast join */
> rtnl_unlock();
> lock_sock(sk);
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (vxlan->gaddr.proto == AF_INET)
> + err = ip_mc_join_group(sk, &mreq);
> + else
> + err = ipv6_sock_mc_join(sk, vxlan->link, &vxlan->gaddr.ip6);
> +#else
> err = ip_mc_join_group(sk, &mreq);
> +#endif
#if IS_ENABLED(CONFIG_IPV6)
if (vxlan->gaddr.proto == AF_INET6)
err = ipv6_sock_mc_join....
else
#endif
err = ip_mc_join_group..
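Spelled out as something that compiles in userspace, the arrangement looks roughly like this. HAVE_IPV6 is a stand-in for IS_ENABLED(CONFIG_IPV6), and join_v4()/join_v6() are stubs in place of the real join calls, returning distinct values only so the chosen path is observable:

```c
#include <assert.h>
#include <sys/socket.h>

/* Stand-in for IS_ENABLED(CONFIG_IPV6); set to 0 for a v4-only build. */
#ifndef HAVE_IPV6
#define HAVE_IPV6 1
#endif

/* Hypothetical stubs in place of ip_mc_join_group() / ipv6_sock_mc_join(). */
static int join_v4(void) { return 4; }
#if HAVE_IPV6
static int join_v6(void) { return 6; }
#endif

static int join_group(int family)
{
	int err;

	assert(family == AF_INET || family == AF_INET6);

#if HAVE_IPV6
	if (family == AF_INET6)
		err = join_v6();
	else
#endif
		err = join_v4();	/* the only v4 call site */

	return err;
}
```

With HAVE_IPV6 set to 0 the preprocessor leaves just the v4 call, so the v4 path is written once and is identical in both configurations.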
> @@ -644,7 +742,14 @@ static int vxlan_leave_group(struct net_device *dev)
> /* Need to drop RTNL to call multicast leave */
> rtnl_unlock();
> lock_sock(sk);
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (vxlan->gaddr.proto == AF_INET)
> + err = ip_mc_leave_group(sk, &mreq);
> + else
> + err = ipv6_sock_mc_drop(sk, vxlan->link, &vxlan->gaddr.ip6);
> +#else
> err = ip_mc_leave_group(sk, &mreq);
> +#endif
> release_sock(sk);
> rtnl_lock();
Again, arrange as above so only one instance of v4 leave
and #ifdefs for the optional v6 code.
> +#if IS_ENABLED(CONFIG_IPV6)
> + if (oip6) {
> + err = IP6_ECN_decapsulate(oip6, skb);
> + if (unlikely(err)) {
> + if (log_ecn_error)
> + net_info_ratelimited("non-ECT from %pI4\n",
[should be %pI6 above, not %pI4]
> + &oip6->saddr);
> + if (err > 1) {
> + ++vxlan->dev->stats.rx_frame_errors;
> + ++vxlan->dev->stats.rx_errors;
> + goto drop;
> + }
> + }
> + }
> +#endif
> + if (oip) {
> + err = IP_ECN_decapsulate(oip, skb);
> + if (unlikely(err)) {
> + if (log_ecn_error)
> + net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
> + &oip->saddr, oip->tos);
> + if (err > 1) {
> + ++vxlan->dev->stats.rx_frame_errors;
> + ++vxlan->dev->stats.rx_errors;
> + goto drop;
> + }
> }
> }
These are duplicated sections of code -- would rather
see one common section for stats based on err, though the
logging would still need an ifdef.
I think the IP v6/v4 header and routing code can be
refactored better -- maybe separate functions to handle
each, since it's all one or the other.
+-DLS
^ permalink raw reply
* be2net: GRO for non-inet protocols
From: Erik Hugne @ 2013-04-05 13:20 UTC (permalink / raw)
To: sathya.perla, subbu.seetharaman, ajit.khaparde; +Cc: netdev
I'm adding GRO support for the TIPC protocol and
tried to test it on a pair of HP G7 blades with
Emulex Corporation OneConnect 10Gb (be3) (rev 01) NICs.
However, my GRO callback is never invoked on this setup, and
I suspect the be2net driver is to blame.
I had a brief look at the driver, first suspecting that the
do_gro check only passes for inet protocols (tcpf must be set
in the be_rx_compl_info struct).
However, removing this check did not change the behavior.
#ethtool -i eth1
driver: be2net
version: 4.4.161.0s
firmware-version: 4.1.450.7
bus-info: 0000:02:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
It works as expected on another setup with Intel NICs (e1000 driver).
//E
^ permalink raw reply
* Re: bnx2x multicast packet loss on igmp join/leave
From: vincent Richard @ 2013-04-05 13:06 UTC (permalink / raw)
To: Dmitry Kravkov; +Cc: netdev@vger.kernel.org
In-Reply-To: <CAM8tLiMrMtrqzZKPK+65UFoojcAfWBeeBTWhsU7ALtaOSozwLw@mail.gmail.com>
2013/4/5, Dmitry Kravkov <dkravkov@gmail.com>:
> On Fri, Apr 5, 2013 at 1:13 PM, vincent Richard
> <vincent.richard449@gmail.com> wrote:
>>
>> Hi all,
>>
>> I am using a HP server with bnx2x ethernet cards in it.
>> I have compiled the latest linux 3.2 version, and I am encountering
>> the following error :
>>
>> when joining or leaving a multicast group, other multicast streams
>> lose some packets.
>>
>> I tried latest firmware and different drivers (latest HP recommended
>> one (redhat one)) but still the same..
>>
>> I observe that putting the interface in promiscuous mode makes the
>> problem disappear.
>>
>> Looking around bnx2x_set_rx_mode, I notice that there are 3 rx_modes:
>> NORMAL, PROMISCUOUS and ALLMULTI. Forcing any mode other than NORMAL
>> fixes my issue.
>>
>> My bnx2x is in a blade center, so there is a switch just after it. So
>> forcing PROMISCUOUS or ALLMULTI mode has no major impact.
>>
>> To reproduce the problem I just "listen" 2 multicast streams and
>> join/leave another one periodically.
>>
>> Thanks in advance for your help.
>> Regards,
>> Vincent
>
> This is because, on an MC group change, bnx2x first removes the device from
> ALL MC groups and then
> adds the device to the newly configured groups:
>
>         /* first, clear all configured multicast MACs */
>         rc = bnx2x_config_mcast(bp, &rparam, BNX2X_MCAST_CMD_DEL);
>         if (rc < 0) {
>                 BNX2X_ERR("Failed to clear multicast configuration: %d\n", rc);
>                 return rc;
>         }
>
>         /* then, configure a new MACs list */
>         if (netdev_mc_count(dev)) {
>                 rc = bnx2x_init_mcast_macs_list(bp, &rparam);
>
Thanks for the quick answer. BTW, I am very surprised by this behavior.
It is the first time I have encountered (or seen) this.
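If I understand the explanation correctly, the groups affected on each join/leave are exactly those present both before and after the change. The following is only a userspace model of that reasoning, not bnx2x code; mcast_groups_interrupted() is a name I made up, and MAC addresses are packed into 64-bit integers for simplicity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: given the old and new multicast MAC lists, count the
 * entries present in both. With a "delete all, then add the new list"
 * update, these are the groups that are briefly unfiltered even though the
 * application never left them.
 */
static size_t mcast_groups_interrupted(const uint64_t *old_list, size_t n_old,
				       const uint64_t *new_list, size_t n_new)
{
	size_t i, j, hit = 0;

	assert(old_list != NULL && new_list != NULL);

	for (i = 0; i < n_old; i++) {
		for (j = 0; j < n_new; j++) {
			if (old_list[i] == new_list[j]) {
				hit++;
				break;
			}
		}
	}
	return hit;
}
```

In my test (two streams being listened to, a third joined and left periodically) this count is 2 on every change, which matches the two streams that lose packets.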
^ permalink raw reply
* Re: [RFC PATCH ipsec] xfrm: use the right dev to fill xdst
From: Daniel Baluta @ 2013-04-05 12:59 UTC (permalink / raw)
To: Steffen Klassert; +Cc: Nicolas Dichtel, herbert, davem, netdev
In-Reply-To: <20130405094629.GV21448@secunet.com>
On Fri, Apr 5, 2013 at 12:46 PM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Thu, Apr 04, 2013 at 05:12:42PM +0200, Nicolas Dichtel wrote:
>> Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
>> broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
>> interface, hence in6_dev_get(dev) returns NULL).
>
> Can you give some information on how to reproduce this? I'm running
> interfamily tunnels on our testing environment and it seems to
> work fine.
I can hit this in our setup while using some internal custom simulated
interfaces.
Anyhow, this should be reproducible with a classic IPv6 IPsec over
IPv4 test. Please make sure that the IPv4 interface doesn't have an
IPv6 address set up.
Quoting from commit bc8e4b954e46 (xfrm6: ensure to use the same dev
when building a bundle):
- xdst->u.rt6.rt6i_idev = in6_dev_get(rt->u.dst.dev);
+ xdst->u.rt6.rt6i_idev = in6_dev_get(dev);
dev points to the IPv4 endpoint, and if it doesn't have an IPv6 address
associated, then in6_dev_get(dev) will return NULL.
>
>>
>> After looking again into commit 25ee3286dcbc ([IPSEC]: Merge common code into
>> xfrm_bundle_create), it seems that previously we were using dev from the route,
>> for both IPv4 and IPv6.
>
> I think this was the right way. We need to attach the dev from the
> corresponding route to the xdst.
>
>>
>> In fact, xfrm_fill_dst() is called during a loop on chained dst, but dev points
>> always to the same device.
>
> The way we do it now can be problematic for tunnel in tunnel scenarios too.
> We assign the dev from the first tunnel route to all the bundle entries,
> this looks really wrong.
>
> I think your patch is correct, but I want to understand the breaking
> scenario first.
thanks,
Daniel.
^ permalink raw reply
* [PATCH net-next 3/4] netxen_nic: Add module parameters support to switch interrupts
From: Manish Chopra @ 2013-04-05 12:36 UTC (permalink / raw)
To: davem; +Cc: netdev, rajesh.borundia, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1365165403-18348-1-git-send-email-manish.chopra@qlogic.com>
o Adapters do not support co-existence of legacy interrupts with
MSI-X or MSI among multiple functions. Prevent attaching
to a function during normal load if MSI-X or MSI vectors are not available.
o Using the use_msi=0 and use_msi_x=0 module parameters, the driver can be loaded in
legacy mode as well as in MSI mode for all functions in the adapter.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
.../net/ethernet/qlogic/netxen/netxen_nic_main.c | 34 +++++++++++++-------
1 files changed, 22 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index cde5109..ff113e7 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -52,9 +52,13 @@ static int port_mode = NETXEN_PORT_MODE_AUTO_NEG;
/* Default to restricted 1G auto-neg mode */
static int wol_port_mode = 5;
-static int use_msi = 1;
+static int netxen_use_msi = 1;
+MODULE_PARM_DESC(use_msi, "MSI interrupt (0=disabled, 1=enabled)");
+module_param_named(use_msi, netxen_use_msi, int, 0444);
-static int use_msi_x = 1;
+static int netxen_use_msi_x = 1;
+MODULE_PARM_DESC(use_msi_x, "MSI-X interrupt (0=disabled, 1=enabled)");
+module_param_named(use_msi_x, netxen_use_msi_x, int, 0444);
static int auto_fw_reset = AUTO_FW_RESET_ENABLED;
module_param(auto_fw_reset, int, 0644);
@@ -592,8 +596,7 @@ static const struct net_device_ops netxen_netdev_ops = {
#endif
};
-static void
-netxen_setup_intr(struct netxen_adapter *adapter)
+static int netxen_setup_intr(struct netxen_adapter *adapter)
{
struct netxen_legacy_intr_set *legacy_intrp;
struct pci_dev *pdev = adapter->pdev;
@@ -644,7 +647,7 @@ netxen_setup_intr(struct netxen_adapter *adapter)
adapter->max_sds_rings = num_msix;
dev_info(&pdev->dev, "using msi-x interrupts\n");
- return;
+ return err;
}
if (err > 0)
@@ -653,17 +656,21 @@ netxen_setup_intr(struct netxen_adapter *adapter)
/* fall through for msi */
}
- if (use_msi && !pci_enable_msi(pdev)) {
+ if (netxen_use_msi && !pci_enable_msi(pdev)) {
adapter->flags |= NETXEN_NIC_MSI_ENABLED;
adapter->tgt_status_reg = netxen_get_ioaddr(adapter,
msi_tgt_status[adapter->ahw.pci_func]);
dev_info(&pdev->dev, "using msi interrupts\n");
adapter->msix_entries[0].vector = pdev->irq;
- return;
+ return 0;
}
+ if (netxen_use_msi || netxen_use_msi_x)
+ return -EOPNOTSUPP;
+
dev_info(&pdev->dev, "using legacy interrupts\n");
adapter->msix_entries[0].vector = pdev->irq;
+ return 0;
}
static void
@@ -879,8 +886,8 @@ netxen_check_options(struct netxen_adapter *adapter)
adapter->msix_supported = 0;
if (NX_IS_REVISION_P3(adapter->ahw.revision_id)) {
- adapter->msix_supported = !!use_msi_x;
- adapter->rss_supported = !!use_msi_x;
+ adapter->msix_supported = !!netxen_use_msi_x;
+ adapter->rss_supported = !!netxen_use_msi_x;
} else {
u32 flashed_ver = 0;
netxen_rom_fast_read(adapter,
@@ -891,8 +898,8 @@ netxen_check_options(struct netxen_adapter *adapter)
switch (adapter->ahw.board_type) {
case NETXEN_BRDTYPE_P2_SB31_10G:
case NETXEN_BRDTYPE_P2_SB31_10G_CX4:
- adapter->msix_supported = !!use_msi_x;
- adapter->rss_supported = !!use_msi_x;
+ adapter->msix_supported = !!netxen_use_msi_x;
+ adapter->rss_supported = !!netxen_use_msi_x;
break;
default:
break;
@@ -1510,7 +1517,10 @@ netxen_nic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
netxen_nic_clear_stats(adapter);
- netxen_setup_intr(adapter);
+ if (netxen_setup_intr(adapter)) {
+ dev_err(&adapter->pdev->dev, "failed to setup interrupts\n");
+ goto err_out_decr_ref;
+ }
err = netxen_setup_netdev(adapter, netdev);
if (err)
--
1.5.6
^ permalink raw reply related
* [PATCH net-next 1/4] netxen_nic: Fix mismatched board type error
From: Manish Chopra @ 2013-04-05 12:36 UTC (permalink / raw)
To: davem; +Cc: netdev, rajesh.borundia, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1365165403-18348-1-git-send-email-manish.chopra@qlogic.com>
o Display unknown board name and serial number
in case of mismatched board type
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
drivers/net/ethernet/qlogic/netxen/netxen_nic.h | 10 +++++++---
.../net/ethernet/qlogic/netxen/netxen_nic_main.c | 4 +++-
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
index 322a36b..e5ab1ec 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
@@ -1855,7 +1855,7 @@ static const struct netxen_brdinfo netxen_boards[] = {
#define NUM_SUPPORTED_BOARDS ARRAY_SIZE(netxen_boards)
-static inline void get_brd_name_by_type(u32 type, char *name)
+static inline int netxen_nic_get_brd_name_by_type(u32 type, char *name)
{
int i, found = 0;
for (i = 0; i < NUM_SUPPORTED_BOARDS; ++i) {
@@ -1864,10 +1864,14 @@ static inline void get_brd_name_by_type(u32 type, char *name)
found = 1;
break;
}
+ }
+ if (!found) {
+ strcpy(name, "Unknown");
+ return -EINVAL;
}
- if (!found)
- name = "Unknown";
+
+ return 0;
}
static inline u32 netxen_tx_avail(struct nx_host_tx_ring *tx_ring)
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 7867aeb..6049637 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -841,7 +841,9 @@ netxen_check_options(struct netxen_adapter *adapter)
}
if (adapter->portnum == 0) {
- get_brd_name_by_type(adapter->ahw.board_type, brd_name);
+ if (netxen_nic_get_brd_name_by_type(adapter->ahw.board_type,
+ brd_name))
+ strcpy(serial_num, "Unknown");
pr_info("%s: %s Board S/N %s Chip rev 0x%x\n",
module_name(THIS_MODULE),
--
1.5.6
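
A note on why the old code never reported "Unknown": assigning a string literal to a pointer parameter only redirects the callee's local copy, so the caller's buffer keeps whatever it held before. The sketch below reproduces that pitfall and the strcpy-based fix in plain userspace C. The table contents and the names `boards`, `lookup_broken` and `lookup_fixed` are made up for illustration and are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical board table; the driver's netxen_boards[] carries more fields. */
struct brdinfo {
	unsigned int type;
	const char *name;
};

static const struct brdinfo boards[] = {
	{ 1, "Example Board A" },
	{ 2, "Example Board B" },
};
#define NUM_BOARDS (sizeof(boards) / sizeof(boards[0]))

/* Old behaviour: on a miss only the local pointer changes. */
static void lookup_broken(unsigned int type, char *name)
{
	size_t i;

	for (i = 0; i < NUM_BOARDS; i++) {
		if (boards[i].type == type) {
			strcpy(name, boards[i].name);
			return;
		}
	}
	name = "Unknown";	/* caller's buffer is left untouched */
	(void)name;
}

/* New behaviour: copy into the caller's buffer and report the miss. */
static int lookup_fixed(unsigned int type, char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < NUM_BOARDS; i++) {
		if (boards[i].type == type) {
			strcpy(name, boards[i].name);
			return 0;
		}
	}
	strcpy(name, "Unknown");
	return -EINVAL;
}
```

Returning an error code as well lets the caller react to the miss, which is what the patch uses to overwrite the serial number string too.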
* [PATCH net-next 0/4] netxen_nic: Bug fixes and enhancement
From: Manish Chopra @ 2013-04-05 12:36 UTC (permalink / raw)
To: davem; +Cc: netdev, rajesh.borundia, Dept_NX_Linux_NIC_Driver
Please apply this patch series to net-next.
Thanks,
Manish
Manish Chopra (4):
netxen_nic: Fix mismatched board type error
netxen_nic: Log driver version with firmware version
netxen_nic: Add module parameters support to switch interrupts
netxen_nic: Bump up the version to 4.0.81
* [PATCH net-next 2/4] netxen_nic: Log driver version with firmware version
From: Manish Chopra @ 2013-04-05 12:36 UTC (permalink / raw)
To: davem; +Cc: netdev, rajesh.borundia, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1365165403-18348-1-git-send-email-manish.chopra@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
---
.../net/ethernet/qlogic/netxen/netxen_nic_main.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
index 6049637..cde5109 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_main.c
@@ -862,9 +862,9 @@ netxen_check_options(struct netxen_adapter *adapter)
adapter->ahw.cut_through = (i & 0x8000) ? 1 : 0;
}
- dev_info(&pdev->dev, "firmware v%d.%d.%d [%s]\n",
- fw_major, fw_minor, fw_build,
- adapter->ahw.cut_through ? "cut-through" : "legacy");
+ dev_info(&pdev->dev, "Driver v%s, firmware v%d.%d.%d [%s]\n",
+ NETXEN_NIC_LINUX_VERSIONID, fw_major, fw_minor, fw_build,
+ adapter->ahw.cut_through ? "cut-through" : "legacy");
if (adapter->fw_version >= NETXEN_VERSION_CODE(4, 0, 222))
adapter->capabilities = NXRD32(adapter, CRB_FW_CAPABILITIES_1);
--
1.5.6
* [PATCH net-next 4/4] netxen_nic: Bump up the version to 4.0.81
From: Manish Chopra @ 2013-04-05 12:36 UTC (permalink / raw)
To: davem; +Cc: netdev, rajesh.borundia, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1365165403-18348-1-git-send-email-manish.chopra@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
---
drivers/net/ethernet/qlogic/netxen/netxen_nic.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
index e5ab1ec..3fe09ab 100644
--- a/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
+++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic.h
@@ -53,8 +53,8 @@
#define _NETXEN_NIC_LINUX_MAJOR 4
#define _NETXEN_NIC_LINUX_MINOR 0
-#define _NETXEN_NIC_LINUX_SUBVERSION 80
-#define NETXEN_NIC_LINUX_VERSIONID "4.0.80"
+#define _NETXEN_NIC_LINUX_SUBVERSION 81
+#define NETXEN_NIC_LINUX_VERSIONID "4.0.81"
#define NETXEN_VERSION_CODE(a, b, c) (((a) << 24) + ((b) << 16) + (c))
#define _major(v) (((v) >> 24) & 0xff)
--
1.5.6
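
For reference, the version-code packing visible in the context lines of this hunk can be tried out in userspace. The sketch below mirrors NETXEN_VERSION_CODE() and _major() as quoted in the diff. The minor and build extractors and the helper `fw_has_capabilities` are assumptions added for illustration; only the 4.0.222 threshold comes from the netxen_check_options() context in patch 2/4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the macro in the diff: major in bits 31..24, minor in 23..16. */
#define VERSION_CODE(a, b, c) \
	(((uint32_t)(a) << 24) + ((uint32_t)(b) << 16) + (uint32_t)(c))
#define VER_MAJOR(v) (((v) >> 24) & 0xff)
/* Assumed counterparts for the remaining fields (not shown in the diff). */
#define VER_MINOR(v) (((v) >> 16) & 0xff)
#define VER_BUILD(v) ((v) & 0xffff)

/* 4.0.222 packs to 0x040000de, so plain integer comparison orders versions. */
static_assert(VERSION_CODE(4, 0, 222) == 0x040000deu, "unexpected packing");

/* Hypothetical helper mirroring the fw_version >= 4.0.222 check. */
static bool fw_has_capabilities(uint32_t fw_version)
{
	return fw_version >= VERSION_CODE(4, 0, 222);
}
```

Because the major number occupies the most significant byte, comparing two packed values compares major first, then minor, then build.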
* Re: bnx2x multicast packet loss on igmp join/leave
From: Dmitry Kravkov @ 2013-04-05 12:28 UTC (permalink / raw)
To: vincent Richard; +Cc: netdev@vger.kernel.org
In-Reply-To: <CABVbdp_R7H5tWfFUF_8c=4GWLC80Vr8SajZskc8C51Hduue49Q@mail.gmail.com>
On Fri, Apr 5, 2013 at 1:13 PM, vincent Richard
<vincent.richard449@gmail.com> wrote:
>
> Hi all,
>
> I am using an HP server with bnx2x ethernet cards in it.
> I have compiled the latest Linux 3.2 version, and I am encountering
> the following problem:
>
> when joining or leaving a multicast group, other multicast streams
> lose some packets.
>
> I tried the latest firmware and different drivers (the latest
> HP-recommended one (the Red Hat one)), but the result is still the same.
>
> I observe that putting the interface in promiscuous mode makes the
> problem disappear.
>
> Looking around bnx2x_set_rx_mode, I notice that there are 3 rx modes:
> NORMAL, PROMISCUOUS and ALLMULTI. Forcing any mode other than NORMAL
> corrects my issue.
>
> My bnx2x is in a blade center, so there is a switch just after it. So
> forcing PROMISCUOUS or ALLMULTI mode has no major impact.
>
> To reproduce the problem I just "listen" to 2 multicast streams and
> join/leave another one periodically.
>
> Thanks in advance for your help.
> Regards,
> Vincent
This is because on an MC group change, bnx2x first removes the device from
ALL MC groups and then adds the device to the newly configured groups:
	/* first, clear all configured multicast MACs */
	rc = bnx2x_config_mcast(bp, &rparam, BNX2X_MCAST_CMD_DEL);
	if (rc < 0) {
		BNX2X_ERR("Failed to clear multicast configuration: %d\n", rc);
		return rc;
	}

	/* then, configure a new MACs list */
	if (netdev_mc_count(dev)) {
		rc = bnx2x_init_mcast_macs_list(bp, &rparam);
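
To make the window concrete, here is a toy model of the two update strategies. It is only a sketch: each bit of `mc_set` stands for one hypothetical multicast group, and neither function corresponds to a bnx2x API. Whether the hardware or firmware could support an incremental update is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per hypothetical multicast group in a filter table. */
typedef uint32_t mc_set;

/*
 * Clear-then-add, as described above: the intermediate filter state is
 * empty, so groups kept across the update are briefly not accepted.
 * Returns the intermediate state for inspection.
 */
static mc_set clear_then_add(mc_set *hw, mc_set wanted)
{
	mc_set mid;

	assert(hw != NULL);
	*hw = 0;		/* every group is dropped here */
	mid = *hw;
	*hw = wanted;
	return mid;
}

/*
 * Delta update: remove only stale groups, then add only new ones.
 * Groups present before and after stay configured throughout.
 */
static mc_set delta_update(mc_set *hw, mc_set wanted)
{
	mc_set mid;

	assert(hw != NULL);
	*hw &= wanted;		/* drop groups that are no longer wanted */
	mid = *hw;
	*hw |= wanted;		/* add the newly joined groups */
	return mid;
}
```

With groups 0 and 1 configured and group 2 being joined, the first strategy passes through an empty filter, while the second keeps groups 0 and 1 accepted the whole time.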
* [Patch net-next v2 4/4] ipv6: Add generic UDP Tunnel segmentation
From: Cong Wang @ 2013-04-05 12:16 UTC (permalink / raw)
To: netdev
Cc: Jesse Gross, Pravin B Shelar, Stephen Hemminger, David S. Miller,
Cong Wang
In-Reply-To: <1365164186-21719-1-git-send-email-amwang@redhat.com>
From: Cong Wang <amwang@redhat.com>
Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
(tunneling: Add generic Tunnel segmentation)
This patch adds generic tunneling offloading support for IPv6-UDP based
tunnels.
This can be used by tunneling protocols like VXLAN.
Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
net/ipv6/ip6_offload.c | 4 +-
net/ipv6/udp_offload.c | 155 +++++++++++++++++++++++++++++++++---------------
2 files changed, 110 insertions(+), 49 deletions(-)
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 71b766e..f031ccf 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -91,6 +91,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
unsigned int unfrag_ip6hlen;
u8 *prevhdr;
int offset = 0;
+ bool tunnel;
if (unlikely(skb_shinfo(skb)->gso_type &
~(SKB_GSO_UDP |
@@ -105,6 +106,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
goto out;
+ tunnel = !!skb->encapsulation;
ipv6h = ipv6_hdr(skb);
__skb_pull(skb, sizeof(*ipv6h));
segs = ERR_PTR(-EPROTONOSUPPORT);
@@ -125,7 +127,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
ipv6h = ipv6_hdr(skb);
ipv6h->payload_len = htons(skb->len - skb->mac_len -
sizeof(*ipv6h));
- if (proto == IPPROTO_UDP) {
+ if (!tunnel && proto == IPPROTO_UDP) {
unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
fptr = (struct frag_hdr *)(skb_network_header(skb) +
unfrag_ip6hlen);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 3bb3a89..bbde7ba 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -21,26 +21,81 @@ static int udp6_ufo_send_check(struct sk_buff *skb)
const struct ipv6hdr *ipv6h;
struct udphdr *uh;
- /* UDP Tunnel offload on ipv6 is not yet supported. */
- if (skb->encapsulation)
- return -EINVAL;
-
if (!pskb_may_pull(skb, sizeof(*uh)))
return -EINVAL;
- ipv6h = ipv6_hdr(skb);
- uh = udp_hdr(skb);
+ if (likely(!skb->encapsulation)) {
+ ipv6h = ipv6_hdr(skb);
+ uh = udp_hdr(skb);
+
+ uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
+ IPPROTO_UDP, 0);
+ skb->csum_start = skb_transport_header(skb) - skb->head;
+ skb->csum_offset = offsetof(struct udphdr, check);
+ skb->ip_summed = CHECKSUM_PARTIAL;
+ }
- uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
- IPPROTO_UDP, 0);
- skb->csum_start = skb_transport_header(skb) - skb->head;
- skb->csum_offset = offsetof(struct udphdr, check);
- skb->ip_summed = CHECKSUM_PARTIAL;
return 0;
}
+static struct sk_buff *skb_udp6_tunnel_segment(struct sk_buff *skb,
+ netdev_features_t features)
+{
+ struct sk_buff *segs = ERR_PTR(-EINVAL);
+ int mac_len = skb->mac_len;
+ int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
+ int outer_hlen;
+ netdev_features_t enc_features;
+
+ if (unlikely(!pskb_may_pull(skb, tnl_hlen)))
+ goto out;
+
+ skb->encapsulation = 0;
+ __skb_pull(skb, tnl_hlen);
+ skb_reset_mac_header(skb);
+ skb_set_network_header(skb, skb_inner_network_offset(skb));
+ skb->mac_len = skb_inner_network_offset(skb);
+
+ /* segment inner packet. */
+ enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
+ segs = skb_mac_gso_segment(skb, enc_features);
+ if (!segs || IS_ERR(segs))
+ goto out;
+
+ outer_hlen = skb_tnl_header_len(skb);
+ skb = segs;
+ do {
+ struct udphdr *uh;
+ int udp_offset = outer_hlen - tnl_hlen;
+
+ skb->mac_len = mac_len;
+
+ skb_push(skb, outer_hlen);
+ skb_reset_mac_header(skb);
+ skb_set_network_header(skb, mac_len);
+ skb_set_transport_header(skb, udp_offset);
+ uh = udp_hdr(skb);
+ uh->len = htons(skb->len - udp_offset);
+
+ /* csum segment if tunnel sets skb with csum. */
+ if (unlikely(uh->check)) {
+ struct ipv6hdr *iph = ipv6_hdr(skb);
+
+ uh->check = csum_ipv6_magic(&iph->saddr, &iph->daddr,
+ skb->len - udp_offset,
+ IPPROTO_UDP, 0);
+ if (uh->check == 0)
+ uh->check = CSUM_MANGLED_0;
+
+ }
+ skb->ip_summed = CHECKSUM_NONE;
+ } while ((skb = skb->next));
+out:
+ return segs;
+}
+
static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
- netdev_features_t features)
+ netdev_features_t features)
{
struct sk_buff *segs = ERR_PTR(-EINVAL);
unsigned int mss;
@@ -73,43 +128,47 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
goto out;
}
- /* Do software UFO. Complete and fill in the UDP checksum as HW cannot
- * do checksum of UDP packets sent as multiple IP fragments.
- */
- offset = skb_checksum_start_offset(skb);
- csum = skb_checksum(skb, offset, skb->len - offset, 0);
- offset += skb->csum_offset;
- *(__sum16 *)(skb->data + offset) = csum_fold(csum);
- skb->ip_summed = CHECKSUM_NONE;
-
- /* Check if there is enough headroom to insert fragment header. */
- if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
- pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
- goto out;
+ if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
+ segs = skb_udp6_tunnel_segment(skb, features);
+ else {
+ /* Do software UFO. Complete and fill in the UDP checksum as HW cannot
+ * do checksum of UDP packets sent as multiple IP fragments.
+ */
+ offset = skb_checksum_start_offset(skb);
+ csum = skb_checksum(skb, offset, skb->len - offset, 0);
+ offset += skb->csum_offset;
+ *(__sum16 *)(skb->data + offset) = csum_fold(csum);
+ skb->ip_summed = CHECKSUM_NONE;
+
+ /* Check if there is enough headroom to insert fragment header. */
+ if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
+ pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
+ goto out;
- /* Find the unfragmentable header and shift it left by frag_hdr_sz
- * bytes to insert fragment header.
- */
- unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
- nexthdr = *prevhdr;
- *prevhdr = NEXTHDR_FRAGMENT;
- unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
- unfrag_ip6hlen;
- mac_start = skb_mac_header(skb);
- memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
-
- skb->mac_header -= frag_hdr_sz;
- skb->network_header -= frag_hdr_sz;
-
- fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
- fptr->nexthdr = nexthdr;
- fptr->reserved = 0;
- ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
-
- /* Fragment the skb. ipv6 header and the remaining fields of the
- * fragment header are updated in ipv6_gso_segment()
- */
- segs = skb_segment(skb, features);
+ /* Find the unfragmentable header and shift it left by frag_hdr_sz
+ * bytes to insert fragment header.
+ */
+ unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+ nexthdr = *prevhdr;
+ *prevhdr = NEXTHDR_FRAGMENT;
+ unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
+ unfrag_ip6hlen;
+ mac_start = skb_mac_header(skb);
+ memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
+
+ skb->mac_header -= frag_hdr_sz;
+ skb->network_header -= frag_hdr_sz;
+
+ fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
+ fptr->nexthdr = nexthdr;
+ fptr->reserved = 0;
+ ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+
+ /* Fragment the skb. ipv6 header and the remaining fields of the
+ * fragment header are updated in ipv6_gso_segment()
+ */
+ segs = skb_segment(skb, features);
+ }
out:
return segs;
--
1.7.7.6
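
A side note on the CSUM_MANGLED_0 step in skb_udp6_tunnel_segment(): in UDP a transmitted checksum field of zero means "no checksum", so a computed value of zero is sent as 0xffff instead. The userspace sketch below shows that rule on top of a plain Internet checksum. It sums only the bytes it is given and leaves out the IPv6 pseudo-header that csum_ipv6_magic() covers; all function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words of a buffer to a running 32-bit sum. */
static uint32_t csum_add_bytes(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	assert(p != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* pad odd byte with zero */
	return sum;
}

/* Fold the carries back in and take the one's complement. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Value to place in the UDP checksum field: zero is mangled to 0xffff. */
static uint16_t udp_csum_field(const uint8_t *data, size_t len)
{
	uint16_t c = csum_fold32(csum_add_bytes(data, len, 0));

	return c ? c : 0xffff;
}
```

In one's complement arithmetic 0x0000 and 0xffff both represent zero, which is why the substitution does not change what the receiver verifies.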
* [PATCH iproute2] vxlan: add ipv6 support
From: Cong Wang @ 2013-04-05 12:16 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, Cong Wang
In-Reply-To: <1365164186-21719-1-git-send-email-amwang@redhat.com>
From: Cong Wang <amwang@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
include/linux/if_link.h | 2 ++
ip/iplink_vxlan.c | 45 ++++++++++++++++++++++++++++++++++++++-------
2 files changed, 40 insertions(+), 7 deletions(-)
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 40167af..f74b8cc 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -306,6 +306,8 @@ enum {
IFLA_VXLAN_RSC,
IFLA_VXLAN_L2MISS,
IFLA_VXLAN_L3MISS,
+ IFLA_VXLAN_GROUP6,
+ IFLA_VXLAN_LOCAL6,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 1025326..c10ec0f 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -42,6 +42,8 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
int vni_set = 0;
__u32 saddr = 0;
__u32 gaddr = 0;
+ struct in6_addr saddr6 = IN6ADDR_ANY_INIT;
+ struct in6_addr gaddr6 = IN6ADDR_ANY_INIT;
unsigned link = 0;
__u8 tos = 0;
__u8 ttl = 0;
@@ -65,15 +67,26 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
vni_set = 1;
} else if (!matches(*argv, "group")) {
NEXT_ARG();
- gaddr = get_addr32(*argv);
-
- if (!IN_MULTICAST(ntohl(gaddr)))
- invarg("invald group address", *argv);
+ if (!inet_pton(AF_INET, *argv, &gaddr)) {
+ if (!inet_pton(AF_INET6, *argv, &gaddr6)) {
+ fprintf(stderr, "Invalid address \"%s\"\n", *argv);
+ return -1;
+ } else if (!IN6_IS_ADDR_MULTICAST(&gaddr6))
+ invarg("invalid group address", *argv);
+ } else if (!IN_MULTICAST(ntohl(gaddr)))
+ invarg("invalid group address", *argv);
} else if (!matches(*argv, "local")) {
NEXT_ARG();
- if (strcmp(*argv, "any"))
- saddr = get_addr32(*argv);
- if (IN_MULTICAST(ntohl(saddr)))
+ if (strcmp(*argv, "any")) {
+ if (!inet_pton(AF_INET, *argv, &saddr)) {
+ if (!inet_pton(AF_INET6, *argv, &saddr6)) {
+ fprintf(stderr, "Invalid address \"%s\"\n", *argv);
+ return -1;
+ }
+ }
+ }
+
+ if (IN_MULTICAST(ntohl(saddr)) || IN6_IS_ADDR_MULTICAST(&saddr6))
invarg("invalid local address", *argv);
} else if (!matches(*argv, "dev")) {
NEXT_ARG();
@@ -163,8 +176,14 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
addattr32(n, 1024, IFLA_VXLAN_ID, vni);
if (gaddr)
addattr_l(n, 1024, IFLA_VXLAN_GROUP, &gaddr, 4);
+ else if (memcmp(&gaddr6, &in6addr_any, sizeof(gaddr6)) != 0)
+ addattr_l(n, 1024, IFLA_VXLAN_GROUP6, &gaddr6, sizeof(struct in6_addr));
+
if (saddr)
addattr_l(n, 1024, IFLA_VXLAN_LOCAL, &saddr, 4);
+ else if (memcmp(&saddr6, &in6addr_any, sizeof(saddr6)) != 0)
+ addattr_l(n, 1024, IFLA_VXLAN_LOCAL6, &saddr6, sizeof(struct in6_addr));
+
if (link)
addattr32(n, 1024, IFLA_VXLAN_LINK, link);
addattr8(n, 1024, IFLA_VXLAN_TTL, ttl);
@@ -211,6 +230,12 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (addr)
fprintf(f, "group %s ",
format_host(AF_INET, 4, &addr, s1, sizeof(s1)));
+ } else if (tb[IFLA_VXLAN_GROUP6]) {
+ struct in6_addr addr;
+ memcpy(&addr, RTA_DATA(tb[IFLA_VXLAN_GROUP6]), sizeof(struct in6_addr));
+ if (memcmp(&addr, &in6addr_any, sizeof(addr)) != 0)
+ fprintf(f, "group %s ",
+ format_host(AF_INET6, sizeof(struct in6_addr), &addr, s1, sizeof(s1)));
}
if (tb[IFLA_VXLAN_LOCAL]) {
@@ -218,6 +243,12 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (addr)
fprintf(f, "local %s ",
format_host(AF_INET, 4, &addr, s1, sizeof(s1)));
+ } else if (tb[IFLA_VXLAN_LOCAL6]) {
+ struct in6_addr addr;
+ memcpy(&addr, RTA_DATA(tb[IFLA_VXLAN_LOCAL6]), sizeof(struct in6_addr));
+ if (memcmp(&addr, &in6addr_any, sizeof(addr)) != 0)
+ fprintf(f, "local %s ",
+ format_host(AF_INET6, sizeof(struct in6_addr), &addr, s1, sizeof(s1)));
}
if (tb[IFLA_VXLAN_LINK] &&
--
1.7.7.6
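
The parsing approach in vxlan_parse_opt() (try the string as IPv4 first, then fall back to IPv6) can be exercised on its own. The following is a minimal sketch under the assumption of a POSIX system; `struct parsed_addr`, `parse_any_addr` and `is_multicast` are hypothetical names, not iproute2 helpers. It compares the inet_pton() result against 1, since inet_pton() returns 0 for an unparsable string and -1 for an unsupported family.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical container for an address of either family. */
struct parsed_addr {
	int family;		/* AF_INET or AF_INET6 once parsed */
	struct in_addr v4;
	struct in6_addr v6;
};

/* Try IPv4 first, then IPv6. Returns 0 on success, -1 if neither parses. */
static int parse_any_addr(const char *s, struct parsed_addr *out)
{
	assert(s != NULL && out != NULL);
	if (inet_pton(AF_INET, s, &out->v4) == 1) {
		out->family = AF_INET;
		return 0;
	}
	if (inet_pton(AF_INET6, s, &out->v6) == 1) {
		out->family = AF_INET6;
		return 0;
	}
	return -1;
}

/* Family-aware multicast test, as needed for the "group" argument. */
static int is_multicast(const struct parsed_addr *a)
{
	assert(a != NULL);
	if (a->family == AF_INET)
		return IN_MULTICAST(ntohl(a->v4.s_addr)) != 0;
	return IN6_IS_ADDR_MULTICAST(&a->v6) != 0;
}
```

Carrying the family alongside the address avoids having to infer it later from which of the two variables is non-zero.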
* [Patch net-next v2 3/4] vxlan: add ipv6 support
From: Cong Wang @ 2013-04-05 12:16 UTC (permalink / raw)
To: netdev; +Cc: David Stevens, Stephen Hemminger, David S. Miller, Cong Wang
In-Reply-To: <1365164186-21719-1-git-send-email-amwang@redhat.com>
From: Cong Wang <amwang@redhat.com>
v2: fix compile errors when !CONFIG_IPV6
improve some code based on Stephen's comments
use sockaddr as suggested by David
This patch adds IPv6 support to the vxlan device, as the latest version
of the draft already mentions it:
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03
Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
drivers/net/vxlan.c | 573 +++++++++++++++++++++++++++++++++---------
include/uapi/linux/if_link.h | 2 +
2 files changed, 453 insertions(+), 122 deletions(-)
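
The core of the change is an address type tagged with its family, plus helpers that compare, test for "any", and decode an attribute by its payload length (4 bytes for IPv4, 16 for IPv6). A rough userspace analogue is sketched here to make that idea easy to try outside the kernel. The names `union ip_any`, `ip_from_bytes`, `ip_equal` and `ip_is_any` are illustrative and are not the identifiers used in the diff.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Userspace stand-in for the patch's sockaddr-based union. */
union ip_any {
	struct sockaddr sa;
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
};

/* Pick the family from the payload length, like the netlink getter does. */
static int ip_from_bytes(union ip_any *ip, const void *data, size_t len)
{
	assert(ip != NULL && data != NULL);
	memset(ip, 0, sizeof(*ip));
	if (len == sizeof(struct in_addr)) {
		memcpy(&ip->sin.sin_addr, data, len);
		ip->sa.sa_family = AF_INET;
	} else if (len == sizeof(struct in6_addr)) {
		memcpy(&ip->sin6.sin6_addr, data, len);
		ip->sa.sa_family = AF_INET6;
	} else {
		return -EAFNOSUPPORT;
	}
	return 0;
}

/* Addresses of different families never compare equal. */
static bool ip_equal(const union ip_any *a, const union ip_any *b)
{
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	switch (a->sa.sa_family) {
	case AF_INET:
		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
	case AF_INET6:
		return memcmp(&a->sin6.sin6_addr, &b->sin6.sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	}
	return false;
}

/* True for 0.0.0.0 or ::, depending on the stored family. */
static bool ip_is_any(const union ip_any *ip)
{
	if (ip->sa.sa_family == AF_INET)
		return ip->sin.sin_addr.s_addr == htonl(INADDR_ANY);
	return IN6_IS_ADDR_UNSPECIFIED(&ip->sin6.sin6_addr);
}
```

Zeroing the union before filling it keeps the unused bytes deterministic, which matters if the value is later copied or compared as a whole.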
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cac4e4f..2a0f969 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -9,7 +9,6 @@
*
* TODO
* - use IANA UDP port number (when defined)
- * - IPv6 (not in RFC)
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -42,6 +41,11 @@
#include <net/inet_ecn.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/addrconf.h>
+#include <net/ip6_route.h>
+#include <net/ip6_tunnel.h>
+#endif
#define VXLAN_VERSION "0.1"
@@ -56,6 +60,7 @@
#define VXLAN_VID_MASK (VXLAN_N_VID - 1)
/* IP header + UDP + VXLAN + Ethernet header */
#define VXLAN_HEADROOM (20 + 8 + 8 + 14)
+#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
#define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags required value. */
@@ -81,9 +86,22 @@ struct vxlan_net {
struct hlist_head vni_list[VNI_HASH_SIZE];
};
+struct vxlan_ip {
+ union {
+ struct sockaddr_in sin;
+#if IS_ENABLED(CONFIG_IPV6)
+ struct sockaddr_in6 sin6;
+#endif
+ struct sockaddr sa;
+ } u;
+#define ip4 u.sin.sin_addr.s_addr
+#define ip6 u.sin6.sin6_addr
+#define proto u.sa.sa_family
+};
+
struct vxlan_rdst {
struct rcu_head rcu;
- __be32 remote_ip;
+ struct vxlan_ip remote_ip;
__be16 remote_port;
u32 remote_vni;
u32 remote_ifindex;
@@ -106,8 +124,8 @@ struct vxlan_dev {
struct hlist_node hlist;
struct net_device *dev;
__u32 vni; /* virtual network id */
- __be32 gaddr; /* multicast group */
- __be32 saddr; /* source address */
+ struct vxlan_ip gaddr; /* multicast group */
+ struct vxlan_ip saddr; /* source address */
unsigned int link; /* link to multicast over */
__u16 port_min; /* source port range */
__u16 port_max;
@@ -130,6 +148,80 @@ struct vxlan_dev {
#define VXLAN_F_L2MISS 0x08
#define VXLAN_F_L3MISS 0x10
+#if IS_ENABLED(CONFIG_IPV6)
+static inline bool vxlan_ip_equal(const struct vxlan_ip *a, const struct vxlan_ip *b)
+{
+ if (a->proto != b->proto)
+ return false;
+ switch (a->proto) {
+ case AF_INET:
+ return a->ip4 == b->ip4;
+ case AF_INET6:
+ return ipv6_addr_equal(&a->ip6, &b->ip6);
+ }
+ return false;
+}
+
+static inline bool vxlan_ip_any(const struct vxlan_ip *ipa)
+{
+ if (ipa->proto == AF_INET)
+ return ipa->ip4 == htonl(INADDR_ANY);
+ else
+ return ipv6_addr_any(&ipa->ip6);
+}
+
+static int vxlan_nla_get_addr(struct vxlan_ip *ip, struct nlattr *nla)
+{
+ if (nla_len(nla) == sizeof(__be32)) {
+ ip->ip4 = nla_get_be32(nla);
+ ip->proto = AF_INET;
+ } else if (nla_len(nla) == sizeof(struct in6_addr)) {
+ nla_memcpy(&ip->ip6, nla, sizeof(struct in6_addr));
+ ip->proto = AF_INET6;
+ } else
+ return -EAFNOSUPPORT;
+ return 0;
+}
+
+static int vxlan_nla_put_addr(struct sk_buff *skb, int attr, const struct vxlan_ip *ip)
+{
+ if (ip->proto == AF_INET)
+ return nla_put_be32(skb, attr, ip->ip4);
+ else if (ip->proto == AF_INET6)
+ return nla_put(skb, attr, sizeof(struct in6_addr), &ip->ip6);
+ else
+ return -EAFNOSUPPORT;
+}
+#else
+static inline bool vxlan_ip_equal(const struct vxlan_ip *a, const struct vxlan_ip *b)
+{
+ return a->ip4 == b->ip4;
+}
+
+static inline bool vxlan_ip_any(const struct vxlan_ip *ipa)
+{
+ return ipa->ip4 == htonl(INADDR_ANY);
+}
+
+static int vxlan_nla_get_addr(struct vxlan_ip *ip, struct nlattr *nla)
+{
+ if (nla_len(nla) == sizeof(__be32)) {
+ ip->ip4 = nla_get_be32(nla);
+ ip->proto = AF_INET;
+ return 0;
+ } else
+ return -EAFNOSUPPORT;
+}
+
+static int vxlan_nla_put_addr(struct sk_buff *skb, int attr, const struct vxlan_ip *ip)
+{
+ if (ip->proto == AF_INET)
+ return nla_put_be32(skb, attr, ip->ip4);
+ else
+ return -EAFNOSUPPORT;
+}
+#endif
+
/* salt for hash table */
static u32 vxlan_salt __read_mostly;
@@ -176,7 +268,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
if (type == RTM_GETNEIGH) {
ndm->ndm_family = AF_INET;
- send_ip = rdst->remote_ip != htonl(INADDR_ANY);
+ send_ip = !vxlan_ip_any(&rdst->remote_ip);
send_eth = !is_zero_ether_addr(fdb->eth_addr);
} else
ndm->ndm_family = AF_BRIDGE;
@@ -188,7 +280,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
goto nla_put_failure;
- if (send_ip && nla_put_be32(skb, NDA_DST, rdst->remote_ip))
+ if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
goto nla_put_failure;
if (rdst->remote_port && rdst->remote_port != vxlan_port &&
@@ -220,7 +312,7 @@ static inline size_t vxlan_nlmsg_size(void)
{
return NLMSG_ALIGN(sizeof(struct ndmsg))
+ nla_total_size(ETH_ALEN) /* NDA_LLADDR */
- + nla_total_size(sizeof(__be32)) /* NDA_DST */
+ + nla_total_size(sizeof(struct in6_addr)) /* NDA_DST */
+ nla_total_size(sizeof(__be32)) /* NDA_PORT */
+ nla_total_size(sizeof(__be32)) /* NDA_VNI */
+ nla_total_size(sizeof(__u32)) /* NDA_IFINDEX */
@@ -253,14 +345,14 @@ errout:
rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
}
-static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
+static void vxlan_ip_miss(struct net_device *dev, struct vxlan_ip *ipa)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
struct vxlan_fdb f;
memset(&f, 0, sizeof f);
f.state = NUD_STALE;
- f.remote.remote_ip = ipa; /* goes to NDA_DST */
+ f.remote.remote_ip = *ipa; /* goes to NDA_DST */
f.remote.remote_vni = VXLAN_N_VID;
vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
@@ -316,13 +408,13 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
/* Add/update destinations for multicast */
static int vxlan_fdb_append(struct vxlan_fdb *f,
- __be32 ip, __u32 port, __u32 vni, __u32 ifindex)
+ struct vxlan_ip *ip, __u32 port, __u32 vni, __u32 ifindex)
{
struct vxlan_rdst *rd_prev, *rd;
rd_prev = NULL;
for (rd = &f->remote; rd; rd = rd->remote_next) {
- if (rd->remote_ip == ip &&
+ if (vxlan_ip_equal(&rd->remote_ip, ip) &&
rd->remote_port == port &&
rd->remote_vni == vni &&
rd->remote_ifindex == ifindex)
@@ -332,7 +424,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
if (rd == NULL)
return -ENOBUFS;
- rd->remote_ip = ip;
+ rd->remote_ip = *ip;
rd->remote_port = port;
rd->remote_vni = vni;
rd->remote_ifindex = ifindex;
@@ -343,7 +435,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
/* Add new entry to forwarding table -- assumes lock held */
static int vxlan_fdb_create(struct vxlan_dev *vxlan,
- const u8 *mac, __be32 ip,
+ const u8 *mac, struct vxlan_ip *ip,
__u16 state, __u16 flags,
__u32 port, __u32 vni, __u32 ifindex)
{
@@ -383,7 +475,7 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
return -ENOMEM;
notify = 1;
- f->remote.remote_ip = ip;
+ f->remote.remote_ip = *ip;
f->remote.remote_port = port;
f->remote.remote_vni = vni;
f->remote.remote_ifindex = ifindex;
@@ -435,7 +527,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
{
struct vxlan_dev *vxlan = netdev_priv(dev);
struct net *net = dev_net(vxlan->dev);
- __be32 ip;
+ struct vxlan_ip ip;
u32 port, vni, ifindex;
int err;
@@ -448,10 +540,9 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
if (tb[NDA_DST] == NULL)
return -EINVAL;
- if (nla_len(tb[NDA_DST]) != sizeof(__be32))
- return -EAFNOSUPPORT;
-
- ip = nla_get_be32(tb[NDA_DST]);
+ err = vxlan_nla_get_addr(&ip, tb[NDA_DST]);
+ if (err)
+ return err;
if (tb[NDA_PORT]) {
if (nla_len(tb[NDA_PORT]) != sizeof(u32))
@@ -481,7 +572,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
ifindex = 0;
spin_lock_bh(&vxlan->hash_lock);
- err = vxlan_fdb_create(vxlan, addr, ip, ndm->ndm_state, flags, port,
+ err = vxlan_fdb_create(vxlan, addr, &ip, ndm->ndm_state, flags, port,
vni, ifindex);
spin_unlock_bh(&vxlan->hash_lock);
@@ -545,7 +636,7 @@ skip:
* and Tunnel endpoint.
*/
static void vxlan_snoop(struct net_device *dev,
- __be32 src_ip, const u8 *src_mac)
+ struct vxlan_ip *src_ip, const u8 *src_mac)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
struct vxlan_fdb *f;
@@ -554,7 +645,7 @@ static void vxlan_snoop(struct net_device *dev,
f = vxlan_find_mac(vxlan, src_mac);
if (likely(f)) {
f->used = jiffies;
- if (likely(f->remote.remote_ip == src_ip))
+ if (likely(vxlan_ip_equal(&f->remote.remote_ip, src_ip)))
return;
if (net_ratelimit())
@@ -562,7 +653,7 @@ static void vxlan_snoop(struct net_device *dev,
"%pM migrated from %pI4 to %pI4\n",
src_mac, &f->remote.remote_ip, &src_ip);
- f->remote.remote_ip = src_ip;
+ f->remote.remote_ip = *src_ip;
f->updated = jiffies;
} else {
/* learned new entry */
@@ -591,7 +682,7 @@ static bool vxlan_group_used(struct vxlan_net *vn,
if (!netif_running(vxlan->dev))
continue;
- if (vxlan->gaddr == this->gaddr)
+ if (vxlan_ip_equal(&vxlan->gaddr, &this->gaddr))
return true;
}
@@ -605,7 +696,7 @@ static int vxlan_join_group(struct net_device *dev)
struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
struct sock *sk = vn->sock->sk;
struct ip_mreqn mreq = {
- .imr_multiaddr.s_addr = vxlan->gaddr,
+ .imr_multiaddr.s_addr = vxlan->gaddr.ip4,
.imr_ifindex = vxlan->link,
};
int err;
@@ -617,7 +708,14 @@ static int vxlan_join_group(struct net_device *dev)
/* Need to drop RTNL to call multicast join */
rtnl_unlock();
lock_sock(sk);
+#if IS_ENABLED(CONFIG_IPV6)
+ if (vxlan->gaddr.proto == AF_INET)
+ err = ip_mc_join_group(sk, &mreq);
+ else
+ err = ipv6_sock_mc_join(sk, vxlan->link, &vxlan->gaddr.ip6);
+#else
err = ip_mc_join_group(sk, &mreq);
+#endif
release_sock(sk);
rtnl_lock();
@@ -633,7 +731,7 @@ static int vxlan_leave_group(struct net_device *dev)
int err = 0;
struct sock *sk = vn->sock->sk;
struct ip_mreqn mreq = {
- .imr_multiaddr.s_addr = vxlan->gaddr,
+ .imr_multiaddr.s_addr = vxlan->gaddr.ip4,
.imr_ifindex = vxlan->link,
};
@@ -644,7 +742,14 @@ static int vxlan_leave_group(struct net_device *dev)
/* Need to drop RTNL to call multicast leave */
rtnl_unlock();
lock_sock(sk);
+#if IS_ENABLED(CONFIG_IPV6)
+ if (vxlan->gaddr.proto == AF_INET)
+ err = ip_mc_leave_group(sk, &mreq);
+ else
+ err = ipv6_sock_mc_drop(sk, vxlan->link, &vxlan->gaddr.ip6);
+#else
err = ip_mc_leave_group(sk, &mreq);
+#endif
release_sock(sk);
rtnl_lock();
@@ -654,10 +759,14 @@ static int vxlan_leave_group(struct net_device *dev)
/* Callback from net/ipv4/udp.c to receive packets */
static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
{
- struct iphdr *oip;
+ struct iphdr *oip = NULL;
+#if IS_ENABLED(CONFIG_IPV6)
+ struct ipv6hdr *oip6 = NULL;
+#endif
struct vxlanhdr *vxh;
struct vxlan_dev *vxlan;
struct pcpu_tstats *stats;
+ struct vxlan_ip src_ip;
__u32 vni;
int err;
@@ -696,7 +805,13 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
skb_reset_mac_header(skb);
/* Re-examine inner Ethernet packet */
- oip = ip_hdr(skb);
+ if (skb->protocol == htons(ETH_P_IP))
+ oip = ip_hdr(skb);
+#if IS_ENABLED(CONFIG_IPV6)
+ else
+ oip6 = ipv6_hdr(skb);
+#endif
+
skb->protocol = eth_type_trans(skb, vxlan->dev);
/* Ignore packet loops (and multicast echo) */
@@ -704,8 +819,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
vxlan->dev->dev_addr) == 0)
goto drop;
- if (vxlan->flags & VXLAN_F_LEARN)
- vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source);
+ if (vxlan->flags & VXLAN_F_LEARN) {
+ if (oip) {
+ src_ip.ip4 = oip->saddr;
+ src_ip.proto = AF_INET;
+ }
+#if IS_ENABLED(CONFIG_IPV6)
+ if (oip6) {
+ src_ip.ip6 = oip6->saddr;
+ src_ip.proto = AF_INET6;
+ }
+#endif
+ vxlan_snoop(skb->dev, &src_ip, eth_hdr(skb)->h_source);
+ }
__skb_tunnel_rx(skb, vxlan->dev);
skb_reset_network_header(skb);
@@ -721,15 +847,32 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
skb->encapsulation = 0;
- err = IP_ECN_decapsulate(oip, skb);
- if (unlikely(err)) {
- if (log_ecn_error)
- net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
- &oip->saddr, oip->tos);
- if (err > 1) {
- ++vxlan->dev->stats.rx_frame_errors;
- ++vxlan->dev->stats.rx_errors;
- goto drop;
+#if IS_ENABLED(CONFIG_IPV6)
+ if (oip6) {
+ err = IP6_ECN_decapsulate(oip6, skb);
+ if (unlikely(err)) {
+ if (log_ecn_error)
+ net_info_ratelimited("non-ECT from %pI6\n",
+ &oip6->saddr);
+ if (err > 1) {
+ ++vxlan->dev->stats.rx_frame_errors;
+ ++vxlan->dev->stats.rx_errors;
+ goto drop;
+ }
+ }
+ }
+#endif
+ if (oip) {
+ err = IP_ECN_decapsulate(oip, skb);
+ if (unlikely(err)) {
+ if (log_ecn_error)
+ net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+ &oip->saddr, oip->tos);
+ if (err > 1) {
+ ++vxlan->dev->stats.rx_frame_errors;
+ ++vxlan->dev->stats.rx_errors;
+ goto drop;
+ }
}
}
@@ -760,6 +903,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
u8 *arpptr, *sha;
__be32 sip, tip;
struct neighbour *n;
+ struct vxlan_ip ipa;
if (dev->flags & IFF_NOARP)
goto out;
@@ -801,7 +945,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
}
f = vxlan_find_mac(vxlan, n->ha);
- if (f && f->remote.remote_ip == htonl(INADDR_ANY)) {
+ if (f && vxlan_ip_any(&f->remote.remote_ip)) {
/* bridge-local neighbor */
neigh_release(n);
goto out;
@@ -819,8 +963,11 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
if (netif_rx_ni(reply) == NET_RX_DROP)
dev->stats.rx_dropped++;
- } else if (vxlan->flags & VXLAN_F_L3MISS)
- vxlan_ip_miss(dev, tip);
+ } else if (vxlan->flags & VXLAN_F_L3MISS) {
+ ipa.ip4 = tip;
+ ipa.proto = AF_INET;
+ vxlan_ip_miss(dev, &ipa);
+ }
out:
consume_skb(skb);
return NETDEV_TX_OK;
@@ -842,6 +989,14 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
return false;
pip = ip_hdr(skb);
n = neigh_lookup(&arp_tbl, &pip->daddr, dev);
+ if (!n && vxlan->flags & VXLAN_F_L3MISS) {
+ struct vxlan_ip ipa;
+ ipa.ip4 = pip->daddr;
+ ipa.proto = AF_INET;
+ vxlan_ip_miss(dev, &ipa);
+ return false;
+ }
+
break;
default:
return false;
@@ -858,8 +1013,8 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
}
neigh_release(n);
return diff;
- } else if (vxlan->flags & VXLAN_F_L3MISS)
- vxlan_ip_miss(dev, pip->daddr);
+ }
+
return false;
}
@@ -869,7 +1024,8 @@ static void vxlan_sock_free(struct sk_buff *skb)
}
/* On transmit, associate with the tunnel socket */
-static void vxlan_set_owner(struct net_device *dev, struct sk_buff *skb)
+static inline void vxlan_set_owner(struct net_device *dev,
+ struct sk_buff *skb)
{
struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
struct sock *sk = vn->sock->sk;
@@ -917,23 +1073,30 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
{
struct vxlan_dev *vxlan = netdev_priv(dev);
struct rtable *rt;
- const struct iphdr *old_iph;
+ const struct iphdr *old_iph = NULL;
struct iphdr *iph;
struct vxlanhdr *vxh;
struct udphdr *uh;
struct flowi4 fl4;
+#if IS_ENABLED(CONFIG_IPV6)
+ struct flowi6 fl6;
+ struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
+ struct sock *sk = vn->sock->sk;
+ struct ipv6hdr *ip6h;
+#endif
unsigned int pkt_len = skb->len;
- __be32 dst;
- __u16 src_port, dst_port;
+ const struct vxlan_ip *dst;
+ struct dst_entry *ndst = NULL;
+ __u16 src_port = 0, dst_port;
u32 vni;
__be16 df = 0;
__u8 tos, ttl;
dst_port = rdst->remote_port ? rdst->remote_port : vxlan_port;
vni = rdst->remote_vni;
- dst = rdst->remote_ip;
+ dst = &rdst->remote_ip;
- if (!dst) {
+ if (vxlan_ip_any(dst)) {
if (did_rsc) {
__skb_pull(skb, skb_network_offset(skb));
skb->ip_summed = CHECKSUM_NONE;
@@ -961,47 +1124,86 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
skb->encapsulation = 1;
}
- /* Need space for new headers (invalidates iph ptr) */
- if (skb_cow_head(skb, VXLAN_HEADROOM))
- goto drop;
+ ttl = vxlan->ttl;
+ tos = vxlan->tos;
+ if (dst->proto == AF_INET) {
+ /* Need space for new headers (invalidates iph ptr) */
+ if (skb_cow_head(skb, VXLAN_HEADROOM))
+ goto drop;
- old_iph = ip_hdr(skb);
+ old_iph = ip_hdr(skb);
+ if (!ttl && IN_MULTICAST(ntohl(dst->ip4)))
+ ttl = 1;
- ttl = vxlan->ttl;
- if (!ttl && IN_MULTICAST(ntohl(dst)))
- ttl = 1;
+ if (tos == 1)
+ tos = ip_tunnel_get_dsfield(old_iph, skb);
- tos = vxlan->tos;
- if (tos == 1)
- tos = ip_tunnel_get_dsfield(old_iph, skb);
-
- src_port = vxlan_src_port(vxlan, skb);
-
- memset(&fl4, 0, sizeof(fl4));
- fl4.flowi4_oif = rdst->remote_ifindex;
- fl4.flowi4_tos = RT_TOS(tos);
- fl4.daddr = dst;
- fl4.saddr = vxlan->saddr;
-
- rt = ip_route_output_key(dev_net(dev), &fl4);
- if (IS_ERR(rt)) {
- netdev_dbg(dev, "no route to %pI4\n", &dst);
- dev->stats.tx_carrier_errors++;
- goto tx_error;
- }
+ src_port = vxlan_src_port(vxlan, skb);
+
+ memset(&fl4, 0, sizeof(fl4));
+ fl4.flowi4_oif = rdst->remote_ifindex;
+ fl4.flowi4_tos = RT_TOS(tos);
+ fl4.daddr = dst->ip4;
+ fl4.saddr = vxlan->saddr.ip4;
+
+ rt = ip_route_output_key(dev_net(dev), &fl4);
+ if (IS_ERR(rt)) {
+ netdev_dbg(dev, "no route to %pI4\n", &dst->ip4);
+ dev->stats.tx_carrier_errors++;
+ goto tx_error;
+ }
+
+ if (rt->dst.dev == dev) {
+ netdev_dbg(dev, "circular route to %pI4\n", &dst->ip4);
+ ip_rt_put(rt);
+ dev->stats.collisions++;
+ goto tx_error;
+ }
+ ndst = &rt->dst;
+ memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
+ } else {
+#if IS_ENABLED(CONFIG_IPV6)
+ const struct ipv6hdr *old_iph6;
+
+ /* Need space for new headers (invalidates iph ptr) */
+ if (skb_cow_head(skb, VXLAN6_HEADROOM))
+ goto drop;
+
+ old_iph6 = ipv6_hdr(skb);
+ if (!ttl && ipv6_addr_is_multicast(&dst->ip6))
+ ttl = 1;
- if (rt->dst.dev == dev) {
- netdev_dbg(dev, "circular route to %pI4\n", &dst);
- ip_rt_put(rt);
- dev->stats.collisions++;
- goto tx_error;
+ if (tos == 1)
+ tos = ipv6_get_dsfield(old_iph6);
+
+ src_port = vxlan_src_port(vxlan, skb);
+
+ memset(&fl6, 0, sizeof(fl6));
+ fl6.flowi6_oif = vxlan->link;
+ fl6.flowi6_tos = RT_TOS(tos);
+ fl6.daddr = dst->ip6;
+ fl6.saddr = vxlan->saddr.ip6;
+ fl6.flowi6_proto = skb->protocol;
+
+ if (ip6_dst_lookup(sk, &ndst, &fl6)) {
+ netdev_dbg(dev, "no route to %pI6\n", &dst->ip6);
+ dev->stats.tx_carrier_errors++;
+ goto tx_error;
+ }
+
+ if (ndst->dev == dev) {
+ netdev_dbg(dev, "circular route to %pI6\n", &dst->ip6);
+ dst_release(ndst);
+ dev->stats.collisions++;
+ goto tx_error;
+ }
+#endif
}
- memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED |
IPSKB_REROUTED);
skb_dst_drop(skb);
- skb_dst_set(skb, &rt->dst);
+ skb_dst_set(skb, ndst);
vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
vxh->vx_flags = htonl(VXLAN_FLAGS);
@@ -1017,27 +1219,55 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
uh->len = htons(skb->len);
uh->check = 0;
- __skb_push(skb, sizeof(*iph));
- skb_reset_network_header(skb);
- iph = ip_hdr(skb);
- iph->version = 4;
- iph->ihl = sizeof(struct iphdr) >> 2;
- iph->frag_off = df;
- iph->protocol = IPPROTO_UDP;
- iph->tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
- iph->daddr = dst;
- iph->saddr = fl4.saddr;
- iph->ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
- tunnel_ip_select_ident(skb, old_iph, &rt->dst);
-
- nf_reset(skb);
+ if (dst->proto == AF_INET) {
+ __skb_push(skb, sizeof(*iph));
+ skb_reset_network_header(skb);
+ iph = ip_hdr(skb);
+ iph->version = 4;
+ iph->ihl = sizeof(struct iphdr) >> 2;
+ iph->frag_off = df;
+ iph->protocol = IPPROTO_UDP;
+ iph->tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
+ iph->daddr = dst->ip4;
+ iph->saddr = fl4.saddr;
+ iph->ttl = ttl ? : ip4_dst_hoplimit(ndst);
+ tunnel_ip_select_ident(skb, old_iph, ndst);
+ } else {
+#if IS_ENABLED(CONFIG_IPV6)
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ skb->csum_start = skb_transport_header(skb) - skb->head;
+ skb->csum_offset = offsetof(struct udphdr, check);
+ } else
+ uh->check = csum_ipv6_magic(&fl6.saddr, &fl6.daddr,
+ skb->len, IPPROTO_UDP,
+ csum_partial(uh, skb->len, 0));
+ __skb_push(skb, sizeof(*ip6h));
+ skb_reset_network_header(skb);
+ ip6h = ipv6_hdr(skb);
+ ip6h->version = 6;
+ ip6h->priority = 0;
+ ip6h->flow_lbl[0] = 0;
+ ip6h->flow_lbl[1] = 0;
+ ip6h->flow_lbl[2] = 0;
+ ip6h->payload_len = htons(skb->len);
+ ip6h->nexthdr = IPPROTO_UDP;
+ ip6h->hop_limit = ttl ? : ip6_dst_hoplimit(ndst);
+ ip6h->daddr = fl6.daddr;
+ ip6h->saddr = fl6.saddr;
+#endif
+ }
vxlan_set_owner(dev, skb);
if (handle_offloads(skb))
goto drop;
- iptunnel_xmit(skb, dev);
+ if (dst->proto == AF_INET)
+ iptunnel_xmit(skb, dev);
+#if IS_ENABLED(CONFIG_IPV6)
+ else
+ ip6tunnel_xmit(skb, dev);
+#endif
return NETDEV_TX_OK;
drop:
@@ -1084,7 +1314,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
group.remote_next = 0;
rdst0 = &group;
- if (group.remote_ip == htonl(INADDR_ANY) &&
+ if (vxlan_ip_any(&group.remote_ip) &&
(vxlan->flags & VXLAN_F_L2MISS) &&
!is_multicast_ether_addr(eth->h_dest))
vxlan_fdb_miss(vxlan, eth->h_dest);
@@ -1162,7 +1392,7 @@ static int vxlan_open(struct net_device *dev)
struct vxlan_dev *vxlan = netdev_priv(dev);
int err;
- if (vxlan->gaddr) {
+ if (!vxlan_ip_any(&vxlan->gaddr)) {
err = vxlan_join_group(dev);
if (err)
return err;
@@ -1196,7 +1426,7 @@ static int vxlan_stop(struct net_device *dev)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
- if (vxlan->gaddr)
+ if (!vxlan_ip_any(&vxlan->gaddr))
vxlan_leave_group(dev);
del_timer_sync(&vxlan->age_timer);
@@ -1246,7 +1476,10 @@ static void vxlan_setup(struct net_device *dev)
eth_hw_addr_random(dev);
ether_setup(dev);
- dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM;
+ if (vxlan->gaddr.proto == AF_INET)
+ dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM;
+ else
+ dev->hard_header_len = ETH_HLEN + VXLAN6_HEADROOM;
dev->netdev_ops = &vxlan_netdev_ops;
dev->destructor = vxlan_free;
@@ -1283,8 +1516,10 @@ static void vxlan_setup(struct net_device *dev)
static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_ID] = { .type = NLA_U32 },
[IFLA_VXLAN_GROUP] = { .len = FIELD_SIZEOF(struct iphdr, daddr) },
+ [IFLA_VXLAN_GROUP6] = { .len = sizeof(struct in6_addr) },
[IFLA_VXLAN_LINK] = { .type = NLA_U32 },
[IFLA_VXLAN_LOCAL] = { .len = FIELD_SIZEOF(struct iphdr, saddr) },
+ [IFLA_VXLAN_LOCAL6] = { .len = sizeof(struct in6_addr) },
[IFLA_VXLAN_TOS] = { .type = NLA_U8 },
[IFLA_VXLAN_TTL] = { .type = NLA_U8 },
[IFLA_VXLAN_LEARNING] = { .type = NLA_U8 },
@@ -1326,6 +1561,17 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
pr_debug("group address is not IPv4 multicast\n");
return -EADDRNOTAVAIL;
}
+ } else if (data[IFLA_VXLAN_GROUP6]) {
+#if IS_ENABLED(CONFIG_IPV6)
+ struct in6_addr gaddr;
+ nla_memcpy(&gaddr, data[IFLA_VXLAN_GROUP6], sizeof(struct in6_addr));
+ if (!ipv6_addr_is_multicast(&gaddr)) {
+ pr_debug("group address is not IPv6 multicast\n");
+ return -EADDRNOTAVAIL;
+ }
+#else
+ return -EPFNOSUPPORT;
+#endif
}
if (data[IFLA_VXLAN_PORT_RANGE]) {
@@ -1371,11 +1617,29 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
}
vxlan->vni = vni;
- if (data[IFLA_VXLAN_GROUP])
- vxlan->gaddr = nla_get_be32(data[IFLA_VXLAN_GROUP]);
+ if (data[IFLA_VXLAN_GROUP]) {
+ vxlan->gaddr.ip4 = nla_get_be32(data[IFLA_VXLAN_GROUP]);
+ vxlan->gaddr.proto = AF_INET;
+ } else if (data[IFLA_VXLAN_GROUP6]) {
+#if IS_ENABLED(CONFIG_IPV6)
+ nla_memcpy(&vxlan->gaddr.ip6, data[IFLA_VXLAN_GROUP6], sizeof(struct in6_addr));
+ vxlan->gaddr.proto = AF_INET6;
+#else
+ return -EPFNOSUPPORT;
+#endif
+ }
- if (data[IFLA_VXLAN_LOCAL])
- vxlan->saddr = nla_get_be32(data[IFLA_VXLAN_LOCAL]);
+ if (data[IFLA_VXLAN_LOCAL]) {
+ vxlan->saddr.ip4 = nla_get_be32(data[IFLA_VXLAN_LOCAL]);
+ vxlan->saddr.proto = AF_INET;
+ } else if (data[IFLA_VXLAN_LOCAL6]) {
+#if IS_ENABLED(CONFIG_IPV6)
+ nla_memcpy(&vxlan->saddr.ip6, data[IFLA_VXLAN_LOCAL6], sizeof(struct in6_addr));
+ vxlan->saddr.proto = AF_INET6;
+#else
+ return -EPFNOSUPPORT;
+#endif
+ }
if (data[IFLA_VXLAN_LINK] &&
(vxlan->link = nla_get_u32(data[IFLA_VXLAN_LINK]))) {
@@ -1453,9 +1717,9 @@ static size_t vxlan_get_size(const struct net_device *dev)
{
return nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_ID */
- nla_total_size(sizeof(__be32)) +/* IFLA_VXLAN_GROUP */
+ nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_GROUP{6} */
nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_LINK */
- nla_total_size(sizeof(__be32))+ /* IFLA_VXLAN_LOCAL */
+ nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_LOCAL{6} */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TTL */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TOS */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_LEARNING */
@@ -1480,14 +1744,32 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
if (nla_put_u32(skb, IFLA_VXLAN_ID, vxlan->vni))
goto nla_put_failure;
- if (vxlan->gaddr && nla_put_be32(skb, IFLA_VXLAN_GROUP, vxlan->gaddr))
- goto nla_put_failure;
+ if (!vxlan_ip_any(&vxlan->gaddr)) {
+ if (vxlan->gaddr.proto == AF_INET) {
+ if (nla_put_be32(skb, IFLA_VXLAN_GROUP, vxlan->gaddr.ip4))
+ goto nla_put_failure;
+ } else {
+#if IS_ENABLED(CONFIG_IPV6)
+ if (nla_put(skb, IFLA_VXLAN_GROUP6, sizeof(struct in6_addr), &vxlan->gaddr.ip6))
+ goto nla_put_failure;
+#endif
+ }
+ }
if (vxlan->link && nla_put_u32(skb, IFLA_VXLAN_LINK, vxlan->link))
goto nla_put_failure;
- if (vxlan->saddr && nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr))
- goto nla_put_failure;
+ if (!vxlan_ip_any(&vxlan->saddr)) {
+ if (vxlan->saddr.proto == AF_INET) {
+ if (nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr.ip4))
+ goto nla_put_failure;
+ } else {
+#if IS_ENABLED(CONFIG_IPV6)
+ if (nla_put(skb, IFLA_VXLAN_LOCAL6, sizeof(struct in6_addr), &vxlan->saddr.ip6))
+ goto nla_put_failure;
+#endif
+ }
+ }
if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) ||
nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) ||
@@ -1526,38 +1808,82 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
.fill_info = vxlan_fill_info,
};
-static __net_init int vxlan_init_net(struct net *net)
+/* Create UDP socket for encapsulation receive. AF_INET6 sockets
+ * could be used for both IPv4 and IPv6 communications.
+ */
+#if IS_ENABLED(CONFIG_IPV6)
+static __net_init int create_sock(struct net *net, struct sock **sk)
+{
+ struct vxlan_net *vn = net_generic(net, vxlan_net_id);
+ struct sockaddr_in6 vxlan_addr = {
+ .sin6_family = AF_INET6,
+ .sin6_port = htons(vxlan_port),
+ };
+ int rc;
+
+ rc = sock_create_kern(AF_INET6, SOCK_DGRAM, IPPROTO_UDP, &vn->sock);
+ if (rc < 0) {
+ pr_debug("UDP socket create failed\n");
+ return rc;
+ }
+ /* Put in proper namespace */
+ *sk = vn->sock->sk;
+ sk_change_net(*sk, net);
+
+ rc = kernel_bind(vn->sock, (struct sockaddr *)&vxlan_addr,
+ sizeof(struct sockaddr_in6));
+ if (rc < 0) {
+ pr_debug("bind for UDP socket %pI6:%u (%d)\n",
+ &vxlan_addr.sin6_addr, ntohs(vxlan_addr.sin6_port), rc);
+ sk_release_kernel(*sk);
+ vn->sock = NULL;
+ return rc;
+ }
+ return 0;
+}
+#else
+static __net_init int create_sock(struct net *net, struct sock **sk)
{
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
- struct sock *sk;
struct sockaddr_in vxlan_addr = {
.sin_family = AF_INET,
+ .sin_port = htons(vxlan_port),
.sin_addr.s_addr = htonl(INADDR_ANY),
};
int rc;
- unsigned h;
- /* Create UDP socket for encapsulation receive. */
rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &vn->sock);
if (rc < 0) {
pr_debug("UDP socket create failed\n");
return rc;
}
/* Put in proper namespace */
- sk = vn->sock->sk;
- sk_change_net(sk, net);
+ *sk = vn->sock->sk;
+ sk_change_net(*sk, net);
- vxlan_addr.sin_port = htons(vxlan_port);
-
- rc = kernel_bind(vn->sock, (struct sockaddr *) &vxlan_addr,
- sizeof(vxlan_addr));
+ rc = kernel_bind(vn->sock, (struct sockaddr *)&vxlan_addr,
+ sizeof(struct sockaddr_in));
if (rc < 0) {
pr_debug("bind for UDP socket %pI4:%u (%d)\n",
&vxlan_addr.sin_addr, ntohs(vxlan_addr.sin_port), rc);
- sk_release_kernel(sk);
+ sk_release_kernel(*sk);
vn->sock = NULL;
return rc;
}
+ return 0;
+}
+#endif
+
+static __net_init int vxlan_init_net(struct net *net)
+{
+ struct vxlan_net *vn = net_generic(net, vxlan_net_id);
+ struct sock *sk;
+ int rc;
+ unsigned h;
+
+ rc = create_sock(net, &sk);
+ if (rc < 0)
+ return rc;
/* Disable multicast loopback */
inet_sk(sk)->mc_loop = 0;
@@ -1566,6 +1892,9 @@ static __net_init int vxlan_init_net(struct net *net)
udp_sk(sk)->encap_type = 1;
udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv;
udp_encap_enable();
+#if IS_ENABLED(CONFIG_IPV6)
+ udpv6_encap_enable();
+#endif
for (h = 0; h < VNI_HASH_SIZE; ++h)
INIT_HLIST_HEAD(&vn->vni_list[h]);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c4edfe1..0eee00f 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -308,6 +308,8 @@ enum {
IFLA_VXLAN_RSC,
IFLA_VXLAN_L2MISS,
IFLA_VXLAN_L3MISS,
+ IFLA_VXLAN_GROUP6,
+ IFLA_VXLAN_LOCAL6,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
--
1.7.7.6
* [Patch net-next v2 2/4] ipv6: export ipv6_sock_mc_join and ipv6_sock_mc_drop
From: Cong Wang @ 2013-04-05 12:16 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang
In-Reply-To: <1365164186-21719-1-git-send-email-amwang@redhat.com>
From: Cong Wang <amwang@redhat.com>
They will be used by the vxlan module.
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
net/ipv6/mcast.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index bfa6cc3..d03426d 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -200,6 +200,7 @@ int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
return 0;
}
+EXPORT_SYMBOL(ipv6_sock_mc_join);
/*
* socket leave on multicast group
@@ -246,6 +247,7 @@ int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr)
return -EADDRNOTAVAIL;
}
+EXPORT_SYMBOL(ipv6_sock_mc_drop);
/* called with rcu_read_lock() */
static struct inet6_dev *ip6_mc_find_dev_rcu(struct net *net,
--
1.7.7.6
* [Patch net-next v2 1/4] vxlan: defer vxlan init as late as possible
From: Cong Wang @ 2013-04-05 12:16 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang
From: Cong Wang <amwang@redhat.com>
When vxlan is built in, its init code runs before the
IPv6 init code. This could cause problems once we
create an IPv6 socket in a later patch.
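As a rough illustration of why moving to late_initcall() helps, here is a user-space sketch of initcall ordering. It relies on two facts from <linux/init.h>: a built-in module_init() is a device_initcall (level 6), while late_initcall() is level 7. Everything else (struct initcall, run_order(), and the assumption that vxlan links ahead of ipv6 within level 6) is hypothetical and only models the behaviour described above.

```c
#include <stddef.h>

/* Level numbers as used by <linux/init.h>. */
enum { LEVEL_DEVICE = 6, LEVEL_LATE = 7 };

/* Hypothetical model of one registered initcall. */
struct initcall {
	int level;
	const char *name;
};

/*
 * Hypothetical model of boot-time dispatch: lower levels run first,
 * and entries on the same level run in table (link) order.
 * Writes the names in run order to `order` and returns the count.
 */
static size_t run_order(const struct initcall *calls, size_t n,
			const char **order)
{
	size_t out = 0;

	for (int level = 0; level <= LEVEL_LATE; level++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == level)
				order[out++] = calls[i].name;
	return out;
}
```

With both entries on level 6 the model runs vxlan first (assumed link order). Moving vxlan to level 7 places it after ipv6 regardless of link order.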
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
drivers/net/vxlan.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 62a4438..cac4e4f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1619,7 +1619,7 @@ out2:
out1:
return rc;
}
-module_init(vxlan_init_module);
+late_initcall(vxlan_init_module);
static void __exit vxlan_cleanup_module(void)
{
--
1.7.7.6
* Re: hv_netvsc: WARNING in softirq.c
From: Richard Genoud @ 2013-04-05 10:59 UTC (permalink / raw)
To: Haiyang Zhang
Cc: KY Srinivasan, devel@linuxdriverproject.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <CACQ1gAiadLrJQkDFe7_TEM_1OVSjECyewmxHyXvOcYCc7GLn_Q@mail.gmail.com>
2013/3/19 Richard Genoud <richard.genoud@gmail.com>:
> 2013/3/7 Richard Genoud <richard.genoud@gmail.com>:
>> 2013/3/6 Haiyang Zhang <haiyangz@microsoft.com>:
>>> I have found a simple way to reproduce this kind of warning:
>>> 1) reboot the VM (because this warning can be displayed only once.)
>>> 2) login to the host and open the VM [Settings]
>>> 3) Temporarily change the Network adapter's option [Virtual Switch] to "Not connected".
>>> 4) run dmesg, you should see the warning.
>>>
>>> The reason for the warning is -- netif_tx_disable() is called when the NIC
>>> is disconnected. And it's called within irq context. netif_tx_disable()
>>> calls local_bh_enable() which displays warning if in irq.
>>>
>>> The fix is to remove the unnecessary netif_tx_disable() in the netvsc_linkstatus_callback().
>>> I attached a patch. Would you like to test it on your side as well?
>
> I installed the patched kernel today.
> Here are the stats of the warnings until today:
> kern.log.1:Mar 12 22:32:10 devlabo kernel: [30114.506299]
> ------------[ cut here ]------------
> kern.log.2:Mar 5 22:30:48 devlabo kernel: [10455.098586]
> ------------[ cut here ]------------
> kern.log.2:Mar 6 22:31:25 devlabo kernel: [85552.645480]
> ------------[ cut here ]------------
> kern.log.3:Feb 26 22:35:14 devlabo kernel: [37398.788119]
> ------------[ cut here ]------------
> kern.log.3:Feb 27 22:34:37 devlabo kernel: [10688.187196]
> ------------[ cut here ]------------
> kern.log.4:Feb 22 22:32:22 devlabo kernel: [40399.795364]
> ------------[ cut here ]------------
>
> So, I'll wait for one or two weeks before sending some feedback.
>
OK, it has been a little more than 2 weeks now, and I have not seen
this warning any more, just the
"hv_vmbus: child device vmbus_0_8 unregistered" message sometimes.
So it seems to be fixed!
Thanks!
Reported-by: Richard Genoud <richard.genoud@gmail.com>
Tested-by: Richard Genoud <richard.genoud@gmail.com>
Regards,
Richard
* Re: [Linux-zigbee-devel] [PATCH 2/2] at86rf230: change irq handling to prevent lockups with edge type irq
From: Werner Almesberger @ 2013-04-05 10:51 UTC (permalink / raw)
To: Sascha Herrmann; +Cc: netdev, linux-zigbee-devel
In-Reply-To: <20130405035949.GC29789@ws>
I wrote:
> To achieve perfection, at86rf230_probe could try all four
> possible trigger modes, pick one the platform supports, and
> set TRX_CTRL_1.IRQ_POLARITY accordingly.
Thinking of it, probing by trying request_irq has an unpleasant ring
to it. Perhaps a better way would be to leave this decision to the
platform code and do one of these:
1) pass irqflags and the polarity in the platform data, or
2) pass irqflags and extract the polarity from the irqflags, or
3) set up the trigger mode outside the driver and pass only the
polarity,
where 1) with (irqflags & IRQF_TRIGGER_MASK) == 0 includes
case 3).
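A minimal sketch of how options 2) and 3) could be combined, assuming the polarity is encoded as 0 = active high and 1 = active low. The helper name at86rf230_irq_polarity() and its pdata_polarity parameter are hypothetical; the IRQF_TRIGGER_* values mirror <linux/interrupt.h> so that the sketch builds in user space.

```c
/* Trigger bits, same values as in <linux/interrupt.h>. */
#define IRQF_TRIGGER_RISING	0x00000001
#define IRQF_TRIGGER_FALLING	0x00000002
#define IRQF_TRIGGER_HIGH	0x00000004
#define IRQF_TRIGGER_LOW	0x00000008
#define IRQF_TRIGGER_MASK	(IRQF_TRIGGER_HIGH | IRQF_TRIGGER_LOW | \
				 IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING)

/*
 * Hypothetical helper: returns 1 if the interrupt line should be
 * active low, 0 if active high.
 *
 * If irqflags carries a trigger mode, the polarity is derived from it
 * (option 2). If it carries none, the trigger mode was set up outside
 * the driver and the platform-supplied polarity is used (option 3).
 */
static int at86rf230_irq_polarity(unsigned long irqflags, int pdata_polarity)
{
	unsigned long trig = irqflags & IRQF_TRIGGER_MASK;

	if (!trig)
		return pdata_polarity ? 1 : 0;
	if (trig & (IRQF_TRIGGER_FALLING | IRQF_TRIGGER_LOW))
		return 1;
	return 0;
}
```

The result would then be written to TRX_CTRL_1.IRQ_POLARITY before the interrupt is requested. The bit encoding assumed here should be checked against the datasheet.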
- Werner
* bnx2x multicast packet loss on igmp join/leave
From: vincent Richard @ 2013-04-05 10:13 UTC (permalink / raw)
To: netdev
Hi all,
I am using an HP server with bnx2x Ethernet cards.
I have compiled the latest Linux 3.2 release, and I am encountering
the following problem:
when joining or leaving a multicast group, other multicast streams
lose some packets.
I tried the latest firmware and different drivers (including the latest
HP-recommended one, the Red Hat one), but the result is the same.
I observe that putting the interface in promiscuous mode makes the
problem disappear.
Looking around bnx2x_set_rx_mode, I notice that there are 3 rx modes:
NORMAL, PROMISCUOUS and ALLMULTI. Forcing any mode other than NORMAL
fixes my issue.
My bnx2x is in a blade center, so there is a switch right behind it, and
forcing PROMISCUOUS or ALLMULTI mode has no major impact.
To reproduce the problem, I just listen to 2 multicast streams and
join/leave another one periodically.
Thanks in advance for your help.
Regards,
Vincent
* Re: [PATCH 1/5 v2] mv643xx_eth: add Device Tree bindings
From: Florian Fainelli @ 2013-04-05 9:56 UTC (permalink / raw)
To: Simon Baatz
Cc: thomas.petazzoni, moinejf, jason, andrew, netdev,
devicetree-discuss, rob.herring, grant.likely, jogo,
linux-arm-kernel, jm, davem, buytenh, sebastian.hesselbarth
In-Reply-To: <20130404212906.GA25904@schnuecks.de>
Hello Simon,
First of all, thanks for giving these patches a try!
On 04/04/13 23:29, Simon Baatz wrote:
> Hi Florian
>
[snip]
>> if (!mv643xx_eth_version_printed++)
>> pr_notice("MV-643xx 10/100/1000 ethernet driver version %s\n",
>
> This is not related to your change, but there is a problem in this
> function that has already been discussed in the past if I remember
> correctly: The respective clock needs to be enabled here (at least
> on Kirkwood), since accesses to the hardware are done below.
> Enabling the clock only in mv643xx_eth_probe() is too late.
>
> As said, this is not a problem introduced by your changes (and which
> is currently circumvented by enabling the respective clocks in
> kirkwood_legacy_clk_init() and kirkwood_ge0x_init()), but we might
> want to fix this now to get rid of unconditionally enabling the GE
> clocks in the DT case.
I think there may have been some confusion between the "ethernet-group"
clock and the actual Ethernet port inside the "ethernet-group". The
mv643xx_eth driver assumes we have a per-port clock gating scheme, while
I think we have a per "ethernet-group" clock gating scheme instead. Like
you said, I think this should be addressed separately.
[snip]
>
> You don't change the clk initialization here:
>
> #if defined(CONFIG_HAVE_CLK)
> mp->clk = clk_get(&pdev->dev, (pdev->id ? "1" : "0"));
> if (!IS_ERR(mp->clk)) {
> clk_prepare_enable(mp->clk);
> mp->t_clk = clk_get_rate(mp->clk);
> }
> #endif
>
> Which, if I understand correctly, works in the DT case because you
> assign "clock-names" to the clocks in the DTS. However, I wonder
> whether this works for any but the first Ethernet device.
>
> In the old platform device setup, the pdev->id was set when
> initializing the platform_device structure in common.c. Where is
> this done in the DT case?
Looks like you are right. In the DT case, I assume that we should look up
the clock using NULL instead of "1" or "0", so that we match any clock
instead of a specific one.
[snip]
>
>
> In phy_scan(), the phy is searched like this:
>
> snprintf(phy_id, sizeof(phy_id), PHY_ID_FMT,
> "orion-mdio-mii", addr);
>
> phydev = phy_connect(mp->dev, phy_id, mv643xx_eth_adjust_link,
> PHY_INTERFACE_MODE_GMII);
>
> But "orion-mdio-mii:xx" is the name of the PHY if MDIO is setup via a
> platform_device. I could not get this to work if the MDIO device is
> setup via DT. Am I doing something wrong?
I just missed updating this part of the code to probe for PHYs. The
board I tested with uses a "PHY_NONE" configuration. I will add the
missing bits for of_phy_connect() to be called here.
>
>
> Additionally, in phy_scan() there is this:
>
> if (phy_addr == MV643XX_ETH_PHY_ADDR_DEFAULT) {
> start = phy_addr_get(mp) & 0x1f;
> num = 32;
> } else {
> ...
>
> MV643XX_ETH_PHY_ADDR_DEFAULT is defined as 0. However, many Kirkwood
> devices use "MV643XX_ETH_PHY_ADDR(0)". If the module probe is
> deferred in mv643xx_eth because the MDIO driver is not yet loaded,
> all 32 PHY addresses are scanned without success. This is not needed
> and clutters the log.
OK, I am not sure how we can avoid cluttering the log.
What would you suggest?
--
Florian
* Re: [RFC PATCH ipsec] xfrm: use the right dev to fill xdst
From: Steffen Klassert @ 2013-04-05 9:46 UTC (permalink / raw)
To: Nicolas Dichtel; +Cc: herbert, davem, netdev, dbaluta
In-Reply-To: <1365088362-4318-1-git-send-email-nicolas.dichtel@6wind.com>
On Thu, Apr 04, 2013 at 05:12:42PM +0200, Nicolas Dichtel wrote:
> Commit bc8e4b954e46 (xfrm6: ensure to use the same dev when building a bundle)
> broke IPsec for IPv4 over IPv6 tunnels (because dev points to an IPv4 only
> interface, hence in6_dev_get(dev) returns NULL).
Can you give some information on how to reproduce this? I'm running
interfamily tunnels on our testing environment and it seems to
work fine.
>
> After looking again into commit 25ee3286dcbc ([IPSEC]: Merge common code into
> xfrm_bundle_create), it seems that previously we were using dev from the route,
> for both IPv4 and IPv6.
I think this was the right way. We need to attach the dev from the
corresponding route to the xdst.
>
> In fact, xfrm_fill_dst() is called during a loop on chained dst, but dev points
> always to the same device.
The way we do it now can be problematic for tunnel-in-tunnel scenarios too.
We assign the dev from the first tunnel route to all the bundle entries;
this looks really wrong.
I think your patch is correct, but I want to understand the breaking
scenario first.
Thanks!
* [patch net 2/2] net: ipv4: fix schedule while atomic bug in check_lifetime()
From: Jiri Pirko @ 2013-04-05 9:39 UTC (permalink / raw)
To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1365154779-4204-1-git-send-email-jiri@resnulli.us>
Move operations that might sleep out of the rcu_read_lock() section.
Also fix the iteration over ifa_dev->ifa_list.
Introduced by: commit 5c766d642bcaf "ipv4: introduce address lifetime"
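The list-iteration part of the fix can be shown in isolation. The sketch below is a user-space model, not the kernel code: struct node stands in for struct in_ifaddr, and list_unlink() is a hypothetical helper. It shows the pointer-to-pointer walk in which the cursor advances through (*pp)->next and the loop stops once the node is unlinked. Stepping with the victim's own next pointer instead would stop the cursor from following the list.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct in_ifaddr: a singly linked node. */
struct node {
	int id;
	struct node *next;
};

/*
 * Unlink `victim` from the list whose head pointer is *head.
 * Returns 1 if the node was found and unlinked, 0 otherwise.
 */
static int list_unlink(struct node **head, struct node *victim)
{
	struct node **pp;

	/* pp always points at the link that leads to the current node. */
	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;	/* bypass the victim */
			victim->next = NULL;
			return 1;		/* stop: the node is gone */
		}
	}
	return 0;
}
```

Because pp points at the link rather than at the node, the same code handles removal of the head and of interior nodes without a special case.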
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
net/ipv4/devinet.c | 58 +++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 42 insertions(+), 16 deletions(-)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 00386e0..c6287cd 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -587,13 +587,16 @@ static void check_lifetime(struct work_struct *work)
{
unsigned long now, next, next_sec, next_sched;
struct in_ifaddr *ifa;
+ struct hlist_node *n;
int i;
now = jiffies;
next = round_jiffies_up(now + ADDR_CHECK_FREQUENCY);
- rcu_read_lock();
for (i = 0; i < IN4_ADDR_HSIZE; i++) {
+ bool change_needed = false;
+
+ rcu_read_lock();
hlist_for_each_entry_rcu(ifa, &inet_addr_lst[i], hash) {
unsigned long age;
@@ -606,16 +609,7 @@ static void check_lifetime(struct work_struct *work)
if (ifa->ifa_valid_lft != INFINITY_LIFE_TIME &&
age >= ifa->ifa_valid_lft) {
- struct in_ifaddr **ifap ;
-
- rtnl_lock();
- for (ifap = &ifa->ifa_dev->ifa_list;
- *ifap != NULL; ifap = &ifa->ifa_next) {
- if (*ifap == ifa)
- inet_del_ifa(ifa->ifa_dev,
- ifap, 1);
- }
- rtnl_unlock();
+ change_needed = true;
} else if (ifa->ifa_preferred_lft ==
INFINITY_LIFE_TIME) {
continue;
@@ -625,10 +619,8 @@ static void check_lifetime(struct work_struct *work)
next = ifa->ifa_tstamp +
ifa->ifa_valid_lft * HZ;
- if (!(ifa->ifa_flags & IFA_F_DEPRECATED)) {
- ifa->ifa_flags |= IFA_F_DEPRECATED;
- rtmsg_ifa(RTM_NEWADDR, ifa, NULL, 0);
- }
+ if (!(ifa->ifa_flags & IFA_F_DEPRECATED))
+ change_needed = true;
} else if (time_before(ifa->ifa_tstamp +
ifa->ifa_preferred_lft * HZ,
next)) {
@@ -636,8 +628,42 @@ static void check_lifetime(struct work_struct *work)
ifa->ifa_preferred_lft * HZ;
}
}
+ rcu_read_unlock();
+ if (!change_needed)
+ continue;
+ rtnl_lock();
+ hlist_for_each_entry_safe(ifa, n, &inet_addr_lst[i], hash) {
+ unsigned long age;
+
+ if (ifa->ifa_flags & IFA_F_PERMANENT)
+ continue;
+
+ /* We try to batch several events at once. */
+ age = (now - ifa->ifa_tstamp +
+ ADDRCONF_TIMER_FUZZ_MINUS) / HZ;
+
+ if (ifa->ifa_valid_lft != INFINITY_LIFE_TIME &&
+ age >= ifa->ifa_valid_lft) {
+ struct in_ifaddr **ifap;
+
+ for (ifap = &ifa->ifa_dev->ifa_list;
+ *ifap != NULL; ifap = &(*ifap)->ifa_next) {
+ if (*ifap == ifa) {
+ inet_del_ifa(ifa->ifa_dev,
+ ifap, 1);
+ break;
+ }
+ }
+ } else if (ifa->ifa_preferred_lft !=
+ INFINITY_LIFE_TIME &&
+ age >= ifa->ifa_preferred_lft &&
+ !(ifa->ifa_flags & IFA_F_DEPRECATED)) {
+ ifa->ifa_flags |= IFA_F_DEPRECATED;
+ rtmsg_ifa(RTM_NEWADDR, ifa, NULL, 0);
+ }
+ }
+ rtnl_unlock();
}
- rcu_read_unlock();
next_sec = round_jiffies_up(next);
next_sched = next;
--
1.8.1.2
* [patch net 1/2] net: ipv4: reset check_lifetime_work after changing lifetime
From: Jiri Pirko @ 2013-04-05 9:39 UTC (permalink / raw)
To: netdev; +Cc: davem, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1365154779-4204-1-git-send-email-jiri@resnulli.us>
This makes check_lifetime run at the nearest opportunity, and that
function then correctly computes when it should run next.
Without this, check_lifetime runs at the time computed by the previous run,
which does not take the modified lifetime into account.
Introduced by: commit 5c766d642bcaf "ipv4: introduce address lifetime"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
net/ipv4/devinet.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 96083b7..00386e0 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -804,6 +804,8 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
return -EEXIST;
ifa = ifa_existing;
set_ifa_lifetime(ifa, valid_lft, prefered_lft);
+ cancel_delayed_work(&check_lifetime_work);
+ schedule_delayed_work(&check_lifetime_work, 0);
rtmsg_ifa(RTM_NEWADDR, ifa, nlh, NETLINK_CB(skb).portid);
blocking_notifier_call_chain(&inetaddr_chain, NETDEV_UP, ifa);
}
--
1.8.1.2