* Re: IPv6 UFO for VMs
From: Ben Hutchings @ 2014-10-27 3:21 UTC (permalink / raw)
To: Hannes Frederic Sowa; +Cc: netdev, virtualization
In-Reply-To: <1413970541.2337.16.camel@localhost>
[-- Attachment #1: Type: text/plain, Size: 2173 bytes --]
On Wed, 2014-10-22 at 11:35 +0200, Hannes Frederic Sowa wrote:
> On Mi, 2014-10-22 at 00:44 +0100, Ben Hutchings wrote:
> > There are several ways that VMs can take advantage of UFO and get the
> > host to do fragmentation for them:
> >
> > drivers/net/macvtap.c: gso_type = SKB_GSO_UDP;
> > drivers/net/tun.c: skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> > drivers/net/virtio_net.c: skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> >
> > Our implementation of UFO for IPv6 does:
> >
> > fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
> > fptr->nexthdr = nexthdr;
> > fptr->reserved = 0;
> > fptr->identification = skb_shinfo(skb)->ip6_frag_id;
> >
> > which assumes ip6_frag_id has been set. That's only true if the local
> > stack constructed the skb; otherwise it appears we get zero.
> >
> > This seems to be a regression as a result of:
> >
> > commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6
> > Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Date: Fri Feb 21 02:55:35 2014 +0100
> >
> > ipv6: reuse ip6_frag_id from ip6_ufo_append_data
> >
> > However, that change seems reasonable - we *shouldn't* be choosing IDs
> > for any other stack. Any paravirt net driver that can use IPv6 UFO
> > needs to have some way of passing a fragmentation ID to put in
> > skb_shared_info::ip6_frag_id.
>
> Do we really gain a lot of performance by enabling UFO on those devices
> or would it make sense to just drop support? It only helps fragmenting
> large UDP packets, so I don't think it is worth it.
It's not been important enough for anyone to bother implementing it in
hardware/firmware aside from Neterion.
I'll shortly post patches to disable it.
Ben.
> Otherwise I agree with Ben, we need to pass a fragmentation id from the
> host over to the system segmenting the gso frame. Fragmentation ids must
> be generated by the end system.
>
> Hmm...
--
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
^ permalink raw reply
* Re: [PATCH v2 09/15] net: dsa: Add support for switch EEPROM access
From: Guenter Roeck @ 2014-10-27 3:56 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev, David S. Miller, Florian Fainelli, linux-kernel
In-Reply-To: <20141027024121.GA16998@lunn.ch>
On 10/26/2014 07:41 PM, Andrew Lunn wrote:
> On Sun, Oct 26, 2014 at 09:52:39AM -0700, Guenter Roeck wrote:
>> On some chips it is possible to access the switch eeprom.
>> Add infrastructure support for it.
>>
>> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> ---
>> v2:
>> - Add support for configuring switch EEPROM size through platform data
>> or devicetree data
>> - Do not compare new pointers against NULL
>> - Check if get_eeprom pointer is set before calling it
>>
>> include/net/dsa.h | 10 ++++++++++
>> net/dsa/dsa.c | 4 ++++
>> net/dsa/slave.c | 41 +++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 55 insertions(+)
>>
>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>> index 55e75e7..37856a2 100644
>> --- a/include/net/dsa.h
>> +++ b/include/net/dsa.h
>> @@ -38,6 +38,9 @@ struct dsa_chip_data {
>> struct device *host_dev;
>> int sw_addr;
>>
>> + /* set to size of eeprom if supported by the switch */
>> + int eeprom_len;
>> +
>> /* Device tree node pointer for this specific switch chip
>> * used during switch setup in case additional properties
>> * and resources needs to be used
>> @@ -258,6 +261,13 @@ struct dsa_switch_driver {
>> int (*set_temp_limit)(struct dsa_switch *ds, int temp);
>> int (*get_temp_alarm)(struct dsa_switch *ds, bool *alarm);
>> #endif
>> +
>> + /* EEPROM access */
>> + int (*get_eeprom_len)(struct dsa_switch *ds);
>> + int (*get_eeprom)(struct dsa_switch *ds,
>> + struct ethtool_eeprom *eeprom, u8 *data);
>> + int (*set_eeprom)(struct dsa_switch *ds,
>> + struct ethtool_eeprom *eeprom, u8 *data);
>> };
>>
>> void register_switch_driver(struct dsa_switch_driver *type);
>> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
>> index 5edbbca..1c1b925 100644
>> --- a/net/dsa/dsa.c
>> +++ b/net/dsa/dsa.c
>> @@ -575,6 +575,7 @@ static int dsa_of_probe(struct platform_device *pdev)
>> const char *port_name;
>> int chip_index, port_index;
>> const unsigned int *sw_addr, *port_reg;
>> + u32 eeprom_length;
>> int ret;
>>
>> mdio = of_parse_phandle(np, "dsa,mii-bus", 0);
>> @@ -603,6 +604,9 @@ static int dsa_of_probe(struct platform_device *pdev)
>> if (pd->nr_chips > DSA_MAX_SWITCHES)
>> pd->nr_chips = DSA_MAX_SWITCHES;
>>
>> + if (!of_property_read_u32(np, "eeprom-length", &eeprom_length))
>> + pd->eeprom_length = eeprom_length;
>> +
>
> Hi Guenter
>
> I would of expected a property to indicate the eeprom is present, not
> a length value. The switch determines the length, or at least the
> minimum length. So the switch driver can return the length, if the
> eeprom is indicated to be present.
>
MV88E6352 and compatible chips can neither provide presence nor
length information, so both have to come from platform data or
devicetree.
> Also, all device tree properties need to be documented. I've not
> looked through all the patches, so maybe it is in a separate patch?
>
The documentation is in the next patch.
I merged both presence and length into one optional property, following
the logic that providing a length implies presence and that presence
without length doesn't really provide the required information.
I also wanted to give drivers a chance to indicate presence and length
without the need of devicetree properties or platform data, if a chip
can provide that information.
Also, the only indication to the calling code (ethtool) that there
is not eeprom is through the length field, so I figured that a separate
"presence" field would, for all practical purpose, be just noise
and not be of practical value.
>> +static int dsa_slave_get_eeprom_len(struct net_device *dev)
>> +{
>> + struct dsa_slave_priv *p = netdev_priv(dev);
>> + struct dsa_switch *ds = p->parent;
>> +
>> + if (ds->pd->eeprom_len)
>> + return ds->pd->eeprom_len;
>> +
>> + if (ds->drv->get_eeprom_len)
>> + return ds->drv->get_eeprom_len(ds);
>> +
>> + return 0;
>> +}
>
> This also does not look right. The default is that there is no
> property in DT, meaning the EEPROM is not present. So
> ds->pd->eeprom_len is zero. So the first if is not taken. It then
> calls into the driver which is return a length, independent of if it
> is populated or not.
>
No property was supposed to mean that the information about presence
and length is not provided through devicetree or platform data,
but must be provided by the driver. Devicetree and platform data were
meant to augment auto-detection of presence and length, not to replace
it.
> If we are going with your suggested binding, no value in DT, or a
> value of 0 mean no EEPROM. If there is a value in DT
> dsa_slave_get_eeprom_len() should return that value. Asking the driver
> is redundant, you can remove the get_eeprom_len() call.
>
Again, the MV chips can not provide length or presence information.
That was the whole point of having platform / devicetree data.
> However, if DT just indicates the eeprom is populated or not, you
> should call the driver get_eeprom_len() is the eeprom property is
> present.
>
Guess I was trying to be too intelligent.
So lets take a step back: For the Marvell chips, I have to provide both length
and presence in devicetree or platform data. Presence seemed to be implied by
length, so I used only a single property and variable to indicate both.
Are you asking me to make that mandatory, ie to have presence and length
_always_ be provided by devicetree or platform data ? Sure, I can do that
and simply drop the get_length dsa callback function.
Also, it seems that you request two separate properties, one for presence
and another for length. Is that correct ? Again, I thought that would not
provide any value since presence is indicated by length != 0 in the ethtool
callback function. No problem, though, I'll be happy to create two separate
properties and platform data variables if you think that would be better.
>> +static int dsa_slave_get_eeprom(struct net_device *dev,
>> + struct ethtool_eeprom *eeprom, u8 *data)
>> +{
>> + struct dsa_slave_priv *p = netdev_priv(dev);
>> + struct dsa_switch *ds = p->parent;
>> +
>> + if (ds->drv->get_eeprom)
>> + return ds->drv->get_eeprom(ds, eeprom, data);
>
> This should be conditional on the length/present.
>
>> +
>> + return -EOPNOTSUPP;
>> +}
>> +
>> +static int dsa_slave_set_eeprom(struct net_device *dev,
>> + struct ethtool_eeprom *eeprom, u8 *data)
>> +{
>> + struct dsa_slave_priv *p = netdev_priv(dev);
>> + struct dsa_switch *ds = p->parent;
>> +
>> + if (ds->drv->set_eeprom)
>> + return ds->drv->set_eeprom(ds, eeprom, data);
>
> Same again.
>
> Andrew
>
Sure, no problem.
Thanks,
Guenter
>
>> + return -EOPNOTSUPP;
>> +}
>> +
>> static void dsa_slave_get_strings(struct net_device *dev,
>> uint32_t stringset, uint8_t *data)
>> {
>> @@ -387,6 +425,9 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = {
>> .get_drvinfo = dsa_slave_get_drvinfo,
>> .nway_reset = dsa_slave_nway_reset,
>> .get_link = dsa_slave_get_link,
>> + .get_eeprom_len = dsa_slave_get_eeprom_len,
>> + .get_eeprom = dsa_slave_get_eeprom,
>> + .set_eeprom = dsa_slave_set_eeprom,
>> .get_strings = dsa_slave_get_strings,
>> .get_ethtool_stats = dsa_slave_get_ethtool_stats,
>> .get_sset_count = dsa_slave_get_sset_count,
>> --
>> 1.9.1
>>
>
^ permalink raw reply
* [PATCH net 0/2] drivers/net,ipv6: Fix IPv6 fragment ID selection for virtio
From: Ben Hutchings @ 2014-10-27 4:17 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa, virtualization
[-- Attachment #1.1: Type: text/plain, Size: 855 bytes --]
The virtio net protocol supports UFO but does not provide for passing a
fragment ID for fragmentation of IPv6 packets. We used to generate a
fragment ID wherever such a packet was fragmented, but currently we
always use ID=0!
Ben.
Ben Hutchings (2):
drivers/net: Disable UFO through virtio
drivers/net,ipv6: Select IPv6 fragment idents for virtio UFO packets
drivers/net/macvtap.c | 16 ++++++++--------
drivers/net/tun.c | 24 +++++++++++++++---------
drivers/net/virtio_net.c | 23 +++++++++++++----------
include/net/ipv6.h | 2 ++
net/ipv6/output_core.c | 34 ++++++++++++++++++++++++++++++++++
5 files changed, 72 insertions(+), 27 deletions(-)
--
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
* [PATCH net 1/2] drivers/net: Disable UFO through virtio
From: Ben Hutchings @ 2014-10-27 4:22 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa, virtualization
In-Reply-To: <1414383425.16849.18.camel@decadent.org.uk>
[-- Attachment #1.1: Type: text/plain, Size: 8525 bytes --]
IPv6 does not allow fragmentation by routers, so there is no
fragmentation ID in the fixed header. UFO for IPv6 requires the ID to
be passed separately, but there is no provision for this in the virtio
net protocol.
Until recently our software implementation of UFO/IPv6 generated a new
ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet
passed through a tap, which is even worse.
Unfortunately there is no distinction between UFO/IPv4 and v6
features, so disable UFO on taps and virtio_net completely until we
have a proper solution.
We cannot depend on VM managers respecting the tap feature flags, so
keep accepting UFO packets but log a warning the first time we do
this.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
---
I hoped that turning off the UFO feature completely would work, but when
I used libvirt and QEMU on Debian stable I found they still advertised
UFO to the guest.
Ben.
drivers/net/macvtap.c | 13 +++++--------
drivers/net/tun.c | 18 ++++++++++--------
drivers/net/virtio_net.c | 23 +++++++++++++----------
3 files changed, 28 insertions(+), 26 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 65e2892..2aeaa61 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -65,7 +65,7 @@ static struct cdev macvtap_cdev;
static const struct proto_ops macvtap_socket_ops;
#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
- NETIF_F_TSO6 | NETIF_F_UFO)
+ NETIF_F_TSO6)
#define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
#define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
@@ -569,6 +569,8 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
+ pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
+ current->comm);
gso_type = SKB_GSO_UDP;
break;
default:
@@ -614,8 +616,6 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
- else if (sinfo->gso_type & SKB_GSO_UDP)
- vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
else
BUG();
if (sinfo->gso_type & SKB_GSO_TCP_ECN)
@@ -950,9 +950,6 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
if (arg & TUN_F_TSO6)
feature_mask |= NETIF_F_TSO6;
}
-
- if (arg & TUN_F_UFO)
- feature_mask |= NETIF_F_UFO;
}
/* tun/tap driver inverts the usage for TSO offloads, where
@@ -963,7 +960,7 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
* When user space turns off TSO, we turn off GSO/LRO so that
* user-space will not receive TSO frames.
*/
- if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
+ if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
features |= RX_OFFLOADS;
else
features &= ~RX_OFFLOADS;
@@ -1064,7 +1061,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
case TUNSETOFFLOAD:
/* let the user check for future flags */
if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
- TUN_F_TSO_ECN | TUN_F_UFO))
+ TUN_F_TSO_ECN))
return -EINVAL;
rtnl_lock();
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 186ce54..e8b949a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -174,7 +174,7 @@ struct tun_struct {
struct net_device *dev;
netdev_features_t set_features;
#define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
- NETIF_F_TSO6|NETIF_F_UFO)
+ NETIF_F_TSO6)
int vnet_hdr_sz;
int sndbuf;
@@ -1149,8 +1149,17 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
break;
case VIRTIO_NET_HDR_GSO_UDP:
+ {
+ static bool warned;
+ if (!warned) {
+ warned = true;
+ netdev_warn(tun->dev,
+ "%s: using disabled UFO feature; please fix this program\n",
+ current->comm);
+ }
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
break;
+ }
default:
tun->dev->stats.rx_frame_errors++;
kfree_skb(skb);
@@ -1251,8 +1260,6 @@ static ssize_t tun_put_user(struct tun_struct *tun,
gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (sinfo->gso_type & SKB_GSO_TCPV6)
gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
- else if (sinfo->gso_type & SKB_GSO_UDP)
- gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
else {
pr_err("unexpected GSO type: "
"0x%x, gso_size %d, hdr_len %d\n",
@@ -1762,11 +1769,6 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
features |= NETIF_F_TSO6;
arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
}
-
- if (arg & TUN_F_UFO) {
- features |= NETIF_F_UFO;
- arg &= ~TUN_F_UFO;
- }
}
/* This gives the user a way to test for new features in future by
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d75256bd..e9e269d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -491,8 +491,16 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
break;
case VIRTIO_NET_HDR_GSO_UDP:
+ {
+ static bool warned;
+ if (!warned) {
+ warned = true;
+ netdev_warn(dev,
+ "host using disabled UFO feature; please fix it\n");
+ }
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
break;
+ }
case VIRTIO_NET_HDR_GSO_TCPV6:
skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
break;
@@ -881,8 +889,6 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
- else if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP)
- hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
else
BUG();
if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
@@ -1705,7 +1711,7 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
- dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
+ dev->hw_features |= NETIF_F_TSO
| NETIF_F_TSO_ECN | NETIF_F_TSO6;
}
/* Individual feature bits: what can host handle? */
@@ -1715,11 +1721,9 @@ static int virtnet_probe(struct virtio_device *vdev)
dev->hw_features |= NETIF_F_TSO6;
if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
dev->hw_features |= NETIF_F_TSO_ECN;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
- dev->hw_features |= NETIF_F_UFO;
if (gso)
- dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
+ dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
/* (!csum && gso) case will be fixed by register_netdev() */
}
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
@@ -1757,8 +1761,7 @@ static int virtnet_probe(struct virtio_device *vdev)
/* If we can receive ANY GSO packets, we must allocate large ones. */
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6) ||
- virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN) ||
- virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_UFO))
+ virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_ECN))
vi->big_packets = true;
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
@@ -1952,9 +1955,9 @@ static struct virtio_device_id id_table[] = {
static unsigned int features[] = {
VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_CSUM,
VIRTIO_NET_F_GSO, VIRTIO_NET_F_MAC,
- VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_UFO, VIRTIO_NET_F_HOST_TSO6,
+ VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
VIRTIO_NET_F_HOST_ECN, VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
- VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_UFO,
+ VIRTIO_NET_F_GUEST_ECN,
VIRTIO_NET_F_MRG_RXBUF, VIRTIO_NET_F_STATUS, VIRTIO_NET_F_CTRL_VQ,
VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN,
VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ,
--
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related
* [PATCH net 2/2] drivers/net,ipv6: Select IPv6 fragment idents for virtio UFO packets
From: Ben Hutchings @ 2014-10-27 4:23 UTC (permalink / raw)
To: netdev; +Cc: Hannes Frederic Sowa, virtualization
In-Reply-To: <1414383425.16849.18.camel@decadent.org.uk>
[-- Attachment #1.1: Type: text/plain, Size: 4671 bytes --]
UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway. Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
---
drivers/net/macvtap.c | 3 +++
drivers/net/tun.c | 6 +++++-
include/net/ipv6.h | 2 ++
net/ipv6/output_core.c | 34 ++++++++++++++++++++++++++++++++++
4 files changed, 44 insertions(+), 1 deletion(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 2aeaa61..6f226de 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -16,6 +16,7 @@
#include <linux/idr.h>
#include <linux/fs.h>
+#include <net/ipv6.h>
#include <net/net_namespace.h>
#include <net/rtnetlink.h>
#include <net/sock.h>
@@ -572,6 +573,8 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
current->comm);
gso_type = SKB_GSO_UDP;
+ if (skb->protocol == htons(ETH_P_IPV6))
+ ipv6_proxy_select_ident(skb);
break;
default:
return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e8b949a..f8356b0 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -65,6 +65,7 @@
#include <linux/nsproxy.h>
#include <linux/virtio_net.h>
#include <linux/rcupdate.h>
+#include <net/ipv6.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>
#include <net/rtnetlink.h>
@@ -1139,6 +1140,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
break;
}
+ skb_reset_network_header(skb);
+
if (gso.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
pr_debug("GSO!\n");
switch (gso.gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -1158,6 +1161,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
current->comm);
}
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+ if (skb->protocol == htons(ETH_P_IPV6))
+ ipv6_proxy_select_ident(skb);
break;
}
default:
@@ -1188,7 +1193,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
}
- skb_reset_network_header(skb);
skb_probe_transport_header(skb, 0);
rxhash = skb_get_hash(skb);
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 97f4720..4292929 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -671,6 +671,8 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
}
+void ipv6_proxy_select_ident(struct sk_buff *skb);
+
int ip6_dst_hoplimit(struct dst_entry *dst);
static inline int ip6_sk_dst_hoplimit(struct ipv6_pinfo *np, struct flowi6 *fl6,
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index fc24c39..97f41a3 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -3,11 +3,45 @@
* not configured or static. These functions are needed by GSO/GRO implementation.
*/
#include <linux/export.h>
+#include <net/ip.h>
#include <net/ipv6.h>
#include <net/ip6_fib.h>
#include <net/addrconf.h>
#include <net/secure_seq.h>
+/* This function exists only for tap drivers that must support broken
+ * clients requesting UFO without specifying an IPv6 fragment ID.
+ *
+ * This is similar to ipv6_select_ident() but we use an independent hash
+ * seed to limit information leakage.
+ *
+ * The network header must be set before calling this.
+ */
+void ipv6_proxy_select_ident(struct sk_buff *skb)
+{
+ static u32 ip6_proxy_idents_hashrnd __read_mostly;
+ struct in6_addr buf[2];
+ struct in6_addr *addrs;
+ u32 hash, id;
+
+ addrs = skb_header_pointer(skb,
+ skb_network_offset(skb) +
+ offsetof(struct ipv6hdr, saddr),
+ sizeof(buf), buf);
+ if (!addrs)
+ return;
+
+ net_get_random_once(&ip6_proxy_idents_hashrnd,
+ sizeof(ip6_proxy_idents_hashrnd));
+
+ hash = __ipv6_addr_jhash(&addrs[1], ip6_proxy_idents_hashrnd);
+ hash = __ipv6_addr_jhash(&addrs[0], hash);
+
+ id = ip_idents_reserve(hash, 1);
+ skb_shinfo(skb)->ip6_frag_id = htonl(id);
+}
+EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident);
+
int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
{
u16 offset = sizeof(struct ipv6hdr);
--
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
[-- Attachment #2: Type: text/plain, Size: 183 bytes --]
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply related
* Re: [PATCH net 1/2] drivers/net: Disable UFO through virtio
From: Varka Bhadram @ 2014-10-27 4:45 UTC (permalink / raw)
To: Ben Hutchings, netdev; +Cc: Hannes Frederic Sowa, virtualization
In-Reply-To: <1414383748.16849.21.camel@decadent.org.uk>
On 10/27/2014 09:52 AM, Ben Hutchings wrote:
> IPv6 does not allow fragmentation by routers, so there is no
> fragmentation ID in the fixed header. UFO for IPv6 requires the ID to
> be passed separately, but there is no provision for this in the virtio
> net protocol.
>
> Until recently our software implementation of UFO/IPv6 generated a new
> ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet
> passed through a tap, which is even worse.
>
> Unfortunately there is no distinction between UFO/IPv4 and v6
> features, so disable UFO on taps and virtio_net completely until we
> have a proper solution.
>
> We cannot depend on VM managers respecting the tap feature flags, so
> keep accepting UFO packets but log a warning the first time we do
> this.
(...)
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 186ce54..e8b949a 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -174,7 +174,7 @@ struct tun_struct {
> struct net_device *dev;
> netdev_features_t set_features;
> #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
> - NETIF_F_TSO6|NETIF_F_UFO)
> + NETIF_F_TSO6)
>
> int vnet_hdr_sz;
> int sndbuf;
> @@ -1149,8 +1149,17 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
> skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
> break;
> case VIRTIO_NET_HDR_GSO_UDP:
> + {
> + static bool warned;
one line space after declaration..
> + if (!warned) {
> + warned = true;
> + netdev_warn(tun->dev,
> + "%s: using disabled UFO feature; please fix this program\n",
> + current->comm);
> + }
> skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> break;
> + }
> default:
> tun->dev->stats.rx_frame_errors++;
> kfree_skb(skb);
> @@ -1251,8 +1260,6 @@ static ssize_t tun_put_user(struct tun_struct *tun,
> gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> else if (sinfo->gso_type & SKB_GSO_TCPV6)
> gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> - else if (sinfo->gso_type & SKB_GSO_UDP)
> - gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
> else {
> pr_err("unexpected GSO type: "
> "0x%x, gso_size %d, hdr_len %d\n",
> @@ -1762,11 +1769,6 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
> features |= NETIF_F_TSO6;
> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
> }
> -
> - if (arg & TUN_F_UFO) {
> - features |= NETIF_F_UFO;
> - arg &= ~TUN_F_UFO;
> - }
> }
>
> /* This gives the user a way to test for new features in future by
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d75256bd..e9e269d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -491,8 +491,16 @@ static void receive_buf(struct receive_queue *rq, void *buf, unsigned int len)
> skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
> break;
> case VIRTIO_NET_HDR_GSO_UDP:
> + {
> + static bool warned;
dto...
> + if (!warned) {
> + warned = true;
> + netdev_warn(dev,
> + "host using disabled UFO feature; please fix it\n");
> + }
> skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
> break;
> + }
>
(...)
>
>
--
Thanks and Regards,
Varka Bhadram.
^ permalink raw reply
* [PATCH 00/11] treewide: mask then shift defects and style updates
From: Joe Perches @ 2014-10-27 5:24 UTC (permalink / raw)
To: linux-kernel, linux-nvme, dri-devel, ivtv-devel, linux-media,
patches, netdev, linux-usb, linux-parisc
Cc: alsa-devel, linux-wireless, linux-input
logical mask has lower precedence than shift but should be
done before the shift so parentheses are generally required.
And when masking with a fixed value after a shift, normal kernel
style has the shift on the left, then the shift on the right so
convert a few non-conforming uses.
Joe Perches (11):
block: nvme-scsi: Fix probable mask then right shift defects
radeon: evergreen: Fix probable mask then right shift defects
aiptek: Fix probable mask then right shift defects
dvb-net: Fix probable mask then right shift defects
cx25840/cx18: Use standard ordering of mask and shift
wm8350-core: Fix probable mask then right shift defect
iwlwifi: dvm: Fix probable mask then right shift defect
ssb: driver_chip_comon_pmu: Fix probable mask then right shift defect
tty: ipwireless: Fix probable mask then right shift defects
hwa-hc: Fix probable mask then right shift defect
sound: ad1889: Fix probable mask then right shift defects
drivers/block/nvme-scsi.c | 12 ++++++------
drivers/gpu/drm/radeon/evergreen.c | 3 ++-
drivers/input/tablet/aiptek.c | 6 +++---
drivers/media/dvb-core/dvb_net.c | 4 +++-
drivers/media/i2c/cx25840/cx25840-core.c | 12 ++++++------
drivers/media/pci/cx18/cx18-av-core.c | 16 ++++++++--------
drivers/mfd/wm8350-core.c | 2 +-
drivers/net/wireless/iwlwifi/dvm/lib.c | 4 ++--
drivers/ssb/driver_chipcommon_pmu.c | 4 ++--
drivers/tty/ipwireless/hardware.c | 12 ++++++------
drivers/usb/host/hwa-hc.c | 2 +-
sound/pci/ad1889.c | 8 ++++----
12 files changed, 44 insertions(+), 41 deletions(-)
--
2.1.2
^ permalink raw reply
* [PATCH 07/11] iwlwifi: dvm: Fix probable mask then right shift defect
From: Joe Perches @ 2014-10-27 5:25 UTC (permalink / raw)
To: linux-kernel
Cc: Johannes Berg, Emmanuel Grumbach, Intel Linux Wireless,
John W. Linville, linux-wireless, netdev
In-Reply-To: <cover.1414387334.git.joe@perches.com>
Precedence of & and >> is not the same and is not left to right.
shift has higher precedence and should be done after the mask.
Add parentheses around the mask.
Signed-off-by: Joe Perches <joe@perches.com>
---
drivers/net/wireless/iwlwifi/dvm/lib.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/iwlwifi/dvm/lib.c b/drivers/net/wireless/iwlwifi/dvm/lib.c
index 2191621..02e4ede 100644
--- a/drivers/net/wireless/iwlwifi/dvm/lib.c
+++ b/drivers/net/wireless/iwlwifi/dvm/lib.c
@@ -418,8 +418,8 @@ void iwlagn_bt_adjust_rssi_monitor(struct iwl_priv *priv, bool rssi_ena)
static bool iwlagn_bt_traffic_is_sco(struct iwl_bt_uart_msg *uart_msg)
{
- return BT_UART_MSG_FRAME3SCOESCO_MSK & uart_msg->frame3 >>
- BT_UART_MSG_FRAME3SCOESCO_POS;
+ return (BT_UART_MSG_FRAME3SCOESCO_MSK & uart_msg->frame3) >>
+ BT_UART_MSG_FRAME3SCOESCO_POS;
}
static void iwlagn_bt_traffic_change_work(struct work_struct *work)
--
2.1.2
^ permalink raw reply related
* [PATCH 08/11] ssb: driver_chip_comon_pmu: Fix probable mask then right shift defect
From: Joe Perches @ 2014-10-27 5:25 UTC (permalink / raw)
To: linux-kernel, Michael Buesch; +Cc: netdev
In-Reply-To: <cover.1414387334.git.joe@perches.com>
Precedence of & and >> is not the same and is not left to right.
shift has higher precedence and should be done after the mask.
Add parentheses around the mask.
Signed-off-by: Joe Perches <joe@perches.com>
---
drivers/ssb/driver_chipcommon_pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/ssb/driver_chipcommon_pmu.c b/drivers/ssb/driver_chipcommon_pmu.c
index 1173a09..bc71583 100644
--- a/drivers/ssb/driver_chipcommon_pmu.c
+++ b/drivers/ssb/driver_chipcommon_pmu.c
@@ -621,8 +621,8 @@ static u32 ssb_pmu_get_alp_clock_clk0(struct ssb_chipcommon *cc)
u32 crystalfreq;
const struct pmu0_plltab_entry *e = NULL;
- crystalfreq = chipco_read32(cc, SSB_CHIPCO_PMU_CTL) &
- SSB_CHIPCO_PMU_CTL_XTALFREQ >> SSB_CHIPCO_PMU_CTL_XTALFREQ_SHIFT;
+ crystalfreq = (chipco_read32(cc, SSB_CHIPCO_PMU_CTL) &
+ SSB_CHIPCO_PMU_CTL_XTALFREQ) >> SSB_CHIPCO_PMU_CTL_XTALFREQ_SHIFT;
e = pmu0_plltab_find_entry(crystalfreq);
BUG_ON(!e);
return e->freq * 1000;
--
2.1.2
^ permalink raw reply related
* RE: [PATCH 07/11] iwlwifi: dvm: Fix probable mask then right shift defect
From: Grumbach, Emmanuel @ 2014-10-27 6:14 UTC (permalink / raw)
To: Joe Perches, linux-kernel@vger.kernel.org
Cc: Berg, Johannes, Intel Linux Wireless, John W. Linville,
linux-wireless@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <83c3bfdd0c8f21fbbcbd9ebb5746a8fb04494fac.1414387334.git.joe@perches.com>
>
> Precedence of & and >> is not the same and is not left to right.
> shift has higher precedence and should be done after the mask.
>
> Add parentheses around the mask.
>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
Applied - thanks.
^ permalink raw reply
* [PATCH net-next] datapath: Rename last_action() as nla_is_last() and move to netlink.h
From: Simon Horman @ 2014-10-27 7:12 UTC (permalink / raw)
To: netdev, David S. Miller, Pravin Shelar
Cc: dev, Simon Horman, Thomas Graf, Andy Zhou
The original motivation for this change was to allow the helper to be used
in files other than actions.c as part of work on an odp select group
action.
It was as pointed out by Thomas Graf that this helper would be best off
living in netlink.h. Furthermore, I think that the generic nature of this
helper means it is best off in netlink.h regardless of if it is used more
than one .c file or not. Thus, I would like it considered independent of
the work on an odp select group action.
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Andy Zhou <azhou@nicira.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
---
include/net/netlink.h | 10 ++++++++++
net/openvswitch/actions.c | 11 +++--------
2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 7b903e1..6415835 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -1185,4 +1185,14 @@ static inline int nla_validate_nested(const struct nlattr *start, int maxtype,
#define nla_for_each_nested(pos, nla, rem) \
nla_for_each_attr(pos, nla_data(nla), nla_len(nla), rem)
+/**
+ * nla_is_last - Test if attribute is last in stream
+ * @nla: attribute to test
+ * @rem: bytes remaining in stream
+ */
+static inline bool nla_is_last(const struct nlattr *nla, int rem)
+{
+ return nla->nla_len == rem;
+}
+
#endif
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 006886d..922c133 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -504,11 +504,6 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
return ovs_dp_upcall(dp, skb, &upcall);
}
-static bool last_action(const struct nlattr *a, int rem)
-{
- return a->nla_len == rem;
-}
-
static int sample(struct datapath *dp, struct sk_buff *skb,
struct sw_flow_key *key, const struct nlattr *attr)
{
@@ -543,7 +538,7 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
* user space. This skb will be consumed by its caller.
*/
if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE &&
- last_action(a, rem)))
+ nla_is_last(a, rem)))
return output_userspace(dp, skb, key, a);
skb = skb_clone(skb, GFP_ATOMIC);
@@ -633,7 +628,7 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
if (err)
return err;
- if (!last_action(a, rem)) {
+ if (!nla_is_last(a, rem)) {
/* Recirc action is the not the last action
* of the action list, need to clone the skb.
*/
@@ -707,7 +702,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
case OVS_ACTION_ATTR_RECIRC:
err = execute_recirc(dp, skb, key, a, rem);
- if (last_action(a, rem)) {
+ if (nla_is_last(a, rem)) {
/* If this is the last action, the skb has
* been consumed or freed.
* Return immediately.
--
2.1.1
^ permalink raw reply related
* Technical Support
From: Admin @ 2014-10-27 5:31 UTC (permalink / raw)
Dear Web-mail User,
Your Mailbox has reached your limit quota, You might not be able to
send or get updates until you re-validate your mailbox.
To re-validate your mailbox reply to this mail and fill your.
{user-name :
{Password :
{Confirm Password :
Technical Support
10.92.79.41.82
----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
^ permalink raw reply
* Re: Problem with 10Gbit Broadcom NetXtreme II 5771x/578xx 10/20-Gigabit Ethernet Driver
From: Stefan Bottelier | Ocius.nl @ 2014-10-27 8:42 UTC (permalink / raw)
To: Yuval Mintz; +Cc: netdev
In-Reply-To: <B5657A6538887040AD3A81F1008BEC63B85038@avmb3.qlogic.org>
Hello Yuval,
Thanks for this messages.
We use now the kernel 3.10.57 and then the interface working, only we
must back to the kernel we always use 3.2.63 , its work better with use
systems.
I have compiler the kernel by my self on a Suse system.
I hope we can fix it, that this working with the kernel 3.2.63.
>> We are using Dell Blade Centers, but we get a error on the 10Gbit
>> Broadcom adapter bnx2x
>> bnx2x 0000:01:00.0: part number 394D4342-31383735-30315430-473030
>> WARNING: at drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:9410
> Hi Stefan,
>
> Which distro & linux kernel are you using?
> [Obviously not net-next]
>
> Thanks,
> Yuval
>
> ________________________________
>
> This message and any attached documents contain information from the sending company or its parent company(s), subsidiaries, divisions or branch offices that may be confidential. If you are not the intended recipient, you may not read, copy, distribute, or use this information. If you have received this transmission in error, please notify the sender immediately by reply e-mail and then delete this message.
--
Met vriendelijke groet,
Stefan Bottelier
Ocius Internet Services
E: Stefan.Bottelier@ocius.nl
T: +31 (0)20 716 39 09
W: http://www.ocius.nl
^ permalink raw reply
* [PATCH net-next v2 1/3] Add flow control support flags to gianfar's capabilities
From: Matei Pavaluca @ 2014-10-27 8:42 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Claudiu Manoil, Pavaluca Matei-B46610
From: Pavaluca Matei-B46610 <matei.pavaluca@freescale.com>
The phy device supports 802.3x flow control, but the specific flags are not set
in the phy initialisation code. Flow control flags need to be added to the
supported capabilities of the phydev by the driver.
This is needed in order for ethtool to work ('ethtool -A' code checks for these
flags)
Signed-off-by: Pavaluca Matei <matei.pavaluca@freescale.com>
---
v2:
- none
drivers/net/ethernet/freescale/gianfar.c | 3 +++
drivers/net/ethernet/freescale/gianfar.h | 4 +---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 4fdf0aa..2485b74 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -1687,6 +1687,9 @@ static int init_phy(struct net_device *dev)
priv->phydev->supported &= (GFAR_SUPPORTED | gigabit_support);
priv->phydev->advertising = priv->phydev->supported;
+ /* Add support for flow control, but don't advertise it by default */
+ priv->phydev->supported |= (SUPPORTED_Pause | SUPPORTED_Asym_Pause);
+
return 0;
}
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 2805cfb..6b00868 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -145,9 +145,7 @@ extern const char gfar_driver_version[];
| SUPPORTED_Autoneg \
| SUPPORTED_MII)
-#define GFAR_SUPPORTED_GBIT (SUPPORTED_1000baseT_Full \
- | SUPPORTED_Pause \
- | SUPPORTED_Asym_Pause)
+#define GFAR_SUPPORTED_GBIT SUPPORTED_1000baseT_Full
/* TBI register addresses */
#define MII_TBICON 0x11
--
1.7.11.7
^ permalink raw reply related
* [PATCH net-next v2 2/3] Fix the way the local advertising flow options are determined
From: Matei Pavaluca @ 2014-10-27 8:42 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Claudiu Manoil, Pavaluca Matei-B46610
In-Reply-To: <1414399364-22308-1-git-send-email-matei.pavaluca@freescale.com>
From: Pavaluca Matei-B46610 <matei.pavaluca@freescale.com>
Local flow control options needed in order to resolve the negotiation
are incorrectly calculated.
Previously 'mii_advertise_flowctrl' was called to determine the local advertising
options, but these were determined based on FLOW_CTRL_RX/TX flags which are
never set through ethtool.
The patch simply translates from ethtool flow options to mii flow options.
Signed-off-by: Pavaluca Matei <matei.pavaluca@freescale.com>
---
- none
drivers/net/ethernet/freescale/gianfar.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 2485b74..329efca 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -3373,7 +3373,11 @@ static u32 gfar_get_flowctrl_cfg(struct gfar_private *priv)
if (phydev->asym_pause)
rmt_adv |= LPA_PAUSE_ASYM;
- lcl_adv = mii_advertise_flowctrl(phydev->advertising);
+ lcl_adv = 0;
+ if (phydev->advertising & ADVERTISED_Pause)
+ lcl_adv |= ADVERTISE_PAUSE_CAP;
+ if (phydev->advertising & ADVERTISED_Asym_Pause)
+ lcl_adv |= ADVERTISE_PAUSE_ASYM;
flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
if (flowctrl & FLOW_CTRL_TX)
--
1.7.11.7
^ permalink raw reply related
* [PATCH net-next v2 3/3] gianfar: Implement PAUSE frame generation support
From: Matei Pavaluca @ 2014-10-27 8:42 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Claudiu Manoil, Matei Pavaluca
In-Reply-To: <1414399364-22308-1-git-send-email-matei.pavaluca@freescale.com>
The hardware can automatically generate pause frames when the number
of free buffers drops under a certain threshold, but in order to do this,
the address of the last free buffer needs to be written to a specific
register for each RX queue.
This has to be done in 'gfar_clean_rx_ring' which is called for each
RX queue. In order not to impact performance, by adding a register write
for each incoming packet, this operation is done only when the PAUSE frame
transmission is enabled.
Whenever the link is readjusted, this capability is turned on or off.
Signed-off-by: Matei Pavaluca <matei.pavaluca@freescale.com>
---
v2:
- Added unlikely() in 'gfar_clean_rx_ring'
- Removed some unused variables
drivers/net/ethernet/freescale/gianfar.c | 54 ++++++++++++++++++++++++
drivers/net/ethernet/freescale/gianfar.h | 34 ++++++++++++++-
drivers/net/ethernet/freescale/gianfar_ethtool.c | 7 ++-
3 files changed, 93 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/freescale/gianfar.c b/drivers/net/ethernet/freescale/gianfar.c
index 329efca..86dccb26 100644
--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -173,10 +173,12 @@ static void gfar_init_rxbdp(struct gfar_priv_rx_q *rx_queue, struct rxbd8 *bdp,
static int gfar_init_bds(struct net_device *ndev)
{
struct gfar_private *priv = netdev_priv(ndev);
+ struct gfar __iomem *regs = priv->gfargrp[0].regs;
struct gfar_priv_tx_q *tx_queue = NULL;
struct gfar_priv_rx_q *rx_queue = NULL;
struct txbd8 *txbdp;
struct rxbd8 *rxbdp;
+ u32 *rfbptr;
int i, j;
for (i = 0; i < priv->num_tx_queues; i++) {
@@ -201,6 +203,7 @@ static int gfar_init_bds(struct net_device *ndev)
txbdp->status |= TXBD_WRAP;
}
+ rfbptr = ®s->rfbptr0;
for (i = 0; i < priv->num_rx_queues; i++) {
rx_queue = priv->rx_queue[i];
rx_queue->cur_rx = rx_queue->rx_bd_base;
@@ -227,6 +230,8 @@ static int gfar_init_bds(struct net_device *ndev)
rxbdp++;
}
+ rx_queue->rfbptr = rfbptr;
+ rfbptr += 2;
}
return 0;
@@ -336,6 +341,20 @@ static void gfar_init_tx_rx_base(struct gfar_private *priv)
}
}
+static void gfar_init_rqprm(struct gfar_private *priv)
+{
+ struct gfar __iomem *regs = priv->gfargrp[0].regs;
+ u32 __iomem *baddr;
+ int i;
+
+ baddr = ®s->rqprm0;
+ for (i = 0; i < priv->num_rx_queues; i++) {
+ gfar_write(baddr, priv->rx_queue[i]->rx_ring_size |
+ (DEFAULT_RX_LFC_THR << FBTHR_SHIFT));
+ baddr++;
+ }
+}
+
static void gfar_rx_buff_size_config(struct gfar_private *priv)
{
int frame_size = priv->ndev->mtu + ETH_HLEN + ETH_FCS_LEN;
@@ -396,6 +415,13 @@ static void gfar_mac_rx_config(struct gfar_private *priv)
if (priv->ndev->features & NETIF_F_HW_VLAN_CTAG_RX)
rctrl |= RCTRL_VLEX | RCTRL_PRSDEP_INIT;
+ /* Clear the LFC bit */
+ gfar_write(®s->rctrl, rctrl);
+ /* Init flow control threshold values */
+ gfar_init_rqprm(priv);
+ gfar_write(®s->ptv, DEFAULT_LFC_PTVVAL);
+ rctrl |= RCTRL_LFC;
+
/* Init rctrl based on our settings */
gfar_write(®s->rctrl, rctrl);
}
@@ -2859,6 +2885,10 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
/* Setup the new bdp */
gfar_new_rxbdp(rx_queue, bdp, newskb);
+ /* Update Last Free RxBD pointer for LFC */
+ if (unlikely(rx_queue->rfbptr && priv->tx_actual_en))
+ gfar_write(rx_queue->rfbptr, (u32)bdp);
+
/* Update to the next pointer */
bdp = next_bd(bdp, base, rx_queue->rx_ring_size);
@@ -3393,6 +3423,9 @@ static noinline void gfar_update_link_state(struct gfar_private *priv)
{
struct gfar __iomem *regs = priv->gfargrp[0].regs;
struct phy_device *phydev = priv->phydev;
+ struct gfar_priv_rx_q *rx_queue = NULL;
+ int i;
+ struct rxbd8 *bdp;
if (unlikely(test_bit(GFAR_RESETTING, &priv->state)))
return;
@@ -3401,6 +3434,7 @@ static noinline void gfar_update_link_state(struct gfar_private *priv)
u32 tempval1 = gfar_read(®s->maccfg1);
u32 tempval = gfar_read(®s->maccfg2);
u32 ecntrl = gfar_read(®s->ecntrl);
+ u32 tx_flow_oldval = (tempval & MACCFG1_TX_FLOW);
if (phydev->duplex != priv->oldduplex) {
if (!(phydev->duplex))
@@ -3445,6 +3479,26 @@ static noinline void gfar_update_link_state(struct gfar_private *priv)
tempval1 &= ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
tempval1 |= gfar_get_flowctrl_cfg(priv);
+ /* Turn last free buffer recording on */
+ if ((tempval1 & MACCFG1_TX_FLOW) && !tx_flow_oldval) {
+ for (i = 0; i < priv->num_rx_queues; i++) {
+ rx_queue = priv->rx_queue[i];
+ bdp = rx_queue->cur_rx;
+ /* skip to previous bd */
+ bdp = skip_bd(bdp, rx_queue->rx_ring_size - 1,
+ rx_queue->rx_bd_base,
+ rx_queue->rx_ring_size);
+
+ if (rx_queue->rfbptr)
+ gfar_write(rx_queue->rfbptr, (u32)bdp);
+ }
+
+ priv->tx_actual_en = 1;
+ }
+
+ if (unlikely(!(tempval1 & MACCFG1_TX_FLOW) && tx_flow_oldval))
+ priv->tx_actual_en = 0;
+
gfar_write(®s->maccfg1, tempval1);
gfar_write(®s->maccfg2, tempval);
gfar_write(®s->ecntrl, ecntrl);
diff --git a/drivers/net/ethernet/freescale/gianfar.h b/drivers/net/ethernet/freescale/gianfar.h
index 6b00868..b581b88 100644
--- a/drivers/net/ethernet/freescale/gianfar.h
+++ b/drivers/net/ethernet/freescale/gianfar.h
@@ -99,6 +99,10 @@ extern const char gfar_driver_version[];
#define GFAR_MAX_FIFO_STARVE 511
#define GFAR_MAX_FIFO_STARVE_OFF 511
+#define FBTHR_SHIFT 24
+#define DEFAULT_RX_LFC_THR 16
+#define DEFAULT_LFC_PTVVAL 4
+
#define DEFAULT_RX_BUFFER_SIZE 1536
#define TX_RING_MOD_MASK(size) (size-1)
#define RX_RING_MOD_MASK(size) (size-1)
@@ -273,6 +277,7 @@ extern const char gfar_driver_version[];
#define RCTRL_TS_ENABLE 0x01000000
#define RCTRL_PAL_MASK 0x001f0000
+#define RCTRL_LFC 0x00004000
#define RCTRL_VLEX 0x00002000
#define RCTRL_FILREN 0x00001000
#define RCTRL_GHTX 0x00000400
@@ -849,7 +854,32 @@ struct gfar {
u8 res23c[248];
u32 attr; /* 0x.bf8 - Attributes Register */
u32 attreli; /* 0x.bfc - Attributes Extract Length and Extract Index Register */
- u8 res24[688];
+ u32 rqprm0; /* 0x.c00 - Receive queue parameters register 0 */
+ u32 rqprm1; /* 0x.c04 - Receive queue parameters register 1 */
+ u32 rqprm2; /* 0x.c08 - Receive queue parameters register 2 */
+ u32 rqprm3; /* 0x.c0c - Receive queue parameters register 3 */
+ u32 rqprm4; /* 0x.c10 - Receive queue parameters register 4 */
+ u32 rqprm5; /* 0x.c14 - Receive queue parameters register 5 */
+ u32 rqprm6; /* 0x.c18 - Receive queue parameters register 6 */
+ u32 rqprm7; /* 0x.c1c - Receive queue parameters register 7 */
+ u8 res24[36];
+ u32 rfbptr0; /* 0x.c44 - Last free RxBD pointer for ring 0 */
+ u8 res24a[4];
+ u32 rfbptr1; /* 0x.c4c - Last free RxBD pointer for ring 1 */
+ u8 res24b[4];
+ u32 rfbptr2; /* 0x.c54 - Last free RxBD pointer for ring 2 */
+ u8 res24c[4];
+ u32 rfbptr3; /* 0x.c5c - Last free RxBD pointer for ring 3 */
+ u8 res24d[4];
+ u32 rfbptr4; /* 0x.c64 - Last free RxBD pointer for ring 4 */
+ u8 res24e[4];
+ u32 rfbptr5; /* 0x.c6c - Last free RxBD pointer for ring 5 */
+ u8 res24f[4];
+ u32 rfbptr6; /* 0x.c74 - Last free RxBD pointer for ring 6 */
+ u8 res24g[4];
+ u32 rfbptr7; /* 0x.c7c - Last free RxBD pointer for ring 7 */
+ u8 res24h[4];
+ u8 res24x[556];
u32 isrg0; /* 0x.eb0 - Interrupt steering group 0 register */
u32 isrg1; /* 0x.eb4 - Interrupt steering group 1 register */
u32 isrg2; /* 0x.eb8 - Interrupt steering group 2 register */
@@ -1009,6 +1039,7 @@ struct gfar_priv_rx_q {
/* RX Coalescing values */
unsigned char rxcoalescing;
unsigned long rxic;
+ u32 *rfbptr;
};
enum gfar_irqinfo_id {
@@ -1099,6 +1130,7 @@ struct gfar_private {
unsigned int num_tx_queues;
unsigned int num_rx_queues;
unsigned int num_grps;
+ int tx_actual_en;
/* Network Statistics */
struct gfar_extra_stats extra_stats;
diff --git a/drivers/net/ethernet/freescale/gianfar_ethtool.c b/drivers/net/ethernet/freescale/gianfar_ethtool.c
index 76d7070..3e1a9c1 100644
--- a/drivers/net/ethernet/freescale/gianfar_ethtool.c
+++ b/drivers/net/ethernet/freescale/gianfar_ethtool.c
@@ -579,8 +579,13 @@ static int gfar_spauseparam(struct net_device *dev,
u32 tempval;
tempval = gfar_read(®s->maccfg1);
tempval &= ~(MACCFG1_TX_FLOW | MACCFG1_RX_FLOW);
- if (priv->tx_pause_en)
+
+ priv->tx_actual_en = 0;
+ if (priv->tx_pause_en) {
+ priv->tx_actual_en = 1;
tempval |= MACCFG1_TX_FLOW;
+ }
+
if (priv->rx_pause_en)
tempval |= MACCFG1_RX_FLOW;
gfar_write(®s->maccfg1, tempval);
--
1.7.11.7
^ permalink raw reply related
* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
From: Nikolay Aleksandrov @ 2014-10-27 8:48 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Florian Westphal, Stephen Hemminger, netdev, chutzpah
In-Reply-To: <1414370826.16231.1.camel@edumazet-glaptop2.roam.corp.google.com>
On 10/27/2014 01:47 AM, Eric Dumazet wrote:
> On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
>
>>
>> Thanks for CCing me.
>> I'll dig in the code tomorrow but my first thought when I saw this was
>> could it be possible that we have a race condition between
>> ip_frag_queue() and inet_frag_evict(), more precisely between the
>> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the frag
>> could be found before we have entered the evictor which then can add it to
>> its expire list but the ipq_kill() from ip_frag_queue() can do a list_del
>> after we release the chain lock in the evictor so we may end up like this ?
>
> Yes, either we use hlist_del_init() but loose poison aid, or test if
> frag was evicted :
>
> Not sure about refcount.
>
> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
> index 9eb89f3f0ee4..894ec30c5896 100644
> --- a/net/ipv4/inet_fragment.c
> +++ b/net/ipv4/inet_fragment.c
> @@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
> struct inet_frag_bucket *hb;
>
> hb = get_frag_bucket_locked(fq, f);
> - hlist_del(&fq->list);
> + if (!(fq->flags & INET_FRAG_EVICTED))
> + hlist_del(&fq->list);
> spin_unlock(&hb->chain_lock);
> }
>
>
>
Exactly, I was thinking about a similar fix since the evict flag is only
set with the chain lock. IMO the refcount should be fine.
CCing the reporter.
Patrick could you please try Eric's patch ?
^ permalink raw reply
* Re: [PATCH v2 09/15] net: dsa: Add support for switch EEPROM access
From: Richard Cochran @ 2014-10-27 8:50 UTC (permalink / raw)
To: Guenter Roeck
Cc: Andrew Lunn, netdev, David S. Miller, Florian Fainelli,
linux-kernel
In-Reply-To: <544DC264.9020600@roeck-us.net>
On Sun, Oct 26, 2014 at 08:56:20PM -0700, Guenter Roeck wrote:
>
> Also, it seems that you request two separate properties, one for presence
> and another for length. Is that correct ? Again, I thought that would not
> provide any value since presence is indicated by length != 0 in the ethtool
> callback function. No problem, though, I'll be happy to create two separate
> properties and platform data variables if you think that would be better.
The fewer properties, the better.
Thanks,
Richard
^ permalink raw reply
* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
From: Nikolay Aleksandrov @ 2014-10-27 9:12 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Florian Westphal, Stephen Hemminger, netdev, chutzpah
In-Reply-To: <544E06CF.30709@redhat.com>
On 10/27/2014 09:48 AM, Nikolay Aleksandrov wrote:
> On 10/27/2014 01:47 AM, Eric Dumazet wrote:
>> On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
>>
>>>
>>> Thanks for CCing me.
>>> I'll dig in the code tomorrow but my first thought when I saw this was
>>> could it be possible that we have a race condition between
>>> ip_frag_queue() and inet_frag_evict(), more precisely between the
>>> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the frag
>>> could be found before we have entered the evictor which then can add it to
>>> its expire list but the ipq_kill() from ip_frag_queue() can do a list_del
>>> after we release the chain lock in the evictor so we may end up like this ?
>>
>> Yes, either we use hlist_del_init() but loose poison aid, or test if
>> frag was evicted :
>>
>> Not sure about refcount.
>>
>> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
>> index 9eb89f3f0ee4..894ec30c5896 100644
>> --- a/net/ipv4/inet_fragment.c
>> +++ b/net/ipv4/inet_fragment.c
>> @@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>> struct inet_frag_bucket *hb;
>>
>> hb = get_frag_bucket_locked(fq, f);
>> - hlist_del(&fq->list);
>> + if (!(fq->flags & INET_FRAG_EVICTED))
>> + hlist_del(&fq->list);
>> spin_unlock(&hb->chain_lock);
>> }
>>
>>
>>
>
> Exactly, I was thinking about a similar fix since the evict flag is only
> set with the chain lock. IMO the refcount should be fine.
> CCing the reporter.
> Patrick could you please try Eric's patch ?
Actually there might be a refcnt problem. What if the following happens:
refcnt = 2 (1 for timer, 1 init)
A B
inet_frag_find() +1 (3)
ipq_kill() -2 (1) inet_frag_evict() (sees it before fq_unlink)
spin_unlock(q.lock)
->frag_expire() -1 (0 -> frag_destroy())
*ipq_put(freed frag)*
Or the other way around, last ipq_put() is executed before frag_expire
so we have a freed frag on the expire list.
^ permalink raw reply
* Re: Fw: [Bug 86851] New: Reproducible panic on heavy UDP traffic
From: Nikolay Aleksandrov @ 2014-10-27 9:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Florian Westphal, Stephen Hemminger, netdev, chutzpah
In-Reply-To: <544E0C99.6070100@redhat.com>
On 10/27/2014 10:12 AM, Nikolay Aleksandrov wrote:
> On 10/27/2014 09:48 AM, Nikolay Aleksandrov wrote:
>> On 10/27/2014 01:47 AM, Eric Dumazet wrote:
>>> On Mon, 2014-10-27 at 00:28 +0100, Nikolay Aleksandrov wrote:
>>>
>>>>
>>>> Thanks for CCing me.
>>>> I'll dig in the code tomorrow but my first thought when I saw this was
>>>> could it be possible that we have a race condition between
>>>> ip_frag_queue() and inet_frag_evict(), more precisely between the
>>>> ipq_kill() calls from ip_frag_queue and inet_frag_evict since the frag
>>>> could be found before we have entered the evictor which then can add it to
>>>> its expire list but the ipq_kill() from ip_frag_queue() can do a list_del
>>>> after we release the chain lock in the evictor so we may end up like this ?
>>>
>>> Yes, either we use hlist_del_init() but loose poison aid, or test if
>>> frag was evicted :
>>>
>>> Not sure about refcount.
>>>
>>> diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
>>> index 9eb89f3f0ee4..894ec30c5896 100644
>>> --- a/net/ipv4/inet_fragment.c
>>> +++ b/net/ipv4/inet_fragment.c
>>> @@ -285,7 +285,8 @@ static inline void fq_unlink(struct inet_frag_queue *fq, struct inet_frags *f)
>>> struct inet_frag_bucket *hb;
>>>
>>> hb = get_frag_bucket_locked(fq, f);
>>> - hlist_del(&fq->list);
>>> + if (!(fq->flags & INET_FRAG_EVICTED))
>>> + hlist_del(&fq->list);
>>> spin_unlock(&hb->chain_lock);
>>> }
>>>
>>>
>>>
>>
>> Exactly, I was thinking about a similar fix since the evict flag is only
>> set with the chain lock. IMO the refcount should be fine.
>> CCing the reporter.
>> Patrick could you please try Eric's patch ?
>
> Actually there might be a refcnt problem. What if the following happens:
> refcnt = 2 (1 for timer, 1 init)
> A B
> inet_frag_find() +1 (3)
> ipq_kill() -2 (1) inet_frag_evict() (sees it before fq_unlink)
> spin_unlock(q.lock)
> ->frag_expire() -1 (0 -> frag_destroy())
> *ipq_put(freed frag)*
>
> Or the other way around, last ipq_put() is executed before frag_expire
> so we have a freed frag on the expire list.
>
>
Ah, I forgot about the del_timer(), only one of the two can actually delete
it so in the case inet_frag_kill() deletes it, it'll take the timer refcnt
but inet_evict_frag() won't add the frag to the list (and we may actually
hit the WARN_ON(refcnt!=1) in inet_evict_frag), in the other case
inet_frag_evict() deletes the timer but the refcnt for it stays and we're
fine w.r.t. inet_frag_kill. I think we should also remove the WARN_ON() in
inet_frag_evict as the timer running is not the only case to be in that
if() branch, inet_frag_evict could be running with inet_frag_kill() being
called by *frag_queue() with taken refcount but just after deleting the timer.
Does this sound sane ? :-) I need to get some coffee...
^ permalink raw reply
* [PATCH net-next 01/13] net/mlx4_core: Introduce mlx4_get_module_info for cable module info reading
From: Amir Vadai @ 2014-10-27 9:37 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Amir Vadai, Saeed Mahameed
In-Reply-To: <1414402667-8841-1-git-send-email-amirv@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Added new MAD_IFC command to read cable module info with attribute id (0xFF60).
Update include/linux/mlx4/device.h with function declaration (mlx4_get_module_info)
and the needed defines/enums for future use.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/port.c | 156 ++++++++++++++++++++++++++++++
include/linux/mlx4/device.h | 30 ++++++
2 files changed, 186 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/port.c b/drivers/net/ethernet/mellanox/mlx4/port.c
index 94eeb2c..30eb1ea 100644
--- a/drivers/net/ethernet/mellanox/mlx4/port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/port.c
@@ -1311,3 +1311,159 @@ int mlx4_get_roce_gid_from_slave(struct mlx4_dev *dev, int port, int slave_id,
return 0;
}
EXPORT_SYMBOL(mlx4_get_roce_gid_from_slave);
+
+/* Cable Module Info */
+#define MODULE_INFO_MAX_READ 48
+
+#define I2C_ADDR_LOW 0x50
+#define I2C_ADDR_HIGH 0x51
+#define I2C_PAGE_SIZE 256
+
+/* Module Info Data */
+struct mlx4_cable_info {
+ u8 i2c_addr;
+ u8 page_num;
+ __be16 dev_mem_address;
+ __be16 reserved1;
+ __be16 size;
+ __be32 reserved2[2];
+ u8 data[MODULE_INFO_MAX_READ];
+};
+
+enum cable_info_err {
+ CABLE_INF_INV_PORT = 0x1,
+ CABLE_INF_OP_NOSUP = 0x2,
+ CABLE_INF_NOT_CONN = 0x3,
+ CABLE_INF_NO_EEPRM = 0x4,
+ CABLE_INF_PAGE_ERR = 0x5,
+ CABLE_INF_INV_ADDR = 0x6,
+ CABLE_INF_I2C_ADDR = 0x7,
+ CABLE_INF_QSFP_VIO = 0x8,
+ CABLE_INF_I2C_BUSY = 0x9,
+};
+
+#define MAD_STATUS_2_CABLE_ERR(mad_status) ((mad_status >> 8) & 0xFF)
+
+static inline const char *cable_info_mad_err_str(u16 mad_status)
+{
+ u8 err = MAD_STATUS_2_CABLE_ERR(mad_status);
+
+ switch (err) {
+ case CABLE_INF_INV_PORT:
+ return "invalid port selected";
+ case CABLE_INF_OP_NOSUP:
+ return "operation not supported for this port (the port is of type CX4 or internal)";
+ case CABLE_INF_NOT_CONN:
+ return "cable is not connected";
+ case CABLE_INF_NO_EEPRM:
+ return "the connected cable has no EPROM (passive copper cable)";
+ case CABLE_INF_PAGE_ERR:
+ return "page number is greater than 15";
+ case CABLE_INF_INV_ADDR:
+ return "invalid device_address or size (that is, size equals 0 or address+size is greater than 256)";
+ case CABLE_INF_I2C_ADDR:
+ return "invalid I2C slave address";
+ case CABLE_INF_QSFP_VIO:
+ return "at least one cable violates the QSFP specification and ignores the modsel signal";
+ case CABLE_INF_I2C_BUSY:
+ return "I2C bus is constantly busy";
+ }
+ return "Unknown Error";
+}
+
+/**
+ * mlx4_get_module_info - Read cable module eeprom data
+ * @dev: mlx4_dev.
+ * @port: port number.
+ * @offset: byte offset in eeprom to start reading data from.
+ * @size: num of bytes to read.
+ * @data: output buffer to put the requested data into.
+ *
+ * Reads cable module eeprom data, puts the outcome data into
+ * data pointer paramer.
+ * Returns num of read bytes on success or a negative error
+ * code.
+ */
+int mlx4_get_module_info(struct mlx4_dev *dev, u8 port,
+ u16 offset, u16 size, u8 *data)
+{
+ struct mlx4_cmd_mailbox *inbox, *outbox;
+ struct mlx4_mad_ifc *inmad, *outmad;
+ struct mlx4_cable_info *cable_info;
+ u16 i2c_addr;
+ int ret;
+
+ if (size > MODULE_INFO_MAX_READ)
+ size = MODULE_INFO_MAX_READ;
+
+ inbox = mlx4_alloc_cmd_mailbox(dev);
+ if (IS_ERR(inbox))
+ return PTR_ERR(inbox);
+
+ outbox = mlx4_alloc_cmd_mailbox(dev);
+ if (IS_ERR(outbox)) {
+ mlx4_free_cmd_mailbox(dev, inbox);
+ return PTR_ERR(outbox);
+ }
+
+ inmad = (struct mlx4_mad_ifc *)(inbox->buf);
+ outmad = (struct mlx4_mad_ifc *)(outbox->buf);
+
+ inmad->method = 0x1; /* Get */
+ inmad->class_version = 0x1;
+ inmad->mgmt_class = 0x1;
+ inmad->base_version = 0x1;
+ inmad->attr_id = cpu_to_be16(0xFF60); /* Module Info */
+
+ if (offset < I2C_PAGE_SIZE && offset + size > I2C_PAGE_SIZE)
+ /* Cross pages reads are not allowed
+ * read until offset 256 in low page
+ */
+ size -= offset + size - I2C_PAGE_SIZE;
+
+ i2c_addr = I2C_ADDR_LOW;
+ if (offset >= I2C_PAGE_SIZE) {
+ /* Reset offset to high page */
+ i2c_addr = I2C_ADDR_HIGH;
+ offset -= I2C_PAGE_SIZE;
+ }
+
+ cable_info = (struct mlx4_cable_info *)inmad->data;
+ cable_info->dev_mem_address = cpu_to_be16(offset);
+ cable_info->page_num = 0;
+ cable_info->i2c_addr = i2c_addr;
+ cable_info->size = cpu_to_be16(size);
+
+ ret = mlx4_cmd_box(dev, inbox->dma, outbox->dma, port, 3,
+ MLX4_CMD_MAD_IFC, MLX4_CMD_TIME_CLASS_C,
+ MLX4_CMD_NATIVE);
+ if (ret)
+ goto out;
+
+ if (be16_to_cpu(outmad->status)) {
+ /* Mad returned with bad status */
+ ret = be16_to_cpu(outmad->status);
+ mlx4_warn(dev,
+ "MLX4_CMD_MAD_IFC Get Module info attr(%x) port(%d) i2c_addr(%x) offset(%d) size(%d): Response Mad Status(%x) - %s\n",
+ 0xFF60, port, i2c_addr, offset, size,
+ ret, cable_info_mad_err_str(ret));
+
+ if (i2c_addr == I2C_ADDR_HIGH &&
+ MAD_STATUS_2_CABLE_ERR(ret) == CABLE_INF_I2C_ADDR)
+ /* Some SFP cables do not support i2c slave
+ * address 0x51 (high page), abort silently.
+ */
+ ret = 0;
+ else
+ ret = -ret;
+ goto out;
+ }
+ cable_info = (struct mlx4_cable_info *)outmad->data;
+ memcpy(data, cable_info->data, size);
+ ret = size;
+out:
+ mlx4_free_cmd_mailbox(dev, inbox);
+ mlx4_free_cmd_mailbox(dev, outbox);
+ return ret;
+}
+EXPORT_SYMBOL(mlx4_get_module_info);
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 37e4404..73910da 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -379,6 +379,13 @@ enum {
#define MSTR_SM_CHANGE_MASK (MLX4_EQ_PORT_INFO_MSTR_SM_SL_CHANGE_MASK | \
MLX4_EQ_PORT_INFO_MSTR_SM_LID_CHANGE_MASK)
+enum mlx4_module_id {
+ MLX4_MODULE_ID_SFP = 0x3,
+ MLX4_MODULE_ID_QSFP = 0xC,
+ MLX4_MODULE_ID_QSFP_PLUS = 0xD,
+ MLX4_MODULE_ID_QSFP28 = 0x11,
+};
+
static inline u64 mlx4_fw_ver(u64 major, u64 minor, u64 subminor)
{
return (major << 32) | (minor << 16) | subminor;
@@ -799,6 +806,26 @@ struct mlx4_init_port_param {
u64 si_guid;
};
+#define MAD_IFC_DATA_SZ 192
+/* MAD IFC Mailbox */
+struct mlx4_mad_ifc {
+ u8 base_version;
+ u8 mgmt_class;
+ u8 class_version;
+ u8 method;
+ __be16 status;
+ __be16 class_specific;
+ __be64 tid;
+ __be16 attr_id;
+ __be16 resv;
+ __be32 attr_mod;
+ __be64 mkey;
+ __be16 dr_slid;
+ __be16 dr_dlid;
+ u8 reserved[28];
+ u8 data[MAD_IFC_DATA_SZ];
+} __packed;
+
#define mlx4_foreach_port(port, dev, type) \
for ((port) = 1; (port) <= (dev)->caps.num_ports; (port)++) \
if ((type) == (dev)->caps.port_mask[(port)])
@@ -1283,6 +1310,9 @@ int mlx4_mr_rereg_mem_write(struct mlx4_dev *dev, struct mlx4_mr *mr,
u64 iova, u64 size, int npages,
int page_shift, struct mlx4_mpt_entry *mpt_entry);
+int mlx4_get_module_info(struct mlx4_dev *dev, u8 port,
+ u16 offset, u16 size, u8 *data);
+
/* Returns true if running in low memory profile (kdump kernel) */
static inline bool mlx4_low_memory_profile(void)
{
--
1.8.3.4
^ permalink raw reply related
* [PATCH net-next 00/13] Mellanox ethernet driver update Oct-27-2014
From: Amir Vadai @ 2014-10-27 9:37 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Amir Vadai
Hi,
This patchset introduces some small bug fixes, support in get/set of
vlan offload and get/set/capabilities of the link.
First 7 patches by Saeed, add support in setting/getting link speed and getting
cable capabilities.
Next 2 patches also by Saeed, enable the user to turn rx/tx vlan offloading on
and off.
Jenni fixed a bug in error flow during device initalization.
Ido and Jack fixed some code duplication and errors discovered by static checker.
last patch by me is a fix to make ethtool report the actual rings used by
indirection QP.
Patches were applied and tested against commit 61ed53d ("Merge tag 'ntb-3.18'
of git://github.com/jonmason/ntb")
Thanks,
Amir
Amir Vadai (1):
net/mlx4_en: Report actual number of rings in indirection table
Eugenia Emantayev (1):
net/mlx4_en: Move spinlocks and work initalizations to beginning of
init_netdev
Ido Shamay (1):
net/mlx4_en: Call napi_synchronize on stop_port
Jack Morgenstein (1):
net/mlx4_en: Cleanups suggested by clang static checker
Saeed Mahameed (9):
net/mlx4_core: Introduce mlx4_get_module_info for cable module info
reading
ethtool,net/mlx4_en: Cable info, get_module_info/eeprom ethtool
support
net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap
net/mlx4_core: Add ethernet backplane autoneg device capability
ethtool,net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting
support
net/mlx4_en: Use PTYS register to query ethtool settings
net/mlx4_en: Use PTYS register to set ethtool settings (Speed)
net/mlx4_en: Add support for setting rxvlan offload OFF/ON
net/mlx4_en: Add ethtool support for [rx|tx]vlan offload set to OFF/ON
drivers/net/ethernet/mellanox/mlx4/cmd.c | 9 +
drivers/net/ethernet/mellanox/mlx4/en_clock.c | 46 ---
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 503 ++++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx4/en_main.c | 12 +-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 127 +++++-
drivers/net/ethernet/mellanox/mlx4/en_port.c | 24 +-
drivers/net/ethernet/mellanox/mlx4/en_port.h | 35 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 -
drivers/net/ethernet/mellanox/mlx4/en_selftest.c | 12 +-
drivers/net/ethernet/mellanox/mlx4/fw.c | 123 +++++-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 18 +-
drivers/net/ethernet/mellanox/mlx4/port.c | 156 +++++++
include/linux/mlx4/cmd.h | 2 +
include/linux/mlx4/device.h | 71 +++-
include/uapi/linux/ethtool.h | 18 +-
15 files changed, 1044 insertions(+), 113 deletions(-)
--
1.8.3.4
^ permalink raw reply
* [PATCH net-next 06/13] net/mlx4_en: Use PTYS register to query ethtool settings
From: Amir Vadai @ 2014-10-27 9:37 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Amir Vadai, Saeed Mahameed
In-Reply-To: <1414402667-8841-1-git-send-email-amirv@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
- If dev cap MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL is ON, query PTYS register to fill ethtool settings.
else use default values.
- Use autoneg port cap and dev backplane autoneg cap to reprort autoneg interface capbilities.
- Fix typo in mlx4_en_port_state struct field (transciver to transceiver).
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 311 +++++++++++++++++++++++-
drivers/net/ethernet/mellanox/mlx4/en_port.c | 9 +-
drivers/net/ethernet/mellanox/mlx4/en_port.h | 26 +-
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 8 +-
4 files changed, 338 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 279f423..64b6743 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -375,7 +375,277 @@ static void mlx4_en_get_strings(struct net_device *dev,
}
}
-static int mlx4_en_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static u32 mlx4_en_autoneg_get(struct net_device *dev)
+{
+ struct mlx4_en_priv *priv = netdev_priv(dev);
+ struct mlx4_en_dev *mdev = priv->mdev;
+ u32 autoneg = AUTONEG_DISABLE;
+
+ if ((mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ETH_BACKPL_AN_REP) &&
+ (priv->port_state.flags & MLX4_EN_PORT_ANE))
+ autoneg = AUTONEG_ENABLE;
+
+ return autoneg;
+}
+
+static u32 ptys_get_supported_port(struct mlx4_ptys_reg *ptys_reg)
+{
+ u32 eth_proto = be32_to_cpu(ptys_reg->eth_proto_cap);
+
+ if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_T)
+ | MLX4_PROT_MASK(MLX4_1000BASE_T)
+ | MLX4_PROT_MASK(MLX4_100BASE_TX))) {
+ return SUPPORTED_TP;
+ }
+
+ if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_CR)
+ | MLX4_PROT_MASK(MLX4_10GBASE_SR)
+ | MLX4_PROT_MASK(MLX4_56GBASE_SR4)
+ | MLX4_PROT_MASK(MLX4_40GBASE_CR4)
+ | MLX4_PROT_MASK(MLX4_40GBASE_SR4)
+ | MLX4_PROT_MASK(MLX4_1000BASE_CX_SGMII))) {
+ return SUPPORTED_FIBRE;
+ }
+
+ if (eth_proto & (MLX4_PROT_MASK(MLX4_56GBASE_KR4)
+ | MLX4_PROT_MASK(MLX4_40GBASE_KR4)
+ | MLX4_PROT_MASK(MLX4_20GBASE_KR2)
+ | MLX4_PROT_MASK(MLX4_10GBASE_KR)
+ | MLX4_PROT_MASK(MLX4_10GBASE_KX4)
+ | MLX4_PROT_MASK(MLX4_1000BASE_KX))) {
+ return SUPPORTED_Backplane;
+ }
+ return 0;
+}
+
+static u32 ptys_get_active_port(struct mlx4_ptys_reg *ptys_reg)
+{
+ u32 eth_proto = be32_to_cpu(ptys_reg->eth_proto_oper);
+
+ if (!eth_proto) /* link down */
+ eth_proto = be32_to_cpu(ptys_reg->eth_proto_cap);
+
+ if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_T)
+ | MLX4_PROT_MASK(MLX4_1000BASE_T)
+ | MLX4_PROT_MASK(MLX4_100BASE_TX))) {
+ return PORT_TP;
+ }
+
+ if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_SR)
+ | MLX4_PROT_MASK(MLX4_56GBASE_SR4)
+ | MLX4_PROT_MASK(MLX4_40GBASE_SR4)
+ | MLX4_PROT_MASK(MLX4_1000BASE_CX_SGMII))) {
+ return PORT_FIBRE;
+ }
+
+ if (eth_proto & (MLX4_PROT_MASK(MLX4_10GBASE_CR)
+ | MLX4_PROT_MASK(MLX4_56GBASE_CR4)
+ | MLX4_PROT_MASK(MLX4_40GBASE_CR4))) {
+ return PORT_DA;
+ }
+
+ if (eth_proto & (MLX4_PROT_MASK(MLX4_56GBASE_KR4)
+ | MLX4_PROT_MASK(MLX4_40GBASE_KR4)
+ | MLX4_PROT_MASK(MLX4_20GBASE_KR2)
+ | MLX4_PROT_MASK(MLX4_10GBASE_KR)
+ | MLX4_PROT_MASK(MLX4_10GBASE_KX4)
+ | MLX4_PROT_MASK(MLX4_1000BASE_KX))) {
+ return PORT_NONE;
+ }
+ return PORT_OTHER;
+}
+
+#define MLX4_LINK_MODES_SZ \
+ (FIELD_SIZEOF(struct mlx4_ptys_reg, eth_proto_cap) * 8)
+
+enum ethtool_report {
+ SUPPORTED = 0,
+ ADVERTISED = 1,
+ SPEED = 2
+};
+
+/* Translates mlx4 link mode to equivalent ethtool Link modes/speed */
+static u32 ptys2ethtool_map[MLX4_LINK_MODES_SZ][3] = {
+ [MLX4_100BASE_TX] = {
+ SUPPORTED_100baseT_Full,
+ ADVERTISED_100baseT_Full,
+ SPEED_100
+ },
+
+ [MLX4_1000BASE_T] = {
+ SUPPORTED_1000baseT_Full,
+ ADVERTISED_1000baseT_Full,
+ SPEED_1000
+ },
+ [MLX4_1000BASE_CX_SGMII] = {
+ SUPPORTED_1000baseKX_Full,
+ ADVERTISED_1000baseKX_Full,
+ SPEED_1000
+ },
+ [MLX4_1000BASE_KX] = {
+ SUPPORTED_1000baseKX_Full,
+ ADVERTISED_1000baseKX_Full,
+ SPEED_1000
+ },
+
+ [MLX4_10GBASE_T] = {
+ SUPPORTED_10000baseT_Full,
+ ADVERTISED_10000baseT_Full,
+ SPEED_10000
+ },
+ [MLX4_10GBASE_CX4] = {
+ SUPPORTED_10000baseKX4_Full,
+ ADVERTISED_10000baseKX4_Full,
+ SPEED_10000
+ },
+ [MLX4_10GBASE_KX4] = {
+ SUPPORTED_10000baseKX4_Full,
+ ADVERTISED_10000baseKX4_Full,
+ SPEED_10000
+ },
+ [MLX4_10GBASE_KR] = {
+ SUPPORTED_10000baseKR_Full,
+ ADVERTISED_10000baseKR_Full,
+ SPEED_10000
+ },
+ [MLX4_10GBASE_CR] = {
+ SUPPORTED_10000baseKR_Full,
+ ADVERTISED_10000baseKR_Full,
+ SPEED_10000
+ },
+ [MLX4_10GBASE_SR] = {
+ SUPPORTED_10000baseKR_Full,
+ ADVERTISED_10000baseKR_Full,
+ SPEED_10000
+ },
+
+ [MLX4_20GBASE_KR2] = {
+ SUPPORTED_20000baseMLD2_Full | SUPPORTED_20000baseKR2_Full,
+ ADVERTISED_20000baseMLD2_Full | ADVERTISED_20000baseKR2_Full,
+ SPEED_20000
+ },
+
+ [MLX4_40GBASE_CR4] = {
+ SUPPORTED_40000baseCR4_Full,
+ ADVERTISED_40000baseCR4_Full,
+ SPEED_40000
+ },
+ [MLX4_40GBASE_KR4] = {
+ SUPPORTED_40000baseKR4_Full,
+ ADVERTISED_40000baseKR4_Full,
+ SPEED_40000
+ },
+ [MLX4_40GBASE_SR4] = {
+ SUPPORTED_40000baseSR4_Full,
+ ADVERTISED_40000baseSR4_Full,
+ SPEED_40000
+ },
+
+ [MLX4_56GBASE_KR4] = {
+ SUPPORTED_56000baseKR4_Full,
+ ADVERTISED_56000baseKR4_Full,
+ SPEED_56000
+ },
+ [MLX4_56GBASE_CR4] = {
+ SUPPORTED_56000baseCR4_Full,
+ ADVERTISED_56000baseCR4_Full,
+ SPEED_56000
+ },
+ [MLX4_56GBASE_SR4] = {
+ SUPPORTED_56000baseSR4_Full,
+ ADVERTISED_56000baseSR4_Full,
+ SPEED_56000
+ },
+};
+
+static u32 ptys2ethtool_link_modes(u32 eth_proto, enum ethtool_report report)
+{
+ int i;
+ u32 link_modes = 0;
+
+ for (i = 0; i < MLX4_LINK_MODES_SZ; i++) {
+ if (eth_proto & MLX4_PROT_MASK(i))
+ link_modes |= ptys2ethtool_map[i][report];
+ }
+ return link_modes;
+}
+
+static int ethtool_get_ptys_settings(struct net_device *dev,
+ struct ethtool_cmd *cmd)
+{
+ struct mlx4_en_priv *priv = netdev_priv(dev);
+ struct mlx4_ptys_reg ptys_reg;
+ u32 eth_proto;
+ int ret;
+
+ memset(&ptys_reg, 0, sizeof(ptys_reg));
+ ptys_reg.local_port = priv->port;
+ ptys_reg.proto_mask = MLX4_PTYS_EN;
+ ret = mlx4_ACCESS_PTYS_REG(priv->mdev->dev,
+ MLX4_ACCESS_REG_QUERY, &ptys_reg);
+ if (ret) {
+ en_warn(priv, "Failed to run mlx4_ACCESS_PTYS_REG status(%x)",
+ ret);
+ return ret;
+ }
+ en_dbg(DRV, priv, "ptys_reg.proto_mask %x\n",
+ ptys_reg.proto_mask);
+ en_dbg(DRV, priv, "ptys_reg.eth_proto_cap %x\n",
+ be32_to_cpu(ptys_reg.eth_proto_cap));
+ en_dbg(DRV, priv, "ptys_reg.eth_proto_admin %x\n",
+ be32_to_cpu(ptys_reg.eth_proto_admin));
+ en_dbg(DRV, priv, "ptys_reg.eth_proto_oper %x\n",
+ be32_to_cpu(ptys_reg.eth_proto_oper));
+ en_dbg(DRV, priv, "ptys_reg.eth_proto_lp_adv %x\n",
+ be32_to_cpu(ptys_reg.eth_proto_lp_adv));
+
+ cmd->supported = 0;
+ cmd->advertising = 0;
+
+ cmd->supported |= ptys_get_supported_port(&ptys_reg);
+
+ eth_proto = be32_to_cpu(ptys_reg.eth_proto_cap);
+ cmd->supported |= ptys2ethtool_link_modes(eth_proto, SUPPORTED);
+
+ eth_proto = be32_to_cpu(ptys_reg.eth_proto_admin);
+ cmd->advertising |= ptys2ethtool_link_modes(eth_proto, ADVERTISED);
+
+ cmd->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+ cmd->advertising |= (priv->prof->tx_pause) ? ADVERTISED_Pause : 0;
+
+ cmd->advertising |= (priv->prof->tx_pause ^ priv->prof->rx_pause) ?
+ ADVERTISED_Asym_Pause : 0;
+
+ cmd->port = ptys_get_active_port(&ptys_reg);
+ cmd->transceiver = (SUPPORTED_TP & cmd->supported) ?
+ XCVR_EXTERNAL : XCVR_INTERNAL;
+
+ if (mlx4_en_autoneg_get(dev)) {
+ cmd->supported |= SUPPORTED_Autoneg;
+ cmd->advertising |= ADVERTISED_Autoneg;
+ }
+
+ cmd->autoneg = (priv->port_state.flags & MLX4_EN_PORT_ANC) ?
+ AUTONEG_ENABLE : AUTONEG_DISABLE;
+
+ eth_proto = be32_to_cpu(ptys_reg.eth_proto_lp_adv);
+ cmd->lp_advertising = ptys2ethtool_link_modes(eth_proto, ADVERTISED);
+
+ cmd->lp_advertising |= (priv->port_state.flags & MLX4_EN_PORT_ANC) ?
+ ADVERTISED_Autoneg : 0;
+
+ cmd->phy_address = 0;
+ cmd->mdio_support = 0;
+ cmd->maxtxpkt = 0;
+ cmd->maxrxpkt = 0;
+ cmd->eth_tp_mdix = ETH_TP_MDI_INVALID;
+ cmd->eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO;
+
+ return ret;
+}
+
+static void ethtool_get_default_settings(struct net_device *dev,
+ struct ethtool_cmd *cmd)
{
struct mlx4_en_priv *priv = netdev_priv(dev);
int trans_type;
@@ -383,18 +653,7 @@ static int mlx4_en_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
cmd->autoneg = AUTONEG_DISABLE;
cmd->supported = SUPPORTED_10000baseT_Full;
cmd->advertising = ADVERTISED_10000baseT_Full;
-
- if (mlx4_en_QUERY_PORT(priv->mdev, priv->port))
- return -ENOMEM;
-
- trans_type = priv->port_state.transciver;
- if (netif_carrier_ok(dev)) {
- ethtool_cmd_speed_set(cmd, priv->port_state.link_speed);
- cmd->duplex = DUPLEX_FULL;
- } else {
- ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN);
- cmd->duplex = DUPLEX_UNKNOWN;
- }
+ trans_type = priv->port_state.transceiver;
if (trans_type > 0 && trans_type <= 0xC) {
cmd->port = PORT_FIBRE;
@@ -410,6 +669,32 @@ static int mlx4_en_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
cmd->port = -1;
cmd->transceiver = -1;
}
+}
+
+static int mlx4_en_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+ struct mlx4_en_priv *priv = netdev_priv(dev);
+ int ret = -EINVAL;
+
+ if (mlx4_en_QUERY_PORT(priv->mdev, priv->port))
+ return -ENOMEM;
+
+ en_dbg(DRV, priv, "query port state.flags ANC(%x) ANE(%x)\n",
+ priv->port_state.flags & MLX4_EN_PORT_ANC,
+ priv->port_state.flags & MLX4_EN_PORT_ANE);
+
+ if (priv->mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL)
+ ret = ethtool_get_ptys_settings(dev, cmd);
+ if (ret) /* ETH PROT CRTL is not supported or PTYS CMD failed */
+ ethtool_get_default_settings(dev, cmd);
+
+ if (netif_carrier_ok(dev)) {
+ ethtool_cmd_speed_set(cmd, priv->port_state.link_speed);
+ cmd->duplex = DUPLEX_FULL;
+ } else {
+ ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN);
+ cmd->duplex = DUPLEX_UNKNOWN;
+ }
return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index afd3036..134b12e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -114,7 +114,14 @@ int mlx4_en_QUERY_PORT(struct mlx4_en_dev *mdev, u8 port)
state->link_speed = -1;
break;
}
- state->transciver = qport_context->transceiver;
+
+ state->transceiver = qport_context->transceiver;
+
+ state->flags = 0; /* Reset and recalculate the port flags */
+ state->flags |= (qport_context->link_up & MLX4_EN_ANC_MASK) ?
+ MLX4_EN_PORT_ANC : 0;
+ state->flags |= (qport_context->autoneg & MLX4_EN_AUTONEG_MASK) ?
+ MLX4_EN_PORT_ANE : 0;
out:
mlx4_free_cmd_mailbox(mdev->dev, mailbox);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.h b/drivers/net/ethernet/mellanox/mlx4/en_port.h
index a5fc93b..040da4b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.h
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.h
@@ -53,6 +53,28 @@ enum {
MLX4_MCAST_ENABLE = 2,
};
+enum mlx4_link_mode {
+ MLX4_1000BASE_CX_SGMII = 0,
+ MLX4_1000BASE_KX = 1,
+ MLX4_10GBASE_CX4 = 2,
+ MLX4_10GBASE_KX4 = 3,
+ MLX4_10GBASE_KR = 4,
+ MLX4_20GBASE_KR2 = 5,
+ MLX4_40GBASE_CR4 = 6,
+ MLX4_40GBASE_KR4 = 7,
+ MLX4_56GBASE_KR4 = 8,
+ MLX4_10GBASE_CR = 12,
+ MLX4_10GBASE_SR = 13,
+ MLX4_40GBASE_SR4 = 15,
+ MLX4_56GBASE_CR4 = 17,
+ MLX4_56GBASE_SR4 = 18,
+ MLX4_100BASE_TX = 24,
+ MLX4_1000BASE_T = 25,
+ MLX4_10GBASE_T = 26,
+};
+
+#define MLX4_PROT_MASK(link_mode) (1<<link_mode)
+
enum {
MLX4_EN_100M_SPEED = 0x04,
MLX4_EN_10G_SPEED_XAUI = 0x00,
@@ -67,7 +89,9 @@ enum {
struct mlx4_en_query_port_context {
u8 link_up;
#define MLX4_EN_LINK_UP_MASK 0x80
- u8 reserved;
+#define MLX4_EN_ANC_MASK 0x40
+ u8 autoneg;
+#define MLX4_EN_AUTONEG_MASK 0x80
__be16 mtu;
u8 reserved2;
u8 link_speed;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 8fef658..946754e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -421,10 +421,16 @@ struct mlx4_en_rss_map {
enum mlx4_qp_state indir_state;
};
+enum mlx4_en_port_flag {
+ MLX4_EN_PORT_ANC = 1<<0, /* Auto-negotiation complete */
+ MLX4_EN_PORT_ANE = 1<<1, /* Auto-negotiation enabled */
+};
+
struct mlx4_en_port_state {
int link_state;
int link_speed;
- int transciver;
+ int transceiver;
+ u32 flags;
};
struct mlx4_en_pkt_stats {
--
1.8.3.4
^ permalink raw reply related
* [PATCH net-next 04/13] net/mlx4_core: Add ethernet backplane autoneg device capability
From: Amir Vadai @ 2014-10-27 9:37 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Amir Vadai, Saeed Mahameed
In-Reply-To: <1414402667-8841-1-git-send-email-amirv@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/fw.c | 7 ++++++-
include/linux/mlx4/device.h | 3 ++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 6fd9b85..72289ef 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -140,7 +140,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
[11] = "MAD DEMUX (Secure-Host) support",
[12] = "Large cache line (>64B) CQE stride support",
[13] = "Large cache line (>64B) EQE stride support",
- [14] = "Ethernet protocol control support"
+ [14] = "Ethernet protocol control support",
+ [15] = "Ethernet Backplane autoneg support"
};
int i;
@@ -575,6 +576,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
#define QUERY_DEV_CAP_BMME_FLAGS_OFFSET 0x94
#define QUERY_DEV_CAP_RSVD_LKEY_OFFSET 0x98
#define QUERY_DEV_CAP_MAX_ICM_SZ_OFFSET 0xa0
+#define QUERY_DEV_CAP_ETH_BACKPL_OFFSET 0x9c
#define QUERY_DEV_CAP_FW_REASSIGN_MAC 0x9d
#define QUERY_DEV_CAP_VXLAN 0x9e
#define QUERY_DEV_CAP_MAD_DEMUX_OFFSET 0xb0
@@ -749,6 +751,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
MLX4_GET(dev_cap->reserved_lkey, outbox,
QUERY_DEV_CAP_RSVD_LKEY_OFFSET);
+ MLX4_GET(field32, outbox, QUERY_DEV_CAP_ETH_BACKPL_OFFSET);
+ if (field32 & (1 << 0))
+ dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_ETH_BACKPL_AN_REP;
MLX4_GET(field, outbox, QUERY_DEV_CAP_FW_REASSIGN_MAC);
if (field & 1<<6)
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 181cd9f..e4c136e 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -187,7 +187,8 @@ enum {
MLX4_DEV_CAP_FLAG2_MAD_DEMUX = 1LL << 11,
MLX4_DEV_CAP_FLAG2_CQE_STRIDE = 1LL << 12,
MLX4_DEV_CAP_FLAG2_EQE_STRIDE = 1LL << 13,
- MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL = 1LL << 14
+ MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL = 1LL << 14,
+ MLX4_DEV_CAP_FLAG2_ETH_BACKPL_AN_REP = 1LL << 15
};
enum {
--
1.8.3.4
^ permalink raw reply related
* [PATCH net-next 03/13] net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap
From: Amir Vadai @ 2014-10-27 9:37 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Yevgeny Petrilin, Or Gerlitz, Amir Vadai, Saeed Mahameed
In-Reply-To: <1414402667-8841-1-git-send-email-amirv@mellanox.com>
From: Saeed Mahameed <saeedm@mellanox.com>
Adding ACCESS REG mlx4 command and use it to implement Query method for
PTYS (Port Type and Speed Register).
Query and store eth_prot_ctrl dev cap.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/cmd.c | 9 +++
drivers/net/ethernet/mellanox/mlx4/fw.c | 118 ++++++++++++++++++++++++++++++-
include/linux/mlx4/cmd.h | 2 +
include/linux/mlx4/device.h | 40 ++++++++++-
4 files changed, 166 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index b16e1b9..916459e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1338,6 +1338,15 @@ static struct mlx4_cmd_info cmd_info[] = {
.verify = NULL,
.wrapper = mlx4_QUERY_IF_STAT_wrapper
},
+ {
+ .opcode = MLX4_CMD_ACCESS_REG,
+ .has_inbox = true,
+ .has_outbox = true,
+ .out_is_imm = false,
+ .encode_slave_id = false,
+ .verify = NULL,
+ .wrapper = NULL,
+ },
/* Native multicast commands are not available for guests */
{
.opcode = MLX4_CMD_QP_ATTACH,
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 2e88a23..6fd9b85 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -139,7 +139,8 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 flags)
[10] = "TCP/IP offloads/flow-steering for VXLAN support",
[11] = "MAD DEMUX (Secure-Host) support",
[12] = "Large cache line (>64B) CQE stride support",
- [13] = "Large cache line (>64B) EQE stride support"
+ [13] = "Large cache line (>64B) EQE stride support",
+ [14] = "Ethernet protocol control support"
};
int i;
@@ -560,6 +561,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
#define QUERY_DEV_CAP_FLOW_STEERING_RANGE_EN_OFFSET 0x76
#define QUERY_DEV_CAP_FLOW_STEERING_MAX_QP_OFFSET 0x77
#define QUERY_DEV_CAP_CQ_EQ_CACHE_LINE_STRIDE 0x7a
+#define QUERY_DEV_CAP_ETH_PROT_CTRL_OFFSET 0x7a
#define QUERY_DEV_CAP_RDMARC_ENTRY_SZ_OFFSET 0x80
#define QUERY_DEV_CAP_QPC_ENTRY_SZ_OFFSET 0x82
#define QUERY_DEV_CAP_AUX_ENTRY_SZ_OFFSET 0x84
@@ -737,11 +739,12 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
MLX4_GET(size, outbox, QUERY_DEV_CAP_MAX_DESC_SZ_RQ_OFFSET);
dev_cap->max_rq_desc_sz = size;
MLX4_GET(field, outbox, QUERY_DEV_CAP_CQ_EQ_CACHE_LINE_STRIDE);
+ if (field & (1 << 5))
+ dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL;
if (field & (1 << 6))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_CQE_STRIDE;
if (field & (1 << 7))
dev_cap->flags2 |= MLX4_DEV_CAP_FLAG2_EQE_STRIDE;
-
MLX4_GET(dev_cap->bmme_flags, outbox,
QUERY_DEV_CAP_BMME_FLAGS_OFFSET);
MLX4_GET(dev_cap->reserved_lkey, outbox,
@@ -2144,3 +2147,114 @@ out:
mlx4_free_cmd_mailbox(dev, mailbox);
return err;
}
+
+/* Access Reg commands */
+enum mlx4_access_reg_masks {
+ MLX4_ACCESS_REG_STATUS_MASK = 0x7f,
+ MLX4_ACCESS_REG_METHOD_MASK = 0x7f,
+ MLX4_ACCESS_REG_LEN_MASK = 0x7ff
+};
+
+struct mlx4_access_reg {
+ __be16 constant1;
+ u8 status;
+ u8 resrvd1;
+ __be16 reg_id;
+ u8 method;
+ u8 constant2;
+ __be32 resrvd2[2];
+ __be16 len_const;
+ __be16 resrvd3;
+#define MLX4_ACCESS_REG_HEADER_SIZE (20)
+ u8 reg_data[MLX4_MAILBOX_SIZE-MLX4_ACCESS_REG_HEADER_SIZE];
+} __attribute__((__packed__));
+
+/**
+ * mlx4_ACCESS_REG - Generic access reg command.
+ * @dev: mlx4_dev.
+ * @reg_id: register ID to access.
+ * @method: Access method Read/Write.
+ * @reg_len: register length to Read/Write in bytes.
+ * @reg_data: reg_data pointer to Read/Write From/To.
+ *
+ * Access ConnectX registers FW command.
+ * Returns 0 on success and copies outbox mlx4_access_reg data
+ * field into reg_data or a negative error code.
+ */
+static int mlx4_ACCESS_REG(struct mlx4_dev *dev, u16 reg_id,
+ enum mlx4_access_reg_method method,
+ u16 reg_len, void *reg_data)
+{
+ struct mlx4_cmd_mailbox *inbox, *outbox;
+ struct mlx4_access_reg *inbuf, *outbuf;
+ int err;
+
+ inbox = mlx4_alloc_cmd_mailbox(dev);
+ if (IS_ERR(inbox))
+ return PTR_ERR(inbox);
+
+ outbox = mlx4_alloc_cmd_mailbox(dev);
+ if (IS_ERR(outbox)) {
+ mlx4_free_cmd_mailbox(dev, inbox);
+ return PTR_ERR(outbox);
+ }
+
+ inbuf = inbox->buf;
+ outbuf = outbox->buf;
+
+ inbuf->constant1 = cpu_to_be16(0x1<<11 | 0x4);
+ inbuf->constant2 = 0x1;
+ inbuf->reg_id = cpu_to_be16(reg_id);
+ inbuf->method = method & MLX4_ACCESS_REG_METHOD_MASK;
+
+ reg_len = min(reg_len, (u16)(sizeof(inbuf->reg_data)));
+ inbuf->len_const =
+ cpu_to_be16(((reg_len/4 + 1) & MLX4_ACCESS_REG_LEN_MASK) |
+ ((0x3) << 12));
+
+ memcpy(inbuf->reg_data, reg_data, reg_len);
+ err = mlx4_cmd_box(dev, inbox->dma, outbox->dma, 0, 0,
+ MLX4_CMD_ACCESS_REG, MLX4_CMD_TIME_CLASS_C,
+ MLX4_CMD_NATIVE);
+ if (err)
+ goto out;
+
+ if (outbuf->status & MLX4_ACCESS_REG_STATUS_MASK) {
+ err = outbuf->status & MLX4_ACCESS_REG_STATUS_MASK;
+ mlx4_err(dev,
+ "MLX4_CMD_ACCESS_REG(%x) returned REG status (%x)\n",
+ reg_id, err);
+ goto out;
+ }
+
+ memcpy(reg_data, outbuf->reg_data, reg_len);
+out:
+ mlx4_free_cmd_mailbox(dev, inbox);
+ mlx4_free_cmd_mailbox(dev, outbox);
+ return err;
+}
+
+/* ConnectX registers IDs */
+enum mlx4_reg_id {
+ MLX4_REG_ID_PTYS = 0x5004,
+};
+
+/**
+ * mlx4_ACCESS_PTYS_REG - Access PTYs (Port Type and Speed)
+ * register
+ * @dev: mlx4_dev.
+ * @method: Access method Read/Write.
+ * @ptys_reg: PTYS register data pointer.
+ *
+ * Access ConnectX PTYS register, to Read/Write Port Type/Speed
+ * configuration
+ * Returns 0 on success or a negative error code.
+ */
+int mlx4_ACCESS_PTYS_REG(struct mlx4_dev *dev,
+ enum mlx4_access_reg_method method,
+ struct mlx4_ptys_reg *ptys_reg)
+{
+ return mlx4_ACCESS_REG(dev, MLX4_REG_ID_PTYS,
+ method, sizeof(*ptys_reg), ptys_reg);
+}
+EXPORT_SYMBOL_GPL(mlx4_ACCESS_PTYS_REG);
diff --git a/include/linux/mlx4/cmd.h b/include/linux/mlx4/cmd.h
index 379c026..ff5f5de 100644
--- a/include/linux/mlx4/cmd.h
+++ b/include/linux/mlx4/cmd.h
@@ -67,6 +67,8 @@ enum {
MLX4_CMD_MAP_ICM_AUX = 0xffc,
MLX4_CMD_UNMAP_ICM_AUX = 0xffb,
MLX4_CMD_SET_ICM_SIZE = 0xffd,
+ MLX4_CMD_ACCESS_REG = 0x3b,
+
/*master notify fw on finish for slave's flr*/
MLX4_CMD_INFORM_FLR_DONE = 0x5b,
MLX4_CMD_GET_OP_REQ = 0x59,
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 73910da..181cd9f 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -186,7 +186,8 @@ enum {
MLX4_DEV_CAP_FLAG2_VXLAN_OFFLOADS = 1LL << 10,
MLX4_DEV_CAP_FLAG2_MAD_DEMUX = 1LL << 11,
MLX4_DEV_CAP_FLAG2_CQE_STRIDE = 1LL << 12,
- MLX4_DEV_CAP_FLAG2_EQE_STRIDE = 1LL << 13
+ MLX4_DEV_CAP_FLAG2_EQE_STRIDE = 1LL << 13,
+ MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL = 1LL << 14
};
enum {
@@ -1319,4 +1320,41 @@ static inline bool mlx4_low_memory_profile(void)
return is_kdump_kernel();
}
+/* ACCESS REG commands */
+enum mlx4_access_reg_method {
+ MLX4_ACCESS_REG_QUERY = 0x1,
+ MLX4_ACCESS_REG_WRITE = 0x2,
+};
+
+/* ACCESS PTYS Reg command */
+enum mlx4_ptys_proto {
+ MLX4_PTYS_IB = 1<<0,
+ MLX4_PTYS_EN = 1<<2,
+};
+
+struct mlx4_ptys_reg {
+ u8 resrvd1;
+ u8 local_port;
+ u8 resrvd2;
+ u8 proto_mask;
+ __be32 resrvd3[2];
+ __be32 eth_proto_cap;
+ __be16 ib_width_cap;
+ __be16 ib_speed_cap;
+ __be32 resrvd4;
+ __be32 eth_proto_admin;
+ __be16 ib_width_admin;
+ __be16 ib_speed_admin;
+ __be32 resrvd5;
+ __be32 eth_proto_oper;
+ __be16 ib_width_oper;
+ __be16 ib_speed_oper;
+ __be32 resrvd6;
+ __be32 eth_proto_lp_adv;
+} __packed;
+
+int mlx4_ACCESS_PTYS_REG(struct mlx4_dev *dev,
+ enum mlx4_access_reg_method method,
+ struct mlx4_ptys_reg *ptys_reg);
+
#endif /* MLX4_DEVICE_H */
--
1.8.3.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox