* Re: [PATCH 01/10] core: Split out UFO6 support
From: Vlad Yasevich @ 2014-12-19 20:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, Vladislav Yasevich, virtualization, stefanha, ben,
David Miller
In-Reply-To: <20141218175002.GA5539@redhat.com>
On 12/18/2014 12:50 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 18, 2014 at 07:35:46PM +0200, Michael S. Tsirkin wrote:
>>>> We should either generate our own ID,
>>>> like we always did, or make sure we don't accept
>>>> these packets.
>>>> Second option is almost sure to break userspace,
>>>> so it seems we must do the first one.
>>>>
>>>
>>> Right. This was missing from packet sockets. I can fix it.
>>>
>>> -vlad
>>
>> Also, this can't be a patch on top, since we don't
>> want bisect to give us configurations which
>> can BUG().
>
> So how doing this in stages:
>
> 1. add helper that checks skb GSO type.
> If it is SKB_GSO_UDP, check for IPv6, and
> generate the fragment ID.
>
> Call this helper in
> - virtio net rx path
Why do we need id on rx path? Fragment ID should only be generated on tx.
> - tun rx path (get user)
> - macvtap tx patch (get user)
> - packet socket tx patch (get user)
The reset makes sense.
>
> 2. Put back UFO flag everywhere: virtio,tun,macvtap,packet,
> since we can handle IPv6 now even if it's suboptimal.
>
> Above two seem like smalling patches, appropriate for stable.
OK.
>
> Next, work on a new feature to pass
> fragment ID in virtio header:
>
> 3. split UFO/UFO6 as you did here, but add code
> in above 4 places to convert UDP to UDP6,
Doing so will adversely impact IPv6 UFO performance for legacy
guests. I know how many times I've seen mail wrt patch xyz caused
%X regression in my setup and how we've reverted or tried to fixed
things to solve this. If we go with approach, the only "fix' would be
to upgrade the guest and that's not available to some users.
-vlad
> additionally, add code in
> - virtio net tx path
> - tun tx path (get user)
> - macvtap rx patch (put user)
> - packet socket rx patch (put user)
> to convert UDP6 to UDP.
>
> step 3 needs to be bisect-clean, somehow.
>
> 4. add new field in header, new feature bit for virtio net to gate it,
> new ioctls to tun,macvtap,packet socket.
>
> These two are more like optimizations, so not stable material.
>
>
^ permalink raw reply
* Re: [PATCH 01/10] core: Split out UFO6 support
From: Vlad Yasevich @ 2014-12-19 19:55 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, Vladislav Yasevich, virtualization, stefanha, ben,
David Miller
In-Reply-To: <20141218173546.GB5425@redhat.com>
On 12/18/2014 12:35 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 18, 2014 at 10:01:35AM -0500, Vlad Yasevich wrote:
>> On 12/18/2014 02:54 AM, Michael S. Tsirkin wrote:
>>> Cc Dave, pls remember to do this next time otherwise
>>> your patches won't get merged :)
>>>
>>> On Wed, Dec 17, 2014 at 06:31:50PM -0500, Vlad Yasevich wrote:
>>>> On 12/17/2014 05:45 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Dec 17, 2014 at 01:20:46PM -0500, Vladislav Yasevich wrote:
>>>>>> Split IPv6 support for UFO into its own feature similiar to TSO.
>>>>>> This will later allow us to re-enable UFO support for virtio-net
>>>>>> devices.
>>>>>>
>>>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>>>> ---
>>>>>> include/linux/netdev_features.h | 7 +++++--
>>>>>> include/linux/netdevice.h | 1 +
>>>>>> include/linux/skbuff.h | 1 +
>>>>>> net/core/dev.c | 35 +++++++++++++++++++----------------
>>>>>> net/core/ethtool.c | 2 +-
>>>>>> 5 files changed, 27 insertions(+), 19 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
>>>>>> index dcfdecb..a078945 100644
>>>>>> --- a/include/linux/netdev_features.h
>>>>>> +++ b/include/linux/netdev_features.h
>>>>>> @@ -48,8 +48,9 @@ enum {
>>>>>> NETIF_F_GSO_UDP_TUNNEL_BIT, /* ... UDP TUNNEL with TSO */
>>>>>> NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
>>>>>> NETIF_F_GSO_MPLS_BIT, /* ... MPLS segmentation */
>>>>>> + NETIF_F_UFO6_BIT, /* ... UDPv6 fragmentation */
>>>>>> /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */
>>>>>> - NETIF_F_GSO_MPLS_BIT,
>>>>>> + NETIF_F_UFO6_BIT,
>>>>>>
>>>>>> NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */
>>>>>> NETIF_F_SCTP_CSUM_BIT, /* SCTP checksum offload */
>>>>>> @@ -109,6 +110,7 @@ enum {
>>>>>> #define NETIF_F_TSO_ECN __NETIF_F(TSO_ECN)
>>>>>> #define NETIF_F_TSO __NETIF_F(TSO)
>>>>>> #define NETIF_F_UFO __NETIF_F(UFO)
>>>>>> +#define NETIF_F_UFO6 __NETIF_F(UFO6)
>>>>>> #define NETIF_F_VLAN_CHALLENGED __NETIF_F(VLAN_CHALLENGED)
>>>>>> #define NETIF_F_RXFCS __NETIF_F(RXFCS)
>>>>>> #define NETIF_F_RXALL __NETIF_F(RXALL)
>>>>>> @@ -141,7 +143,7 @@ enum {
>>>>>>
>>>>>> /* List of features with software fallbacks. */
>>>>>> #define NETIF_F_GSO_SOFTWARE (NETIF_F_TSO | NETIF_F_TSO_ECN | \
>>>>>> - NETIF_F_TSO6 | NETIF_F_UFO)
>>>>>> + NETIF_F_TSO6 | NETIF_F_UFO | NETIF_F_UFO6)
>>>>>>
>>>>>> #define NETIF_F_GEN_CSUM NETIF_F_HW_CSUM
>>>>>> #define NETIF_F_V4_CSUM (NETIF_F_GEN_CSUM | NETIF_F_IP_CSUM)
>>>>>> @@ -149,6 +151,7 @@ enum {
>>>>>> #define NETIF_F_ALL_CSUM (NETIF_F_V4_CSUM | NETIF_F_V6_CSUM)
>>>>>>
>>>>>> #define NETIF_F_ALL_TSO (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
>>>>>> +#define NETIF_F_ALL_UFO (NETIF_F_UFO | NETIF_F_UFO6)
>>>>>>
>>>>>> #define NETIF_F_ALL_FCOE (NETIF_F_FCOE_CRC | NETIF_F_FCOE_MTU | \
>>>>>> NETIF_F_FSO)
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index 74fd5d3..86af10a 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -3559,6 +3559,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>>>>>> /* check flags correspondence */
>>>>>> BUILD_BUG_ON(SKB_GSO_TCPV4 != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>>>>>> BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_UFO >> NETIF_F_GSO_SHIFT));
>>>>>> + BUILD_BUG_ON(SKB_GSO_UDP6 != (NETIF_F_UFO6 >> NETIF_F_GSO_SHIFT));
>>>>>> BUILD_BUG_ON(SKB_GSO_DODGY != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>>>>>> BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>>>>>> BUILD_BUG_ON(SKB_GSO_TCPV6 != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>>> index 6c8b6f6..8538b67 100644
>>>>>> --- a/include/linux/skbuff.h
>>>>>> +++ b/include/linux/skbuff.h
>>>>>> @@ -372,6 +372,7 @@ enum {
>>>>>>
>>>>>> SKB_GSO_MPLS = 1 << 12,
>>>>>>
>>>>>> + SKB_GSO_UDP6 = 1 << 13
>>>>>> };
>>>>>>
>>>>>> #if BITS_PER_LONG > 32
>>>>>
>>>>> So this implies anything getting GSO packets e.g.
>>>>> from userspace now needs to check IP version to
>>>>> set GSO type correctly.
>>>>>
>>>>> I think you missed some places that do this, e.g. af_packet
>>>>> sockets.
>>>>>
>>>>
>>>> I looked at af_packet sockets and they set this only in the event
>>>> vnet header has been used with a GSO type. In this case, the user
>>>> already knows the the type.
>>>
>>> Imagine you are receiving a packet:
>>>
>>> if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
>>> switch (vnet_hdr.gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
>>> case VIRTIO_NET_HDR_GSO_TCPV4:
>>> gso_type = SKB_GSO_TCPV4;
>>> break;
>>> case VIRTIO_NET_HDR_GSO_TCPV6:
>>> gso_type = SKB_GSO_TCPV6;
>>> break;
>>> case VIRTIO_NET_HDR_GSO_UDP:
>>> gso_type = SKB_GSO_UDP;
>>> break;
>>> default:
>>> goto out_unlock;
>>> }
>>>
>>> if (vnet_hdr.gso_type & VIRTIO_NET_HDR_GSO_ECN)
>>> gso_type |= SKB_GSO_TCP_ECN;
>>>
>>> if (vnet_hdr.gso_size == 0)
>>> goto out_unlock;
>>>
>>> }
>>>
>>> we used to report UFO6 as SKB_GSO_UDP, we probably
>>> should keep doing this, with your patch we select the
>>> goto out_unlock path.
>>>
>>>
>>
>> No. The vnet_hdr.gso_type is still going to be VIRTIO_NET_HDR_GSO_UDP
>> since the UDP6 version isn't defined yet. So, it will be marked as
>> GSO_UDP.
>
> I pasted the wrong snippet.
> I meant this:
>
> /* This is a hint as to how much should be * linear. */
> vnet_hdr.hdr_len = __cpu_to_virtio16(false, skb_headlen(skb));
> vnet_hdr.gso_size = __cpu_to_virtio16(false, sinfo->gso_size);
> if (sinfo->gso_type & SKB_GSO_TCPV4)
> vnet_hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> else if (sinfo->gso_type & SKB_GSO_TCPV6)
> vnet_hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> else if (sinfo->gso_type & SKB_GSO_UDP)
> vnet_hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
> else if (sinfo->gso_type & SKB_GSO_FCOE)
> goto out_free;
> else
> BUG();
>
> so if we get SKB_GSO_UDP we'll get BUG().
>
>
>
>> This code most likely needs the same workaround as exists in tap and macvtap,
>> i.e select the proxy fragment id and decide what to do with gso_type.
>
> And fixup type to GSO_UDP6 while we are at it.
>
>>>
>>>> It is true that with this series af_packets now can't do IPv6 UFO
>>>> since there is no VIRTIO_NET_HDR_GSO_UDPV6 yet.
>>>
>>> What do you mean by "do".
>>
>> What I mean is that AF_PACKET sockets currently do not do IPv6 UFO
>> correctly, even after Ben's fixes to tap/macvtap. There is no
>> proxy fragment id selection in af_packet case and we have the
>> same problem Ben was trying address for tap/macvtap.
>>> Are we talking about sending or receiving packets?
>>
>> I am talking about sending, see above.
>>
>>> You seem to conflate the two.
>>>
>>> We always discarded ID on RX.
>>>
>>> For tun, this is xmit, so just by saying "this device can
>>> not do UFO" you will never get short packets.
>>>
>>
>> You must mean long packets.
>
> Yes.
>
>> This is actually an issue I've been thinking
>> about. With with your suggestion of switching the GSO type for legacy
>> applications we end up with fragments for IPv6 traffic. As a result,
>> legacy VMs will see a regression for large IPv6 datagrams.
>
> I'm not sure what's meant by my suggestion here :)
What I am referring to here is to change the gso_type to UDPV6 in tap/macvtap
when passing packet to the host.
> It seems clear that legacy applications don't want to get IPv6
> fragment IDs in virtio header. Either we pass them plain ethernet
> packets or assume they are ok with discarding the IDs even
> if we set GSO_UDP.
So, I am not try to pass fragment IDs yet. I am trying to make sure that
older gueest that assume that UFO == UFO4 + UFO6 continue to work and do not
regress. At the same time, I want to enable UFO(4) for new guests.
If we set gso_type to SKB_GSO_UDPV6 before passing the data to the forwarding
path of the host, then this will cause IPv6 fragmentation. If we leave gso_type
as SKB_GSO_UDP, then older VMs will continue to communicate using large UDP
data packets.
Later when VMs support UFO6, when UFO6 is enabled, the fragment id can be communicated.
But that's later.
>
>>>
>>>>
>>>> I suppose we could do something similar there as we do in tun code/macvtap code.
>>>> If that's the case, it currently broken as well.
>>>>
>>>> -vlad
>>>
>>>
>>> Broken is a big word.
>>>
>>> Let's stop conflating two directions.
>>
>> I am not and was talking only about af_packet as that is what you asked about.
>> There is no tun/macvtap in play here. They are handled separately in their
>> respective drivers.
>>
>>>
>>> Here's the way I look at it:
>>>
>>> 1. Userspace doesn't have a way to get fragment IDs
>>> from tun/macvtap/packet sockets.
>>> Presumably, not all users need these IDs.
>>> E.g. old guests clearly don't.
>>>
>>> We should either give them packets stripping the ID,
>>> like we always did, or make sure they never get these packets.
>>> Second option seems achievable for tun if we
>>> never tell linux it can do UFO, but I don't see
>>> how to do it for macvtap and packet socket.
>>>
>>
>> macvtap is slightly problematic, but doable with some tricks.
>> packet socket is an interesting problem. The only way
>> an af_packet socket can receive an skb marked SKB_GSO_UDPV6
>> is if someone else on the host sent it (like another guest).
>
> Or if a NIC set it.
OK, virtio seems to be the only nic doing this... You are right, I
need to cover that scenario.
-vlad
>
>> Since there is are no feature or vnet header length negotiations,
>> it is impossible to tell if an application using this af_packet
>> socket is capable of processing VIRTIO_NET_HDR_GSO_UDPV6
>> type (yet to be defined).
>>
>> So, we can either use existing VIRTIO_NET_HDR_GSO_UDP on receive
>> path, add some kind of negotiation logic to packet socket (like
>> storing the application expected vnet header size), or perform
>> IPv6 fragmentation somehow.
>>
>> Options 1 or 2 are doable.
>
> 1 is using VIRTIO_NET_HDR_GSO_UDP and discarding ID,
> 2 is "some kind of negotiation logic"?
> 2 can't be enough, you will need 1 as well.
>
> So let's start with 1 as a first step.
>
>
>
>
>>>
>>> 2. Userspace doesn't have a way to set fragment IDs
>>> for tun/macvtap/packet sockets.
>>> Presumably, not all users have these IDs. E.g. old
>>> guests clearly don't.
>>>
>>> We should either generate our own ID,
>>> like we always did, or make sure we don't accept
>>> these packets.
>>> Second option is almost sure to break userspace,
>>> so it seems we must do the first one.
>>>
>>
>> Right. This was missing from packet sockets. I can fix it.
>>
>> -vlad
>
> Also, this can't be a patch on top, since we don't
> want bisect to give us configurations which
> can BUG().
>
>
>>>
>>> 3.
>>> Exactly the same two things if you replace userspace
>>> with hypervisor and tun/macvtap/packet socket with
>>> virtio device.
>>>
>>>
>>> 4.
>>> As a next step, we should add a way for userspace
>>> to tell us that it has ids and can handle them.
>>>
>>>
>>>
>>>
>>>
>>>>>
>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>> index 945bbd0..fa4d2ee 100644
>>>>>> --- a/net/core/dev.c
>>>>>> +++ b/net/core/dev.c
>>>>>> @@ -5929,6 +5929,12 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>>>> features &= ~NETIF_F_ALL_TSO;
>>>>>> }
>>>>>>
>>>>>> + /* UFO requires that SG is present as well */
>>>>>> + if ((features & NETIF_F_ALL_UFO) && !(features & NETIF_F_SG)) {
>>>>>> + netdev_dbg(dev, "Dropping UFO features since no SG feature.\n");
>>>>>> + features &= ~NETIF_F_ALL_UFO;
>>>>>> + }
>>>>>> +
>>>>>> if ((features & NETIF_F_TSO) && !(features & NETIF_F_HW_CSUM) &&
>>>>>> !(features & NETIF_F_IP_CSUM)) {
>>>>>> netdev_dbg(dev, "Dropping TSO features since no CSUM feature.\n");
>>>>>> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>>>> features &= ~NETIF_F_GSO;
>>>>>> }
>>>>>>
>>>>>> - /* UFO needs SG and checksumming */
>>>>>> - if (features & NETIF_F_UFO) {
>>>>>> - /* maybe split UFO into V4 and V6? */
>>>>>> - if (!((features & NETIF_F_GEN_CSUM) ||
>>>>>> - (features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
>>>>>> - == (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
>>>>>> - netdev_dbg(dev,
>>>>>> - "Dropping NETIF_F_UFO since no checksum offload features.\n");
>>>>>> - features &= ~NETIF_F_UFO;
>>>>>> - }
>>>>>> -
>>>>>> - if (!(features & NETIF_F_SG)) {
>>>>>> - netdev_dbg(dev,
>>>>>> - "Dropping NETIF_F_UFO since no NETIF_F_SG feature.\n");
>>>>>> - features &= ~NETIF_F_UFO;
>>>>>> - }
>>>>>> + /* UFO also needs checksumming */
>>>>>> + if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
>>>>>> + !(features & NETIF_F_IP_CSUM)) {
>>>>>> + netdev_dbg(dev,
>>>>>> + "Dropping NETIF_F_UFO since no checksum offload features.\n");
>>>>>> + features &= ~NETIF_F_UFO;
>>>>>> + }
>>>>>> + if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
>>>>>> + !(features & NETIF_F_IPV6_CSUM)) {
>>>>>> + netdev_dbg(dev,
>>>>>> + "Dropping NETIF_F_UFO6 since no checksum offload features.\n");
>>>>>> + features &= ~NETIF_F_UFO6;
>>>>>> }
>>>>>>
>>>>>> +
>>>>>> #ifdef CONFIG_NET_RX_BUSY_POLL
>>>>>> if (dev->netdev_ops->ndo_busy_poll)
>>>>>> features |= NETIF_F_BUSY_POLL;
>>>>>> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
>>>>>> index 06dfb29..93eff41 100644
>>>>>> --- a/net/core/ethtool.c
>>>>>> +++ b/net/core/ethtool.c
>>>>>> @@ -223,7 +223,7 @@ static netdev_features_t ethtool_get_feature_mask(u32 eth_cmd)
>>>>>> return NETIF_F_ALL_TSO;
>>>>>> case ETHTOOL_GUFO:
>>>>>> case ETHTOOL_SUFO:
>>>>>> - return NETIF_F_UFO;
>>>>>> + return NETIF_F_ALL_UFO;
>>>>>> case ETHTOOL_GGSO:
>>>>>> case ETHTOOL_SGSO:
>>>>>> return NETIF_F_GSO;
>>>>>> --
>>>>>> 1.9.3
^ permalink raw reply
* Re: [PATCH] genirq: set initial affinity when hinting
From: Jesse Brandeburg @ 2014-12-19 19:53 UTC (permalink / raw)
To: Jesse Brandeburg
Cc: Thomas Gleixner, linux-kernel@vger.kernel.org, NetDEV list
In-Reply-To: <20141219012206.4220.27491.stgit@jbrandeb-cp2.jf.intel.com>
+netdev, as network driver developers might be interested in this functionality.
On Thu, Dec 18, 2014 at 5:22 PM, Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
> Problem:
> The default behavior of the kernel is somewhat undesirable as all
> requested interrupts end up on CPU0 after registration. A user can
> run irqbalance daemon, or can manually configure smp_affinity via the
> proc filesystem, but the default affinity of the interrupts for all
> devices is always CPU zero, this can cause performance problems or
> very heavy cpu use of only one core if not noticed and fixed by the
> user.
>
> Patch:
> This patch enables the setting of the initial affinity directly
> when the driver sets a hint.
>
> This enabling means that kernel drivers can include an initial
> affinity setting for the interrupt, instead of all interrupts starting
> out life on CPU0. Of course if irqbalance is still running then the
> interrupts will get moved as before.
>
> This function is currently called by drivers in block, crypto,
> infiniband, ethernet and scsi trees, but only a handful, so these will
> be the devices affected by this change.
>
> Tested on i40e, and default interrupts were spread across the CPUs
> according to the hint.
>
> drivers/block/mtip32xx/mtip32xx.c:3
> drivers/block/nvme-core.c:2
> drivers/crypto/qat/qat_dh895xcc/adf_isr.c:3
> drivers/infiniband/hw/qib/qib_iba7322.c:2
> drivers/net/ethernet/intel/i40e/i40e_main.c:3
> drivers/net/ethernet/intel/i40evf/i40evf_main.c:3
> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3
> drivers/net/ethernet/mellanox/mlx4/en_cq.c:2
> drivers/scsi/hpsa.c:3
> drivers/scsi/lpfc/lpfc_init.c:3
> drivers/scsi/megaraid/megaraid_sas_base.c:8
> drivers/soc/ti/knav_qmss_acc.c:1
> drivers/soc/ti/knav_qmss_queue.c:2
> drivers/virtio/virtio_pci_common.c:2
>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> ---
> kernel/irq/manage.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 8069237..f038e58 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -243,6 +243,8 @@ int irq_set_affinity_hint(unsigned int irq, const struct cpumask *m)
> return -EINVAL;
> desc->affinity_hint = m;
> irq_put_desc_unlock(desc, flags);
> + /* set the initial affinity to prevent every interrupt being on CPU0 */
> + __irq_set_affinity(irq, m, false);
> return 0;
> }
> EXPORT_SYMBOL_GPL(irq_set_affinity_hint);
>
^ permalink raw reply
* Re: [bisected] tg3 broken in 3.18.0?
From: Rajat Jain @ 2014-12-19 19:37 UTC (permalink / raw)
To: Prashant Sreedharan
Cc: Marcelo Ricardo Leitner, Bjorn Helgaas, Nils Holland,
Michael Chan, David Miller, netdev, linux-pci@vger.kernel.org,
Rafael Wysocki
In-Reply-To: <1419015228.27773.3.camel@prashant>
On Fri, Dec 19, 2014 at 10:53 AM, Prashant Sreedharan
<prashant@broadcom.com> wrote:
> On Fri, 2014-12-19 at 10:24 -0800, Rajat Jain wrote:
>> One of the reasons to replace the condition (*l == 0xffff0001) with
>> (*l & 0xffff) == 0x0001) was that some devices apparently returned
>> 0001 for device id to indicate CRS, but returned actual vendor id in
>> the vendor ID field (hence the need to ignore vendor field).
>>
>> Are we saying that the tg3 device returns 0x0001 for the device ID and
>> yet expects it to be treated as a good value (not CRS)?
>>
> No it should not be treated as a good value, this commit has
> surfaced/exposed a problem in 5722 chipset which was previously masked.
>
Got it. Thanks.
I assume you mean its a HW issue, and the workaround if required shall
be taken care in tg3 driver.
Thanks,
Rajat
>
>
^ permalink raw reply
* RE: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Chickles, Derek @ 2014-12-19 18:39 UTC (permalink / raw)
To: Stephen Hemminger, Vatsavayi, Raghu
Cc: davem@davemloft.net, netdev@vger.kernel.org, Burla, Satananda,
Manlunas, Felix, Vatsavayi, Raghu
In-Reply-To: <20141218232833.3e640164@urahara>
> > +
> > +static u32 lio_get_link(struct net_device *dev)
> > +{
> > + u32 ret;
> > +
> > + ret = netif_carrier_ok(dev) ? 1 : 0;
> > + return ret;
> > +}
> > +
>
> This is unnecessary, this is what the default ethtool handler already does
> if you give a NULL handler.
Okay
> > + if (linfo->link.s.status) {
> > + ecmd->speed = linfo->link.s.speed;
> > + ecmd->duplex = linfo->link.s.duplex;
> > + } else {
> > + ecmd->speed = -1;
> > + ecmd->duplex = -1;
> > + }
> > +
>
> You should SPEED_UNKNOWN and DUPLEX_UNKNOWN (not -1).
> and don't set ecmd->speed directly, use
> ethtool_cmd_speed_sed(ecmd, speed)
Okay
> > +static void
> > +lio_get_ethtool_stats(struct net_device *netdev,
> > + struct ethtool_stats *stats, u64 *data)
> > +{
> > + struct lio *lio = GET_LIO(netdev);
> > + struct octeon_device *oct_dev = lio->oct_dev;
> > + int i = 0, j;
> > +
> > + data[i++] = jiffies;
> > + data[i++] = lio->stats.tx_packets;
> > + data[i++] = lio->stats.tx_bytes;
> > + data[i++] = lio->stats.rx_packets;
> > + data[i++] = lio->stats.rx_bytes;
> > + data[i++] = lio->stats.tx_errors;
> > + data[i++] = lio->stats.tx_dropped;
> > + data[i++] = lio->stats.rx_dropped;
> > +
>
> Do not mirror existing netdevice statistics in ethtool.
> The purpose of ethtool stats is to provide device specific information.
Okay.
> > +/** \brief Network device get stats
> > + * @param netdev pointer to network device
> > + * @returns pointer stats structure
> > + */
> > +static struct net_device_stats *liquidio_stats(struct net_device *netdev)
> > +{
> > + return &(GET_LIO(netdev)->stats);
> > +}
> > +
>
> Unnecessary function, this is what network core does already
> if .ndo_get_stats is NULL;
Okay.
> > +static int liquidio_ioctl(struct net_device *netdev, struct ifreq *ifr, int
> cmd)
> > +{
> > + switch (cmd) {
> > + case SIOCSHWTSTAMP:
> > + return hwtstamp_ioctl(netdev, ifr, cmd);
> > + default:
> > + return -EINVAL
>
> Standard error code for unsupported ioctl is -EOPNOTSUPP
Okay.
> > +static int __init liquidio_init(void)
> > +{
> > + int i;
> > + struct handshake *hs;
> > +
> > + pr_info("LiquidIO: Starting Network module version %s\n",
> > + LIQUIDIO_VERSION);
> > +
>
> Do you really need to log this. Systems are already too chatty.
Okay. This is available with ethtool anyway.
> > +/**
> > + * \brief Alloc net device
> > + * @param size Size to allocate
> > + * @param nq how many queues
> > + * @returns pointer to net device structure
> > + */
> > +static inline struct net_device *liquidio_alloc_netdev(int size, int nq)
> > +{
> > + if (nq > 1)
> > + return alloc_netdev_mq(size, "lio%d",
> NET_NAME_UNKNOWN,
> > + ether_setup, nq);
> > + else
> > + return alloc_netdev(size, "lio%d", NET_NAME_UNKNOWN,
> > + ether_setup);
> > +}
>
> There is no need for the (nq > 1) test, just do:
> > + return alloc_netdev_mq(size, "lio%d",
> NET_NAME_UNKNOWN,
> > + ether_setup, nq);
>
Okay.
> Also, why does this device not have standard ethernet naming?
No good reason. We will follow the p<slot_number>p<port_number> convention.
> > +void process_noresponse_list(struct octeon_device *oct,
> > + struct octeon_instr_queue *iq)
> > +{
> > + uint32_t get_idx;
> > + struct octeon_soft_command *sc;
> > + struct octeon_instr_ih *ih;
>
> All global function names must start with same prefix, driver may not
> always be built as a module.
>
> In this case oceteon_process_noresponse_list.
Okay. All global symbols will get an octeon_ or lio_ prefix.
We'll reply to the last couple issues separately.
Thanks for all the feedback.
^ permalink raw reply
* Re: [bisected] tg3 broken in 3.18.0?
From: Prashant Sreedharan @ 2014-12-19 18:53 UTC (permalink / raw)
To: rajatxjain
Cc: Marcelo Ricardo Leitner, Bjorn Helgaas, Nils Holland,
Michael Chan, David Miller, netdev, linux-pci@vger.kernel.org,
Rafael Wysocki
In-Reply-To: <CAA93t1qOUZEmJEcuUNtkDS0M_pafb_1s==9Yqm9-+-+OmJmVuQ@mail.gmail.com>
On Fri, 2014-12-19 at 10:24 -0800, Rajat Jain wrote:
> One of the reasons to replace the condition (*l == 0xffff0001) with
> (*l & 0xffff) == 0x0001) was that some devices apparently returned
> 0001 for device id to indicate CRS, but returned actual vendor id in
> the vendor ID field (hence the need to ignore vendor field).
>
> Are we saying that the tg3 device returns 0x0001 for the device ID and
> yet expects it to be treated as a good value (not CRS)?
>
No it should not be treated as a good value, this commit has
surfaced/exposed a problem in 5722 chipset which was previously masked.
^ permalink raw reply
* Re: [PATCH Iproute2 next] ip netns: 'ip netns id' cmd without argument should not give error
From: Mahesh Bandewar @ 2014-12-19 18:33 UTC (permalink / raw)
To: Vadim Kochan; +Cc: netdev, Stephen Hemminger, Saied Kazemi
In-Reply-To: <20141219083153.GA11360@angus-think>
On Fri, Dec 19, 2014 at 12:31 AM, Vadim Kochan <vadim4j@gmail.com> wrote:
> On Thu, Dec 18, 2014 at 10:05:38PM -0800, Mahesh Bandewar wrote:
>> The command 'ip netns identify' without PID parameter is supposed to
>> use the self PID to identify its ns but a trivial error prevented it
>> from doing so. This patch fixes that error.
>>
>> So before the patch -
>> # ip netns id
>> No pid specified
>> # echo $?
>> 1
>> #
>>
>> After the patch -
>> # ip netns id
>> test-ns
>> # echo $?
>> 0
>> #
>>
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>> Cc: Stephen Hemminger <stephen@networkplumber.org>
>> Cc: Vadim Kochan <vadim4j@gmail.com>
>> Cc: Saied Kazemi <saied@google.com>
>> ---
>> ip/ipnetns.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/ip/ipnetns.c b/ip/ipnetns.c
>> index 1c8aa029073e..ec9afa2177a5 100644
>> --- a/ip/ipnetns.c
>> +++ b/ip/ipnetns.c
>> @@ -298,7 +298,7 @@ static int netns_identify(int argc, char **argv)
>> DIR *dir;
>> struct dirent *entry;
>>
>> - if (argc < 1) {
>> + if (!argc) {
>> pidstr = "self";
>> } else if (argc > 1) {
>> fprintf(stderr, "extra arguments specified\n");
>> --
>> 2.2.0.rc0.207.ga3a616c
>>
>
> Hi Mahesh,
>
> I did not get such error on the current master branch?
>
> Did I miss something ?
>
No you did not! Please ignore this patch. :)
Looks like the tree I was working / using wasn't next and did not have
the patch that made it work (in next). I cherry-picked my fix onto the
next and sent it. My bad!
> Regards,
> Vadim Kochan
^ permalink raw reply
* Re: [patches] a bunch of old bluetooth fixes
From: Marcel Holtmann @ 2014-12-19 18:25 UTC (permalink / raw)
To: David S. Miller; +Cc: viro, netdev, linux-bluetooth
In-Reply-To: <20141219.115933.2303527819596860834.davem@davemloft.net>
Hi Dave,
>>> Dave, we can prepare a pull request for these or do you want to take them directly into net tree?
>>
>> and in case you decide to take them directly.
>>
>> Acked-by: Marcel Holtmann <marcel@holtmann.org>
>
> Feel free to create a pull request for me, that works best.
actually Johan already sent one. Should be in your inbox and on netdev.
Regards
Marcel
^ permalink raw reply
* Re: [bisected] tg3 broken in 3.18.0?
From: Rajat Jain @ 2014-12-19 18:24 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: Bjorn Helgaas, Prashant Sreedharan, Nils Holland, Michael Chan,
David Miller, netdev, linux-pci@vger.kernel.org, Rafael Wysocki
In-Reply-To: <54945D79.2070008@gmail.com>
One of the reasons to replace the condition (*l == 0xffff0001) with
(*l & 0xffff) == 0x0001) was that some devices apparently returned
0001 for device id to indicate CRS, but returned actual vendor id in
the vendor ID field (hence the need to ignore vendor field).
Are we saying that the tg3 device returns 0x0001 for the device ID and
yet expects it to be treated as a good value (not CRS)?
Thanks,
Rajat
On Fri, Dec 19, 2014 at 9:16 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> On 19-12-2014 15:09, Bjorn Helgaas wrote:
>>
>> On Thu, Dec 18, 2014 at 7:10 PM, Prashant Sreedharan
>> <prashant@broadcom.com> wrote:
>>>
>>> On Thu, 2014-12-18 at 21:26 +0100, Nils Holland wrote:
>>>>
>>>> On Thu, Dec 18, 2014 at 11:28:09AM -0800, Prashant Sreedharan wrote:
>>>>>
>>>>> On Thu, 2014-12-18 at 12:15 -0700, Bjorn Helgaas wrote:
>>>>>>
>>>>>>
>>>>>> Any updates from the hardware team?
>>>>>>
>>>>>> This is a pretty serious regression, but as far as I can tell, it is
>>>>>> not a PCI bug. The device should respond to a config read of vendor
>>>>>> ID. If the driver does something that make the read return CRS
>>>>>> status, I think the driver is responsible for doing whatever delay or
>>>>>> other fixup is required.
>>>>>>
>>>>>> I'm inclined to reassign this bug to the tg3 driver unless you think
>>>>>> the PCI core is doing something wrong here.
>>>>>>
>>>>>> Bjorn
>>>>>
>>>>>
>>>>> We were not able to reproduce this issue, could you please check what
>>>>> is
>>>>> the value of reg 0x70, before the pci_device_is_present call is made ?
>>>>> if bit 15 is set config access will be retried.
>>>>>
>>>>> --- a/drivers/net/ethernet/broadcom/tg3.c
>>>>> +++ b/drivers/net/ethernet/broadcom/tg3.c
>>>>> @@ -9025,6 +9025,7 @@ static int tg3_chip_reset(struct tg3 *tp)
>>>>> void (*write_op)(struct tg3 *, u32, u32);
>>>>> int i, err;
>>>>>
>>>>> + printk(KERN_ERR "config state: %x\n", tr32(TG3PCI_PCISTATE));
>>>>> if (!pci_device_is_present(tp->pdev))
>>>>> return -ENODEV;
>>>>
>>>>
>>>> No problem, I gave this a try and here is what I get:
>>>>
>>>> [ 2.185190] libphy: tg3 mdio bus: probed
>>>> [ 2.229357] tsc: Refined TSC clocksource calibration: 2399.999 MHz
>>>> [ 2.244993] config state: 1292
>>>> [ 2.247136] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM57780) rev
>>>> 57780001]
>>>> (PCI Express) MAC address 00:19:99:ce:13:a6
>>>> [ 2.249279] tg3 0000:02:00.0 eth0: attached PHY driver [Broadcom
>>>> BCM57780]
>>>> (mii_bus:phy_addr=200:01)
>>>> [ 2.251460] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0]
>>>> MIirq[0] ASF[0] TSOcap[1]
>>>> [ 2.253672] tg3 0000:02:00.0 eth0: dma_rwctrl[76180000]
>>>> dma_mask[64-bit]
>>>> [...]
>>>> [ 12.204692] tg3 0000:02:00.0
>>>> enp2s0: No firmware running
>>>> [ 12.206653] config state: 1292
>>>> [ 12.208655] config state: 1292
>>>>
>>>> That's all of the three times the new debugging line gets hit when I
>>>> boot my system using the supplied diagnostic patch.
>>>>
>>>> Hope that helps - of course, I'd gladly test any further
>>>> (diagnostic) patches if required! Also, if I can provide any
>>>> additional information that might be of value, just ask:-)
>>>>
>>> Nils/Marcelo thanks for inputs, since reg 0x70 bit 15 is clear it
>>> indicates the chip is not setting the config retry bit. We were hoping
>>> this bit is causing the config access to return CRS but looks like it is
>>> not.
>>>
>>> Btw after forcing the error path (tg3_init_one -> tg3_halt) in the
>>> driver now we are able to reproduce the problem on 5722 in house. We are
>>> working with the HW team to narrow this down.
>>>
>>> Also it is not clear to me how reverting commit cfa6a7877b17a667 fixes
>>> the problem.
>>
>>
>> The full commit is 89665a6a71408796565bfd29cfa6a7877b17a667, and git
>> works with any unique *prefix* of that. The current convention is to
>> use the first 12 characters (I have "[core] abbrev = 12" in my
>> .git/config). Unfortunately, suffixes don't work at all.
>>
>> Anyway, here's why I think 89665a6a7140 makes a difference. We're in this
>> path:
>>
>> pci_device_is_present
>> pci_bus_read_dev_vendor_id(..., crs_timeout = 0)
>> pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l)
>>
>> and for some reason the chip returns 0x00010001 for that 32-bit read.
>
>
> Actually it returns just 0x00000001, but yeah, that's my understanding too.
>
> Marcelo
>
>
>> Before 89665a6a7140, we compared all 32 bits with "*l == 0xffff0001".
>> This is false, so pci_bus_read_dev_vendor_id() returns true, which
>> means pci_device_is_present() is also true.
>>
>> After 89665a6a7140, we compare only the low 16 bits with ((*l &
>> 0xffff) == 0x0001), which is true, so pci_bus_read_dev_vendor_id()
>> returns false, and pci_device_is_present() is false.
>>
>> Bjorn
>>
>
^ permalink raw reply
* Re: [PATCH] sunvnet: fix a memory leak in vnet_handle_offloads
From: David Miller @ 2014-12-19 18:22 UTC (permalink / raw)
To: roy.qing.li; +Cc: netdev
In-Reply-To: <1418966375-23188-1-git-send-email-roy.qing.li@gmail.com>
From: roy.qing.li@gmail.com
Date: Fri, 19 Dec 2014 13:19:35 +0800
> From: Li RongQing <roy.qing.li@gmail.com>
>
> when skb_gso_segment returns error, the original skb should be freed
>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH net] r8152: drop the tx packet with invalid length
From: Eric Dumazet @ 2014-12-19 18:13 UTC (permalink / raw)
To: Hayes Wang
Cc: Tom Herbert, David Miller, netdev@vger.kernel.org, nic_swsd,
linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <1419010603.9773.84.camel@edumazet-glaptop2.roam.corp.google.com>
On Fri, 2014-12-19 at 09:36 -0800, Eric Dumazet wrote:
> On Fri, 2014-12-19 at 06:42 +0000, Hayes Wang wrote:
>
> > Excuse me. I try to implement ndo_gso_check() with kernel 3.18.
> > However, I still get packets with gso and their skb lengths are more
> > than the acceptable one. Do I miss something?
> >
> > +static bool rtl8152_gso_check(struct sk_buff *skb, struct net_device *dev)
> > +{
> > + if ((skb->len + sizeof(struct tx_desc)) > agg_buf_sz)
> > + return false;
> > +
> > + return true;
> > +}
> >
> > @@ -5861,6 +5876,9 @@ static const struct net_device_ops rtl8152_netdev_ops = {
> > .ndo_set_mac_address = rtl8152_set_mac_address,
> > .ndo_change_mtu = rtl8152_change_mtu,
> > .ndo_validate_addr = eth_validate_addr,
> > + .ndo_gso_check = rtl8152_gso_check,
> > };
>
> You are right, it seems ndo_gso_check() is buggy at this moment.
>
> Presumably this method should alter %features so that we really segment
> the packets in software.
>
> features &= ~NETIF_F_GSO_MASK;
Could you try following patch ?
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 7df221788cd4..0346bcfe72a5 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -314,7 +314,7 @@ static rx_handler_result_t macvtap_handle_frame(struct sk_buff **pskb)
*/
if (q->flags & IFF_VNET_HDR)
features |= vlan->tap_features;
- if (netif_needs_gso(dev, skb, features)) {
+ if (netif_needs_gso(dev, skb, &features)) {
struct sk_buff *segs = __skb_gso_segment(skb, features, false);
if (IS_ERR(segs))
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 22bcb4e12e2a..9cacabaea175 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -578,6 +578,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
unsigned long flags;
struct netfront_queue *queue = NULL;
unsigned int num_queues = dev->real_num_tx_queues;
+ netdev_features_t features;
u16 queue_index;
/* Drop the packet if no queues are set up */
@@ -611,9 +612,10 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
spin_lock_irqsave(&queue->tx_lock, flags);
+ features = netif_skb_features(skb);
if (unlikely(!netif_carrier_ok(dev) ||
(slots > 1 && !xennet_can_sg(dev)) ||
- netif_needs_gso(dev, skb, netif_skb_features(skb)))) {
+ netif_needs_gso(dev, skb, &features))) {
spin_unlock_irqrestore(&queue->tx_lock, flags);
goto drop;
}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c31f74d76ebd..fb1f8c900df9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3608,13 +3608,19 @@ static inline bool skb_gso_ok(struct sk_buff *skb, netdev_features_t features)
}
static inline bool netif_needs_gso(struct net_device *dev, struct sk_buff *skb,
- netdev_features_t features)
+ netdev_features_t *features)
{
- return skb_is_gso(skb) && (!skb_gso_ok(skb, features) ||
- (dev->netdev_ops->ndo_gso_check &&
- !dev->netdev_ops->ndo_gso_check(skb, dev)) ||
- unlikely((skb->ip_summed != CHECKSUM_PARTIAL) &&
- (skb->ip_summed != CHECKSUM_UNNECESSARY)));
+ if (!skb_is_gso(skb))
+ return false;
+ if (!skb_gso_ok(skb, *features))
+ return true;
+ if (dev->netdev_ops->ndo_gso_check &&
+ !dev->netdev_ops->ndo_gso_check(skb, dev)) {
+ *features &= ~NETIF_F_GSO_MASK;
+ return true;
+ }
+ return skb->ip_summed != CHECKSUM_PARTIAL &&
+ skb->ip_summed != CHECKSUM_UNNECESSARY;
}
static inline void netif_set_gso_max_size(struct net_device *dev,
diff --git a/net/core/dev.c b/net/core/dev.c
index f411c28d0a66..b61c26b45bb7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2668,7 +2668,7 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
if (skb->encapsulation)
features &= dev->hw_enc_features;
- if (netif_needs_gso(dev, skb, features)) {
+ if (netif_needs_gso(dev, skb, &features)) {
struct sk_buff *segs;
segs = skb_gso_segment(skb, features);
^ permalink raw reply related
* RE: [PATCH net-next v3] Add support of Cavium Liquidio ethernet adapters
From: Chickles, Derek @ 2014-12-19 17:54 UTC (permalink / raw)
To: Stephen Hemminger, Vatsavayi, Raghu
Cc: davem@davemloft.net, netdev@vger.kernel.org, Burla, Satananda,
Manlunas, Felix, Vatsavayi, Raghu
In-Reply-To: <20141218225940.0697e3c5@urahara>
> > +static struct {
> > + const char str[ETH_GSTRING_LEN];
> > +} ethtool_stats_keys[] = {
> > + {
> > + "jiffies"
> > + }, {
>
> Jiffies does not belong in ethtool stats. Looks like some developer debug
> option.
The intention here was to have a semi-accurate timestamp taken at the time
the statistics were gathered before handing it over to userspace. This is a
generic requirement though, so it probably belongs in ethtool_get_stats().
Otherwise, we can publish some per-second stats if that's preferable.
^ permalink raw reply
* Re: [PATCH] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: David Miller @ 2014-12-19 17:40 UTC (permalink / raw)
To: fkan; +Cc: netdev, patches, linux-kernel
In-Reply-To: <CAL85gmCOLrcTijpEhXx1x5UjHu0sgS5DdGYJxY4mwPFKCHpU-g@mail.gmail.com>
From: Feng Kan <fkan@apm.com>
Date: Fri, 19 Dec 2014 09:36:13 -0800
> On Fri, Dec 19, 2014 at 9:31 AM, David Miller <davem@davemloft.net> wrote:
>> From: Feng Kan <fkan@apm.com>
>> Date: Thu, 18 Dec 2014 15:39:51 -0800
>>
>>> +#define NO_MAC_FOUND 0
>>> +#define RES_ENET_CSR 0
>>> +#define RES_RING_CSR 1
>>> +#define RES_RING_CMD 2
> Is there a issue with the RES defines, do you prefer enum types?
I don't have any problem with those.
^ permalink raw reply
* Re: [PATCH] sunvnet: fix a memory leak in vnet_handle_offloads
From: David L Stevens @ 2014-12-19 17:39 UTC (permalink / raw)
To: roy.qing.li, netdev
In-Reply-To: <1418966375-23188-1-git-send-email-roy.qing.li@gmail.com>
Acked-by: David L Stevens <david.stevens@oracle.com>
On 12/19/2014 12:19 AM, roy.qing.li@gmail.com wrote:
> From: Li RongQing <roy.qing.li@gmail.com>
>
> when skb_gso_segment returns error, the original skb should be freed
>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
> ---
> drivers/net/ethernet/sun/sunvnet.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/sun/sunvnet.c b/drivers/net/ethernet/sun/sunvnet.c
> index 45c408e..d2835bf 100644
> --- a/drivers/net/ethernet/sun/sunvnet.c
> +++ b/drivers/net/ethernet/sun/sunvnet.c
> @@ -1201,6 +1201,7 @@ static int vnet_handle_offloads(struct vnet_port *port, struct sk_buff *skb)
> segs = skb_gso_segment(skb, dev->features & ~NETIF_F_TSO);
> if (IS_ERR(segs)) {
> dev->stats.tx_dropped++;
> + dev_kfree_skb_any(skb);
> return NETDEV_TX_OK;
> }
>
>
^ permalink raw reply
* Re: [PATCH net] r8152: drop the tx packet with invalid length
From: Eric Dumazet @ 2014-12-19 17:36 UTC (permalink / raw)
To: Hayes Wang, Tom Herbert
Cc: David Miller, netdev@vger.kernel.org, nic_swsd,
linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
In-Reply-To: <0835B3720019904CB8F7AA43166CEEB2ED585A@RTITMBSV03.realtek.com.tw>
On Fri, 2014-12-19 at 06:42 +0000, Hayes Wang wrote:
> Excuse me. I try to implement ndo_gso_check() with kernel 3.18.
> However, I still get packets with gso and their skb lengths are more
> than the acceptable one. Do I miss something?
>
> +static bool rtl8152_gso_check(struct sk_buff *skb, struct net_device *dev)
> +{
> + if ((skb->len + sizeof(struct tx_desc)) > agg_buf_sz)
> + return false;
> +
> + return true;
> +}
>
> @@ -5861,6 +5876,9 @@ static const struct net_device_ops rtl8152_netdev_ops = {
> .ndo_set_mac_address = rtl8152_set_mac_address,
> .ndo_change_mtu = rtl8152_change_mtu,
> .ndo_validate_addr = eth_validate_addr,
> + .ndo_gso_check = rtl8152_gso_check,
> };
You are right, it seems ndo_gso_check() is buggy at this moment.
Presumably this method should alter %features so that we really segment
the packets in software.
features &= ~NETIF_F_GSO_MASK;
^ permalink raw reply
* Re: [PATCH] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: Feng Kan @ 2014-12-19 17:36 UTC (permalink / raw)
To: David Miller; +Cc: netdev, patches, linux-kernel@vger.kernel.org
In-Reply-To: <20141219.123107.1080751255468272638.davem@davemloft.net>
On Fri, Dec 19, 2014 at 9:31 AM, David Miller <davem@davemloft.net> wrote:
> From: Feng Kan <fkan@apm.com>
> Date: Thu, 18 Dec 2014 15:39:51 -0800
>
>> +#define NO_MAC_FOUND 0
>> +#define RES_ENET_CSR 0
>> +#define RES_RING_CSR 1
>> +#define RES_RING_CMD 2
Is there a issue with the RES defines, do you prefer enum types?
>
> Don't define your own magic set of error return semantics.
>
> For example, in the MAC address acquisition case, just verify that
> you get an ETH_ALEN length value from the fetch routines and return
> -ENODEV or similar if you do not.
Thanks, well change.
^ permalink raw reply
* Re: [PATCH] net: eth: xgene: change APM X-Gene SoC platform ethernet to support ACPI
From: David Miller @ 2014-12-19 17:31 UTC (permalink / raw)
To: fkan; +Cc: netdev, patches, linux-kernel
In-Reply-To: <1418945991-31494-1-git-send-email-fkan@apm.com>
From: Feng Kan <fkan@apm.com>
Date: Thu, 18 Dec 2014 15:39:51 -0800
> +#define NO_MAC_FOUND 0
> +#define RES_ENET_CSR 0
> +#define RES_RING_CSR 1
> +#define RES_RING_CMD 2
Don't define your own magic set of error return semantics.
For example, in the MAC address acquisition case, just verify that
you get an ETH_ALEN length value from the fetch routines and return
-ENODEV or similar if you do not.
^ permalink raw reply
* Re: [bisected] tg3 broken in 3.18.0?
From: Marcelo Ricardo Leitner @ 2014-12-19 17:16 UTC (permalink / raw)
To: Bjorn Helgaas, Prashant Sreedharan
Cc: Nils Holland, Michael Chan, Rajat Jain, David Miller, netdev,
linux-pci@vger.kernel.org, Rafael Wysocki
In-Reply-To: <CAErSpo4UdxwL=wA8cBJ7_DAdTb2xNeg6600qB0=Jdxv80aVmcg@mail.gmail.com>
On 19-12-2014 15:09, Bjorn Helgaas wrote:
> On Thu, Dec 18, 2014 at 7:10 PM, Prashant Sreedharan
> <prashant@broadcom.com> wrote:
>> On Thu, 2014-12-18 at 21:26 +0100, Nils Holland wrote:
>>> On Thu, Dec 18, 2014 at 11:28:09AM -0800, Prashant Sreedharan wrote:
>>>> On Thu, 2014-12-18 at 12:15 -0700, Bjorn Helgaas wrote:
>>>>>
>>>>> Any updates from the hardware team?
>>>>>
>>>>> This is a pretty serious regression, but as far as I can tell, it is
>>>>> not a PCI bug. The device should respond to a config read of vendor
>>>>> ID. If the driver does something that make the read return CRS
>>>>> status, I think the driver is responsible for doing whatever delay or
>>>>> other fixup is required.
>>>>>
>>>>> I'm inclined to reassign this bug to the tg3 driver unless you think
>>>>> the PCI core is doing something wrong here.
>>>>>
>>>>> Bjorn
>>>>
>>>> We were not able to reproduce this issue, could you please check what is
>>>> the value of reg 0x70, before the pci_device_is_present call is made ?
>>>> if bit 15 is set config access will be retried.
>>>>
>>>> --- a/drivers/net/ethernet/broadcom/tg3.c
>>>> +++ b/drivers/net/ethernet/broadcom/tg3.c
>>>> @@ -9025,6 +9025,7 @@ static int tg3_chip_reset(struct tg3 *tp)
>>>> void (*write_op)(struct tg3 *, u32, u32);
>>>> int i, err;
>>>>
>>>> + printk(KERN_ERR "config state: %x\n", tr32(TG3PCI_PCISTATE));
>>>> if (!pci_device_is_present(tp->pdev))
>>>> return -ENODEV;
>>>
>>> No problem, I gave this a try and here is what I get:
>>>
>>> [ 2.185190] libphy: tg3 mdio bus: probed
>>> [ 2.229357] tsc: Refined TSC clocksource calibration: 2399.999 MHz
>>> [ 2.244993] config state: 1292
>>> [ 2.247136] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM57780) rev 57780001]
>>> (PCI Express) MAC address 00:19:99:ce:13:a6
>>> [ 2.249279] tg3 0000:02:00.0 eth0: attached PHY driver [Broadcom BCM57780]
>>> (mii_bus:phy_addr=200:01)
>>> [ 2.251460] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0]
>>> MIirq[0] ASF[0] TSOcap[1]
>>> [ 2.253672] tg3 0000:02:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
>>> [...]
>>> [ 12.204692] tg3 0000:02:00.0
>>> enp2s0: No firmware running
>>> [ 12.206653] config state: 1292
>>> [ 12.208655] config state: 1292
>>>
>>> That's all of the three times the new debugging line gets hit when I
>>> boot my system using the supplied diagnostic patch.
>>>
>>> Hope that helps - of course, I'd gladly test any further
>>> (diagnostic) patches if required! Also, if I can provide any
>>> additional information that might be of value, just ask:-)
>>>
>> Nils/Marcelo thanks for inputs, since reg 0x70 bit 15 is clear it
>> indicates the chip is not setting the config retry bit. We were hoping
>> this bit is causing the config access to return CRS but looks like it is
>> not.
>>
>> Btw after forcing the error path (tg3_init_one -> tg3_halt) in the
>> driver now we are able to reproduce the problem on 5722 in house. We are
>> working with the HW team to narrow this down.
>>
>> Also it is not clear to me how reverting commit cfa6a7877b17a667 fixes
>> the problem.
>
> The full commit is 89665a6a71408796565bfd29cfa6a7877b17a667, and git
> works with any unique *prefix* of that. The current convention is to
> use the first 12 characters (I have "[core] abbrev = 12" in my
> .git/config). Unfortunately, suffixes don't work at all.
>
> Anyway, here's why I think 89665a6a7140 makes a difference. We're in this path:
>
> pci_device_is_present
> pci_bus_read_dev_vendor_id(..., crs_timeout = 0)
> pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l)
>
> and for some reason the chip returns 0x00010001 for that 32-bit read.
Actually it returns just 0x00000001, but yeah, that's my understanding too.
Marcelo
> Before 89665a6a7140, we compared all 32 bits with "*l == 0xffff0001".
> This is false, so pci_bus_read_dev_vendor_id() returns true, which
> means pci_device_is_present() is also true.
>
> After 89665a6a7140, we compare only the low 16 bits with ((*l &
> 0xffff) == 0x0001), which is true, so pci_bus_read_dev_vendor_id()
> returns false, and pci_device_is_present() is false.
>
> Bjorn
>
^ permalink raw reply
* Re: [bisected] tg3 broken in 3.18.0?
From: Bjorn Helgaas @ 2014-12-19 17:09 UTC (permalink / raw)
To: Prashant Sreedharan
Cc: Nils Holland, Marcelo Ricardo Leitner, Michael Chan, Rajat Jain,
David Miller, netdev, linux-pci@vger.kernel.org, Rafael Wysocki
In-Reply-To: <1418955047.3433.27.camel@prashant>
On Thu, Dec 18, 2014 at 7:10 PM, Prashant Sreedharan
<prashant@broadcom.com> wrote:
> On Thu, 2014-12-18 at 21:26 +0100, Nils Holland wrote:
>> On Thu, Dec 18, 2014 at 11:28:09AM -0800, Prashant Sreedharan wrote:
>> > On Thu, 2014-12-18 at 12:15 -0700, Bjorn Helgaas wrote:
>> > >
>> > > Any updates from the hardware team?
>> > >
>> > > This is a pretty serious regression, but as far as I can tell, it is
>> > > not a PCI bug. The device should respond to a config read of vendor
>> > > ID. If the driver does something that make the read return CRS
>> > > status, I think the driver is responsible for doing whatever delay or
>> > > other fixup is required.
>> > >
>> > > I'm inclined to reassign this bug to the tg3 driver unless you think
>> > > the PCI core is doing something wrong here.
>> > >
>> > > Bjorn
>> >
>> > We were not able to reproduce this issue, could you please check what is
>> > the value of reg 0x70, before the pci_device_is_present call is made ?
>> > if bit 15 is set config access will be retried.
>> >
>> > --- a/drivers/net/ethernet/broadcom/tg3.c
>> > +++ b/drivers/net/ethernet/broadcom/tg3.c
>> > @@ -9025,6 +9025,7 @@ static int tg3_chip_reset(struct tg3 *tp)
>> > void (*write_op)(struct tg3 *, u32, u32);
>> > int i, err;
>> >
>> > + printk(KERN_ERR "config state: %x\n", tr32(TG3PCI_PCISTATE));
>> > if (!pci_device_is_present(tp->pdev))
>> > return -ENODEV;
>>
>> No problem, I gave this a try and here is what I get:
>>
>> [ 2.185190] libphy: tg3 mdio bus: probed
>> [ 2.229357] tsc: Refined TSC clocksource calibration: 2399.999 MHz
>> [ 2.244993] config state: 1292
>> [ 2.247136] tg3 0000:02:00.0 eth0: Tigon3 [partno(BCM57780) rev 57780001]
>> (PCI Express) MAC address 00:19:99:ce:13:a6
>> [ 2.249279] tg3 0000:02:00.0 eth0: attached PHY driver [Broadcom BCM57780]
>> (mii_bus:phy_addr=200:01)
>> [ 2.251460] tg3 0000:02:00.0 eth0: RXcsums[1] LinkChgREG[0]
>> MIirq[0] ASF[0] TSOcap[1]
>> [ 2.253672] tg3 0000:02:00.0 eth0: dma_rwctrl[76180000] dma_mask[64-bit]
>> [...]
>> [ 12.204692] tg3 0000:02:00.0
>> enp2s0: No firmware running
>> [ 12.206653] config state: 1292
>> [ 12.208655] config state: 1292
>>
>> That's all of the three times the new debugging line gets hit when I
>> boot my system using the supplied diagnostic patch.
>>
>> Hope that helps - of course, I'd gladly test any further
>> (diagnostic) patches if required! Also, if I can provide any
>> additional information that might be of value, just ask:-)
>>
> Nils/Marcelo thanks for inputs, since reg 0x70 bit 15 is clear it
> indicates the chip is not setting the config retry bit. We were hoping
> this bit is causing the config access to return CRS but looks like it is
> not.
>
> Btw after forcing the error path (tg3_init_one -> tg3_halt) in the
> driver now we are able to reproduce the problem on 5722 in house. We are
> working with the HW team to narrow this down.
>
> Also it is not clear to me how reverting commit cfa6a7877b17a667 fixes
> the problem.
The full commit is 89665a6a71408796565bfd29cfa6a7877b17a667, and git
works with any unique *prefix* of that. The current convention is to
use the first 12 characters (I have "[core] abbrev = 12" in my
.git/config). Unfortunately, suffixes don't work at all.
Anyway, here's why I think 89665a6a7140 makes a difference. We're in this path:
pci_device_is_present
pci_bus_read_dev_vendor_id(..., crs_timeout = 0)
pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l)
and for some reason the chip returns 0x00010001 for that 32-bit read.
Before 89665a6a7140, we compared all 32 bits with "*l == 0xffff0001".
This is false, so pci_bus_read_dev_vendor_id() returns true, which
means pci_device_is_present() is also true.
After 89665a6a7140, we compare only the low 16 bits with ((*l &
0xffff) == 0x0001), which is true, so pci_bus_read_dev_vendor_id()
returns false, and pci_device_is_present() is false.
Bjorn
^ permalink raw reply
* Re: [patches] a bunch of old bluetooth fixes
From: David Miller @ 2014-12-19 16:59 UTC (permalink / raw)
To: marcel-kz+m5ild9QBg9hUCZPvPmw
Cc: viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn,
netdev-u79uwXL29TY76Z2rM5mHXA,
linux-bluetooth-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <07BDA2A2-1560-4F78-A0B2-FC25E312CACE-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>
From: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>
Date: Fri, 19 Dec 2014 11:30:38 +0100
>> Dave, we can prepare a pull request for these or do you want to take them directly into net tree?
>
> and in case you decide to take them directly.
>
> Acked-by: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>
Feel free to create a pull request for me, that works best.
Thanks.
^ permalink raw reply
* Re: [RFC iproute2] tc: Show classes in tree view
From: Vadim Kochan @ 2014-12-19 16:46 UTC (permalink / raw)
To: Cong Wang; +Cc: Vadim Kochan, netdev
In-Reply-To: <CAHA+R7P0eZzEgx+Oq-X8eqs7NFFLon4HKa+UUgxDpdaM5e0X+g@mail.gmail.com>
On Fri, Dec 19, 2014 at 08:44:30AM -0800, Cong Wang wrote:
> On Fri, Dec 19, 2014 at 3:43 AM, Vadim Kochan <vadim4j@gmail.com> wrote:
> > From: Vadim Kochan <vadim4j@gmail.com>
> >
> > Added new '-t[ree]' which shows classes dependency
> > in the tree view.
> >
> > Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
>
> A sample output would convince people more easily.
Yeah, good point:
# tc/tc -t class show dev tap0
+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
| +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
| +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|
+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
+---1:10(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
+---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
+---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
# tc/tc -t -s class show dev tap0
+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| | rate 0bit 0pps backlog 0b 0p requeues 0
| |
| +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
| | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| | rate 0bit 0pps backlog 0b 0p requeues 0
| |
| +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| | rate 0bit 0pps backlog 0b 0p requeues 0
| |
| +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| rate 0bit 0pps backlog 0b 0p requeues 0
|
+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
| Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| rate 0bit 0pps backlog 0b 0p requeues 0
|
+---1:10(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
| Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| rate 0bit 0pps backlog 0b 0p requeues 0
|
+---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
| rate 0bit 0pps backlog 0b 0p requeues 0
|
+---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
^ permalink raw reply
* Re: [RFC iproute2] tc: Show classes in tree view
From: Cong Wang @ 2014-12-19 16:44 UTC (permalink / raw)
To: Vadim Kochan; +Cc: netdev
In-Reply-To: <1418989381-6492-1-git-send-email-vadim4j@gmail.com>
On Fri, Dec 19, 2014 at 3:43 AM, Vadim Kochan <vadim4j@gmail.com> wrote:
> From: Vadim Kochan <vadim4j@gmail.com>
>
> Added new '-t[ree]' which shows classes dependency
> in the tree view.
>
> Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
A sample output would convince people more easily.
^ permalink raw reply
* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Roopa Prabhu @ 2014-12-19 16:32 UTC (permalink / raw)
To: Hubert Sokolowski; +Cc: Jamal Hadi Salim, vyasevic, John Fastabend, netdev
In-Reply-To: <5494418B.1000004@intel.com>
On 12/19/14, 7:17 AM, Hubert Sokolowski wrote:
> On 18/12/14 22:32, Jamal Hadi Salim wrote:
>
>> Sorry for the latency (head-buried-in-sand in effect)
>> On 12/17/14 11:18, Hubert Sokolowski wrote:
>>> I have just prepared a patch where I dump uc/mc for bridge devices
>>> by looking at (dev->priv_flags & IFF_EBRIDGE), so I have same results
>>> as without my changes. This should satisfy Jamal and Roopa.
>>> I could send it as v3 of my patch along with the results if you are
>>> interested.
>> Please do. If you satisfy Vlad's goals then we are all happy.
> Posted as v3, please review.
> There is still open question I asked sometime ago but never got explained.
> It is about the new filter_dev parameter that was added to ndo_fdb_dump:
> int (*ndo_fdb_dump)(struct sk_buff *skb,
> struct netlink_callback *cb,
> struct net_device *dev,
> struct net_device *filter_dev,
> int idx);
>
> When we call this function for a device, dev pointer is passed as the filter_dev:
> if (dev->netdev_ops->ndo_fdb_dump)
> idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, bdev, dev,
> idx);
seems like these calls should be fixed. bdev is really dev in this case.
And filter_dev should be null.
>
> This is not an issue for a bridge device and a device that is not enslaved
> in a bridge because bdev == dev, but this can be dangerous in other cases.
> Let's assume QLogic NIC has a master device, in this case bdev != dev.
> Now look what is happening, dev is passed as filter_dev to:
> static int qlcnic_fdb_dump(struct sk_buff *skb, struct netlink_callback *ncb,
> struct net_device *netdev,
> struct net_device *filter_dev, int idx)
> {
> struct qlcnic_adapter *adapter = netdev_priv(netdev);
> ...
>
> netdev_priv(netdev) returns a pointer to private struct of the bridge,but the driver
> is expecting it's own private stuff.
>
> Should we fix the driver and assume filter_dev is /me and dev is our master
> or the parameters were reversed and should be passed as (skb, cb, dev, bdev, idx) ?
> Is this something for another patch/discussion?
>
> regards,
> Hubert
>
^ permalink raw reply
* Re: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Roopa Prabhu @ 2014-12-19 16:22 UTC (permalink / raw)
To: Jiri Pirko
Cc: B Viswanath, Samudrala, Sridhar, John Fastabend, Varlese, Marco,
netdev@vger.kernel.org, Thomas Graf, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <20141219095512.GH1848@nanopsycho.orion>
On 12/19/14, 1:55 AM, Jiri Pirko wrote:
> Fri, Dec 19, 2014 at 10:35:27AM CET, marichika4@gmail.com wrote:
>> On 19 December 2014 at 14:53, Jiri Pirko <jiri@resnulli.us> wrote:
>>> Fri, Dec 19, 2014 at 10:01:46AM CET, marichika4@gmail.com wrote:
>>>> On 19 December 2014 at 13:57, Jiri Pirko <jiri@resnulli.us> wrote:
>>>>> Fri, Dec 19, 2014 at 06:14:57AM CET, marichika4@gmail.com wrote:
>>>>>> On 19 December 2014 at 05:18, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>>>>>>> On 12/18/14, 3:26 PM, Samudrala, Sridhar wrote:
>>>> <snipped for ease of reading>
>>>>>>>>
>>>>>>>> We also need an interface to set per-switch attributes. Can this work?
>>>>>>>> bridge link set dev sw0 sw_attr bcast_flooding 1 master
>>>>>>>> where sw0 is a bridge representing the hardware switch.
>>>>>>>
>>>>>>> Not today. We discussed this @ LPC, and one way to do this would be to have
>>>>>>> a device
>>>>>>> representing the switch asic. This is in the works.
>>>>>>
>>>>>> Can I assume that on platforms which house more than one asic (say
>>>>>> two 24 port asics, interconnected via a 10G link or equivalent, to get
>>>>>> a 48 port 'switch') , the 'rocker' driver (or similar) should expose
>>>>>> them as a single set of ports, and not as two 'switch ports' ?
>>>>> Well that really depends on particular implementation and drivers. If you
>>>>> have 2 pci-e devices, I think you should expose them as 2 entities. For
>>>>> sure, you can have the driver to do the masking for you. I don't believe
>>>>> that is correct though.
>>>>>
>>>> In a platform that houses two asic chips, IMO, the user is still
>>>> expected to manage the router as a single entity. The configuration
>>>> being applied on both asic devices need to be matching if not
>>>> identical, and may not be conflicting. The FDB is to be synchronized
>>>> so that (offloaded) switching can happen across the asics. Some of
>>>> this stuff is asic specific anyway. Another example is that of the
>>>> learning. The (hardware) learning can't be enabled on one asic, while
>>>> being disabled on another one. The general use cases I have seen are
>>>> all involving managing the 'router' as a single entity. That the
>>>> 'router' is implemented with two asics instead of a single asic (with
>>>> more ports) is to be treated as an implementation detail. This is the
>>>> usual router management method that exists today.
>>>>
>>>> I hope I make sense.
>>>>
>>>> So I am trying to figure out what this single entity that will be used
>>> >from a user perspective. It can be a bridge, but our bridges are more
>>>> 802.1q bridges. We can use the 'self' mode, but then it means that it
>>>> should reflect the entire port count, and not just an asic.
>>>>
>>>> So I was trying to deduce that in our switchdevice model, the best bet
>>>> would be to leave the unification to the driver (i.e., to project the
>>>> multiple physical asics as a single virtual switch device). Thist
>>> Is it possible to have the asic as just single one? Or is it possible to
>>> connect asics being multiple chips maybe from multiple vendors together?
>> I didn't understand the first question. Some times, it is possible to
> I ment that there is a design with just a single asic of this type,
> instead of a pair.
>
>> have a single asic replace two, but its a cost factor, and others that
>> are involved.
>>
>> AFAIK, the answer to the second question is a No. Two asics from
>> different vendors may not be connected together. The interconnect
>> tends to be proprietary.
> Okay. In that case, it might make sense to mask it on driver level.
>
>
>>> I believe that answer is "yes" in both cases. Making two separate asics
>>> to appear as one for user is not correct in my opinion. Driver should
>>> not do such masking. It is unclean, unextendable.
>>>
>> I am only looking for a single management entity. I am not thinking it
>> needs to be at driver level. I am not sure of any other option apart
> >from creating a 'switchdev' that Roopa was mentioning.
>
> Well the thing is there is a common desire to make the offloading as
> transparent as possible. For example, have 4 ports of same switch and
> put them into br0. Just like that, without need to do anything else
> than you would do when bridging ordinary NICs. Introducing some
> "management entity" would break this approach.
>
I don't think having a switchdevice breaks this approach. A software
bridge is not a 1-1 mapping with the asic in all cases.
When its a vlan filtering bridge, yes, it is (In which case all switch
global l2 non-port specific attributes can be applied to the bridge).
The switch asic can do l2 and l3 too. For a bridge, the switch asic is
just accelerating l2.
And a switch asic is also capable of l3, acls. A switch device (whether
accessible to userspace or not)
may become necessary (as discussed in other threads) where you cannot
resolve a kernel object to a switch port (Global acl rules, unresolved
route nexthops etc).
^ permalink raw reply
* Re: [PATCH net-next RESEND] net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined.
From: Hubert Sokolowski @ 2014-12-19 15:17 UTC (permalink / raw)
To: Jamal Hadi Salim, vyasevic; +Cc: John Fastabend, Roopa Prabhu, netdev
In-Reply-To: <54935618.4070005@mojatatu.com>
On 18/12/14 22:32, Jamal Hadi Salim wrote:
> Sorry for the latency (head-buried-in-sand in effect)
> On 12/17/14 11:18, Hubert Sokolowski wrote:
>> I have just prepared a patch where I dump uc/mc for bridge devices
>> by looking at (dev->priv_flags & IFF_EBRIDGE), so I have same results
>> as without my changes. This should satisfy Jamal and Roopa.
>> I could send it as v3 of my patch along with the results if you are
>> interested.
> Please do. If you satisfy Vlad's goals then we are all happy.
Posted as v3, please review.
There is still open question I asked sometime ago but never got explained.
It is about the new filter_dev parameter that was added to ndo_fdb_dump:
int (*ndo_fdb_dump)(struct sk_buff *skb,
struct netlink_callback *cb,
struct net_device *dev,
struct net_device *filter_dev,
int idx);
When we call this function for a device, dev pointer is passed as the filter_dev:
if (dev->netdev_ops->ndo_fdb_dump)
idx = dev->netdev_ops->ndo_fdb_dump(skb, cb, bdev, dev,
idx);
This is not an issue for a bridge device and a device that is not enslaved
in a bridge because bdev == dev, but this can be dangerous in other cases.
Let's assume QLogic NIC has a master device, in this case bdev != dev.
Now look what is happening, dev is passed as filter_dev to:
static int qlcnic_fdb_dump(struct sk_buff *skb, struct netlink_callback *ncb,
struct net_device *netdev,
struct net_device *filter_dev, int idx)
{
struct qlcnic_adapter *adapter = netdev_priv(netdev);
...
netdev_priv(netdev) returns a pointer to private struct of the bridge,but the driver
is expecting it's own private stuff.
Should we fix the driver and assume filter_dev is /me and dev is our master
or the parameters were reversed and should be passed as (skb, cb, dev, bdev, idx) ?
Is this something for another patch/discussion?
regards,
Hubert
--
Hubert Sokolowski Intel Corporation
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox