Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Hannes Frederic Sowa @ 2013-10-01 12:09 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert,
	eric.dumazet
In-Reply-To: <20131001105837.GA1424@minipsycho.brq.redhat.com>

On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> >> 
> >> What if non-ufo-path-created skb is peeked?
> >
> >You mean like:
> >first append a frame < mtu
> >second frame > mtu so it gets handles by ip6_ufo_append?
> >
> >Currently I don't see a problem with it but I may be wrong. What is
> >your suspicion?
> 
> Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> Because of that it will be not threated as ufo skb.
> 
> Following patch fixes it:
> 
> Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> 
> Now, if user application does:
> sendto len<mtu flag MSG_MORE
> sendto len>mtu flag 0
> The skb is not treated as fragmented one because it is not initialized
> that way. So move the initialization to fix this.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

My understanding is that the frame is only a valid gso frame iff the first skb
queued up is setup as an UFO frame. In other cases we only need to make sure
we append to the fragment list without updating the gso field and the skb will
get linearized (if needed) later in the output path.

This seems not to matter for virtio_net and loopback because we seem to pass
the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
neterion and we have to verify if this assumption is correct, there.

This is also the way the IPv4 stack deals with UFO.

Either way, this needs a look from Eric Dumazet and Herbert Xu, so I added
them in Cc.

Greetings,

  Hannes

^ permalink raw reply

* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Toshiaki Makita @ 2013-10-01 11:56 UTC (permalink / raw)
  To: vyasevic
  Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
	Patrick McHardy
In-Reply-To: <5249A03E.4080208@redhat.com>

On Mon, 2013-09-30 at 12:01 -0400, Vlad Yasevich wrote:
> On 09/30/2013 07:46 AM, Toshiaki Makita wrote:
> > On Fri, 2013-09-27 at 14:10 -0400, Vlad Yasevich wrote:
> >> On 09/27/2013 01:11 PM, Toshiaki Makita wrote:
> >>> On Thu, 2013-09-26 at 10:22 -0400, Vlad Yasevich wrote:
> >>>> On 09/26/2013 06:38 AM, Toshiaki Makita wrote:
> >>>>> On Tue, 2013-09-24 at 13:55 -0400, Vlad Yasevich wrote:
> >>>>>> On 09/24/2013 01:30 PM, Toshiaki Makita wrote:
> >>>>>>> On Tue, 2013-09-24 at 09:35 -0400, Vlad Yasevich wrote:
> >>>>>>>> On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
> >>>>>>>>> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich
> >>>>>>>>> wrote:
> >>>>>>>>>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
> >>>>>>>>>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
> >>>>>>>>>>>>> On Thu, 2013-09-12 at 16:00 -0400, David
> >>>>>>>>>>>>> Miller wrote:
> >>>>>>>>>>>>>> From: Toshiaki Makita
> >>>>>>>>>>>>>> <makita.toshiaki@lab.ntt.co.jp> Date: Tue,
> >>>>>>>>>>>>>> 10 Sep 2013 19:27:54 +0900
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> There seem to be some undesirable
> >>>>>>>>>>>>>>> behaviors related with PVID. 1. It has no
> >>>>>>>>>>>>>>> effect assigning PVID to a port. PVID
> >>>>>>>>>>>>>>> cannot be applied to any frame regardless
> >>>>>>>>>>>>>>> of whether we set it or not. 2. FDB
> >>>>>>>>>>>>>>> entries learned via frames applied PVID
> >>>>>>>>>>>>>>> are registered with VID 0 rather than VID
> >>>>>>>>>>>>>>> value of PVID. 3. We can set 0 or 4095 as
> >>>>>>>>>>>>>>> a PVID that are not allowed in IEEE
> >>>>>>>>>>>>>>> 802.1Q. This leads interoperational
> >>>>>>>>>>>>>>> problems such as sending frames with VID
> >>>>>>>>>>>>>>> 4095, which is not allowed in IEEE
> >>>>>>>>>>>>>>> 802.1Q, and treating frames with VID 0 as
> >>>>>>>>>>>>>>> they belong to VLAN 0, which is expected
> >>>>>>>>>>>>>>> to be handled as they have no VID
> >>>>>>>>>>>>>>> according to IEEE 802.1Q.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Note: 2nd and 3rd problems are potential
> >>>>>>>>>>>>>>> and not exposed unless 1st problem is
> >>>>>>>>>>>>>>> fixed, because we cannot activate PVID
> >>>>>>>>>>>>>>> due to it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Please work out the issues in patch #2 with
> >>>>>>>>>>>>>> Vlad and resubmit this series.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thank you.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm hovering between whether we should fix
> >>>>>>>>>>>>> the issue by changing vlan 0 interface
> >>>>>>>>>>>>> behavior in 8021q module or enabling a bridge
> >>>>>>>>>>>>> port to sending priority-tagged frames, or
> >>>>>>>>>>>>> another better way.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If you could comment it, I'd appreciate it
> >>>>>>>>>>>>> :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> BTW, I think what is discussed in patch #2 is
> >>>>>>>>>>>>> another problem about handling priority-tags,
> >>>>>>>>>>>>> and it exists without this patch set
> >>>>>>>>>>>>> applied. It looks like that we should prepare
> >>>>>>>>>>>>> another patch set than this to fix that
> >>>>>>>>>>>>> problem.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Should I include patches that fix the
> >>>>>>>>>>>>> priority-tags problem in this patch set and
> >>>>>>>>>>>>> resubmit them all together?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am thinking that we might need to do it in
> >>>>>>>>>>>> bridge and it looks like the simplest way to do
> >>>>>>>>>>>> it is to have default priority regeneration
> >>>>>>>>>>>> table (table 6-5 from 802.1Q doc).
> >>>>>>>>>>>>
> >>>>>>>>>>>> That way I think we would conform to the spec.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -vlad
> >>>>>>>>>>>
> >>>>>>>>>>> Unfortunately I don't think the default priority
> >>>>>>>>>>> regeneration table resolves the problem because
> >>>>>>>>>>> IEEE 802.1Q says that a VLAN-aware bridge can
> >>>>>>>>>>> transmit untagged or VLAN-tagged frames only (the
> >>>>>>>>>>> end of section 7.5 and 8.1.7).
> >>>>>>>>>>>
> >>>>>>>>>>> No mechanism to send priority-tagged frames is
> >>>>>>>>>>> found as far as I can see the standard. I think
> >>>>>>>>>>> the regenerated priority is used for outgoing
> >>>>>>>>>>> PCP field only if egress policy is not untagged
> >>>>>>>>>>> (i.e. transmitting as VLAN-tagged), and unused if
> >>>>>>>>>>> untagged (Section 6.9.2 3rd/4th Paragraph).
> >>>>>>>>>>>
> >>>>>>>>>>> If we want to transmit priority-tagged frames
> >>>>>>>>>>> from a bridge port, I think we need to implement
> >>>>>>>>>>> a new (optional) feature that is above the
> >>>>>>>>>>> standard, as I stated previously.
> >>>>>>>>>>>
> >>>>>>>>>>> How do you feel about adding a per-port policy
> >>>>>>>>>>> that enables a bridge to send priority-tagged
> >>>>>>>>>>> frames instead of untagged frames when egress
> >>>>>>>>>>> policy for the port is untagged? With this
> >>>>>>>>>>> change, we can transmit frames for a given vlan
> >>>>>>>>>>> as either all untagged, all priority-tagged or
> >>>>>>>>>>> all VLAN-tagged.
> >>>>>>>>>>
> >>>>>>>>>> That would work.  What I am thinking is that we do
> >>>>>>>>>> it by special casing the vid 0 egress policy
> >>>>>>>>>> specification.  Let it be untagged by default and
> >>>>>>>>>> if it is tagged, then we preserve the priority
> >>>>>>>>>> field and forward it on.
> >>>>>>>>>>
> >>>>>>>>>> This keeps the API stable and doesn't require
> >>>>>>>>>> user/admin from knowing exactly what happens.
> >>>>>>>>>> Default operation conforms to the spec and allows
> >>>>>>>>>> simple change to make it backward-compatible.
> >>>>>>>>>>
> >>>>>>>>>> What do you think.  I've done a simple prototype of
> >>>>>>>>>> this an it seems to work with the VMs I am testing
> >>>>>>>>>> with.
> >>>>>>>>>
> >>>>>>>>> Are you saying that - by default, set the 0th bit of
> >>>>>>>>> untagged_bitmap; and - if we unset the 0th bit and
> >>>>>>>>> set the "vid"th bit, we transmit frames classified as
> >>>>>>>>> belonging to VLAN "vid" as priority-tagged?
> >>>>>>>>>
> >>>>>>>>> If so, though it's attractive to keep current API,
> >>>>>>>>> I'm worried about if it could be a bit confusing and
> >>>>>>>>> not intuitive for kernel/iproute2 developers that VID
> >>>>>>>>> 0 has a special meaning only in the egress policy.
> >>>>>>>>> Wouldn't it be better to adding a new member to
> >>>>>>>>> struct net_port_vlans instead of using VID 0 of
> >>>>>>>>> untagged_bitmap?
> >>>>>>>>>
> >>>>>>>>> Or are you saying that we use a new flag in struct
> >>>>>>>>> net_port_vlans but use the BRIDGE_VLAN_INFO_UNTAGGED
> >>>>>>>>> bit with VID 0 in netlink to set the flag?
> >>>>>>>>>
> >>>>>>>>> Even in that case, I'm afraid that it might be
> >>>>>>>>> confusing for developers for the same reason. We are
> >>>>>>>>> going to prohibit to specify VID with 0 (and 4095) in
> >>>>>>>>> adding/deleting a FDB entry or a vlan filtering
> >>>>>>>>> entry, but it would allow us to use VID 0 only when a
> >>>>>>>>> vlan filtering entry is configured. I am thinking a
> >>>>>>>>> new nlattr is a straightforward approach to
> >>>>>>>>> configure it.
> >>>>>>>>
> >>>>>>>> By making this an explicit attribute it makes vid 0 a
> >>>>>>>> special case for any automatic tool that would
> >>>>>>>> provision such filtering.  Seeing vid 0 would mean that
> >>>>>>>> these tools would have to know that this would have to
> >>>>>>>> be translated to a different attribute instead of
> >>>>>>>> setting the policy values.
> >>>>>>>
> >>>>>>> Yes, I agree with you that we can do it by the way you
> >>>>>>> explained. What I don't understand is the advantage of
> >>>>>>> using vid 0 over another way such as adding a new
> >>>>>>> nlattr. I think we can indicate transmitting
> >>>>>>> priority-tags explicitly by such a nlattr. Using vid 0
> >>>>>>> seems to be easier to implement than a new nlattr, but,
> >>>>>>> for me, it looks less intuitive and more difficult to
> >>>>>>> maintain because we have to care about vid 0 instead of
> >>>>>>> simply ignoring it.
> >>>>>>>
> >>>>>>
> >>>>>> The point I am trying to make is that regardless of the
> >>>>>> approach someone has to know what to do when enabling
> >>>>>> priority tagged frames.  You proposal would require the
> >>>>>> administrator or config tool to have that knowledge.
> >>>>>> Example is: Admin does: bridge vlan set priority on dev
> >>>>>> eth0 Automated app: if (vid == 0) /* Turn on priority
> >>>>>> tagged frame support */
> >>>>>>
> >>>>>> My proposal would require the bridge filtering
> >>>>>> implementation to have it. user tool: bridge vlan add vid 0
> >>>>>> tagged Automated app:  No special case.
> >>>>>>
> >>>>>> IMO its better to have 1 piece code handling the special
> >>>>>> case then putting it multiple places.
> >>>>>
> >>>>> Thank you for the detailed explanation. Now I understand your
> >>>>> intention.
> >>>>>
> >>>>> I have one question about your proposal. I guess the way to
> >>>>> enable priority-tagged is something like bridge vlan add vid
> >>>>> 10 dev eth0 bridge vlan add vid 10 dev vnet0 pvid untagged
> >>>>> bridge vlan add vid 0 dev vnet0 tagged where vnet0 has sub
> >>>>> interface vnet0.0.
> >>>>>
> >>>>> Here the admin have to know the egress policy is applied to a
> >>>>> frame twice in a certain order when it is transmitted from
> >>>>> the port vnet0 attached, that is, first, a frame with vid 10
> >>>>> get untagged, and then, an untagged frame get
> >>>>> priority-tagged.
> >>>>>
> >>>>> This behavior looks difficult to know without previous
> >>>>> knowledge. Any good idea to avoid such a need for the admin's
> >>>>> additional knowledge?
> >>>>
> >>>> To me, the fact that there is vnet0.0 (or typically, there is
> >>>> eth0.0 in the guest or on the remote host) already tells the
> >>>> admin vlan 0 has to be tagged.  The fact that we codify this in
> >>>> the policy makes it explicit.
> >>>
> >>> My worry is that the admin might not be able to guess how to use
> >>> bridge commands to enable priority-tag without any additional
> >>> hint in "man bridge", "bridge vlan help", etc. I actually
> >>> couldn't hit upon such a usage before seeing example commands you
> >>> gave, because I had never think the egress policy could be
> >>> applied twice.
> >>>
> >>>>
> >>>> However, I can see strong argument to be made for an addition
> >>>> egress policy attribute that could be for instance:
> >>>>
> >>>> bridge vlan add vid 10 dev eth0 pvid bridge vlan add vid 10 dev
> >>>> vnet0 pvid untagged prio_tag
> >>>>
> >>>> But this has the same connotations as wrt to egress policy.
> >>>> The 2 policies are applied: (1) untag the frame. (2) add
> >>>> priority_tag.
> >>>>
> >>>> (2) only happens if initial fame received on eth0 was priority
> >>>> tagged.
> >>>
> >>> If we do so, we will not be able to communicate using vlan 0
> >>> interface under a certain circumstance. Eth0 can be receive mixed
> >>> untagged and priority-tagged frames according to the network
> >>> element it is connected to: for example, Open vSwitch can send
> >>> such two kinds of frames from the same port even if original
> >>> incoming frames belong to the same vlan.
> >>
> >> Which priority would you assign to the frame that was received
> >> untagged?
> >
> > Untagged frame's priority is by default 0, so I think 0 is likely.
> >
> > 802.1Q 6.9.1 i) The received priority value and the drop_eligible
> > parameter value are the values in the M_UNITDATA.indication.
> >
> > M_UNITDATA is passed from ISS.
> >
> > 802.1Q 6.7.1 The priority parameter provided in a data indication
> > primitive shall take the value of the Default User Priority parameter
> > for the Port through which the MAC frame was received. The default
> > value of this parameter is 0, it may be set by management in which
> > case the capability to set it to any of the values 0 through 7 shall
> > be provided.
> >
> >>
> >>> In this situation, we can only receive frames that is
> >>> priority-tagged when received on eth0.
> >>
> >> Not sure I understand.  Let's look at this config: bridge vlan add
> >> vid 10 dev eth0 pvid bridge vlan add vid 10 dev vnet0 pvid untagged
> >> prio_tag
> >>
> >> Here, eth0 is allowed to receive vid 10 tagged, untagged, and
> >> prio_tagged (if we look at the patch 2 from this set). Now, frame
> >> is forwarded to vnet0. 1) if the frame had vid 10 in the tag or was
> >> untagged, it should probably be sent untagged. 2) if the frame had
> >> a priority tag, it should probably be sent as such.
> >>
> >> Now, I think a case could be made that if the frame had any
> >> priority markings in the vlan header, we should try to preserve
> >> those markings if prio_tag is turned on.  We can assume value of 0
> >> means not set.
> >
> > If we don't insert prio_tag when PCP is 0, we might receive mixed
> > priority-tagged and untagged frames on eth0.
> 
> Right, and that's what you were trying to handle in your patch:
> 
> > +		/* PVID is set on this port.  Any untagged or priority-tagged +
> > * ingress frame is considered to belong to this vlan. */
> 
> So, in this case we are prepared to handle the "mixed" scenario on ingress.
> 
> > Even if we are sending frames from eth0.0 with some priority other
> > than 0, we could receive frames with priority 0 or untagged on the
> > other side of the bridge.
> > For example, if we receive untagged arp reply on the bridge port, we
> > migit not be able to communicate with such an end station, because
> > untagged reply will not be passed to eth0.0.
> 
> So the ARP request was sent tagged, but the reply came back untagged?

Yes, it can happen.
These are problematic cases.

Example 1:
            prio_tagged         prio_tagged
+------------+ ---> +------------+ ---> +----------+
|guest eth0.0|------|host1 Bridge|------|host2 eth0|
+------------+ <--- +------------+ <--- +----------+
             untagged            untagged

Note: Host2 eth0, which is an interface on Linux, can receive
priority-tagged frames if it doesn't have vlan 0 interface (eth0.0).


Example 2:

            prio_tagged         prio_tagged       untagged
+------------+ ---> +------------+ ---> +---------+ ---> +-----+
|guest eth0.0|------|host1 Bridge|------|HW switch|------|host2|
+------------+ <--- +------------+ <--- +---------+ <--- +-----+
             untagged            untagged       prio/untagged

Note: 802.1Q conformed HW switch transmits only untagged or VLAN-tagged
frames.

> 
> How does that work when the end station is attached directly to the
> HW switch instead of a linux bridge?
> 
> The station configures eth0.0 and sends priority-tagged traffic to
> the HW switch.  If the HW switch sends back untagged traffic, then
> the untagged traffic will never reach eth0.0.

Currently we cannot communicate using eth0.0 via directly connected
802.1Q conformed switch, because we never receive priority-tagged frames
from the switch.
It is not a problem of Linux bridge and is why I wondered whether it
should be fixed by bridge or vlan 0 interface.

> 
> >
> >>
> >>> IMO, if prio_tag is configured, the bridge should send any
> >>> untagged frame as priority-tagged regardless of whatever it is on
> >>> eth0.
> >>
> >> Which priority would you use, 0?  You are not guaranteed to
> >> properly deliver the traffic then for a configuration such as: VM:
> >> eth0: 10.0.0.1/24 eth0.0: 10.0.1.1/24
> >
> > I'd like to use priority 0 for untagged frames.
> >
> > I am assuming that one of our goals is at least that eth0.0 comes to
> > be able to communicate with another end station. It seems to be hard
> > to use both eth0 and eth0.0 simultaneously.
> 
> I understand, but I don't agree that we should always tag.
> 
> Consider config:
> 
>     hw switch <---> (eth0: Linux Bridge: eth1) <--- (em1.0:end station)
> 
> If the end station sends priority-tagged traffic it should receive
> priority tagged traffic back.  Otherwise, untagged traffic may be 
> dropped by the end station.  This is true whether it is connected to
> the hw switch or Linux bridge.

Though such a behavior is generally not necessary as far as I can read
802.1Q spec, it is essential for vlan 0 interface on Linux, I think.
My proposal aims to resolve it at least when we use Linux bridge.

Example configuration:
	bridge vlan add vid 10 dev eth1 pvid untagged
	bridge vlan add vid 10 dev eth0
	bridge vlan set prio_tag on dev eth1

Intended behavior:

        VID10-tagged                     prio_tagged
+---------+ <--- +------------------------+ <--- +-----------------+
|hw switch|------|eth0: Linux Bridge: eth1|------|em1.0:end station|
+---------+ ---> +------------------------+ ---> +-----------------+
        VID10-tagged                     prio_tagged
                              (always if egress policy untagged)

Thanks,

Toshiaki Makita

> 
> -vlad
> 
> >
> > Thanks,
> >
> > Toshiaki Makita
> >
> >>
> >> -vlad
> >>
> >>>
> >>>>
> >>>> I think I am ok with either approach.  Explicit vid 0 policy is
> >>>> easier for automatic provisioning.   The flag based one is
> >>>> easier for admin/ manual provisioning.
> >>>
> >>> Supposing we have to add something to help or man in any case, I
> >>> think flag based is better. The format below seems to suitable
> >>> for a per-port policy. bridge vlan set prio_tag on dev vnet0
> >>>
> >>> Thanks,
> >>>
> >>> Toshiaki Makita
> >>>
> >>>>
> >>>> -vlad.
> >>>>
> >>>> -vlad
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks -vlad
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Toshiaki Makita
> >>>>>>>
> >>>>>>>>
> >>>>>>>> How it is implemented internally in the kernel isn't as
> >>>>>>>> big of an issue. We can do it as a separate flag or as
> >>>>>>>> part of existing policy.
> >>>>>>>>
> >>>>>>>> -vlad
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Toshiaki Makita
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -vlad
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>>
> >>>>>>>>>>> Toshiaki Makita
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Toshiaki Makita
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> -- To unsubscribe from this list: send the
> >>>>>>>>>>>>>> line "unsubscribe netdev" in the body of a
> >>>>>>>>>>>>>> message to majordomo@vger.kernel.org More
> >>>>>>>>>>>>>> majordomo info at
> >>>>>>>>>>>>>> http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -- To unsubscribe from this list: send the line
> >>>>>>>>>>> "unsubscribe netdev" in the body of a message to
> >>>>>>>>>>> majordomo@vger.kernel.org More majordomo info at
> >>>>>>>>>>> http://vger.kernel.org/majordomo-info.html
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >
> >

^ permalink raw reply

* [net-next  9/9] igb: Add ethtool support to configure number of channels
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Laura Mihaela Vasilescu, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Laura Mihaela Vasilescu <laura.vasilescu@rosedu.org>

This patch adds the ethtool callbacks necessary to configure the
number of RSS queues.

The maximum number of queues is in accordance with the datasheets.

Signed-off-by: Laura Mihaela Vasilescu <laura.vasilescu@rosedu.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h         |  1 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 84 ++++++++++++++++++++++++++++
 drivers/net/ethernet/intel/igb/igb_main.c    | 22 ++++++++
 3 files changed, 107 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index cdaa2bc..5e9ed89 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -487,6 +487,7 @@ int igb_up(struct igb_adapter *);
 void igb_down(struct igb_adapter *);
 void igb_reinit_locked(struct igb_adapter *);
 void igb_reset(struct igb_adapter *);
+int igb_reinit_queues(struct igb_adapter *);
 void igb_write_rss_indir_tbl(struct igb_adapter *);
 int igb_set_spd_dplx(struct igb_adapter *, u32, u8);
 int igb_setup_tx_resources(struct igb_ring *);
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index e4c77c0..c8f65e5 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -2874,6 +2874,88 @@ static int igb_set_rxfh_indir(struct net_device *netdev, const u32 *indir)
 	return 0;
 }
 
+static unsigned int igb_max_channels(struct igb_adapter *adapter)
+{
+	struct e1000_hw *hw = &adapter->hw;
+	unsigned int max_combined = 0;
+
+	switch (hw->mac.type) {
+	case e1000_i211:
+		max_combined = IGB_MAX_RX_QUEUES_I211;
+		break;
+	case e1000_82575:
+	case e1000_i210:
+		max_combined = IGB_MAX_RX_QUEUES_82575;
+		break;
+	case e1000_i350:
+		if (!!adapter->vfs_allocated_count) {
+			max_combined = 1;
+			break;
+		}
+		/* fall through */
+	case e1000_82576:
+		if (!!adapter->vfs_allocated_count) {
+			max_combined = 2;
+			break;
+		}
+		/* fall through */
+	case e1000_82580:
+	case e1000_i354:
+	default:
+		max_combined = IGB_MAX_RX_QUEUES;
+		break;
+	}
+
+	return max_combined;
+}
+
+static void igb_get_channels(struct net_device *netdev,
+			     struct ethtool_channels *ch)
+{
+	struct igb_adapter *adapter = netdev_priv(netdev);
+
+	/* Report maximum channels */
+	ch->max_combined = igb_max_channels(adapter);
+
+	/* Report info for other vector */
+	if (adapter->msix_entries) {
+		ch->max_other = NON_Q_VECTORS;
+		ch->other_count = NON_Q_VECTORS;
+	}
+
+	ch->combined_count = adapter->rss_queues;
+}
+
+static int igb_set_channels(struct net_device *netdev,
+			    struct ethtool_channels *ch)
+{
+	struct igb_adapter *adapter = netdev_priv(netdev);
+	unsigned int count = ch->combined_count;
+
+	/* Verify they are not requesting separate vectors */
+	if (!count || ch->rx_count || ch->tx_count)
+		return -EINVAL;
+
+	/* Verify other_count is valid and has not been changed */
+	if (ch->other_count != NON_Q_VECTORS)
+		return -EINVAL;
+
+	/* Verify the number of channels doesn't exceed hw limits */
+	if (count > igb_max_channels(adapter))
+		return -EINVAL;
+
+	if (count != adapter->rss_queues) {
+		adapter->rss_queues = count;
+
+		/* Hardware has to reinitialize queues and interrupts to
+		 * match the new configuration.
+		 */
+		return igb_reinit_queues(adapter);
+	}
+
+	return 0;
+}
+
 static const struct ethtool_ops igb_ethtool_ops = {
 	.get_settings		= igb_get_settings,
 	.set_settings		= igb_set_settings,
@@ -2910,6 +2992,8 @@ static const struct ethtool_ops igb_ethtool_ops = {
 	.get_rxfh_indir_size	= igb_get_rxfh_indir_size,
 	.get_rxfh_indir		= igb_get_rxfh_indir,
 	.set_rxfh_indir		= igb_set_rxfh_indir,
+	.get_channels		= igb_get_channels,
+	.set_channels		= igb_set_channels,
 	.begin			= igb_ethtool_begin,
 	.complete		= igb_ethtool_complete,
 };
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 8cf44f2..a56266e 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7838,4 +7838,26 @@ s32 igb_write_i2c_byte(struct e1000_hw *hw, u8 byte_offset,
 		return E1000_SUCCESS;
 
 }
+
+int igb_reinit_queues(struct igb_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	struct pci_dev *pdev = adapter->pdev;
+	int err = 0;
+
+	if (netif_running(netdev))
+		igb_close(netdev);
+
+	igb_clear_interrupt_scheme(adapter);
+
+	if (igb_init_interrupt_scheme(adapter, true)) {
+		dev_err(&pdev->dev, "Unable to allocate memory for queues\n");
+		return -ENOMEM;
+	}
+
+	if (netif_running(netdev))
+		err = igb_open(netdev);
+
+	return err;
+}
 /* igb_main.c */
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  8/9] igb: Add ethtool offline tests for i354
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Fujinaka, Todd, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: "Fujinaka, Todd" <todd.fujinaka@intel.com>

Add the ethtool offline tests for i354 devices.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_ethtool.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 48cbc83..e4c77c0 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1656,7 +1656,8 @@ static int igb_setup_loopback_test(struct igb_adapter *adapter)
 		if ((hw->device_id == E1000_DEV_ID_DH89XXCC_SGMII) ||
 		(hw->device_id == E1000_DEV_ID_DH89XXCC_SERDES) ||
 		(hw->device_id == E1000_DEV_ID_DH89XXCC_BACKPLANE) ||
-		(hw->device_id == E1000_DEV_ID_DH89XXCC_SFP)) {
+		(hw->device_id == E1000_DEV_ID_DH89XXCC_SFP) ||
+		(hw->device_id == E1000_DEV_ID_I354_SGMII)) {
 
 			/* Enable DH89xxCC MPHY for near end loopback */
 			reg = rd32(E1000_MPHY_ADDR_CTL);
@@ -1722,7 +1723,8 @@ static void igb_loopback_cleanup(struct igb_adapter *adapter)
 	if ((hw->device_id == E1000_DEV_ID_DH89XXCC_SGMII) ||
 	(hw->device_id == E1000_DEV_ID_DH89XXCC_SERDES) ||
 	(hw->device_id == E1000_DEV_ID_DH89XXCC_BACKPLANE) ||
-	(hw->device_id == E1000_DEV_ID_DH89XXCC_SFP)) {
+	(hw->device_id == E1000_DEV_ID_DH89XXCC_SFP) ||
+	(hw->device_id == E1000_DEV_ID_I354_SGMII)) {
 		u32 reg;
 
 		/* Disable near end loopback on DH89xxCC */
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  7/9] ixgbe: remove marketing names from busy poll code
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

This patch renames the LL_EXTENDED_STATS and some of the functions required to
implement busy polling in the ixgbe driver, in order to remove the marketing
"low latency" blurb which hides what the code actually does.

This furthers work which was requested by Linus Torvalds when the initial busy
poll code was included in the kernel. The code in the ixgbe driver itself was
never properly renamed to reflect the change to busy polling as the title.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         | 14 ++++++------
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 28 ++++++++++++------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  4 ++--
 3 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 3637841..dc1588e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -55,7 +55,7 @@
 #include <net/busy_poll.h>
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
-#define LL_EXTENDED_STATS
+#define BP_EXTENDED_STATS
 #endif
 /* common prefix used by pr_<> macros */
 #undef pr_fmt
@@ -187,11 +187,11 @@ struct ixgbe_rx_buffer {
 struct ixgbe_queue_stats {
 	u64 packets;
 	u64 bytes;
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 	u64 yields;
 	u64 misses;
 	u64 cleaned;
-#endif  /* LL_EXTENDED_STATS */
+#endif  /* BP_EXTENDED_STATS */
 };
 
 struct ixgbe_tx_queue_stats {
@@ -399,7 +399,7 @@ static inline bool ixgbe_qv_lock_napi(struct ixgbe_q_vector *q_vector)
 		WARN_ON(q_vector->state & IXGBE_QV_STATE_NAPI);
 		q_vector->state |= IXGBE_QV_STATE_NAPI_YIELD;
 		rc = false;
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 		q_vector->tx.ring->stats.yields++;
 #endif
 	} else
@@ -432,7 +432,7 @@ static inline bool ixgbe_qv_lock_poll(struct ixgbe_q_vector *q_vector)
 	if ((q_vector->state & IXGBE_QV_LOCKED)) {
 		q_vector->state |= IXGBE_QV_STATE_POLL_YIELD;
 		rc = false;
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 		q_vector->rx.ring->stats.yields++;
 #endif
 	} else
@@ -457,7 +457,7 @@ static inline bool ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
 }
 
 /* true if a socket is polling, even if it did not get the lock */
-static inline bool ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
+static inline bool ixgbe_qv_busy_polling(struct ixgbe_q_vector *q_vector)
 {
 	WARN_ON(!(q_vector->state & IXGBE_QV_LOCKED));
 	return q_vector->state & IXGBE_QV_USER_PEND;
@@ -487,7 +487,7 @@ static inline bool ixgbe_qv_unlock_poll(struct ixgbe_q_vector *q_vector)
 	return false;
 }
 
-static inline bool ixgbe_qv_ll_polling(struct ixgbe_q_vector *q_vector)
+static inline bool ixgbe_qv_busy_polling(struct ixgbe_q_vector *q_vector)
 {
 	return false;
 }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 27c2032..90aac31 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1117,7 +1117,7 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 			data[i] = 0;
 			data[i+1] = 0;
 			data[i+2] = 0;
@@ -1132,7 +1132,7 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 		data[i] = ring->stats.yields;
 		data[i+1] = ring->stats.misses;
 		data[i+2] = ring->stats.cleaned;
@@ -1145,7 +1145,7 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i] = 0;
 			data[i+1] = 0;
 			i += 2;
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 			data[i] = 0;
 			data[i+1] = 0;
 			data[i+2] = 0;
@@ -1160,7 +1160,7 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
 			data[i+1] = ring->stats.bytes;
 		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
 		i += 2;
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 		data[i] = ring->stats.yields;
 		data[i+1] = ring->stats.misses;
 		data[i+2] = ring->stats.cleaned;
@@ -1202,28 +1202,28 @@ static void ixgbe_get_strings(struct net_device *netdev, u32 stringset,
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "tx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
-#ifdef LL_EXTENDED_STATS
-			sprintf(p, "tx_queue_%u_ll_napi_yield", i);
+#ifdef BP_EXTENDED_STATS
+			sprintf(p, "tx_queue_%u_bp_napi_yield", i);
 			p += ETH_GSTRING_LEN;
-			sprintf(p, "tx_queue_%u_ll_misses", i);
+			sprintf(p, "tx_queue_%u_bp_misses", i);
 			p += ETH_GSTRING_LEN;
-			sprintf(p, "tx_queue_%u_ll_cleaned", i);
+			sprintf(p, "tx_queue_%u_bp_cleaned", i);
 			p += ETH_GSTRING_LEN;
-#endif /* LL_EXTENDED_STATS */
+#endif /* BP_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_NUM_RX_QUEUES; i++) {
 			sprintf(p, "rx_queue_%u_packets", i);
 			p += ETH_GSTRING_LEN;
 			sprintf(p, "rx_queue_%u_bytes", i);
 			p += ETH_GSTRING_LEN;
-#ifdef LL_EXTENDED_STATS
-			sprintf(p, "rx_queue_%u_ll_poll_yield", i);
+#ifdef BP_EXTENDED_STATS
+			sprintf(p, "rx_queue_%u_bp_poll_yield", i);
 			p += ETH_GSTRING_LEN;
-			sprintf(p, "rx_queue_%u_ll_misses", i);
+			sprintf(p, "rx_queue_%u_bp_misses", i);
 			p += ETH_GSTRING_LEN;
-			sprintf(p, "rx_queue_%u_ll_cleaned", i);
+			sprintf(p, "rx_queue_%u_bp_cleaned", i);
 			p += ETH_GSTRING_LEN;
-#endif /* LL_EXTENDED_STATS */
+#endif /* BP_EXTENDED_STATS */
 		}
 		for (i = 0; i < IXGBE_MAX_PACKET_BUFFERS; i++) {
 			sprintf(p, "tx_pb_%u_pxon", i);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 0ade0cd..43b777a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1585,7 +1585,7 @@ static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 
-	if (ixgbe_qv_ll_polling(q_vector))
+	if (ixgbe_qv_busy_polling(q_vector))
 		netif_receive_skb(skb);
 	else if (!(adapter->flags & IXGBE_FLAG_IN_NETPOLL))
 		napi_gro_receive(&q_vector->napi, skb);
@@ -2097,7 +2097,7 @@ static int ixgbe_low_latency_recv(struct napi_struct *napi)
 
 	ixgbe_for_each_ring(ring, q_vector->rx) {
 		found = ixgbe_clean_rx_irq(q_vector, ring, 4);
-#ifdef LL_EXTENDED_STATS
+#ifdef BP_EXTENDED_STATS
 		if (found)
 			ring->stats.cleaned += found;
 		else
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  6/9] ixgbe: Cleanup the use of tabs and spaces
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

Cleans up the whitespace issues noticed during code review where
a mix of tabs and spaces were used for indentation.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h  | 40 +++++++++++++--------------
 drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c | 12 ++++----
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h
index 24af12e..aae900a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h
@@ -57,28 +57,28 @@
 #define IXGBE_SFF_QSFP_DEVICE_TECH	0x93
 
 /* Bitmasks */
-#define IXGBE_SFF_DA_PASSIVE_CABLE           0x4
-#define IXGBE_SFF_DA_ACTIVE_CABLE            0x8
-#define IXGBE_SFF_DA_SPEC_ACTIVE_LIMITING    0x4
-#define IXGBE_SFF_1GBASESX_CAPABLE           0x1
-#define IXGBE_SFF_1GBASELX_CAPABLE           0x2
-#define IXGBE_SFF_1GBASET_CAPABLE            0x8
-#define IXGBE_SFF_10GBASESR_CAPABLE          0x10
-#define IXGBE_SFF_10GBASELR_CAPABLE          0x20
-#define IXGBE_SFF_SOFT_RS_SELECT_MASK	0x8
-#define IXGBE_SFF_SOFT_RS_SELECT_10G	0x8
-#define IXGBE_SFF_SOFT_RS_SELECT_1G	0x0
-#define IXGBE_SFF_ADDRESSING_MODE	     0x4
-#define IXGBE_SFF_QSFP_DA_ACTIVE_CABLE       0x1
-#define IXGBE_SFF_QSFP_DA_PASSIVE_CABLE      0x8
+#define IXGBE_SFF_DA_PASSIVE_CABLE		0x4
+#define IXGBE_SFF_DA_ACTIVE_CABLE		0x8
+#define IXGBE_SFF_DA_SPEC_ACTIVE_LIMITING	0x4
+#define IXGBE_SFF_1GBASESX_CAPABLE		0x1
+#define IXGBE_SFF_1GBASELX_CAPABLE		0x2
+#define IXGBE_SFF_1GBASET_CAPABLE		0x8
+#define IXGBE_SFF_10GBASESR_CAPABLE		0x10
+#define IXGBE_SFF_10GBASELR_CAPABLE		0x20
+#define IXGBE_SFF_SOFT_RS_SELECT_MASK		0x8
+#define IXGBE_SFF_SOFT_RS_SELECT_10G		0x8
+#define IXGBE_SFF_SOFT_RS_SELECT_1G		0x0
+#define IXGBE_SFF_ADDRESSING_MODE		0x4
+#define IXGBE_SFF_QSFP_DA_ACTIVE_CABLE		0x1
+#define IXGBE_SFF_QSFP_DA_PASSIVE_CABLE		0x8
 #define IXGBE_SFF_QSFP_CONNECTOR_NOT_SEPARABLE	0x23
 #define IXGBE_SFF_QSFP_TRANSMITER_850NM_VCSEL	0x0
-#define IXGBE_I2C_EEPROM_READ_MASK           0x100
-#define IXGBE_I2C_EEPROM_STATUS_MASK         0x3
-#define IXGBE_I2C_EEPROM_STATUS_NO_OPERATION 0x0
-#define IXGBE_I2C_EEPROM_STATUS_PASS         0x1
-#define IXGBE_I2C_EEPROM_STATUS_FAIL         0x2
-#define IXGBE_I2C_EEPROM_STATUS_IN_PROGRESS  0x3
+#define IXGBE_I2C_EEPROM_READ_MASK		0x100
+#define IXGBE_I2C_EEPROM_STATUS_MASK		0x3
+#define IXGBE_I2C_EEPROM_STATUS_NO_OPERATION	0x0
+#define IXGBE_I2C_EEPROM_STATUS_PASS		0x1
+#define IXGBE_I2C_EEPROM_STATUS_FAIL		0x2
+#define IXGBE_I2C_EEPROM_STATUS_IN_PROGRESS	0x3
 
 /* Flow control defines */
 #define IXGBE_TAF_SYM_PAUSE                  0x400
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
index 389324f..24b80a6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c
@@ -32,12 +32,12 @@
 #include "ixgbe.h"
 #include "ixgbe_phy.h"
 
-#define IXGBE_X540_MAX_TX_QUEUES 128
-#define IXGBE_X540_MAX_RX_QUEUES 128
-#define IXGBE_X540_RAR_ENTRIES   128
-#define IXGBE_X540_MC_TBL_SIZE   128
-#define IXGBE_X540_VFT_TBL_SIZE  128
-#define IXGBE_X540_RX_PB_SIZE	 384
+#define IXGBE_X540_MAX_TX_QUEUES	128
+#define IXGBE_X540_MAX_RX_QUEUES	128
+#define IXGBE_X540_RAR_ENTRIES		128
+#define IXGBE_X540_MC_TBL_SIZE		128
+#define IXGBE_X540_VFT_TBL_SIZE		128
+#define IXGBE_X540_RX_PB_SIZE		384
 
 static s32 ixgbe_update_flash_X540(struct ixgbe_hw *hw);
 static s32 ixgbe_poll_flash_update_done_X540(struct ixgbe_hw *hw);
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  5/9] ixgbe: ethtool DCB registers dump for 82599 and x540
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem
  Cc: Leonardo Potenza, netdev, gospo, sassmann, Maryam Tahhan,
	Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Leonardo Potenza <leonardo.potenza@intel.com>

Added support for DCB registers dump using ethtool -d option both for
82599 and x540 ethernet controllers

Signed-off-by: Leonardo Potenza <leonardo.potenza@intel.com>
Signed-off-by: Maryam Tahhan <maryam.tahhan@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Jack Morgan <jack.morgan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 75 +++++++++++++++++++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h    |  5 ++
 2 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index e8649ab..27c2032 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -442,7 +442,7 @@ static void ixgbe_set_msglevel(struct net_device *netdev, u32 data)
 
 static int ixgbe_get_regs_len(struct net_device *netdev)
 {
-#define IXGBE_REGS_LEN  1129
+#define IXGBE_REGS_LEN  1139
 	return IXGBE_REGS_LEN * sizeof(u32);
 }
 
@@ -602,22 +602,53 @@ static void ixgbe_get_regs(struct net_device *netdev,
 	regs_buff[828] = IXGBE_READ_REG(hw, IXGBE_FHFT(0));
 
 	/* DCB */
-	regs_buff[829] = IXGBE_READ_REG(hw, IXGBE_RMCS);
-	regs_buff[830] = IXGBE_READ_REG(hw, IXGBE_DPMCS);
-	regs_buff[831] = IXGBE_READ_REG(hw, IXGBE_PDPMCS);
-	regs_buff[832] = IXGBE_READ_REG(hw, IXGBE_RUPPBMR);
-	for (i = 0; i < 8; i++)
-		regs_buff[833 + i] = IXGBE_READ_REG(hw, IXGBE_RT2CR(i));
-	for (i = 0; i < 8; i++)
-		regs_buff[841 + i] = IXGBE_READ_REG(hw, IXGBE_RT2SR(i));
-	for (i = 0; i < 8; i++)
-		regs_buff[849 + i] = IXGBE_READ_REG(hw, IXGBE_TDTQ2TCCR(i));
-	for (i = 0; i < 8; i++)
-		regs_buff[857 + i] = IXGBE_READ_REG(hw, IXGBE_TDTQ2TCSR(i));
+	regs_buff[829] = IXGBE_READ_REG(hw, IXGBE_RMCS);   /* same as FCCFG  */
+	regs_buff[831] = IXGBE_READ_REG(hw, IXGBE_PDPMCS); /* same as RTTPCS */
+
+	switch (hw->mac.type) {
+	case ixgbe_mac_82598EB:
+		regs_buff[830] = IXGBE_READ_REG(hw, IXGBE_DPMCS);
+		regs_buff[832] = IXGBE_READ_REG(hw, IXGBE_RUPPBMR);
+		for (i = 0; i < 8; i++)
+			regs_buff[833 + i] =
+				IXGBE_READ_REG(hw, IXGBE_RT2CR(i));
+		for (i = 0; i < 8; i++)
+			regs_buff[841 + i] =
+				IXGBE_READ_REG(hw, IXGBE_RT2SR(i));
+		for (i = 0; i < 8; i++)
+			regs_buff[849 + i] =
+				IXGBE_READ_REG(hw, IXGBE_TDTQ2TCCR(i));
+		for (i = 0; i < 8; i++)
+			regs_buff[857 + i] =
+				IXGBE_READ_REG(hw, IXGBE_TDTQ2TCSR(i));
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+		regs_buff[830] = IXGBE_READ_REG(hw, IXGBE_RTTDCS);
+		regs_buff[832] = IXGBE_READ_REG(hw, IXGBE_RTRPCS);
+		for (i = 0; i < 8; i++)
+			regs_buff[833 + i] =
+				IXGBE_READ_REG(hw, IXGBE_RTRPT4C(i));
+		for (i = 0; i < 8; i++)
+			regs_buff[841 + i] =
+				IXGBE_READ_REG(hw, IXGBE_RTRPT4S(i));
+		for (i = 0; i < 8; i++)
+			regs_buff[849 + i] =
+				IXGBE_READ_REG(hw, IXGBE_RTTDT2C(i));
+		for (i = 0; i < 8; i++)
+			regs_buff[857 + i] =
+				IXGBE_READ_REG(hw, IXGBE_RTTDT2S(i));
+		break;
+	default:
+		break;
+	}
+
 	for (i = 0; i < 8; i++)
-		regs_buff[865 + i] = IXGBE_READ_REG(hw, IXGBE_TDPT2TCCR(i));
+		regs_buff[865 + i] =
+		IXGBE_READ_REG(hw, IXGBE_TDPT2TCCR(i)); /* same as RTTPT2C */
 	for (i = 0; i < 8; i++)
-		regs_buff[873 + i] = IXGBE_READ_REG(hw, IXGBE_TDPT2TCSR(i));
+		regs_buff[873 + i] =
+		IXGBE_READ_REG(hw, IXGBE_TDPT2TCSR(i)); /* same as RTTPT2S */
 
 	/* Statistics */
 	regs_buff[881] = IXGBE_GET_STAT(adapter, crcerrs);
@@ -757,6 +788,20 @@ static void ixgbe_get_regs(struct net_device *netdev,
 
 	/* 82599 X540 specific registers  */
 	regs_buff[1128] = IXGBE_READ_REG(hw, IXGBE_MFLCN);
+
+	/* 82599 X540 specific DCB registers  */
+	regs_buff[1129] = IXGBE_READ_REG(hw, IXGBE_RTRUP2TC);
+	regs_buff[1130] = IXGBE_READ_REG(hw, IXGBE_RTTUP2TC);
+	for (i = 0; i < 4; i++)
+		regs_buff[1131 + i] = IXGBE_READ_REG(hw, IXGBE_TXLLQ(i));
+	regs_buff[1135] = IXGBE_READ_REG(hw, IXGBE_RTTBCNRM);
+					/* same as RTTQCNRM */
+	regs_buff[1136] = IXGBE_READ_REG(hw, IXGBE_RTTBCNRD);
+					/* same as RTTQCNRR */
+
+	/* X540 specific DCB registers  */
+	regs_buff[1137] = IXGBE_READ_REG(hw, IXGBE_RTTQCNCR);
+	regs_buff[1138] = IXGBE_READ_REG(hw, IXGBE_RTTQCNTG);
 }
 
 static int ixgbe_get_eeprom_len(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 10775cb..7c19e96 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -561,6 +561,10 @@ struct ixgbe_thermal_sensor_data {
 #define IXGBE_RTTDQSEL    0x04904
 #define IXGBE_RTTDT1C     0x04908
 #define IXGBE_RTTDT1S     0x0490C
+#define IXGBE_RTTQCNCR    0x08B00
+#define IXGBE_RTTQCNTG    0x04A90
+#define IXGBE_RTTBCNRD    0x0498C
+#define IXGBE_RTTQCNRR    0x0498C
 #define IXGBE_RTTDTECC    0x04990
 #define IXGBE_RTTDTECC_NO_BCN   0x00000100
 #define IXGBE_RTTBCNRC    0x04984
@@ -570,6 +574,7 @@ struct ixgbe_thermal_sensor_data {
 #define IXGBE_RTTBCNRC_RF_INT_MASK	\
 	(IXGBE_RTTBCNRC_RF_DEC_MASK << IXGBE_RTTBCNRC_RF_INT_SHIFT)
 #define IXGBE_RTTBCNRM    0x04980
+#define IXGBE_RTTQCNRM    0x04980
 
 /* FCoE DMA Context Registers */
 #define IXGBE_FCPTRL    0x02410 /* FC User Desc. PTR Low */
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  4/9] ixgbevf: move API neg to reset path
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Don Skidmore, netdev, gospo, sassmann, Alexander Duyck,
	Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Don Skidmore <donald.c.skidmore@intel.com>

After this patch the API negotiation will occur in the reset path.  So now
the PF will be informed of the API version earlier.  This will also require
the mailbox lock to be initialize sooner as well.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index e3493b0..ce27d62 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1544,8 +1544,6 @@ void ixgbevf_up(struct ixgbevf_adapter *adapter)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
 
-	ixgbevf_negotiate_api(adapter);
-
 	ixgbevf_reset_queues(adapter);
 
 	ixgbevf_configure(adapter);
@@ -1735,10 +1733,12 @@ void ixgbevf_reset(struct ixgbevf_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct net_device *netdev = adapter->netdev;
 
-	if (hw->mac.ops.reset_hw(hw))
+	if (hw->mac.ops.reset_hw(hw)) {
 		hw_dbg(hw, "PF still resetting\n");
-	else
+	} else {
 		hw->mac.ops.init_hw(hw);
+		ixgbevf_negotiate_api(adapter);
+	}
 
 	if (is_valid_ether_addr(adapter->hw.mac.addr)) {
 		memcpy(netdev->dev_addr, adapter->hw.mac.addr,
@@ -2074,6 +2074,9 @@ static int ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
 	hw->mac.max_tx_queues = 2;
 	hw->mac.max_rx_queues = 2;
 
+	/* lock to protect mailbox accesses */
+	spin_lock_init(&adapter->mbx_lock);
+
 	err = hw->mac.ops.reset_hw(hw);
 	if (err) {
 		dev_info(&pdev->dev,
@@ -2084,6 +2087,7 @@ static int ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
 			pr_err("init_shared_code failed: %d\n", err);
 			goto out;
 		}
+		ixgbevf_negotiate_api(adapter);
 		err = hw->mac.ops.get_mac_addr(hw, hw->mac.addr);
 		if (err)
 			dev_info(&pdev->dev, "Error reading MAC address\n");
@@ -2099,9 +2103,6 @@ static int ixgbevf_sw_init(struct ixgbevf_adapter *adapter)
 		memcpy(hw->mac.addr, netdev->dev_addr, netdev->addr_len);
 	}
 
-	/* lock to protect mailbox accesses */
-	spin_lock_init(&adapter->mbx_lock);
-
 	/* Enable dynamic interrupt throttling rates */
 	adapter->rx_itr_setting = 1;
 	adapter->tx_itr_setting = 1;
@@ -2622,8 +2623,6 @@ static int ixgbevf_open(struct net_device *netdev)
 		}
 	}
 
-	ixgbevf_negotiate_api(adapter);
-
 	/* setup queue reg_idx and Rx queue count */
 	err = ixgbevf_setup_queues(adapter);
 	if (err)
@@ -3218,6 +3217,8 @@ static int ixgbevf_resume(struct pci_dev *pdev)
 	}
 	pci_set_master(pdev);
 
+	ixgbevf_reset(adapter);
+
 	rtnl_lock();
 	err = ixgbevf_init_interrupt_scheme(adapter);
 	rtnl_unlock();
@@ -3226,8 +3227,6 @@ static int ixgbevf_resume(struct pci_dev *pdev)
 		return err;
 	}
 
-	ixgbevf_reset(adapter);
-
 	if (netif_running(netdev)) {
 		err = ixgbevf_open(netdev);
 		if (err)
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  3/9] ixgbevf: add wait for Rx queue disable
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Don Skidmore, netdev, gospo, sassmann, Alexander Duyck,
	Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Don Skidmore <donald.c.skidmore@intel.com>

New function was added to wait for Rx queues to be disabled before
disabling NAPI. This function also allows us to  modify
ixgbevf_rx_desc_queue_enable() to better match ixgbe.  I also cleaned up
some msleep calls to usleep_range while I was in this code anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 61 ++++++++++++++++-------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index b9a78a2..e3493b0 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1302,27 +1302,51 @@ static void ixgbevf_configure(struct ixgbevf_adapter *adapter)
 	}
 }
 
-#define IXGBE_MAX_RX_DESC_POLL 10
-static inline void ixgbevf_rx_desc_queue_enable(struct ixgbevf_adapter *adapter,
-						int rxr)
+#define IXGBEVF_MAX_RX_DESC_POLL 10
+static void ixgbevf_rx_desc_queue_enable(struct ixgbevf_adapter *adapter,
+					 int rxr)
 {
 	struct ixgbe_hw *hw = &adapter->hw;
+	int wait_loop = IXGBEVF_MAX_RX_DESC_POLL;
+	u32 rxdctl;
 	int j = adapter->rx_ring[rxr].reg_idx;
-	int k;
 
-	for (k = 0; k < IXGBE_MAX_RX_DESC_POLL; k++) {
-		if (IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(j)) & IXGBE_RXDCTL_ENABLE)
-			break;
-		else
-			msleep(1);
-	}
-	if (k >= IXGBE_MAX_RX_DESC_POLL) {
-		hw_dbg(hw, "RXDCTL.ENABLE on Rx queue %d "
-		       "not set within the polling period\n", rxr);
-	}
+	do {
+		usleep_range(1000, 2000);
+		rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(j));
+	} while (--wait_loop && !(rxdctl & IXGBE_RXDCTL_ENABLE));
+
+	if (!wait_loop)
+		hw_dbg(hw, "RXDCTL.ENABLE queue %d not set while polling\n",
+		       rxr);
+
+	ixgbevf_release_rx_desc(&adapter->hw, &adapter->rx_ring[rxr],
+				(adapter->rx_ring[rxr].count - 1));
+}
 
-	ixgbevf_release_rx_desc(hw, &adapter->rx_ring[rxr],
-				adapter->rx_ring[rxr].count - 1);
+static void ixgbevf_disable_rx_queue(struct ixgbevf_adapter *adapter,
+				     struct ixgbevf_ring *ring)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	int wait_loop = IXGBEVF_MAX_RX_DESC_POLL;
+	u32 rxdctl;
+	u8 reg_idx = ring->reg_idx;
+
+	rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(reg_idx));
+	rxdctl &= ~IXGBE_RXDCTL_ENABLE;
+
+	/* write value back with RXDCTL.ENABLE bit cleared */
+	IXGBE_WRITE_REG(hw, IXGBE_VFRXDCTL(reg_idx), rxdctl);
+
+	/* the hardware may take up to 100us to really disable the rx queue */
+	do {
+		udelay(10);
+		rxdctl = IXGBE_READ_REG(hw, IXGBE_VFRXDCTL(reg_idx));
+	} while (--wait_loop && (rxdctl & IXGBE_RXDCTL_ENABLE));
+
+	if (!wait_loop)
+		hw_dbg(hw, "RXDCTL.ENABLE queue %d not cleared while polling\n",
+		       reg_idx);
 }
 
 static void ixgbevf_save_reset_stats(struct ixgbevf_adapter *adapter)
@@ -1654,7 +1678,10 @@ void ixgbevf_down(struct ixgbevf_adapter *adapter)
 
 	/* signal that we are down to the interrupt handler */
 	set_bit(__IXGBEVF_DOWN, &adapter->state);
-	/* disable receives */
+
+	/* disable all enabled rx queues */
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		ixgbevf_disable_rx_queue(adapter, &adapter->rx_ring[i]);
 
 	netif_tx_disable(netdev);
 
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  2/9] ixgbevf: cleanup redundant mailbox read failure check
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Don Skidmore, netdev, gospo, sassmann, Alexander Duyck,
	Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Don Skidmore <donald.c.skidmore@intel.com>

Since we are already checking for read failure in check_link we don't need
to do it here. Instead just make sure the watchdog task gets scheduled, if
we are up, and it can be done there. This will better follow igbvf method
of handling a mailbox event and message timeout.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 29 ++---------------------
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 59a62bb..b9a78a2 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -756,37 +756,12 @@ static void ixgbevf_set_itr(struct ixgbevf_q_vector *q_vector)
 static irqreturn_t ixgbevf_msix_other(int irq, void *data)
 {
 	struct ixgbevf_adapter *adapter = data;
-	struct pci_dev *pdev = adapter->pdev;
 	struct ixgbe_hw *hw = &adapter->hw;
-	u32 msg;
-	bool got_ack = false;
 
 	hw->mac.get_link_status = 1;
-	if (!hw->mbx.ops.check_for_ack(hw))
-		got_ack = true;
-
-	if (!hw->mbx.ops.check_for_msg(hw)) {
-		hw->mbx.ops.read(hw, &msg, 1);
 
-		if ((msg & IXGBE_MBVFICR_VFREQ_MASK) == IXGBE_PF_CONTROL_MSG) {
-			mod_timer(&adapter->watchdog_timer,
-				  round_jiffies(jiffies + 1));
-			adapter->link_up = false;
-		}
-
-		if (msg & IXGBE_VT_MSGTYPE_NACK)
-			dev_info(&pdev->dev,
-				 "Last Request of type %2.2x to PF Nacked\n",
-				 msg & 0xFF);
-		hw->mbx.v2p_mailbox |= IXGBE_VFMAILBOX_PFSTS;
-	}
-
-	/* checking for the ack clears the PFACK bit.  Place
-	 * it back in the v2p_mailbox cache so that anyone
-	 * polling for an ack will not miss it
-	 */
-	if (got_ack)
-		hw->mbx.v2p_mailbox |= IXGBE_VFMAILBOX_PFACK;
+	if (!test_bit(__IXGBEVF_DOWN, &adapter->state))
+		mod_timer(&adapter->watchdog_timer, jiffies);
 
 	IXGBE_WRITE_REG(hw, IXGBE_VTEIMS, adapter->eims_other);
 
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  1/9] ixgbevf: do not print registers to dmesg in ixgbevf_get_regs
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1380627236-3190-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

This patch removes the use of hw_dbg in ixgbevf when the ixgbe_get_regs function
is called from ethtool. This goes along side a patch to ethtool which enables
proper support for ixgbevf pretty-printing of registers.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbevf/ethtool.c | 55 +---------------------------
 1 file changed, 2 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ethtool.c b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
index c9d0c12..84329b0 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ethtool.c
@@ -140,58 +140,10 @@ static void ixgbevf_set_msglevel(struct net_device *netdev, u32 data)
 
 #define IXGBE_GET_STAT(_A_, _R_) (_A_->stats._R_)
 
-static char *ixgbevf_reg_names[] = {
-	"IXGBE_VFCTRL",
-	"IXGBE_VFSTATUS",
-	"IXGBE_VFLINKS",
-	"IXGBE_VFRXMEMWRAP",
-	"IXGBE_VFFRTIMER",
-	"IXGBE_VTEICR",
-	"IXGBE_VTEICS",
-	"IXGBE_VTEIMS",
-	"IXGBE_VTEIMC",
-	"IXGBE_VTEIAC",
-	"IXGBE_VTEIAM",
-	"IXGBE_VTEITR",
-	"IXGBE_VTIVAR",
-	"IXGBE_VTIVAR_MISC",
-	"IXGBE_VFRDBAL0",
-	"IXGBE_VFRDBAL1",
-	"IXGBE_VFRDBAH0",
-	"IXGBE_VFRDBAH1",
-	"IXGBE_VFRDLEN0",
-	"IXGBE_VFRDLEN1",
-	"IXGBE_VFRDH0",
-	"IXGBE_VFRDH1",
-	"IXGBE_VFRDT0",
-	"IXGBE_VFRDT1",
-	"IXGBE_VFRXDCTL0",
-	"IXGBE_VFRXDCTL1",
-	"IXGBE_VFSRRCTL0",
-	"IXGBE_VFSRRCTL1",
-	"IXGBE_VFPSRTYPE",
-	"IXGBE_VFTDBAL0",
-	"IXGBE_VFTDBAL1",
-	"IXGBE_VFTDBAH0",
-	"IXGBE_VFTDBAH1",
-	"IXGBE_VFTDLEN0",
-	"IXGBE_VFTDLEN1",
-	"IXGBE_VFTDH0",
-	"IXGBE_VFTDH1",
-	"IXGBE_VFTDT0",
-	"IXGBE_VFTDT1",
-	"IXGBE_VFTXDCTL0",
-	"IXGBE_VFTXDCTL1",
-	"IXGBE_VFTDWBAL0",
-	"IXGBE_VFTDWBAL1",
-	"IXGBE_VFTDWBAH0",
-	"IXGBE_VFTDWBAH1"
-};
-
-
 static int ixgbevf_get_regs_len(struct net_device *netdev)
 {
-	return (ARRAY_SIZE(ixgbevf_reg_names)) * sizeof(u32);
+#define IXGBE_REGS_LEN 45
+	return IXGBE_REGS_LEN * sizeof(u32);
 }
 
 static void ixgbevf_get_regs(struct net_device *netdev,
@@ -264,9 +216,6 @@ static void ixgbevf_get_regs(struct net_device *netdev,
 		regs_buff[41 + i] = IXGBE_READ_REG(hw, IXGBE_VFTDWBAL(i));
 	for (i = 0; i < 2; i++)
 		regs_buff[43 + i] = IXGBE_READ_REG(hw, IXGBE_VFTDWBAH(i));
-
-	for (i = 0; i < ARRAY_SIZE(ixgbevf_reg_names); i++)
-		hw_dbg(hw, "%s\t%8.8x\n", ixgbevf_reg_names[i], regs_buff[i]);
 }
 
 static void ixgbevf_get_drvinfo(struct net_device *netdev,
-- 
1.8.3.1

^ permalink raw reply related

* [net-next  0/9][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2013-10-01 11:33 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to ixgbevf, ixgbe and igb.

Don provides 3 patches for ixgbevf where he cleans up a redundant
read mailbox failure check, adds a new function to wait for receive
queues to be disabled before disabling NAPI, and move the API
negotiation so that it occurs in the reset path.  This will allow
the PF to be informed of the API version earlier.

Jacob provides a ixgbevf and ixgbe patch.  His ixgbevf patch removes
the use of hw_dbg when the ixgbe_get_regs function is called in ethtool.
The ixgbe patch renames the LL_EXTENDED_STATS and some of the functions
required to implement busy polling in order to remove the marketing
"low latency" blurb which hides what the code actually does.

Leonardo provides a ixgbe patch to add support for DCB registers dump
using ethtool for 82599 and x540 ethernet controllers.

I (Jeff) provide a ixgbe patch to cleanup whitespace issues seen in a
code review.

Todd provides for igb to add support for i354 in the ethtool offline
tests.

Laura provides an igb patch to add the ethtool callbacks necessary to
configure the number of RSS queues.

The following are changes since commit b32418705107265dfca5edfe2b547643e53a732e:
  bonding: RCUify bond_set_rx_mode()
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Don Skidmore (3):
  ixgbevf: cleanup redundant mailbox read failure check
  ixgbevf: add wait for Rx queue disable
  ixgbevf: move API neg to reset path

Fujinaka, Todd (1):
  igb: Add ethtool offline tests for i354

Jacob Keller (2):
  ixgbevf: do not print registers to dmesg in ixgbevf_get_regs
  ixgbe: remove marketing names from busy poll code

Jeff Kirsher (1):
  ixgbe: Cleanup the use of tabs and spaces

Laura Mihaela Vasilescu (1):
  igb: Add ethtool support to configure number of channels

Leonardo Potenza (1):
  ixgbe: ethtool DCB registers dump for 82599 and x540

 drivers/net/ethernet/intel/igb/igb.h              |   1 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c      |  90 +++++++++++++++++-
 drivers/net/ethernet/intel/igb/igb_main.c         |  22 +++++
 drivers/net/ethernet/intel/ixgbe/ixgbe.h          |  14 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c  | 103 ++++++++++++++------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     |   4 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_phy.h      |  40 ++++----
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h     |   5 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_x540.c     |  12 +--
 drivers/net/ethernet/intel/ixgbevf/ethtool.c      |  55 +----------
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 111 +++++++++++-----------
 11 files changed, 283 insertions(+), 174 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH net 4/4] ip_tunnel: Remove double unregister of the fallback device
From: Nicolas Dichtel @ 2013-10-01 11:30 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, Pravin Shelar, netdev
In-Reply-To: <20131001104046.GK7660@secunet.com>

Le 01/10/2013 12:40, Steffen Klassert a écrit :
> On Tue, Oct 01, 2013 at 12:00:44PM +0200, Nicolas Dichtel wrote:
>>
>> Note that there is the same problem in
>> net/ipv6/ip6_tunnel.c:ip6_tnl_destroy_tunnels() introduced by
>> 0bd8762824e7
>> ("ip6tnl: add x-netns support").
>>
>> Should I send a patch for it or would you take care of it?
>
> I don't mind either way. But you noticed the problem, so
> feel free to send a patch yourself.
>
Ok, I will dot it.

Thank you

^ permalink raw reply

* Re: [PATCHv2 net-next] xfrm: Simplify SA looking up when using wildcard source
From: Steffen Klassert @ 2013-10-01 11:18 UTC (permalink / raw)
  To: Fan Du; +Cc: davem, netdev
In-Reply-To: <1380270770-19089-1-git-send-email-fan.du@windriver.com>

On Fri, Sep 27, 2013 at 04:32:50PM +0800, Fan Du wrote:
> __xfrm4/6_state_addr_check is a four steps check, all we need to do
> is checking whether the destination address match when looking SA
> using wildcard source address. Passing saddr from flow is worst option,
> as the checking needs to reach the fourth step while actually only
> one time checking will do the work.
> 
> So, simplify this process by only checking destination address when
> using wildcard source address for looking up SAs.
> 
> Signed-off-by: Fan Du <fan.du@windriver.com>

Applied to ipsec-next, thanks a lot!

^ permalink raw reply

* Re: [PATCH net-next] xfrm: Force SA to be lookup again if SA in acquire state
From: Steffen Klassert @ 2013-10-01 11:17 UTC (permalink / raw)
  To: Fan Du; +Cc: davem, netdev
In-Reply-To: <20130926090301.GC7660@secunet.com>

On Thu, Sep 26, 2013 at 11:03:01AM +0200, Steffen Klassert wrote:
> On Mon, Sep 23, 2013 at 05:18:25PM +0800, Fan Du wrote:
> > If SA is in the process of acquiring, which indicates this SA is more
> > promising and precise than the fall back option, i.e. using wild card
> > source address for searching less suitable SA.
> > 
> > So, here bail out, and try again.
> > 
> > Signed-off-by: Fan Du <fan.du@windriver.com>
> 
> This looks ok, I'll take this into ipsec-next as soon as I can
> update the tree.

Now applied to ipsec-next, thanks!

^ permalink raw reply

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
From: Jiri Pirko @ 2013-10-01 10:58 UTC (permalink / raw)
  To: netdev, yoshfuji, davem; +Cc: kuznet, jmorris, kaber
In-Reply-To: <20130930172312.GE10771@order.stressinduktion.org>

>> 
>> What if non-ufo-path-created skb is peeked?
>
>You mean like:
>first append a frame < mtu
>second frame > mtu so it gets handles by ip6_ufo_append?
>
>Currently I don't see a problem with it but I may be wrong. What is
>your suspicion?

Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
Because of that it will be not threated as ufo skb.

Following patch fixes it:

Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well

Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv6/ip6_output.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..c6cfa2f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1008,6 +1008,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 
 {
 	struct sk_buff *skb;
+	struct frag_hdr fhdr;
 	int err;
 
 	/* There is support for UDP large send offload by network
@@ -1015,8 +1016,6 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 	 * udp datagram
 	 */
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
-		struct frag_hdr fhdr;
-
 		skb = sock_alloc_send_skb(sk,
 			hh_len + fragheaderlen + transhdrlen + 20,
 			(flags & MSG_DONTWAIT), &err);
@@ -1036,20 +1035,23 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->transport_header = skb->network_header + fragheaderlen;
 
 		skb->protocol = htons(ETH_P_IPV6);
-		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum = 0;
 
-		/* Specify the length of each IPv6 datagram fragment.
-		 * It has to be a multiple of 8.
-		 */
-		skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
-					     sizeof(struct frag_hdr)) & ~7;
-		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
-		ipv6_select_ident(&fhdr, rt);
-		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 		__skb_queue_tail(&sk->sk_write_queue, skb);
-	}
+	} else if (skb_is_gso(skb))
+		goto append;
+
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	/* Specify the length of each IPv6 datagram fragment.
+	 * It has to be a multiple of 8.
+	 */
+	skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
+				     sizeof(struct frag_hdr)) & ~7;
+	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+	ipv6_select_ident(&fhdr, rt);
+	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 
+append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,
 				       (length - transhdrlen));
 }
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH net 4/4] ip_tunnel: Remove double unregister of the fallback device
From: Steffen Klassert @ 2013-10-01 10:40 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: David Miller, Pravin Shelar, netdev
In-Reply-To: <524A9D4C.9030900@6wind.com>

On Tue, Oct 01, 2013 at 12:00:44PM +0200, Nicolas Dichtel wrote:
> 
> Note that there is the same problem in
> net/ipv6/ip6_tunnel.c:ip6_tnl_destroy_tunnels() introduced by
> 0bd8762824e7
> ("ip6tnl: add x-netns support").
> 
> Should I send a patch for it or would you take care of it?

I don't mind either way. But you noticed the problem, so
feel free to send a patch yourself.

^ permalink raw reply

* [PATCH next 6/6] be2net: add a counter for pkts dropped in xmit path
From: Sathya Perla @ 2013-10-01 10:30 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1380623401-15630-1-git-send-email-sathya.perla@emulex.com>

In the xmit path, the driver may drop some pkts due to reasons such as
DMA mapping errors, out of memory conditions or to protect HW from
unrecoverable errors. Add a counter in TX-stats for such drops.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be.h         |    1 +
 drivers/net/ethernet/emulex/benet/be_ethtool.c |    4 +++-
 drivers/net/ethernet/emulex/benet/be_main.c    |    5 ++++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index e7cbc56..21b064f 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -225,6 +225,7 @@ struct be_tx_stats {
 	u64 tx_compl;
 	ulong tx_jiffies;
 	u32 tx_stops;
+	u32 tx_drv_drops;	/* pkts dropped by driver */
 	struct u64_stats_sync sync;
 	struct u64_stats_sync sync_compl;
 };
diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c
index a08783c..3dcf817 100644
--- a/drivers/net/ethernet/emulex/benet/be_ethtool.c
+++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c
@@ -155,7 +155,9 @@ static const struct be_ethtool_stat et_tx_stats[] = {
 	/* Number of times the TX queue was stopped due to lack
 	 * of spaces in the TXQ.
 	 */
-	{DRVSTAT_TX_INFO(tx_stops)}
+	{DRVSTAT_TX_INFO(tx_stops)},
+	/* Pkts dropped in the driver's transmit path */
+	{DRVSTAT_TX_INFO(tx_drv_drops)}
 };
 #define ETHTOOL_TXSTATS_NUM (ARRAY_SIZE(et_tx_stats))
 
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 6691d75..0a168e3 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -935,8 +935,10 @@ static netdev_tx_t be_xmit(struct sk_buff *skb, struct net_device *netdev)
 	u32 start = txq->head;
 
 	skb = be_xmit_workarounds(adapter, skb, &skip_hw_vlan);
-	if (!skb)
+	if (!skb) {
+		tx_stats(txo)->tx_drv_drops++;
 		return NETDEV_TX_OK;
+	}
 
 	wrb_cnt = wrb_cnt_for_skb(adapter, skb, &dummy_wrb);
 
@@ -965,6 +967,7 @@ static netdev_tx_t be_xmit(struct sk_buff *skb, struct net_device *netdev)
 		be_tx_stats_update(txo, wrb_cnt, copied, gso_segs, stopped);
 	} else {
 		txq->head = start;
+		tx_stats(txo)->tx_drv_drops++;
 		dev_kfree_skb_any(skb);
 	}
 	return NETDEV_TX_OK;
-- 
1.7.1

^ permalink raw reply related

* [PATCH next 5/6] be2net: fix adaptive interrupt coalescing
From: Sathya Perla @ 2013-10-01 10:30 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1380623401-15630-1-git-send-email-sathya.perla@emulex.com>

The current EQ delay calculation for AIC is based only on RX packet rate.
This fails to be effective when there's only TX and no RX.
This patch inclues:
- Calculating EQ-delay based on both RX and TX pps.
- Modifying EQ-delay of all EQs via one cmd, instead of issuing a separate
  cmd for each EQ.
- A new structure to store interrupt coalescing parameters, in a separate
  cache-line from the EQ-obj.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be.h         |   17 +++-
 drivers/net/ethernet/emulex/benet/be_cmds.c    |   17 ++--
 drivers/net/ethernet/emulex/benet/be_cmds.h    |   15 ++--
 drivers/net/ethernet/emulex/benet/be_ethtool.c |   29 ++++---
 drivers/net/ethernet/emulex/benet/be_main.c    |  110 +++++++++++++++---------
 5 files changed, 115 insertions(+), 73 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 4a540af..e7cbc56 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -88,7 +88,7 @@ static inline char *nic_name(struct pci_dev *pdev)
 #define BE_MIN_MTU		256
 
 #define BE_NUM_VLANS_SUPPORTED	64
-#define BE_MAX_EQD		96u
+#define BE_MAX_EQD		128u
 #define	BE_MAX_TX_FRAG_COUNT	30
 
 #define EVNT_Q_LEN		1024
@@ -200,6 +200,17 @@ struct be_eq_obj {
 	struct be_adapter *adapter;
 } ____cacheline_aligned_in_smp;
 
+struct be_aic_obj {		/* Adaptive interrupt coalescing (AIC) info */
+	bool enable;
+	u32 min_eqd;		/* in usecs */
+	u32 max_eqd;		/* in usecs */
+	u32 prev_eqd;		/* in usecs */
+	u32 et_eqd;		/* configured val when aic is off */
+	ulong jiffies;
+	u64 rx_pkts_prev;	/* Used to calculate RX pps */
+	u64 tx_reqs_prev;	/* Used to calculate TX pps */
+};
+
 struct be_mcc_obj {
 	struct be_queue_info q;
 	struct be_queue_info cq;
@@ -238,15 +249,12 @@ struct be_rx_page_info {
 struct be_rx_stats {
 	u64 rx_bytes;
 	u64 rx_pkts;
-	u64 rx_pkts_prev;
-	ulong rx_jiffies;
 	u32 rx_drops_no_skbs;	/* skb allocation errors */
 	u32 rx_drops_no_frags;	/* HW has no fetched frags */
 	u32 rx_post_fail;	/* page post alloc failures */
 	u32 rx_compl;
 	u32 rx_mcast_pkts;
 	u32 rx_compl_err;	/* completions with err set */
-	u32 rx_pps;		/* pkts per second */
 	struct u64_stats_sync sync;
 };
 
@@ -403,6 +411,7 @@ struct be_adapter {
 	u32 big_page_size;	/* Compounded page size shared by rx wrbs */
 
 	struct be_drv_stats drv_stats;
+	struct be_aic_obj aic_obj[MAX_EVT_QS];
 	u16 vlans_added;
 	u8 vlan_tag[VLAN_N_VID];
 	u8 vlan_prio_bmap;	/* Available Priority BitMap */
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 8610530..b282487 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -1716,11 +1716,12 @@ err:
 /* set the EQ delay interval of an EQ to specified value
  * Uses async mcc
  */
-int be_cmd_modify_eqd(struct be_adapter *adapter, u32 eq_id, u32 eqd)
+int be_cmd_modify_eqd(struct be_adapter *adapter, struct be_set_eqd *set_eqd,
+		      int num)
 {
 	struct be_mcc_wrb *wrb;
 	struct be_cmd_req_modify_eq_delay *req;
-	int status = 0;
+	int status = 0, i;
 
 	spin_lock_bh(&adapter->mcc_lock);
 
@@ -1734,13 +1735,15 @@ int be_cmd_modify_eqd(struct be_adapter *adapter, u32 eq_id, u32 eqd)
 	be_wrb_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
 		OPCODE_COMMON_MODIFY_EQ_DELAY, sizeof(*req), wrb, NULL);
 
-	req->num_eq = cpu_to_le32(1);
-	req->delay[0].eq_id = cpu_to_le32(eq_id);
-	req->delay[0].phase = 0;
-	req->delay[0].delay_multiplier = cpu_to_le32(eqd);
+	req->num_eq = cpu_to_le32(num);
+	for (i = 0; i < num; i++) {
+		req->set_eqd[i].eq_id = cpu_to_le32(set_eqd[i].eq_id);
+		req->set_eqd[i].phase = 0;
+		req->set_eqd[i].delay_multiplier =
+				cpu_to_le32(set_eqd[i].delay_multiplier);
+	}
 
 	be_mcc_notify(adapter);
-
 err:
 	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index 84f8c52..70c3017 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1055,14 +1055,16 @@ struct be_cmd_resp_get_flow_control {
 } __packed;
 
 /******************** Modify EQ Delay *******************/
+struct be_set_eqd {
+	u32 eq_id;
+	u32 phase;
+	u32 delay_multiplier;
+};
+
 struct be_cmd_req_modify_eq_delay {
 	struct be_cmd_req_hdr hdr;
 	u32 num_eq;
-	struct {
-		u32 eq_id;
-		u32 phase;
-		u32 delay_multiplier;
-	} delay[8];
+	struct be_set_eqd set_eqd[MAX_EVT_QS];
 } __packed;
 
 struct be_cmd_resp_modify_eq_delay {
@@ -1894,8 +1896,7 @@ int lancer_cmd_get_pport_stats(struct be_adapter *adapter,
 			       struct be_dma_mem *nonemb_cmd);
 int be_cmd_get_fw_ver(struct be_adapter *adapter, char *fw_ver,
 		      char *fw_on_flash);
-
-int be_cmd_modify_eqd(struct be_adapter *adapter, u32 eq_id, u32 eqd);
+int be_cmd_modify_eqd(struct be_adapter *adapter, struct be_set_eqd *, int num);
 int be_cmd_vlan_config(struct be_adapter *adapter, u32 if_id, u16 *vtag_array,
 		       u32 num, bool untagged, bool promiscuous);
 int be_cmd_rx_filter(struct be_adapter *adapter, u32 flags, u32 status);
diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c
index b440a1f..a08783c 100644
--- a/drivers/net/ethernet/emulex/benet/be_ethtool.c
+++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c
@@ -290,19 +290,19 @@ static int be_get_coalesce(struct net_device *netdev,
 			   struct ethtool_coalesce *et)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
-	struct be_eq_obj *eqo = &adapter->eq_obj[0];
+	struct be_aic_obj *aic = &adapter->aic_obj[0];
 
 
-	et->rx_coalesce_usecs = eqo->cur_eqd;
-	et->rx_coalesce_usecs_high = eqo->max_eqd;
-	et->rx_coalesce_usecs_low = eqo->min_eqd;
+	et->rx_coalesce_usecs = aic->prev_eqd;
+	et->rx_coalesce_usecs_high = aic->max_eqd;
+	et->rx_coalesce_usecs_low = aic->min_eqd;
 
-	et->tx_coalesce_usecs = eqo->cur_eqd;
-	et->tx_coalesce_usecs_high = eqo->max_eqd;
-	et->tx_coalesce_usecs_low = eqo->min_eqd;
+	et->tx_coalesce_usecs = aic->prev_eqd;
+	et->tx_coalesce_usecs_high = aic->max_eqd;
+	et->tx_coalesce_usecs_low = aic->min_eqd;
 
-	et->use_adaptive_rx_coalesce = eqo->enable_aic;
-	et->use_adaptive_tx_coalesce = eqo->enable_aic;
+	et->use_adaptive_rx_coalesce = aic->enable;
+	et->use_adaptive_tx_coalesce = aic->enable;
 
 	return 0;
 }
@@ -314,14 +314,17 @@ static int be_set_coalesce(struct net_device *netdev,
 			   struct ethtool_coalesce *et)
 {
 	struct be_adapter *adapter = netdev_priv(netdev);
+	struct be_aic_obj *aic = &adapter->aic_obj[0];
 	struct be_eq_obj *eqo;
 	int i;
 
 	for_all_evt_queues(adapter, eqo, i) {
-		eqo->enable_aic = et->use_adaptive_rx_coalesce;
-		eqo->max_eqd = min(et->rx_coalesce_usecs_high, BE_MAX_EQD);
-		eqo->min_eqd = min(et->rx_coalesce_usecs_low, eqo->max_eqd);
-		eqo->eqd = et->rx_coalesce_usecs;
+		aic->enable = et->use_adaptive_rx_coalesce;
+		aic->max_eqd = min(et->rx_coalesce_usecs_high, BE_MAX_EQD);
+		aic->min_eqd = min(et->rx_coalesce_usecs_low, aic->max_eqd);
+		aic->et_eqd = min(et->rx_coalesce_usecs, aic->max_eqd);
+		aic->et_eqd = max(aic->et_eqd, aic->min_eqd);
+		aic++;
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 961e9f0..6691d75 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1261,53 +1261,79 @@ static int be_set_vf_tx_rate(struct net_device *netdev,
 	return status;
 }
 
-static void be_eqd_update(struct be_adapter *adapter, struct be_eq_obj *eqo)
+static void be_aic_update(struct be_aic_obj *aic, u64 rx_pkts, u64 tx_pkts,
+			  ulong now)
 {
-	struct be_rx_stats *stats = rx_stats(&adapter->rx_obj[eqo->idx]);
-	ulong now = jiffies;
-	ulong delta = now - stats->rx_jiffies;
-	u64 pkts;
-	unsigned int start, eqd;
+	aic->rx_pkts_prev = rx_pkts;
+	aic->tx_reqs_prev = tx_pkts;
+	aic->jiffies = now;
+}
 
-	if (!eqo->enable_aic) {
-		eqd = eqo->eqd;
-		goto modify_eqd;
-	}
+static void be_eqd_update(struct be_adapter *adapter)
+{
+	struct be_set_eqd set_eqd[MAX_EVT_QS];
+	int eqd, i, num = 0, start;
+	struct be_aic_obj *aic;
+	struct be_eq_obj *eqo;
+	struct be_rx_obj *rxo;
+	struct be_tx_obj *txo;
+	u64 rx_pkts, tx_pkts;
+	ulong now;
+	u32 pps, delta;
 
-	if (eqo->idx >= adapter->num_rx_qs)
-		return;
+	for_all_evt_queues(adapter, eqo, i) {
+		aic = &adapter->aic_obj[eqo->idx];
+		if (!aic->enable) {
+			if (aic->jiffies)
+				aic->jiffies = 0;
+			eqd = aic->et_eqd;
+			goto modify_eqd;
+		}
 
-	stats = rx_stats(&adapter->rx_obj[eqo->idx]);
+		rxo = &adapter->rx_obj[eqo->idx];
+		do {
+			start = u64_stats_fetch_begin_bh(&rxo->stats.sync);
+			rx_pkts = rxo->stats.rx_pkts;
+		} while (u64_stats_fetch_retry_bh(&rxo->stats.sync, start));
 
-	/* Wrapped around */
-	if (time_before(now, stats->rx_jiffies)) {
-		stats->rx_jiffies = now;
-		return;
-	}
+		txo = &adapter->tx_obj[eqo->idx];
+		do {
+			start = u64_stats_fetch_begin_bh(&txo->stats.sync);
+			tx_pkts = txo->stats.tx_reqs;
+		} while (u64_stats_fetch_retry_bh(&txo->stats.sync, start));
 
-	/* Update once a second */
-	if (delta < HZ)
-		return;
 
-	do {
-		start = u64_stats_fetch_begin_bh(&stats->sync);
-		pkts = stats->rx_pkts;
-	} while (u64_stats_fetch_retry_bh(&stats->sync, start));
-
-	stats->rx_pps = (unsigned long)(pkts - stats->rx_pkts_prev) / (delta / HZ);
-	stats->rx_pkts_prev = pkts;
-	stats->rx_jiffies = now;
-	eqd = (stats->rx_pps / 110000) << 3;
-	eqd = min(eqd, eqo->max_eqd);
-	eqd = max(eqd, eqo->min_eqd);
-	if (eqd < 10)
-		eqd = 0;
+		/* Skip, if wrapped around or first calculation */
+		now = jiffies;
+		if (!aic->jiffies || time_before(now, aic->jiffies) ||
+		    rx_pkts < aic->rx_pkts_prev ||
+		    tx_pkts < aic->tx_reqs_prev) {
+			be_aic_update(aic, rx_pkts, tx_pkts, now);
+			continue;
+		}
+
+		delta = jiffies_to_msecs(now - aic->jiffies);
+		pps = (((u32)(rx_pkts - aic->rx_pkts_prev) * 1000) / delta) +
+			(((u32)(tx_pkts - aic->tx_reqs_prev) * 1000) / delta);
+		eqd = (pps / 15000) << 2;
 
+		if (eqd < 8)
+			eqd = 0;
+		eqd = min_t(u32, eqd, aic->max_eqd);
+		eqd = max_t(u32, eqd, aic->min_eqd);
+
+		be_aic_update(aic, rx_pkts, tx_pkts, now);
 modify_eqd:
-	if (eqd != eqo->cur_eqd) {
-		be_cmd_modify_eqd(adapter, eqo->q.id, eqd);
-		eqo->cur_eqd = eqd;
+		if (eqd != aic->prev_eqd) {
+			set_eqd[num].delay_multiplier = (eqd * 65)/100;
+			set_eqd[num].eq_id = eqo->q.id;
+			aic->prev_eqd = eqd;
+			num++;
+		}
 	}
+
+	if (num)
+		be_cmd_modify_eqd(adapter, set_eqd, num);
 }
 
 static void be_rx_stats_update(struct be_rx_obj *rxo,
@@ -1924,6 +1950,7 @@ static int be_evt_queues_create(struct be_adapter *adapter)
 {
 	struct be_queue_info *eq;
 	struct be_eq_obj *eqo;
+	struct be_aic_obj *aic;
 	int i, rc;
 
 	adapter->num_evt_qs = min_t(u16, num_irqs(adapter),
@@ -1932,11 +1959,12 @@ static int be_evt_queues_create(struct be_adapter *adapter)
 	for_all_evt_queues(adapter, eqo, i) {
 		netif_napi_add(adapter->netdev, &eqo->napi, be_poll,
 			       BE_NAPI_WEIGHT);
+		aic = &adapter->aic_obj[i];
 		eqo->adapter = adapter;
 		eqo->tx_budget = BE_TX_BUDGET;
 		eqo->idx = i;
-		eqo->max_eqd = BE_MAX_EQD;
-		eqo->enable_aic = true;
+		aic->max_eqd = BE_MAX_EQD;
+		aic->enable = true;
 
 		eq = &eqo->q;
 		rc = be_queue_alloc(adapter, eq, EVNT_Q_LEN,
@@ -4240,7 +4268,6 @@ static void be_worker(struct work_struct *work)
 	struct be_adapter *adapter =
 		container_of(work, struct be_adapter, work.work);
 	struct be_rx_obj *rxo;
-	struct be_eq_obj *eqo;
 	int i;
 
 	/* when interrupts are not yet enabled, just reap any pending
@@ -4271,8 +4298,7 @@ static void be_worker(struct work_struct *work)
 		}
 	}
 
-	for_all_evt_queues(adapter, eqo, i)
-		be_eqd_update(adapter, eqo);
+	be_eqd_update(adapter);
 
 reschedule:
 	adapter->work_counter++;
-- 
1.7.1

^ permalink raw reply related

* [PATCH next 4/6] be2net: call ENABLE_VF cmd for Skyhawk-R too
From: Sathya Perla @ 2013-10-01 10:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1380623401-15630-1-git-send-email-sathya.perla@emulex.com>

From: Vasundhara Volam <vasundhara.volam@emulex.com>

This cmd needs to be sent to FW when enabling VFs (currently used only
for Lancer.) Also, avoid calling the cmd when driver loads and finds that
VFs are already enabled from a previous load.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_cmds.c |    2 +-
 drivers/net/ethernet/emulex/benet/be_main.c |    3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 331dfdc..8610530 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -3511,7 +3511,7 @@ int be_cmd_enable_vf(struct be_adapter *adapter, u8 domain)
 	struct be_cmd_enable_disable_vf *req;
 	int status;
 
-	if (!lancer_chip(adapter))
+	if (BEx_chip(adapter))
 		return 0;
 
 	spin_lock_bh(&adapter->mcc_lock);
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index be12987..961e9f0 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2923,7 +2923,8 @@ static int be_vf_setup(struct be_adapter *adapter)
 			goto err;
 		vf_cfg->def_vid = def_vlan;
 
-		be_cmd_enable_vf(adapter, vf + 1);
+		if (!old_vfs)
+			be_cmd_enable_vf(adapter, vf + 1);
 	}
 
 	if (!old_vfs) {
-- 
1.7.1

^ permalink raw reply related

* [PATCH next 3/6] be2net: Create single TXQ on BE3-R 1G ports
From: Sathya Perla @ 2013-10-01 10:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1380623401-15630-1-git-send-email-sathya.perla@emulex.com>

From: Vasundhara Volam <vasundhara.volam@emulex.com>

On BE3-R 1G ports (identified by port numbers 2 and 3) the FW cannot properly
support multiple TXQs. This also makes the number of RX and TX queues symmetric
as only a single RXQ is available on 1G ports.

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c |   11 ++---------
 1 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 4cb2ac7..be12987 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2967,8 +2967,9 @@ static void BEx_get_resources(struct be_adapter *adapter,
 		res->max_vlans = BE_NUM_VLANS_SUPPORTED;
 	res->max_mcast_mac = BE_MAX_MC;
 
+	/* For BE3 1Gb ports, F/W does not properly support multiple TXQs */
 	if (BE2_chip(adapter) || use_sriov || be_is_mc(adapter) ||
-	    !be_physfn(adapter))
+	    !be_physfn(adapter) || (adapter->port_num > 1))
 		res->max_tx_qs = 1;
 	else
 		res->max_tx_qs = BE3_MAX_TX_QS;
@@ -3010,14 +3011,6 @@ static int be_get_resources(struct be_adapter *adapter)
 		adapter->res = res;
 	}
 
-	/* For BE3 only check if FW suggests a different max-txqs value */
-	if (BE3_chip(adapter)) {
-		status = be_cmd_get_profile_config(adapter, &res, 0);
-		if (!status && res.max_tx_qs)
-			adapter->res.max_tx_qs =
-				min(adapter->res.max_tx_qs, res.max_tx_qs);
-	}
-
 	/* For Lancer, SH etc read per-function resource limits from FW.
 	 * GET_FUNC_CONFIG returns per function guaranteed limits.
 	 * GET_PROFILE_CONFIG returns PCI-E related limits PF-pool limits
-- 
1.7.1

^ permalink raw reply related

* [PATCH next 2/6] be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd
From: Sathya Perla @ 2013-10-01 10:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1380623401-15630-1-git-send-email-sathya.perla@emulex.com>

From: Vasundhara Volam <vasundhara.volam@emulex.com>

It is a required field for all TX_CREATE cmd versions > 0.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be_cmds.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 1ab5dab..331dfdc 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -1195,7 +1195,6 @@ int be_cmd_txq_create(struct be_adapter *adapter, struct be_tx_obj *txo)
 
 	if (lancer_chip(adapter)) {
 		req->hdr.version = 1;
-		req->if_id = cpu_to_le16(adapter->if_handle);
 	} else if (BEx_chip(adapter)) {
 		if (adapter->function_caps & BE_FUNCTION_CAPS_SUPER_NIC)
 			req->hdr.version = 2;
@@ -1203,6 +1202,8 @@ int be_cmd_txq_create(struct be_adapter *adapter, struct be_tx_obj *txo)
 		req->hdr.version = 2;
 	}
 
+	if (req->hdr.version > 0)
+		req->if_id = cpu_to_le16(adapter->if_handle);
 	req->num_pages = PAGES_4K_SPANNED(q_mem->va, q_mem->size);
 	req->ulp_num = BE_ULP1_NUM;
 	req->type = BE_ETH_TX_RING_TYPE_STANDARD;
-- 
1.7.1

^ permalink raw reply related

* [PATCH next 1/6] be2net: Call be_vf_setup() even when VFs are enbaled from previous load
From: Sathya Perla @ 2013-10-01 10:29 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1380623401-15630-1-git-send-email-sathya.perla@emulex.com>

From: Vasundhara Volam <vasundhara.volam@emulex.com>

Re-define the sriov_want() macro to check for number of VFs that need
to be enabled in the current load of the driver or the number of VFs that
still remain enabled from the previous load (attached VFs cannot be disabled.)

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be.h      |    4 ++--
 drivers/net/ethernet/emulex/benet/be_main.c |    6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 4a0d3b7..4a540af 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -470,8 +470,8 @@ struct be_adapter {
 
 #define be_physfn(adapter)		(!adapter->virtfn)
 #define	sriov_enabled(adapter)		(adapter->num_vfs > 0)
-#define sriov_want(adapter)             (be_max_vfs(adapter) && num_vfs && \
-					 be_physfn(adapter))
+#define sriov_want(adapter)             (be_physfn(adapter) &&	\
+					 (num_vfs || pci_num_vf(adapter->pdev)))
 #define for_all_vfs(adapter, vf_cfg, i)					\
 	for (i = 0, vf_cfg = &adapter->vf_cfg[i]; i < adapter->num_vfs;	\
 		i++, vf_cfg++)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 100b528..4cb2ac7 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2948,12 +2948,12 @@ static void BEx_get_resources(struct be_adapter *adapter,
 	struct pci_dev *pdev = adapter->pdev;
 	bool use_sriov = false;
 
-	if (BE3_chip(adapter) && be_physfn(adapter)) {
+	if (BE3_chip(adapter) && sriov_want(adapter)) {
 		int max_vfs;
 
 		max_vfs = pci_sriov_get_totalvfs(pdev);
 		res->max_vfs = max_vfs > 0 ? min(MAX_VFS, max_vfs) : 0;
-		use_sriov = res->max_vfs && num_vfs;
+		use_sriov = res->max_vfs;
 	}
 
 	if (be_physfn(adapter))
@@ -3242,7 +3242,7 @@ static int be_setup(struct be_adapter *adapter)
 		be_cmd_set_flow_control(adapter, adapter->tx_fc,
 					adapter->rx_fc);
 
-	if (be_physfn(adapter) && num_vfs) {
+	if (sriov_want(adapter)) {
 		if (be_max_vfs(adapter))
 			be_vf_setup(adapter);
 		else
-- 
1.7.1

^ permalink raw reply related

* [PATCH next 0/6] be2net: patch set
From: Sathya Perla @ 2013-10-01 10:29 UTC (permalink / raw)
  To: netdev

Pls apply the following patches to the net-next tree. Thanks.

Sathya Perla (2):
  be2net: fix adaptive interrupt coalescing
  be2net: add a counter for pkts dropped in xmit path

Vasundhara Volam (4):
  be2net: Call be_vf_setup() even when VFs are enbaled from previous
    load
  be2net: pass if_id for v1 and V2 versions of TX_CREATE cmd
  be2net: Create single TXQ on BE3-R 1G ports
  be2net: call ENABLE_VF cmd for Skyhawk-R too

 drivers/net/ethernet/emulex/benet/be.h         |   22 +++-
 drivers/net/ethernet/emulex/benet/be_cmds.c    |   22 +++--
 drivers/net/ethernet/emulex/benet/be_cmds.h    |   15 ++--
 drivers/net/ethernet/emulex/benet/be_ethtool.c |   33 ++++---
 drivers/net/ethernet/emulex/benet/be_main.c    |  135 ++++++++++++++----------
 5 files changed, 135 insertions(+), 92 deletions(-)

^ permalink raw reply

* [PATCH 4/6] ipvs: do not use dest after ip_vs_dest_put in LBLCR
From: Pablo Neira Ayuso @ 2013-10-01  9:35 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1380620132-7388-1-git-send-email-pablo@netfilter.org>

From: Julian Anastasov <ja@ssi.bg>

commit c5549571f975ab ("ipvs: convert lblcr scheduler to rcu")
allows RCU readers to use dest after calling ip_vs_dest_put().
In the corner case it can race with ip_vs_dest_trash_expire()
which can release the dest while it is being returned to the
RCU readers as scheduling result.

To fix the problem do not allow e->dest to be replaced and
defer the ip_vs_dest_put() call by using RCU callback. Now
e->dest does not need to be RCU pointer.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 net/netfilter/ipvs/ip_vs_lblcr.c |   50 +++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index e65f7c5..0b85500 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -89,7 +89,7 @@
  */
 struct ip_vs_dest_set_elem {
 	struct list_head	list;          /* list link */
-	struct ip_vs_dest __rcu *dest;         /* destination server */
+	struct ip_vs_dest	*dest;		/* destination server */
 	struct rcu_head		rcu_head;
 };
 
@@ -107,11 +107,7 @@ static void ip_vs_dest_set_insert(struct ip_vs_dest_set *set,
 
 	if (check) {
 		list_for_each_entry(e, &set->list, list) {
-			struct ip_vs_dest *d;
-
-			d = rcu_dereference_protected(e->dest, 1);
-			if (d == dest)
-				/* already existed */
+			if (e->dest == dest)
 				return;
 		}
 	}
@@ -121,7 +117,7 @@ static void ip_vs_dest_set_insert(struct ip_vs_dest_set *set,
 		return;
 
 	ip_vs_dest_hold(dest);
-	RCU_INIT_POINTER(e->dest, dest);
+	e->dest = dest;
 
 	list_add_rcu(&e->list, &set->list);
 	atomic_inc(&set->size);
@@ -129,22 +125,27 @@ static void ip_vs_dest_set_insert(struct ip_vs_dest_set *set,
 	set->lastmod = jiffies;
 }
 
+static void ip_vs_lblcr_elem_rcu_free(struct rcu_head *head)
+{
+	struct ip_vs_dest_set_elem *e;
+
+	e = container_of(head, struct ip_vs_dest_set_elem, rcu_head);
+	ip_vs_dest_put(e->dest);
+	kfree(e);
+}
+
 static void
 ip_vs_dest_set_erase(struct ip_vs_dest_set *set, struct ip_vs_dest *dest)
 {
 	struct ip_vs_dest_set_elem *e;
 
 	list_for_each_entry(e, &set->list, list) {
-		struct ip_vs_dest *d;
-
-		d = rcu_dereference_protected(e->dest, 1);
-		if (d == dest) {
+		if (e->dest == dest) {
 			/* HIT */
 			atomic_dec(&set->size);
 			set->lastmod = jiffies;
-			ip_vs_dest_put(dest);
 			list_del_rcu(&e->list);
-			kfree_rcu(e, rcu_head);
+			call_rcu(&e->rcu_head, ip_vs_lblcr_elem_rcu_free);
 			break;
 		}
 	}
@@ -155,16 +156,8 @@ static void ip_vs_dest_set_eraseall(struct ip_vs_dest_set *set)
 	struct ip_vs_dest_set_elem *e, *ep;
 
 	list_for_each_entry_safe(e, ep, &set->list, list) {
-		struct ip_vs_dest *d;
-
-		d = rcu_dereference_protected(e->dest, 1);
-		/*
-		 * We don't kfree dest because it is referred either
-		 * by its service or by the trash dest list.
-		 */
-		ip_vs_dest_put(d);
 		list_del_rcu(&e->list);
-		kfree_rcu(e, rcu_head);
+		call_rcu(&e->rcu_head, ip_vs_lblcr_elem_rcu_free);
 	}
 }
 
@@ -175,12 +168,9 @@ static inline struct ip_vs_dest *ip_vs_dest_set_min(struct ip_vs_dest_set *set)
 	struct ip_vs_dest *dest, *least;
 	int loh, doh;
 
-	if (set == NULL)
-		return NULL;
-
 	/* select the first destination server, whose weight > 0 */
 	list_for_each_entry_rcu(e, &set->list, list) {
-		least = rcu_dereference(e->dest);
+		least = e->dest;
 		if (least->flags & IP_VS_DEST_F_OVERLOAD)
 			continue;
 
@@ -195,7 +185,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_min(struct ip_vs_dest_set *set)
 	/* find the destination with the weighted least load */
   nextstage:
 	list_for_each_entry_continue_rcu(e, &set->list, list) {
-		dest = rcu_dereference(e->dest);
+		dest = e->dest;
 		if (dest->flags & IP_VS_DEST_F_OVERLOAD)
 			continue;
 
@@ -232,7 +222,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_max(struct ip_vs_dest_set *set)
 
 	/* select the first destination server, whose weight > 0 */
 	list_for_each_entry(e, &set->list, list) {
-		most = rcu_dereference_protected(e->dest, 1);
+		most = e->dest;
 		if (atomic_read(&most->weight) > 0) {
 			moh = ip_vs_dest_conn_overhead(most);
 			goto nextstage;
@@ -243,7 +233,7 @@ static inline struct ip_vs_dest *ip_vs_dest_set_max(struct ip_vs_dest_set *set)
 	/* find the destination with the weighted most load */
   nextstage:
 	list_for_each_entry_continue(e, &set->list, list) {
-		dest = rcu_dereference_protected(e->dest, 1);
+		dest = e->dest;
 		doh = ip_vs_dest_conn_overhead(dest);
 		/* moh/mw < doh/dw ==> moh*dw < doh*mw, where mw,dw>0 */
 		if (((__s64)moh * atomic_read(&dest->weight) <
@@ -819,7 +809,7 @@ static void __exit ip_vs_lblcr_cleanup(void)
 {
 	unregister_ip_vs_scheduler(&ip_vs_lblcr_scheduler);
 	unregister_pernet_subsys(&ip_vs_lblcr_ops);
-	synchronize_rcu();
+	rcu_barrier();
 }
 
 
-- 
1.7.10.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox