* e.m-ail
From: -Drw(REFF) @ 2011-10-24 18:03 UTC (permalink / raw)
KINDLY PROVIDE YOUR NAME- ADDRESS- SEX- AGE- MOBILE- FOR YOUR ONE-MILLION POUNDS WHICH
YOU HAVE WON.
^ permalink raw reply
* Re: Kernel Panic every 2 weeks on ISP server (NULL pointer dereference)
From: Luciano Ruete @ 2011-10-24 18:09 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1319346989.6180.71.camel@edumazet-laptop>
On Sunday, October 23, 2011 02:16:29 am Eric Dumazet wrote:
> Le samedi 22 octobre 2011 à 22:18 -0300, Luciano Ruete a écrit :
> > Hi,
> >
> > I'm the sysadmin at a 3500 customers ISP, wich runs an iptables+tc
> > solution for load balancing and QoS.
> >
> > Every 2 or 3 weeks the server panics with a "NULL pointer dereference"
> > and with IP at "dev_queue_xmit"
> >
> > It is curious that if i disable MSI on the network card driver this
> > panics seems to disapear, does this ring a bell?
> >
> > The server is an IBM, previously with Broadcom NetXtreme II BCM5709 nics
> > and now with Intel 82576. I change the nics thinking that maybe the bug
> > was in Broadcom Driver but it seems to affect MSI in general.
> >
> > The tc+iptables rules are auto-generated with sequreisp[1] an ISP
> > solution that i wrote and is open sourced under AGPLv3.
> >
> > Tell me if you need any further information, and plz CC because I'm not
> > suscribed.
> >
> >
> > root@server:~# uname -a
> > Linux server 2.6.35-30-server #60~lucid1-Ubuntu SMP Tue Sep 20 22:28:40
> > UTC 2011 x86_64 GNU/Linux
> >
> >
> > [1]https://github.com/sequre/sequreisp
>
> Hi Luciano
Hi Eric!
Thanks for your answer...
>
> [694250.472081] Code: f6
> 49 c1 e6 07 shl $0x7,%r14
> 66 89 93 ac 00 00 00 mov %dx,0xac(%rbx)
>[...]
> This looks like a dev_pick_tx() bug, using an out of bound
> queue_index number and returning a txq pointing after
> the device allocated array.
Clear explanation, is there a tool to map the trace to kernel code, or you did
this by hand?
> With recent kernels, this cannot happen anymore because
> we added fixes in this area.
>
> You could try Ubuntu 11.10 (based on linux 3.0) kernel
> on your server, or apply following patch :
>
> commit df32cc193ad88f7b1326b90af799c927b27f7654
> Author: Tom Herbert <therbert@google.com>
> Date: Mon Nov 1 12:55:52 2010 -0700
>
> net: check queue_index from sock is valid for device
>
> In dev_pick_tx recompute the queue index if the value stored in the
> socket is greater than or equal to the number of real queues for the
> device. The saved index in the sock structure is not guaranteed to
> be appropriate for the egress device (this could happen on a route
> change or in presence of tunnelling). The result of the queue index
> being bad would be to return a bogus queue (crash could prersumably
> follow).
Lot of ruote changes in this server, there are 30 upstream providers(15 are
dynamic IP ADSLs) load balanced using VLANs and a VLAN switch.
Thanks again i will try the kernel upgrade and post results in this thread.
Regards!
--
Luciano Ruete
Sequre - Sys Admin
Mitre 617, piso 7, of. 1
+54 261 4254894
Mendoza - Argentina
http://www.sequreisp.com/
http://www.sequre.com.ar/
^ permalink raw reply
* [PATCH v3 2/2] dp83640: free packet queues on remove
From: Richard Cochran @ 2011-10-24 17:55 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Eric Dumazet, Johannes Berg, stable
In-Reply-To: <cover.1319478544.git.richard.cochran@omicron.at>
If the PHY should disappear (for example, on an USB Ethernet MAC), then
the driver would leak any undelivered time stamp packets. This commit
fixes the issue by calling the appropriate functions to free any packets
left in the transmit and receive queues.
The driver first appeared in v3.0.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <stable@vger.kernel.org>
---
drivers/net/phy/dp83640.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index 311f5cb..dc44b73 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -875,6 +875,7 @@ static void dp83640_remove(struct phy_device *phydev)
struct dp83640_clock *clock;
struct list_head *this, *next;
struct dp83640_private *tmp, *dp83640 = phydev->priv;
+ struct sk_buff *skb;
if (phydev->addr == BROADCAST_ADDR)
return;
@@ -882,6 +883,12 @@ static void dp83640_remove(struct phy_device *phydev)
enable_status_frames(phydev, false);
cancel_work_sync(&dp83640->ts_work);
+ while ((skb = skb_dequeue(&dp83640->rx_queue)) != NULL)
+ kfree_skb(skb);
+
+ while ((skb = skb_dequeue(&dp83640->tx_queue)) != NULL)
+ skb_complete_tx_timestamp(skb, NULL);
+
clock = dp83640_clock_get(dp83640->clock);
if (dp83640 == clock->chosen) {
--
1.7.2.5
^ permalink raw reply related
* [PATCH v3 1/2] dp83640: use proper function to free transmit time stamping packets
From: Richard Cochran @ 2011-10-24 17:55 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Eric Dumazet, Johannes Berg, stable
In-Reply-To: <cover.1319478544.git.richard.cochran@omicron.at>
Commit da92b194 introduced a new rule for handling the cloned packets
for transmit time stamping. These packets must not be freed using any other
function than skb_complete_tx_timestamp. This commit fixes the one and only
driver using this API.
The driver first appeared in v3.0.
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@vger.kernel.org>
---
drivers/net/phy/dp83640.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/phy/dp83640.c b/drivers/net/phy/dp83640.c
index edd7304..311f5cb 100644
--- a/drivers/net/phy/dp83640.c
+++ b/drivers/net/phy/dp83640.c
@@ -1060,7 +1060,7 @@ static void dp83640_txtstamp(struct phy_device *phydev,
struct dp83640_private *dp83640 = phydev->priv;
if (!dp83640->hwts_tx_en) {
- kfree_skb(skb);
+ skb_complete_tx_timestamp(skb, NULL);
return;
}
skb_queue_tail(&dp83640->tx_queue, skb);
--
1.7.2.5
^ permalink raw reply related
* [PATCH v3 0/2] net: time stamping fixes
From: Richard Cochran @ 2011-10-24 17:55 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Eric Dumazet, Johannes Berg
[ Changes in v3: omit accepted patch and fixup the dp83640 patches. ]
These two patches depend on commit da92b194 and fix two bugs in a
PTP Hardware Clock driver. This driver was first introduced in Linux
version 3.0.
Richard Cochran (2):
dp83640: use proper function to free transmit time stamping packets
dp83640: free packet queues on remove
drivers/net/phy/dp83640.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)
--
1.7.2.5
^ permalink raw reply
* |PATCH net-next] tg3: add tx_dropped counter
From: Eric Dumazet @ 2011-10-24 17:53 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Matt Carlson, Michael Chan
If a frame cant be transmitted, it is silently discarded.
Add a counter to report these errors to user.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
Note : merge errors expected because of pending tg3 patch in net tree, I
can respin if needed.
drivers/net/ethernet/broadcom/tg3.c | 23 +++++++++++------------
drivers/net/ethernet/broadcom/tg3.h | 1 +
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index b89027c..3447585 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6671,10 +6671,8 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
u32 tcp_opt_len, hdr_len;
if (skb_header_cloned(skb) &&
- pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
- dev_kfree_skb(skb);
- goto out_unlock;
- }
+ pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
+ goto drop;
iph = ip_hdr(skb);
tcp_opt_len = tcp_optlen(skb);
@@ -6746,10 +6744,9 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
len = skb_headlen(skb);
mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);
- if (pci_dma_mapping_error(tp->pdev, mapping)) {
- dev_kfree_skb(skb);
- goto out_unlock;
- }
+ if (pci_dma_mapping_error(tp->pdev, mapping))
+ goto drop;
+
tnapi->tx_buffers[entry].skb = skb;
dma_unmap_addr_set(&tnapi->tx_buffers[entry], mapping, mapping);
@@ -6805,7 +6802,7 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
budget = tg3_tx_avail(tnapi);
if (tigon3_dma_hwbug_workaround(tnapi, skb, &entry, &budget,
base_flags, mss, vlan))
- goto out_unlock;
+ goto drop_nofree;
}
skb_tx_timestamp(skb);
@@ -6827,15 +6824,16 @@ static netdev_tx_t tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
netif_tx_wake_queue(txq);
}
-out_unlock:
mmiowb();
-
return NETDEV_TX_OK;
dma_error:
tg3_tx_skb_unmap(tnapi, tnapi->tx_prod, i);
- dev_kfree_skb(skb);
tnapi->tx_buffers[tnapi->tx_prod].skb = NULL;
+drop:
+ dev_kfree_skb(skb);
+drop_nofree:
+ tp->tx_dropped++;
return NETDEV_TX_OK;
}
@@ -10009,6 +10007,7 @@ static struct rtnl_link_stats64 *tg3_get_stats64(struct net_device *dev,
get_stat64(&hw_stats->rx_discards);
stats->rx_dropped = tp->rx_dropped;
+ stats->tx_dropped = tp->tx_dropped;
return stats;
}
diff --git a/drivers/net/ethernet/broadcom/tg3.h b/drivers/net/ethernet/broadcom/tg3.h
index d2976f3..f32f288 100644
--- a/drivers/net/ethernet/broadcom/tg3.h
+++ b/drivers/net/ethernet/broadcom/tg3.h
@@ -2990,6 +2990,7 @@ struct tg3 {
/* begin "everything else" cacheline(s) section */
unsigned long rx_dropped;
+ unsigned long tx_dropped;
struct rtnl_link_stats64 net_stats_prev;
struct tg3_ethtool_stats estats;
struct tg3_ethtool_stats estats_prev;
^ permalink raw reply related
* Re: [PATCH v2 2/3] dp83640: use proper function to free transmit time stamping packets
From: Richard Cochran @ 2011-10-24 17:47 UTC (permalink / raw)
To: David Miller; +Cc: netdev, eric.dumazet, johannes, stable
In-Reply-To: <20111024.025555.265760947744724258.davem@davemloft.net>
On Mon, Oct 24, 2011 at 02:55:55AM -0400, David Miller wrote:
> From: Richard Cochran <richardcochran@gmail.com>
> Date: Fri, 21 Oct 2011 12:49:16 +0200
>
> > The previous commit enforces a new rule for handling the cloned packets
> > for transmit time stamping. These packets must not be freed using any other
> > function than skb_complete_tx_timestamp. This commit fixes the one and only
> > driver using this API.
> >
> > The driver first appeared in v3.0.
> >
> > Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: <stable@vger.kernel.org>
>
> In the 'net' tree, which is where you should be targetting these dp83640
> driver patches, the code looks nothing like what you're patching against.
>
> Please respin patches #2 and #3 against current sources.
Okay, but #2 will conflict with
dccaa9e0 dp83640: add time stamp insertion for sync messages
in net-next. Should I also submit a fix for that one?
Thanks,
Richard
^ permalink raw reply
* Re: [patch net-next V4] net: introduce ethernet teaming device
From: Michał Mirosław @ 2011-10-24 17:22 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, davem, eric.dumazet, bhutchings, shemminger, fubar, andy,
tgraf, ebiederm, kaber, greearb, jesse, fbl, benjamin.poirier,
jzupka
In-Reply-To: <1319444005-1281-1-git-send-email-jpirko@redhat.com>
2011/10/24 Jiri Pirko <jpirko@redhat.com>:
> This patch introduces new network device called team. It supposes to be
> very fast, simple, userspace-driven alternative to existing bonding
> driver.
[...]
> drivers/net/team/team.c | 1573 +++++++++++++++++++++++++++++
> drivers/net/team/team_mode_activebackup.c | 152 +++
> drivers/net/team/team_mode_roundrobin.c | 107 ++
I think this mode-modularity is overkill. One mode will compile to at
most a few hundred bytes of code+data, but will use at least 10 times
that to get loaded and tracked properly. How often/how many more modes
you anticipate to be introduced? You could just keep the modular
design but drop the kernel module separation and maybe have modes
conditionally compiled (for those from the embedded world squeezing
every byte).
Best Regards,
Michał Mirosław
^ permalink raw reply
* Confidential/How are you
From: Barrister Jacque Charles @ 2011-10-24 15:21 UTC (permalink / raw)
Dearest,
My name is Barrister Jacque Charles, a personal Attorney to a late client who died in car crash without a will.
For more information please contact via email: (jcchamber@rocketmail.com) upon your response, I shall then provide you with more details and relevant documents that will help you understand this transaction well.
Kindest Regards
Barrister Jacque Charles,
^ permalink raw reply
* Re: [non-quoted-printable PATCH] Fix caif BUG() with network namespaces
From: Sjur Brændeland @ 2011-10-24 15:51 UTC (permalink / raw)
To: Woodhouse, David; +Cc: davem@redhat.com, netdev@vger.kernel.org
In-Reply-To: <1319405079.13738.72.camel@shinybook.infradead.org>
Hi David,
> The caif code will register its own pernet_operations, and then register
> a netdevice_notifier. Each time the netdevice_notifier is triggered,
> it'll do some stuff... including a lookup of its own pernet stuff with
> net_generic().
>
> If the net_generic() call ever returns NULL, the caif code will BUG().
> That doesn't seem *so* unreasonable, I suppose — it does seem like it
> should never happen.
>
> However, it *does* happen. When we clone a network namespace,
> setup_net() runs through all the pernet_operations one at a time. It
> gets to loopback before it gets to caif. And loopback_net_init()
> registers a netdevice... while caif hasn't been initialised. So the caif
> netdevice notifier triggers, and immediately goes BUG().
>
> I'm not entirely sure how best to fix this in the general case. Perhaps
> the netdevice_notifier registration should be pernet too, rather than
> global? Or perhaps we should suppress the notifier calls during
> setup_net() and flush them at the end after everything has been
> initialised?
>
> But really, I'm inclined to just take the simple approach. Make
> caif_device_notify() *not* go looking for its pernet data structures if
> the device it's being notified about isn't a caif device in the first
> place. This simple patch is sufficient to avoid the problem, and is
> probably good enough.
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Thank you for analyzing and fixing this David, this looks good to me.
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
^ permalink raw reply
* Re: [PATCH V2 2/4] MIPS: Add board support for Loongson1B
From: Giuseppe CAVALLARO @ 2011-10-24 15:35 UTC (permalink / raw)
To: Kelvin Cheung
Cc: Wu Zhangjin, linux-mips, linux-kernel, ralf, r0bertz, netdev
In-Reply-To: <CAJhJPsXxUAuF9HdivLd66MQC45mz-iYAuF1SdGdU=-duxJJ5bQ@mail.gmail.com>
On 10/24/2011 4:05 PM, Kelvin Cheung wrote:
> 2011/10/24, Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>> Hello Kelvin.
>>
>> On 10/24/2011 12:36 PM, Kelvin Cheung wrote:
>>
>> [snip]
>>
>>> According to datasheet of Loongson 1B, the buffer size in RX/TX
>>> descriptor is only 2KB. So the Loongson1B's GMAC could not handle
>>> jumbo frames. And the second buffer is useless in this case. Am I
>>> right? Is there a better way than ifdef CONFIG_MACH_LOONGSON1 to
>>> avoid duplicate code?
>>
>> Sorry for my misunderstanding.
>>
>> I think you have to use the normal descriptor and remove the enh_desc
>> from the platform w/o modifying the driver at all.
>>
>> The driver will be able to select/configure all automatically (also jumbo).
>>
>> Let me know.
>
> That's the problem.
> The bitfield definition of Loongson1B is also different from normal descriptor.
The problem is not in the Loongson1B gmac.
The normal descriptor fields in the stmmac refer to an old synopsys
databook.
New chips have the same structure you have added; so we should fix this
in the driver w/o breaking the compatibility for old chips.
I kindly ask you to confirm if the currently normal descriptor structure
(w/o your changes) doesn't work on your platform.
Did you test it?
> Moreover, I want to enable the TX checksum offload function which is
> not supported in normal descriptor.
> Any suggestions?
It is supported but you have to pass from the platform: tx_coe = 1.
Peppe
>
>> Note:
>> IIRC, there is a bit difference in case of normal descriptors for
>> Synopsys databook newer than the 1.91 (I used for testing this mode).
>> In any case, I remember that, on some platforms, the normal descriptors
>> have been used w/o problems also on these new chip generations.
>>
>> Peppe
>>
>>
>
>
^ permalink raw reply
* Re: [PATCH 07/10] RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues.
From: Vipul Pandya @ 2011-10-24 15:16 UTC (permalink / raw)
To: David Miller
Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
roland-BHEL68pLQRGGvPXPguhicg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, divy-ut6Up61K2wZBDgjK7y7TUQ,
dm-ut6Up61K2wZBDgjK7y7TUQ, kumaras-ut6Up61K2wZBDgjK7y7TUQ
In-Reply-To: <20111020.165703.1713724038045504243.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
On 21-10-2011 02:27, David Miller wrote:
> From: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> Date: Thu, 20 Oct 2011 12:28:07 -0500
>
>> On 10/20/2011 12:17 PM, Roland Dreier wrote:
>>>> I believe 5 and 7 have build dependencies.
>>> Right, missed that one too.
>>>
>>> But it seems 4,6,8,9,10 are independent of the rest of the series?
>>>
>>> ie I can trivially apply them and then worry about working out
>>> the drivers/net / drivers/infiniband interdependency a bit later?
>>>
>>
>> Some of these might be dependent on prior patches the series. But if
>> they aren't, yes, you could do that.
>
> So, how do you guys want to do this? If you give me a list of which
> patches I should put into net-next and leave the rest to the infiniband
> tree, that'd work fine for me as long as net-next is left in a working
> state independent of the infiniband tree.
Hi Dave Miller/Roland,
With respect to above dependencies we did some experiments and found
following things
1. We can apply three cxgb4 patches, 01 02 and 03, on net-next tree
successfully and build it.
2. Out of 7 RDMA/cxgb4 patches only 04, 08 and 10 can be applied
trivially and driver can be built successfully. If we try to apply
remaining patches, 05 06 07 and 09, either they will fail to apply or
give build failure. Moreover patches 05, 06, 07 and 09 can be applied on
top of 04, 08 and 10 cleanly.
Based on above results we would like to propose following two things.
1. We would like to recommend that all the patches get included in
Roland's infiniband tree since it has build dependencies.
2. Alternatively,
- Patches 01, 02 and 03 can be included in net-next tree.
- Patches 04, 08 and 10 can be included in Roland's infiniband tree at
present.
- Patches 05, 06, 07 and 09 have to wait till the net-next hits the
3.2-rc1.
Please let us know if you have any other suggestions.
Thanks,
Vipul
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH V2 05/10] RDMA/cxgb4: Add DB Overflow Avoidance.
From: Vipul Pandya @ 2011-10-24 15:12 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: roland-BHEL68pLQRGGvPXPguhicg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
divy-ut6Up61K2wZBDgjK7y7TUQ, dm-ut6Up61K2wZBDgjK7y7TUQ,
kumaras-ut6Up61K2wZBDgjK7y7TUQ,
swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW, Vipul Pandya
- get FULL/EMPTY/DROP events from LLD
- on FULL event, disable normal user mode DB rings.
- add modify_qp semantics to allow user processes to call into
the kernel to ring doobells without overflowing.
Add DB Full/Empty/Drop stats.
Mark queues when created indicating the doorbell state.
If we're in the middle of db overflow avoidance, then newly created
queues should start out in this mode.
Bump the C4IW_UVERBS_ABI_VERSION to 2 so the user mode library can
know if the driver supports the kernel mode db ringing.
Signed-off-by: Vipul Pandya <vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
V2: Bump C4IW_UVERBS_ABI_VERSION to 2
drivers/infiniband/hw/cxgb4/device.c | 84 +++++++++++++++++++++++++++++--
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 37 ++++++++++++--
drivers/infiniband/hw/cxgb4/qp.c | 51 +++++++++++++++++++-
drivers/infiniband/hw/cxgb4/user.h | 2 +-
4 files changed, 162 insertions(+), 12 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 8483111..9062ed9 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -44,6 +44,12 @@ MODULE_DESCRIPTION("Chelsio T4 RDMA Driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(DRV_VERSION);
+struct uld_ctx {
+ struct list_head entry;
+ struct cxgb4_lld_info lldi;
+ struct c4iw_dev *dev;
+};
+
static LIST_HEAD(uld_ctx_list);
static DEFINE_MUTEX(dev_mutex);
@@ -263,6 +269,9 @@ static int stats_show(struct seq_file *seq, void *v)
seq_printf(seq, " OCQPMEM: %10llu %10llu %10llu\n",
dev->rdev.stats.ocqp.total, dev->rdev.stats.ocqp.cur,
dev->rdev.stats.ocqp.max);
+ seq_printf(seq, " DB FULL: %10llu\n", dev->rdev.stats.db_full);
+ seq_printf(seq, " DB EMPTY: %10llu\n", dev->rdev.stats.db_empty);
+ seq_printf(seq, " DB DROP: %10llu\n", dev->rdev.stats.db_drop);
return 0;
}
@@ -283,6 +292,9 @@ static ssize_t stats_clear(struct file *file, const char __user *buf,
dev->rdev.stats.pbl.max = 0;
dev->rdev.stats.rqt.max = 0;
dev->rdev.stats.ocqp.max = 0;
+ dev->rdev.stats.db_full = 0;
+ dev->rdev.stats.db_empty = 0;
+ dev->rdev.stats.db_drop = 0;
mutex_unlock(&dev->rdev.stats.lock);
return count;
}
@@ -443,12 +455,6 @@ static void c4iw_rdev_close(struct c4iw_rdev *rdev)
c4iw_destroy_resource(&rdev->resource);
}
-struct uld_ctx {
- struct list_head entry;
- struct cxgb4_lld_info lldi;
- struct c4iw_dev *dev;
-};
-
static void c4iw_dealloc(struct uld_ctx *ctx)
{
c4iw_rdev_close(&ctx->dev->rdev);
@@ -514,6 +520,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
idr_init(&devp->mmidr);
spin_lock_init(&devp->lock);
mutex_init(&devp->rdev.stats.lock);
+ mutex_init(&devp->db_mutex);
if (c4iw_debugfs_root) {
devp->debugfs_root = debugfs_create_dir(
@@ -659,11 +666,76 @@ static int c4iw_uld_state_change(void *handle, enum cxgb4_state new_state)
return 0;
}
+static int disable_qp_db(int id, void *p, void *data)
+{
+ struct c4iw_qp *qp = p;
+
+ t4_disable_wq_db(&qp->wq);
+ return 0;
+}
+
+static void stop_queues(struct uld_ctx *ctx)
+{
+ spin_lock_irq(&ctx->dev->lock);
+ ctx->dev->db_state = FLOW_CONTROL;
+ idr_for_each(&ctx->dev->qpidr, disable_qp_db, NULL);
+ spin_unlock_irq(&ctx->dev->lock);
+}
+
+static int enable_qp_db(int id, void *p, void *data)
+{
+ struct c4iw_qp *qp = p;
+
+ t4_enable_wq_db(&qp->wq);
+ return 0;
+}
+
+static void resume_queues(struct uld_ctx *ctx)
+{
+ spin_lock_irq(&ctx->dev->lock);
+ ctx->dev->db_state = NORMAL;
+ idr_for_each(&ctx->dev->qpidr, enable_qp_db, NULL);
+ spin_unlock_irq(&ctx->dev->lock);
+}
+
+static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
+{
+ struct uld_ctx *ctx = handle;
+
+ switch (control) {
+ case CXGB4_CONTROL_DB_FULL:
+ stop_queues(ctx);
+ mutex_lock(&ctx->dev->rdev.stats.lock);
+ ctx->dev->rdev.stats.db_full++;
+ mutex_unlock(&ctx->dev->rdev.stats.lock);
+ break;
+ case CXGB4_CONTROL_DB_EMPTY:
+ resume_queues(ctx);
+ mutex_lock(&ctx->dev->rdev.stats.lock);
+ ctx->dev->rdev.stats.db_empty++;
+ mutex_unlock(&ctx->dev->rdev.stats.lock);
+ break;
+ case CXGB4_CONTROL_DB_DROP:
+ printk(KERN_WARNING MOD "%s: Fatal DB DROP\n",
+ pci_name(ctx->lldi.pdev));
+ mutex_lock(&ctx->dev->rdev.stats.lock);
+ ctx->dev->rdev.stats.db_drop++;
+ mutex_unlock(&ctx->dev->rdev.stats.lock);
+ break;
+ default:
+ printk(KERN_WARNING MOD "%s: unknown control cmd %u\n",
+ pci_name(ctx->lldi.pdev), control);
+ break;
+ }
+ return 0;
+}
+
static struct cxgb4_uld_info c4iw_uld_info = {
.name = DRV_NAME,
.add = c4iw_uld_add,
.rx_handler = c4iw_uld_rx_handler,
.state_change = c4iw_uld_state_change,
+ .control = c4iw_uld_control,
};
static int __init c4iw_init_module(void)
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index ec7c848..1924c19 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -117,6 +117,9 @@ struct c4iw_stats {
struct c4iw_stat pbl;
struct c4iw_stat rqt;
struct c4iw_stat ocqp;
+ u64 db_full;
+ u64 db_empty;
+ u64 db_drop;
};
struct c4iw_rdev {
@@ -192,6 +195,12 @@ static inline int c4iw_wait_for_reply(struct c4iw_rdev *rdev,
return wr_waitp->ret;
}
+enum db_state {
+ NORMAL = 0,
+ FLOW_CONTROL = 1,
+ RECOVERY = 2
+};
+
struct c4iw_dev {
struct ib_device ibdev;
struct c4iw_rdev rdev;
@@ -200,7 +209,9 @@ struct c4iw_dev {
struct idr qpidr;
struct idr mmidr;
spinlock_t lock;
+ struct mutex db_mutex;
struct dentry *debugfs_root;
+ enum db_state db_state;
};
static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
@@ -228,8 +239,8 @@ static inline struct c4iw_mr *get_mhp(struct c4iw_dev *rhp, u32 mmid)
return idr_find(&rhp->mmidr, mmid);
}
-static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr,
- void *handle, u32 id)
+static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr,
+ void *handle, u32 id, int lock)
{
int ret;
int newid;
@@ -237,15 +248,29 @@ static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr,
do {
if (!idr_pre_get(idr, GFP_KERNEL))
return -ENOMEM;
- spin_lock_irq(&rhp->lock);
+ if (lock)
+ spin_lock_irq(&rhp->lock);
ret = idr_get_new_above(idr, handle, id, &newid);
BUG_ON(newid != id);
- spin_unlock_irq(&rhp->lock);
+ if (lock)
+ spin_unlock_irq(&rhp->lock);
} while (ret == -EAGAIN);
return ret;
}
+static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr,
+ void *handle, u32 id)
+{
+ return _insert_handle(rhp, idr, handle, id, 1);
+}
+
+static inline int insert_handle_nolock(struct c4iw_dev *rhp, struct idr *idr,
+ void *handle, u32 id)
+{
+ return _insert_handle(rhp, idr, handle, id, 0);
+}
+
static inline void remove_handle(struct c4iw_dev *rhp, struct idr *idr, u32 id)
{
spin_lock_irq(&rhp->lock);
@@ -369,6 +394,8 @@ struct c4iw_qp_attributes {
struct c4iw_ep *llp_stream_handle;
u8 layer_etype;
u8 ecode;
+ u16 sq_db_inc;
+ u16 rq_db_inc;
};
struct c4iw_qp {
@@ -443,6 +470,8 @@ static inline void insert_mmap(struct c4iw_ucontext *ucontext,
enum c4iw_qp_attr_mask {
C4IW_QP_ATTR_NEXT_STATE = 1 << 0,
+ C4IW_QP_ATTR_SQ_DB = 1<<1,
+ C4IW_QP_ATTR_RQ_DB = 1<<2,
C4IW_QP_ATTR_ENABLE_RDMA_READ = 1 << 7,
C4IW_QP_ATTR_ENABLE_RDMA_WRITE = 1 << 8,
C4IW_QP_ATTR_ENABLE_RDMA_BIND = 1 << 9,
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 74df98e..36fc94d 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -34,6 +34,10 @@
#include "iw_cxgb4.h"
+static int db_delay_usecs = 1;
+module_param(db_delay_usecs, int, 0644);
+MODULE_PARM_DESC(db_delay_usecs, "Usecs to delay awaiting db fifo to drain");
+
static int ocqp_support = 1;
module_param(ocqp_support, int, 0644);
MODULE_PARM_DESC(ocqp_support, "Support on-chip SQs (default=1)");
@@ -1117,6 +1121,29 @@ out:
return ret;
}
+/*
+ * Called by the library when the qp has user dbs disabled due to
+ * a DB_FULL condition. This function will single-thread all user
+ * DB rings to avoid overflowing the hw db-fifo.
+ */
+static int ring_kernel_db(struct c4iw_qp *qhp, u32 qid, u16 inc)
+{
+ int delay = db_delay_usecs;
+
+ mutex_lock(&qhp->rhp->db_mutex);
+ do {
+ if (cxgb4_dbfifo_count(qhp->rhp->rdev.lldi.ports[0], 1) < 768) {
+ writel(V_QID(qid) | V_PIDX(inc), qhp->wq.db);
+ break;
+ }
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(delay));
+ delay = min(delay << 1, 200000);
+ } while (1);
+ mutex_unlock(&qhp->rhp->db_mutex);
+ return 0;
+}
+
int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
enum c4iw_qp_attr_mask mask,
struct c4iw_qp_attributes *attrs,
@@ -1165,6 +1192,15 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
qhp->attr = newattr;
}
+ if (mask & C4IW_QP_ATTR_SQ_DB) {
+ ret = ring_kernel_db(qhp, qhp->wq.sq.qid, attrs->sq_db_inc);
+ goto out;
+ }
+ if (mask & C4IW_QP_ATTR_RQ_DB) {
+ ret = ring_kernel_db(qhp, qhp->wq.rq.qid, attrs->rq_db_inc);
+ goto out;
+ }
+
if (!(mask & C4IW_QP_ATTR_NEXT_STATE))
goto out;
if (qhp->attr.state == attrs->next_state)
@@ -1454,7 +1490,11 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
init_waitqueue_head(&qhp->wait);
atomic_set(&qhp->refcnt, 1);
- ret = insert_handle(rhp, &rhp->qpidr, qhp, qhp->wq.sq.qid);
+ spin_lock_irq(&rhp->lock);
+ if (rhp->db_state != NORMAL)
+ t4_disable_wq_db(&qhp->wq);
+ ret = insert_handle_nolock(rhp, &rhp->qpidr, qhp, qhp->wq.sq.qid);
+ spin_unlock_irq(&rhp->lock);
if (ret)
goto err2;
@@ -1598,6 +1638,15 @@ int c4iw_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
C4IW_QP_ATTR_ENABLE_RDMA_WRITE |
C4IW_QP_ATTR_ENABLE_RDMA_BIND) : 0;
+ /*
+ * Use SQ_PSN and RQ_PSN to pass in IDX_INC values for
+ * ringing the queue db when we're in DB_FULL mode.
+ */
+ attrs.sq_db_inc = attr->sq_psn;
+ attrs.rq_db_inc = attr->rq_psn;
+ mask |= (attr_mask & IB_QP_SQ_PSN) ? C4IW_QP_ATTR_SQ_DB : 0;
+ mask |= (attr_mask & IB_QP_RQ_PSN) ? C4IW_QP_ATTR_RQ_DB : 0;
+
return c4iw_modify_qp(rhp, qhp, mask, &attrs, 0);
}
diff --git a/drivers/infiniband/hw/cxgb4/user.h b/drivers/infiniband/hw/cxgb4/user.h
index e6669d5..32b754c 100644
--- a/drivers/infiniband/hw/cxgb4/user.h
+++ b/drivers/infiniband/hw/cxgb4/user.h
@@ -32,7 +32,7 @@
#ifndef __C4IW_USER_H__
#define __C4IW_USER_H__
-#define C4IW_UVERBS_ABI_VERSION 1
+#define C4IW_UVERBS_ABI_VERSION 2
/*
* Make sure that all structs defined in this file remain laid out so
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: [PATCH] bnx2x: Adding FW 7.0.29.0
From: David Woodhouse @ 2011-10-24 14:33 UTC (permalink / raw)
To: dmitry; +Cc: ben@decadent.org.uk, netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <1319464676.6155.3.camel@lb-tlvb-dmitry>
[-- Attachment #1: Type: text/plain, Size: 1088 bytes --]
On Mon, 2011-10-24 at 15:57 +0200, Dmitry Kravkov wrote:
> On Mon, 2011-10-17 at 09:12 -0700, Dmitry Kravkov wrote:
> > On Mon, 2011-10-17 at 07:00 -0700, Dmitry Kravkov wrote:
> > > Includes fixes for the following issues:
> > > 1. (iSCSI) Arrival of un-solicited ASYNC message causes
> > > firmware to abort the connection with RST.
> > > 2. (FCoE) There is a probability that truncated FCoE packet on
> > > RX path won't get detected which might lead to FW assert.
> > > 3. (iSCSI) Arrival of target-initiated NOP-IN during intense
> > > ISCSI traffic might lead to FW assert.
> > > 4. (iSCSI) Chip hangs when in case of retransmission not aligned
> > > to 4-bytes from the beginning of iSCSI PDU.
> > > 5. (FCoE) Arrival of packets beyond task IO size can lead to crash.
> > >
>
> David, do you have estimation to handle the request? We have pending
> patch for net-next. Thanks.
Pushed to git.infradead.org; thanks. I'll work on getting the tree back
onto git.kernel.org now I'm sitting at a table with the admin...
--
dwmw2
[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5818 bytes --]
^ permalink raw reply
* Re: [patch net-next V4] net: introduce ethernet teaming device
From: Paul E. McKenney @ 2011-10-24 14:11 UTC (permalink / raw)
To: Benjamin Poirier
Cc: Jiri Pirko, netdev, davem, eric.dumazet, bhutchings, shemminger,
fubar, andy, tgraf, ebiederm, mirqus, kaber, greearb, jesse, fbl,
jzupka, Dipankar Sarma
In-Reply-To: <20111024130918.GB24473@synalogic.ca>
On Mon, Oct 24, 2011 at 09:09:19AM -0400, Benjamin Poirier wrote:
> On 11/10/24 10:13, Jiri Pirko wrote:
> > This patch introduces new network device called team. It supposes to be
> > very fast, simple, userspace-driven alternative to existing bonding
> > driver.
> >
> > Userspace library called libteam with couple of demo apps is available
> > here:
> > https://github.com/jpirko/libteam
> > Note it's still in its dipers atm.
> >
> > team<->libteam use generic netlink for communication. That and rtnl
> > suppose to be the only way to configure team device, no sysfs etc.
> >
> > Python binding basis for libteam was recently introduced (some need
> > still need to be done on it though). Daemon providing arpmon/miimon
> > active-backup functionality will be introduced shortly.
> > All what's necessary is already implemented in kernel team driver.
> >
> > Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> >
> > v3->v4:
> > - remove redundant synchronize_rcu from __team_change_mode()
> > - revert "set and clear of mode_ops happens per pointer, not per
> > byte"
> > - extend comment of function __team_change_mode()
> >
> > v2->v3:
> > - team_change_mtu() user rcu version of list traversal to unwind
> > - set and clear of mode_ops happens per pointer, not per byte
> > - port hashlist changed to be embedded into team structure
> > - error branch in team_port_enter() does cleanup now
> > - fixed rtln->rtnl
> >
> > v1->v2:
> > - modes are made as modules. Makes team more modular and
> > extendable.
> > - several commenters' nitpicks found on v1 were fixed
> > - several other bugs were fixed.
> > - note I ignored Eric's comment about roundrobin port selector
> > as Eric's way may be easily implemented as another mode (mode
> > "random") in future.
> > ---
> > Documentation/networking/team.txt | 2 +
> > MAINTAINERS | 7 +
> > drivers/net/Kconfig | 2 +
> > drivers/net/Makefile | 1 +
> > drivers/net/team/Kconfig | 38 +
> > drivers/net/team/Makefile | 7 +
> > drivers/net/team/team.c | 1573 +++++++++++++++++++++++++++++
> > drivers/net/team/team_mode_activebackup.c | 152 +++
> > drivers/net/team/team_mode_roundrobin.c | 107 ++
> > include/linux/Kbuild | 1 +
> > include/linux/if.h | 1 +
> > include/linux/if_team.h | 231 +++++
> > include/linux/rculist.h | 14 +
>
> I think you're missing some CC's for the modifications to this file.
> I've taken the liberty of adding Dipankar and Paul to the discussion.
Thank you, and please see below.
> > 13 files changed, 2136 insertions(+), 0 deletions(-)
> > create mode 100644 Documentation/networking/team.txt
> > create mode 100644 drivers/net/team/Kconfig
> > create mode 100644 drivers/net/team/Makefile
> > create mode 100644 drivers/net/team/team.c
> > create mode 100644 drivers/net/team/team_mode_activebackup.c
> > create mode 100644 drivers/net/team/team_mode_roundrobin.c
> > create mode 100644 include/linux/if_team.h
> >
>
> [...]
>
> > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> > new file mode 100644
> > index 0000000..acfef4c
> > --- /dev/null
> > +++ b/drivers/net/team/team.c
> > +
> [...]
> > +static int team_change_mtu(struct net_device *dev, int new_mtu)
> > +{
> > + struct team *team = netdev_priv(dev);
> > + struct team_port *port;
> > + int err;
> > +
> > + rcu_read_lock();
> > + list_for_each_entry_rcu(port, &team->port_list, list) {
> > + err = dev_set_mtu(port->dev, new_mtu);
> > + if (err) {
> > + netdev_err(dev, "Device %s failed to change mtu",
> > + port->dev->name);
> > + goto unwind;
> > + }
> > + }
> > + rcu_read_unlock();
> > +
> > + dev->mtu = new_mtu;
> > +
> > + return 0;
> > +
> > +unwind:
> > + list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
> > + dev_set_mtu(port->dev, dev->mtu);
> > +
> > + rcu_read_unlock();
> > + return err;
> > +}
> > +
> > +
>
> [...]
>
> > diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> > index d079290..7586b2c 100644
> > --- a/include/linux/rculist.h
> > +++ b/include/linux/rculist.h
> > @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
> > pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> >
> > /**
> > + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
> > + * @pos: the type * to use as a loop cursor.
> > + * @head: the head for your list.
> > + * @member: the name of the list_struct within the struct.
> > + *
> > + * Start to iterate over list of given type backwards, continuing after
> > + * the current position.
> > + */
> > +#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \
> > + for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
> > + &pos->member != (head); \
> > + pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
> > +
>
> rcu lists can be modified while they are traversed with *_rcu()
> primitives. This benefit comes with the constraint that they may only be
> traversed forwards. This is implicit in the choice of *_rcu()
> list-traversal primitives: they only go forwards.
>
> You suggest to add a backwards rcu list-traversal primitive. But
> consider what happens in this sequence:
>
> CPU0 CPU1
> list_for_each_entry_continue_reverse_rcu(...)
> pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
> list_del_rcu(&pos->member)
> { (&pos->member)->prev = LIST_POISON2 }
> pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
> = container_of(LIST_POISON2, typeof(*pos), member)
> do_something(*pos)
> BAM!
>
> Going back to the problem you're trying to solve in team_change_mtu(),
> I think you could either:
> 1) take team->lock instead of rcu_read_lock() throughout this particular
> function
> 2) save each deleted element in a separate list on the side in case it's
> necessary to roll back
> 3) remove the rcu double locking, rely on rtnl and add some
> ASSERT_RTNL() if desired. You've said that you don't want to rely on
> rtnl and you want to use separate locking but I fail to see what
> advantage that brings to balance out the extra complexity in code and
> execution? Please clarify this.
Indeed -- the list_for_each_entry_continue_reverse_rcu() implementation
above would only work if elements were never deleted from the list.
But people would miss that fact, resulting in list-poison oopses.
Furthermore, even if you avoid the poisoning, there is no guarantee
that you will see the same objects in reverse that you saw going
forward because some might have been added or deleted in the meantime.
So please take some other approach. For example, if the list has a
fixed upper bound, perhaps just keeping track of what elements you
visited would be workable.
Thanx, Paul
> Thanks,
> -Ben
>
> > +/**
> > * hlist_del_rcu - deletes entry from hash list without re-initialization
> > * @n: the element to delete from the hash list.
> > *
> > --
> > 1.7.6
> >
>
^ permalink raw reply
* Re: [Xen-devel] Re: [PATCH 5/6] xen/netback: Enable netback on HVM guests
From: Konrad Rzeszutek Wilk @ 2011-10-24 14:19 UTC (permalink / raw)
To: David Miller; +Cc: Ian.Campbell, netdev, dgdegra, xen-devel, david.vrabel
In-Reply-To: <20111024.053419.1995560587557035685.davem@davemloft.net>
On Mon, Oct 24, 2011 at 05:34:19AM -0400, David Miller wrote:
> From: Ian Campbell <Ian.Campbell@citrix.com>
> Date: Mon, 24 Oct 2011 10:31:08 +0100
>
> > On Thu, 2011-10-20 at 16:35 +0100, Daniel De Graaf wrote:
> >> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
> >
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> >
> > Normally netback patches would go in via the networking subsystem
> > maintainer's tree but since this depends on core Xen patches from this
> > series and is unlikely to conflict with anything in the net-next tree I
> > suspect it would make more sense for Konrad to take this one.
> >
> > David (Miller) does that work for you?
>
> Yes, it does.
OK, Can I stick Acked-by: David Miller on that patch?
Thank you.
^ permalink raw reply
* Re: bridge: HSR support
From: Arvid Brodin @ 2011-10-24 14:17 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <4E94D67A.9060207@enea.com>
Stephen Hemminger wrote:
> On Tue, 11 Oct 2011 20:25:08 +0200
> Arvid Brodin <arvid.brodin@enea.com> wrote:
>
>> Hi,
>>
>> I want to add support for HSR ("High-availability Seamless Redundancy",
>> IEC-62439-3) to the bridge code. With HSR, all connected units have two network
>> ports and are connected in a ring. All new Ethernet packets are sent on both
>> ports (or passed through if the current unit is not the originating unit). The
>> same packet is never passed twice. Non-HSR units are not allowed in the ring.
>>
>> This gives instant, reconfiguration-free failover.
>>
>> I'd like your input on how to design the user interface. To me it seems natural
>> to use bridge-utils, which of course today supports STP.
>>
>> One solution is to simply add an "hsr" command:
>>
>> # brctl hsr <bridge> on|off
>>
>> But HSR is mutually exclusive to other modes, and I think that STP and standard
>> bridge mode are mutually exclusive, too? Perhaps it would be better (more user-
>> friendly) to
>>
>> # brctl type <bridge> standard|stp|hsr
>>
>> ?
>>
>> 'brctl stp <bridge> on|off' would have to be kept for compatibility, but could
>> be a simple wrapper for 'brctl type <bridge> stp|standard'
>>
>> What do you think about this?
>>
>
> Why is it a bridge thing and not a standalone or bonding (or the new team
> device feature? Wouldn't users want to use it without all the stuff
> related to bridging. The fact that it doesn't work with STP is a big
> red flag that it doesn't belong in the bridge.
Ok, having read up some more on this it looks like STP is a standardised part of
bridging, so I guess you're right.
I need to do two things:
1) Bind two network interfaces into one (say, eth0 & eth1 => hsr0). Frames sent on
hsr0 should get an HSR tag (including the correct EtherType) and go out on both
eth0 and eth1.
2) Ingress frames on eth0 & eth1, with EtherType 0x88fb, should be captured and
handled specially (either received on hsr0 or forwarded to the other bound
physical interface).
Any ideas on the best way to implement this -- what's the nicest place to "hook
into" for this?
--
Arvid Brodin
Enea Services Stockholm AB
^ permalink raw reply
* Re: [PATCH V2 2/4] MIPS: Add board support for Loongson1B
From: Kelvin Cheung @ 2011-10-24 14:05 UTC (permalink / raw)
To: Giuseppe CAVALLARO
Cc: Wu Zhangjin, linux-mips, linux-kernel, ralf, r0bertz, netdev
In-Reply-To: <4EA557B2.4020504@st.com>
2011/10/24, Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
> Hello Kelvin.
>
> On 10/24/2011 12:36 PM, Kelvin Cheung wrote:
>
> [snip]
>
>> According to datasheet of Loongson 1B, the buffer size in RX/TX
>> descriptor is only 2KB. So the Loongson1B's GMAC could not handle
>> jumbo frames. And the second buffer is useless in this case. Am I
>> right? Is there a better way than ifdef CONFIG_MACH_LOONGSON1 to
>> avoid duplicate code?
>
> Sorry for my misunderstanding.
>
> I think you have to use the normal descriptor and remove the enh_desc
> from the platform w/o modifying the driver at all.
>
> The driver will be able to select/configure all automatically (also jumbo).
>
> Let me know.
That's the problem.
The bitfield definition of Loongson1B is also different from normal descriptor.
Moreover, I want to enable the TX checksum offload function which is
not supported in normal descriptor.
Any suggestions?
> Note:
> IIRC, there is a bit difference in case of normal descriptors for
> Synopsys databook newer than the 1.91 (I used for testing this mode).
> In any case, I remember that, on some platforms, the normal descriptors
> have been used w/o problems also on these new chip generations.
>
> Peppe
>
>
--
Best Regards!
Kelvin
^ permalink raw reply
* Re: [PATCH] bnx2x: Adding FW 7.0.29.0
From: Dmitry Kravkov @ 2011-10-24 13:57 UTC (permalink / raw)
To: dwmw2@infradead.org
Cc: ben@decadent.org.uk, netdev@vger.kernel.org, Eilon Greenstein
In-Reply-To: <1318867968.5817.1.camel@lb-tlvb-dmitry>
On Mon, 2011-10-17 at 09:12 -0700, Dmitry Kravkov wrote:
> On Mon, 2011-10-17 at 07:00 -0700, Dmitry Kravkov wrote:
> > Includes fixes for the following issues:
> > 1. (iSCSI) Arrival of un-solicited ASYNC message causes
> > firmware to abort the connection with RST.
> > 2. (FCoE) There is a probability that truncated FCoE packet on
> > RX path won't get detected which might lead to FW assert.
> > 3. (iSCSI) Arrival of target-initiated NOP-IN during intense
> > ISCSI traffic might lead to FW assert.
> > 4. (iSCSI) Chip hangs when in case of retransmission not aligned
> > to 4-bytes from the beginning of iSCSI PDU.
> > 5. (FCoE) Arrival of packets beyond task IO size can lead to crash.
> >
David, do you have estimation to handle the request? We have pending
patch for net-next. Thanks.
^ permalink raw reply
* Re: [patch net-next V4] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-24 13:50 UTC (permalink / raw)
To: Benjamin Poirier
Cc: netdev, davem, eric.dumazet, bhutchings, shemminger, fubar, andy,
tgraf, ebiederm, mirqus, kaber, greearb, jesse, fbl, jzupka,
Dipankar Sarma, Paul E. McKenney
In-Reply-To: <20111024130918.GB24473@synalogic.ca>
Mon, Oct 24, 2011 at 03:09:19PM CEST, benjamin.poirier@gmail.com wrote:
>On 11/10/24 10:13, Jiri Pirko wrote:
>> This patch introduces new network device called team. It supposes to be
>> very fast, simple, userspace-driven alternative to existing bonding
>> driver.
>>
>> Userspace library called libteam with couple of demo apps is available
>> here:
>> https://github.com/jpirko/libteam
>> Note it's still in its dipers atm.
>>
>> team<->libteam use generic netlink for communication. That and rtnl
>> suppose to be the only way to configure team device, no sysfs etc.
>>
>> Python binding basis for libteam was recently introduced (some need
>> still need to be done on it though). Daemon providing arpmon/miimon
>> active-backup functionality will be introduced shortly.
>> All what's necessary is already implemented in kernel team driver.
>>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>> v3->v4:
>> - remove redundant synchronize_rcu from __team_change_mode()
>> - revert "set and clear of mode_ops happens per pointer, not per
>> byte"
>> - extend comment of function __team_change_mode()
>>
>> v2->v3:
>> - team_change_mtu() user rcu version of list traversal to unwind
>> - set and clear of mode_ops happens per pointer, not per byte
>> - port hashlist changed to be embedded into team structure
>> - error branch in team_port_enter() does cleanup now
>> - fixed rtln->rtnl
>>
>> v1->v2:
>> - modes are made as modules. Makes team more modular and
>> extendable.
>> - several commenters' nitpicks found on v1 were fixed
>> - several other bugs were fixed.
>> - note I ignored Eric's comment about roundrobin port selector
>> as Eric's way may be easily implemented as another mode (mode
>> "random") in future.
>> ---
>> Documentation/networking/team.txt | 2 +
>> MAINTAINERS | 7 +
>> drivers/net/Kconfig | 2 +
>> drivers/net/Makefile | 1 +
>> drivers/net/team/Kconfig | 38 +
>> drivers/net/team/Makefile | 7 +
>> drivers/net/team/team.c | 1573 +++++++++++++++++++++++++++++
>> drivers/net/team/team_mode_activebackup.c | 152 +++
>> drivers/net/team/team_mode_roundrobin.c | 107 ++
>> include/linux/Kbuild | 1 +
>> include/linux/if.h | 1 +
>> include/linux/if_team.h | 231 +++++
>> include/linux/rculist.h | 14 +
>
>I think you're missing some CC's for the modifications to this file.
>I've taken the liberty of adding Dipankar and Paul to the discussion.
>
>> 13 files changed, 2136 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/networking/team.txt
>> create mode 100644 drivers/net/team/Kconfig
>> create mode 100644 drivers/net/team/Makefile
>> create mode 100644 drivers/net/team/team.c
>> create mode 100644 drivers/net/team/team_mode_activebackup.c
>> create mode 100644 drivers/net/team/team_mode_roundrobin.c
>> create mode 100644 include/linux/if_team.h
>>
>
>[...]
>
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> new file mode 100644
>> index 0000000..acfef4c
>> --- /dev/null
>> +++ b/drivers/net/team/team.c
>> +
>[...]
>> +static int team_change_mtu(struct net_device *dev, int new_mtu)
>> +{
>> + struct team *team = netdev_priv(dev);
>> + struct team_port *port;
>> + int err;
>> +
>> + rcu_read_lock();
>> + list_for_each_entry_rcu(port, &team->port_list, list) {
>> + err = dev_set_mtu(port->dev, new_mtu);
>> + if (err) {
>> + netdev_err(dev, "Device %s failed to change mtu",
>> + port->dev->name);
>> + goto unwind;
>> + }
>> + }
>> + rcu_read_unlock();
>> +
>> + dev->mtu = new_mtu;
>> +
>> + return 0;
>> +
>> +unwind:
>> + list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
>> + dev_set_mtu(port->dev, dev->mtu);
>> +
>> + rcu_read_unlock();
>> + return err;
>> +}
>> +
>> +
>
>[...]
>
>> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
>> index d079290..7586b2c 100644
>> --- a/include/linux/rculist.h
>> +++ b/include/linux/rculist.h
>> @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
>> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>>
>> /**
>> + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
>> + * @pos: the type * to use as a loop cursor.
>> + * @head: the head for your list.
>> + * @member: the name of the list_struct within the struct.
>> + *
>> + * Start to iterate over list of given type backwards, continuing after
>> + * the current position.
>> + */
>> +#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \
>> + for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
>> + &pos->member != (head); \
>> + pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
>> +
>
>rcu lists can be modified while they are traversed with *_rcu()
>primitives. This benefit comes with the constraint that they may only be
>traversed forwards. This is implicit in the choice of *_rcu()
>list-traversal primitives: they only go forwards.
>
>You suggest to add a backwards rcu list-traversal primitive. But
>consider what happens in this sequence:
>
>CPU0 CPU1
>list_for_each_entry_continue_reverse_rcu(...)
>pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
> list_del_rcu(&pos->member)
> { (&pos->member)->prev = LIST_POISON2 }
>pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
> = container_of(LIST_POISON2, typeof(*pos), member)
>do_something(*pos)
> BAM!
Doh, you are right. :(
>
>Going back to the problem you're trying to solve in team_change_mtu(),
>I think you could either:
>1) take team->lock instead of rcu_read_lock() throughout this particular
>function
>2) save each deleted element in a separate list on the side in case it's
>necessary to roll back
I would go with this one.
>3) remove the rcu double locking, rely on rtnl and add some
>ASSERT_RTNL() if desired. You've said that you don't want to rely on
>rtnl and you want to use separate locking but I fail to see what
>advantage that brings to balance out the extra complexity in code and
>execution? Please clarify this.
I do not want to use rtnl. For example in gennetlink code rtnl is not
held so I need to depend on own lock.
>
>Thanks,
>-Ben
>
>> +/**
>> * hlist_del_rcu - deletes entry from hash list without re-initialization
>> * @n: the element to delete from the hash list.
>> *
>> --
>> 1.7.6
>>
^ permalink raw reply
* Re: [patch net-next V4] net: introduce ethernet teaming device
From: Benjamin Poirier @ 2011-10-24 13:09 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, davem, eric.dumazet, bhutchings, shemminger, fubar, andy,
tgraf, ebiederm, mirqus, kaber, greearb, jesse, fbl, jzupka,
Dipankar Sarma, Paul E. McKenney
In-Reply-To: <1319444005-1281-1-git-send-email-jpirko@redhat.com>
On 11/10/24 10:13, Jiri Pirko wrote:
> This patch introduces new network device called team. It supposes to be
> very fast, simple, userspace-driven alternative to existing bonding
> driver.
>
> Userspace library called libteam with couple of demo apps is available
> here:
> https://github.com/jpirko/libteam
> Note it's still in its dipers atm.
>
> team<->libteam use generic netlink for communication. That and rtnl
> suppose to be the only way to configure team device, no sysfs etc.
>
> Python binding basis for libteam was recently introduced (some need
> still need to be done on it though). Daemon providing arpmon/miimon
> active-backup functionality will be introduced shortly.
> All what's necessary is already implemented in kernel team driver.
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
> v3->v4:
> - remove redundant synchronize_rcu from __team_change_mode()
> - revert "set and clear of mode_ops happens per pointer, not per
> byte"
> - extend comment of function __team_change_mode()
>
> v2->v3:
> - team_change_mtu() user rcu version of list traversal to unwind
> - set and clear of mode_ops happens per pointer, not per byte
> - port hashlist changed to be embedded into team structure
> - error branch in team_port_enter() does cleanup now
> - fixed rtln->rtnl
>
> v1->v2:
> - modes are made as modules. Makes team more modular and
> extendable.
> - several commenters' nitpicks found on v1 were fixed
> - several other bugs were fixed.
> - note I ignored Eric's comment about roundrobin port selector
> as Eric's way may be easily implemented as another mode (mode
> "random") in future.
> ---
> Documentation/networking/team.txt | 2 +
> MAINTAINERS | 7 +
> drivers/net/Kconfig | 2 +
> drivers/net/Makefile | 1 +
> drivers/net/team/Kconfig | 38 +
> drivers/net/team/Makefile | 7 +
> drivers/net/team/team.c | 1573 +++++++++++++++++++++++++++++
> drivers/net/team/team_mode_activebackup.c | 152 +++
> drivers/net/team/team_mode_roundrobin.c | 107 ++
> include/linux/Kbuild | 1 +
> include/linux/if.h | 1 +
> include/linux/if_team.h | 231 +++++
> include/linux/rculist.h | 14 +
I think you're missing some CC's for the modifications to this file.
I've taken the liberty of adding Dipankar and Paul to the discussion.
> 13 files changed, 2136 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/networking/team.txt
> create mode 100644 drivers/net/team/Kconfig
> create mode 100644 drivers/net/team/Makefile
> create mode 100644 drivers/net/team/team.c
> create mode 100644 drivers/net/team/team_mode_activebackup.c
> create mode 100644 drivers/net/team/team_mode_roundrobin.c
> create mode 100644 include/linux/if_team.h
>
[...]
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> new file mode 100644
> index 0000000..acfef4c
> --- /dev/null
> +++ b/drivers/net/team/team.c
> +
[...]
> +static int team_change_mtu(struct net_device *dev, int new_mtu)
> +{
> + struct team *team = netdev_priv(dev);
> + struct team_port *port;
> + int err;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(port, &team->port_list, list) {
> + err = dev_set_mtu(port->dev, new_mtu);
> + if (err) {
> + netdev_err(dev, "Device %s failed to change mtu",
> + port->dev->name);
> + goto unwind;
> + }
> + }
> + rcu_read_unlock();
> +
> + dev->mtu = new_mtu;
> +
> + return 0;
> +
> +unwind:
> + list_for_each_entry_continue_reverse_rcu(port, &team->port_list, list)
> + dev_set_mtu(port->dev, dev->mtu);
> +
> + rcu_read_unlock();
> + return err;
> +}
> +
> +
[...]
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index d079290..7586b2c 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -288,6 +288,20 @@ static inline void list_splice_init_rcu(struct list_head *list,
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
> /**
> + * list_for_each_entry_continue_reverse_rcu - iterate backwards from the given point
> + * @pos: the type * to use as a loop cursor.
> + * @head: the head for your list.
> + * @member: the name of the list_struct within the struct.
> + *
> + * Start to iterate over list of given type backwards, continuing after
> + * the current position.
> + */
> +#define list_for_each_entry_continue_reverse_rcu(pos, head, member) \
> + for (pos = list_entry_rcu(pos->member.prev, typeof(*pos), member); \
> + &pos->member != (head); \
> + pos = list_entry_rcu(pos->member.prev, typeof(*pos), member))
> +
rcu lists can be modified while they are traversed with *_rcu()
primitives. This benefit comes with the constraint that they may only be
traversed forwards. This is implicit in the choice of *_rcu()
list-traversal primitives: they only go forwards.
You suggest to add a backwards rcu list-traversal primitive. But
consider what happens in this sequence:
CPU0 CPU1
list_for_each_entry_continue_reverse_rcu(...)
pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
list_del_rcu(&pos->member)
{ (&pos->member)->prev = LIST_POISON2 }
pos = list_entry_rcu(pos->member.prev, typeof(*pos), member)
= container_of(LIST_POISON2, typeof(*pos), member)
do_something(*pos)
BAM!
Going back to the problem you're trying to solve in team_change_mtu(),
I think you could either:
1) take team->lock instead of rcu_read_lock() throughout this particular
function
2) save each deleted element in a separate list on the side in case it's
necessary to roll back
3) remove the rcu double locking, rely on rtnl and add some
ASSERT_RTNL() if desired. You've said that you don't want to rely on
rtnl and you want to use separate locking but I fail to see what
advantage that brings to balance out the extra complexity in code and
execution? Please clarify this.
Thanks,
-Ben
> +/**
> * hlist_del_rcu - deletes entry from hash list without re-initialization
> * @n: the element to delete from the hash list.
> *
> --
> 1.7.6
>
^ permalink raw reply
* Re: [RFD] Network configuration data in sysfs
From: Kay Sievers @ 2011-10-24 12:46 UTC (permalink / raw)
To: David Miller
Cc: kirill, netdev, kuznet, jmorris, yoshfuji, kaber, gregkh,
gladkov.alexey
In-Reply-To: <20111024.005900.1091819103500072631.davem@davemloft.net>
On Mon, Oct 24, 2011 at 06:59, David Miller <davem@davemloft.net> wrote:
> From: "Kirill A. Shutemov" <kirill@shutemov.name>
> Date: Mon, 24 Oct 2011 07:24:00 +0300
>
>> On Sun, Oct 23, 2011 at 11:24:16PM -0400, David Miller wrote:
>>> From: "Kirill A. Shutemov" <kirill@shutemov.name>
>>> Date: Mon, 24 Oct 2011 04:34:07 +0300
>>>
>>> You can use netlink to perform any configuration change you want, or
>>> to view any network configuration setting.
>>
>> You need /sbin/ip or similar tool to do this, right?
>
> I'm talking about udev using netlink natively.
Kirill, what exactly is the use case? And why what does udev support
mean in that context?
I doubt that "not having /sbin/ip installed" should be a reason to add
and expose complex interfaces in /sys, while we already have a
perfectly working native way to do it.
Kay
^ permalink raw reply
* [PATCH net-next 4/4] be2net: don't create multiple RX/TX rings in multi channel mode
From: Sathya Perla @ 2011-10-24 12:45 UTC (permalink / raw)
To: netdev; +Cc: Suresh Reddy
In-Reply-To: <1319460303-25560-1-git-send-email-sathya.perla@emulex.com>
When the HW is in multi-channel mode based on the skew/IPL, there are 4 functions per port and so not enough resources to create multiple RX/TX rings for each function.
Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
drivers/net/ethernet/emulex/benet/be_cmds.h | 6 ++++++
drivers/net/ethernet/emulex/benet/be_main.c | 16 ++++++++++++----
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index 75b7574..a35cd03 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1046,6 +1046,12 @@ struct be_cmd_resp_modify_eq_delay {
/******************** Get FW Config *******************/
#define BE_FUNCTION_CAPS_RSS 0x2
+/* The HW can come up in either of the following multi-channel modes
+ * based on the skew/IPL.
+ */
+#define FLEX10_MODE 0x400
+#define VNIC_MODE 0x20000
+#define UMC_ENABLED 0x1000000
struct be_cmd_req_query_fw_cfg {
struct be_cmd_req_hdr hdr;
u32 rsvd[31];
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 77ce24b..91fe12a 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -114,6 +114,13 @@ static const char * const ue_status_hi_desc[] = {
"Unknown"
};
+/* Is BE in a multi-channel mode */
+static inline bool be_is_mc(struct be_adapter *adapter) {
+ return (adapter->function_mode & FLEX10_MODE ||
+ adapter->function_mode & VNIC_MODE ||
+ adapter->function_mode & UMC_ENABLED);
+}
+
static void be_queue_free(struct be_adapter *adapter, struct be_queue_info *q)
{
struct be_dma_mem *mem = &q->dma_mem;
@@ -1289,7 +1296,7 @@ static struct be_rx_compl_info *be_rx_compl_get(struct be_rx_obj *rxo)
if (rxcp->vlanf) {
/* vlanf could be wrongly set in some cards.
* ignore if vtm is not set */
- if ((adapter->function_mode & 0x400) && !rxcp->vtm)
+ if ((adapter->function_mode & FLEX10_MODE) && !rxcp->vtm)
rxcp->vlanf = 0;
if (!lancer_chip(adapter))
@@ -1636,7 +1643,7 @@ static void be_tx_queues_destroy(struct be_adapter *adapter)
static int be_num_txqs_want(struct be_adapter *adapter)
{
if ((num_vfs && adapter->sriov_enabled) ||
- (adapter->function_mode & 0x400) ||
+ be_is_mc(adapter) ||
lancer_chip(adapter) || !be_physfn(adapter) ||
adapter->generation == BE_GEN2)
return 1;
@@ -1718,7 +1725,8 @@ static void be_rx_queues_destroy(struct be_adapter *adapter)
static u32 be_num_rxqs_want(struct be_adapter *adapter)
{
if ((adapter->function_caps & BE_FUNCTION_CAPS_RSS) &&
- !adapter->sriov_enabled && !(adapter->function_mode & 0x400)) {
+ !adapter->sriov_enabled && be_physfn(adapter) &&
+ !be_is_mc(adapter)) {
return 1 + MAX_RSS_QS; /* one default non-RSS queue */
} else {
dev_warn(&adapter->pdev->dev,
@@ -3187,7 +3195,7 @@ static int be_get_config(struct be_adapter *adapter)
if (status)
return status;
- if (adapter->function_mode & 0x400)
+ if (adapter->function_mode & FLEX10_MODE)
adapter->max_vlans = BE_NUM_VLANS_SUPPORTED/4;
else
adapter->max_vlans = BE_NUM_VLANS_SUPPORTED;
--
1.7.4
^ permalink raw reply related
* [PATCH net-next 3/4] be2net: don't create multiple TXQs in BE2
From: Sathya Perla @ 2011-10-24 12:45 UTC (permalink / raw)
To: netdev; +Cc: Vasundhara Volam
In-Reply-To: <1319460303-25560-1-git-send-email-sathya.perla@emulex.com>
Multiple TXQ support is partially broken in BE2. It is fully
supported BE3 onwards and in Lancer.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
drivers/net/ethernet/emulex/benet/be_main.c | 26 ++++++++++++++++----------
1 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 4ab182f..77ce24b 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1633,6 +1633,17 @@ static void be_tx_queues_destroy(struct be_adapter *adapter)
be_queue_free(adapter, q);
}
+static int be_num_txqs_want(struct be_adapter *adapter)
+{
+ if ((num_vfs && adapter->sriov_enabled) ||
+ (adapter->function_mode & 0x400) ||
+ lancer_chip(adapter) || !be_physfn(adapter) ||
+ adapter->generation == BE_GEN2)
+ return 1;
+ else
+ return MAX_TX_QS;
+}
+
/* One TX event queue is shared by all TX compl qs */
static int be_tx_queues_create(struct be_adapter *adapter)
{
@@ -1640,6 +1651,11 @@ static int be_tx_queues_create(struct be_adapter *adapter)
struct be_tx_obj *txo;
u8 i;
+ adapter->num_tx_qs = be_num_txqs_want(adapter);
+ if (adapter->num_tx_qs != MAX_TX_QS)
+ netif_set_real_num_tx_queues(adapter->netdev,
+ adapter->num_tx_qs);
+
adapter->tx_eq.max_eqd = 0;
adapter->tx_eq.min_eqd = 0;
adapter->tx_eq.cur_eqd = 96;
@@ -3180,16 +3196,6 @@ static int be_get_config(struct be_adapter *adapter)
if (status)
return status;
- if ((num_vfs && adapter->sriov_enabled) ||
- (adapter->function_mode & 0x400) ||
- lancer_chip(adapter) || !be_physfn(adapter)) {
- adapter->num_tx_qs = 1;
- netif_set_real_num_tx_queues(adapter->netdev,
- adapter->num_tx_qs);
- } else {
- adapter->num_tx_qs = MAX_TX_QS;
- }
-
return 0;
}
--
1.7.4
^ permalink raw reply related
* [PATCH net-next 2/4] be2net: refactor VF setup/teardown code into be_vf_setup/clear()
From: Sathya Perla @ 2011-10-24 12:45 UTC (permalink / raw)
To: netdev
In-Reply-To: <1319460303-25560-1-git-send-email-sathya.perla@emulex.com>
Currently the code for VF setup/teardown done by a PF (if_create, mac_add_config, link_status_query etc) is scattered; this patch refactors this code into be_vf_setup() and be_vf_clear().
The if_create/if_destroy/mac_addr_query cmds are now called after the MCCQ is created; so these cmds are now modified to use the MCCQ instead of MBOX.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
---
drivers/net/ethernet/emulex/benet/be_cmds.c | 66 +++++---
drivers/net/ethernet/emulex/benet/be_cmds.h | 4 +-
drivers/net/ethernet/emulex/benet/be_main.c | 233 ++++++++++++---------------
3 files changed, 149 insertions(+), 154 deletions(-)
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 6e7b521..8ae5a4c 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -615,7 +615,7 @@ int be_cmd_eq_create(struct be_adapter *adapter,
return status;
}
-/* Uses mbox */
+/* Use MCC */
int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
u8 type, bool permanent, u32 if_handle)
{
@@ -623,10 +623,13 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
struct be_cmd_req_mac_query *req;
int status;
- if (mutex_lock_interruptible(&adapter->mbox_lock))
- return -1;
+ spin_lock_bh(&adapter->mcc_lock);
- wrb = wrb_from_mbox(adapter);
+ wrb = wrb_from_mccq(adapter);
+ if (!wrb) {
+ status = -EBUSY;
+ goto err;
+ }
req = embedded_payload(wrb);
be_wrb_hdr_prepare(wrb, sizeof(*req), true, 0,
@@ -643,13 +646,14 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
req->permanent = 0;
}
- status = be_mbox_notify_wait(adapter);
+ status = be_mcc_notify_wait(adapter);
if (!status) {
struct be_cmd_resp_mac_query *resp = embedded_payload(wrb);
memcpy(mac_addr, resp->mac.addr, ETH_ALEN);
}
- mutex_unlock(&adapter->mbox_lock);
+err:
+ spin_unlock_bh(&adapter->mcc_lock);
return status;
}
@@ -1111,20 +1115,22 @@ err:
}
/* Create an rx filtering policy configuration on an i/f
- * Uses mbox
+ * Uses MCCQ
*/
int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags, u32 en_flags,
- u8 *mac, bool pmac_invalid, u32 *if_handle, u32 *pmac_id,
- u32 domain)
+ u8 *mac, u32 *if_handle, u32 *pmac_id, u32 domain)
{
struct be_mcc_wrb *wrb;
struct be_cmd_req_if_create *req;
int status;
- if (mutex_lock_interruptible(&adapter->mbox_lock))
- return -1;
+ spin_lock_bh(&adapter->mcc_lock);
- wrb = wrb_from_mbox(adapter);
+ wrb = wrb_from_mccq(adapter);
+ if (!wrb) {
+ status = -EBUSY;
+ goto err;
+ }
req = embedded_payload(wrb);
be_wrb_hdr_prepare(wrb, sizeof(*req), true, 0,
@@ -1136,23 +1142,25 @@ int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags, u32 en_flags,
req->hdr.domain = domain;
req->capability_flags = cpu_to_le32(cap_flags);
req->enable_flags = cpu_to_le32(en_flags);
- req->pmac_invalid = pmac_invalid;
- if (!pmac_invalid)
+ if (mac)
memcpy(req->mac_addr, mac, ETH_ALEN);
+ else
+ req->pmac_invalid = true;
- status = be_mbox_notify_wait(adapter);
+ status = be_mcc_notify_wait(adapter);
if (!status) {
struct be_cmd_resp_if_create *resp = embedded_payload(wrb);
*if_handle = le32_to_cpu(resp->interface_id);
- if (!pmac_invalid)
+ if (mac)
*pmac_id = le32_to_cpu(resp->pmac_id);
}
- mutex_unlock(&adapter->mbox_lock);
+err:
+ spin_unlock_bh(&adapter->mcc_lock);
return status;
}
-/* Uses mbox */
+/* Uses MCCQ */
int be_cmd_if_destroy(struct be_adapter *adapter, u32 interface_id, u32 domain)
{
struct be_mcc_wrb *wrb;
@@ -1162,10 +1170,16 @@ int be_cmd_if_destroy(struct be_adapter *adapter, u32 interface_id, u32 domain)
if (adapter->eeh_err)
return -EIO;
- if (mutex_lock_interruptible(&adapter->mbox_lock))
- return -1;
+ if (!interface_id)
+ return 0;
- wrb = wrb_from_mbox(adapter);
+ spin_lock_bh(&adapter->mcc_lock);
+
+ wrb = wrb_from_mccq(adapter);
+ if (!wrb) {
+ status = -EBUSY;
+ goto err;
+ }
req = embedded_payload(wrb);
be_wrb_hdr_prepare(wrb, sizeof(*req), true, 0,
@@ -1177,10 +1191,9 @@ int be_cmd_if_destroy(struct be_adapter *adapter, u32 interface_id, u32 domain)
req->hdr.domain = domain;
req->interface_id = cpu_to_le32(interface_id);
- status = be_mbox_notify_wait(adapter);
-
- mutex_unlock(&adapter->mbox_lock);
-
+ status = be_mcc_notify_wait(adapter);
+err:
+ spin_unlock_bh(&adapter->mcc_lock);
return status;
}
@@ -1301,7 +1314,8 @@ int be_cmd_link_status_query(struct be_adapter *adapter, u8 *mac_speed,
struct be_cmd_resp_link_status *resp = embedded_payload(wrb);
if (resp->mac_speed != PHY_LINK_SPEED_ZERO) {
*link_speed = le16_to_cpu(resp->link_speed);
- *mac_speed = resp->mac_speed;
+ if (mac_speed)
+ *mac_speed = resp->mac_speed;
}
}
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index abaa90c..75b7574 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -1413,8 +1413,8 @@ extern int be_cmd_pmac_add(struct be_adapter *adapter, u8 *mac_addr,
extern int be_cmd_pmac_del(struct be_adapter *adapter, u32 if_id,
u32 pmac_id, u32 domain);
extern int be_cmd_if_create(struct be_adapter *adapter, u32 cap_flags,
- u32 en_flags, u8 *mac, bool pmac_invalid,
- u32 *if_handle, u32 *pmac_id, u32 domain);
+ u32 en_flags, u8 *mac, u32 *if_handle, u32 *pmac_id,
+ u32 domain);
extern int be_cmd_if_destroy(struct be_adapter *adapter, u32 if_handle,
u32 domain);
extern int be_cmd_eq_create(struct be_adapter *adapter,
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 88c9dca..4ab182f 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -2069,7 +2069,7 @@ done:
return;
}
-static void be_sriov_enable(struct be_adapter *adapter)
+static int be_sriov_enable(struct be_adapter *adapter)
{
be_check_sriov_fn_type(adapter);
#ifdef CONFIG_PCI_IOV
@@ -2091,8 +2091,17 @@ static void be_sriov_enable(struct be_adapter *adapter)
status = pci_enable_sriov(adapter->pdev, num_vfs);
adapter->sriov_enabled = status ? false : true;
+
+ if (adapter->sriov_enabled) {
+ adapter->vf_cfg = kcalloc(num_vfs,
+ sizeof(struct be_vf_cfg),
+ GFP_KERNEL);
+ if (!adapter->vf_cfg)
+ return -ENOMEM;
+ }
}
#endif
+ return 0;
}
static void be_sriov_disable(struct be_adapter *adapter)
@@ -2100,6 +2109,7 @@ static void be_sriov_disable(struct be_adapter *adapter)
#ifdef CONFIG_PCI_IOV
if (adapter->sriov_enabled) {
pci_disable_sriov(adapter->pdev);
+ kfree(adapter->vf_cfg);
adapter->sriov_enabled = false;
}
#endif
@@ -2405,7 +2415,7 @@ static int be_setup_wol(struct be_adapter *adapter, bool enable)
*/
static inline int be_vf_eth_addr_config(struct be_adapter *adapter)
{
- u32 vf = 0;
+ u32 vf;
int status = 0;
u8 mac[ETH_ALEN];
@@ -2427,7 +2437,7 @@ static inline int be_vf_eth_addr_config(struct be_adapter *adapter)
return status;
}
-static inline void be_vf_eth_addr_rem(struct be_adapter *adapter)
+static void be_vf_clear(struct be_adapter *adapter)
{
u32 vf;
@@ -2437,29 +2447,25 @@ static inline void be_vf_eth_addr_rem(struct be_adapter *adapter)
adapter->vf_cfg[vf].vf_if_handle,
adapter->vf_cfg[vf].vf_pmac_id, vf + 1);
}
+
+ for (vf = 0; vf < num_vfs; vf++)
+ if (adapter->vf_cfg[vf].vf_if_handle)
+ be_cmd_if_destroy(adapter,
+ adapter->vf_cfg[vf].vf_if_handle, vf + 1);
}
static int be_clear(struct be_adapter *adapter)
{
- int vf;
-
if (be_physfn(adapter) && adapter->sriov_enabled)
- be_vf_eth_addr_rem(adapter);
+ be_vf_clear(adapter);
+
+ be_cmd_if_destroy(adapter, adapter->if_handle, 0);
be_mcc_queues_destroy(adapter);
be_rx_queues_destroy(adapter);
be_tx_queues_destroy(adapter);
adapter->eq_next_idx = 0;
- if (be_physfn(adapter) && adapter->sriov_enabled)
- for (vf = 0; vf < num_vfs; vf++)
- if (adapter->vf_cfg[vf].vf_if_handle)
- be_cmd_if_destroy(adapter,
- adapter->vf_cfg[vf].vf_if_handle,
- vf + 1);
-
- be_cmd_if_destroy(adapter, adapter->if_handle, 0);
-
adapter->be3_native = false;
adapter->promiscuous = false;
@@ -2468,59 +2474,92 @@ static int be_clear(struct be_adapter *adapter)
return 0;
}
+static int be_vf_setup(struct be_adapter *adapter)
+{
+ u32 cap_flags, en_flags, vf;
+ u16 lnk_speed;
+ int status;
+
+ cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST;
+ for (vf = 0; vf < num_vfs; vf++) {
+ status = be_cmd_if_create(adapter, cap_flags, en_flags, NULL,
+ &adapter->vf_cfg[vf].vf_if_handle,
+ NULL, vf+1);
+ if (status)
+ goto err;
+ adapter->vf_cfg[vf].vf_pmac_id = BE_INVALID_PMAC_ID;
+ }
+
+ if (!lancer_chip(adapter)) {
+ status = be_vf_eth_addr_config(adapter);
+ if (status)
+ goto err;
+ }
+
+ for (vf = 0; vf < num_vfs; vf++) {
+ status = be_cmd_link_status_query(adapter, NULL, &lnk_speed,
+ vf + 1);
+ if (status)
+ goto err;
+ adapter->vf_cfg[vf].vf_tx_rate = lnk_speed * 10;
+ }
+ return 0;
+err:
+ return status;
+}
+
static int be_setup(struct be_adapter *adapter)
{
struct net_device *netdev = adapter->netdev;
- u32 cap_flags, en_flags, vf = 0;
+ u32 cap_flags, en_flags;
u32 tx_fc, rx_fc;
int status;
u8 mac[ETH_ALEN];
+ /* Allow all priorities by default. A GRP5 evt may modify this */
+ adapter->vlan_prio_bmap = 0xff;
+ adapter->link_speed = -1;
+
be_cmd_req_native_mode(adapter);
- cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED |
- BE_IF_FLAGS_BROADCAST |
- BE_IF_FLAGS_MULTICAST;
-
- if (be_physfn(adapter)) {
- cap_flags |= BE_IF_FLAGS_MCAST_PROMISCUOUS |
- BE_IF_FLAGS_PROMISCUOUS |
- BE_IF_FLAGS_PASS_L3L4_ERRORS;
- en_flags |= BE_IF_FLAGS_PASS_L3L4_ERRORS;
-
- if (adapter->function_caps & BE_FUNCTION_CAPS_RSS) {
- cap_flags |= BE_IF_FLAGS_RSS;
- en_flags |= BE_IF_FLAGS_RSS;
- }
+ status = be_tx_queues_create(adapter);
+ if (status != 0)
+ goto err;
+
+ status = be_rx_queues_create(adapter);
+ if (status != 0)
+ goto err;
+
+ status = be_mcc_queues_create(adapter);
+ if (status != 0)
+ goto err;
+
+ memset(mac, 0, ETH_ALEN);
+ status = be_cmd_mac_addr_query(adapter, mac, MAC_ADDRESS_TYPE_NETWORK,
+ true /*permanent */, 0);
+ if (status)
+ return status;
+ memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
+ memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
+
+ en_flags = BE_IF_FLAGS_UNTAGGED | BE_IF_FLAGS_BROADCAST |
+ BE_IF_FLAGS_MULTICAST | BE_IF_FLAGS_PASS_L3L4_ERRORS;
+ cap_flags = en_flags | BE_IF_FLAGS_MCAST_PROMISCUOUS |
+ BE_IF_FLAGS_PROMISCUOUS;
+ if (adapter->function_caps & BE_FUNCTION_CAPS_RSS) {
+ cap_flags |= BE_IF_FLAGS_RSS;
+ en_flags |= BE_IF_FLAGS_RSS;
}
-
status = be_cmd_if_create(adapter, cap_flags, en_flags,
- netdev->dev_addr, false/* pmac_invalid */,
- &adapter->if_handle, &adapter->pmac_id, 0);
+ netdev->dev_addr, &adapter->if_handle,
+ &adapter->pmac_id, 0);
if (status != 0)
goto err;
-
- if (be_physfn(adapter)) {
- if (adapter->sriov_enabled) {
- while (vf < num_vfs) {
- cap_flags = en_flags = BE_IF_FLAGS_UNTAGGED |
- BE_IF_FLAGS_BROADCAST;
- status = be_cmd_if_create(adapter, cap_flags,
- en_flags, mac, true,
- &adapter->vf_cfg[vf].vf_if_handle,
- NULL, vf+1);
- if (status) {
- dev_err(&adapter->pdev->dev,
- "Interface Create failed for VF %d\n",
- vf);
- goto err;
- }
- adapter->vf_cfg[vf].vf_pmac_id =
- BE_INVALID_PMAC_ID;
- vf++;
- }
- }
- } else {
+
+ /* For BEx, the VF's permanent mac queried from card is incorrect.
+ * Query the mac configued by the PF using if_handle
+ */
+ if (!be_physfn(adapter) && !lancer_chip(adapter)) {
status = be_cmd_mac_addr_query(adapter, mac,
MAC_ADDRESS_TYPE_NETWORK, false, adapter->if_handle);
if (!status) {
@@ -2529,23 +2568,6 @@ static int be_setup(struct be_adapter *adapter)
}
}
- status = be_tx_queues_create(adapter);
- if (status != 0)
- goto err;
-
- status = be_rx_queues_create(adapter);
- if (status != 0)
- goto err;
-
- /* Allow all priorities by default. A GRP5 evt may modify this */
- adapter->vlan_prio_bmap = 0xff;
-
- status = be_mcc_queues_create(adapter);
- if (status != 0)
- goto err;
-
- adapter->link_speed = -1;
-
be_cmd_get_fw_ver(adapter, adapter->fw_ver, NULL);
status = be_vid_config(adapter, false, 0);
@@ -2565,8 +2587,14 @@ static int be_setup(struct be_adapter *adapter)
}
pcie_set_readrq(adapter->pdev, 4096);
- return 0;
+ if (be_physfn(adapter) && adapter->sriov_enabled) {
+ status = be_vf_setup(adapter);
+ if (status)
+ goto err;
+ }
+
+ return 0;
err:
be_clear(adapter);
return status;
@@ -3123,7 +3151,6 @@ static void __devexit be_remove(struct pci_dev *pdev)
be_ctrl_cleanup(adapter);
- kfree(adapter->vf_cfg);
be_sriov_disable(adapter);
be_msix_disable(adapter);
@@ -3138,30 +3165,12 @@ static void __devexit be_remove(struct pci_dev *pdev)
static int be_get_config(struct be_adapter *adapter)
{
int status;
- u8 mac[ETH_ALEN];
status = be_cmd_query_fw_cfg(adapter, &adapter->port_num,
&adapter->function_mode, &adapter->function_caps);
if (status)
return status;
- memset(mac, 0, ETH_ALEN);
-
- /* A default permanent address is given to each VF for Lancer*/
- if (be_physfn(adapter) || lancer_chip(adapter)) {
- status = be_cmd_mac_addr_query(adapter, mac,
- MAC_ADDRESS_TYPE_NETWORK, true /*permanent */, 0);
-
- if (status)
- return status;
-
- if (!is_valid_ether_addr(mac))
- return -EADDRNOTAVAIL;
-
- memcpy(adapter->netdev->dev_addr, mac, ETH_ALEN);
- memcpy(adapter->netdev->perm_addr, mac, ETH_ALEN);
- }
-
if (adapter->function_mode & 0x400)
adapter->max_vlans = BE_NUM_VLANS_SUPPORTED/4;
else
@@ -3310,18 +3319,13 @@ static int __devinit be_probe(struct pci_dev *pdev,
}
}
- be_sriov_enable(adapter);
- if (adapter->sriov_enabled) {
- adapter->vf_cfg = kcalloc(num_vfs,
- sizeof(struct be_vf_cfg), GFP_KERNEL);
-
- if (!adapter->vf_cfg)
- goto free_netdev;
- }
+ status = be_sriov_enable(adapter);
+ if (status)
+ goto free_netdev;
status = be_ctrl_init(adapter);
if (status)
- goto free_vf_cfg;
+ goto disable_sriov;
if (lancer_chip(adapter)) {
status = lancer_test_and_set_rdy_state(adapter);
@@ -3375,33 +3379,11 @@ static int __devinit be_probe(struct pci_dev *pdev,
if (status != 0)
goto unsetup;
- if (be_physfn(adapter) && adapter->sriov_enabled) {
- u8 mac_speed;
- u16 vf, lnk_speed;
-
- if (!lancer_chip(adapter)) {
- status = be_vf_eth_addr_config(adapter);
- if (status)
- goto unreg_netdev;
- }
-
- for (vf = 0; vf < num_vfs; vf++) {
- status = be_cmd_link_status_query(adapter, &mac_speed,
- &lnk_speed, vf + 1);
- if (!status)
- adapter->vf_cfg[vf].vf_tx_rate = lnk_speed * 10;
- else
- goto unreg_netdev;
- }
- }
-
dev_info(&pdev->dev, "%s port %d\n", nic_name(pdev), adapter->port_num);
schedule_delayed_work(&adapter->work, msecs_to_jiffies(100));
return 0;
-unreg_netdev:
- unregister_netdev(netdev);
unsetup:
be_clear(adapter);
msix_disable:
@@ -3410,10 +3392,9 @@ stats_clean:
be_stats_cleanup(adapter);
ctrl_clean:
be_ctrl_cleanup(adapter);
-free_vf_cfg:
- kfree(adapter->vf_cfg);
+disable_sriov:
+ be_sriov_disable(adapter);
free_netdev:
- be_sriov_disable(adapter);
free_netdev(netdev);
pci_set_drvdata(pdev, NULL);
rel_reg:
--
1.7.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox