* Re: [PATCH net -v2] [BUGFIX] bonding: use local function pointer of bond->recv_probe in bond_handle_frame
From: Jay Vosburgh @ 2011-10-19 17:59 UTC (permalink / raw)
To: David Miller
Cc: mitsuo.hayasaka.hu, andy, netdev, linux-kernel, yrl.pp-manager.tt,
eric.dumazet, xiyou.wangcong
In-Reply-To: <20111019.000311.1490092497677136273.davem@davemloft.net>
David Miller <davem@davemloft.net> wrote:
>From: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
>Date: Thu, 13 Oct 2011 11:04:29 +0900
>
>> The bond->recv_probe is called in bond_handle_frame() when
>> a packet is received, but bond_close() sets it to NULL. So,
>> a panic occurs when both functions work in parallel.
>>
>> Why this happens:
>> After null pointer check of bond->recv_probe, an sk_buff is
>> duplicated and bond->recv_probe is called in bond_handle_frame.
>> So, a panic occurs when bond_close() is called between the
>> check and call of bond->recv_probe.
>>
>> Patch:
>> This patch uses a local function pointer of bond->recv_probe
>> in bond_handle_frame(). So, it can avoid the null pointer
>> dereference.
>>
>>
>> Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
>> Cc: Jay Vosburgh <fubar@us.ibm.com>
>> Cc: Andy Gospodarek <andy@greyhouse.net>
>> Cc: Eric Dumazet <eric.dumazet@gmail.com>
>> Cc: WANG Cong <xiyou.wangcong@gmail.com>
>
>Bonding folks please review this, thanks.
>
Looks reasonable. Even if by some quirk of timing the
recv_probe function ends up being entered after bond_close has
completed, it doesn't look like there is a risk of those functions
misbehaving (because bond_close doesn't deallocate the data structures).
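For readers following along, the pattern under review can be sketched in userspace roughly as below. This is only an illustration, not the bonding code: `struct fake_bond`, `handle_frame()`, `fake_close()` and `double_frame()` are made-up names, and C11 atomics stand in for whatever single-read primitive the kernel side would use. The point is that the shared pointer is loaded once into a local, and both the NULL check and the call use that local.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the real bonding structs. */
typedef int (*probe_fn)(int frame);

struct fake_bond {
    _Atomic(probe_fn) recv_probe;   /* a concurrent "close" may set this to NULL */
};

/* Load the shared pointer once, then test and call only the local copy. */
static int handle_frame(struct fake_bond *bond, int frame)
{
    probe_fn probe;

    assert(bond != NULL);
    probe = atomic_load(&bond->recv_probe);
    if (!probe)
        return -1;          /* no probe installed, nothing to call */
    return probe(frame);    /* does not re-read bond->recv_probe */
}

/* Models the close path clearing the pointer. */
static void fake_close(struct fake_bond *bond)
{
    atomic_store(&bond->recv_probe, (probe_fn)NULL);
}

/* Example probe used only for illustration. */
static int double_frame(int frame)
{
    return frame * 2;
}
```

Even if `fake_close()` runs between the load and the call, the local copy is still a valid function, which matches the observation above that a late call is harmless as long as the data it touches has not been freed.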
-J
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply
* Re: [patch net-next-2.6] net: introduce ethernet teaming device
From: Jiri Pirko @ 2011-10-19 17:39 UTC (permalink / raw)
To: Benjamin Poirier
Cc: netdev, davem, eric.dumazet, bhutchings, shemminger, fubar, andy,
tgraf, ebiederm, mirqus, kaber, greearb, jesse
In-Reply-To: <20111019172624.GB21324@synalogic.ca>
Wed, Oct 19, 2011 at 07:26:24PM CEST, benjamin.poirier@gmail.com wrote:
>Hi Jiri, just a few late comments:
>
>On 11/10/04 16:15, Jiri Pirko wrote:
>> This patch introduces a new network device called team. It is supposed
>> to be a very fast, simple, userspace-driven alternative to the existing
>> bonding driver.
>>
>> A userspace library called libteam, with a couple of demo apps, is
>> available here:
>> https://github.com/jpirko/libteam
>> Note it's still in its diapers atm.
>>
>> team<->libteam use generic netlink for communication. That and rtnl
>> are supposed to be the only ways to configure a team device, no sysfs etc.
>>
>> In the near future a Python binding for libteam will be introduced. A
>> daemon providing arpmon/miimon active-backup functionality will also
>> be introduced. Everything necessary is already implemented in the kernel
>> team driver.
>>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>[...]
>
>> +/******************************
>> + * Round-robin mode definition
>> + ******************************/
>> +
>> +static struct team_port *__get_first_port_up(struct team *team,
>> + struct team_port *port)
>
>This is more like __get_"next"_port_up() no?
Might be.
>
>> +{
>> + struct team_port *cur;
>> +
>> + if (port->linkup)
>> + return port;
>> + cur = port;
>> + list_for_each_entry_continue_rcu(cur, &team->port_list, list)
>> + if (cur->linkup)
>> + return cur;
>> + list_for_each_entry_rcu(cur, &team->port_list, list) {
>> + if (cur == port)
>> + break;
>> + if (cur->linkup)
>> + return cur;
>> + }
>> + return NULL;
>> +}
>> +
>
>[...]
>
>> +
>> +
>> +/****************
>> + * Mode handling
>> + ****************/
>> +
>> +static const struct team_mode *team_modes[] = {
>> + &rr_mode,
>> + &ab_mode,
>> +};
>> +
>> +static const int team_mode_count = ARRAY_SIZE(team_modes);
>> +
>> +static int team_find_mode(const char *kind)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < team_mode_count; i++) {
>> + const struct team_mode *mode = team_modes[i];
>> +
>> + if (strcmp(mode->kind, kind) == 0)
>> + return i;
>> + }
>> + return -ENOENT;
>> +}
>> +
>> +/*
>> + * We can benefit from the fact that it's ensured no port is present
>> + * at the time of mode change.
>> + */
>> +static void __team_change_mode(struct team *team, const int mode_index)
>> +{
>> + const struct team_mode *mode = team_modes[mode_index];
>
>team_uninit() calls __team_change_mode(team, -1) which will therefore
>dereference team_modes[-1]. Is this always safe?
I changed these bits. A new patch is coming soon...
Thanks.
Jirka
>
>-Ben
^ permalink raw reply
* Linux Kernel Development Training
From: Eduardo Panisset @ 2011-10-19 17:27 UTC (permalink / raw)
To: netdev
Hi Guys,
Sorry if this is not the most appropriate place to ask, but on the other
hand this is the hot spot for all real network development, and what
I'm looking for has a lot to do with it.
I would like to ask you guys where I could find excellent training
on Linux Kernel Development, Linux Device Driver Development and in
particular Linux Network Development, including the wifi 802.11 stack
implementation.
I'm a software engineer at Nokia Institute of Technology/Brazil and
here we have already touched the Linux kernel code, making some
modifications to the IP/IPv6 layer and also reporting bugs to this
list, but I would like to acquire more consistent and organized
knowledge about all those things.
Next year we will be involved with a great deal of this kind of
development in our next projects.
Thanks in Advance,
Eduardo Panisset.
^ permalink raw reply
* Re: [patch net-next-2.6] net: introduce ethernet teaming device
From: Benjamin Poirier @ 2011-10-19 17:26 UTC (permalink / raw)
To: Jiri Pirko
Cc: netdev, davem, eric.dumazet, bhutchings, shemminger, fubar, andy,
tgraf, ebiederm, mirqus, kaber, greearb, jesse
In-Reply-To: <1317737703-19457-1-git-send-email-jpirko@redhat.com>
Hi Jiri, just a few late comments:
On 11/10/04 16:15, Jiri Pirko wrote:
> This patch introduces a new network device called team. It is supposed
> to be a very fast, simple, userspace-driven alternative to the existing
> bonding driver.
>
> A userspace library called libteam, with a couple of demo apps, is
> available here:
> https://github.com/jpirko/libteam
> Note it's still in its diapers atm.
>
> team<->libteam use generic netlink for communication. That and rtnl
> are supposed to be the only ways to configure a team device, no sysfs etc.
>
> In the near future a Python binding for libteam will be introduced. A
> daemon providing arpmon/miimon active-backup functionality will also
> be introduced. Everything necessary is already implemented in the kernel
> team driver.
>
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
[...]
> +/******************************
> + * Round-robin mode definition
> + ******************************/
> +
> +static struct team_port *__get_first_port_up(struct team *team,
> + struct team_port *port)
This is more like __get_"next"_port_up() no?
> +{
> + struct team_port *cur;
> +
> + if (port->linkup)
> + return port;
> + cur = port;
> + list_for_each_entry_continue_rcu(cur, &team->port_list, list)
> + if (cur->linkup)
> + return cur;
> + list_for_each_entry_rcu(cur, &team->port_list, list) {
> + if (cur == port)
> + break;
> + if (cur->linkup)
> + return cur;
> + }
> + return NULL;
> +}
> +
[...]
> +
> +
> +/****************
> + * Mode handling
> + ****************/
> +
> +static const struct team_mode *team_modes[] = {
> + &rr_mode,
> + &ab_mode,
> +};
> +
> +static const int team_mode_count = ARRAY_SIZE(team_modes);
> +
> +static int team_find_mode(const char *kind)
> +{
> + int i;
> +
> + for (i = 0; i < team_mode_count; i++) {
> + const struct team_mode *mode = team_modes[i];
> +
> + if (strcmp(mode->kind, kind) == 0)
> + return i;
> + }
> + return -ENOENT;
> +}
> +
> +/*
> + * We can benefit from the fact that it's ensured no port is present
> + * at the time of mode change.
> + */
> +static void __team_change_mode(struct team *team, const int mode_index)
> +{
> + const struct team_mode *mode = team_modes[mode_index];
team_uninit() calls __team_change_mode(team, -1) which will therefore
dereference team_modes[-1]. Is this always safe?
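To make the concern concrete, here is a small userspace sketch of one way to guard such a lookup. It is not taken from the team driver: `struct mode`, the kind strings, `find_mode()` and `mode_by_index()` are hypothetical, and the lookup returns -1 rather than -ENOENT. The idea is simply that an index of -1 (meaning "no mode") is rejected before the table is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified mode descriptor. */
struct mode {
    const char *kind;
};

static const struct mode rr_mode = { "roundrobin" };
static const struct mode ab_mode = { "activebackup" };

static const struct mode *const modes[] = { &rr_mode, &ab_mode };
#define MODE_COUNT ((int)(sizeof(modes) / sizeof(modes[0])))

/* Returns the table index for kind, or -1 when kind is unknown. */
static int find_mode(const char *kind)
{
    int i;

    assert(kind != NULL);
    for (i = 0; i < MODE_COUNT; i++)
        if (strcmp(modes[i]->kind, kind) == 0)
            return i;
    return -1;
}

/* Returns NULL for any out-of-range index, including -1 ("no mode"). */
static const struct mode *mode_by_index(int idx)
{
    if (idx < 0 || idx >= MODE_COUNT)
        return NULL;
    return modes[idx];
}
```

With a helper like this, a caller passing -1 gets NULL back and can skip the mode-specific setup instead of reading whatever happens to sit before the array.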
-Ben
^ permalink raw reply
* Re: [PATCH] Disable TCP_DEBUG and FASTRETRANS_DEBUG by default
From: Flavio Leitner @ 2011-10-19 17:16 UTC (permalink / raw)
To: David Miller
Cc: dpmcgee, netdev, kuznet, jmorris, yoshfuji, kaber, linux-kernel
In-Reply-To: <20111017.175238.18474279385644215.davem@davemloft.net>
On Mon, 17 Oct 2011 17:52:38 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:
> From: Dan McGee <dpmcgee@gmail.com>
> Date: Mon, 17 Oct 2011 15:25:24 -0500
>
> > If these are truly debug options, they should be turned off by default
> > and can be tweaked if necessary. Fix one usage of the flag to use #if
> > instead of #ifdef so defining to zero is acceptable.
> >
> > Signed-off-by: Dan McGee <dpmcgee@gmail.com>
>
> Illegal window shrinks are a serious issue, and the fact that everyone
> will see those messages by default and sometimes report them has been
> tremendously useful.
>
Agreed. I recently got a bug report because of that message. It is useful
and doesn't disturb anything when the network is fine, so please don't remove it.
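As background on the `#if` versus `#ifdef` point in the quoted patch description, the difference can be shown with a tiny standalone example. `DEBUG_FLAG_SKETCH` is a hypothetical macro, not one of the TCP flags, and the helper names are invented for illustration.

```c
#include <assert.h>

#define DEBUG_FLAG_SKETCH 0   /* hypothetical flag: defined, but set to zero */

/* #ifdef only asks whether the macro exists; its value is ignored. */
static int seen_by_ifdef(void)
{
#ifdef DEBUG_FLAG_SKETCH
    return 1;   /* taken, because the macro is defined */
#else
    return 0;
#endif
}

/* #if evaluates the macro's value, so zero disables the branch. */
static int seen_by_if(void)
{
#if DEBUG_FLAG_SKETCH
    return 1;
#else
    return 0;   /* taken, because the value is zero */
#endif
}

/* Small self-check tying the two together. */
static void check_flag_handling(void)
{
    assert(seen_by_ifdef() == 1);
    assert(seen_by_if() == 0);
}
```

So code guarded by `#ifdef` stays enabled when a flag is defined to 0, which is why turning a flag off by value requires the `#if` form.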
fbl
^ permalink raw reply
* [PATCH 10/10] RDMA/cxgb4: Add query_qp support in driver to query the qp state before flushing.
From: Vipul Pandya @ 2011-10-19 17:11 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 19 +++++++++++++++++++
drivers/infiniband/hw/cxgb4/provider.c | 2 ++
drivers/infiniband/hw/cxgb4/qp.c | 11 +++++++++++
3 files changed, 32 insertions(+), 0 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 9e26b76..8aa1744 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -550,6 +550,23 @@ static inline int c4iw_convert_state(enum ib_qp_state ib_state)
}
}
+static inline int to_ib_qp_state(int c4iw_qp_state)
+{
+ switch (c4iw_qp_state) {
+ case C4IW_QP_STATE_IDLE:
+ return IB_QPS_INIT;
+ case C4IW_QP_STATE_RTS:
+ return IB_QPS_RTS;
+ case C4IW_QP_STATE_CLOSING:
+ return IB_QPS_SQD;
+ case C4IW_QP_STATE_TERMINATE:
+ return IB_QPS_SQE;
+ case C4IW_QP_STATE_ERROR:
+ return IB_QPS_ERR;
+ }
+ return IB_QPS_ERR;
+}
+
static inline u32 c4iw_ib_to_tpt_access(int a)
{
return (a & IB_ACCESS_REMOTE_WRITE ? FW_RI_MEM_ACCESS_REM_WRITE : 0) |
@@ -845,6 +862,8 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd,
struct ib_udata *udata);
int c4iw_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
int attr_mask, struct ib_udata *udata);
+int c4iw_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+ int attr_mask, struct ib_qp_init_attr *init_attr);
struct ib_qp *c4iw_get_qp(struct ib_device *dev, int qpn);
u32 c4iw_rqtpool_alloc(struct c4iw_rdev *rdev, int size);
void c4iw_rqtpool_free(struct c4iw_rdev *rdev, u32 addr, int size);
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index db1debc..3cb1128 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -443,6 +443,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
(1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) |
(1ull << IB_USER_VERBS_CMD_CREATE_QP) |
(1ull << IB_USER_VERBS_CMD_MODIFY_QP) |
+ (1ull << IB_USER_VERBS_CMD_QUERY_QP) |
(1ull << IB_USER_VERBS_CMD_POLL_CQ) |
(1ull << IB_USER_VERBS_CMD_DESTROY_QP) |
(1ull << IB_USER_VERBS_CMD_POST_SEND) |
@@ -465,6 +466,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.destroy_ah = c4iw_ah_destroy;
dev->ibdev.create_qp = c4iw_create_qp;
dev->ibdev.modify_qp = c4iw_ib_modify_qp;
+ dev->ibdev.query_qp = c4iw_ib_query_qp;
dev->ibdev.destroy_qp = c4iw_destroy_qp;
dev->ibdev.create_cq = c4iw_create_cq;
dev->ibdev.destroy_cq = c4iw_destroy_cq;
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 215b66a..a7b8713 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1696,3 +1696,14 @@ struct ib_qp *c4iw_get_qp(struct ib_device *dev, int qpn)
PDBG("%s ib_dev %p qpn 0x%x\n", __func__, dev, qpn);
return (struct ib_qp *)get_qhp(to_c4iw_dev(dev), qpn);
}
+
+int c4iw_ib_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+ int attr_mask, struct ib_qp_init_attr *init_attr)
+{
+ struct c4iw_qp *qhp = to_c4iw_qp(ibqp);
+
+ memset(attr, 0, sizeof *attr);
+ memset(init_attr, 0, sizeof *init_attr);
+ attr->qp_state = to_ib_qp_state(qhp->attr.state);
+ return 0;
+}
--
1.7.1
^ permalink raw reply related
* [PATCH 09/10] RDMA/cxgb4: remove kfifo usage
From: Vipul Pandya @ 2011-10-19 17:11 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
Using kfifos for ID management was limiting the number of QPs and
preventing NP384 MPI jobs. So replace it with a simple bitmap
allocator.
Remove IDs from the IDR tables before deallocating them. This bug was
causing the BUG_ON() in insert_handle() to fire because the ID was
getting reused before being removed from the IDR table.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
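Note for readers: the "simple bitmap allocator" idea from the description above can be sketched in userspace as follows. This is a simplified illustration under stated assumptions, not the driver's id_table.c: the names (`struct id_table`, `id_alloc()`, `ID_NONE`, and so on) are hypothetical, there is no locking, no reserved range, and no pseudo-random skipping. It keeps one bit per ID plus a search hint, scans from the hint, and wraps around once.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define ID_NONE UINT32_MAX   /* hypothetical "no id available" marker */

struct id_table {
    uint32_t start;        /* first id handed out */
    uint32_t max;          /* number of ids tracked */
    uint32_t last;         /* bit index where the next search begins */
    unsigned char *bits;   /* one bit per id, set means in use */
};

/* Allocates a zeroed bitmap; returns 0 on success, -1 on allocation failure. */
static int id_table_init(struct id_table *t, uint32_t start, uint32_t num)
{
    assert(t != NULL && num > 0);
    t->bits = calloc((num + CHAR_BIT - 1) / CHAR_BIT, 1);
    if (!t->bits)
        return -1;
    t->start = start;
    t->max = num;
    t->last = 0;
    return 0;
}

/* Scans from the hint, wrapping once; returns ID_NONE when the table is full. */
static uint32_t id_alloc(struct id_table *t)
{
    uint32_t n, i;

    for (n = 0; n < t->max; n++) {
        i = (t->last + n) % t->max;
        if (!(t->bits[i / CHAR_BIT] & (1u << (i % CHAR_BIT)))) {
            t->bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
            t->last = (i + 1) % t->max;
            return t->start + i;
        }
    }
    return ID_NONE;
}

/* Clears the bit for id so it can be handed out again. */
static void id_free(struct id_table *t, uint32_t id)
{
    uint32_t i = id - t->start;

    assert(id >= t->start && i < t->max);
    t->bits[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}

static void id_table_destroy(struct id_table *t)
{
    free(t->bits);
    t->bits = NULL;
}
```

Because a freed ID becomes allocatable immediately, any external table keyed by that ID has to drop its entry before `id_free()` is called, which is the ordering issue the second paragraph of the description addresses.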
drivers/infiniband/hw/cxgb4/Makefile | 2 +-
drivers/infiniband/hw/cxgb4/device.c | 37 +++++---
drivers/infiniband/hw/cxgb4/id_table.c | 112 ++++++++++++++++++++++++
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 35 ++++++--
drivers/infiniband/hw/cxgb4/mem.c | 10 +--
drivers/infiniband/hw/cxgb4/provider.c | 9 +--
drivers/infiniband/hw/cxgb4/resource.c | 148 ++++++++------------------------
7 files changed, 203 insertions(+), 150 deletions(-)
create mode 100644 drivers/infiniband/hw/cxgb4/id_table.c
diff --git a/drivers/infiniband/hw/cxgb4/Makefile b/drivers/infiniband/hw/cxgb4/Makefile
index 46b878c..e11cf72 100644
--- a/drivers/infiniband/hw/cxgb4/Makefile
+++ b/drivers/infiniband/hw/cxgb4/Makefile
@@ -2,4 +2,4 @@ ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
obj-$(CONFIG_INFINIBAND_CXGB4) += iw_cxgb4.o
-iw_cxgb4-y := device.o cm.o provider.o mem.o cq.o qp.o resource.o ev.o
+iw_cxgb4-y := device.o cm.o provider.o mem.o cq.o qp.o resource.o ev.o id_table.o
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 8545629..c8fd1d8 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -252,25 +252,26 @@ static int stats_show(struct seq_file *seq, void *v)
{
struct c4iw_dev *dev = seq->private;
- seq_printf(seq, " Object: %10s %10s %10s\n", "Total", "Current", "Max");
- seq_printf(seq, " PDID: %10llu %10llu %10llu\n",
+ seq_printf(seq, " Object: %10s %10s %10s %10s\n", "Total", "Current",
+ "Max", "Fail");
+ seq_printf(seq, " PDID: %10llu %10llu %10llu %10llu\n",
dev->rdev.stats.pd.total, dev->rdev.stats.pd.cur,
- dev->rdev.stats.pd.max);
- seq_printf(seq, " QID: %10llu %10llu %10llu\n",
+ dev->rdev.stats.pd.max, dev->rdev.stats.pd.fail);
+ seq_printf(seq, " QID: %10llu %10llu %10llu %10llu\n",
dev->rdev.stats.qid.total, dev->rdev.stats.qid.cur,
- dev->rdev.stats.qid.max);
- seq_printf(seq, " TPTMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.qid.max, dev->rdev.stats.qid.fail);
+ seq_printf(seq, " TPTMEM: %10llu %10llu %10llu %10llu\n",
dev->rdev.stats.stag.total, dev->rdev.stats.stag.cur,
- dev->rdev.stats.stag.max);
- seq_printf(seq, " PBLMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.stag.max, dev->rdev.stats.stag.fail);
+ seq_printf(seq, " PBLMEM: %10llu %10llu %10llu %10llu\n",
dev->rdev.stats.pbl.total, dev->rdev.stats.pbl.cur,
- dev->rdev.stats.pbl.max);
- seq_printf(seq, " RQTMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.pbl.max, dev->rdev.stats.pbl.fail);
+ seq_printf(seq, " RQTMEM: %10llu %10llu %10llu %10llu\n",
dev->rdev.stats.rqt.total, dev->rdev.stats.rqt.cur,
- dev->rdev.stats.rqt.max);
- seq_printf(seq, " OCQPMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.rqt.max, dev->rdev.stats.rqt.fail);
+ seq_printf(seq, " OCQPMEM: %10llu %10llu %10llu %10llu\n",
dev->rdev.stats.ocqp.total, dev->rdev.stats.ocqp.cur,
- dev->rdev.stats.ocqp.max);
+ dev->rdev.stats.ocqp.max, dev->rdev.stats.ocqp.fail);
seq_printf(seq, " DB FULL: %10llu\n", dev->rdev.stats.db_full);
seq_printf(seq, " DB EMPTY: %10llu\n", dev->rdev.stats.db_empty);
seq_printf(seq, " DB DROP: %10llu\n", dev->rdev.stats.db_drop);
@@ -292,11 +293,17 @@ static ssize_t stats_clear(struct file *file, const char __user *buf,
mutex_lock(&dev->rdev.stats.lock);
dev->rdev.stats.pd.max = 0;
+ dev->rdev.stats.pd.fail = 0;
dev->rdev.stats.qid.max = 0;
+ dev->rdev.stats.qid.fail = 0;
dev->rdev.stats.stag.max = 0;
+ dev->rdev.stats.stag.fail = 0;
dev->rdev.stats.pbl.max = 0;
+ dev->rdev.stats.pbl.fail = 0;
dev->rdev.stats.rqt.max = 0;
+ dev->rdev.stats.rqt.fail = 0;
dev->rdev.stats.ocqp.max = 0;
+ dev->rdev.stats.ocqp.fail = 0;
dev->rdev.stats.db_full = 0;
dev->rdev.stats.db_empty = 0;
dev->rdev.stats.db_drop = 0;
@@ -350,8 +357,8 @@ void c4iw_release_dev_ucontext(struct c4iw_rdev *rdev,
entry = list_entry(pos, struct c4iw_qid_list, entry);
list_del_init(&entry->entry);
if (!(entry->qid & rdev->qpmask)) {
- c4iw_put_resource(&rdev->resource.qid_fifo, entry->qid,
- &rdev->resource.qid_fifo_lock);
+ c4iw_put_resource(&rdev->resource.qid_table,
+ entry->qid);
mutex_lock(&rdev->stats.lock);
rdev->stats.qid.cur -= rdev->qpmask + 1;
mutex_unlock(&rdev->stats.lock);
diff --git a/drivers/infiniband/hw/cxgb4/id_table.c b/drivers/infiniband/hw/cxgb4/id_table.c
new file mode 100644
index 0000000..f95e5df
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb4/id_table.c
@@ -0,0 +1,112 @@
+/*
+ * Copyright (c) 2011 Chelsio Communications. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/kernel.h>
+#include <linux/random.h>
+#include "iw_cxgb4.h"
+
+#define RANDOM_SKIP 16
+
+/*
+ * Trivial bitmap-based allocator. If the random flag is set, the
+ * allocator is designed to:
+ * - pseudo-randomize the id returned such that it is not trivially predictable.
+ * - avoid reuse of recently used id (at the expense of predictability)
+ */
+u32 c4iw_id_alloc(struct c4iw_id_table *alloc)
+{
+ unsigned long flags;
+ u32 obj;
+
+ spin_lock_irqsave(&alloc->lock, flags);
+
+ obj = find_next_zero_bit(alloc->table, alloc->max, alloc->last);
+ if (obj >= alloc->max)
+ obj = find_first_zero_bit(alloc->table, alloc->max);
+
+ if (obj < alloc->max) {
+ if (alloc->flags & C4IW_ID_TABLE_F_RANDOM)
+ alloc->last += random32() % RANDOM_SKIP;
+ else
+ alloc->last = obj + 1;
+ if (alloc->last >= alloc->max)
+ alloc->last = 0;
+ set_bit(obj, alloc->table);
+ obj += alloc->start;
+ } else
+ obj = -1;
+
+ spin_unlock_irqrestore(&alloc->lock, flags);
+ return obj;
+}
+
+void c4iw_id_free(struct c4iw_id_table *alloc, u32 obj)
+{
+ unsigned long flags;
+
+ obj -= alloc->start;
+ BUG_ON((int)obj < 0);
+
+ spin_lock_irqsave(&alloc->lock, flags);
+ clear_bit(obj, alloc->table);
+ spin_unlock_irqrestore(&alloc->lock, flags);
+}
+
+int c4iw_id_table_alloc(struct c4iw_id_table *alloc, u32 start, u32 num,
+ u32 reserved, u32 flags)
+{
+ int i;
+
+ alloc->start = start;
+ alloc->flags = flags;
+ if (flags & C4IW_ID_TABLE_F_RANDOM)
+ alloc->last = random32() % RANDOM_SKIP;
+ else
+ alloc->last = 0;
+ alloc->max = num;
+ spin_lock_init(&alloc->lock);
+ alloc->table = kmalloc(BITS_TO_LONGS(num) * sizeof(long),
+ GFP_KERNEL);
+ if (!alloc->table)
+ return -ENOMEM;
+
+ bitmap_zero(alloc->table, num);
+ if (!(alloc->flags & C4IW_ID_TABLE_F_EMPTY))
+ for (i = 0; i < reserved; ++i)
+ set_bit(i, alloc->table);
+
+ return 0;
+}
+
+void c4iw_id_table_free(struct c4iw_id_table *alloc)
+{
+ kfree(alloc->table);
+}
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 75d643c..9e26b76 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -45,7 +45,6 @@
#include <linux/kref.h>
#include <linux/timer.h>
#include <linux/io.h>
-#include <linux/kfifo.h>
#include <asm/byteorder.h>
@@ -79,13 +78,22 @@ static inline void *cplhdr(struct sk_buff *skb)
return skb->data;
}
+#define C4IW_ID_TABLE_F_RANDOM 1 /* Pseudo-randomize the id's returned */
+#define C4IW_ID_TABLE_F_EMPTY 2 /* Table is initially empty */
+
+struct c4iw_id_table {
+ u32 flags;
+ u32 start; /* logical minimal id */
+ u32 last; /* hint for find */
+ u32 max;
+ spinlock_t lock;
+ unsigned long *table;
+};
+
struct c4iw_resource {
- struct kfifo tpt_fifo;
- spinlock_t tpt_fifo_lock;
- struct kfifo qid_fifo;
- spinlock_t qid_fifo_lock;
- struct kfifo pdid_fifo;
- spinlock_t pdid_fifo_lock;
+ struct c4iw_id_table tpt_table;
+ struct c4iw_id_table qid_table;
+ struct c4iw_id_table pdid_table;
};
struct c4iw_qid_list {
@@ -107,6 +115,7 @@ struct c4iw_stat {
u64 total;
u64 cur;
u64 max;
+ u64 fail;
};
struct c4iw_stats {
@@ -253,7 +262,7 @@ static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr,
if (lock)
spin_lock_irq(&rhp->lock);
ret = idr_get_new_above(idr, handle, id, &newid);
- BUG_ON(newid != id);
+ BUG_ON(!ret && newid != id);
if (lock)
spin_unlock_irq(&rhp->lock);
} while (ret == -EAGAIN);
@@ -754,14 +763,20 @@ static inline int compute_wscale(int win)
return wscale;
}
+u32 c4iw_id_alloc(struct c4iw_id_table *alloc);
+void c4iw_id_free(struct c4iw_id_table *alloc, u32 obj);
+int c4iw_id_table_alloc(struct c4iw_id_table *alloc, u32 start, u32 num,
+ u32 reserved, u32 flags);
+void c4iw_id_table_free(struct c4iw_id_table *alloc);
+
typedef int (*c4iw_handler_func)(struct c4iw_dev *dev, struct sk_buff *skb);
int c4iw_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new,
struct l2t_entry *l2t);
void c4iw_put_qpid(struct c4iw_rdev *rdev, u32 qpid,
struct c4iw_dev_ucontext *uctx);
-u32 c4iw_get_resource(struct kfifo *fifo, spinlock_t *lock);
-void c4iw_put_resource(struct kfifo *fifo, u32 entry, spinlock_t *lock);
+u32 c4iw_get_resource(struct c4iw_id_table *id_table);
+void c4iw_put_resource(struct c4iw_id_table *id_table, u32 entry);
int c4iw_init_resource(struct c4iw_rdev *rdev, u32 nr_tpt, u32 nr_pdid);
int c4iw_init_ctrl_qp(struct c4iw_rdev *rdev);
int c4iw_pblpool_create(struct c4iw_rdev *rdev);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 2a87379..57e07c6 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -131,8 +131,7 @@ static int write_tpt_entry(struct c4iw_rdev *rdev, u32 reset_tpt_entry,
stag_idx = (*stag) >> 8;
if ((!reset_tpt_entry) && (*stag == T4_STAG_UNSET)) {
- stag_idx = c4iw_get_resource(&rdev->resource.tpt_fifo,
- &rdev->resource.tpt_fifo_lock);
+ stag_idx = c4iw_get_resource(&rdev->resource.tpt_table);
if (!stag_idx)
return -ENOMEM;
mutex_lock(&rdev->stats.lock);
@@ -171,8 +170,7 @@ static int write_tpt_entry(struct c4iw_rdev *rdev, u32 reset_tpt_entry,
sizeof(tpt), &tpt);
if (reset_tpt_entry) {
- c4iw_put_resource(&rdev->resource.tpt_fifo, stag_idx,
- &rdev->resource.tpt_fifo_lock);
+ c4iw_put_resource(&rdev->resource.tpt_table, stag_idx);
mutex_lock(&rdev->stats.lock);
rdev->stats.stag.cur -= 32;
mutex_unlock(&rdev->stats.lock);
@@ -695,8 +693,8 @@ int c4iw_dealloc_mw(struct ib_mw *mw)
mhp = to_c4iw_mw(mw);
rhp = mhp->rhp;
mmid = (mw->rkey) >> 8;
- deallocate_window(&rhp->rdev, mhp->attr.stag);
remove_handle(rhp, &rhp->mmidr, mmid);
+ deallocate_window(&rhp->rdev, mhp->attr.stag);
kfree(mhp);
PDBG("%s ib_mw %p mmid 0x%x ptr %p\n", __func__, mw, mmid, mhp);
return 0;
@@ -798,12 +796,12 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr)
mhp = to_c4iw_mr(ib_mr);
rhp = mhp->rhp;
mmid = mhp->attr.stag >> 8;
+ remove_handle(rhp, &rhp->mmidr, mmid);
dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
mhp->attr.pbl_addr);
if (mhp->attr.pbl_size)
c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr,
mhp->attr.pbl_size << 3);
- remove_handle(rhp, &rhp->mmidr, mmid);
if (mhp->kva)
kfree((void *) (unsigned long) mhp->kva);
if (mhp->umem)
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index c2554ef..db1debc 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -188,8 +188,7 @@ static int c4iw_deallocate_pd(struct ib_pd *pd)
php = to_c4iw_pd(pd);
rhp = php->rhp;
PDBG("%s ibpd %p pdid 0x%x\n", __func__, pd, php->pdid);
- c4iw_put_resource(&rhp->rdev.resource.pdid_fifo, php->pdid,
- &rhp->rdev.resource.pdid_fifo_lock);
+ c4iw_put_resource(&rhp->rdev.resource.pdid_table, php->pdid);
mutex_lock(&rhp->rdev.stats.lock);
rhp->rdev.stats.pd.cur--;
mutex_unlock(&rhp->rdev.stats.lock);
@@ -207,14 +206,12 @@ static struct ib_pd *c4iw_allocate_pd(struct ib_device *ibdev,
PDBG("%s ibdev %p\n", __func__, ibdev);
rhp = (struct c4iw_dev *) ibdev;
- pdid = c4iw_get_resource(&rhp->rdev.resource.pdid_fifo,
- &rhp->rdev.resource.pdid_fifo_lock);
+ pdid = c4iw_get_resource(&rhp->rdev.resource.pdid_table);
if (!pdid)
return ERR_PTR(-EINVAL);
php = kzalloc(sizeof(*php), GFP_KERNEL);
if (!php) {
- c4iw_put_resource(&rhp->rdev.resource.pdid_fifo, pdid,
- &rhp->rdev.resource.pdid_fifo_lock);
+ c4iw_put_resource(&rhp->rdev.resource.pdid_table, pdid);
return ERR_PTR(-ENOMEM);
}
php->pdid = pdid;
diff --git a/drivers/infiniband/hw/cxgb4/resource.c b/drivers/infiniband/hw/cxgb4/resource.c
index 1b948d1..cdef4d7 100644
--- a/drivers/infiniband/hw/cxgb4/resource.c
+++ b/drivers/infiniband/hw/cxgb4/resource.c
@@ -30,96 +30,25 @@
* SOFTWARE.
*/
/* Crude resource management */
-#include <linux/kernel.h>
-#include <linux/random.h>
-#include <linux/slab.h>
-#include <linux/kfifo.h>
#include <linux/spinlock.h>
-#include <linux/errno.h>
#include <linux/genalloc.h>
#include <linux/ratelimit.h>
#include "iw_cxgb4.h"
-#define RANDOM_SIZE 16
-
-static int __c4iw_init_resource_fifo(struct kfifo *fifo,
- spinlock_t *fifo_lock,
- u32 nr, u32 skip_low,
- u32 skip_high,
- int random)
-{
- u32 i, j, entry = 0, idx;
- u32 random_bytes;
- u32 rarray[16];
- spin_lock_init(fifo_lock);
-
- if (kfifo_alloc(fifo, nr * sizeof(u32), GFP_KERNEL))
- return -ENOMEM;
-
- for (i = 0; i < skip_low + skip_high; i++)
- kfifo_in(fifo, (unsigned char *) &entry, sizeof(u32));
- if (random) {
- j = 0;
- random_bytes = random32();
- for (i = 0; i < RANDOM_SIZE; i++)
- rarray[i] = i + skip_low;
- for (i = skip_low + RANDOM_SIZE; i < nr - skip_high; i++) {
- if (j >= RANDOM_SIZE) {
- j = 0;
- random_bytes = random32();
- }
- idx = (random_bytes >> (j * 2)) & 0xF;
- kfifo_in(fifo,
- (unsigned char *) &rarray[idx],
- sizeof(u32));
- rarray[idx] = i;
- j++;
- }
- for (i = 0; i < RANDOM_SIZE; i++)
- kfifo_in(fifo,
- (unsigned char *) &rarray[i],
- sizeof(u32));
- } else
- for (i = skip_low; i < nr - skip_high; i++)
- kfifo_in(fifo, (unsigned char *) &i, sizeof(u32));
-
- for (i = 0; i < skip_low + skip_high; i++)
- if (kfifo_out_locked(fifo, (unsigned char *) &entry,
- sizeof(u32), fifo_lock))
- break;
- return 0;
-}
-
-static int c4iw_init_resource_fifo(struct kfifo *fifo, spinlock_t * fifo_lock,
- u32 nr, u32 skip_low, u32 skip_high)
-{
- return __c4iw_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
- skip_high, 0);
-}
-
-static int c4iw_init_resource_fifo_random(struct kfifo *fifo,
- spinlock_t *fifo_lock,
- u32 nr, u32 skip_low, u32 skip_high)
-{
- return __c4iw_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
- skip_high, 1);
-}
-
-static int c4iw_init_qid_fifo(struct c4iw_rdev *rdev)
+static int c4iw_init_qid_table(struct c4iw_rdev *rdev)
{
u32 i;
- spin_lock_init(&rdev->resource.qid_fifo_lock);
-
- if (kfifo_alloc(&rdev->resource.qid_fifo, rdev->lldi.vr->qp.size *
- sizeof(u32), GFP_KERNEL))
+ if (c4iw_id_table_alloc(&rdev->resource.qid_table,
+ rdev->lldi.vr->qp.start,
+ rdev->lldi.vr->qp.size,
+ rdev->lldi.vr->qp.size, 0))
return -ENOMEM;
for (i = rdev->lldi.vr->qp.start;
- i < rdev->lldi.vr->qp.start + rdev->lldi.vr->qp.size; i++)
+ i < rdev->lldi.vr->qp.start + rdev->lldi.vr->qp.size; i++)
if (!(i & rdev->qpmask))
- kfifo_in(&rdev->resource.qid_fifo,
- (unsigned char *) &i, sizeof(u32));
+ c4iw_id_free(&rdev->resource.qid_table, i);
return 0;
}
@@ -127,44 +56,42 @@ static int c4iw_init_qid_fifo(struct c4iw_rdev *rdev)
int c4iw_init_resource(struct c4iw_rdev *rdev, u32 nr_tpt, u32 nr_pdid)
{
int err = 0;
- err = c4iw_init_resource_fifo_random(&rdev->resource.tpt_fifo,
- &rdev->resource.tpt_fifo_lock,
- nr_tpt, 1, 0);
+ err = c4iw_id_table_alloc(&rdev->resource.tpt_table, 0, nr_tpt, 1,
+ C4IW_ID_TABLE_F_RANDOM);
if (err)
goto tpt_err;
- err = c4iw_init_qid_fifo(rdev);
+ err = c4iw_init_qid_table(rdev);
if (err)
goto qid_err;
- err = c4iw_init_resource_fifo(&rdev->resource.pdid_fifo,
- &rdev->resource.pdid_fifo_lock,
- nr_pdid, 1, 0);
+ err = c4iw_id_table_alloc(&rdev->resource.pdid_table, 0,
+ nr_pdid, 1, 0);
if (err)
goto pdid_err;
return 0;
-pdid_err:
- kfifo_free(&rdev->resource.qid_fifo);
-qid_err:
- kfifo_free(&rdev->resource.tpt_fifo);
-tpt_err:
+ pdid_err:
+ c4iw_id_table_free(&rdev->resource.qid_table);
+ qid_err:
+ c4iw_id_table_free(&rdev->resource.tpt_table);
+ tpt_err:
return -ENOMEM;
}
/*
* returns 0 if no resource available
*/
-u32 c4iw_get_resource(struct kfifo *fifo, spinlock_t *lock)
+u32 c4iw_get_resource(struct c4iw_id_table *id_table)
{
u32 entry;
- if (kfifo_out_locked(fifo, (unsigned char *) &entry, sizeof(u32), lock))
- return entry;
- else
+ entry = c4iw_id_alloc(id_table);
+ if (entry == (u32)(-1))
return 0;
+ return entry;
}
-void c4iw_put_resource(struct kfifo *fifo, u32 entry, spinlock_t *lock)
+void c4iw_put_resource(struct c4iw_id_table *id_table, u32 entry)
{
PDBG("%s entry 0x%x\n", __func__, entry);
- kfifo_in_locked(fifo, (unsigned char *) &entry, sizeof(u32), lock);
+ c4iw_id_free(id_table, entry);
}
u32 c4iw_get_cqid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
@@ -181,8 +108,7 @@ u32 c4iw_get_cqid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
qid = entry->qid;
kfree(entry);
} else {
- qid = c4iw_get_resource(&rdev->resource.qid_fifo,
- &rdev->resource.qid_fifo_lock);
+ qid = c4iw_get_resource(&rdev->resource.qid_table);
if (!qid)
goto out;
mutex_lock(&rdev->stats.lock);
@@ -252,8 +178,7 @@ u32 c4iw_get_qpid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
qid = entry->qid;
kfree(entry);
} else {
- qid = c4iw_get_resource(&rdev->resource.qid_fifo,
- &rdev->resource.qid_fifo_lock);
+ qid = c4iw_get_resource(&rdev->resource.qid_table);
if (!qid)
goto out;
mutex_lock(&rdev->stats.lock);
@@ -311,9 +236,9 @@ void c4iw_put_qpid(struct c4iw_rdev *rdev, u32 qid,
void c4iw_destroy_resource(struct c4iw_resource *rscp)
{
- kfifo_free(&rscp->tpt_fifo);
- kfifo_free(&rscp->qid_fifo);
- kfifo_free(&rscp->pdid_fifo);
+ c4iw_id_table_free(&rscp->tpt_table);
+ c4iw_id_table_free(&rscp->qid_table);
+ c4iw_id_table_free(&rscp->pdid_table);
}
/*
@@ -326,16 +251,14 @@ u32 c4iw_pblpool_alloc(struct c4iw_rdev *rdev, int size)
{
unsigned long addr = gen_pool_alloc(rdev->pbl_pool, size);
PDBG("%s addr 0x%x size %d\n", __func__, (u32)addr, size);
- if (!addr)
- printk_ratelimited(KERN_WARNING MOD "%s: Out of PBL memory\n",
- pci_name(rdev->lldi.pdev));
+ mutex_lock(&rdev->stats.lock);
if (addr) {
- mutex_lock(&rdev->stats.lock);
rdev->stats.pbl.cur += roundup(size, 1 << MIN_PBL_SHIFT);
if (rdev->stats.pbl.cur > rdev->stats.pbl.max)
rdev->stats.pbl.max = rdev->stats.pbl.cur;
- mutex_unlock(&rdev->stats.lock);
- }
+ } else
+ rdev->stats.pbl.fail++;
+ mutex_unlock(&rdev->stats.lock);
return (u32)addr;
}
@@ -401,13 +324,14 @@ u32 c4iw_rqtpool_alloc(struct c4iw_rdev *rdev, int size)
if (!addr)
printk_ratelimited(KERN_WARNING MOD "%s: Out of RQT memory\n",
pci_name(rdev->lldi.pdev));
+ mutex_lock(&rdev->stats.lock);
if (addr) {
- mutex_lock(&rdev->stats.lock);
rdev->stats.rqt.cur += roundup(size << 6, 1 << MIN_RQT_SHIFT);
if (rdev->stats.rqt.cur > rdev->stats.rqt.max)
rdev->stats.rqt.max = rdev->stats.rqt.cur;
- mutex_unlock(&rdev->stats.lock);
- }
+ } else
+ rdev->stats.rqt.fail++;
+ mutex_unlock(&rdev->stats.lock);
return (u32)addr;
}
--
1.7.1
* [PATCH 08/10] RDMA/cxgb4: Use vmalloc for debugfs qp dump. Allows dumping thousands of qps.
From: Vipul Pandya @ 2011-10-19 17:11 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
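Use vmalloc() instead of kmalloc() for the debugfs qp dump buffer so that
thousands of qps can be dumped.
Also log active open failures of interest.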
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/cm.c | 18 ++++++++++++++++++
drivers/infiniband/hw/cxgb4/device.c | 4 ++--
2 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index c51818a..3d87d02 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1411,6 +1411,24 @@ static int act_open_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
return 0;
}
+ /*
+ * Log interesting failures.
+ */
+ switch (status) {
+ case CPL_ERR_CONN_RESET:
+ case CPL_ERR_CONN_TIMEDOUT:
+ break;
+ default:
+ printk(KERN_INFO MOD "Active open failure - "
+ "atid %u status %u errno %d %pI4:%u->%pI4:%u\n",
+ atid, status, status2errno(status),
+ &ep->com.local_addr.sin_addr.s_addr,
+ ntohs(ep->com.local_addr.sin_port),
+ &ep->com.remote_addr.sin_addr.s_addr,
+ ntohs(ep->com.remote_addr.sin_port));
+ break;
+ }
+
connect_reply_upcall(ep, status2errno(status));
state_set(&ep->com, DEAD);
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index bdb398f..8545629 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -121,7 +121,7 @@ static int qp_release(struct inode *inode, struct file *file)
printk(KERN_INFO "%s null qpd?\n", __func__);
return 0;
}
- kfree(qpd->buf);
+ vfree(qpd->buf);
kfree(qpd);
return 0;
}
@@ -145,7 +145,7 @@ static int qp_open(struct inode *inode, struct file *file)
spin_unlock_irq(&qpd->devp->lock);
qpd->bufsize = count * 128;
- qpd->buf = kmalloc(qpd->bufsize, GFP_KERNEL);
+ qpd->buf = vmalloc(qpd->bufsize);
if (!qpd->buf) {
ret = -ENOMEM;
goto err1;
--
1.7.1
* [PATCH 07/10] RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues.
From: Vipul Pandya @ 2011-10-19 17:11 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
- add module option db_fc_threshold which is the count of active QPs
that trigger automatic db flow control mode.
- automatically transition to/from flow control mode when the active qp
count crosses db_fc_threshold.
- add more db debugfs stats
- on DB DROP event from the LLD, recover all the iwarp queues.
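The threshold-driven transitions listed above can be sketched in isolation. This is a minimal user-space illustration, not the driver code: struct fc_dev and the fc_qp_created()/fc_qp_destroyed() helpers are hypothetical names, locking is omitted, and assert() stands in for the BUG_ON() in the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's types; not the real structures. */
enum db_state { NORMAL = 0, FLOW_CONTROL = 1, RECOVERY = 2 };

struct fc_dev {
	int qpcnt;                 /* number of active QPs */
	int threshold;             /* plays the role of db_fc_threshold */
	enum db_state state;
	unsigned long transitions; /* plays the role of db_state_transitions */
};

/* QP created: enter flow control once the count exceeds the threshold. */
static void fc_qp_created(struct fc_dev *d)
{
	if (++d->qpcnt > d->threshold && d->state == NORMAL) {
		d->state = FLOW_CONTROL;
		d->transitions++;
	}
}

/* QP destroyed: return to normal once the count is at or below it. */
static void fc_qp_destroyed(struct fc_dev *d)
{
	d->qpcnt--;
	assert(d->qpcnt >= 0);     /* the patch uses BUG_ON() here */
	if (d->qpcnt <= d->threshold && d->state == FLOW_CONTROL) {
		d->state = NORMAL;
		d->transitions++;
	}
}
```

Note that the RECOVERY state is never left by these helpers; in the patch only the recovery path itself moves the device out of that state.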
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/device.c | 176 ++++++++++++++++++++++++++++++-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 24 ++++-
drivers/infiniband/hw/cxgb4/qp.c | 47 ++++++++-
drivers/infiniband/hw/cxgb4/t4.h | 24 +++++
4 files changed, 259 insertions(+), 12 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 9062ed9..bdb398f 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -246,6 +246,8 @@ static const struct file_operations stag_debugfs_fops = {
.llseek = default_llseek,
};
+static char *db_state_str[] = {"NORMAL", "FLOW_CONTROL", "RECOVERY"};
+
static int stats_show(struct seq_file *seq, void *v)
{
struct c4iw_dev *dev = seq->private;
@@ -272,6 +274,9 @@ static int stats_show(struct seq_file *seq, void *v)
seq_printf(seq, " DB FULL: %10llu\n", dev->rdev.stats.db_full);
seq_printf(seq, " DB EMPTY: %10llu\n", dev->rdev.stats.db_empty);
seq_printf(seq, " DB DROP: %10llu\n", dev->rdev.stats.db_drop);
+ seq_printf(seq, " DB State: %s Transitions %llu\n",
+ db_state_str[dev->db_state],
+ dev->rdev.stats.db_state_transitions);
return 0;
}
@@ -295,6 +300,7 @@ static ssize_t stats_clear(struct file *file, const char __user *buf,
dev->rdev.stats.db_full = 0;
dev->rdev.stats.db_empty = 0;
dev->rdev.stats.db_drop = 0;
+ dev->rdev.stats.db_state_transitions = 0;
mutex_unlock(&dev->rdev.stats.lock);
return count;
}
@@ -677,8 +683,11 @@ static int disable_qp_db(int id, void *p, void *data)
static void stop_queues(struct uld_ctx *ctx)
{
spin_lock_irq(&ctx->dev->lock);
- ctx->dev->db_state = FLOW_CONTROL;
- idr_for_each(&ctx->dev->qpidr, disable_qp_db, NULL);
+ if (ctx->dev->db_state == NORMAL) {
+ ctx->dev->rdev.stats.db_state_transitions++;
+ ctx->dev->db_state = FLOW_CONTROL;
+ idr_for_each(&ctx->dev->qpidr, disable_qp_db, NULL);
+ }
spin_unlock_irq(&ctx->dev->lock);
}
@@ -693,9 +702,165 @@ static int enable_qp_db(int id, void *p, void *data)
static void resume_queues(struct uld_ctx *ctx)
{
spin_lock_irq(&ctx->dev->lock);
- ctx->dev->db_state = NORMAL;
- idr_for_each(&ctx->dev->qpidr, enable_qp_db, NULL);
+ if (ctx->dev->qpcnt <= db_fc_threshold &&
+ ctx->dev->db_state == FLOW_CONTROL) {
+ ctx->dev->db_state = NORMAL;
+ ctx->dev->rdev.stats.db_state_transitions++;
+ idr_for_each(&ctx->dev->qpidr, enable_qp_db, NULL);
+ }
+ spin_unlock_irq(&ctx->dev->lock);
+}
+
+struct qp_list {
+ unsigned idx;
+ struct c4iw_qp **qps;
+};
+
+static int add_and_ref_qp(int id, void *p, void *data)
+{
+ struct qp_list *qp_listp = data;
+ struct c4iw_qp *qp = p;
+
+ c4iw_qp_add_ref(&qp->ibqp);
+ qp_listp->qps[qp_listp->idx++] = qp;
+ return 0;
+}
+
+static int count_qps(int id, void *p, void *data)
+{
+ unsigned *countp = data;
+ (*countp)++;
+ return 0;
+}
+
+static void deref_qps(struct qp_list qp_list)
+{
+ int idx;
+
+ for (idx = 0; idx < qp_list.idx; idx++)
+ c4iw_qp_rem_ref(&qp_list.qps[idx]->ibqp);
+}
+
+static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list)
+{
+ int idx;
+ int ret;
+
+ for (idx = 0; idx < qp_list->idx; idx++) {
+ struct c4iw_qp *qp = qp_list->qps[idx];
+
+ ret = cxgb4_sync_txq_pidx(qp->rhp->rdev.lldi.ports[0],
+ qp->wq.sq.qid,
+ t4_sq_host_wq_pidx(&qp->wq),
+ t4_sq_wq_size(&qp->wq));
+ if (ret) {
+ printk(KERN_ERR MOD "%s: Fatal error - "
+ "DB overflow recovery failed - "
+ "error syncing SQ qid %u\n",
+ pci_name(ctx->lldi.pdev), qp->wq.sq.qid);
+ return;
+ }
+
+ ret = cxgb4_sync_txq_pidx(qp->rhp->rdev.lldi.ports[0],
+ qp->wq.rq.qid,
+ t4_rq_host_wq_pidx(&qp->wq),
+ t4_rq_wq_size(&qp->wq));
+
+ if (ret) {
+ printk(KERN_ERR MOD "%s: Fatal error - "
+ "DB overflow recovery failed - "
+ "error syncing RQ qid %u\n",
+ pci_name(ctx->lldi.pdev), qp->wq.rq.qid);
+ return;
+ }
+
+ /* Wait for the dbfifo to drain */
+ while (cxgb4_dbfifo_count(qp->rhp->rdev.lldi.ports[0], 1) > 0) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(10));
+ }
+ }
+}
+
+static void recover_queues(struct uld_ctx *ctx)
+{
+ int count = 0;
+ struct qp_list qp_list;
+ int ret;
+
+ /* lock out kernel db ringers */
+ mutex_lock(&ctx->dev->db_mutex);
+
+ /* put all queues in to recovery mode */
+ spin_lock_irq(&ctx->dev->lock);
+ ctx->dev->db_state = RECOVERY;
+ ctx->dev->rdev.stats.db_state_transitions++;
+ idr_for_each(&ctx->dev->qpidr, disable_qp_db, NULL);
+ spin_unlock_irq(&ctx->dev->lock);
+
+ /* slow everybody down */
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(1000));
+
+ /* Wait for the dbfifo to completely drain. */
+ while (cxgb4_dbfifo_count(ctx->dev->rdev.lldi.ports[0], 1) > 0) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(10));
+ }
+
+ /* flush the SGE contexts */
+ ret = cxgb4_flush_eq_cache(ctx->dev->rdev.lldi.ports[0]);
+ if (ret) {
+ printk(KERN_ERR MOD "%s: Fatal error - DB overflow recovery failed\n",
+ pci_name(ctx->lldi.pdev));
+ goto out;
+ }
+
+ /* Count active queues so we can build a list of queues to recover */
+ spin_lock_irq(&ctx->dev->lock);
+ idr_for_each(&ctx->dev->qpidr, count_qps, &count);
+
+ qp_list.qps = kzalloc(count * sizeof *qp_list.qps, GFP_ATOMIC);
+ if (!qp_list.qps) {
+ printk(KERN_ERR MOD "%s: Fatal error - DB overflow recovery failed\n",
+ pci_name(ctx->lldi.pdev));
+ spin_unlock_irq(&ctx->dev->lock);
+ goto out;
+ }
+ qp_list.idx = 0;
+
+ /* add and ref each qp so it doesn't get freed */
+ idr_for_each(&ctx->dev->qpidr, add_and_ref_qp, &qp_list);
+
spin_unlock_irq(&ctx->dev->lock);
+
+ /* now traverse the list in a safe context to recover the db state*/
+ recover_lost_dbs(ctx, &qp_list);
+
+ /* we're almost done! deref the qps and clean up */
+ deref_qps(qp_list);
+ kfree(qp_list.qps);
+
+ /* Wait for the dbfifo to completely drain again */
+ while (cxgb4_dbfifo_count(ctx->dev->rdev.lldi.ports[0], 1) > 0) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(10));
+ }
+
+ /* resume the queues */
+ spin_lock_irq(&ctx->dev->lock);
+ if (ctx->dev->qpcnt > db_fc_threshold)
+ ctx->dev->db_state = FLOW_CONTROL;
+ else {
+ ctx->dev->db_state = NORMAL;
+ idr_for_each(&ctx->dev->qpidr, enable_qp_db, NULL);
+ }
+ ctx->dev->rdev.stats.db_state_transitions++;
+ spin_unlock_irq(&ctx->dev->lock);
+
+out:
+ /* start up kernel db ringers again */
+ mutex_unlock(&ctx->dev->db_mutex);
}
static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
@@ -716,8 +881,7 @@ static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
mutex_unlock(&ctx->dev->rdev.stats.lock);
break;
case CXGB4_CONTROL_DB_DROP:
- printk(KERN_WARNING MOD "%s: Fatal DB DROP\n",
- pci_name(ctx->lldi.pdev));
+ recover_queues(ctx);
mutex_lock(&ctx->dev->rdev.stats.lock);
ctx->dev->rdev.stats.db_drop++;
mutex_unlock(&ctx->dev->rdev.stats.lock);
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 2ce7741..75d643c 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -120,6 +120,7 @@ struct c4iw_stats {
u64 db_full;
u64 db_empty;
u64 db_drop;
+ u64 db_state_transitions;
};
struct c4iw_rdev {
@@ -212,6 +213,7 @@ struct c4iw_dev {
struct mutex db_mutex;
struct dentry *debugfs_root;
enum db_state db_state;
+ int qpcnt;
};
static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
@@ -271,11 +273,25 @@ static inline int insert_handle_nolock(struct c4iw_dev *rhp, struct idr *idr,
return _insert_handle(rhp, idr, handle, id, 0);
}
-static inline void remove_handle(struct c4iw_dev *rhp, struct idr *idr, u32 id)
+static inline void _remove_handle(struct c4iw_dev *rhp, struct idr *idr,
+ u32 id, int lock)
{
- spin_lock_irq(&rhp->lock);
+ if (lock)
+ spin_lock_irq(&rhp->lock);
idr_remove(idr, id);
- spin_unlock_irq(&rhp->lock);
+ if (lock)
+ spin_unlock_irq(&rhp->lock);
+}
+
+static inline void remove_handle(struct c4iw_dev *rhp, struct idr *idr, u32 id)
+{
+ _remove_handle(rhp, idr, id, 1);
+}
+
+static inline void remove_handle_nolock(struct c4iw_dev *rhp,
+ struct idr *idr, u32 id)
+{
+ _remove_handle(rhp, idr, id, 0);
}
struct c4iw_pd {
@@ -842,5 +858,7 @@ void c4iw_ev_dispatch(struct c4iw_dev *dev, struct t4_cqe *err_cqe);
extern struct cxgb4_client t4c_client;
extern c4iw_handler_func c4iw_handlers[NUM_CPL_CMDS];
extern int c4iw_max_read_depth;
+extern int db_fc_threshold;
+
#endif
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 36fc94d..215b66a 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -42,6 +42,11 @@ static int ocqp_support = 1;
module_param(ocqp_support, int, 0644);
MODULE_PARM_DESC(ocqp_support, "Support on-chip SQs (default=1)");
+int db_fc_threshold = 2000;
+module_param(db_fc_threshold, int, 0644);
+MODULE_PARM_DESC(db_fc_threshold, "QP count/threshold that triggers automatic "
+ "db flow control mode (default = 2000)");
+
static void set_state(struct c4iw_qp *qhp, enum c4iw_qp_state state)
{
unsigned long flag;
@@ -1132,13 +1137,19 @@ static int ring_kernel_db(struct c4iw_qp *qhp, u32 qid, u16 inc)
mutex_lock(&qhp->rhp->db_mutex);
do {
- if (cxgb4_dbfifo_count(qhp->rhp->rdev.lldi.ports[0], 1) < 768) {
+
+ /*
+ * The interrupt threshold is dbfifo_int_thresh << 6. So
+ * make sure we don't cross that and generate an interrupt.
+ */
+ if (cxgb4_dbfifo_count(qhp->rhp->rdev.lldi.ports[0], 1) <
+ (qhp->rhp->rdev.lldi.dbfifo_int_thresh << 5)) {
writel(V_QID(qid) | V_PIDX(inc), qhp->wq.db);
break;
}
set_current_state(TASK_UNINTERRUPTIBLE);
schedule_timeout(usecs_to_jiffies(delay));
- delay = min(delay << 1, 200000);
+ delay = min(delay << 1, 2000);
} while (1);
mutex_unlock(&qhp->rhp->db_mutex);
return 0;
@@ -1373,6 +1384,14 @@ out:
return ret;
}
+static int enable_qp_db(int id, void *p, void *data)
+{
+ struct c4iw_qp *qp = p;
+
+ t4_enable_wq_db(&qp->wq);
+ return 0;
+}
+
int c4iw_destroy_qp(struct ib_qp *ib_qp)
{
struct c4iw_dev *rhp;
@@ -1390,7 +1409,16 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp)
c4iw_modify_qp(rhp, qhp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
wait_event(qhp->wait, !qhp->ep);
- remove_handle(rhp, &rhp->qpidr, qhp->wq.sq.qid);
+ spin_lock_irq(&rhp->lock);
+ remove_handle_nolock(rhp, &rhp->qpidr, qhp->wq.sq.qid);
+ rhp->qpcnt--;
+ BUG_ON(rhp->qpcnt < 0);
+ if (rhp->qpcnt <= db_fc_threshold && rhp->db_state == FLOW_CONTROL) {
+ rhp->rdev.stats.db_state_transitions++;
+ rhp->db_state = NORMAL;
+ idr_for_each(&rhp->qpidr, enable_qp_db, NULL);
+ }
+ spin_unlock_irq(&rhp->lock);
atomic_dec(&qhp->refcnt);
wait_event(qhp->wait, !atomic_read(&qhp->refcnt));
@@ -1404,6 +1432,14 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp)
return 0;
}
+static int disable_qp_db(int id, void *p, void *data)
+{
+ struct c4iw_qp *qp = p;
+
+ t4_disable_wq_db(&qp->wq);
+ return 0;
+}
+
struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
struct ib_udata *udata)
{
@@ -1493,6 +1529,11 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
spin_lock_irq(&rhp->lock);
if (rhp->db_state != NORMAL)
t4_disable_wq_db(&qhp->wq);
+ if (++rhp->qpcnt > db_fc_threshold && rhp->db_state == NORMAL) {
+ rhp->rdev.stats.db_state_transitions++;
+ rhp->db_state = FLOW_CONTROL;
+ idr_for_each(&rhp->qpidr, disable_qp_db, NULL);
+ }
ret = insert_handle_nolock(rhp, &rhp->qpidr, qhp, qhp->wq.sq.qid);
spin_unlock_irq(&rhp->lock);
if (ret)
diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index c0221ee..16f26ab 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -62,6 +62,10 @@ struct t4_status_page {
__be16 pidx;
u8 qp_err; /* flit 1 - sw owns */
u8 db_off;
+ u8 pad;
+ u16 host_wq_pidx;
+ u16 host_cidx;
+ u16 host_pidx;
};
#define T4_EQ_ENTRY_SIZE 64
@@ -375,6 +379,16 @@ static inline void t4_rq_consume(struct t4_wq *wq)
wq->rq.cidx = 0;
}
+static inline u16 t4_rq_host_wq_pidx(struct t4_wq *wq)
+{
+ return wq->rq.queue[wq->rq.size].status.host_wq_pidx;
+}
+
+static inline u16 t4_rq_wq_size(struct t4_wq *wq)
+{
+ return wq->rq.size * T4_RQ_NUM_SLOTS;
+}
+
static inline int t4_sq_onchip(struct t4_sq *sq)
{
return sq->flags & T4_SQ_ONCHIP;
@@ -412,6 +426,16 @@ static inline void t4_sq_consume(struct t4_wq *wq)
wq->sq.cidx = 0;
}
+static inline u16 t4_sq_host_wq_pidx(struct t4_wq *wq)
+{
+ return wq->sq.queue[wq->sq.size].status.host_wq_pidx;
+}
+
+static inline u16 t4_sq_wq_size(struct t4_wq *wq)
+{
+ return wq->sq.size * T4_SQ_NUM_SLOTS;
+}
+
static inline void t4_ring_sq_db(struct t4_wq *wq, u16 inc)
{
wmb();
--
1.7.1
* [PATCH 06/10] RDMA/cxgb4: disable interrupts in c4iw_ev_dispatch().
From: Vipul Pandya @ 2011-10-19 17:11 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
Use GFP_ATOMIC in _insert_handle() if interrupts are disabled.
Don't panic if we get an abort with no endpoint found; just log a
warning instead.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/cm.c | 5 ++++-
drivers/infiniband/hw/cxgb4/ev.c | 8 ++++----
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 2 +-
3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index b36cdac..c51818a 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1360,7 +1360,10 @@ static int abort_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
ep = lookup_tid(t, tid);
PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
- BUG_ON(!ep);
+ if (!ep) {
+ printk(KERN_WARNING MOD "Abort rpl to freed endpoint\n");
+ return 0;
+ }
mutex_lock(&ep->com.mutex);
switch (ep->com.state) {
case ABORTING:
diff --git a/drivers/infiniband/hw/cxgb4/ev.c b/drivers/infiniband/hw/cxgb4/ev.c
index c13041a..e9eac34 100644
--- a/drivers/infiniband/hw/cxgb4/ev.c
+++ b/drivers/infiniband/hw/cxgb4/ev.c
@@ -81,7 +81,7 @@ void c4iw_ev_dispatch(struct c4iw_dev *dev, struct t4_cqe *err_cqe)
struct c4iw_qp *qhp;
u32 cqid;
- spin_lock(&dev->lock);
+ spin_lock_irq(&dev->lock);
qhp = get_qhp(dev, CQE_QPID(err_cqe));
if (!qhp) {
printk(KERN_ERR MOD "BAD AE qpid 0x%x opcode %d "
@@ -90,7 +90,7 @@ void c4iw_ev_dispatch(struct c4iw_dev *dev, struct t4_cqe *err_cqe)
CQE_OPCODE(err_cqe), CQE_STATUS(err_cqe),
CQE_TYPE(err_cqe), CQE_WRID_HI(err_cqe),
CQE_WRID_LOW(err_cqe));
- spin_unlock(&dev->lock);
+ spin_unlock_irq(&dev->lock);
goto out;
}
@@ -106,13 +106,13 @@ void c4iw_ev_dispatch(struct c4iw_dev *dev, struct t4_cqe *err_cqe)
CQE_OPCODE(err_cqe), CQE_STATUS(err_cqe),
CQE_TYPE(err_cqe), CQE_WRID_HI(err_cqe),
CQE_WRID_LOW(err_cqe));
- spin_unlock(&dev->lock);
+ spin_unlock_irq(&dev->lock);
goto out;
}
c4iw_qp_add_ref(&qhp->ibqp);
atomic_inc(&chp->refcnt);
- spin_unlock(&dev->lock);
+ spin_unlock_irq(&dev->lock);
/* Bad incoming write */
if (RQ_TYPE(err_cqe) &&
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 1924c19..2ce7741 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -246,7 +246,7 @@ static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr,
int newid;
do {
- if (!idr_pre_get(idr, GFP_KERNEL))
+ if (!idr_pre_get(idr, lock ? GFP_KERNEL : GFP_ATOMIC))
return -ENOMEM;
if (lock)
spin_lock_irq(&rhp->lock);
--
1.7.1
* [PATCH 05/10] RDMA/cxgb4: Add DB Overflow Avoidance.
From: Vipul Pandya @ 2011-10-19 17:10 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
- get FULL/EMPTY/DROP events from LLD
- on FULL event, disable normal user mode DB rings.
- add modify_qp semantics to allow user processes to call into
the kernel to ring doorbells without overflowing.
Add DB Full/Empty/Drop stats.
Mark queues when created indicating the doorbell state.
If we're in the middle of db overflow avoidance, then newly created
queues should start out in this mode.
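The kernel-side doorbell path described above polls the db fifo and backs off exponentially while it is too full. A minimal user-space sketch of that pattern follows; struct fake_fifo, fake_fifo_count(), next_delay() and ring_with_backoff() are hypothetical names, the FIFO is simulated, and the sleep is only accumulated rather than performed.

```c
#include <assert.h>

/* Simulated FIFO: drains a fixed amount every time it is polled. */
struct fake_fifo {
	int count;
	int drain_per_poll;
};

/* Returns the current occupancy, then drains the simulated FIFO. */
static int fake_fifo_count(struct fake_fifo *f)
{
	int now = f->count;

	f->count = f->count > f->drain_per_poll ?
		   f->count - f->drain_per_poll : 0;
	return now;
}

/* Doubles the delay on each retry, capped at max_usecs. */
static int next_delay(int delay_usecs, int max_usecs)
{
	int doubled = delay_usecs << 1;

	return doubled < max_usecs ? doubled : max_usecs;
}

/*
 * Waits until the occupancy drops below limit, backing off between polls.
 * Returns the total microseconds that would have been slept.
 */
static long ring_with_backoff(struct fake_fifo *f, int limit,
			      int delay, int max_delay)
{
	long slept = 0;

	assert(delay > 0 && max_delay >= delay);
	while (fake_fifo_count(f) >= limit) {
		slept += delay;	/* the driver sleeps via schedule_timeout() */
		delay = next_delay(delay, max_delay);
	}
	/* the driver would write the doorbell register at this point */
	return slept;
}
```

In the patch the whole loop runs under db_mutex, so concurrent callers are serialized; the sketch leaves that out.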
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/device.c | 84 +++++++++++++++++++++++++++++--
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 37 ++++++++++++--
drivers/infiniband/hw/cxgb4/qp.c | 51 +++++++++++++++++++-
3 files changed, 161 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 8483111..9062ed9 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -44,6 +44,12 @@ MODULE_DESCRIPTION("Chelsio T4 RDMA Driver");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION(DRV_VERSION);
+struct uld_ctx {
+ struct list_head entry;
+ struct cxgb4_lld_info lldi;
+ struct c4iw_dev *dev;
+};
+
static LIST_HEAD(uld_ctx_list);
static DEFINE_MUTEX(dev_mutex);
@@ -263,6 +269,9 @@ static int stats_show(struct seq_file *seq, void *v)
seq_printf(seq, " OCQPMEM: %10llu %10llu %10llu\n",
dev->rdev.stats.ocqp.total, dev->rdev.stats.ocqp.cur,
dev->rdev.stats.ocqp.max);
+ seq_printf(seq, " DB FULL: %10llu\n", dev->rdev.stats.db_full);
+ seq_printf(seq, " DB EMPTY: %10llu\n", dev->rdev.stats.db_empty);
+ seq_printf(seq, " DB DROP: %10llu\n", dev->rdev.stats.db_drop);
return 0;
}
@@ -283,6 +292,9 @@ static ssize_t stats_clear(struct file *file, const char __user *buf,
dev->rdev.stats.pbl.max = 0;
dev->rdev.stats.rqt.max = 0;
dev->rdev.stats.ocqp.max = 0;
+ dev->rdev.stats.db_full = 0;
+ dev->rdev.stats.db_empty = 0;
+ dev->rdev.stats.db_drop = 0;
mutex_unlock(&dev->rdev.stats.lock);
return count;
}
@@ -443,12 +455,6 @@ static void c4iw_rdev_close(struct c4iw_rdev *rdev)
c4iw_destroy_resource(&rdev->resource);
}
-struct uld_ctx {
- struct list_head entry;
- struct cxgb4_lld_info lldi;
- struct c4iw_dev *dev;
-};
-
static void c4iw_dealloc(struct uld_ctx *ctx)
{
c4iw_rdev_close(&ctx->dev->rdev);
@@ -514,6 +520,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
idr_init(&devp->mmidr);
spin_lock_init(&devp->lock);
mutex_init(&devp->rdev.stats.lock);
+ mutex_init(&devp->db_mutex);
if (c4iw_debugfs_root) {
devp->debugfs_root = debugfs_create_dir(
@@ -659,11 +666,76 @@ static int c4iw_uld_state_change(void *handle, enum cxgb4_state new_state)
return 0;
}
+static int disable_qp_db(int id, void *p, void *data)
+{
+ struct c4iw_qp *qp = p;
+
+ t4_disable_wq_db(&qp->wq);
+ return 0;
+}
+
+static void stop_queues(struct uld_ctx *ctx)
+{
+ spin_lock_irq(&ctx->dev->lock);
+ ctx->dev->db_state = FLOW_CONTROL;
+ idr_for_each(&ctx->dev->qpidr, disable_qp_db, NULL);
+ spin_unlock_irq(&ctx->dev->lock);
+}
+
+static int enable_qp_db(int id, void *p, void *data)
+{
+ struct c4iw_qp *qp = p;
+
+ t4_enable_wq_db(&qp->wq);
+ return 0;
+}
+
+static void resume_queues(struct uld_ctx *ctx)
+{
+ spin_lock_irq(&ctx->dev->lock);
+ ctx->dev->db_state = NORMAL;
+ idr_for_each(&ctx->dev->qpidr, enable_qp_db, NULL);
+ spin_unlock_irq(&ctx->dev->lock);
+}
+
+static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
+{
+ struct uld_ctx *ctx = handle;
+
+ switch (control) {
+ case CXGB4_CONTROL_DB_FULL:
+ stop_queues(ctx);
+ mutex_lock(&ctx->dev->rdev.stats.lock);
+ ctx->dev->rdev.stats.db_full++;
+ mutex_unlock(&ctx->dev->rdev.stats.lock);
+ break;
+ case CXGB4_CONTROL_DB_EMPTY:
+ resume_queues(ctx);
+ mutex_lock(&ctx->dev->rdev.stats.lock);
+ ctx->dev->rdev.stats.db_empty++;
+ mutex_unlock(&ctx->dev->rdev.stats.lock);
+ break;
+ case CXGB4_CONTROL_DB_DROP:
+ printk(KERN_WARNING MOD "%s: Fatal DB DROP\n",
+ pci_name(ctx->lldi.pdev));
+ mutex_lock(&ctx->dev->rdev.stats.lock);
+ ctx->dev->rdev.stats.db_drop++;
+ mutex_unlock(&ctx->dev->rdev.stats.lock);
+ break;
+ default:
+ printk(KERN_WARNING MOD "%s: unknown control cmd %u\n",
+ pci_name(ctx->lldi.pdev), control);
+ break;
+ }
+ return 0;
+}
+
static struct cxgb4_uld_info c4iw_uld_info = {
.name = DRV_NAME,
.add = c4iw_uld_add,
.rx_handler = c4iw_uld_rx_handler,
.state_change = c4iw_uld_state_change,
+ .control = c4iw_uld_control,
};
static int __init c4iw_init_module(void)
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index ec7c848..1924c19 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -117,6 +117,9 @@ struct c4iw_stats {
struct c4iw_stat pbl;
struct c4iw_stat rqt;
struct c4iw_stat ocqp;
+ u64 db_full;
+ u64 db_empty;
+ u64 db_drop;
};
struct c4iw_rdev {
@@ -192,6 +195,12 @@ static inline int c4iw_wait_for_reply(struct c4iw_rdev *rdev,
return wr_waitp->ret;
}
+enum db_state {
+ NORMAL = 0,
+ FLOW_CONTROL = 1,
+ RECOVERY = 2
+};
+
struct c4iw_dev {
struct ib_device ibdev;
struct c4iw_rdev rdev;
@@ -200,7 +209,9 @@ struct c4iw_dev {
struct idr qpidr;
struct idr mmidr;
spinlock_t lock;
+ struct mutex db_mutex;
struct dentry *debugfs_root;
+ enum db_state db_state;
};
static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
@@ -228,8 +239,8 @@ static inline struct c4iw_mr *get_mhp(struct c4iw_dev *rhp, u32 mmid)
return idr_find(&rhp->mmidr, mmid);
}
-static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr,
- void *handle, u32 id)
+static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr,
+ void *handle, u32 id, int lock)
{
int ret;
int newid;
@@ -237,15 +248,29 @@ static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr,
do {
if (!idr_pre_get(idr, GFP_KERNEL))
return -ENOMEM;
- spin_lock_irq(&rhp->lock);
+ if (lock)
+ spin_lock_irq(&rhp->lock);
ret = idr_get_new_above(idr, handle, id, &newid);
BUG_ON(newid != id);
- spin_unlock_irq(&rhp->lock);
+ if (lock)
+ spin_unlock_irq(&rhp->lock);
} while (ret == -EAGAIN);
return ret;
}
+static inline int insert_handle(struct c4iw_dev *rhp, struct idr *idr,
+ void *handle, u32 id)
+{
+ return _insert_handle(rhp, idr, handle, id, 1);
+}
+
+static inline int insert_handle_nolock(struct c4iw_dev *rhp, struct idr *idr,
+ void *handle, u32 id)
+{
+ return _insert_handle(rhp, idr, handle, id, 0);
+}
+
static inline void remove_handle(struct c4iw_dev *rhp, struct idr *idr, u32 id)
{
spin_lock_irq(&rhp->lock);
@@ -369,6 +394,8 @@ struct c4iw_qp_attributes {
struct c4iw_ep *llp_stream_handle;
u8 layer_etype;
u8 ecode;
+ u16 sq_db_inc;
+ u16 rq_db_inc;
};
struct c4iw_qp {
@@ -443,6 +470,8 @@ static inline void insert_mmap(struct c4iw_ucontext *ucontext,
enum c4iw_qp_attr_mask {
C4IW_QP_ATTR_NEXT_STATE = 1 << 0,
+ C4IW_QP_ATTR_SQ_DB = 1<<1,
+ C4IW_QP_ATTR_RQ_DB = 1<<2,
C4IW_QP_ATTR_ENABLE_RDMA_READ = 1 << 7,
C4IW_QP_ATTR_ENABLE_RDMA_WRITE = 1 << 8,
C4IW_QP_ATTR_ENABLE_RDMA_BIND = 1 << 9,
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 74df98e..36fc94d 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -34,6 +34,10 @@
#include "iw_cxgb4.h"
+static int db_delay_usecs = 1;
+module_param(db_delay_usecs, int, 0644);
+MODULE_PARM_DESC(db_delay_usecs, "Usecs to delay awaiting db fifo to drain");
+
static int ocqp_support = 1;
module_param(ocqp_support, int, 0644);
MODULE_PARM_DESC(ocqp_support, "Support on-chip SQs (default=1)");
@@ -1117,6 +1121,29 @@ out:
return ret;
}
+/*
+ * Called by the library when the qp has user dbs disabled due to
+ * a DB_FULL condition. This function will single-thread all user
+ * DB rings to avoid overflowing the hw db-fifo.
+ */
+static int ring_kernel_db(struct c4iw_qp *qhp, u32 qid, u16 inc)
+{
+ int delay = db_delay_usecs;
+
+ mutex_lock(&qhp->rhp->db_mutex);
+ do {
+ if (cxgb4_dbfifo_count(qhp->rhp->rdev.lldi.ports[0], 1) < 768) {
+ writel(V_QID(qid) | V_PIDX(inc), qhp->wq.db);
+ break;
+ }
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(delay));
+ delay = min(delay << 1, 200000);
+ } while (1);
+ mutex_unlock(&qhp->rhp->db_mutex);
+ return 0;
+}
+
int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
enum c4iw_qp_attr_mask mask,
struct c4iw_qp_attributes *attrs,
@@ -1165,6 +1192,15 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
qhp->attr = newattr;
}
+ if (mask & C4IW_QP_ATTR_SQ_DB) {
+ ret = ring_kernel_db(qhp, qhp->wq.sq.qid, attrs->sq_db_inc);
+ goto out;
+ }
+ if (mask & C4IW_QP_ATTR_RQ_DB) {
+ ret = ring_kernel_db(qhp, qhp->wq.rq.qid, attrs->rq_db_inc);
+ goto out;
+ }
+
if (!(mask & C4IW_QP_ATTR_NEXT_STATE))
goto out;
if (qhp->attr.state == attrs->next_state)
@@ -1454,7 +1490,11 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
init_waitqueue_head(&qhp->wait);
atomic_set(&qhp->refcnt, 1);
- ret = insert_handle(rhp, &rhp->qpidr, qhp, qhp->wq.sq.qid);
+ spin_lock_irq(&rhp->lock);
+ if (rhp->db_state != NORMAL)
+ t4_disable_wq_db(&qhp->wq);
+ ret = insert_handle_nolock(rhp, &rhp->qpidr, qhp, qhp->wq.sq.qid);
+ spin_unlock_irq(&rhp->lock);
if (ret)
goto err2;
@@ -1598,6 +1638,15 @@ int c4iw_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
C4IW_QP_ATTR_ENABLE_RDMA_WRITE |
C4IW_QP_ATTR_ENABLE_RDMA_BIND) : 0;
+ /*
+ * Use SQ_PSN and RQ_PSN to pass in IDX_INC values for
+ * ringing the queue db when we're in DB_FULL mode.
+ */
+ attrs.sq_db_inc = attr->sq_psn;
+ attrs.rq_db_inc = attr->rq_psn;
+ mask |= (attr_mask & IB_QP_SQ_PSN) ? C4IW_QP_ATTR_SQ_DB : 0;
+ mask |= (attr_mask & IB_QP_RQ_PSN) ? C4IW_QP_ATTR_RQ_DB : 0;
+
return c4iw_modify_qp(rhp, qhp, mask, &attrs, 0);
}
--
1.7.1
* [PATCH 03/10] cxgb4: DB Drop Recovery for RDMA and LLD queues.
From: Vipul Pandya @ 2011-10-19 17:10 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
- recover LLD EQs for DB drop interrupts. This includes adding a new
db_lock, a spin lock disabling BH too, used by the recovery thread and
the ring_tx_db() paths to allow db drop recovery.
- cleaned up initial db avoidance code.
- add read_eq_indices() - allows the LLD to use the pcie mw to efficiently
read hw eq contexts.
- add cxgb4_sync_txq_pidx() - called by iw_cxgb4 to sync up the sw/hw pidx
value.
- add flush_eq_cache() and cxgb4_flush_eq_cache(). This allows iw_cxgb4
to flush the sge eq context cache before beginning db drop recovery.
- add module parameter, dbfifo_int_thresh, to allow tuning the db
interrupt threshold value.
- add dbfifo_int_thresh to cxgb4_lld_info so iw_cxgb4 knows the threshold.
- add module parameter, dbfifo_drain_delay, to allow tuning the amount
of time delay between DB FULL and EMPTY upcalls to iw_cxgb4.
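As a rough illustration of how dbfifo_int_thresh maps to fifo entries: the interrupt threshold is the parameter shifted left by 6, so the default of 10 corresponds to 640 entries, and iw_cxgb4 rings kernel doorbells only while the fifo count is below the parameter shifted left by 5. The helper names below are hypothetical and exist only for this sketch.

```c
#include <assert.h>

/* Interrupt threshold in FIFO entries: thresh << 6. */
static unsigned int dbfifo_int_entries(unsigned int thresh)
{
	return thresh << 6;
}

/*
 * Illustrative consumer-side check: ring only while the FIFO occupancy
 * is below thresh << 5, i.e. half of the interrupt threshold.
 */
static int db_ring_allowed(unsigned int fifo_count, unsigned int thresh)
{
	assert(thresh > 0);	/* a zero threshold would never allow a ring */
	return fifo_count < (thresh << 5);
}
```

With the default of 10, the interrupt fires at 640 entries while the ring limit sits at 320, which leaves headroom for doorbells already in flight.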
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 7 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 214 +++++++++++++++++++----
drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h | 4 +
drivers/net/ethernet/chelsio/cxgb4/sge.c | 20 ++-
drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 53 ++++++
drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 23 +++
6 files changed, 279 insertions(+), 42 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index e18b5ad..f202cb9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -51,6 +51,8 @@
#define FW_VERSION_MINOR 1
#define FW_VERSION_MICRO 0
+#define CH_WARN(adap, fmt, ...) dev_warn(adap->pdev_dev, fmt, ## __VA_ARGS__)
+
enum {
MAX_NPORTS = 4, /* max # of ports */
SERNUM_LEN = 24, /* Serial # length */
@@ -403,6 +405,9 @@ struct sge_txq {
struct tx_sw_desc *sdesc; /* address of SW Tx descriptor ring */
struct sge_qstat *stat; /* queue status entry */
dma_addr_t phys_addr; /* physical address of the ring */
+ spinlock_t db_lock;
+ int db_disabled;
+ unsigned short db_pidx;
};
struct sge_eth_txq { /* state for an SGE Ethernet Tx queue */
@@ -475,6 +480,7 @@ struct adapter {
void __iomem *regs;
struct pci_dev *pdev;
struct device *pdev_dev;
+ unsigned int mbox;
unsigned int fn;
unsigned int flags;
@@ -607,6 +613,7 @@ irqreturn_t t4_sge_intr_msix(int irq, void *cookie);
void t4_sge_init(struct adapter *adap);
void t4_sge_start(struct adapter *adap);
void t4_sge_stop(struct adapter *adap);
+extern int dbfifo_int_thresh;
#define for_each_port(adapter, iter) \
for (iter = 0; iter < (adapter)->params.nports; ++iter)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 870c320..64ad1c8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -149,15 +149,6 @@ static unsigned int pfvfres_pmask(struct adapter *adapter,
#endif
enum {
- MEMWIN0_APERTURE = 65536,
- MEMWIN0_BASE = 0x30000,
- MEMWIN1_APERTURE = 32768,
- MEMWIN1_BASE = 0x28000,
- MEMWIN2_APERTURE = 2048,
- MEMWIN2_BASE = 0x1b800,
-};
-
-enum {
MAX_TXQ_ENTRIES = 16384,
MAX_CTRL_TXQ_ENTRIES = 1024,
MAX_RSPQ_ENTRIES = 16384,
@@ -369,6 +360,15 @@ static int set_addr_filters(const struct net_device *dev, bool sleep)
uhash | mhash, sleep);
}
+int dbfifo_int_thresh = 10; /* 10 == 640 entry threshold */
+module_param(dbfifo_int_thresh, int, 0644);
+MODULE_PARM_DESC(dbfifo_int_thresh, "doorbell fifo interrupt threshold");
+
+int dbfifo_drain_delay = 1000; /* usecs to sleep while draining the dbfifo */
+module_param(dbfifo_drain_delay, int, 0644);
+MODULE_PARM_DESC(dbfifo_drain_delay,
+ "usecs to sleep while draining the dbfifo");
+
/*
* Set Rx properties of a port, such as promiscruity, address filters, and MTU.
* If @mtu is -1 it is left unchanged.
@@ -387,6 +387,8 @@ static int set_rxmode(struct net_device *dev, int mtu, bool sleep_ok)
return ret;
}
+static struct workqueue_struct *workq;
+
/**
* link_start - enable a port
* @dev: the port to enable
@@ -2201,7 +2203,7 @@ static void cxgb4_queue_tid_release(struct tid_info *t, unsigned int chan,
adap->tid_release_head = (void **)((uintptr_t)p | chan);
if (!adap->tid_release_task_busy) {
adap->tid_release_task_busy = true;
- schedule_work(&adap->tid_release_task);
+ queue_work(workq, &adap->tid_release_task);
}
spin_unlock_bh(&adap->tid_release_lock);
}
@@ -2428,6 +2430,59 @@ void cxgb4_iscsi_init(struct net_device *dev, unsigned int tag_mask,
}
EXPORT_SYMBOL(cxgb4_iscsi_init);
+int cxgb4_flush_eq_cache(struct net_device *dev)
+{
+ struct adapter *adap = netdev2adap(dev);
+ int ret;
+
+ ret = t4_fwaddrspace_write(adap, adap->mbox,
+ 0xe1000000 + A_SGE_CTXT_CMD, 0x20000000);
+ return ret;
+}
+EXPORT_SYMBOL(cxgb4_flush_eq_cache);
+
+static int read_eq_indices(struct adapter *adap, u16 qid, u16 *pidx, u16 *cidx)
+{
+ u32 addr = t4_read_reg(adap, A_SGE_DBQ_CTXT_BADDR) + 24 * qid + 8;
+ __be64 indices;
+ int ret;
+
+ ret = t4_mem_win_read_len(adap, addr, (__be32 *)&indices, 8);
+ if (!ret) {
+ indices = be64_to_cpu(indices);
+ *cidx = (indices >> 25) & 0xffff;
+ *pidx = (indices >> 9) & 0xffff;
+ }
+ return ret;
+}
+
+int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, u16 pidx,
+ u16 size)
+{
+ struct adapter *adap = netdev2adap(dev);
+ u16 hw_pidx, hw_cidx;
+ int ret;
+
+ ret = read_eq_indices(adap, qid, &hw_pidx, &hw_cidx);
+ if (ret)
+ goto out;
+
+ if (pidx != hw_pidx) {
+ u16 delta;
+
+ if (pidx >= hw_pidx)
+ delta = pidx - hw_pidx;
+ else
+ delta = size - hw_pidx + pidx;
+ wmb();
+ t4_write_reg(adap, MYPF_REG(A_SGE_PF_KDOORBELL),
+ V_QID(qid) | V_PIDX(delta));
+ }
+out:
+ return ret;
+}
+EXPORT_SYMBOL(cxgb4_sync_txq_pidx);
+
static struct pci_driver cxgb4_driver;
static void check_neigh_update(struct neighbour *neigh)
@@ -2461,6 +2516,95 @@ static struct notifier_block cxgb4_netevent_nb = {
.notifier_call = netevent_cb
};
+static void drain_db_fifo(struct adapter *adap, int usecs)
+{
+ u32 v;
+
+ do {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(usecs));
+ v = t4_read_reg(adap, A_SGE_DBFIFO_STATUS);
+ if (G_LP_COUNT(v) == 0 && G_HP_COUNT(v) == 0)
+ break;
+ } while (1);
+}
+
+static void disable_txq_db(struct sge_txq *q)
+{
+ spin_lock_irq(&q->db_lock);
+ q->db_disabled = 1;
+ spin_unlock_irq(&q->db_lock);
+}
+
+static void enable_txq_db(struct sge_txq *q)
+{
+ spin_lock_irq(&q->db_lock);
+ q->db_disabled = 0;
+ spin_unlock_irq(&q->db_lock);
+}
+
+static void disable_dbs(struct adapter *adap)
+{
+ int i;
+
+ for_each_ethrxq(&adap->sge, i)
+ disable_txq_db(&adap->sge.ethtxq[i].q);
+ for_each_ofldrxq(&adap->sge, i)
+ disable_txq_db(&adap->sge.ofldtxq[i].q);
+ for_each_port(adap, i)
+ disable_txq_db(&adap->sge.ctrlq[i].q);
+}
+
+static void enable_dbs(struct adapter *adap)
+{
+ int i;
+
+ for_each_ethrxq(&adap->sge, i)
+ enable_txq_db(&adap->sge.ethtxq[i].q);
+ for_each_ofldrxq(&adap->sge, i)
+ enable_txq_db(&adap->sge.ofldtxq[i].q);
+ for_each_port(adap, i)
+ enable_txq_db(&adap->sge.ctrlq[i].q);
+}
+
+static void sync_txq_pidx(struct adapter *adap, struct sge_txq *q)
+{
+ u16 hw_pidx, hw_cidx;
+ int ret;
+
+ spin_lock_bh(&q->db_lock);
+ ret = read_eq_indices(adap, (u16)q->cntxt_id, &hw_pidx, &hw_cidx);
+ if (ret)
+ goto out;
+ if (q->db_pidx != hw_pidx) {
+ u16 delta;
+
+ if (q->db_pidx >= hw_pidx)
+ delta = q->db_pidx - hw_pidx;
+ else
+ delta = q->size - hw_pidx + q->db_pidx;
+ wmb();
+ t4_write_reg(adap, MYPF_REG(A_SGE_PF_KDOORBELL),
+ V_QID(q->cntxt_id) | V_PIDX(delta));
+ }
+out:
+ q->db_disabled = 0;
+ spin_unlock_bh(&q->db_lock);
+ if (ret)
+ CH_WARN(adap, "DB drop recovery failed.\n");
+}
+static void recover_all_queues(struct adapter *adap)
+{
+ int i;
+
+ for_each_ethrxq(&adap->sge, i)
+ sync_txq_pidx(adap, &adap->sge.ethtxq[i].q);
+ for_each_ofldrxq(&adap->sge, i)
+ sync_txq_pidx(adap, &adap->sge.ofldtxq[i].q);
+ for_each_port(adap, i)
+ sync_txq_pidx(adap, &adap->sge.ctrlq[i].q);
+}
+
static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
{
mutex_lock(&uld_mutex);
@@ -2473,55 +2617,41 @@ static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
static void process_db_full(struct work_struct *work)
{
struct adapter *adap;
- static int delay = 1000;
- u32 v;
adap = container_of(work, struct adapter, db_full_task);
-
- /* stop LLD queues */
-
notify_rdma_uld(adap, CXGB4_CONTROL_DB_FULL);
- do {
- set_current_state(TASK_UNINTERRUPTIBLE);
- schedule_timeout(usecs_to_jiffies(delay));
- v = t4_read_reg(adap, A_SGE_DBFIFO_STATUS);
- if (G_LP_COUNT(v) == 0 && G_HP_COUNT(v) == 0)
- break;
- } while (1);
+ drain_db_fifo(adap, dbfifo_drain_delay);
+ t4_set_reg_field(adap, A_SGE_INT_ENABLE3,
+ F_DBFIFO_HP_INT | F_DBFIFO_LP_INT,
+ F_DBFIFO_HP_INT | F_DBFIFO_LP_INT);
notify_rdma_uld(adap, CXGB4_CONTROL_DB_EMPTY);
-
-
- /*
- * The more we get db full interrupts, the more we'll delay
- * in re-enabling db rings on queues, capped off at 200ms.
- */
- delay = min(delay << 1, 200000);
-
- /* resume LLD queues */
}
static void process_db_drop(struct work_struct *work)
{
struct adapter *adap;
- adap = container_of(work, struct adapter, db_drop_task);
+ adap = container_of(work, struct adapter, db_drop_task);
- /*
- * sync the PIDX values in HW and SW for LLD queues.
- */
-
+ t4_set_reg_field(adap, A_SGE_DOORBELL_CONTROL, F_DROPPED_DB, 0);
+ disable_dbs(adap);
notify_rdma_uld(adap, CXGB4_CONTROL_DB_DROP);
+ drain_db_fifo(adap, 1);
+ recover_all_queues(adap);
+ enable_dbs(adap);
}
void t4_db_full(struct adapter *adap)
{
- schedule_work(&adap->db_full_task);
+ t4_set_reg_field(adap, A_SGE_INT_ENABLE3,
+ F_DBFIFO_HP_INT | F_DBFIFO_LP_INT, 0);
+ queue_work(workq, &adap->db_full_task);
}
void t4_db_dropped(struct adapter *adap)
{
- schedule_work(&adap->db_drop_task);
+ queue_work(workq, &adap->db_drop_task);
}
static void uld_attach(struct adapter *adap, unsigned int uld)
@@ -2557,6 +2687,7 @@ static void uld_attach(struct adapter *adap, unsigned int uld)
lli.gts_reg = adap->regs + MYPF_REG(SGE_PF_GTS);
lli.db_reg = adap->regs + MYPF_REG(SGE_PF_KDOORBELL);
lli.fw_vers = adap->params.fw_vers;
+ lli.dbfifo_int_thresh = dbfifo_int_thresh;
handle = ulds[uld].add(&lli);
if (IS_ERR(handle)) {
@@ -3673,6 +3804,7 @@ static int __devinit init_one(struct pci_dev *pdev,
adapter->pdev = pdev;
adapter->pdev_dev = &pdev->dev;
+ adapter->mbox = func;
adapter->fn = func;
adapter->msg_enable = dflt_msg_enable;
memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
@@ -3868,6 +4000,10 @@ static int __init cxgb4_init_module(void)
{
int ret;
+ workq = create_singlethread_workqueue("cxgb4");
+ if (!workq)
+ return -ENOMEM;
+
/* Debugfs support is optional, just warn if this fails */
cxgb4_debugfs_root = debugfs_create_dir(KBUILD_MODNAME, NULL);
if (!cxgb4_debugfs_root)
@@ -3883,6 +4019,8 @@ static void __exit cxgb4_cleanup_module(void)
{
pci_unregister_driver(&cxgb4_driver);
debugfs_remove(cxgb4_debugfs_root); /* NULL ok */
+ flush_workqueue(workq);
+ destroy_workqueue(workq);
}
module_init(cxgb4_init_module);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index 5cc2f27..d79980c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -218,6 +218,7 @@ struct cxgb4_lld_info {
unsigned short ucq_density; /* # of user CQs/page */
void __iomem *gts_reg; /* address of GTS register */
void __iomem *db_reg; /* address of kernel doorbell */
+ int dbfifo_int_thresh; /* doorbell fifo int threshold */
};
struct cxgb4_uld_info {
@@ -226,6 +227,7 @@ struct cxgb4_uld_info {
int (*rx_handler)(void *handle, const __be64 *rsp,
const struct pkt_gl *gl);
int (*state_change)(void *handle, enum cxgb4_state new_state);
+ int (*control)(void *handle, enum cxgb4_control control, ...);
};
int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
@@ -243,4 +245,6 @@ void cxgb4_iscsi_init(struct net_device *dev, unsigned int tag_mask,
const unsigned int *pgsz_order);
struct sk_buff *cxgb4_pktgl_to_skb(const struct pkt_gl *gl,
unsigned int skb_len, unsigned int pull_len);
+int cxgb4_sync_txq_pidx(struct net_device *dev, u16 qid, u16 pidx, u16 size);
+int cxgb4_flush_eq_cache(struct net_device *dev);
#endif /* !__CXGB4_OFLD_H */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 3631fbb..65ecf1e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -767,8 +767,13 @@ static void write_sgl(const struct sk_buff *skb, struct sge_txq *q,
static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
{
wmb(); /* write descriptors before telling HW */
- t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
- QID(q->cntxt_id) | PIDX(n));
+ spin_lock(&q->db_lock);
+ if (!q->db_disabled) {
+ t4_write_reg(adap, MYPF_REG(A_SGE_PF_KDOORBELL),
+ V_QID(q->cntxt_id) | V_PIDX(n));
+ }
+ q->db_pidx = q->pidx;
+ spin_unlock(&q->db_lock);
}
/**
@@ -2080,6 +2085,7 @@ static void init_txq(struct adapter *adap, struct sge_txq *q, unsigned int id)
q->stops = q->restarts = 0;
q->stat = (void *)&q->desc[q->size];
q->cntxt_id = id;
+ spin_lock_init(&q->db_lock);
adap->sge.egr_map[id - adap->sge.egr_start] = q;
}
@@ -2414,9 +2420,15 @@ void t4_sge_init(struct adapter *adap)
RXPKTCPLMODE |
(STAT_LEN == 128 ? EGRSTATUSPAGESIZE : 0));
+ /*
+ * Set up to drop DOORBELL writes when the DOORBELL FIFO overflows
+ * and generate an interrupt when this occurs so we can recover.
+ */
t4_set_reg_field(adap, A_SGE_DBFIFO_STATUS,
- V_HP_INT_THRESH(5) | V_LP_INT_THRESH(5),
- V_HP_INT_THRESH(5) | V_LP_INT_THRESH(5));
+ V_HP_INT_THRESH(M_HP_INT_THRESH) |
+ V_LP_INT_THRESH(M_LP_INT_THRESH),
+ V_HP_INT_THRESH(dbfifo_int_thresh) |
+ V_LP_INT_THRESH(dbfifo_int_thresh));
t4_set_reg_field(adap, A_SGE_DOORBELL_CONTROL, F_ENABLE_DROP,
F_ENABLE_DROP);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 0adc5bc..111fc32 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -190,6 +190,59 @@
#define SGE_DEBUG_DATA_LOW 0x10d4
#define SGE_INGRESS_QUEUES_PER_PAGE_PF 0x10f4
+#define S_LP_INT_THRESH 12
+#define V_LP_INT_THRESH(x) ((x) << S_LP_INT_THRESH)
+#define S_HP_INT_THRESH 28
+#define V_HP_INT_THRESH(x) ((x) << S_HP_INT_THRESH)
+#define A_SGE_DBFIFO_STATUS 0x10a4
+
+#define S_ENABLE_DROP 13
+#define V_ENABLE_DROP(x) ((x) << S_ENABLE_DROP)
+#define F_ENABLE_DROP V_ENABLE_DROP(1U)
+#define A_SGE_DOORBELL_CONTROL 0x10a8
+
+#define A_SGE_CTXT_CMD 0x11fc
+#define A_SGE_DBQ_CTXT_BADDR 0x1084
+
+#define A_SGE_PF_KDOORBELL 0x0
+
+#define S_QID 15
+#define V_QID(x) ((x) << S_QID)
+
+#define S_PIDX 0
+#define V_PIDX(x) ((x) << S_PIDX)
+
+#define M_LP_COUNT 0x7ffU
+#define S_LP_COUNT 0
+#define G_LP_COUNT(x) (((x) >> S_LP_COUNT) & M_LP_COUNT)
+
+#define M_HP_COUNT 0x7ffU
+#define S_HP_COUNT 16
+#define G_HP_COUNT(x) (((x) >> S_HP_COUNT) & M_HP_COUNT)
+
+#define A_SGE_INT_ENABLE3 0x1040
+
+#define S_DBFIFO_HP_INT 8
+#define V_DBFIFO_HP_INT(x) ((x) << S_DBFIFO_HP_INT)
+#define F_DBFIFO_HP_INT V_DBFIFO_HP_INT(1U)
+
+#define S_DBFIFO_LP_INT 7
+#define V_DBFIFO_LP_INT(x) ((x) << S_DBFIFO_LP_INT)
+#define F_DBFIFO_LP_INT V_DBFIFO_LP_INT(1U)
+
+#define S_DROPPED_DB 0
+#define V_DROPPED_DB(x) ((x) << S_DROPPED_DB)
+#define F_DROPPED_DB V_DROPPED_DB(1U)
+
+#define S_ERR_DROPPED_DB 18
+#define V_ERR_DROPPED_DB(x) ((x) << S_ERR_DROPPED_DB)
+#define F_ERR_DROPPED_DB V_ERR_DROPPED_DB(1U)
+
+#define A_PCIE_MEM_ACCESS_OFFSET 0x306c
+
+#define M_HP_INT_THRESH 0xfU
+#define M_LP_INT_THRESH 0xfU
+
#define PCIE_PF_CLI 0x44
#define PCIE_INT_CAUSE 0x3004
#define UNXSPLCPLERR 0x20000000U
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 83ca454..0579e98 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -1625,4 +1625,27 @@ int t4_mem_win_read_len(struct adapter *adap, u32 addr, __be32 *data, int len);
int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
u32 addr, u32 val);
+#define S_FW_CMD_OP 24
+#define V_FW_CMD_OP(x) ((x) << S_FW_CMD_OP)
+
+#define S_FW_CMD_REQUEST 23
+#define V_FW_CMD_REQUEST(x) ((x) << S_FW_CMD_REQUEST)
+#define F_FW_CMD_REQUEST V_FW_CMD_REQUEST(1U)
+
+#define S_FW_CMD_WRITE 21
+#define V_FW_CMD_WRITE(x) ((x) << S_FW_CMD_WRITE)
+#define F_FW_CMD_WRITE V_FW_CMD_WRITE(1U)
+
+#define S_FW_LDST_CMD_ADDRSPACE 0
+#define V_FW_LDST_CMD_ADDRSPACE(x) ((x) << S_FW_LDST_CMD_ADDRSPACE)
+
+enum {
+ MEMWIN0_APERTURE = 65536,
+ MEMWIN0_BASE = 0x30000,
+ MEMWIN1_APERTURE = 32768,
+ MEMWIN1_BASE = 0x28000,
+ MEMWIN2_APERTURE = 2048,
+ MEMWIN2_BASE = 0x1b800,
+};
+
#endif /* _T4FW_INTERFACE_H_ */
--
1.7.1
^ permalink raw reply related
* [PATCH 02/10] cxgb4: DB Drop Recovery for RDMA and LLD queues.
From: Vipul Pandya @ 2011-10-19 17:10 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
- Add platform-specific callback functions for interrupts. This is
needed to do a single read-clear of the CAUSE register and then call
out to platform specific functions for DB threshold interrupts and DB
drop interrupts.
- Add t4_mem_win_read_len() - mem-window reads for arbitrary lengths.
This is used to read the CIDX/PIDX values from EQ contexts during DB
drop recovery.
- Add t4_fwaddrspace_write() - sends addrspace write cmds to the fw.
Needed to flush the sge eq context cache.
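
The argument check in t4_mem_win_read_len() can be read as splitting the
requested address into a 16B-aligned window base plus an offset. A rough,
standalone sketch of that split follows; win_request(), struct win_req and
WIN_APERTURE are illustrative names that do not exist in the driver, and the
-1 return stands in for -EINVAL.

```c
#include <assert.h>
#include <stdint.h>

#define WIN_APERTURE 65536u	/* same value as MEMWIN0_APERTURE */

/* Hypothetical result type: where to point the window and where to read. */
struct win_req {
	uint32_t base;	/* 16B-aligned address to program into the window */
	uint32_t off;	/* offset of the first requested byte within it */
};

/*
 * Hypothetical helper mirroring the checks in t4_mem_win_read_len():
 * the address must be 32-bit aligned and offset + length must fit in
 * the aperture.
 */
static int win_request(uint32_t addr, uint32_t len, struct win_req *req)
{
	uint32_t off = addr & 15;

	assert(req);

	/* written as a subtraction so len + off cannot overflow */
	if ((addr & 3) || len > WIN_APERTURE - off)
		return -1;	/* the driver returns -EINVAL here */

	req->base = addr & ~15u;
	req->off = off;
	return 0;
}
```

For example, a request for 8 bytes at 0x1008 yields base 0x1000 and offset 8,
which matches how read_eq_indices() in the follow-up patch reads an 8-byte
field out of an EQ context.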
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 69 +++++++++++++++++++++----
drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 5 ++
2 files changed, 63 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 13609bf..32e1dd5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -868,11 +868,14 @@ int t4_restart_aneg(struct adapter *adap, unsigned int mbox, unsigned int port)
return t4_wr_mbox(adap, mbox, &c, sizeof(c), NULL);
}
+typedef void (*int_handler_t)(struct adapter *adap);
+
struct intr_info {
unsigned int mask; /* bits to check in interrupt status */
const char *msg; /* message to print or NULL */
short stat_idx; /* stat counter to increment or -1 */
unsigned short fatal; /* whether the condition reported is fatal */
+ int_handler_t int_handler; /* platform-specific int handler */
};
/**
@@ -905,6 +908,8 @@ static int t4_handle_intr_status(struct adapter *adapter, unsigned int reg,
} else if (acts->msg && printk_ratelimit())
dev_warn(adapter->pdev_dev, "%s (0x%x)\n", acts->msg,
status & acts->mask);
+ if (acts->int_handler)
+ acts->int_handler(adapter);
mask |= acts->mask;
}
status &= mask;
@@ -1013,9 +1018,9 @@ static void sge_intr_handler(struct adapter *adapter)
{ ERR_INVALID_CIDX_INC,
"SGE GTS CIDX increment too large", -1, 0 },
{ ERR_CPL_OPCODE_0, "SGE received 0-length CPL", -1, 0 },
- { F_DBFIFO_LP_INT, NULL, -1, 0 },
- { F_DBFIFO_HP_INT, NULL, -1, 0 },
- { ERR_DROPPED_DB, "SGE doorbell dropped", -1, 0 },
+ { F_DBFIFO_LP_INT, NULL, -1, 0, t4_db_full },
+ { F_DBFIFO_HP_INT, NULL, -1, 0, t4_db_full },
+ { F_ERR_DROPPED_DB, NULL, -1, 0, t4_db_dropped },
{ ERR_DATA_CPL_ON_HIGH_QID1 | ERR_DATA_CPL_ON_HIGH_QID0,
"SGE IQID > 1023 received CPL for FL", -1, 0 },
{ ERR_BAD_DB_PIDX3, "SGE DBP 3 pidx increment too large", -1,
@@ -1036,20 +1041,14 @@ static void sge_intr_handler(struct adapter *adapter)
};
v = (u64)t4_read_reg(adapter, SGE_INT_CAUSE1) |
- ((u64)t4_read_reg(adapter, SGE_INT_CAUSE2) << 32);
+ ((u64)t4_read_reg(adapter, SGE_INT_CAUSE2) << 32);
if (v) {
dev_alert(adapter->pdev_dev, "SGE parity error (%#llx)\n",
- (unsigned long long)v);
+ (unsigned long long)v);
t4_write_reg(adapter, SGE_INT_CAUSE1, v);
t4_write_reg(adapter, SGE_INT_CAUSE2, v >> 32);
}
- err = t4_read_reg(adapter, A_SGE_INT_CAUSE3);
- if (err & (F_DBFIFO_HP_INT|F_DBFIFO_LP_INT))
- t4_db_full(adapter);
- if (err & F_ERR_DROPPED_DB)
- t4_db_dropped(adapter);
-
if (t4_handle_intr_status(adapter, SGE_INT_CAUSE3, sge_intr_info) ||
v != 0)
t4_fatal_err(adapter);
@@ -1995,6 +1994,54 @@ int t4_wol_pat_enable(struct adapter *adap, unsigned int port, unsigned int map,
(var).retval_len16 = htonl(FW_LEN16(var)); \
} while (0)
+int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
+ u32 addr, u32 val)
+{
+ struct fw_ldst_cmd c;
+
+ memset(&c, 0, sizeof(c));
+ c.op_to_addrspace = htonl(V_FW_CMD_OP(FW_LDST_CMD) | F_FW_CMD_REQUEST |
+ F_FW_CMD_WRITE |
+ V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_FIRMWARE));
+ c.cycles_to_len16 = htonl(FW_LEN16(c));
+ c.u.addrval.addr = htonl(addr);
+ c.u.addrval.val = htonl(val);
+
+ return t4_wr_mbox(adap, mbox, &c, sizeof(c), NULL);
+}
+
+/*
+ * t4_mem_win_read_len - read memory through PCIE memory window
+ * @adap: the adapter
+ * @addr: address of first byte requested aligned on 32b.
+ * @data: len bytes to hold the data read
+ * @len: amount of data to read from window. Must be <=
+ * MEMWIN0_APERTURE after adjusting for 16B alignment
+ * requirements of the memory window.
+ *
+ * Read len bytes of data from MC starting at @addr.
+ */
+int t4_mem_win_read_len(struct adapter *adap, u32 addr, __be32 *data, int len)
+{
+ int i;
+ int off;
+
+ /*
+ * Align on a 16B boundary.
+ */
+ off = addr & 15;
+ if ((addr & 3) || (len + off) > MEMWIN0_APERTURE)
+ return -EINVAL;
+
+ t4_write_reg(adap, A_PCIE_MEM_ACCESS_OFFSET, addr & ~15);
+ t4_read_reg(adap, A_PCIE_MEM_ACCESS_OFFSET);
+
+ for (i = 0; i < len; i += 4)
+ *data++ = t4_read_reg(adap, (MEMWIN0_BASE + off + i));
+
+ return 0;
+}
+
/**
* t4_mdio_rd - read a PHY register through MDIO
* @adap: the adapter
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index edcfd7e..83ca454 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -1620,4 +1620,9 @@ struct fw_hdr {
#define FW_HDR_FW_VER_MINOR_GET(x) (((x) >> 16) & 0xff)
#define FW_HDR_FW_VER_MICRO_GET(x) (((x) >> 8) & 0xff)
#define FW_HDR_FW_VER_BUILD_GET(x) (((x) >> 0) & 0xff)
+
+int t4_mem_win_read_len(struct adapter *adap, u32 addr, __be32 *data, int len);
+int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
+ u32 addr, u32 val);
+
#endif /* _T4FW_INTERFACE_H_ */
--
1.7.1
^ permalink raw reply related
* [PATCH 00/10] Doorbell drop recovery for T4 iWARP
From: Vipul Pandya @ 2011-10-19 17:10 UTC (permalink / raw)
To: linux-rdma, netdev; +Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
This patch series implements doorbell (DB) drop recovery for the T4 iWARP driver.
When DBs are dropped, an application can stall, so in that event we recover the
RDMA and LLD queues.
We also handle DB overflow events.
The patch series also includes some bug fixes, adds RDMA debugfs stats, and
removes kfifo usage for ID management.
The patch-series is based on linux-next, and involves changes on
drivers/net/ethernet/chelsio/cxgb4 and drivers/infiniband/hw/cxgb4.
The changes on drivers/infiniband/hw/cxgb4 are dependent on the changes of
drivers/net/ethernet/chelsio/cxgb4 for the T4-iWARP driver to build correctly.
So, we request to merge the entire patch-series through one tree - either
through Roland's tree, or through Dave Miller's tree.
Both linux-rdma and netdev are included in this post for review.
Vipul Pandya (10):
cxgb4: Detect DB FULL events and notify RDMA ULD.
cxgb4: DB Drop Recovery for RDMA and LLD queues.
cxgb4: DB Drop Recovery for RDMA and LLD queues.
RDMA/cxgb4: Add debugfs rdma memory stats
RDMA/cxgb4: Add DB Overflow Avoidance.
RDMA/cxgb4: disable interrupts in c4iw_ev_dispatch().
RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues.
RDMA/cxgb4: Use vmalloc for debugfs qp dump. Allows dumping thousands
of qps.
RDMA/cxgb4: remove kfifo usage
RDMA/cxgb4: Add query_qp support in driver to query the qp state
before flushing.
drivers/infiniband/hw/cxgb4/Makefile | 2 +-
drivers/infiniband/hw/cxgb4/cm.c | 23 ++-
drivers/infiniband/hw/cxgb4/device.c | 339 ++++++++++++++++++++++-
drivers/infiniband/hw/cxgb4/ev.c | 8 +-
drivers/infiniband/hw/cxgb4/id_table.c | 112 ++++++++
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 134 ++++++++--
drivers/infiniband/hw/cxgb4/mem.c | 21 +-
drivers/infiniband/hw/cxgb4/provider.c | 19 +-
drivers/infiniband/hw/cxgb4/qp.c | 105 +++++++-
drivers/infiniband/hw/cxgb4/resource.c | 180 +++++-------
drivers/infiniband/hw/cxgb4/t4.h | 24 ++
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 11 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 235 +++++++++++++++-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h | 11 +
drivers/net/ethernet/chelsio/cxgb4/sge.c | 22 ++-
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 62 ++++-
drivers/net/ethernet/chelsio/cxgb4/t4_regs.h | 53 ++++
drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 28 ++
18 files changed, 1218 insertions(+), 171 deletions(-)
create mode 100644 drivers/infiniband/hw/cxgb4/id_table.c
^ permalink raw reply
* [PATCH 04/10] RDMA/cxgb4: Add debugfs rdma memory stats
From: Vipul Pandya @ 2011-10-19 17:10 UTC (permalink / raw)
To: linux-rdma, netdev
Cc: roland, davem, divy, dm, kumaras, swise, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
drivers/infiniband/hw/cxgb4/device.c | 78 +++++++++++++++++++++++++++++++-
drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 17 +++++++
drivers/infiniband/hw/cxgb4/mem.c | 11 ++++-
drivers/infiniband/hw/cxgb4/provider.c | 8 +++
drivers/infiniband/hw/cxgb4/resource.c | 44 ++++++++++++++++++
5 files changed, 155 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 6d0df6e..8483111 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -240,6 +240,62 @@ static const struct file_operations stag_debugfs_fops = {
.llseek = default_llseek,
};
+static int stats_show(struct seq_file *seq, void *v)
+{
+ struct c4iw_dev *dev = seq->private;
+
+ seq_printf(seq, " Object: %10s %10s %10s\n", "Total", "Current", "Max");
+ seq_printf(seq, " PDID: %10llu %10llu %10llu\n",
+ dev->rdev.stats.pd.total, dev->rdev.stats.pd.cur,
+ dev->rdev.stats.pd.max);
+ seq_printf(seq, " QID: %10llu %10llu %10llu\n",
+ dev->rdev.stats.qid.total, dev->rdev.stats.qid.cur,
+ dev->rdev.stats.qid.max);
+ seq_printf(seq, " TPTMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.stag.total, dev->rdev.stats.stag.cur,
+ dev->rdev.stats.stag.max);
+ seq_printf(seq, " PBLMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.pbl.total, dev->rdev.stats.pbl.cur,
+ dev->rdev.stats.pbl.max);
+ seq_printf(seq, " RQTMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.rqt.total, dev->rdev.stats.rqt.cur,
+ dev->rdev.stats.rqt.max);
+ seq_printf(seq, " OCQPMEM: %10llu %10llu %10llu\n",
+ dev->rdev.stats.ocqp.total, dev->rdev.stats.ocqp.cur,
+ dev->rdev.stats.ocqp.max);
+ return 0;
+}
+
+static int stats_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, stats_show, inode->i_private);
+}
+
+static ssize_t stats_clear(struct file *file, const char __user *buf,
+ size_t count, loff_t *pos)
+{
+ struct c4iw_dev *dev = ((struct seq_file *)file->private_data)->private;
+
+ mutex_lock(&dev->rdev.stats.lock);
+ dev->rdev.stats.pd.max = 0;
+ dev->rdev.stats.qid.max = 0;
+ dev->rdev.stats.stag.max = 0;
+ dev->rdev.stats.pbl.max = 0;
+ dev->rdev.stats.rqt.max = 0;
+ dev->rdev.stats.ocqp.max = 0;
+ mutex_unlock(&dev->rdev.stats.lock);
+ return count;
+}
+
+static const struct file_operations stats_debugfs_fops = {
+ .owner = THIS_MODULE,
+ .open = stats_open,
+ .release = single_release,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .write = stats_clear,
+};
+
static int setup_debugfs(struct c4iw_dev *devp)
{
struct dentry *de;
@@ -256,6 +312,12 @@ static int setup_debugfs(struct c4iw_dev *devp)
(void *)devp, &stag_debugfs_fops);
if (de && de->d_inode)
de->d_inode->i_size = 4096;
+
+ de = debugfs_create_file("stats", S_IWUSR, devp->debugfs_root,
+ (void *)devp, &stats_debugfs_fops);
+ if (de && de->d_inode)
+ de->d_inode->i_size = 4096;
+
return 0;
}
@@ -269,9 +331,13 @@ void c4iw_release_dev_ucontext(struct c4iw_rdev *rdev,
list_for_each_safe(pos, nxt, &uctx->qpids) {
entry = list_entry(pos, struct c4iw_qid_list, entry);
list_del_init(&entry->entry);
- if (!(entry->qid & rdev->qpmask))
+ if (!(entry->qid & rdev->qpmask)) {
c4iw_put_resource(&rdev->resource.qid_fifo, entry->qid,
- &rdev->resource.qid_fifo_lock);
+ &rdev->resource.qid_fifo_lock);
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.qid.cur -= rdev->qpmask + 1;
+ mutex_unlock(&rdev->stats.lock);
+ }
kfree(entry);
}
@@ -332,6 +398,13 @@ static int c4iw_rdev_open(struct c4iw_rdev *rdev)
goto err1;
}
+ rdev->stats.pd.total = T4_MAX_NUM_PD;
+ rdev->stats.stag.total = rdev->lldi.vr->stag.size;
+ rdev->stats.pbl.total = rdev->lldi.vr->pbl.size;
+ rdev->stats.rqt.total = rdev->lldi.vr->rq.size;
+ rdev->stats.ocqp.total = rdev->lldi.vr->ocq.size;
+ rdev->stats.qid.total = rdev->lldi.vr->qp.size;
+
err = c4iw_init_resource(rdev, c4iw_num_stags(rdev), T4_MAX_NUM_PD);
if (err) {
printk(KERN_ERR MOD "error %d initializing resources\n", err);
@@ -440,6 +513,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
idr_init(&devp->qpidr);
idr_init(&devp->mmidr);
spin_lock_init(&devp->lock);
+ mutex_init(&devp->rdev.stats.lock);
if (c4iw_debugfs_root) {
devp->debugfs_root = debugfs_create_dir(
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 62cea0e..ec7c848 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -103,6 +103,22 @@ enum c4iw_rdev_flags {
T4_FATAL_ERROR = (1<<0),
};
+struct c4iw_stat {
+ u64 total;
+ u64 cur;
+ u64 max;
+};
+
+struct c4iw_stats {
+ struct mutex lock;
+ struct c4iw_stat qid;
+ struct c4iw_stat pd;
+ struct c4iw_stat stag;
+ struct c4iw_stat pbl;
+ struct c4iw_stat rqt;
+ struct c4iw_stat ocqp;
+};
+
struct c4iw_rdev {
struct c4iw_resource resource;
unsigned long qpshift;
@@ -117,6 +133,7 @@ struct c4iw_rdev {
struct cxgb4_lld_info lldi;
unsigned long oc_mw_pa;
void __iomem *oc_mw_kva;
+ struct c4iw_stats stats;
};
static inline int c4iw_fatal_error(struct c4iw_rdev *rdev)
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 40c8353..2a87379 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -135,6 +135,11 @@ static int write_tpt_entry(struct c4iw_rdev *rdev, u32 reset_tpt_entry,
&rdev->resource.tpt_fifo_lock);
if (!stag_idx)
return -ENOMEM;
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.stag.cur += 32;
+ if (rdev->stats.stag.cur > rdev->stats.stag.max)
+ rdev->stats.stag.max = rdev->stats.stag.cur;
+ mutex_unlock(&rdev->stats.lock);
*stag = (stag_idx << 8) | (atomic_inc_return(&key) & 0xff);
}
PDBG("%s stag_state 0x%0x type 0x%0x pdid 0x%0x, stag_idx 0x%x\n",
@@ -165,9 +170,13 @@ static int write_tpt_entry(struct c4iw_rdev *rdev, u32 reset_tpt_entry,
(rdev->lldi.vr->stag.start >> 5),
sizeof(tpt), &tpt);
- if (reset_tpt_entry)
+ if (reset_tpt_entry) {
c4iw_put_resource(&rdev->resource.tpt_fifo, stag_idx,
&rdev->resource.tpt_fifo_lock);
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.stag.cur -= 32;
+ mutex_unlock(&rdev->stats.lock);
+ }
return err;
}
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 247fe70..c2554ef 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -190,6 +190,9 @@ static int c4iw_deallocate_pd(struct ib_pd *pd)
PDBG("%s ibpd %p pdid 0x%x\n", __func__, pd, php->pdid);
c4iw_put_resource(&rhp->rdev.resource.pdid_fifo, php->pdid,
&rhp->rdev.resource.pdid_fifo_lock);
+ mutex_lock(&rhp->rdev.stats.lock);
+ rhp->rdev.stats.pd.cur--;
+ mutex_unlock(&rhp->rdev.stats.lock);
kfree(php);
return 0;
}
@@ -222,6 +225,11 @@ static struct ib_pd *c4iw_allocate_pd(struct ib_device *ibdev,
return ERR_PTR(-EFAULT);
}
}
+ mutex_lock(&rhp->rdev.stats.lock);
+ rhp->rdev.stats.pd.cur++;
+ if (rhp->rdev.stats.pd.cur > rhp->rdev.stats.pd.max)
+ rhp->rdev.stats.pd.max = rhp->rdev.stats.pd.cur;
+ mutex_unlock(&rhp->rdev.stats.lock);
PDBG("%s pdid 0x%0x ptr 0x%p\n", __func__, pdid, php);
return &php->ibpd;
}
diff --git a/drivers/infiniband/hw/cxgb4/resource.c b/drivers/infiniband/hw/cxgb4/resource.c
index 407ff39..1b948d1 100644
--- a/drivers/infiniband/hw/cxgb4/resource.c
+++ b/drivers/infiniband/hw/cxgb4/resource.c
@@ -185,6 +185,9 @@ u32 c4iw_get_cqid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
&rdev->resource.qid_fifo_lock);
if (!qid)
goto out;
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.qid.cur += rdev->qpmask + 1;
+ mutex_unlock(&rdev->stats.lock);
for (i = qid+1; i & rdev->qpmask; i++) {
entry = kmalloc(sizeof *entry, GFP_KERNEL);
if (!entry)
@@ -213,6 +216,10 @@ u32 c4iw_get_cqid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
out:
mutex_unlock(&uctx->lock);
PDBG("%s qid 0x%x\n", __func__, qid);
+ mutex_lock(&rdev->stats.lock);
+ if (rdev->stats.qid.cur > rdev->stats.qid.max)
+ rdev->stats.qid.max = rdev->stats.qid.cur;
+ mutex_unlock(&rdev->stats.lock);
return qid;
}
@@ -249,6 +256,9 @@ u32 c4iw_get_qpid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
&rdev->resource.qid_fifo_lock);
if (!qid)
goto out;
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.qid.cur += rdev->qpmask + 1;
+ mutex_unlock(&rdev->stats.lock);
for (i = qid+1; i & rdev->qpmask; i++) {
entry = kmalloc(sizeof *entry, GFP_KERNEL);
if (!entry)
@@ -277,6 +287,10 @@ u32 c4iw_get_qpid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
out:
mutex_unlock(&uctx->lock);
PDBG("%s qid 0x%x\n", __func__, qid);
+ mutex_lock(&rdev->stats.lock);
+ if (rdev->stats.qid.cur > rdev->stats.qid.max)
+ rdev->stats.qid.max = rdev->stats.qid.cur;
+ mutex_unlock(&rdev->stats.lock);
return qid;
}
@@ -315,12 +329,22 @@ u32 c4iw_pblpool_alloc(struct c4iw_rdev *rdev, int size)
if (!addr)
printk_ratelimited(KERN_WARNING MOD "%s: Out of PBL memory\n",
pci_name(rdev->lldi.pdev));
+ if (addr) {
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.pbl.cur += roundup(size, 1 << MIN_PBL_SHIFT);
+ if (rdev->stats.pbl.cur > rdev->stats.pbl.max)
+ rdev->stats.pbl.max = rdev->stats.pbl.cur;
+ mutex_unlock(&rdev->stats.lock);
+ }
return (u32)addr;
}
void c4iw_pblpool_free(struct c4iw_rdev *rdev, u32 addr, int size)
{
PDBG("%s addr 0x%x size %d\n", __func__, addr, size);
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.pbl.cur -= roundup(size, 1 << MIN_PBL_SHIFT);
+ mutex_unlock(&rdev->stats.lock);
gen_pool_free(rdev->pbl_pool, (unsigned long)addr, size);
}
@@ -377,12 +401,22 @@ u32 c4iw_rqtpool_alloc(struct c4iw_rdev *rdev, int size)
if (!addr)
printk_ratelimited(KERN_WARNING MOD "%s: Out of RQT memory\n",
pci_name(rdev->lldi.pdev));
+ if (addr) {
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.rqt.cur += roundup(size << 6, 1 << MIN_RQT_SHIFT);
+ if (rdev->stats.rqt.cur > rdev->stats.rqt.max)
+ rdev->stats.rqt.max = rdev->stats.rqt.cur;
+ mutex_unlock(&rdev->stats.lock);
+ }
return (u32)addr;
}
void c4iw_rqtpool_free(struct c4iw_rdev *rdev, u32 addr, int size)
{
PDBG("%s addr 0x%x size %d\n", __func__, addr, size << 6);
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.rqt.cur -= roundup(size << 6, 1 << MIN_RQT_SHIFT);
+ mutex_unlock(&rdev->stats.lock);
gen_pool_free(rdev->rqt_pool, (unsigned long)addr, size << 6);
}
@@ -433,12 +467,22 @@ u32 c4iw_ocqp_pool_alloc(struct c4iw_rdev *rdev, int size)
{
unsigned long addr = gen_pool_alloc(rdev->ocqp_pool, size);
PDBG("%s addr 0x%x size %d\n", __func__, (u32)addr, size);
+ if (addr) {
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.ocqp.cur += roundup(size, 1 << MIN_OCQP_SHIFT);
+ if (rdev->stats.ocqp.cur > rdev->stats.ocqp.max)
+ rdev->stats.ocqp.max = rdev->stats.ocqp.cur;
+ mutex_unlock(&rdev->stats.lock);
+ }
return (u32)addr;
}
void c4iw_ocqp_pool_free(struct c4iw_rdev *rdev, u32 addr, int size)
{
PDBG("%s addr 0x%x size %d\n", __func__, addr, size);
+ mutex_lock(&rdev->stats.lock);
+ rdev->stats.ocqp.cur -= roundup(size, 1 << MIN_OCQP_SHIFT);
+ mutex_unlock(&rdev->stats.lock);
gen_pool_free(rdev->ocqp_pool, (unsigned long)addr, size);
}
--
1.7.1
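
The accounting hunks in this patch repeat one pattern: adjust a `cur` counter when a resource is taken or returned, and raise `max` whenever `cur` exceeds it. A minimal sketch of that pattern follows. The names `usage_stat`, `usage_add` and `usage_sub` are hypothetical and do not appear in the driver, and the `stats.lock` mutex the patch takes around each update is left to the caller here.

```c
#include <assert.h>

/* Hypothetical counter pair mirroring the cur/max fields used in the patch. */
struct usage_stat {
	unsigned long long cur;	/* amount currently in use */
	unsigned long long max;	/* highest value cur has reached */
};

/* Account for n newly allocated units and update the high-water mark.
 * The caller is assumed to hold whatever lock protects the stats. */
static void usage_add(struct usage_stat *s, unsigned long long n)
{
	s->cur += n;
	if (s->cur > s->max)
		s->max = s->cur;
}

/* Account for n released units; max is deliberately left untouched. */
static void usage_sub(struct usage_stat *s, unsigned long long n)
{
	assert(s->cur >= n);	/* releasing more than was taken is a bug */
	s->cur -= n;
}
```

Keeping the add and the high-water update in one helper would avoid repeating the comparison at every call site.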
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH 01/10] cxgb4: Detect DB FULL events and notify RDMA ULD.
From: Vipul Pandya @ 2011-10-19 17:10 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: roland-BHEL68pLQRGGvPXPguhicg, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
divy-ut6Up61K2wZBDgjK7y7TUQ, dm-ut6Up61K2wZBDgjK7y7TUQ,
kumaras-ut6Up61K2wZBDgjK7y7TUQ,
swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW, Vipul Pandya
In-Reply-To: <1319044264-779-1-git-send-email-vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Vipul Pandya <vipul-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 4 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 77 +++++++++++++++++++++++
drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h | 7 ++
drivers/net/ethernet/chelsio/cxgb4/sge.c | 6 ++
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 9 +++
5 files changed, 103 insertions(+), 0 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 223a7f7..e18b5ad 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -504,6 +504,8 @@ struct adapter {
void **tid_release_head;
spinlock_t tid_release_lock;
struct work_struct tid_release_task;
+ struct work_struct db_full_task;
+ struct work_struct db_drop_task;
bool tid_release_task_busy;
struct dentry *debugfs_root;
@@ -719,4 +721,6 @@ int t4_ctrl_eq_free(struct adapter *adap, unsigned int mbox, unsigned int pf,
int t4_ofld_eq_free(struct adapter *adap, unsigned int mbox, unsigned int pf,
unsigned int vf, unsigned int eqid);
int t4_handle_fw_rpl(struct adapter *adap, const __be64 *rpl);
+void t4_db_full(struct adapter *adapter);
+void t4_db_dropped(struct adapter *adapter);
#endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 4c8f42a..870c320 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2371,6 +2371,16 @@ unsigned int cxgb4_port_chan(const struct net_device *dev)
}
EXPORT_SYMBOL(cxgb4_port_chan);
+unsigned int cxgb4_dbfifo_count(const struct net_device *dev, int lpfifo)
+{
+ struct adapter *adap = netdev2adap(dev);
+ u32 v;
+
+ v = t4_read_reg(adap, A_SGE_DBFIFO_STATUS);
+ return lpfifo ? G_LP_COUNT(v) : G_HP_COUNT(v);
+}
+EXPORT_SYMBOL(cxgb4_dbfifo_count);
+
/**
* cxgb4_port_viid - get the VI id of a port
* @dev: the net device for the port
@@ -2451,6 +2461,69 @@ static struct notifier_block cxgb4_netevent_nb = {
.notifier_call = netevent_cb
};
+static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
+{
+ mutex_lock(&uld_mutex);
+ if (adap->uld_handle[CXGB4_ULD_RDMA])
+ ulds[CXGB4_ULD_RDMA].control(adap->uld_handle[CXGB4_ULD_RDMA],
+ cmd);
+ mutex_unlock(&uld_mutex);
+}
+
+static void process_db_full(struct work_struct *work)
+{
+ struct adapter *adap;
+ static int delay = 1000;
+ u32 v;
+
+ adap = container_of(work, struct adapter, db_full_task);
+
+
+ /* stop LLD queues */
+
+ notify_rdma_uld(adap, CXGB4_CONTROL_DB_FULL);
+ do {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(usecs_to_jiffies(delay));
+ v = t4_read_reg(adap, A_SGE_DBFIFO_STATUS);
+ if (G_LP_COUNT(v) == 0 && G_HP_COUNT(v) == 0)
+ break;
+ } while (1);
+ notify_rdma_uld(adap, CXGB4_CONTROL_DB_EMPTY);
+
+
+ /*
+ * The more we get db full interrupts, the more we'll delay
+ * in re-enabling db rings on queues, capped off at 200ms.
+ */
+ delay = min(delay << 1, 200000);
+
+ /* resume LLD queues */
+}
+
+static void process_db_drop(struct work_struct *work)
+{
+ struct adapter *adap;
+ adap = container_of(work, struct adapter, db_drop_task);
+
+
+ /*
+ * sync the PIDX values in HW and SW for LLD queues.
+ */
+
+ notify_rdma_uld(adap, CXGB4_CONTROL_DB_DROP);
+}
+
+void t4_db_full(struct adapter *adap)
+{
+ schedule_work(&adap->db_full_task);
+}
+
+void t4_db_dropped(struct adapter *adap)
+{
+ schedule_work(&adap->db_drop_task);
+}
+
static void uld_attach(struct adapter *adap, unsigned int uld)
{
void *handle;
@@ -2654,6 +2727,8 @@ static void cxgb_down(struct adapter *adapter)
{
t4_intr_disable(adapter);
cancel_work_sync(&adapter->tid_release_task);
+ cancel_work_sync(&adapter->db_full_task);
+ cancel_work_sync(&adapter->db_drop_task);
adapter->tid_release_task_busy = false;
adapter->tid_release_head = NULL;
@@ -3606,6 +3681,8 @@ static int __devinit init_one(struct pci_dev *pdev,
spin_lock_init(&adapter->tid_release_lock);
INIT_WORK(&adapter->tid_release_task, process_tid_release_list);
+ INIT_WORK(&adapter->db_full_task, process_db_full);
+ INIT_WORK(&adapter->db_drop_task, process_db_drop);
err = t4_prep_adapter(adapter);
if (err)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
index b1d39b8..5cc2f27 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h
@@ -163,6 +163,12 @@ enum cxgb4_state {
CXGB4_STATE_DETACH
};
+enum cxgb4_control {
+ CXGB4_CONTROL_DB_FULL,
+ CXGB4_CONTROL_DB_EMPTY,
+ CXGB4_CONTROL_DB_DROP,
+};
+
struct pci_dev;
struct l2t_data;
struct net_device;
@@ -225,6 +231,7 @@ struct cxgb4_uld_info {
int cxgb4_register_uld(enum cxgb4_uld type, const struct cxgb4_uld_info *p);
int cxgb4_unregister_uld(enum cxgb4_uld type);
int cxgb4_ofld_send(struct net_device *dev, struct sk_buff *skb);
+unsigned int cxgb4_dbfifo_count(const struct net_device *dev, int lpfifo);
unsigned int cxgb4_port_chan(const struct net_device *dev);
unsigned int cxgb4_port_viid(const struct net_device *dev);
unsigned int cxgb4_port_idx(const struct net_device *dev);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 007ce23..3631fbb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2414,6 +2414,12 @@ void t4_sge_init(struct adapter *adap)
RXPKTCPLMODE |
(STAT_LEN == 128 ? EGRSTATUSPAGESIZE : 0));
+ t4_set_reg_field(adap, A_SGE_DBFIFO_STATUS,
+ V_HP_INT_THRESH(5) | V_LP_INT_THRESH(5),
+ V_HP_INT_THRESH(5) | V_LP_INT_THRESH(5));
+ t4_set_reg_field(adap, A_SGE_DOORBELL_CONTROL, F_ENABLE_DROP,
+ F_ENABLE_DROP);
+
for (i = v = 0; i < 32; i += 4)
v |= (PAGE_SHIFT - 10) << i;
t4_write_reg(adap, SGE_HOST_PAGE_SIZE, v);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index d1ec111..13609bf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -1013,6 +1013,8 @@ static void sge_intr_handler(struct adapter *adapter)
{ ERR_INVALID_CIDX_INC,
"SGE GTS CIDX increment too large", -1, 0 },
{ ERR_CPL_OPCODE_0, "SGE received 0-length CPL", -1, 0 },
+ { F_DBFIFO_LP_INT, NULL, -1, 0 },
+ { F_DBFIFO_HP_INT, NULL, -1, 0 },
{ ERR_DROPPED_DB, "SGE doorbell dropped", -1, 0 },
{ ERR_DATA_CPL_ON_HIGH_QID1 | ERR_DATA_CPL_ON_HIGH_QID0,
"SGE IQID > 1023 received CPL for FL", -1, 0 },
@@ -1042,6 +1044,12 @@ static void sge_intr_handler(struct adapter *adapter)
t4_write_reg(adapter, SGE_INT_CAUSE2, v >> 32);
}
+ err = t4_read_reg(adapter, A_SGE_INT_CAUSE3);
+ if (err & (F_DBFIFO_HP_INT|F_DBFIFO_LP_INT))
+ t4_db_full(adapter);
+ if (err & F_ERR_DROPPED_DB)
+ t4_db_dropped(adapter);
+
if (t4_handle_intr_status(adapter, SGE_INT_CAUSE3, sge_intr_info) ||
v != 0)
t4_fatal_err(adapter);
@@ -1513,6 +1521,7 @@ void t4_intr_enable(struct adapter *adapter)
ERR_BAD_DB_PIDX2 | ERR_BAD_DB_PIDX1 |
ERR_BAD_DB_PIDX0 | ERR_ING_CTXT_PRIO |
ERR_EGR_CTXT_PRIO | INGRESS_SIZE_ERR |
+ F_DBFIFO_HP_INT | F_DBFIFO_LP_INT |
EGRESS_SIZE_ERR);
t4_write_reg(adapter, MYPF_REG(PL_PF_INT_ENABLE), PF_INTR_MASK);
t4_set_reg_field(adapter, PL_INT_MAP0, 0, 1 << pf);
--
1.7.1
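
The back-off in `process_db_full()` doubles the polling delay after each DB FULL event and caps it at 200 ms. A minimal user-space sketch of that step is below. The helper name `next_db_full_delay` and the macro are hypothetical; only the starting value of 1000 us, the doubling and the 200000 us cap come from the patch.

```c
#include <assert.h>

#define DB_FULL_DELAY_CAP_US 200000	/* 200 ms cap, as in the patch comment */

/* Return the delay to use after the next DB FULL event: double the
 * current delay, but never exceed the cap. */
static int next_db_full_delay(int delay_us)
{
	int doubled;

	assert(delay_us > 0 && delay_us <= DB_FULL_DELAY_CAP_US);
	doubled = delay_us << 1;
	return doubled < DB_FULL_DELAY_CAP_US ? doubled : DB_FULL_DELAY_CAP_US;
}
```

Starting from 1000 us, the cap is reached on the eighth event. In the patch `delay` is a function-local `static`, so the value persists across invocations and is never reduced again.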
^ permalink raw reply related
* Re: [net-next 5/6] ixgbe: add hardware timestamping support
From: Jacob Keller @ 2011-10-19 17:04 UTC (permalink / raw)
To: Richard Cochran
Cc: Jeff Kirsher, davem, Jacob Keller, netdev, gospo, sassmann
In-Reply-To: <20111017164433.GA2028@netboy.at.omicron.at>
On Mon, Oct 17, 2011 at 9:44 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Mon, Oct 17, 2011 at 05:21:01AM -0700, Jeff Kirsher wrote:
>> The cyclecounter has the potential to miss a wrap-around of the
>> systim register (this should occur no more often than every 35
>> seconds) unless some activity regarding the cycle counter occurs at
>> least once within this time. This version adds a cycle counter read
>> every time the watchdog task is run, which should occur at least once
>> within this timeframe. Any packets being timestamped will also count
>> as a read due to the call to timecompare_update.
>
> So, is this wrap around due to the fact that you are tied to the
> system time via time_compare? Or, putting it another way, can't you
> program the hardware time stamping unit so that the registers have
> some reasonable resolution (like 64 bits worth of nanoseconds) and
> just offer RAW timestamps?
The wrap around is due to hardware limitations. The ixgbe devices
cannot support 64bits worth of nanoseconds and still have the ability
to adjust the frequency in parts per billion. A larger increment
increases the resolution available for frequency adjustments, but
decreases the time it takes for the cycle counter to wrap around.
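
The trade-off described above can be sketched numerically. This assumes a simple model in which a fixed-width counter is advanced by `incr` units on every clock tick and the frequency is tuned by changing `incr` by one unit at a time. The function names and all numbers are hypothetical illustrations, not ixgbe register values.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a `bits`-wide counter wraps when `incr` is added on every
 * tick of a clock running at `clock_hz`. */
static double wrap_seconds(unsigned int bits, uint64_t incr, double clock_hz)
{
	double span = 1.0;
	unsigned int i;

	assert(bits <= 64 && incr > 0 && clock_hz > 0.0);
	for (i = 0; i < bits; i++)
		span *= 2.0;		/* span = 2^bits */
	return span / ((double)incr * clock_hz);
}

/* Smallest relative frequency change, in parts per billion, obtainable by
 * changing the increment by one unit. */
static double adjust_step_ppb(uint64_t incr)
{
	assert(incr > 0);
	return 1e9 / (double)incr;
}
```

In this model, doubling the increment halves both the adjustment step and the time to wrap, which is the tension between resolution and wrap-around period.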
>
> I would really like to move away from the timecompare hacks and
> towards a proper PHC->SYS PPS solution.
>
I agree that this is the correct approach. The timecompare
functionality does have issues.
>> This version fixes an issue regarding timecompare not updating
>> detected skew after the clock offset is changed due to ptpd or outside
>> influence from the OS. Now the skew detection is forced just before we
>> hand a timestamp up to the kernel stack
>
> Again, doing the update thing on every packet won't work for real
> world PTP scenarios.
>
Which is why the PHC solution is better. Work on implementing this
support is in progress. Out of curiosity, what is the sync rate for
the scenario that breaks this? I would like to try that rate out on my
setup.
> Thanks,
> Richard
^ permalink raw reply
* Re: [PATCH] iproute2: Conforming to -D_FORTIFY_SOURCE=2 restrictions
From: Stephen Hemminger @ 2011-10-19 16:50 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Bin Li, netdev
In-Reply-To: <1319023851.3103.17.camel@edumazet-laptop>
On Wed, 19 Oct 2011 13:30:51 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 19 octobre 2011 à 17:15 +0800, Bin Li a écrit :
> > Stephen,
> >
> > You can reproduce this issue in 2.6.37 as shown below. The previous
> > gdb log was taken after installing the debuginfo package in SUSE.
> >
> > # ip -6 xfrm state add src 3ffe:501:ffff:ff03:21a:64ff:fe12:e4c1 dst
> > 3ffe:501:ffff:ff05:200:ff:fe00:c1c1 proto ah spi 0x1000 mode transport
> > auth md5 "TAHITEST89ABCDEF"
> >
> > *** buffer overflow detected ***: ip terminated
> > ======= Backtrace: =========
> > /lib/libc.so.6(__fortify_fail+0x40)[0xb76d0070]
> > /lib/libc.so.6(+0xe8e27)[0xb76cde27]
> > /lib/libc.so.6(+0xe8317)[0xb76cd317]
> > ip[0x806d6c4]
> > ip(do_xfrm_state+0x120)[0x806dc70]
> > ip(do_xfrm+0x81)[0x806ad51]
> > ip[0x804c355]
> > ip(main+0x476)[0x804caa6]
> > /lib/libc.so.6(__libc_start_main+0xfe)[0xb75fbc2e]
> > ip[0x804c261]
> > ======= Memory map: ========
> > 08048000-08087000 r-xp 00000000 08:01 4465 /sbin/ip
> > 08087000-08088000 r--p 0003e000 08:01 4465 /sbin/ip
> > 08088000-0808a000 rw-p 0003f000 08:01 4465 /sbin/ip
> > 0808a000-080ad000 rw-p 00000000 00:00 0 [heap]
> > b75c6000-b75e2000 r-xp 00000000 08:01 131084 /lib/libgcc_s.so.1
> > b75e2000-b75e3000 r--p 0001b000 08:01 131084 /lib/libgcc_s.so.1
> > b75e3000-b75e4000 rw-p 0001c000 08:01 131084 /lib/libgcc_s.so.1
> > b75e4000-b75e5000 rw-p 00000000 00:00 0
> > b75e5000-b774b000 r-xp 00000000 08:01 131375 /lib/libc-2.11.3.so
> > b774b000-b774c000 ---p 00166000 08:01 131375 /lib/libc-2.11.3.so
> > b774c000-b774e000 r--p 00166000 08:01 131375 /lib/libc-2.11.3.so
> > b774e000-b774f000 rw-p 00168000 08:01 131375 /lib/libc-2.11.3.so
> > b774f000-b7752000 rw-p 00000000 00:00 0
> > b7752000-b7755000 r-xp 00000000 08:01 131428 /lib/libdl-2.11.3.so
> > b7755000-b7756000 r--p 00002000 08:01 131428 /lib/libdl-2.11.3.so
> > b7756000-b7757000 rw-p 00003000 08:01 131428 /lib/libdl-2.11.3.so
> > b7774000-b7775000 rw-p 00000000 00:00 0
> > b7775000-b7794000 r-xp 00000000 08:01 154467 /lib/ld-2.11.3.so
> > b7794000-b7795000 r--p 0001e000 08:01 154467 /lib/ld-2.11.3.so
> > b7795000-b7796000 rw-p 0001f000 08:01 154467 /lib/ld-2.11.3.so
> > bfa02000-bfa23000 rw-p 00000000 00:00 0 [stack]
> > ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
> > Aborted
> >
> > Without -D_FORTIFY_SOURCE=2 in gcc it works fine, so it is a
> > bug in iproute2, which does not conform to the -D_FORTIFY_SOURCE=2
> > restrictions.
> >
>
> FORTIFY assumes we can't copy a string to alg.u.alg.alg_key!
>
> This completely precludes 0-sized arrays.
>
> struct xfrm_algo {
> char alg_name[64];
> unsigned int alg_key_len; /* in bits */
> char alg_key[0];
> };
>
> struct {
> union {
> struct xfrm_algo alg;
> struct xfrm_algo_aead aead;
> struct xfrm_algo_auth auth;
> } u;
> char buf[XFRM_ALGO_KEY_BUF_SIZE];
> } alg = {};
>
> I would say it's a FORTIFY bug. This kind of construct is perfectly
> valid.
Maybe it will handle flexible style arrays.
See also:
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
At this time, I won't accept the patch that uses alloca() just to deal
with this FORTIFY bug.
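
For reference, the flexible-array form mentioned in the GCC page linked above could look like the following minimal sketch. `struct algo_sketch` and `algo_new` are hypothetical names, the struct only mirrors the layout of `struct xfrm_algo`, and heap allocation is used purely to keep the example self-contained. It illustrates the language feature and is not a proposed iproute2 change; whether FORTIFY accepts it is not established in this thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct xfrm_algo, using a C99 flexible array
 * member instead of the zero-length array alg_key[0]. */
struct algo_sketch {
	char name[64];
	unsigned int key_len_bits;	/* key length in bits */
	char key[];			/* storage follows the header */
};

/* Allocate header and key in one block and copy the key in.
 * Returns NULL on allocation failure; the caller frees the result. */
static struct algo_sketch *algo_new(const char *name, const void *key,
				    size_t key_len)
{
	struct algo_sketch *a;

	assert(name != NULL && key != NULL);
	a = calloc(1, sizeof(*a) + key_len);
	if (!a)
		return NULL;
	strncpy(a->name, name, sizeof(a->name) - 1);	/* stays NUL-terminated */
	a->key_len_bits = (unsigned int)(key_len * 8);
	memcpy(a->key, key, key_len);
	return a;
}
```

With the 16-byte key from the reproducer, `algo_new("md5", "TAHITEST89ABCDEF", 16)` would record a key length of 128 bits.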
^ permalink raw reply
* Re: BUG: All network processes hang (brcmsmac/wpa_supplicant)
From: Arend van Spriel @ 2011-10-19 16:48 UTC (permalink / raw)
To: Nico Schottelius, Eric Dumazet, LKML,
"linux-wireless@vger.kernel.org" <linux-wireles
In-Reply-To: <20111019145501.GB1414@schottelius.org>
[-- Attachment #1.1: Type: text/plain, Size: 1028 bytes --]
On 10/19/2011 04:55 PM, Nico Schottelius wrote:
> Hey Arend,
>
>
> Arend van Spriel [Wed, Oct 19, 2011 at 04:43:27PM +0200]:
>> I actually don't see the rtnl_lock or mutex_lock that was listed in your
>> original dmesg. There are a lot of warning traces from mac80211.c.
The warnings are probably a diversion. This is the warning in
ieee80211_rx():
if (WARN_ON(status->rate_idx < 0 ||
status->rate_idx >= sband->n_bitrates))
goto drop;
rate = &sband->bitrates[status->rate_idx];
This means that the driver (brcmsmac) provides a data packet with an
out-of-range rate index. This reminded me of a fix I made about a month
ago. Could you apply the attached patch file? It is based on:
commit bee709ab1d390afe69e4407bc86bb706c6fb2965
Merge: ad1c761 1f2c7e9
Author: Nico Schottelius <nico@kr.ethz.ch>
Date: Tue Oct 18 00:04:05 2011 +0200
Merge branch 'fix-edp-vdd-power' of ../keithp/linux
As it drops receive packets it may be your problem. Is your AP on 5GHz?
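
A minimal sketch of the range check quoted above, together with the adjustment made by the attached debug patch. It assumes, and this is not stated in the thread, that the 5 GHz band table holds 8 rates while the driver computes its index against a 12-entry table whose first 4 entries are CCK rates. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same condition as the WARN_ON in ieee80211_rx(), inverted. */
static bool rate_idx_valid(int rate_idx, int n_bitrates)
{
	assert(n_bitrates > 0);
	return rate_idx >= 0 && rate_idx < n_bitrates;
}

/* Adjustment made by the debug patch: shift the index down by 4 on 5 GHz. */
static int adjust_rate_idx(int rate_idx, bool is_5ghz)
{
	return is_5ghz ? rate_idx - 4 : rate_idx;
}
```

Under that assumption an unadjusted index of 11 fails the check against 8 rates, while the adjusted index of 7 passes. An index below 4 would become negative after the subtraction, so the unconditional shift is only safe if the driver never reports a CCK rate on 5 GHz.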
Gr. AvS
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-staging-brcm80211-fix-for-rate-index-in-receive-stat.patch --]
[-- Type: text/x-patch; name="0001-staging-brcm80211-fix-for-rate-index-in-receive-stat.patch", Size: 1096 bytes --]
From 7d14bd6cbfbf26369c5958e56a468fd8429841d7 Mon Sep 17 00:00:00 2001
From: Arend van Spriel <arend@broadcom.com>
Date: Wed, 19 Oct 2011 18:42:45 +0200
Subject: [PATCH] staging: brcm80211: fix for rate index in receive status
Made a patch for Nico to try whether this resolves his issue.
Signed-off-by: Arend van Spriel <arend@broadcom.com>
---
drivers/staging/brcm80211/brcmsmac/main.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/staging/brcm80211/brcmsmac/main.c b/drivers/staging/brcm80211/brcmsmac/main.c
index 1763c45..49c8eb9 100644
--- a/drivers/staging/brcm80211/brcmsmac/main.c
+++ b/drivers/staging/brcm80211/brcmsmac/main.c
@@ -4608,6 +4608,10 @@ prep_mac80211_status(struct brcms_c_info *wlc, struct d11rxhdr *rxh,
wiphy_err(wlc->wiphy, "%s: Unknown rate\n", __func__);
}
+ /* DEBUG: fix rate index in receive status */
+ if (rx_status->band == IEEE80211_BAND_5GHZ)
+ rx_status->rate_idx -= 4;
+
/* Determine short preamble and rate_idx */
preamble = 0;
if (IS_CCK(rspec)) {
--
1.7.4.1
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 900 bytes --]
^ permalink raw reply related
* RE: [PATCH net-next] igb: fix a compile warning
From: Rose, Gregory V @ 2011-10-19 16:28 UTC (permalink / raw)
To: roy.qing.li@gmail.com, netdev@vger.kernel.org
In-Reply-To: <1319014355-15678-1-git-send-email-roy.qing.li@gmail.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of roy.qing.li@gmail.com
> Sent: Wednesday, October 19, 2011 1:53 AM
> To: netdev@vger.kernel.org
> Subject: [PATCH net-next] igb: fix a compile warning
>
> From: RongQing Li <roy.qing.li@gmail.com>
>
> Control these three function declarations and
> definitions with the same macro, CONFIG_PCI_IOV.
>
> drivers/net/ethernet/intel/igb/igb_main.c:165:
> warning: ‘igb_vf_configure’ declared ‘static’ but never defined
> drivers/net/ethernet/intel/igb/igb_main.c:166:
> warning: ‘igb_find_enabled_vfs’ declared ‘static’ but never defined
> drivers/net/ethernet/intel/igb/igb_main.c:167:
> warning: ‘igb_check_vf_assignment’ declared ‘static’ but never defined
>
> Signed-off-by: RongQing Li <roy.qing.li@gmail.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
^ permalink raw reply
* Re: BUG: All network processes hang (brcmsmac/wpa_supplicant)
From: Nico Schottelius @ 2011-10-19 14:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: Nico Schottelius, Arend van Spriel, LKML,
linux-wireless@vger.kernel.org, netdev
In-Reply-To: <1319035584.8416.34.camel@edumazet-laptop>
Eric Dumazet [Wed, Oct 19, 2011 at 04:46:24PM +0200]:
> Le mercredi 19 octobre 2011 à 16:28 +0200, Nico Schottelius a écrit :
> > Triggered it!
> >
> > And also got the traceback!
> >
> > Dmesg + log attached, let me know when I can pull the fix from
> > somewhere.
> >
>
> Sorry, I can't see the same messages in your dmesg; there is no mention of
> stuck processes (blocked for more than 120 seconds).
>
> By the way, please update to the latest 3.1-rc10.
I can rebuild Linus' tree if you want, but only use it temporarily due
to the other errors mentioned in the previous message.
If you've a tree somewhere that merges on top of Keith+Jiri,
I'd try to merge that one as well, to make the mba more usable.
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
^ permalink raw reply
* Re: [PATCH 6/7] mlx4_en: Adding rxhash support
From: Ben Hutchings @ 2011-10-19 14:57 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Yevgeny Petrilin, Eric Dumazet, davem@davemloft.net,
netdev@vger.kernel.org
In-Reply-To: <1318966500.2783.124.camel@bwh-desktop>
On Tue, 2011-10-18 at 20:35 +0100, Ben Hutchings wrote:
[...]
> There was also a proposal a while back that we should try to make the
> hash symmetric w.r.t. RX and TX addresses, so that both directions of a
> flow through a router/bridge are aligned. I think this was to be done
> by repeating a 16-bit pattern across the key. Not sure whether that's
> worthwhile.
That also makes it relatively cheap to calculate in software, which
DragonflyBSD does:
http://gitweb.dragonflybsd.org/dragonfly.git/blob/master:/sys/net/toeplitz.c
http://gitweb.dragonflybsd.org/dragonfly.git/blob/master:/sys/net/toeplitz2.h
(the latter file appears to assume that in_addr_t/in_port_t are byte-
swapped i.e. the host is little-endian).
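
A minimal sketch of why a key built from one repeated 16-bit pattern makes the Toeplitz hash symmetric: every field swapped between the two directions moves by a multiple of 16 bits, and a 16-bit-periodic key presents the same 32-bit window at those offsets. The function names and the pattern value are hypothetical, and this is a plain bit-by-bit version rather than the table-driven code in the files linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLOW_LEN 12			/* saddr(4) + daddr(4) + sport(2) + dport(2) */
#define KEY_LEN  (FLOW_LEN + 4)	/* 32 key bits are needed past the input */

/* Fill the key with one repeated 16-bit pattern. */
static void make_periodic_key(uint8_t key[KEY_LEN], uint16_t pattern)
{
	size_t i;

	for (i = 0; i < KEY_LEN; i += 2) {
		key[i] = (uint8_t)(pattern >> 8);
		key[i + 1] = (uint8_t)(pattern & 0xff);
	}
}

/* Bit-serial Toeplitz hash: XOR the current 32-bit key window into the
 * result for every set input bit, sliding the window by one bit each step. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t window, hash = 0;
	size_t i;
	int bit;

	assert(key_len >= len + 4);
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	for (i = 0; i < len; i++) {
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			window <<= 1;
			if (key[i + 4] & (1u << bit))
				window |= 1u;
		}
	}
	return hash;
}

/* Hash an IPv4 flow tuple laid out in network byte order. */
static uint32_t flow_hash(const uint8_t *key, uint32_t saddr, uint32_t daddr,
			  uint16_t sport, uint16_t dport)
{
	uint8_t d[FLOW_LEN] = {
		(uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
		(uint8_t)(saddr >> 8), (uint8_t)saddr,
		(uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
		(uint8_t)(daddr >> 8), (uint8_t)daddr,
		(uint8_t)(sport >> 8), (uint8_t)sport,
		(uint8_t)(dport >> 8), (uint8_t)dport,
	};

	return toeplitz_hash(key, KEY_LEN, d, sizeof(d));
}
```

Hashing a tuple and its reverse (addresses and ports swapped) with such a key yields the same value, so both directions of a flow would land on the same queue.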
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
^ permalink raw reply
* Re: BUG: All network processes hang (brcmsmac/wpa_supplicant)
From: Nico Schottelius @ 2011-10-19 14:55 UTC (permalink / raw)
To: Arend van Spriel
Cc: Nico Schottelius, Eric Dumazet, LKML,
linux-wireless@vger.kernel.org, netdev, wey-yi.w.guy
In-Reply-To: <4E9EE20F.1030403@broadcom.com>
[-- Attachment #1: Type: text/plain, Size: 2199 bytes --]
Hey Arend,
Arend van Spriel [Wed, Oct 19, 2011 at 04:43:27PM +0200]:
> I actually don't see the rtnl_lock or mutex_lock that was listed in your
> original dmesg. There are a lot of warning traces from mac80211.c.
Oh, sorry, ip r was stuck already and I didn't wait 120 seconds!
> The other interesting thing is the following:
>
> [ 367.389096] wlan0: deauthenticating from 00:03:52:e3:0e:10 by local
> choice (reason=3)
> [ 367.416312] cfg80211: Calling CRDA to update world regulatory domain
> [ 369.367667] wlan0: direct probe to 00:03:52:2c:01:00 (try 1/3)
> [ 369.565008] wlan0: direct probe to 00:03:52:2c:01:00 (try 2/3)
> [ 369.764891] wlan0: direct probe to 00:03:52:2c:01:00 (try 3/3)
> [ 369.964768] wlan0: direct probe to 00:03:52:2c:01:00 timed out
> [ 381.310339] wlan0: direct probe to 00:0f:61:1a:0c:50 (try 1/3)
> [ 381.508071] wlan0: direct probe to 00:0f:61:1a:0c:50 (try 2/3)
> [ 381.707955] wlan0: direct probe to 00:0f:61:1a:0c:50 (try 3/3)
> [ 381.907788] wlan0: direct probe to 00:0f:61:1a:0c:50 timed out
> [ 393.250073] wlan0: direct probe to 00:03:52:2c:01:00 (try 1/3)
> [ 393.447810] wlan0: direct probe to 00:03:52:2c:01:00 (try 2/3)
> [ 393.647700] wlan0: direct probe to 00:03:52:2c:01:00 (try 3/3)
> [ 393.847576] wlan0: direct probe to 00:03:52:2c:01:00 timed out
> [ 405.183177] wlan0: direct probe to 00:0f:61:1a:1d:30 (try 1/3)
> [ 405.380906] wlan0: direct probe to 00:0f:61:1a:1d:30 (try 2/3)
> [ 405.580773] wlan0: direct probe to 00:0f:61:1a:1d:30 (try 3/3)
> [ 405.780658] wlan0: direct probe to 00:0f:61:1a:1d:30 timed out
>
> There is another thread titled "iwlagn is getting very shaky". It makes
> me wonder.
>
> Where did you get your kernel from? From your dmesg I see:
It's a mix of Keith's and Jiri's branches:
http://www.nico.schottelius.org/blog/macbook-air-42-touchpad-keyboard-correct-screen-resolution/
Those two are needed to get screen/keyboard/mouse working on the MacBook Air.
You can pull the merged result directly from
git://git.schottelius.org/foreign/linux-keith-jiri-mba
Cheers,
Nico
--
PGP key: 7ED9 F7D3 6B10 81D7 0EC5 5C09 D7DC C8E4 3187 7DF0
[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]
^ permalink raw reply
* Re: BUG: All network processes hang (brcmsmac/wpa_supplicant)
From: Eric Dumazet @ 2011-10-19 14:46 UTC (permalink / raw)
To: Nico Schottelius
Cc: Arend van Spriel, LKML, linux-wireless@vger.kernel.org, netdev
In-Reply-To: <20111019142837.GA1414@schottelius.org>
Le mercredi 19 octobre 2011 à 16:28 +0200, Nico Schottelius a écrit :
> Triggered it!
>
> And also got the traceback!
>
> Dmesg + log attached, let me know when I can pull the fix from
> somewhere.
>
Sorry, I can't see the same messages in your dmesg; there is no mention of
stuck processes (blocked for more than 120 seconds).
By the way, please update to the latest 3.1-rc10.
^ permalink raw reply