* mlx4 BUG_ON in probe path
From: Bjorn Helgaas @ 2016-11-16 18:25 UTC (permalink / raw)
To: Yishai Hadas; +Cc: netdev, linux-rdma, Johannes Thumshirn, linux-kernel
Hi Yishai,
Johannes has been working on an mlx4 initialization problem on an
IBM x3850 X6. The underlying problem is a PCI core issue -- we're
setting RCB in the Mellanox device, which means it thinks it can
generate 128-byte Completions, even though the Root Port above it
can't handle them. That issue is
https://bugzilla.kernel.org/show_bug.cgi?id=187781
The machine crashed when this happened, apparently not because of any
error reported via AER, but because mlx4 contains a BUG_ON, probably
the one in mlx4_enter_error_state().
That one happens if pci_channel_offline() returns false. Is this
telling us about a problem in PCI error handling, or is it just a case
where mlx4 isn't as smart as it could be?
Ideally, if mlx4 can't initialize the device, it should just return an
error from the probe function instead of crashing the whole machine.
Here's the crash (the entire dmesg log is in the bugzilla above):
mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
mlx4_core 0000:41:00.0: device is going to be reset
mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
mlx4_core 0000:41:00.0: Fail to reset HCA
------------[ cut here ]------------
kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!
invalid opcode: 0000 [#1] SMP
Modules linked in: sr_mod(E) cdrom(E) uas(E) usb_storage(E) mlx4_core(E+) cdc_ether(E) usbnet(E) mii(E) joydev(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) drbg(E) ansi_cprng(E) aesni_intel(E) iTCO_wdt(E) aes_x86_64(E) igb(E) ipmi_devintf(E) iTCO_vendor_support(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) ptp(E) cryptd(E) pps_core(E) sb_edac(E) pcspkr(E) lpc_ich(E) ipmi_ssif(E) ioatdma(E) edac_core(E) shpchp(E) mfd_core(E) dca(E) wmi(E) ipmi_si(E) ipmi_msghandler(E) fjes(E) button(E) processor(E) acpi_pad(E) hid_generic(E) usbhid(E) ext4(E) crc16(E) jbd2(E) mbcache(E) sd_mod(E) mgag200(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) xhci_pci(E) sysfillrect(E) ehci_pci(E) sysimgbl
t(E)
fb_sys_fops(E) xhci_hcd(E) ehci_hcd(E) ttm(E) usbcore(E) drm(E) usb_common(E) megaraid_sas(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) autofs4(E)
Supported: Yes
CPU: 27 PID: 2867 Comm: modprobe Tainted: G E 4.4.21-default #6
Hardware name: IBM x3850 X6 -[3837Z7P]-/00FN772, BIOS -[A8E120CUS-1.30]- 08/22/2016
task: ffff881fb2ff9280 ti: ffff881fbd3c4000 task.ti: ffff881fbd3c4000
RIP: 0010:[<ffffffffa0446740>] [<ffffffffa0446740>] mlx4_enter_error_state+0x240/0x320 [mlx4_core]
RSP: 0018:ffff881fbd3c79a0 EFLAGS: 00010246
RAX: ffff8820b2486e00 RBX: ffff883fbe240000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffff881fbf63b000
RBP: ffff8820b2486e60 R08: 0000000000000029 R09: ffff88803feda50f
R10: 00000000000d1b50 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffff883fbe240460 R15: 00000000fffffffb
FS: 00007f7c55203700(0000) GS:ffff883fbf900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1813c88000 CR3: 0000003fbe637000 CR4: 00000000001406e0
Stack:
15b30000c0000100 ffff883fbe240000 0000000000000fff 0000000000000000
ffffffffa0447d54 000000000000ffff ffffffff00000000 000000000000ea60
0000000000000000 000000000000ea60 ffffc90031dba680 ffff883fbe240000
Call Trace:
[<ffffffffa0447d54>] __mlx4_cmd+0x594/0x8a0 [mlx4_core]
[<ffffffffa045191b>] mlx4_map_cmd+0x2ab/0x3c0 [mlx4_core]
[<ffffffffa045a855>] mlx4_load_one+0x515/0x1220 [mlx4_core]
[<ffffffffa045bb69>] mlx4_init_one+0x4e9/0x6a0 [mlx4_core]
[<ffffffff8135626f>] local_pci_probe+0x3f/0xa0
[<ffffffff81357694>] pci_device_probe+0xd4/0x120
[<ffffffff8144d0b7>] driver_probe_device+0x1f7/0x420
[<ffffffff8144d35b>] __driver_attach+0x7b/0x80
[<ffffffff8144afc8>] bus_for_each_dev+0x58/0x90
[<ffffffff8144c519>] bus_add_driver+0x1c9/0x280
[<ffffffff8144dccb>] driver_register+0x5b/0xd0
[<ffffffffa03f911a>] mlx4_init+0x11a/0x1000 [mlx4_core]
[<ffffffff81002138>] do_one_initcall+0xc8/0x1f0
[<ffffffff81182a08>] do_init_module+0x5a/0x1d7
[<ffffffff81103726>] load_module+0x1366/0x1c50
[<ffffffff811041c0>] SYSC_finit_module+0x70/0xa0
[<ffffffff815e14ae>] entry_SYSCALL_64_fastpath+0x12/0x71
^ permalink raw reply
* [PATCH rdma-core] ccan: Add likely implementation
From: Leon Romanovsky @ 2016-11-16 17:40 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
yishaih-VPRAkNaXOzVWk0Htik3J/w,
Tatyana.E.Nikolova-ral2JQCrhuEAvxtiuMwx3w,
oulijun-hv44wF8Li93QT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Add likely/unlikely macros to ccan directory.
This change includes adjustments to various providers, who
defined it locally (nes, mlx4, mlx5 and hns).
The code of i40iw had such definitions too, but without actual usage and
this patch removed this dead code.
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
ccan/CMakeLists.txt | 2 +
ccan/likely.c | 136 ++++++++++++++++++++++++++++++++++++++++++
ccan/likely.h | 111 ++++++++++++++++++++++++++++++++++
providers/hns/hns_roce_u.h | 5 +-
providers/i40iw/i40iw_umain.h | 7 ---
providers/mlx4/mlx4.h | 8 ---
providers/mlx4/qp.c | 1 +
providers/mlx5/mlx5.h | 6 +-
providers/nes/nes_umain.h | 8 +--
9 files changed, 253 insertions(+), 31 deletions(-)
create mode 100644 ccan/likely.c
create mode 100644 ccan/likely.h
diff --git a/ccan/CMakeLists.txt b/ccan/CMakeLists.txt
index b5de515..153426f 100644
--- a/ccan/CMakeLists.txt
+++ b/ccan/CMakeLists.txt
@@ -6,11 +6,13 @@ publish_internal_headers(ccan
minmax.h
str.h
str_debug.h
+ likely.h
)
set(C_FILES
list.c
str.c
+ likely.c
)
add_library(ccan STATIC ${C_FILES})
add_library(ccan_pic STATIC ${C_FILES})
diff --git a/ccan/likely.c b/ccan/likely.c
new file mode 100644
index 0000000..83e8d6f
--- /dev/null
+++ b/ccan/likely.c
@@ -0,0 +1,136 @@
+/* CC0 (Public domain) - see LICENSE file for details. */
+#ifdef CCAN_LIKELY_DEBUG
+#include <ccan/likely/likely.h>
+#include <ccan/hash/hash.h>
+#include <ccan/htable/htable_type.h>
+#include <stdlib.h>
+#include <stdio.h>
+struct trace {
+ const char *condstr;
+ const char *file;
+ unsigned int line;
+ bool expect;
+ unsigned long count, right;
+};
+
+static size_t hash_trace(const struct trace *trace)
+{
+ return hash(trace->condstr, strlen(trace->condstr),
+ hash(trace->file, strlen(trace->file),
+ trace->line + trace->expect));
+}
+
+static bool trace_eq(const struct trace *t1, const struct trace *t2)
+{
+ return t1->condstr == t2->condstr
+ && t1->file == t2->file
+ && t1->line == t2->line
+ && t1->expect == t2->expect;
+}
+
+/* struct thash */
+HTABLE_DEFINE_TYPE(struct trace, (const struct trace *), hash_trace, trace_eq,
+ thash);
+
+static struct thash htable
+= { HTABLE_INITIALIZER(htable.raw, thash_hash, NULL) };
+
+static void init_trace(struct trace *trace,
+ const char *condstr, const char *file, unsigned int line,
+ bool expect)
+{
+ trace->condstr = condstr;
+ trace->file = file;
+ trace->line = line;
+ trace->expect = expect;
+ trace->count = trace->right = 0;
+}
+
+static struct trace *add_trace(const struct trace *t)
+{
+ struct trace *trace = malloc(sizeof(*trace));
+ *trace = *t;
+ thash_add(&htable, trace);
+ return trace;
+}
+
+long _likely_trace(bool cond, bool expect,
+ const char *condstr,
+ const char *file, unsigned int line)
+{
+ struct trace *p, trace;
+
+ init_trace(&trace, condstr, file, line, expect);
+ p = thash_get(&htable, &trace);
+ if (!p)
+ p = add_trace(&trace);
+
+ p->count++;
+ if (cond == expect)
+ p->right++;
+
+ return cond;
+}
+
+static double right_ratio(const struct trace *t)
+{
+ return (double)t->right / t->count;
+}
+
+char *likely_stats(unsigned int min_hits, unsigned int percent)
+{
+ struct trace *worst;
+ double worst_ratio;
+ struct thash_iter i;
+ char *ret;
+ struct trace *t;
+
+ worst = NULL;
+ worst_ratio = 2;
+
+ /* This is O(n), but it's not likely called that often. */
+ for (t = thash_first(&htable, &i); t; t = thash_next(&htable, &i)) {
+ if (t->count >= min_hits) {
+ if (right_ratio(t) < worst_ratio) {
+ worst = t;
+ worst_ratio = right_ratio(t);
+ }
+ }
+ }
+
+ if (worst_ratio * 100 > percent)
+ return NULL;
+
+ ret = malloc(strlen(worst->condstr) +
+ strlen(worst->file) +
+ sizeof(long int) * 8 +
+ sizeof("%s:%u:%slikely(%s) correct %u%% (%lu/%lu)"));
+ sprintf(ret, "%s:%u:%slikely(%s) correct %u%% (%lu/%lu)",
+ worst->file, worst->line,
+ worst->expect ? "" : "un", worst->condstr,
+ (unsigned)(worst_ratio * 100),
+ worst->right, worst->count);
+
+ thash_del(&htable, worst);
+ free(worst);
+
+ return ret;
+}
+
+void likely_stats_reset(void)
+{
+ struct thash_iter i;
+ struct trace *t;
+
+ /* This is a bit better than O(n^2), but we have to loop since
+ * first/next during delete is unreliable. */
+ while ((t = thash_first(&htable, &i)) != NULL) {
+ for (; t; t = thash_next(&htable, &i)) {
+ thash_del(&htable, t);
+ free(t);
+ }
+ }
+
+ thash_clear(&htable);
+}
+#endif /*CCAN_LIKELY_DEBUG*/
diff --git a/ccan/likely.h b/ccan/likely.h
new file mode 100644
index 0000000..a8f003d
--- /dev/null
+++ b/ccan/likely.h
@@ -0,0 +1,111 @@
+/* CC0 (Public domain) - see LICENSE file for details */
+#ifndef CCAN_LIKELY_H
+#define CCAN_LIKELY_H
+#include "config.h"
+#include <stdbool.h>
+
+#ifndef CCAN_LIKELY_DEBUG
+#if HAVE_BUILTIN_EXPECT
+/**
+ * likely - indicate that a condition is likely to be true.
+ * @cond: the condition
+ *
+ * This uses a compiler extension where available to indicate a likely
+ * code path and optimize appropriately; it's also useful for readers
+ * to quickly identify exceptional paths through functions. The
+ * threshold for "likely" is usually considered to be between 90 and
+ * 99%; marginal cases should not be marked either way.
+ *
+ * See Also:
+ * unlikely(), likely_stats()
+ *
+ * Example:
+ * // Returns false if we overflow.
+ * static inline bool inc_int(unsigned int *val)
+ * {
+ * (*val)++;
+ * if (likely(*val))
+ * return true;
+ * return false;
+ * }
+ */
+#define likely(cond) __builtin_expect(!!(cond), 1)
+
+/**
+ * unlikely - indicate that a condition is unlikely to be true.
+ * @cond: the condition
+ *
+ * This uses a compiler extension where available to indicate an unlikely
+ * code path and optimize appropriately; see likely() above.
+ *
+ * See Also:
+ * likely(), likely_stats(), COLD (compiler.h)
+ *
+ * Example:
+ * // Prints a warning if we overflow.
+ * static inline void inc_int(unsigned int *val)
+ * {
+ * (*val)++;
+ * if (unlikely(*val == 0))
+ * fprintf(stderr, "Overflow!");
+ * }
+ */
+#define unlikely(cond) __builtin_expect(!!(cond), 0)
+#else
+#define likely(cond) (!!(cond))
+#define unlikely(cond) (!!(cond))
+#endif
+#else /* CCAN_LIKELY_DEBUG versions */
+#include <ccan/str/str.h>
+
+#define likely(cond) \
+ (_likely_trace(!!(cond), 1, stringify(cond), __FILE__, __LINE__))
+#define unlikely(cond) \
+ (_likely_trace(!!(cond), 0, stringify(cond), __FILE__, __LINE__))
+
+long _likely_trace(bool cond, bool expect,
+ const char *condstr,
+ const char *file, unsigned int line);
+/**
+ * likely_stats - return description of abused likely()/unlikely()
+ * @min_hits: minimum number of hits
+ * @percent: maximum percentage correct
+ *
+ * When CCAN_LIKELY_DEBUG is defined, likely() and unlikely() trace their
+ * results: this causes a significant slowdown, but allows analysis of
+ * whether the branches are labelled correctly.
+ *
+ * This function returns a malloc'ed description of the least-correct
+ * usage of likely() or unlikely(). It ignores places which have been
+ * called less than @min_hits times, and those which were predicted
+ * correctly more than @percent of the time. It returns NULL when
+ * nothing meets those criteria.
+ *
+ * Note that this call is destructive; the returned offender is
+ * removed from the trace so that the next call to likely_stats() will
+ * return the next-worst likely()/unlikely() usage.
+ *
+ * Example:
+ * // Print every place hit more than twice which was wrong > 5%.
+ * static void report_stats(void)
+ * {
+ * #ifdef CCAN_LIKELY_DEBUG
+ * const char *bad;
+ *
+ * while ((bad = likely_stats(2, 95)) != NULL) {
+ * printf("Suspicious likely: %s", bad);
+ * free(bad);
+ * }
+ * #endif
+ * }
+ */
+char *likely_stats(unsigned int min_hits, unsigned int percent);
+
+/**
+ * likely_stats_reset - free up memory of likely()/unlikely() branches.
+ *
+ * This can also plug memory leaks.
+ */
+void likely_stats_reset(void);
+#endif /* CCAN_LIKELY_DEBUG */
+#endif /* CCAN_LIKELY_H */
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 4a6ed8e..1659958 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -39,6 +39,7 @@
#include <infiniband/arch.h>
#include <infiniband/verbs.h>
#include <ccan/container_of.h>
+#include <ccan/likely.h>
#define HNS_ROCE_CQE_ENTRY_SIZE 0x20
@@ -51,10 +52,6 @@
#define PFX "hns: "
-#ifndef likely
-#define likely(x) __builtin_expect(!!(x), 1)
-#endif
-
#define roce_get_field(origin, mask, shift) \
(((origin) & (mask)) >> (shift))
diff --git a/providers/i40iw/i40iw_umain.h b/providers/i40iw/i40iw_umain.h
index 719aefc..6a23504 100644
--- a/providers/i40iw/i40iw_umain.h
+++ b/providers/i40iw/i40iw_umain.h
@@ -47,13 +47,6 @@
#include "i40iw_status.h"
#include "i40iw_user.h"
-#ifndef likely
-#define likely(x) __builtin_expect((x), 1)
-#endif
-#ifndef unlikely
-#define unlikely(x) __builtin_expect((x), 0)
-#endif
-
#define PFX "libi40iw-"
#define I40IW_BASE_PUSH_PAGE 1
diff --git a/providers/mlx4/mlx4.h b/providers/mlx4/mlx4.h
index b851e95..6467d5a 100644
--- a/providers/mlx4/mlx4.h
+++ b/providers/mlx4/mlx4.h
@@ -51,14 +51,6 @@ enum {
MLX4_STAT_RATE_OFFSET = 5
};
-#ifndef likely
-#ifdef __GNUC__
-#define likely(x) __builtin_expect(!!(x),1)
-#else
-#define likely(x) (x)
-#endif
-#endif
-
enum {
MLX4_QP_TABLE_BITS = 8,
MLX4_QP_TABLE_SIZE = 1 << MLX4_QP_TABLE_BITS,
diff --git a/providers/mlx4/qp.c b/providers/mlx4/qp.c
index 268fb7d..af08874 100644
--- a/providers/mlx4/qp.c
+++ b/providers/mlx4/qp.c
@@ -40,6 +40,7 @@
#include <string.h>
#include <errno.h>
#include <util/compiler.h>
+#include <ccan/likely.h>
#include "mlx4.h"
#include "doorbell.h"
diff --git a/providers/mlx5/mlx5.h b/providers/mlx5/mlx5.h
index cb65429..d5704d4 100644
--- a/providers/mlx5/mlx5.h
+++ b/providers/mlx5/mlx5.h
@@ -42,11 +42,7 @@
#include <ccan/list.h>
#include "bitmap.h"
#include <ccan/minmax.h>
-
-#ifdef __GNUC__
-#define likely(x) __builtin_expect((x), 1)
-#define unlikely(x) __builtin_expect((x), 0)
-#endif
+#include <ccan/likely.h>
#include <valgrind/memcheck.h>
diff --git a/providers/nes/nes_umain.h b/providers/nes/nes_umain.h
index 093956a..5f357a2 100644
--- a/providers/nes/nes_umain.h
+++ b/providers/nes/nes_umain.h
@@ -40,13 +40,7 @@
#include <infiniband/driver.h>
#include <infiniband/arch.h>
-
-#ifndef likely
-#define likely(x) __builtin_expect((x),1)
-#endif
-#ifndef unlikely
-#define unlikely(x) __builtin_expect((x),0)
-#endif
+#include <ccan/likely.h>
#define PFX "libnes: "
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* RE: Configuration of cq->cqe is lower than entries by 1
From: Amrani, Ram @ 2016-11-16 10:05 UTC (permalink / raw)
To: Majd Dibbiny
Cc: Leon Romanovsky,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <84A08B2A-85D5-467C-AE80-63134CA07767-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Correct. We create it with 64(the result of the roundup) and let him work with
> 63 only.
ACK
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* RE: Configuration of cq->cqe is lower than entries by 1
From: Amrani, Ram @ 2016-11-16 9:45 UTC (permalink / raw)
To: Majd Dibbiny
Cc: Leon Romanovsky,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <2DF5C492-C364-4353-8FA9-51FA5EE760F0-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> 192 entries = roundup_pow_of_two(entries + 1);
>> 193 cq->ibcq.cqe = entries - 1;
>>
>> I thought something else might hide there.
>> Hi Ram,
>>
> For CQs, we always reserve an extra CQE, and thus report one less CQE to the
> user.
> This CQE is used for resize CQ operations.
> When the CQ is resized, the HW posts a CQE to indicate that the operation was
> completed on the original CQ buffer.
> Hope now it's clear.
Thanks. I had a guess this was related.
Note that there might be different behavior than what you are describing. If the user requested for (2^n) -1 then it'll receive the exact number of entries it requested, without the (at least one) extra entries.
For example, if the user requests 63 entries (n=6) it'll receive the same number of entries:
192 entries = roundup_pow_of_two(63+ 1); // this gives 64
193 cq->ibcq.cqe = entries - 1; // this gives 63 again.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH rdma-core] rxe: Remove perl::switch module dependancy
From: Yonatan Cohen @ 2016-11-16 8:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA, leon-DgEjT+Ai2ygdnm+yROfE0A
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yonatan Cohen
Remove perl::switch dependency from RXE, since it is
not installed by default.
Signed-off-by: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
providers/rxe/rxe_cfg | 35 +++++++++++++----------------------
1 file changed, 13 insertions(+), 22 deletions(-)
diff --git a/providers/rxe/rxe_cfg b/providers/rxe/rxe_cfg
index 6c414fb..c2dbd0e 100755
--- a/providers/rxe/rxe_cfg
+++ b/providers/rxe/rxe_cfg
@@ -37,7 +37,6 @@ use strict;
use File::Basename;
use Getopt::Long;
-use Switch;
my $help = 0;
my $no_persist = 0;
@@ -559,26 +558,21 @@ sub do_debug {
my $debugfile = "$parms/debug";
chomp($arg2);
- #print "debug $arg2\n";
- #system("echo 'debug $arg2' > $proc");
-
if (!(-e "$debugfile")) {
print "Error: debug is compiled out of this rxe driver\n";
return;
}
- switch ($arg2) {
- case "on" { system("echo '31' > $debugfile"); }
- case "off" { system("echo '0' > $debugfile"); }
- case "0" { system("echo '0' > $debugfile"); }
- case "" { }
+ if ($arg2 eq "on") { system("echo '31' > $debugfile"); }
+ elsif ($arg2 eq "off") { system("echo '0' > $debugfile"); }
+ elsif ($arg2 eq "0") { system("echo '0' > $debugfile"); }
+ elsif ($arg2 eq "") { }
elsif ($arg2 ge "0" && $arg2 le "31") {
system("echo '$arg2' > $debugfile");
}
else {
print "unrecognized debug cmd ($arg2)\n";
}
- }
my $current = read_file($debugfile);
chomp($current);
@@ -645,11 +639,10 @@ sub main {
}
# stuff that does not require modules to be loaded
- switch($arg1) {
- case "help" { usage(); exit; }
- case "start" { do_start(); do_status(); exit; }
- case "persistent" { system("cat $persistence_file"); exit; }
- }
+ if ($arg1 eq "help") { usage(); exit; }
+ elsif ($arg1 eq "start") { do_start(); do_status(); exit; }
+ elsif ($arg1 eq "persistent") { system("cat $persistence_file"); exit; }
+
# can't do much else, bail if modules aren't loaded
if (check_module_status()) {
@@ -668,13 +661,11 @@ sub main {
get_dev_info();
# Stuff that requires the rdma_rxe module to be loaded
- switch($arg1) {
- case "stop" { do_stop(); exit; }
- case "debug" { do_debug($arg2); exit; }
- case "add" { rxe_add($arg2); exit; }
- case "remove" { rxe_remove($arg2); exit; }
- case "help" { usage(); exit; }
- }
+ if ($arg1 eq "stop") { do_stop(); exit; }
+ elsif ($arg1 eq "debug") { do_debug($arg2); exit; }
+ elsif ($arg1 eq "add") { rxe_add($arg2); exit; }
+ elsif ($arg1 eq "remove") { rxe_remove($arg2); exit; }
+ elsif ($arg1 eq "help") { usage(); exit; }
}
main();
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc 5/5] IB/rxe: Update qp state for user query
From: Leon Romanovsky @ 2016-11-16 8:39 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yonatan Cohen
In-Reply-To: <1479285558-19627-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
The method rxe_qp_error() transitions QP to error state
and make sure the QP is drained. It did not though update
the QP state for user's query.
This patch fixes this.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/sw/rxe/rxe_qp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index 95aaaa2..c3e60e4 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -574,6 +574,7 @@ void rxe_qp_error(struct rxe_qp *qp)
{
qp->req.state = QP_STATE_ERROR;
qp->resp.state = QP_STATE_ERROR;
+ qp->attr.qp_state = IB_QPS_ERR;
/* drain work and packet queues */
rxe_run_task(&qp->resp.task, 1);
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc 4/5] IB/rxe: Clear queue buffer when modifying QP to reset
From: Leon Romanovsky @ 2016-11-16 8:39 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yonatan Cohen
In-Reply-To: <1479285558-19627-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
RXE resets the send-q only once in rxe_qp_init_req() when
QP is created, but when the QP is reused after QP reset, the send-q
holds previous garbage data.
This garbage data wrongly fails CQEs that otherwise
should have completed successfully.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/sw/rxe/rxe_qp.c | 1 +
drivers/infiniband/sw/rxe/rxe_queue.c | 9 +++++++++
drivers/infiniband/sw/rxe/rxe_queue.h | 2 ++
3 files changed, 12 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
index b8036cf..95aaaa2 100644
--- a/drivers/infiniband/sw/rxe/rxe_qp.c
+++ b/drivers/infiniband/sw/rxe/rxe_qp.c
@@ -522,6 +522,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
if (qp->sq.queue) {
__rxe_do_task(&qp->comp.task);
__rxe_do_task(&qp->req.task);
+ rxe_queue_reset(qp->sq.queue);
}
/* cleanup attributes */
diff --git a/drivers/infiniband/sw/rxe/rxe_queue.c b/drivers/infiniband/sw/rxe/rxe_queue.c
index 0827425..d14bf49 100644
--- a/drivers/infiniband/sw/rxe/rxe_queue.c
+++ b/drivers/infiniband/sw/rxe/rxe_queue.c
@@ -84,6 +84,15 @@ int do_mmap_info(struct rxe_dev *rxe,
return -EINVAL;
}
+inline void rxe_queue_reset(struct rxe_queue *q)
+{
+ /* queue is comprised from header and the memory
+ * of the actual queue. See "struct rxe_queue_buf" in rxe_queue.h
+ * reset only the queue itself and not the management header
+ */
+ memset(q->buf->data, 0, q->buf_size - sizeof(struct rxe_queue_buf));
+}
+
struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe,
int *num_elem,
unsigned int elem_size)
diff --git a/drivers/infiniband/sw/rxe/rxe_queue.h b/drivers/infiniband/sw/rxe/rxe_queue.h
index 239fd60..8c8641c 100644
--- a/drivers/infiniband/sw/rxe/rxe_queue.h
+++ b/drivers/infiniband/sw/rxe/rxe_queue.h
@@ -84,6 +84,8 @@ int do_mmap_info(struct rxe_dev *rxe,
size_t buf_size,
struct rxe_mmap_info **ip_p);
+void rxe_queue_reset(struct rxe_queue *q);
+
struct rxe_queue *rxe_queue_init(struct rxe_dev *rxe,
int *num_elem,
unsigned int elem_size);
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc 3/5] IB/rxe: Increase max number of completions to 32k
From: Leon Romanovsky @ 2016-11-16 8:39 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yonatan Cohen
In-Reply-To: <1479285558-19627-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Increase limit of max CQE from 8K to 32K to allow demanding
applications to work over SoftRoCE with same configuration
as most RoCEv2 HW vendors have.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_param.h b/drivers/infiniband/sw/rxe/rxe_param.h
index f459c43..13ed2cc 100644
--- a/drivers/infiniband/sw/rxe/rxe_param.h
+++ b/drivers/infiniband/sw/rxe/rxe_param.h
@@ -82,7 +82,7 @@ enum rxe_device_param {
RXE_MAX_SGE = 32,
RXE_MAX_SGE_RD = 32,
RXE_MAX_CQ = 16384,
- RXE_MAX_LOG_CQE = 13,
+ RXE_MAX_LOG_CQE = 15,
RXE_MAX_MR = 2 * 1024,
RXE_MAX_PD = 0x7ffc,
RXE_MAX_QP_RD_ATOM = 128,
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc 2/5] IB/rxe: Fix handling of erroneous WR
From: Leon Romanovsky @ 2016-11-16 8:39 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yonatan Cohen
In-Reply-To: <1479285558-19627-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To correctly handle a erroneous WR this fix does the following
1. Make sure the bad WQE causes a user completion event.
2. Call rxe_completer to handle the erred WQE.
Before the fix, when rxe_requester found a bad WQE, it changed its
status to IB_WC_LOC_PROT_ERR and exit with 0 for non RC QPs.
If this was the 1st WQE then there would be no ACK to invoke the
completer and this bad WQE would be stuck in the QP's send-q.
On top of that the requester exiting with 0 caused rxe_do_task to
endlessly invoke rxe_requester, resulting in a soft-lockup attached
below.
In case the WQE was not the 1st and rxe_completer did get a chance to
handle the bad WQE, it did not cause a complete event since the WQE's
IB_SEND_SIGNALED flag was not set.
Setting WQE status to IB_SEND_SIGNALED is subject to IBA spec
version 1.2.1, section 10.7.3.1 Signaled Completions.
NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
[<ffffffffa0590145>] ? rxe_pool_get_index+0x35/0xb0 [rdma_rxe]
[<ffffffffa05952ec>] lookup_mem+0x3c/0xc0 [rdma_rxe]
[<ffffffffa0595534>] copy_data+0x1c4/0x230 [rdma_rxe]
[<ffffffffa058c180>] rxe_requester+0x9d0/0x1100 [rdma_rxe]
[<ffffffff8158e98a>] ? kfree_skbmem+0x5a/0x60
[<ffffffffa05962c9>] rxe_do_task+0x89/0xf0 [rdma_rxe]
[<ffffffffa05963e2>] rxe_run_task+0x12/0x30 [rdma_rxe]
[<ffffffffa059110a>] rxe_post_send+0x41a/0x550 [rdma_rxe]
[<ffffffff811ef922>] ? __kmalloc+0x182/0x200
[<ffffffff816ba512>] ? down_read+0x12/0x40
[<ffffffffa054bd32>] ib_uverbs_post_send+0x532/0x540 [ib_uverbs]
[<ffffffff815f8722>] ? tcp_sendmsg+0x402/0xb80
[<ffffffffa05453dc>] ib_uverbs_write+0x18c/0x3f0 [ib_uverbs]
[<ffffffff81623c2e>] ? inet_recvmsg+0x7e/0xb0
[<ffffffff8158764d>] ? sock_recvmsg+0x3d/0x50
[<ffffffff81215b87>] __vfs_write+0x37/0x140
[<ffffffff81216892>] vfs_write+0xb2/0x1b0
[<ffffffff81217ce5>] SyS_write+0x55/0xc0
[<ffffffff816bc672>] entry_SYSCALL_64_fastpath+0x1a/0xa
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/sw/rxe/rxe_req.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 832846b..22bd963 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -696,7 +696,8 @@ int rxe_requester(void *arg)
qp->req.wqe_index);
wqe->state = wqe_state_done;
wqe->status = IB_WC_SUCCESS;
- goto complete;
+ __rxe_do_task(&qp->comp.task);
+ return 0;
}
payload = mtu;
}
@@ -745,13 +746,17 @@ int rxe_requester(void *arg)
wqe->status = IB_WC_LOC_PROT_ERR;
wqe->state = wqe_state_error;
-complete:
- if (qp_type(qp) != IB_QPT_RC) {
- while (rxe_completer(qp) == 0)
- ;
- }
-
- return 0;
+ /*
+ * IBA Spec. Section 10.7.3.1 SIGNALED COMPLETIONS
+ * ---------8<---------8<-------------
+ * ...Note that if a completion error occurs, a Work Completion
+ * will always be generated, even if the signaling
+ * indicator requests an Unsignaled Completion.
+ * ---------8<---------8<-------------
+ */
+ wqe->wr.send_flags |= IB_SEND_SIGNALED;
+ __rxe_do_task(&qp->comp.task);
+ return -EAGAIN;
exit:
return -EAGAIN;
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc 1/5] IB/rxe: Fix kernel panic in UDP tunnel with GRO and RX checksum
From: Leon Romanovsky @ 2016-11-16 8:39 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Yonatan Cohen
In-Reply-To: <1479285558-19627-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Missing initialization of udp_tunnel_sock_cfg causes to following
kernel panic, while kernel tries to execute gro_receive().
While being there, we converted udp_port_cfg to use the same
initialization scheme as udp_tunnel_sock_cfg.
------------[ cut here ]------------
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle kernel paging request at ffffffffa0588c50
IP: [<ffffffffa0588c50>] __this_module+0x50/0xffffffffffff8400 [ib_rxe]
PGD 1c09067 PUD 1c0a063 PMD bb394067 PTE 80000000ad5e8163
Oops: 0011 [#1] SMP
Modules linked in: ib_rxe ip6_udp_tunnel udp_tunnel
CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc3+ #2
Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
task: ffff880235e4e680 ti: ffff880235e68000 task.ti: ffff880235e68000
RIP: 0010:[<ffffffffa0588c50>]
[<ffffffffa0588c50>] __this_module+0x50/0xffffffffffff8400 [ib_rxe]
RSP: 0018:ffff880237343c80 EFLAGS: 00010282
RAX: 00000000dffe482d RBX: ffff8800ae330900 RCX: 000000002001b712
RDX: ffff8800ae330900 RSI: ffff8800ae102578 RDI: ffff880235589c00
RBP: ffff880237343cb0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800ae33e262
R13: ffff880235589c00 R14: 0000000000000014 R15: ffff8800ae102578
FS: 0000000000000000(0000) GS:ffff880237340000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa0588c50 CR3: 0000000001c06000 CR4: 00000000000006e0
Stack:
ffffffff8160860e ffff8800ae330900 ffff8800ae102578 0000000000000014
000000000000004e ffff8800ae102578 ffff880237343ce0 ffffffff816088fb
0000000000000000 ffff8800ae330900 0000000000000000 00000000ffad0000
Call Trace:
<IRQ>
[<ffffffff8160860e>] ? udp_gro_receive+0xde/0x130
[<ffffffff816088fb>] udp4_gro_receive+0x10b/0x2d0
[<ffffffff81611373>] inet_gro_receive+0x1d3/0x270
[<ffffffff81594e29>] dev_gro_receive+0x269/0x3b0
[<ffffffff81595188>] napi_gro_receive+0x38/0x120
[<ffffffffa011caee>] mlx5e_handle_rx_cqe+0x27e/0x340 [mlx5_core]
[<ffffffffa011d076>] mlx5e_poll_rx_cq+0x66/0x6d0 [mlx5_core]
[<ffffffffa011d7ae>] mlx5e_napi_poll+0x8e/0x400 [mlx5_core]
[<ffffffff815949a0>] net_rx_action+0x160/0x380
[<ffffffff816a9197>] __do_softirq+0xd7/0x2c5
[<ffffffff81085c35>] irq_exit+0xf5/0x100
[<ffffffff816a8f16>] do_IRQ+0x56/0xd0
[<ffffffff816a6dcc>] common_interrupt+0x8c/0x8c
<EOI>
[<ffffffff81061f96>] ? native_safe_halt+0x6/0x10
[<ffffffff81037ade>] default_idle+0x1e/0xd0
[<ffffffff8103828f>] arch_cpu_idle+0xf/0x20
[<ffffffff810c37dc>] default_idle_call+0x3c/0x50
[<ffffffff810c3b13>] cpu_startup_entry+0x323/0x3c0
[<ffffffff81050d8c>] start_secondary+0x15c/0x1a0
RIP [<ffffffffa0588c50>] __this_module+0x50/0xffffffffffff8400 [ib_rxe]
RSP <ffff880237343c80>
CR2: ffffffffa0588c50
---[ end trace 489ee31fa7614ac5 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt
------------[ cut here ]------------
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/sw/rxe/rxe_net.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index b8258e4..ffff5a5 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -243,10 +243,8 @@ static struct socket *rxe_setup_udp_tunnel(struct net *net, __be16 port,
{
int err;
struct socket *sock;
- struct udp_port_cfg udp_cfg;
- struct udp_tunnel_sock_cfg tnl_cfg;
-
- memset(&udp_cfg, 0, sizeof(udp_cfg));
+ struct udp_port_cfg udp_cfg = {0};
+ struct udp_tunnel_sock_cfg tnl_cfg = {0};
if (ipv6) {
udp_cfg.family = AF_INET6;
@@ -264,10 +262,8 @@ static struct socket *rxe_setup_udp_tunnel(struct net *net, __be16 port,
return ERR_PTR(err);
}
- tnl_cfg.sk_user_data = NULL;
tnl_cfg.encap_type = 1;
tnl_cfg.encap_rcv = rxe_udp_encap_recv;
- tnl_cfg.encap_destroy = NULL;
/* Setup UDP tunnel */
setup_udp_tunnel_sock(net, sock, &tnl_cfg);
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc 0/5] RXE fixes for 4.9
From: Leon Romanovsky @ 2016-11-16 8:39 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Doug,
Please find below the RXE fixes for 4.9 from Yonatan and Moni.
This patchset was generated against v4.9-rc3.
Available in the "topic/rxe-fixes-4.9" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git
Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/rxe-fixes-4.9
Thanks
Yonatan Cohen (5):
IB/rxe: Fix kernel panic in UDP tunnel with GRO and RX checksum
IB/rxe: Fix handling of erroneous WR
IB/rxe: Increase max number of completions to 32k
IB/rxe: Clear queue buffer when modifying QP to reset
IB/rxe: Update qp state for user query
drivers/infiniband/sw/rxe/rxe_net.c | 8 ++------
drivers/infiniband/sw/rxe/rxe_param.h | 2 +-
drivers/infiniband/sw/rxe/rxe_qp.c | 2 ++
drivers/infiniband/sw/rxe/rxe_queue.c | 9 +++++++++
drivers/infiniband/sw/rxe/rxe_queue.h | 2 ++
drivers/infiniband/sw/rxe/rxe_req.c | 21 +++++++++++++--------
6 files changed, 29 insertions(+), 15 deletions(-)
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Leon Romanovsky @ 2016-11-16 8:36 UTC (permalink / raw)
To: Salil Mehta
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, Huwei (Xavier),
oulijun, mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Linuxarm,
Zhangping (ZP)
In-Reply-To: <F4CC6FACFEB3C54C9141D49AD221F7F91A7A2371@lhreml503-mbx>
[-- Attachment #1: Type: text/plain, Size: 3338 bytes --]
On Tue, Nov 15, 2016 at 03:52:46PM +0000, Salil Mehta wrote:
> > -----Original Message-----
> > From: Leon Romanovsky [mailto:leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org]
> > Sent: Wednesday, November 09, 2016 7:22 AM
> > To: Salil Mehta
> > Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; Huwei (Xavier); oulijun;
> > mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> > netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linuxarm;
> > Zhangping (ZP)
> > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > allocating memory using APIs
> >
> > On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> > > From: "Wei Hu (Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > >
> > > This patch modified the logic of allocating memory using APIs in
> > > hns RoCE driver. We used kcalloc instead of kmalloc_array and
> > > bitmap_zero. And When kcalloc failed, call vzalloc to alloc
> > > memory.
> > >
> > > Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > > Signed-off-by: Ping Zhang <zhangping5-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > > Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
> > > ---
> > > drivers/infiniband/hw/hns/hns_roce_mr.c | 15 ++++++++-------
> > > 1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > index fb87883..d3dfb5f 100644
> > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct
> > hns_roce_buddy *buddy, int max_order)
> > >
> > > for (i = 0; i <= buddy->max_order; ++i) {
> > > s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > > - buddy->bits[i] = kmalloc_array(s, sizeof(long),
> > GFP_KERNEL);
> > > - if (!buddy->bits[i])
> > > - goto err_out_free;
> > > -
> > > - bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
> > > + buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
> > > + if (!buddy->bits[i]) {
> > > + buddy->bits[i] = vzalloc(s * sizeof(long));
> >
> > I wonder, why don't you use directly vzalloc instead of kcalloc
> > fallback?
> As we know we will have physical contiguous pages if the kcalloc
> call succeeds. This will give us a chance to have better performance
> over the allocations which are just virtually contiguous through the
> function vzalloc(). Therefore, later has only been used as a fallback
> when our memory request cannot be entertained through kcalloc.
>
> Are you suggesting that there will not be much performance penalty
> if we use just vzalloc ?
Not exactly,
I asked it, because we have similar code in our drivers and this
construction looks strange to me.
1. If performance is critical, we will use kmalloc.
2. If performance is not critical, we will use vmalloc.
But in this case, such construction shows me that we can live with
vmalloc performance and kmalloc allocation are not really needed.
In your specific case, I'm not sure that kcalloc will ever fail.
Thanks
>
> >
> > > + if (!buddy->bits[i])
> > > + goto err_out_free;
> > > + }
> > > }
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* libhns: userspace library for hns
From: Leon Romanovsky @ 2016-11-16 7:58 UTC (permalink / raw)
To: Lijun Ou
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linuxarm-hv44wF8Li93QT0dZR+AlfA, Jason Gunthorpe
[-- Attachment #1: Type: text/plain, Size: 765 bytes --]
Hi Lijun,
I realized that didn't send this notice, while accepted your libhns into
rdma-core, so I'm sending it now to be sure that it is not missed by any
future reviewers.
---
We accepted your libhns code and didn't continue to pursue you to
correctly parse DT strings [1], mainly for the following reasons:
1. It is your private library and support of angry customers will be on you.
2. We won't accept hacks and you will be required to implement it correctly
and handle all incompatibility in out-of-tree code.
3. Once your new hns2 arrive, it should be implemented in the scope of existing
libhns. We wouldn't allow to create libhns2 without real reason (different HW).
Hope it clears now.
[1] http://marc.info/?l=linux-rdma&m=147879905628034&w=2
Thanks
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* LPC RDMA Workshop 2016 Summary
From: Leon Romanovsky @ 2016-11-16 6:19 UTC (permalink / raw)
To: RDMA mailing list; +Cc: Christoph Lameter, Liran Liss
[-- Attachment #1: Type: text/plain, Size: 5031 bytes --]
Hello RDMAers,
First of all, we would like to say thank you for everyone who made the RDMA
workshop and summit happened. This two day event was held at KS/LPC 2016 venue
and offsite, respectively [1].
It wasn’t happened without you, Thank you.
The below summary for the first day came from many attendees and doesn’t try to
cover everything discussed there. We invite you to share your notes with us and
mailing list.
The following list mostly consists from future plans and TODOs.
· Implementing a new Linux RDMA provider by Knut Omang
o Weak documentation for driver developers.
o Unclear development guides, no best practices.
o Need general RDMA test suite.
· New IOCTL ABI by Matan Barak
o Agreed on acceptance track. All core changes in bisectable way and ioctl
parts under EXPERIMENTAL Kconfig option and after that patch-per-command
transition.
o Extensive review was done and it will be addressed in the actual patches.
o This new code is valid candidate to introduce tracepoints into RDMA
subsystem.
· Consolidation of RDMA User-space code by Jason Gunthorpe
o Ongoing work on code quality/modernization, there is need to continue with
sparse annotations, coverity, MMIO accessor.
o There is need for proper systemd integration to autoload modules and proper
vendor upgrade sequence.
o 7 licenses is too much for 76K LOC code, there is clear benefit in
decreasing number of copyrights.
o Ensure that distribution packages are built from rdma-core. RedHat is
already doing it, there is need to continue to work with other distributions.
o Reuse kernel uAPI headers as a source base for libibverbs and providers.
· Debuggability and Tracing by Leon Romanovsky
o Extend standard tools as much as possible, like ss and ethtool.
o Use tracepoints for MAD snooping, new ABI work and memory registrations
flows.
o Reduce redundant printks.
· Integration with other subsystems by Liran Liss
o Challenges to integrate complex RDMA schemes (GPU-to-HCA-to-GPU) where CPU
is not involved. Multiple solutions were sent to mailing lists, but not
accepted yet.
o Best candidate seems to be DMA-BUFs, which don’t map peer BARs to the CPU.
o Associating SELinux security contexts with partition keys and subnet IDs is
an effective policy for isolating users in an IB fabric.
· Containers and RDMA by Liran Liss
o Supporting RoCE UD QPs is subtle because these QPs may only send and
receive from a subset of the GIDs – those that correspond to netdevs within the
process's namespace. Providers must enforce this implicitly, or fail UD QP
creation.
o Consider delivering RoCE CM packets through the network stack for better
enforcement.
o For IB, a process must be constrained to the partitions associated with the
netdev within the process's network namespace.
o RDMA cgroups will expose two types of objects only (instead of exposing
verbs objects): HCA objects and HCA handles, which will be presented in
absolute values and can be limited by the drivers. It will allow native support
of future vendor specific objects.
· New Fabrics by Ira Weiny
o It was open discussion about the scope of RDMA subsystem and the old
requirement to support verbs for every driver in driver/infiniband/, which
doesn’t hold for usnic and PSM.
· Future and Roadmap for RDMA for Doug Ledford
o The interoperability between IB vendors is less important now, since
Mellanox is the only one active vendor and Oracle will be in the future.
o RoCE is the main place for interoperability.
o Revealed UNH testing lab initiative to test and perform debug hackatons for
vendors, in similar way to NFS-RDMA.
o OFED software stack is rising, while upstream continues to shine :)
· User Mode Ethernet Verbs by Tzahi Oved
o Verbs objects suite well OS bypass Eth programming with generic object model
to expend to new I/O offloads. These APIs can support both Eth and IPoIB
offloads.
o Developers can use common API for OS bypass and share objects like memory
registration, CQs, SRQs for both RDMA and user mode Eth.
o New introduced RSS objects present generic approach of breaking QP into
transport and work queues allowing flexible future expansion.
o Object oriented approach taken with CQ can be taken with posting work queue.
o Future vendor specific capabilities can be implemented using vendor
specific new kernel ABI support and user mode vendor API query through
libibverbs.
· Dual Licensing by Susan Coulter
o OFA can’t and won’t enforce dual-license on kernel space.
o User space will continue to be under dual-license.
It was great event and we are looking forward to see you next year.
Christoph and Leon.
[1] http://marc.info/?l=linux-rdma&m=147767504329945&w=2
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* RE: NFSD generic R/W API (sendto path) performance results
From: Steve Wise @ 2016-11-15 20:35 UTC (permalink / raw)
To: 'Chuck Lever', 'Christoph Hellwig',
'Sagi Grimberg'
Cc: 'List Linux RDMA Mailing'
In-Reply-To: <9170C872-DEE1-4D96-B9D8-E9D2B3F91915-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>
> I've built a prototype conversion of the in-kernel NFS server's sendto
> path to use the new generic R/W API. This path handles NFS Replies, so
> it is responsible for building and sending RDMA Writes carrying NFS
> READ payloads, and for transmitting all NFS Replies.
>
> I've published the prototype (against my for-4.10 server series) here:
>
>
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfsd-rdma-rw
-api
>
> It's the very last patch in the series.
>
>
> "iozone -i0 -i1 -s2g -r1m -I" with NFSv3, sec=sys, CX-3 on both sides,
> FDR fabric, share is a tmpfs. This test writes and reads a 2GB file with
> 1MB direct writes and reads.
>
> The client forms NFS requests with a single 1MB RDMA segment to catch
> the NFS READ payload. Before the conversion, the server posts a series
> of single Write WRs with 30 pages each, for each RDMA segment written
> to the client. After the conversion, the server posts a single chain
> of 30-page Write WRs for each RDMA segment written to the client.
>
> Before the API conversion: rdma_stat_post_send = 45097
>
> After the API conversion: rdma_stat_post_send = 16411
>
> That's what I expected to see. This shows the number of ib_post_send
> calls is significantly lower after the conversion.
>
>
> Unfortunately the throughput and latency numbers are worse (ignore
> the write/rewrite numbers for now). Output is in kBytes/sec.
>
> Before conversion, one iozone run:
>
> kB reclen write rewrite read reread
> 2097152 1024 772835 931267 1895922 1927848
>
> READ:
> 4098 ops (49%)
> avg bytes sent per op: 140 avg bytes received per op: 1048704
> backlog wait: 0.006345 RTT: 0.321132 total execute time: 0.332113
>
> After conversion:
>
> kB reclen write rewrite read reread
> 2097152 1024 703850 913824 1561682 1441448
>
> READ:
> 4098 ops (49%)
> avg bytes sent per op: 140 avg bytes received per op: 1048704
> backlog wait: 0.010737 RTT: 0.469497 total execute time: 0.488043
>
> That's 140us worse RTT per READ, in this run. The gap between before and
> after was roughly the same for all runs.
>
>
> To partially explain this, I captured traffic on the server using ibdump
> during a similar iozone test. This removes fabric and client HCA latencies
> from the picture.
>
> This is a QD=1 test, so it's easy to analyze individual NFS READ operations
> in each capture. I computed three latency numbers per READ transaction
> based on the timestamps in the capture file, which should be accurate to
> 1 microsecond:
>
> 1. Call took: the time between when the server i/f sees the incoming RDMA
> Send carrying the NFS READ Call, and when the server i/f sees the outgoing
> RDMA Send carrying the NFS READ Reply.
>
> 2. Call-to-first-Write: the time between when the server i/f sees the
> incoming RDMA Send carrying the NFS READ Call, and when the server i/f
> sees the first outgoing RDMA Write request. Roughly how long it takes
> the server to prepare and post the RDMA Writes.
>
> 3. First-to-last-Write: the time between when the server i/f sees the
> first outgoing RDMA Write request, and when the server i/f sees the
> last outgoing RDMA Write request. Roughly how long it takes the HCA
> to transmit the RDMA Writes.
>
>
> Averages over 5 NFS READ calls chosen at random, before conversion:
> Call took 414us. Call-to-first-Write 85us. First-to-last-Write 327us
>
> Averages over 5 NFS READ calls chosen at random, after conversion:
> Call took 521us. Call-to-first-Write 160us. First-to-last-Write 360us
>
> The gap between before and after results was 100% consistent with
> the average results across the individual NFS READ operations.
>
>
Good work here!
> There are two stories here:
>
> 1. Call-to-first-Write takes longer. My first guess is that the server
> takes longer to build and DMA map a long Write WR chain than it does
> to build, map, and post a single Write WR. The HCA can get started
> transmitting Writes sooner, and the server continues working on
> posting Write WRs in parallel with the on-the-wire activity.
>
So perhaps the RDMA R/W API can have a threshold where it will dump a list of
WRs once it exceeds the threshold, and continue chunking? That threshold, by
the way, is probably device-specific.
> 2. First-to-last-Write takes longer. I don't have any explanation
> for the HCA taking 10% longer to transmit the full 1MB payload.
>
Perhaps the single WR posts are hitting device's fast-path and lowering latency
vs a long chain post that must be DMAed by the device? I'm not sure exactly how
the MLX devices work, but they do have a fast path that utilizes the CPU's
write-combining logic to send a WR over the bus as a single PCIE transaction.
But your WRs are probably large since they have 30 pages in the SGE. I'm not
sure what the threshold is for this fastpath logic for mlx. For cxgb, its 64B,
so the WR would have to fit in 64B to take advantage.
Steve.
>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: suspicious rcu_dereference_protected() usage in mlx4 when tearing down an interface
From: Or Gerlitz @ 2016-11-15 20:19 UTC (permalink / raw)
To: Sagi Grimberg, Tariq Toukan
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <52285420-40b9-afdb-cb29-7e0c939ece26-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
On Tue, Nov 15, 2016 at 2:30 PM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> I just hit this with 4.9-rc4 on my test vm.
> Didn't find any patches on the list so maybe this is new?
Yep, seems to be new, I stepped on it today as well and sent note to
Tariq (cc-ed) who is the driver maintainer,
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* 4.9 cxgb4 patches
From: Steve Wise @ 2016-11-15 20:16 UTC (permalink / raw)
To: 'Doug Ledford'; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hey Doug,
Please confirm that you plan to merge these for 4.9-rc:
https://patchwork.kernel.org/patch/9383045/
https://patchwork.kernel.org/patch/9411375/
Thanks!
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* NFSD generic R/W API (sendto path) performance results
From: Chuck Lever @ 2016-11-15 18:45 UTC (permalink / raw)
To: Christoph Hellwig, Sagi Grimberg; +Cc: List Linux RDMA Mailing
I've built a prototype conversion of the in-kernel NFS server's sendto
path to use the new generic R/W API. This path handles NFS Replies, so
it is responsible for building and sending RDMA Writes carrying NFS
READ payloads, and for transmitting all NFS Replies.
I've published the prototype (against my for-4.10 server series) here:
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfsd-rdma-rw-api
It's the very last patch in the series.
"iozone -i0 -i1 -s2g -r1m -I" with NFSv3, sec=sys, CX-3 on both sides,
FDR fabric, share is a tmpfs. This test writes and reads a 2GB file with
1MB direct writes and reads.
The client forms NFS requests with a single 1MB RDMA segment to catch
the NFS READ payload. Before the conversion, the server posts a series
of single Write WRs with 30 pages each, for each RDMA segment written
to the client. After the conversion, the server posts a single chain
of 30-page Write WRs for each RDMA segment written to the client.
Before the API conversion: rdma_stat_post_send = 45097
After the API conversion: rdma_stat_post_send = 16411
That's what I expected to see. This shows the number of ib_post_send
calls is significantly lower after the conversion.
Unfortunately the throughput and latency numbers are worse (ignore
the write/rewrite numbers for now). Output is in kBytes/sec.
Before conversion, one iozone run:
kB reclen write rewrite read reread
2097152 1024 772835 931267 1895922 1927848
READ:
4098 ops (49%)
avg bytes sent per op: 140 avg bytes received per op: 1048704
backlog wait: 0.006345 RTT: 0.321132 total execute time: 0.332113
After conversion:
kB reclen write rewrite read reread
2097152 1024 703850 913824 1561682 1441448
READ:
4098 ops (49%)
avg bytes sent per op: 140 avg bytes received per op: 1048704
backlog wait: 0.010737 RTT: 0.469497 total execute time: 0.488043
That's 140us worse RTT per READ, in this run. The gap between before and
after was roughly the same for all runs.
To partially explain this, I captured traffic on the server using ibdump
during a similar iozone test. This removes fabric and client HCA latencies
from the picture.
This is a QD=1 test, so it's easy to analyze individual NFS READ operations
in each capture. I computed three latency numbers per READ transaction
based on the timestamps in the capture file, which should be accurate to
1 microsecond:
1. Call took: the time between when the server i/f sees the incoming RDMA
Send carrying the NFS READ Call, and when the server i/f sees the outgoing
RDMA Send carrying the NFS READ Reply.
2. Call-to-first-Write: the time between when the server i/f sees the
incoming RDMA Send carrying the NFS READ Call, and when the server i/f
sees the first outgoing RDMA Write request. Roughly how long it takes
the server to prepare and post the RDMA Writes.
3. First-to-last-Write: the time between when the server i/f sees the
first outgoing RDMA Write request, and when the server i/f sees the
last outgoing RDMA Write request. Roughly how long it takes the HCA
to transmit the RDMA Writes.
Averages over 5 NFS READ calls chosen at random, before conversion:
Call took 414us. Call-to-first-Write 85us. First-to-last-Write 327us
Averages over 5 NFS READ calls chosen at random, after conversion:
Call took 521us. Call-to-first-Write 160us. First-to-last-Write 360us
The gap between before and after results was 100% consistent with
the average results across the individual NFS READ operations.
There are two stories here:
1. Call-to-first-Write takes longer. My first guess is that the server
takes longer to build and DMA map a long Write WR chain than it does
to build, map, and post a single Write WR. The HCA can get started
transmitting Writes sooner, and the server continues working on
posting Write WRs in parallel with the on-the-wire activity.
2. First-to-last-Write takes longer. I don't have any explanation
for the HCA taking 10% longer to transmit the full 1MB payload.
--
Chuck Lever
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH V2 for-next 11/11] IB/hns: Fix for Checkpatch.pl comment style errors
From: Salil Mehta @ 2016-11-15 18:10 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk,
lijun_nudt, linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161115181053.399568-1-salil.mehta@huawei.com>
This patch correct the comment style errors caught by
checkpatch.pl script
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_cmd.c | 8 ++--
drivers/infiniband/hw/hns/hns_roce_device.h | 28 ++++++-------
drivers/infiniband/hw/hns/hns_roce_eq.c | 6 +--
drivers/infiniband/hw/hns/hns_roce_hem.c | 6 +--
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 58 +++++++++++++--------------
drivers/infiniband/hw/hns/hns_roce_main.c | 28 ++++++-------
6 files changed, 67 insertions(+), 67 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 2a0b6c0..8c1f7a6 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -216,10 +216,10 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev *hr_dev, u64 in_param,
goto out;
/*
- * It is timeout when wait_for_completion_timeout return 0
- * The return value is the time limit set in advance
- * how many seconds showing
- */
+ * It is timeout when wait_for_completion_timeout return 0
+ * The return value is the time limit set in advance
+ * how many seconds showing
+ */
if (!wait_for_completion_timeout(&context->done,
msecs_to_jiffies(timeout))) {
dev_err(dev, "[cmd]wait_for_completion_timeout timeout\n");
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 9ef1cc3..e48464d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -201,9 +201,9 @@ struct hns_roce_bitmap {
/* Order = 0: bitmap is biggest, order = max bitmap is least (only a bit) */
/* Every bit repesent to a partner free/used status in bitmap */
/*
-* Initial, bits of other bitmap are all 0 except that a bit of max_order is 1
-* Bit = 1 represent to idle and available; bit = 0: not available
-*/
+ * Initial, bits of other bitmap are all 0 except that a bit of max_order is 1
+ * Bit = 1 represent to idle and available; bit = 0: not available
+ */
struct hns_roce_buddy {
/* Members point to every order level bitmap */
unsigned long **bits;
@@ -365,25 +365,25 @@ struct hns_roce_cmdq {
struct mutex hcr_mutex;
struct semaphore poll_sem;
/*
- * Event mode: cmd register mutex protection,
- * ensure to not exceed max_cmds and user use limit region
- */
+ * Event mode: cmd register mutex protection,
+ * ensure to not exceed max_cmds and user use limit region
+ */
struct semaphore event_sem;
int max_cmds;
spinlock_t context_lock;
int free_head;
struct hns_roce_cmd_context *context;
/*
- * Result of get integer part
- * which max_comds compute according a power of 2
- */
+ * Result of get integer part
+ * which max_comds compute according a power of 2
+ */
u16 token_mask;
/*
- * Process whether use event mode, init default non-zero
- * After the event queue of cmd event ready,
- * can switch into event mode
- * close device, switch into poll mode(non event mode)
- */
+ * Process whether use event mode, init default non-zero
+ * After the event queue of cmd event ready,
+ * can switch into event mode
+ * close device, switch into poll mode(non event mode)
+ */
u8 use_events;
u8 toggle;
};
diff --git a/drivers/infiniband/hw/hns/hns_roce_eq.c b/drivers/infiniband/hw/hns/hns_roce_eq.c
index 21e21b0..50f8649 100644
--- a/drivers/infiniband/hw/hns/hns_roce_eq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_eq.c
@@ -371,9 +371,9 @@ static int hns_roce_aeq_ovf_int(struct hns_roce_dev *hr_dev,
int i = 0;
/**
- * AEQ overflow ECC mult bit err CEQ overflow alarm
- * must clear interrupt, mask irq, clear irq, cancel mask operation
- */
+ * AEQ overflow ECC mult bit err CEQ overflow alarm
+ * must clear interrupt, mask irq, clear irq, cancel mask operation
+ */
aeshift_val = roce_read(hr_dev, ROCEE_CAEP_AEQC_AEQE_SHIFT_REG);
if (roce_get_bit(aeshift_val,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.c b/drivers/infiniband/hw/hns/hns_roce_hem.c
index 250d8f2..c5104e0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hem.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hem.c
@@ -80,9 +80,9 @@ struct hns_roce_hem *hns_roce_alloc_hem(struct hns_roce_dev *hr_dev, int npages,
--order;
/*
- * Alloc memory one time. If failed, don't alloc small block
- * memory, directly return fail.
- */
+ * Alloc memory one time. If failed, don't alloc small block
+ * memory, directly return fail.
+ */
mem = &chunk->mem[chunk->npages];
buf = dma_alloc_coherent(&hr_dev->pdev->dev, PAGE_SIZE << order,
&sg_dma_address(mem), gfp_mask);
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 34b7898..125ab90 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -1352,9 +1352,9 @@ static void __hns_roce_v1_cq_clean(struct hns_roce_cq *hr_cq, u32 qpn,
}
/*
- * Now backwards through the CQ, removing CQ entries
- * that match our QP by overwriting them with next entries.
- */
+ * Now backwards through the CQ, removing CQ entries
+ * that match our QP by overwriting them with next entries.
+ */
while ((int) --prod_index - (int) hr_cq->cons_index >= 0) {
cqe = get_cqe(hr_cq, prod_index & hr_cq->ib_cq.cqe);
if ((roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
@@ -1376,9 +1376,9 @@ static void __hns_roce_v1_cq_clean(struct hns_roce_cq *hr_cq, u32 qpn,
if (nfreed) {
hr_cq->cons_index += nfreed;
/*
- * Make sure update of buffer contents is done before
- * updating consumer index.
- */
+ * Make sure update of buffer contents is done before
+ * updating consumer index.
+ */
wmb();
hns_roce_v1_cq_set_ci(hr_cq, hr_cq->cons_index);
@@ -1473,7 +1473,7 @@ void hns_roce_v1_write_cqc(struct hns_roce_dev *hr_dev,
roce_set_bit(cq_context->cqc_byte_32,
CQ_CQNTEXT_CQC_BYTE_32_TYPE_OF_COMPLETION_NOTIFICATION_S,
0);
- /*The initial value of cq's ci is 0 */
+ /* The initial value of cq's ci is 0 */
roce_set_field(cq_context->cqc_byte_32,
CQ_CONTEXT_CQC_BYTE_32_CQ_CONS_IDX_M,
CQ_CONTEXT_CQC_BYTE_32_CQ_CONS_IDX_S, 0);
@@ -1490,9 +1490,9 @@ int hns_roce_v1_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
notification_flag = (flags & IB_CQ_SOLICITED_MASK) ==
IB_CQ_SOLICITED ? CQ_DB_REQ_NOT : CQ_DB_REQ_NOT_SOL;
/*
- * flags = 0; Notification Flag = 1, next
- * flags = 1; Notification Flag = 0, solocited
- */
+ * flags = 0; Notification Flag = 1, next
+ * flags = 1; Notification Flag = 0, solocited
+ */
doorbell[0] = hr_cq->cons_index & ((hr_cq->cq_depth << 1) - 1);
roce_set_bit(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_HW_SYNS_S, 1);
roce_set_field(doorbell[1], ROCEE_DB_OTHERS_H_ROCEE_DB_OTH_CMD_M,
@@ -1647,10 +1647,10 @@ static int hns_roce_v1_poll_one(struct hns_roce_cq *hr_cq,
wq = &(*cur_qp)->sq;
if ((*cur_qp)->sq_signal_bits) {
/*
- * If sg_signal_bit is 1,
- * firstly tail pointer updated to wqe
- * which current cqe correspond to
- */
+ * If sg_signal_bit is 1,
+ * firstly tail pointer updated to wqe
+ * which current cqe correspond to
+ */
wqe_ctr = (u16)roce_get_field(cqe->cqe_byte_4,
CQE_BYTE_4_WQE_INDEX_M,
CQE_BYTE_4_WQE_INDEX_S);
@@ -2072,11 +2072,11 @@ static int hns_roce_v1_m_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
}
/*
- *Reset to init
- * Mandatory param:
- * IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_ACCESS_FLAGS
- * Optional param: NA
- */
+ * Reset to init
+ * Mandatory param:
+ * IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_PORT | IB_QP_ACCESS_FLAGS
+ * Optional param: NA
+ */
if (cur_state == IB_QPS_RESET && new_state == IB_QPS_INIT) {
roce_set_field(context->qpc_bytes_4,
QP_CONTEXT_QPC_BYTES_4_TRANSPORT_SERVICE_TYPE_M,
@@ -2584,9 +2584,9 @@ static int hns_roce_v1_m_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
}
/*
- * Use rst2init to instead of init2init with drv,
- * need to hw to flash RQ HEAD by DB again
- */
+ * Use rst2init to instead of init2init with drv,
+ * need to hw to flash RQ HEAD by DB again
+ */
if (cur_state == IB_QPS_INIT && new_state == IB_QPS_INIT) {
/* Memory barrier */
wmb();
@@ -2708,7 +2708,7 @@ static int hns_roce_v1_q_sqp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
goto done;
}
- addr = ROCEE_QP1C_CFG0_0_REG +
+ addr = ROCEE_QP1C_CFG0_0_REG +
hr_qp->port * sizeof(struct hns_roce_sqp_context);
context.qp1c_bytes_4 = roce_read(hr_dev, addr);
context.sq_rq_bt_l = roce_read(hr_dev, addr + 1);
@@ -2921,9 +2921,9 @@ static void hns_roce_v1_destroy_qp_common(struct hns_roce_dev *hr_dev,
if (hr_qp->ibqp.qp_type == IB_QPT_RC) {
if (hr_qp->state != IB_QPS_RESET) {
/*
- * Set qp to ERR,
- * waiting for hw complete processing all dbs
- */
+ * Set qp to ERR,
+ * waiting for hw complete processing all dbs
+ */
if (hns_roce_v1_qp_modify(hr_dev, NULL,
to_hns_roce_state(
(enum ib_qp_state)hr_qp->state),
@@ -2936,9 +2936,9 @@ static void hns_roce_v1_destroy_qp_common(struct hns_roce_dev *hr_dev,
sdbisusepr_val = roce_read(hr_dev,
ROCEE_SDB_ISSUE_PTR_REG);
/*
- * Query db process status,
- * until hw process completely
- */
+ * Query db process status,
+ * until hw process completely
+ */
end = msecs_to_jiffies(
HNS_ROCE_QP_DESTROY_TIMEOUT_MSECS) + jiffies;
do {
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 795ef97..914d0ac 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -148,8 +148,8 @@ static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
break;
case NETDEV_DOWN:
/*
- * In v1 engine, only support all ports closed together.
- */
+ * In v1 engine, only support all ports closed together.
+ */
break;
default:
dev_dbg(dev, "NETDEV event = 0x%x!\n", (u32)(event));
@@ -773,10 +773,10 @@ static int hns_roce_init_hem(struct hns_roce_dev *hr_dev)
}
/**
-* hns_roce_setup_hca - setup host channel adapter
-* @hr_dev: pointer to hns roce device
-* Return : int
-*/
+ * hns_roce_setup_hca - setup host channel adapter
+ * @hr_dev: pointer to hns roce device
+ * Return : int
+ */
static int hns_roce_setup_hca(struct hns_roce_dev *hr_dev)
{
int ret;
@@ -841,11 +841,11 @@ static int hns_roce_setup_hca(struct hns_roce_dev *hr_dev)
}
/**
-* hns_roce_probe - RoCE driver entrance
-* @pdev: pointer to platform device
-* Return : int
-*
-*/
+ * hns_roce_probe - RoCE driver entrance
+ * @pdev: pointer to platform device
+ * Return : int
+ *
+ */
static int hns_roce_probe(struct platform_device *pdev)
{
int ret;
@@ -958,9 +958,9 @@ static int hns_roce_probe(struct platform_device *pdev)
}
/**
-* hns_roce_remove - remove RoCE device
-* @pdev: pointer to platform device
-*/
+ * hns_roce_remove - remove RoCE device
+ * @pdev: pointer to platform device
+ */
static int hns_roce_remove(struct platform_device *pdev)
{
struct hns_roce_dev *hr_dev = platform_get_drvdata(pdev);
--
1.7.9.5
^ permalink raw reply related
* [PATCH V2 for-next 10/11] IB/hns: Implement the add_gid/del_gid and optimize the GIDs management
From: Salil Mehta @ 2016-11-15 18:10 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk,
lijun_nudt, linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161115181053.399568-1-salil.mehta@huawei.com>
From: Shaobo Xu <xushaobo2@huawei.com>
IB core has implemented the calculation of GIDs and the management
of GID tables, and it is now responsible to supply query function
for GIDs. So the calculation of GIDs and the management of GID
tables in the RoCE driver is redundant.
The patch is to implement the add_gid/del_gid to set the GIDs in
the RoCE driver, remove the redundant calculation and management of
GIDs in the notifier call of the net device and the inet, and
update the query_gid.
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_device.h | 2 -
drivers/infiniband/hw/hns/hns_roce_main.c | 270 +++++----------------------
2 files changed, 48 insertions(+), 224 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 593a42a..9ef1cc3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -429,8 +429,6 @@ struct hns_roce_ib_iboe {
struct net_device *netdevs[HNS_ROCE_MAX_PORTS];
struct notifier_block nb;
struct notifier_block nb_inet;
- /* 16 GID is shared by 6 port in v1 engine. */
- union ib_gid gid_table[HNS_ROCE_MAX_GID_NUM];
u8 phy_port[HNS_ROCE_MAX_PORTS];
};
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 6770171..795ef97 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -35,52 +35,13 @@
#include <rdma/ib_addr.h>
#include <rdma/ib_smi.h>
#include <rdma/ib_user_verbs.h>
+#include <rdma/ib_cache.h>
#include "hns_roce_common.h"
#include "hns_roce_device.h"
#include "hns_roce_user.h"
#include "hns_roce_hem.h"
/**
- * hns_roce_addrconf_ifid_eui48 - Get default gid.
- * @eui: eui.
- * @vlan_id: gid
- * @dev: net device
- * Description:
- * MAC convert to GID
- * gid[0..7] = fe80 0000 0000 0000
- * gid[8] = mac[0] ^ 2
- * gid[9] = mac[1]
- * gid[10] = mac[2]
- * gid[11] = ff (VLAN ID high byte (4 MS bits))
- * gid[12] = fe (VLAN ID low byte)
- * gid[13] = mac[3]
- * gid[14] = mac[4]
- * gid[15] = mac[5]
- */
-static void hns_roce_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
- struct net_device *dev)
-{
- memcpy(eui, dev->dev_addr, 3);
- memcpy(eui + 5, dev->dev_addr + 3, 3);
- if (vlan_id < 0x1000) {
- eui[3] = vlan_id >> 8;
- eui[4] = vlan_id & 0xff;
- } else {
- eui[3] = 0xff;
- eui[4] = 0xfe;
- }
- eui[0] ^= 2;
-}
-
-static void hns_roce_make_default_gid(struct net_device *dev, union ib_gid *gid)
-{
- memset(gid, 0, sizeof(*gid));
- gid->raw[0] = 0xFE;
- gid->raw[1] = 0x80;
- hns_roce_addrconf_ifid_eui48(&gid->raw[8], 0xffff, dev);
-}
-
-/**
* hns_get_gid_index - Get gid index.
* @hr_dev: pointer to structure hns_roce_dev.
* @port: port, value range: 0 ~ MAX
@@ -96,30 +57,6 @@ int hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, int gid_index)
return gid_index * hr_dev->caps.num_ports + port;
}
-static int hns_roce_set_gid(struct hns_roce_dev *hr_dev, u8 port, int gid_index,
- union ib_gid *gid)
-{
- struct device *dev = &hr_dev->pdev->dev;
- u8 gid_idx = 0;
-
- if (gid_index >= hr_dev->caps.gid_table_len[port]) {
- dev_err(dev, "gid_index %d illegal, port %d gid range: 0~%d\n",
- gid_index, port, hr_dev->caps.gid_table_len[port] - 1);
- return -EINVAL;
- }
-
- gid_idx = hns_get_gid_index(hr_dev, port, gid_index);
-
- if (!memcmp(gid, &hr_dev->iboe.gid_table[gid_idx], sizeof(*gid)))
- return -EINVAL;
-
- memcpy(&hr_dev->iboe.gid_table[gid_idx], gid, sizeof(*gid));
-
- hr_dev->hw->set_gid(hr_dev, port, gid_index, gid);
-
- return 0;
-}
-
static void hns_roce_set_mac(struct hns_roce_dev *hr_dev, u8 port, u8 *addr)
{
u8 phy_port;
@@ -147,15 +84,44 @@ static void hns_roce_set_mtu(struct hns_roce_dev *hr_dev, u8 port, int mtu)
hr_dev->hw->set_mtu(hr_dev, phy_port, tmp);
}
-static void hns_roce_update_gids(struct hns_roce_dev *hr_dev, int port)
+static int hns_roce_add_gid(struct ib_device *device, u8 port_num,
+ unsigned int index, const union ib_gid *gid,
+ const struct ib_gid_attr *attr, void **context)
+{
+ struct hns_roce_dev *hr_dev = to_hr_dev(device);
+ u8 port = port_num - 1;
+ unsigned long flags;
+
+ if (port >= hr_dev->caps.num_ports)
+ return -EINVAL;
+
+ spin_lock_irqsave(&hr_dev->iboe.lock, flags);
+
+ hr_dev->hw->set_gid(hr_dev, port, index, (union ib_gid *)gid);
+
+ spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
+
+ return 0;
+}
+
+static int hns_roce_del_gid(struct ib_device *device, u8 port_num,
+ unsigned int index, void **context)
{
- struct ib_event event;
+ struct hns_roce_dev *hr_dev = to_hr_dev(device);
+ union ib_gid zgid = { {0} };
+ u8 port = port_num - 1;
+ unsigned long flags;
+
+ if (port >= hr_dev->caps.num_ports)
+ return -EINVAL;
- /* Refresh gid in ib_cache */
- event.device = &hr_dev->ib_dev;
- event.element.port_num = port + 1;
- event.event = IB_EVENT_GID_CHANGE;
- ib_dispatch_event(&event);
+ spin_lock_irqsave(&hr_dev->iboe.lock, flags);
+
+ hr_dev->hw->set_gid(hr_dev, port, index, &zgid);
+
+ spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
+
+ return 0;
}
static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
@@ -164,8 +130,6 @@ static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
struct device *dev = &hr_dev->pdev->dev;
struct net_device *netdev;
unsigned long flags;
- union ib_gid gid;
- int ret = 0;
netdev = hr_dev->iboe.netdevs[port];
if (!netdev) {
@@ -181,10 +145,6 @@ static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
case NETDEV_REGISTER:
case NETDEV_CHANGEADDR:
hns_roce_set_mac(hr_dev, port, netdev->dev_addr);
- hns_roce_make_default_gid(netdev, &gid);
- ret = hns_roce_set_gid(hr_dev, port, 0, &gid);
- if (!ret)
- hns_roce_update_gids(hr_dev, port);
break;
case NETDEV_DOWN:
/*
@@ -197,7 +157,7 @@ static int handle_en_event(struct hns_roce_dev *hr_dev, u8 port,
}
spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
- return ret;
+ return 0;
}
static int hns_roce_netdev_event(struct notifier_block *self,
@@ -224,118 +184,17 @@ static int hns_roce_netdev_event(struct notifier_block *self,
return NOTIFY_DONE;
}
-static void hns_roce_addr_event(int event, struct net_device *event_netdev,
- struct hns_roce_dev *hr_dev, union ib_gid *gid)
-{
- struct hns_roce_ib_iboe *iboe = NULL;
- int gid_table_len = 0;
- unsigned long flags;
- union ib_gid zgid;
- u8 gid_idx = 0;
- u8 port = 0;
- int i = 0;
- int free;
- struct net_device *real_dev = rdma_vlan_dev_real_dev(event_netdev) ?
- rdma_vlan_dev_real_dev(event_netdev) :
- event_netdev;
-
- if (event != NETDEV_UP && event != NETDEV_DOWN)
- return;
-
- iboe = &hr_dev->iboe;
- while (port < hr_dev->caps.num_ports) {
- if (real_dev == iboe->netdevs[port])
- break;
- port++;
- }
-
- if (port >= hr_dev->caps.num_ports) {
- dev_dbg(&hr_dev->pdev->dev, "can't find netdev\n");
- return;
- }
-
- memset(zgid.raw, 0, sizeof(zgid.raw));
- free = -1;
- gid_table_len = hr_dev->caps.gid_table_len[port];
-
- spin_lock_irqsave(&hr_dev->iboe.lock, flags);
-
- for (i = 0; i < gid_table_len; i++) {
- gid_idx = hns_get_gid_index(hr_dev, port, i);
- if (!memcmp(gid->raw, iboe->gid_table[gid_idx].raw,
- sizeof(gid->raw)))
- break;
- if (free < 0 && !memcmp(zgid.raw,
- iboe->gid_table[gid_idx].raw, sizeof(zgid.raw)))
- free = i;
- }
-
- if (i >= gid_table_len) {
- if (free < 0) {
- spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
- dev_dbg(&hr_dev->pdev->dev,
- "gid_index overflow, port(%d)\n", port);
- return;
- }
- if (!hns_roce_set_gid(hr_dev, port, free, gid))
- hns_roce_update_gids(hr_dev, port);
- } else if (event == NETDEV_DOWN) {
- if (!hns_roce_set_gid(hr_dev, port, i, &zgid))
- hns_roce_update_gids(hr_dev, port);
- }
-
- spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
-}
-
-static int hns_roce_inet_event(struct notifier_block *self, unsigned long event,
- void *ptr)
-{
- struct in_ifaddr *ifa = ptr;
- struct hns_roce_dev *hr_dev;
- struct net_device *dev = ifa->ifa_dev->dev;
- union ib_gid gid;
-
- ipv6_addr_set_v4mapped(ifa->ifa_address, (struct in6_addr *)&gid);
-
- hr_dev = container_of(self, struct hns_roce_dev, iboe.nb_inet);
-
- hns_roce_addr_event(event, dev, hr_dev, &gid);
-
- return NOTIFY_DONE;
-}
-
-static int hns_roce_setup_mtu_gids(struct hns_roce_dev *hr_dev)
+static int hns_roce_setup_mtu_mac(struct hns_roce_dev *hr_dev)
{
- struct in_ifaddr *ifa_list = NULL;
- union ib_gid gid = {{0} };
- u32 ipaddr = 0;
- int index = 0;
- int ret = 0;
- u8 i = 0;
+ u8 i;
for (i = 0; i < hr_dev->caps.num_ports; i++) {
hns_roce_set_mtu(hr_dev, i,
ib_mtu_enum_to_int(hr_dev->caps.max_mtu));
hns_roce_set_mac(hr_dev, i, hr_dev->iboe.netdevs[i]->dev_addr);
-
- if (hr_dev->iboe.netdevs[i]->ip_ptr) {
- ifa_list = hr_dev->iboe.netdevs[i]->ip_ptr->ifa_list;
- index = 1;
- while (ifa_list) {
- ipaddr = ifa_list->ifa_address;
- ipv6_addr_set_v4mapped(ipaddr,
- (struct in6_addr *)&gid);
- ret = hns_roce_set_gid(hr_dev, i, index, &gid);
- if (ret)
- break;
- index++;
- ifa_list = ifa_list->ifa_next;
- }
- hns_roce_update_gids(hr_dev, i);
- }
}
- return ret;
+ return 0;
}
static int hns_roce_query_device(struct ib_device *ib_dev,
@@ -444,31 +303,6 @@ static enum rdma_link_layer hns_roce_get_link_layer(struct ib_device *device,
static int hns_roce_query_gid(struct ib_device *ib_dev, u8 port_num, int index,
union ib_gid *gid)
{
- struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev);
- struct device *dev = &hr_dev->pdev->dev;
- u8 gid_idx = 0;
- u8 port;
-
- if (port_num < 1 || port_num > hr_dev->caps.num_ports ||
- index >= hr_dev->caps.gid_table_len[port_num - 1]) {
- dev_err(dev,
- "port_num %d index %d illegal! correct range: port_num 1~%d index 0~%d!\n",
- port_num, index, hr_dev->caps.num_ports,
- hr_dev->caps.gid_table_len[port_num - 1] - 1);
- return -EINVAL;
- }
-
- port = port_num - 1;
- gid_idx = hns_get_gid_index(hr_dev, port, index);
- if (gid_idx >= HNS_ROCE_MAX_GID_NUM) {
- dev_err(dev, "port_num %d index %d illegal! total gid num %d!\n",
- port_num, index, HNS_ROCE_MAX_GID_NUM);
- return -EINVAL;
- }
-
- memcpy(gid->raw, hr_dev->iboe.gid_table[gid_idx].raw,
- HNS_ROCE_GID_SIZE);
-
return 0;
}
@@ -646,6 +480,8 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
ib_dev->get_link_layer = hns_roce_get_link_layer;
ib_dev->get_netdev = hns_roce_get_netdev;
ib_dev->query_gid = hns_roce_query_gid;
+ ib_dev->add_gid = hns_roce_add_gid;
+ ib_dev->del_gid = hns_roce_del_gid;
ib_dev->query_pkey = hns_roce_query_pkey;
ib_dev->alloc_ucontext = hns_roce_alloc_ucontext;
ib_dev->dealloc_ucontext = hns_roce_dealloc_ucontext;
@@ -688,32 +524,22 @@ static int hns_roce_register_device(struct hns_roce_dev *hr_dev)
return ret;
}
- ret = hns_roce_setup_mtu_gids(hr_dev);
+ ret = hns_roce_setup_mtu_mac(hr_dev);
if (ret) {
- dev_err(dev, "roce_setup_mtu_gids failed!\n");
- goto error_failed_setup_mtu_gids;
+ dev_err(dev, "setup_mtu_mac failed!\n");
+ goto error_failed_setup_mtu_mac;
}
iboe->nb.notifier_call = hns_roce_netdev_event;
ret = register_netdevice_notifier(&iboe->nb);
if (ret) {
dev_err(dev, "register_netdevice_notifier failed!\n");
- goto error_failed_setup_mtu_gids;
- }
-
- iboe->nb_inet.notifier_call = hns_roce_inet_event;
- ret = register_inetaddr_notifier(&iboe->nb_inet);
- if (ret) {
- dev_err(dev, "register inet addr notifier failed!\n");
- goto error_failed_register_inetaddr_notifier;
+ goto error_failed_setup_mtu_mac;
}
return 0;
-error_failed_register_inetaddr_notifier:
- unregister_netdevice_notifier(&iboe->nb);
-
-error_failed_setup_mtu_gids:
+error_failed_setup_mtu_mac:
ib_unregister_device(ib_dev);
return ret;
--
1.7.9.5
^ permalink raw reply related
* [PATCH V2 for-next 09/11] IB/hns: Change qpn allocation to round-robin mode.
From: Salil Mehta @ 2016-11-15 18:10 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk,
lijun_nudt, linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161115181053.399568-1-salil.mehta@huawei.com>
From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
When using CM to establish connections, qp number that was freed
just now will be rejected by ib core. To fix these problem, We
change qpn allocation to round-robin mode. We added the round-robin
mode for allocating resources using bitmap. We use round-robin mode
for qp number and non round-robing mode for other resources like
cq number, pd number etc.
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_alloc.c | 11 +++++++----
drivers/infiniband/hw/hns/hns_roce_cq.c | 4 ++--
drivers/infiniband/hw/hns/hns_roce_device.h | 9 +++++++--
drivers/infiniband/hw/hns/hns_roce_mr.c | 2 +-
drivers/infiniband/hw/hns/hns_roce_pd.c | 5 +++--
drivers/infiniband/hw/hns/hns_roce_qp.c | 2 +-
6 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index 863a17a..605962f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -61,9 +61,10 @@ int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap, unsigned long *obj)
return ret;
}
-void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj)
+void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj,
+ int rr)
{
- hns_roce_bitmap_free_range(bitmap, obj, 1);
+ hns_roce_bitmap_free_range(bitmap, obj, 1, rr);
}
int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
@@ -106,7 +107,8 @@ int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
}
void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
- unsigned long obj, int cnt)
+ unsigned long obj, int cnt,
+ int rr)
{
int i;
@@ -116,7 +118,8 @@ void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
for (i = 0; i < cnt; i++)
clear_bit(obj + i, bitmap->table);
- bitmap->last = min(bitmap->last, obj);
+ if (!rr)
+ bitmap->last = min(bitmap->last, obj);
bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
& bitmap->mask;
spin_unlock(&bitmap->lock);
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 461a273..c9f6c3d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -166,7 +166,7 @@ static int hns_roce_cq_alloc(struct hns_roce_dev *hr_dev, int nent,
hns_roce_table_put(hr_dev, &cq_table->table, hr_cq->cqn);
err_out:
- hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn);
+ hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn, BITMAP_NO_RR);
return ret;
}
@@ -204,7 +204,7 @@ static void hns_roce_free_cq(struct hns_roce_dev *hr_dev,
spin_unlock_irq(&cq_table->lock);
hns_roce_table_put(hr_dev, &cq_table->table, hr_cq->cqn);
- hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn);
+ hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn, BITMAP_NO_RR);
}
static int hns_roce_ib_get_cq_umem(struct hns_roce_dev *hr_dev,
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 7242b14..593a42a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -72,6 +72,9 @@
#define HNS_ROCE_MAX_GID_NUM 16
#define HNS_ROCE_GID_SIZE 16
+#define BITMAP_NO_RR 0
+#define BITMAP_RR 1
+
#define MR_TYPE_MR 0x00
#define MR_TYPE_DMA 0x03
@@ -661,7 +664,8 @@ int hns_roce_buf_write_mtt(struct hns_roce_dev *hr_dev,
void hns_roce_cleanup_qp_table(struct hns_roce_dev *hr_dev);
int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap, unsigned long *obj);
-void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj);
+void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj,
+ int rr);
int hns_roce_bitmap_init(struct hns_roce_bitmap *bitmap, u32 num, u32 mask,
u32 reserved_bot, u32 resetrved_top);
void hns_roce_bitmap_cleanup(struct hns_roce_bitmap *bitmap);
@@ -669,7 +673,8 @@ int hns_roce_bitmap_init(struct hns_roce_bitmap *bitmap, u32 num, u32 mask,
int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
int align, unsigned long *obj);
void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
- unsigned long obj, int cnt);
+ unsigned long obj, int cnt,
+ int rr);
struct ib_ah *hns_roce_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
int hns_roce_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr);
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index 2227962..6396bc5 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -288,7 +288,7 @@ static void hns_roce_mr_free(struct hns_roce_dev *hr_dev,
}
hns_roce_bitmap_free(&hr_dev->mr_table.mtpt_bitmap,
- key_to_hw_index(mr->key));
+ key_to_hw_index(mr->key), BITMAP_NO_RR);
}
static int hns_roce_mr_enable(struct hns_roce_dev *hr_dev,
diff --git a/drivers/infiniband/hw/hns/hns_roce_pd.c b/drivers/infiniband/hw/hns/hns_roce_pd.c
index 05db7d5..a64500f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_pd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_pd.c
@@ -40,7 +40,7 @@ static int hns_roce_pd_alloc(struct hns_roce_dev *hr_dev, unsigned long *pdn)
static void hns_roce_pd_free(struct hns_roce_dev *hr_dev, unsigned long pdn)
{
- hns_roce_bitmap_free(&hr_dev->pd_bitmap, pdn);
+ hns_roce_bitmap_free(&hr_dev->pd_bitmap, pdn, BITMAP_NO_RR);
}
int hns_roce_init_pd_table(struct hns_roce_dev *hr_dev)
@@ -121,7 +121,8 @@ int hns_roce_uar_alloc(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
void hns_roce_uar_free(struct hns_roce_dev *hr_dev, struct hns_roce_uar *uar)
{
- hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->index);
+ hns_roce_bitmap_free(&hr_dev->uar_table.bitmap, uar->index,
+ BITMAP_NO_RR);
}
int hns_roce_init_uar_table(struct hns_roce_dev *hr_dev)
diff --git a/drivers/infiniband/hw/hns/hns_roce_qp.c b/drivers/infiniband/hw/hns/hns_roce_qp.c
index e86dd8d..4775b5c 100644
--- a/drivers/infiniband/hw/hns/hns_roce_qp.c
+++ b/drivers/infiniband/hw/hns/hns_roce_qp.c
@@ -250,7 +250,7 @@ void hns_roce_release_range_qp(struct hns_roce_dev *hr_dev, int base_qpn,
if (base_qpn < SQP_NUM)
return;
- hns_roce_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt);
+ hns_roce_bitmap_free_range(&qp_table->bitmap, base_qpn, cnt, BITMAP_RR);
}
static int hns_roce_set_rq_size(struct hns_roce_dev *hr_dev,
--
1.7.9.5
^ permalink raw reply related
* [PATCH V2 for-next 08/11] IB/hns: Modify query info named port_num when querying RC QP
From: Salil Mehta @ 2016-11-15 18:10 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161115181053.399568-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
From: "Wei Hu (Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch modified the output query info qp_attr->port_num
to fix bug in hip06.
Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 509ea75..34b7898 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2857,9 +2857,7 @@ static int hns_roce_v1_q_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
qp_attr->pkey_index = roce_get_field(context->qpc_bytes_12,
QP_CONTEXT_QPC_BYTES_12_P_KEY_INDEX_M,
QP_CONTEXT_QPC_BYTES_12_P_KEY_INDEX_S);
- qp_attr->port_num = (u8)roce_get_field(context->qpc_bytes_156,
- QP_CONTEXT_QPC_BYTES_156_PORT_NUM_M,
- QP_CONTEXT_QPC_BYTES_156_PORT_NUM_S) + 1;
+ qp_attr->port_num = hr_qp->port + 1;
qp_attr->sq_draining = 0;
qp_attr->max_rd_atomic = roce_get_field(context->qpc_bytes_156,
QP_CONTEXT_QPC_BYTES_156_INITIATOR_DEPTH_M,
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH V2 for-next 07/11] IB/hns: Modify the macro for the timeout when cmd process
From: Salil Mehta @ 2016-11-15 18:10 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161115181053.399568-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
From: "Wei Hu (Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch modified the macro for the timeout when cmd is
processing as follows:
Before modification:
enum {
HNS_ROCE_CMD_TIME_CLASS_A = 10000,
HNS_ROCE_CMD_TIME_CLASS_B = 10000,
HNS_ROCE_CMD_TIME_CLASS_C = 10000,
};
After modification:
#define HNS_ROCE_CMD_TIMEOUT_MSECS 10000
Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/hw/hns/hns_roce_cmd.h | 7 +------
drivers/infiniband/hw/hns/hns_roce_cq.c | 4 ++--
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 8 ++++----
drivers/infiniband/hw/hns/hns_roce_mr.c | 4 ++--
4 files changed, 9 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h b/drivers/infiniband/hw/hns/hns_roce_cmd.h
index e3997d3..ed14ad3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.h
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h
@@ -34,6 +34,7 @@
#define _HNS_ROCE_CMD_H
#define HNS_ROCE_MAILBOX_SIZE 4096
+#define HNS_ROCE_CMD_TIMEOUT_MSECS 10000
enum {
/* TPT commands */
@@ -57,12 +58,6 @@ enum {
HNS_ROCE_CMD_QUERY_QP = 0x22,
};
-enum {
- HNS_ROCE_CMD_TIME_CLASS_A = 10000,
- HNS_ROCE_CMD_TIME_CLASS_B = 10000,
- HNS_ROCE_CMD_TIME_CLASS_C = 10000,
-};
-
struct hns_roce_cmd_mailbox {
void *buf;
dma_addr_t dma;
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 5dc8d92..461a273 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -77,7 +77,7 @@ static int hns_roce_sw2hw_cq(struct hns_roce_dev *dev,
unsigned long cq_num)
{
return hns_roce_cmd_mbox(dev, mailbox->dma, 0, cq_num, 0,
- HNS_ROCE_CMD_SW2HW_CQ, HNS_ROCE_CMD_TIME_CLASS_A);
+ HNS_ROCE_CMD_SW2HW_CQ, HNS_ROCE_CMD_TIMEOUT_MSECS);
}
static int hns_roce_cq_alloc(struct hns_roce_dev *hr_dev, int nent,
@@ -176,7 +176,7 @@ static int hns_roce_hw2sw_cq(struct hns_roce_dev *dev,
{
return hns_roce_cmd_mbox(dev, 0, mailbox ? mailbox->dma : 0, cq_num,
mailbox ? 0 : 1, HNS_ROCE_CMD_HW2SW_CQ,
- HNS_ROCE_CMD_TIME_CLASS_A);
+ HNS_ROCE_CMD_TIMEOUT_MSECS);
}
static void hns_roce_free_cq(struct hns_roce_dev *hr_dev,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index b835a55..509ea75 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -1871,12 +1871,12 @@ static int hns_roce_v1_qp_modify(struct hns_roce_dev *hr_dev,
if (op[cur_state][new_state] == HNS_ROCE_CMD_2RST_QP)
return hns_roce_cmd_mbox(hr_dev, 0, 0, hr_qp->qpn, 2,
HNS_ROCE_CMD_2RST_QP,
- HNS_ROCE_CMD_TIME_CLASS_A);
+ HNS_ROCE_CMD_TIMEOUT_MSECS);
if (op[cur_state][new_state] == HNS_ROCE_CMD_2ERR_QP)
return hns_roce_cmd_mbox(hr_dev, 0, 0, hr_qp->qpn, 2,
HNS_ROCE_CMD_2ERR_QP,
- HNS_ROCE_CMD_TIME_CLASS_A);
+ HNS_ROCE_CMD_TIMEOUT_MSECS);
mailbox = hns_roce_alloc_cmd_mailbox(hr_dev);
if (IS_ERR(mailbox))
@@ -1886,7 +1886,7 @@ static int hns_roce_v1_qp_modify(struct hns_roce_dev *hr_dev,
ret = hns_roce_cmd_mbox(hr_dev, mailbox->dma, 0, hr_qp->qpn, 0,
op[cur_state][new_state],
- HNS_ROCE_CMD_TIME_CLASS_C);
+ HNS_ROCE_CMD_TIMEOUT_MSECS);
hns_roce_free_cmd_mailbox(hr_dev, mailbox);
return ret;
@@ -2681,7 +2681,7 @@ static int hns_roce_v1_query_qpc(struct hns_roce_dev *hr_dev,
ret = hns_roce_cmd_mbox(hr_dev, 0, mailbox->dma, hr_qp->qpn, 0,
HNS_ROCE_CMD_QUERY_QP,
- HNS_ROCE_CMD_TIME_CLASS_A);
+ HNS_ROCE_CMD_TIMEOUT_MSECS);
if (!ret)
memcpy(hr_context, mailbox->buf, sizeof(*hr_context));
else
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
index d3dfb5f..2227962 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -53,7 +53,7 @@ static int hns_roce_sw2hw_mpt(struct hns_roce_dev *hr_dev,
{
return hns_roce_cmd_mbox(hr_dev, mailbox->dma, 0, mpt_index, 0,
HNS_ROCE_CMD_SW2HW_MPT,
- HNS_ROCE_CMD_TIME_CLASS_B);
+ HNS_ROCE_CMD_TIMEOUT_MSECS);
}
static int hns_roce_hw2sw_mpt(struct hns_roce_dev *hr_dev,
@@ -62,7 +62,7 @@ static int hns_roce_hw2sw_mpt(struct hns_roce_dev *hr_dev,
{
return hns_roce_cmd_mbox(hr_dev, 0, mailbox ? mailbox->dma : 0,
mpt_index, !mailbox, HNS_ROCE_CMD_HW2SW_MPT,
- HNS_ROCE_CMD_TIME_CLASS_B);
+ HNS_ROCE_CMD_TIMEOUT_MSECS);
}
static int hns_roce_buddy_alloc(struct hns_roce_buddy *buddy, int order,
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH V2 for-next 06/11] IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp()
From: Salil Mehta @ 2016-11-15 18:10 UTC (permalink / raw)
To: dledford
Cc: salil.mehta, xavier.huwei, oulijun, xushaobo2, mehta.salil.lnk,
lijun_nudt, linux-rdma, netdev, linux-kernel, linuxarm
In-Reply-To: <20161115181053.399568-1-salil.mehta@huawei.com>
From: Lijun Ou <oulijun@huawei.com>
In old code, the value of qp state from qpc was assigned for
attr->qp_state. The value may be an error while attr_mask &
IB_QP_STATE is zero.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
---
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 643a2ff..b835a55 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2571,7 +2571,7 @@ static int hns_roce_v1_m_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
/* Every status migrate must change state */
roce_set_field(context->qpc_bytes_144,
QP_CONTEXT_QPC_BYTES_144_QP_STATE_M,
- QP_CONTEXT_QPC_BYTES_144_QP_STATE_S, attr->qp_state);
+ QP_CONTEXT_QPC_BYTES_144_QP_STATE_S, new_state);
/* SW pass context to HW */
ret = hns_roce_v1_qp_modify(hr_dev, &hr_qp->mtt,
--
1.7.9.5
^ permalink raw reply related
* [PATCH V2 for-next 05/11] IB/hns: Modify the condition of notifying hardware loopback
From: Salil Mehta @ 2016-11-15 18:10 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: salil.mehta-hv44wF8Li93QT0dZR+AlfA,
xavier.huwei-hv44wF8Li93QT0dZR+AlfA,
oulijun-hv44wF8Li93QT0dZR+AlfA, xushaobo2-hv44wF8Li93QT0dZR+AlfA,
mehta.salil.lnk-Re5JQEeQqe8AvxtiuMwx3w, lijun_nudt-9Onoh4P/yGk,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161115181053.399568-1-salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
From: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch modified the condition of notifying hardware loopback.
In hip06, RoCE Engine has several ports, one QP is related
to one port. hardware only support loopback in the same port,
not in the different ports.
So, If QP related to port N, the dmac in the QP context equals
the smac of the local port N or the loop_idc is 1, we should
set loopback bit in QP context to notify hardware.
Signed-off-by: Wei Hu (Xavier) <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Salil Mehta <salil.mehta-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 24 +++++++-----------------
1 file changed, 7 insertions(+), 17 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index e080dd6..643a2ff 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2244,24 +2244,14 @@ static int hns_roce_v1_m_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
QP_CONTEXT_QPC_BYTE_32_SIGNALING_TYPE_S,
hr_qp->sq_signal_bits);
- for (port = 0; port < hr_dev->caps.num_ports; port++) {
- smac = (u8 *)hr_dev->dev_addr[port];
- dev_dbg(dev, "smac: %2x: %2x: %2x: %2x: %2x: %2x\n",
- smac[0], smac[1], smac[2], smac[3], smac[4],
- smac[5]);
- if ((dmac[0] == smac[0]) && (dmac[1] == smac[1]) &&
- (dmac[2] == smac[2]) && (dmac[3] == smac[3]) &&
- (dmac[4] == smac[4]) && (dmac[5] == smac[5])) {
- roce_set_bit(context->qpc_bytes_32,
- QP_CONTEXT_QPC_BYTE_32_LOOPBACK_INDICATOR_S,
- 1);
- break;
- }
- }
-
- if (hr_dev->loop_idc == 0x1)
+ port = (attr_mask & IB_QP_PORT) ? (attr->port_num - 1) :
+ hr_qp->port;
+ smac = (u8 *)hr_dev->dev_addr[port];
+ /* when dmac equals smac or loop_idc is 1, it should loopback */
+ if (ether_addr_equal_unaligned(dmac, smac) ||
+ hr_dev->loop_idc == 0x1)
roce_set_bit(context->qpc_bytes_32,
- QP_CONTEXT_QPC_BYTE_32_LOOPBACK_INDICATOR_S, 1);
+ QP_CONTEXT_QPC_BYTE_32_LOOPBACK_INDICATOR_S, 1);
roce_set_bit(context->qpc_bytes_32,
QP_CONTEXT_QPC_BYTE_32_GLOBAL_HEADER_S,
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox