* [net-next][PATCH v3 00/17] net: RDS updates
From: Santosh Shilimkar @ 2017-01-02 22:45 UTC
To: netdev, davem; +Cc: linux-kernel, santosh.shilimkar
v2->v3:
- Re-based against latest net-next head.
- Dropped a user visible change after discussing with David Miller.
It needs some more work to fully support old/new tools matrix.
- Addressed Dave's comment about bool usage in patch
"RDS: IB: track and log active side..."
v1->v2:
Re-aligned indentation in patch "RDS: mark few internal functions..."
Series consists of:
- RDMA transport fixes for map failure, listen sequence, handler panic and
composite message notification.
- Couple of sparse fixes.
- Message logging improvements for bind failure, use once mr semantics
and connection remote address, active end point.
- Performance improvement for RDMA transport by reducing the post send
pressure on the queue and spreading the CQ vectors.
- Useful statistics for socket send/recv usage and receive cache usage.
- Additional RDS CMSG used by applications to track the RDS message
stages for certain types of traffic to find latency hot spots.
Can be enabled/disabled per socket.
Series generated against 'net-next'. The full patchset is also available in
the git tree below.
The following changes since commit 525dfa2cdce4f5ab76251b5e57ebabf4f2dfc40c:
Merge branch 'mlx5-odp' (2017-01-02 15:51:21 -0500)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_4.11/net-next/rds_v3
for you to fetch changes up to 3289025aedc018f8fd9d0e37fb9efa0c6d531ffa:
RDS: add receive message trace used by application (2017-01-02 14:02:59 -0800)
----------------------------------------------------------------
Avinash Repaka (1):
RDS: make message size limit compliant with spec
Qing Huang (1):
RDS: RDMA: start rdma listening after init
Santosh Shilimkar (14):
RDS: log the address on bind failure
RDS: mark few internal functions static to make sparse build happy
RDS: IB: include faddr in connection log
RDS: IB: make the transport retry count smallest
RDS: RDMA: fix the ib_map_mr_sg_zbva() argument
RDS: RDMA: return appropriate error on rdma map failures
RDS: IB: split the mr registration and invalidation path
RDS: RDMA: silence the use_once mr log flood
RDS: IB: track and log active side endpoint in connection
RDS: IB: add few useful cache stasts
RDS: IB: Add vector spreading for cqs
RDS: RDMA: Fix the composite message user notification
RDS: IB: fix panic due to handlers running post teardown
RDS: add receive message trace used by application
Venkat Venkatsubra (1):
RDS: add stat for socket recv memory usage
include/uapi/linux/rds.h | 33 ++++++++++++++++++
net/rds/af_rds.c | 28 +++++++++++++++
net/rds/bind.c | 4 +--
net/rds/connection.c | 10 +++---
net/rds/ib.c | 11 ++++++
net/rds/ib.h | 22 ++++++++++--
net/rds/ib_cm.c | 89 ++++++++++++++++++++++++++++++++++++++----------
net/rds/ib_frmr.c | 16 +++++----
net/rds/ib_recv.c | 14 ++++++--
net/rds/ib_send.c | 29 +++++++++-------
net/rds/ib_stats.c | 2 ++
net/rds/rdma.c | 22 ++++++++++--
net/rds/rdma_transport.c | 11 ++----
net/rds/rds.h | 17 +++++++++
net/rds/recv.c | 36 ++++++++++++++++++--
net/rds/send.c | 50 ++++++++++++++++++++++++---
net/rds/tcp_listen.c | 1 +
net/rds/tcp_recv.c | 5 +++
18 files changed, 335 insertions(+), 65 deletions(-)
--
1.9.1
* [net-next][PATCH v3 05/17] RDS: RDMA: fix the ib_map_mr_sg_zbva() argument
From: Santosh Shilimkar @ 2017-01-02 22:45 UTC
To: netdev, davem; +Cc: linux-kernel, santosh.shilimkar
In-Reply-To: <1483397152-8307-1-git-send-email-santosh.shilimkar@oracle.com>
Fixes warning: Using plain integer as NULL pointer
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
net/rds/ib_frmr.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
index d921adc..66b3d62 100644
--- a/net/rds/ib_frmr.c
+++ b/net/rds/ib_frmr.c
@@ -104,14 +104,15 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
struct rds_ib_frmr *frmr = &ibmr->u.frmr;
struct ib_send_wr *failed_wr;
struct ib_reg_wr reg_wr;
- int ret;
+ int ret, off = 0;
while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
atomic_inc(&ibmr->ic->i_fastreg_wrs);
cpu_relax();
}
- ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_len, 0, PAGE_SIZE);
+ ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_len,
+ &off, PAGE_SIZE);
if (unlikely(ret != ibmr->sg_len))
return ret < 0 ? ret : -EINVAL;
--
1.9.1
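The warning comes from passing a literal 0 where the callee expects a pointer. A minimal user-space sketch of the same pattern follows; map_sg_sketch() is a hypothetical stand-in that only assumes an optional pointer-to-offset parameter, as the `&off` in the hunk above suggests, and is not the real ib_map_mr_sg_zbva().

```c
#include <stddef.h>

/* Hypothetical stand-in for a mapping helper that takes an optional
 * pointer to a starting offset; it is not the real ib_map_mr_sg_zbva(). */
static int map_sg_sketch(int sg_len, unsigned int *sg_offset)
{
	/* A NULL offset is treated the same as a zero offset here. */
	unsigned int start = sg_offset ? *sg_offset : 0;

	/* Pretend that only a zero offset can be mapped in full. */
	return start == 0 ? sg_len : -1;
}

static int map_all_entries(int sg_len)
{
	unsigned int off = 0;

	/* Pass the address of a zeroed offset rather than a bare 0, which
	 * sparse reports as "Using plain integer as NULL pointer". */
	return map_sg_sketch(sg_len, &off);
}
```

Writing NULL explicitly would also silence the warning in this sketch; the patch chose a zeroed local instead.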
* [net-next][PATCH v3 06/17] RDS: RDMA: start rdma listening after init
From: Santosh Shilimkar @ 2017-01-02 22:45 UTC
To: netdev, davem; +Cc: linux-kernel, santosh.shilimkar
In-Reply-To: <1483397152-8307-1-git-send-email-santosh.shilimkar@oracle.com>
From: Qing Huang <qing.huang@oracle.com>
This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
net/rds/rdma_transport.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index d5f3117..fc59821 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -206,18 +206,13 @@ static int rds_rdma_init(void)
{
int ret;
- ret = rds_rdma_listen_init();
+ ret = rds_ib_init();
if (ret)
goto out;
- ret = rds_ib_init();
+ ret = rds_rdma_listen_init();
if (ret)
- goto err_ib_init;
-
- goto out;
-
-err_ib_init:
- rds_rdma_listen_stop();
+ rds_ib_exit();
out:
return ret;
}
--
1.9.1
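The reordering above can be sketched in user space as follows. The stubs (transport_init, transport_exit, listen_init) and the flags they set are hypothetical and stand in for rds_ib_init(), rds_ib_exit() and rds_rdma_listen_init(); the point is only the ordering and the unwind on failure.

```c
#include <stdbool.h>

/* Hypothetical stubs standing in for transport setup and listener setup. */
static bool transport_ready;
static bool listening;
static int listen_result;	/* set non-zero to simulate a listen failure */

static int transport_init(void)
{
	transport_ready = true;
	return 0;
}

static void transport_exit(void)
{
	transport_ready = false;
}

static int listen_init(void)
{
	if (listen_result)
		return listen_result;

	listening = true;
	return 0;
}

/* Bring up send/recv state first, start listening last, and unwind the
 * transport if the listener cannot be started. */
static int rdma_init_sketch(void)
{
	int ret;

	ret = transport_init();
	if (ret)
		return ret;

	ret = listen_init();
	if (ret)
		transport_exit();

	return ret;
}
```

With this order no incoming connection can arrive before the transport state exists, and a failed listen leaves nothing half-initialized.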
* [net-next][PATCH v3 08/17] RDS: IB: split the mr registration and invalidation path
From: Santosh Shilimkar @ 2017-01-02 22:45 UTC
To: netdev, davem; +Cc: linux-kernel, santosh.shilimkar
In-Reply-To: <1483397152-8307-1-git-send-email-santosh.shilimkar@oracle.com>
MR invalidation in RDS is done in a background thread and not in the
data path like registration. So break the dependency between them,
which helps to remove the performance bottleneck.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
net/rds/ib.h | 4 +++-
net/rds/ib_cm.c | 9 +++++++--
net/rds/ib_frmr.c | 11 ++++++-----
3 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/net/rds/ib.h b/net/rds/ib.h
index f4e8121..f14c26d 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -14,7 +14,8 @@
#define RDS_IB_DEFAULT_RECV_WR 1024
#define RDS_IB_DEFAULT_SEND_WR 256
-#define RDS_IB_DEFAULT_FR_WR 512
+#define RDS_IB_DEFAULT_FR_WR 256
+#define RDS_IB_DEFAULT_FR_INV_WR 256
#define RDS_IB_DEFAULT_RETRY_COUNT 1
@@ -125,6 +126,7 @@ struct rds_ib_connection {
/* To control the number of wrs from fastreg */
atomic_t i_fastreg_wrs;
+ atomic_t i_fastunreg_wrs;
/* interrupt handling */
struct tasklet_struct i_send_tasklet;
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index b9da1e5..3002acf 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -382,7 +382,10 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
* completion queue and send queue. This extra space is used for FRMR
* registration and invalidation work requests
*/
- fr_queue_space = (rds_ibdev->use_fastreg ? RDS_IB_DEFAULT_FR_WR : 0);
+ fr_queue_space = rds_ibdev->use_fastreg ?
+ (RDS_IB_DEFAULT_FR_WR + 1) +
+ (RDS_IB_DEFAULT_FR_INV_WR + 1)
+ : 0;
/* add the conn now so that connection establishment has the dev */
rds_ib_add_conn(rds_ibdev, conn);
@@ -444,6 +447,7 @@ static int rds_ib_setup_qp(struct rds_connection *conn)
attr.send_cq = ic->i_send_cq;
attr.recv_cq = ic->i_recv_cq;
atomic_set(&ic->i_fastreg_wrs, RDS_IB_DEFAULT_FR_WR);
+ atomic_set(&ic->i_fastunreg_wrs, RDS_IB_DEFAULT_FR_INV_WR);
/*
* XXX this can fail if max_*_wr is too large? Are we supposed
@@ -766,7 +770,8 @@ void rds_ib_conn_path_shutdown(struct rds_conn_path *cp)
wait_event(rds_ib_ring_empty_wait,
rds_ib_ring_empty(&ic->i_recv_ring) &&
(atomic_read(&ic->i_signaled_sends) == 0) &&
- (atomic_read(&ic->i_fastreg_wrs) == RDS_IB_DEFAULT_FR_WR));
+ (atomic_read(&ic->i_fastreg_wrs) == RDS_IB_DEFAULT_FR_WR) &&
+ (atomic_read(&ic->i_fastunreg_wrs) == RDS_IB_DEFAULT_FR_INV_WR));
tasklet_kill(&ic->i_send_tasklet);
tasklet_kill(&ic->i_recv_tasklet);
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
index 66b3d62..48332a6 100644
--- a/net/rds/ib_frmr.c
+++ b/net/rds/ib_frmr.c
@@ -241,8 +241,8 @@ static int rds_ib_post_inv(struct rds_ib_mr *ibmr)
if (frmr->fr_state != FRMR_IS_INUSE)
goto out;
- while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
- atomic_inc(&ibmr->ic->i_fastreg_wrs);
+ while (atomic_dec_return(&ibmr->ic->i_fastunreg_wrs) <= 0) {
+ atomic_inc(&ibmr->ic->i_fastunreg_wrs);
cpu_relax();
}
@@ -261,7 +261,7 @@ static int rds_ib_post_inv(struct rds_ib_mr *ibmr)
if (unlikely(ret)) {
frmr->fr_state = FRMR_IS_STALE;
frmr->fr_inv = false;
- atomic_inc(&ibmr->ic->i_fastreg_wrs);
+ atomic_inc(&ibmr->ic->i_fastunreg_wrs);
pr_err("RDS/IB: %s returned error(%d)\n", __func__, ret);
goto out;
}
@@ -289,9 +289,10 @@ void rds_ib_mr_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc)
if (frmr->fr_inv) {
frmr->fr_state = FRMR_IS_FREE;
frmr->fr_inv = false;
+ atomic_inc(&ic->i_fastreg_wrs);
+ } else {
+ atomic_inc(&ic->i_fastunreg_wrs);
}
-
- atomic_inc(&ic->i_fastreg_wrs);
}
void rds_ib_unreg_frmr(struct list_head *list, unsigned int *nfreed,
--
1.9.1
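The idea of giving registration and invalidation their own work-request budgets can be sketched in user space with C11 atomics. This is an illustration, not the kernel atomic_t code: the struct and helper names are hypothetical, and credit_try_get() returns instead of spinning with cpu_relax() as the patch does.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FR_WR_SKETCH     256	/* registration budget, value from the patch */
#define FR_INV_WR_SKETCH 256	/* invalidation budget, value from the patch */

struct wr_credits {
	atomic_int reg;	/* consumed by the data path */
	atomic_int inv;	/* consumed by the background thread */
};

static void credits_init(struct wr_credits *c)
{
	atomic_init(&c->reg, FR_WR_SKETCH);
	atomic_init(&c->inv, FR_INV_WR_SKETCH);
}

/* Take one credit; if the new value is <= 0, put it back and fail.
 * This mirrors the atomic_dec_return()/atomic_inc() pair in the patch. */
static bool credit_try_get(atomic_int *pool)
{
	if (atomic_fetch_sub(pool, 1) - 1 <= 0) {
		atomic_fetch_add(pool, 1);
		return false;
	}

	return true;
}

/* Return a credit when the matching completion arrives. */
static void credit_put(atomic_int *pool)
{
	atomic_fetch_add(pool, 1);
}
```

Because each pool is drained independently, a burst of invalidations in this sketch cannot starve the registration path of credits.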
* [net-next][PATCH v3 03/17] RDS: IB: include faddr in connection log
From: Santosh Shilimkar @ 2017-01-02 22:45 UTC
To: netdev, davem; +Cc: linux-kernel, santosh.shilimkar
In-Reply-To: <1483397152-8307-1-git-send-email-santosh.shilimkar@oracle.com>
Also use pr_* for it.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
net/rds/ib_cm.c | 19 +++++++++----------
net/rds/ib_recv.c | 4 ++--
net/rds/ib_send.c | 4 ++--
3 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 5b2ab95..b9da1e5 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -113,19 +113,18 @@ void rds_ib_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
}
if (conn->c_version < RDS_PROTOCOL(3, 1)) {
- printk(KERN_NOTICE "RDS/IB: Connection to %pI4 version %u.%u failed,"
- " no longer supported\n",
- &conn->c_faddr,
- RDS_PROTOCOL_MAJOR(conn->c_version),
- RDS_PROTOCOL_MINOR(conn->c_version));
+ pr_notice("RDS/IB: Connection <%pI4,%pI4> version %u.%u no longer supported\n",
+ &conn->c_laddr, &conn->c_faddr,
+ RDS_PROTOCOL_MAJOR(conn->c_version),
+ RDS_PROTOCOL_MINOR(conn->c_version));
rds_conn_destroy(conn);
return;
} else {
- printk(KERN_NOTICE "RDS/IB: connected to %pI4 version %u.%u%s\n",
- &conn->c_faddr,
- RDS_PROTOCOL_MAJOR(conn->c_version),
- RDS_PROTOCOL_MINOR(conn->c_version),
- ic->i_flowctl ? ", flow control" : "");
+ pr_notice("RDS/IB: connected <%pI4,%pI4> version %u.%u%s\n",
+ &conn->c_laddr, &conn->c_faddr,
+ RDS_PROTOCOL_MAJOR(conn->c_version),
+ RDS_PROTOCOL_MINOR(conn->c_version),
+ ic->i_flowctl ? ", flow control" : "");
}
/*
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 606a11f..6803b75 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -980,8 +980,8 @@ void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic,
} else {
/* We expect errors as the qp is drained during shutdown */
if (rds_conn_up(conn) || rds_conn_connecting(conn))
- rds_ib_conn_error(conn, "recv completion on %pI4 had status %u (%s), disconnecting and reconnecting\n",
- &conn->c_faddr,
+ rds_ib_conn_error(conn, "recv completion on <%pI4,%pI4> had status %u (%s), disconnecting and reconnecting\n",
+ &conn->c_laddr, &conn->c_faddr,
wc->status,
ib_wc_status_msg(wc->status));
}
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 84d90c9..19eca5c 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -300,8 +300,8 @@ void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc)
/* We expect errors as the qp is drained during shutdown */
if (wc->status != IB_WC_SUCCESS && rds_conn_up(conn)) {
- rds_ib_conn_error(conn, "send completion on %pI4 had status %u (%s), disconnecting and reconnecting\n",
- &conn->c_faddr, wc->status,
+ rds_ib_conn_error(conn, "send completion on <%pI4,%pI4> had status %u (%s), disconnecting and reconnecting\n",
+ &conn->c_laddr, &conn->c_faddr, wc->status,
ib_wc_status_msg(wc->status));
}
}
--
1.9.1
* [net-next][PATCH v3 09/17] RDS: RDMA: silence the use_once mr log flood
From: Santosh Shilimkar @ 2017-01-02 22:45 UTC
To: netdev, davem; +Cc: linux-kernel, santosh.shilimkar
In-Reply-To: <1483397152-8307-1-git-send-email-santosh.shilimkar@oracle.com>
In the absence of extension headers, the message log keeps
flooding the console. Even without use_once we can clean up
the MRs, so this is not really an error case; make it a
debug message.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
net/rds/rdma.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/rds/rdma.c b/net/rds/rdma.c
index ea96114..4297f3f 100644
--- a/net/rds/rdma.c
+++ b/net/rds/rdma.c
@@ -415,7 +415,8 @@ void rds_rdma_unuse(struct rds_sock *rs, u32 r_key, int force)
spin_lock_irqsave(&rs->rs_rdma_lock, flags);
mr = rds_mr_tree_walk(&rs->rs_rdma_keys, r_key, NULL);
if (!mr) {
- printk(KERN_ERR "rds: trying to unuse MR with unknown r_key %u!\n", r_key);
+ pr_debug("rds: trying to unuse MR with unknown r_key %u!\n",
+ r_key);
spin_unlock_irqrestore(&rs->rs_rdma_lock, flags);
return;
}
--
1.9.1
* [net-next][PATCH v3 11/17] RDS: IB: add few useful cache stasts
From: Santosh Shilimkar @ 2017-01-02 22:45 UTC
To: netdev, davem; +Cc: linux-kernel, santosh.shilimkar
In-Reply-To: <1483397152-8307-1-git-send-email-santosh.shilimkar@oracle.com>
Tracks the ib receive cache total, incoming and frag allocations.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
net/rds/ib.h | 7 +++++++
net/rds/ib_recv.c | 6 ++++++
net/rds/ib_stats.c | 2 ++
3 files changed, 15 insertions(+)
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 5f02b4d..c62e551 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -151,6 +151,7 @@ struct rds_ib_connection {
u64 i_ack_recv; /* last ACK received */
struct rds_ib_refill_cache i_cache_incs;
struct rds_ib_refill_cache i_cache_frags;
+ atomic_t i_cache_allocs;
/* sending acks */
unsigned long i_ack_flags;
@@ -254,6 +255,8 @@ struct rds_ib_statistics {
uint64_t s_ib_rx_refill_from_cq;
uint64_t s_ib_rx_refill_from_thread;
uint64_t s_ib_rx_alloc_limit;
+ uint64_t s_ib_rx_total_frags;
+ uint64_t s_ib_rx_total_incs;
uint64_t s_ib_rx_credit_updates;
uint64_t s_ib_ack_sent;
uint64_t s_ib_ack_send_failure;
@@ -276,6 +279,8 @@ struct rds_ib_statistics {
uint64_t s_ib_rdma_mr_1m_reused;
uint64_t s_ib_atomic_cswp;
uint64_t s_ib_atomic_fadd;
+ uint64_t s_ib_recv_added_to_cache;
+ uint64_t s_ib_recv_removed_from_cache;
};
extern struct workqueue_struct *rds_ib_wq;
@@ -406,6 +411,8 @@ int rds_ib_send_grab_credits(struct rds_ib_connection *ic, u32 wanted,
/* ib_stats.c */
DECLARE_PER_CPU(struct rds_ib_statistics, rds_ib_stats);
#define rds_ib_stats_inc(member) rds_stats_inc_which(rds_ib_stats, member)
+#define rds_ib_stats_add(member, count) \
+ rds_stats_add_which(rds_ib_stats, member, count)
unsigned int rds_ib_stats_info_copy(struct rds_info_iterator *iter,
unsigned int avail);
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 6803b75..4b0f126 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -194,6 +194,8 @@ static void rds_ib_frag_free(struct rds_ib_connection *ic,
rdsdebug("frag %p page %p\n", frag, sg_page(&frag->f_sg));
rds_ib_recv_cache_put(&frag->f_cache_entry, &ic->i_cache_frags);
+ atomic_add(RDS_FRAG_SIZE / SZ_1K, &ic->i_cache_allocs);
+ rds_ib_stats_add(s_ib_recv_added_to_cache, RDS_FRAG_SIZE);
}
/* Recycle inc after freeing attached frags */
@@ -261,6 +263,7 @@ static struct rds_ib_incoming *rds_ib_refill_one_inc(struct rds_ib_connection *i
atomic_dec(&rds_ib_allocation);
return NULL;
}
+ rds_ib_stats_inc(s_ib_rx_total_incs);
}
INIT_LIST_HEAD(&ibinc->ii_frags);
rds_inc_init(&ibinc->ii_inc, ic->conn, ic->conn->c_faddr);
@@ -278,6 +281,8 @@ static struct rds_page_frag *rds_ib_refill_one_frag(struct rds_ib_connection *ic
cache_item = rds_ib_recv_cache_get(&ic->i_cache_frags);
if (cache_item) {
frag = container_of(cache_item, struct rds_page_frag, f_cache_entry);
+ atomic_sub(RDS_FRAG_SIZE / SZ_1K, &ic->i_cache_allocs);
+ rds_ib_stats_add(s_ib_recv_added_to_cache, RDS_FRAG_SIZE);
} else {
frag = kmem_cache_alloc(rds_ib_frag_slab, slab_mask);
if (!frag)
@@ -290,6 +295,7 @@ static struct rds_page_frag *rds_ib_refill_one_frag(struct rds_ib_connection *ic
kmem_cache_free(rds_ib_frag_slab, frag);
return NULL;
}
+ rds_ib_stats_inc(s_ib_rx_total_frags);
}
INIT_LIST_HEAD(&frag->f_item);
diff --git a/net/rds/ib_stats.c b/net/rds/ib_stats.c
index 7e78dca..9252ad1 100644
--- a/net/rds/ib_stats.c
+++ b/net/rds/ib_stats.c
@@ -55,6 +55,8 @@
"ib_rx_refill_from_cq",
"ib_rx_refill_from_thread",
"ib_rx_alloc_limit",
+ "ib_rx_total_frags",
+ "ib_rx_total_incs",
"ib_rx_credit_updates",
"ib_ack_sent",
"ib_ack_send_failure",
--
1.9.1
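The accounting added above can be sketched in user space as follows. The struct, the helpers and the 4096-byte fragment size are assumptions for illustration. The sketch counts reuse in a separate "removed" counter, as the name s_ib_recv_removed_from_cache suggests, whereas the ib_recv.c hunk above adds to s_ib_recv_added_to_cache in both places.

```c
#define FRAG_SIZE_SKETCH 4096u	/* assumed fragment size in bytes */
#define SZ_1K_SKETCH     1024u

struct cache_acct {
	unsigned int cached_kib;		/* per connection, like i_cache_allocs */
	unsigned long long added_bytes;		/* bytes put into the cache */
	unsigned long long removed_bytes;	/* bytes taken back out */
};

/* A fragment is freed back into the receive cache. */
static void frag_cached(struct cache_acct *a)
{
	a->cached_kib += FRAG_SIZE_SKETCH / SZ_1K_SKETCH;
	a->added_bytes += FRAG_SIZE_SKETCH;
}

/* A refill takes a fragment out of the receive cache. */
static void frag_reused(struct cache_acct *a)
{
	a->cached_kib -= FRAG_SIZE_SKETCH / SZ_1K_SKETCH;
	a->removed_bytes += FRAG_SIZE_SKETCH;
}
```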
* [PATCH] net: freescale: dpaa: use new api ethtool_{get|set}_link_ksettings
From: Philippe Reynes @ 2017-01-02 22:49 UTC
To: madalin.bucur, davem; +Cc: netdev, linux-kernel, Philippe Reynes
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
---
drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 18 +++++++++---------
1 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 27e7044..15571e2 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -72,8 +72,8 @@
#define DPAA_STATS_PERCPU_LEN ARRAY_SIZE(dpaa_stats_percpu)
#define DPAA_STATS_GLOBAL_LEN ARRAY_SIZE(dpaa_stats_global)
-static int dpaa_get_settings(struct net_device *net_dev,
- struct ethtool_cmd *et_cmd)
+static int dpaa_get_link_ksettings(struct net_device *net_dev,
+ struct ethtool_link_ksettings *cmd)
{
int err;
@@ -82,13 +82,13 @@ static int dpaa_get_settings(struct net_device *net_dev,
return 0;
}
- err = phy_ethtool_gset(net_dev->phydev, et_cmd);
+ err = phy_ethtool_ksettings_get(net_dev->phydev, cmd);
return err;
}
-static int dpaa_set_settings(struct net_device *net_dev,
- struct ethtool_cmd *et_cmd)
+static int dpaa_set_link_ksettings(struct net_device *net_dev,
+ const struct ethtool_link_ksettings *cmd)
{
int err;
@@ -97,9 +97,9 @@ static int dpaa_set_settings(struct net_device *net_dev,
return -ENODEV;
}
- err = phy_ethtool_sset(net_dev->phydev, et_cmd);
+ err = phy_ethtool_ksettings_set(net_dev->phydev, cmd);
if (err < 0)
- netdev_err(net_dev, "phy_ethtool_sset() = %d\n", err);
+ netdev_err(net_dev, "phy_ethtool_ksettings_set() = %d\n", err);
return err;
}
@@ -402,8 +402,6 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset,
}
const struct ethtool_ops dpaa_ethtool_ops = {
- .get_settings = dpaa_get_settings,
- .set_settings = dpaa_set_settings,
.get_drvinfo = dpaa_get_drvinfo,
.get_msglevel = dpaa_get_msglevel,
.set_msglevel = dpaa_set_msglevel,
@@ -414,4 +412,6 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset,
.get_sset_count = dpaa_get_sset_count,
.get_ethtool_stats = dpaa_get_ethtool_stats,
.get_strings = dpaa_get_strings,
+ .get_link_ksettings = dpaa_get_link_ksettings,
+ .set_link_ksettings = dpaa_set_link_ksettings,
};
--
1.7.4.4
* Re: [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
From: Willem de Bruijn @ 2017-01-02 22:57 UTC
To: Sowmini Varadhan
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <bc66616d08758cfe225a4b02316b72de16e00302.1483309545.git.sowmini.varadhan@oracle.com>
On Sun, Jan 1, 2017 at 5:45 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
> does not currently have TX_RING support. As a result an application
> that wants the best perf for Tx and Rx (e.g. to handle request/response
> transactions) ends up needing 2 sockets, one with *_v2 for Tx and
> another with *_v3 for Rx.
>
> This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
> so that an application can use a single descriptor to get the benefits
> of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
> first filling up multiple frames in the Tx ring and then triggering a
> transmit. This patch only supports fixed size Tx frames for TPACKET_V3,
> and requires that tp_next_offset must be zero.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
> v2: sanity checks on tp_next_offset and corresponding Doc updates
> as suggested by Willem de Bruijn
>
> Documentation/networking/packet_mmap.txt | 9 +++++++--
> net/packet/af_packet.c | 27 +++++++++++++++++++--------
> 2 files changed, 26 insertions(+), 10 deletions(-)
> @@ -4180,9 +4193,7 @@ static int packet_set_ring(struct sock *sk, union tpacket_req_u *req_u,
> goto out;
> switch (po->tp_version) {
> case TPACKET_V3:
> - /* Transmit path is not supported. We checked
> - * it above but just being paranoid
> - */
> + /* Block transmit is not supported yet */
> if (!tx_ring)
> init_prb_bdqc(po, rb, pg_vec, req_u);
One more point. We should validate the tpacket_req3 input and fail if
unsupported options are passed. Specifically, fail if any of
{tp_retire_blk_tov, tp_sizeof_priv, tp_feature_req_word} is non-zero.
Otherwise looks good to me.
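A minimal sketch of the check suggested above, written as a user-space function. The struct is a local mirror holding only the three fields named in this mail (the real tpacket_req3 in <linux/if_packet.h> has more members), and the function name is hypothetical.

```c
#include <errno.h>

/* Local mirror of the three tpacket_req3 fields named above; the real
 * structure lives in <linux/if_packet.h> and has more members. */
struct req3_sketch {
	unsigned int tp_retire_blk_tov;
	unsigned int tp_sizeof_priv;
	unsigned int tp_feature_req_word;
};

/* Reject Tx ring requests that set options the Tx path does not support. */
static int check_v3_tx_req(const struct req3_sketch *req)
{
	if (req->tp_retire_blk_tov || req->tp_sizeof_priv ||
	    req->tp_feature_req_word)
		return -EINVAL;

	return 0;
}
```

Failing early keeps these fields free for later Tx extensions, since no existing caller can depend on them being ignored.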
* Re: [PATCH net-next] net/sched: cls_flower: Add user specified data
From: John Fastabend @ 2017-01-02 22:58 UTC
To: Jamal Hadi Salim, Paul Blakey, David S. Miller, netdev
Cc: Jiri Pirko, Hadar Hen Zion, Or Gerlitz, Roi Dayan, Roman Mashak,
Simon Horman
In-Reply-To: <6b671aaf-d35d-70a5-65e0-40308baeb471@mojatatu.com>
On 17-01-02 02:21 PM, Jamal Hadi Salim wrote:
> On 17-01-02 01:23 PM, John Fastabend wrote:
>
>>
>> Additionally I would like to point out this is an arbitrary length binary
>> blob (for undefined use, without even a specified encoding) that gets pushed
>> between user space and hardware ;) This seemed to get folks fairly excited in
>> the past.
>>
>
> The binary blob size is a little strange - but I think there is value
> in storing some "cookie" field. The challenge is whether the kernel
> gets to interpret it; in which case encoding must be specified. Or
> whether we should leave it up to user space - in which case something
> like tc could standardize its own encodings.
>
Well having the length value avoids ending up with cookie1, cookie2, ...
values as folks push more and more data into the cookie.
I don't see any use in the kernel interpreting it. It has no use
for it as far as I can see. It doesn't appear to be metadata which
we use skb->mark for at the moment.
>> Some questions, exactly what do you mean by "port mappings" above? In
>> general the 'tc' API uses the netdev the netlink msg is processed on as
>> the port mapping. If you mean OVS port to netdev port I think this is
>> an OVS problem and nothing to do with 'tc'. For what it's worth there is an
>> existing problem with 'tc' where rules only apply to a single ingress or
>> egress port which is limiting on hardware.
>>
>
> In our case the desire is to be able to correlate for a system wide
> mostly identity/key mapping.
>
The tuple <ifindex:qdisc:prio:handle> really should be unique, so why
not use this for system wide mappings?
The only thing I can think to do with this that I can't do with
the above tuple and a simple userspace lookup is stick hardware specific
"hints" in the cookie for the firmware to consume. Which would be
very helpful for what it's worth.
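A rough sketch of such a userspace lookup keyed on the tuple follows. Every name here is hypothetical, the field widths are assumptions, and the 16-byte id only stands in for something like a 128-bit application-side flow id.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key built from the tuple above; field widths are assumptions. */
struct tc_rule_key {
	int ifindex;
	uint32_t qdisc;
	uint32_t prio;
	uint32_t handle;
};

/* Hypothetical userspace record tying a rule to a 128-bit application id. */
struct rule_map_entry {
	struct tc_rule_key key;
	uint8_t app_id[16];
};

static int key_equal(const struct tc_rule_key *a, const struct tc_rule_key *b)
{
	return a->ifindex == b->ifindex && a->qdisc == b->qdisc &&
	       a->prio == b->prio && a->handle == b->handle;
}

/* Linear scan for clarity; a real tool would likely hash the key. */
static const struct rule_map_entry *
rule_map_find(const struct rule_map_entry *tbl, size_t n,
	      const struct tc_rule_key *key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (key_equal(&tbl[i].key, key))
			return &tbl[i];
	}

	return NULL;
}
```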
>> The UFID in my ovs code base is defined as best I can tell here,
>>
>> [OVS_FLOW_ATTR_UFID] = { .type = NL_A_UNSPEC, .optional = true,
>> .min_len = sizeof(ovs_u128) },
>>
>> So you need 128 bits if you want a 1:1 mapping onto 'tc'. So rather
>> than an arbitrary blob why not make the case that 'tc' ids need to be
>> 128 bits long? Even if it's just initially done in flower, call it
>> flower_flow_id and define it so it's not opaque and at least at the code
>> level it isn't an arbitrary blob of data.
>>
>
> I don't know what this UFID is, but do note:
> The idea is not new - the FIB for example has some such cookie
> (albeit a tiny one) which will typically be populated to tell
> you who/what installed the entry.
> I could see e.g. a use for this cookie to simplify and pretty print in
> a human language for the u32 classifier (i.e. user space tc sets
> some fields in the cookie when updating kernel and when user space
> invokes get/dump it uses the cookie to interpret how to pretty print).
>
> I have attached a compile tested version of the cookies on actions
> (flat 64 bit; now that we have experienced the use when we have a
> large number of counters - I would not mind a 128 bit field).
>
It's a bit strange to push it as an action when it's not really an
action in the traditional datapath.
I suspect the OVS usage is a simple 1:1 lookup from OVS id to TC id to
avoid a userspace map lookup.
>
> cheers,
> jamal
>
>> And what are the "next" uses of this besides OVS. It would be really
>> valuable to see how this generalizes to other usage models. To avoid
>> embedding OVS syntax into 'tc'.
>>
>> Finally if you want to see an example of binary data encodings look at
>> how drivers/hardware/users are currently using the user defined bits in
>> ethtool's ntuple API. Also track down out-of-tree drivers to see other
>> interesting uses. And that was capped at 64bits :/
>>
>> Thanks,
>> John
>>
>>
>>
>>
>>
>
* Re: [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
From: Sowmini Varadhan @ 2017-01-02 23:02 UTC
To: Willem de Bruijn
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <CAF=yD-JLDM9cekGTz7m3VYXmf4d+HEMX4dYZx=sB1X7+5drqsA@mail.gmail.com>
On (01/02/17 17:31), Willem de Bruijn wrote:
>
> Thanks for adding this.
>
> walk_v3_tx is almost identical to walk_v1_v2_tx. That function can
> just be extended to add a v3 case where it already multiplexes between
> v1 and v2.
I looked at that, but the sticking point is that v1/v2 sets up the
ring->rd* related variables based on frames (e.g., rd_num is tp_frame_nr)
whereas V3 sets these up based on blocks (e.g., rd_num is tp_block_nr)
so this impacts the core sending loop a bit.
I suppose we could change the walk_v1_v2_tx to be something like
while (total_packets > 0) {
if (ring->version) {
/* V3 send, that takes above difference into account */
} else {
/* existing code */
}
/* status_bar_update(), user_ready update frame_num */
}
I can change it as above, if you think this would help.
--Sowmini
* Re: [PATCH v2 net-next 1/2] af_packet: TX_RING support for TPACKET_V3
From: Sowmini Varadhan @ 2017-01-02 23:07 UTC
To: Willem de Bruijn
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <CAF=yD-J2Sw0=iTuvPe-sa01Ld0D1U6f=R-99WLRKggap_WKdug@mail.gmail.com>
On (01/02/17 17:57), Willem de Bruijn wrote:
> One more point. We should validate the tpacket_req3 input and fail if
> unsupported options are passed. Specifically, fail if any of
> {tp_retire_blk_tov, tp_sizeof_priv, tp_feature_req_word} is non-zero.
>
> Otherwise looks good to me.
Ok, I'll send out v3 tomorrow, with the test case also updated
to share code with walk_v1_v2_tx as cleanly as possible.
Thanks for the review feedback!
--Sowmini
* Re: [PATCH v2 net-next 2/2] tools: test case for TPACKET_V3/TX_RING support
From: Willem de Bruijn @ 2017-01-02 23:15 UTC
To: Sowmini Varadhan
Cc: Network Development, Daniel Borkmann, Willem de Bruijn,
David Miller
In-Reply-To: <20170102230242.GC31716@oracle.com>
On Mon, Jan 2, 2017 at 6:02 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (01/02/17 17:31), Willem de Bruijn wrote:
>>
>> Thanks for adding this.
>>
>> walk_v3_tx is almost identical to walk_v1_v2_tx. That function can
>> just be extended to add a v3 case where it already multiplexes between
>> v1 and v2.
>
> I looked at that, but the sticking point is that v1/v2 sets up the
> ring->rd* related variables based on frames (e.g., rd_num is tp_frame_nr)
> whereas V3 sets these up based on blocks (e.g., rd_num is tp_block_nr)
> so this impacts the core sending loop a bit.
Good point. Yes, deduplicating the function will help make it crystal
clear where v3 differs from v2.
The patch already has __v3_tx_kernel_ready and __v3_tx_user_ready,
which can be plugged into the existing multiplexer functions
__v1_v2_tx_kernel_ready and __v1_v2_tx_user_ready
(along with changing their names).
We'll indeed need a similar multiplexer function for calculating the next
frame to work around this rd_num issue, then.
* Re: tcp_bbr: Forcing set of BBR congestion control as default
From: Neal Cardwell @ 2017-01-02 23:16 UTC
To: sedat.dilek; +Cc: Netdev
In-Reply-To: <CA+icZUVkui7PaeD7JAfdqJZQY2c9goANwi9foFej3=CsZX+SVA@mail.gmail.com>
On Mon, Jan 2, 2017 at 2:30 PM, Sedat Dilek <sedat.dilek@gmail.com> wrote:
> OK, this looks now good.
Great. Glad to hear it!
> Does BBR only work with fq-qdisc best?
Yes. BBR is designed to work with pacing. And so far the "fq" qdisc is
the only qdisc that offers pacing. So BBR currently needs the "fq"
qdisc. In the future, other qdiscs (or even other layers of the stack)
may offer pacing, in which case BBR could use those as well.
> What about fq_codel?
The "fq_codel" qdisc does not implement pacing, so it would not be sufficient.
cheers,
neal
* Re: [PATCH for-next V2 00/11] Mellanox mlx5 core and ODP updates 2017-01-01
From: Saeed Mahameed @ 2017-01-02 23:30 UTC
To: David Miller
Cc: Saeed Mahameed, Doug Ledford, Linux Netdev List, linux-rdma,
Leon Romanovsky, ilyal, artemyko
In-Reply-To: <20170102.155341.1496970446718890403.davem@davemloft.net>
On Mon, Jan 2, 2017 at 10:53 PM, David Miller <davem@davemloft.net> wrote:
> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Mon, 2 Jan 2017 11:37:37 +0200
>
>> The following eleven patches mainly come from Artemy Kovalyov
>> who expanded mlx5 on-demand-paging (ODP) support. In addition
>> there are three cleanup patches which don't change any functionality,
>> but are needed to align the codebase prior to accepting other patches.
>
> Series applied to net-next, thanks.
Whoops,
This series was meant as a pull request. You can blame it on me; I
kind of messed up the V2 title.
Doug will have to pull the same patches later. Will this produce a
conflict in the merge window?
Sorry for the confusion.
* [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: Reiter Wolfgang @ 2017-01-02 23:34 UTC
To: nhorman; +Cc: davem, netdev, linux-kernel, Reiter Wolfgang
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.
This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"
Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
---
net/core/drop_monitor.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/core/drop_monitor.c b/net/core/drop_monitor.c
index f465bad..ccaaf3e 100644
--- a/net/core/drop_monitor.c
+++ b/net/core/drop_monitor.c
@@ -102,7 +102,6 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
}
msg = nla_data(nla);
memset(msg, 0, al);
- genlmsg_end(skb, msg_header);
goto out;
err:
@@ -112,6 +111,12 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
swap(data->skb, skb);
spin_unlock_irqrestore(&data->lock, flags);
+ if(skb) {
+ struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
+ struct genlmsghdr *gnlh = (struct genlmsghdr *)nlmsg_data(nlh);
+ genlmsg_end(skb, genlmsg_data(gnlh));
+ }
+
return skb;
}
--
2.9.3
* Re: [PATCH] drop_monitor: consider inserted data in genlmsg_end
From: David Miller @ 2017-01-03 0:30 UTC (permalink / raw)
To: wr0112358; +Cc: nhorman, netdev, linux-kernel
In-Reply-To: <20170102233410.4070-1-wr0112358@gmail.com>
From: Reiter Wolfgang <wr0112358@gmail.com>
Date: Tue, 3 Jan 2017 00:34:10 +0100
> Final nlmsg_len field update must reflect inserted net_dm_drop_point
> data.
>
> This patch depends on previous patch:
> "drop_monitor: add missing call to genlmsg_end"
>
> Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Several coding style errors:
> @@ -112,6 +111,12 @@ static struct sk_buff *reset_per_cpu_data(struct per_cpu_dm_data *data)
> swap(data->skb, skb);
> spin_unlock_irqrestore(&data->lock, flags);
>
> + if(skb) {
There must be a space between "if" and "(skb)"
> + struct nlmsghdr *nlh = (struct nlmsghdr *)skb->data;
> + struct genlmsghdr *gnlh = (struct genlmsghdr *)nlmsg_data(nlh);
> + genlmsg_end(skb, genlmsg_data(gnlh));
> + }
There should be an empty line between the local variable declarations
and actual code.
* Re: [PATCH v2 1/2] net: mdio: add mdio45_ethtool_ksettings_get
From: David Miller @ 2017-01-03 0:30 UTC (permalink / raw)
To: tremyfr
Cc: linux-net-drivers, ecree, bkenward, andrew, f.fainelli, netdev,
linux-kernel
In-Reply-To: <1483293766-4581-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:02:45 +0100
> There is a function in mdio for the old ethtool api gset.
> We add a new function mdio45_ethtool_ksettings_get for the
> new ethtool api glinksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
> ---
> Changelog:
> v2:
> - simplify the code of ef4_ethtool_get_link_ksettings
> (feedback from Bert Kenward)
Applied.
* Re: [PATCH v2 2/2] net: sfc: falcon: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:30 UTC (permalink / raw)
To: tremyfr
Cc: linux-net-drivers, ecree, bkenward, andrew, f.fainelli, netdev,
linux-kernel
In-Reply-To: <1483293766-4581-2-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:02:46 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
> ---
> Changelog:
> v2:
> - simplify the code of ef4_ethtool_get_link_ksettings
> (feedback from Bert Kenward)
Applied.
* Re: [PATCH] net: dec: de2104x: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: jarod, netdev, linux-parisc, linux-kernel
In-Reply-To: <1483293938-5227-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:05:38 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
* Re: [PATCH] net: dec: uli526x: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: mugunthanvnm, a, fw, jarod, netdev, linux-parisc, linux-kernel
In-Reply-To: <1483294266-6222-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 19:11:06 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
* Re: [PATCH] net: dec: winbond-840: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: mugunthanvnm, a, fw, jarod, netdev, linux-parisc, linux-kernel
In-Reply-To: <1483300021-21783-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 20:47:01 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
* Re: [PATCH] net: dlink: dl2k: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: mugunthanvnm, a, fw, jarod, netdev, linux-kernel
In-Reply-To: <1483300166-22297-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 20:49:26 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> The previous implementation of set_settings was modifying
> the value of speed and duplex, but with the new API, it's not
> possible. The structure ethtool_link_ksettings is defined
> as const.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
* Re: [PATCH] net: dlink: sundance: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr; +Cc: kda, netdev, linux-kernel
In-Reply-To: <1483300332-22834-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Sun, 1 Jan 2017 20:52:12 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.
* Re: [PATCH] net: emulex: benet: use new api ethtool_{get|set}_link_ksettings
From: David Miller @ 2017-01-03 0:31 UTC (permalink / raw)
To: tremyfr
Cc: sathya.perla, ajit.khaparde, sriharsha.basavapatna, somnath.kotur,
netdev, linux-kernel
In-Reply-To: <1483375334-28056-1-git-send-email-tremyfr@gmail.com>
From: Philippe Reynes <tremyfr@gmail.com>
Date: Mon, 2 Jan 2017 17:42:14 +0100
> The ethtool api {get|set}_settings is deprecated.
> We move this driver to new api {get|set}_link_ksettings.
>
> Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Applied.