Netdev List
* [PATCH net-next 6/7] bnxt_en: Refactor the driver registration function with firmware.
From: Michael Chan @ 2016-12-06 17:09 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	selvin.xavier-dY08KVG/lbpWk0Htik3J/w,
	somnath.kotur-dY08KVG/lbpWk0Htik3J/w,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1481044178-25193-1-git-send-email-michael.chan-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

Registering the driver with firmware consists of passing version
information and registering for async events.  To support the RDMA driver,
the set of async events that we need to register may change.  Split the
driver registration function into two parts so that we can update just
the async events for the RDMA driver.

Signed-off-by: Michael Chan <michael.chan-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 34 ++++++++++++++++++++++++++-----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  2 ++
 2 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7218d65..c26735ea 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3117,27 +3117,46 @@ int hwrm_send_message_silent(struct bnxt *bp, void *msg, u32 msg_len,
 	return rc;
 }
 
-static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
+int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap,
+				     int bmap_size)
 {
 	struct hwrm_func_drv_rgtr_input req = {0};
-	int i;
 	DECLARE_BITMAP(async_events_bmap, 256);
 	u32 *events = (u32 *)async_events_bmap;
+	int i;
 
 	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_DRV_RGTR, -1, -1);
 
 	req.enables =
-		cpu_to_le32(FUNC_DRV_RGTR_REQ_ENABLES_OS_TYPE |
-			    FUNC_DRV_RGTR_REQ_ENABLES_VER |
-			    FUNC_DRV_RGTR_REQ_ENABLES_ASYNC_EVENT_FWD);
+		cpu_to_le32(FUNC_DRV_RGTR_REQ_ENABLES_ASYNC_EVENT_FWD);
 
 	memset(async_events_bmap, 0, sizeof(async_events_bmap));
 	for (i = 0; i < ARRAY_SIZE(bnxt_async_events_arr); i++)
 		__set_bit(bnxt_async_events_arr[i], async_events_bmap);
 
+	if (bmap && bmap_size) {
+		for (i = 0; i < bmap_size; i++) {
+			if (test_bit(i, bmap))
+				__set_bit(i, async_events_bmap);
+		}
+	}
+
 	for (i = 0; i < 8; i++)
 		req.async_event_fwd[i] |= cpu_to_le32(events[i]);
 
+	return hwrm_send_message(bp, &req, sizeof(req), HWRM_CMD_TIMEOUT);
+}
+
+static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
+{
+	struct hwrm_func_drv_rgtr_input req = {0};
+
+	bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_FUNC_DRV_RGTR, -1, -1);
+
+	req.enables =
+		cpu_to_le32(FUNC_DRV_RGTR_REQ_ENABLES_OS_TYPE |
+			    FUNC_DRV_RGTR_REQ_ENABLES_VER);
+
 	req.os_type = cpu_to_le16(FUNC_DRV_RGTR_REQ_OS_TYPE_LINUX);
 	req.ver_maj = DRV_VER_MAJ;
 	req.ver_min = DRV_VER_MIN;
@@ -3146,6 +3165,7 @@ static int bnxt_hwrm_func_drv_rgtr(struct bnxt *bp)
 	if (BNXT_PF(bp)) {
 		DECLARE_BITMAP(vf_req_snif_bmap, 256);
 		u32 *data = (u32 *)vf_req_snif_bmap;
+		int i;
 
 		memset(vf_req_snif_bmap, 0, sizeof(vf_req_snif_bmap));
 		for (i = 0; i < ARRAY_SIZE(bnxt_vf_req_snif); i++)
@@ -7023,6 +7043,10 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		goto init_err;
 
+	rc = bnxt_hwrm_func_rgtr_async_events(bp, NULL, 0);
+	if (rc)
+		goto init_err;
+
 	/* Get the MAX capabilities for this function */
 	rc = bnxt_hwrm_func_qcaps(bp);
 	if (rc) {
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index d796836..eec2415 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1240,6 +1240,8 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int _hwrm_send_message(struct bnxt *, void *, u32, int);
 int hwrm_send_message(struct bnxt *, void *, u32, int);
 int hwrm_send_message_silent(struct bnxt *, void *, u32, int);
+int bnxt_hwrm_func_rgtr_async_events(struct bnxt *bp, unsigned long *bmap,
+				     int bmap_size);
 int bnxt_hwrm_set_coal(struct bnxt *);
 unsigned int bnxt_get_max_func_stat_ctxs(struct bnxt *bp);
 unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
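
The bitmap handling in the new bnxt_hwrm_func_rgtr_async_events() can be illustrated outside the kernel. The following is a minimal user-space sketch, not the driver code: it works directly on 32-bit words instead of DECLARE_BITMAP(), and the default event IDs, build_event_mask(), set_event() and test_event() are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define EVENT_BITS  256
#define EVENT_WORDS (EVENT_BITS / 32)

/* Hypothetical default event IDs; the driver keeps its own list
 * in bnxt_async_events_arr. */
static const unsigned int default_events[] = { 0, 4, 7 };

static void set_event(uint32_t *words, unsigned int bit)
{
	words[bit / 32] |= UINT32_C(1) << (bit % 32);
}

static int test_event(const uint32_t *words, unsigned int bit)
{
	return (words[bit / 32] >> (bit % 32)) & 1;
}

/* Build the forwarding mask: start from the defaults, then OR in any
 * extra bits the caller supplies. A NULL or empty bitmap leaves only
 * the defaults set, mirroring the (bp, NULL, 0) call in the patch. */
static void build_event_mask(uint32_t out[EVENT_WORDS],
			     const uint32_t *extra, unsigned int extra_bits)
{
	unsigned int bit;
	size_t i;

	for (i = 0; i < EVENT_WORDS; i++)
		out[i] = 0;

	for (i = 0; i < sizeof(default_events) / sizeof(default_events[0]); i++)
		set_event(out, default_events[i]);

	if (extra && extra_bits) {
		/* Clamp so an oversized caller bitmap cannot write past out[] */
		if (extra_bits > EVENT_BITS)
			extra_bits = EVENT_BITS;
		for (bit = 0; bit < extra_bits; bit++) {
			if (test_event(extra, bit))
				set_event(out, bit);
		}
	}
}
```

In this sketch the caller's bits are only ever added to the defaults, never removed, which matches the OR-style merge in the patch. The clamp on extra_bits is an addition of the sketch and is not present in the posted code.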


* [PATCH net-next 5/7] bnxt_en: Reserve RDMA resources by default.
From: Michael Chan @ 2016-12-06 17:09 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	selvin.xavier-dY08KVG/lbpWk0Htik3J/w,
	somnath.kotur-dY08KVG/lbpWk0Htik3J/w,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1481044178-25193-1-git-send-email-michael.chan-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

If the device supports RDMA, we'll set up the default network rings so
that the minimum resources needed for RDMA remain available, if possible.
However, the user can still increase network rings to the max if desired.
The actual RDMA resources won't be reserved until the RDMA driver registers.

Signed-off-by: Somnath Kotur <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Signed-off-by: Michael Chan <michael.chan-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 58 ++++++++++++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  9 +++++
 2 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 1f6be83..7218d65 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4166,6 +4166,11 @@ static int bnxt_hwrm_func_qcaps(struct bnxt *bp)
 	if (rc)
 		goto hwrm_func_qcaps_exit;
 
+	if (resp->flags & cpu_to_le32(FUNC_QCAPS_RESP_FLAGS_ROCE_V1_SUPPORTED))
+		bp->flags |= BNXT_FLAG_ROCEV1_CAP;
+	if (resp->flags & cpu_to_le32(FUNC_QCAPS_RESP_FLAGS_ROCE_V2_SUPPORTED))
+		bp->flags |= BNXT_FLAG_ROCEV2_CAP;
+
 	bp->tx_push_thresh = 0;
 	if (resp->flags &
 	    cpu_to_le32(FUNC_QCAPS_RESP_FLAGS_PUSH_MODE_SUPPORTED))
@@ -4808,6 +4813,24 @@ static int bnxt_setup_int_mode(struct bnxt *bp)
 	return rc;
 }
 
+unsigned int bnxt_get_max_func_stat_ctxs(struct bnxt *bp)
+{
+	if (BNXT_PF(bp))
+		return bp->pf.max_stat_ctxs;
+#if defined(CONFIG_BNXT_SRIOV)
+	return bp->vf.max_stat_ctxs;
+#endif
+}
+
+unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp)
+{
+	if (BNXT_PF(bp))
+		return bp->pf.max_cp_rings;
+#if defined(CONFIG_BNXT_SRIOV)
+	return bp->vf.max_cp_rings;
+#endif
+}
+
 static unsigned int bnxt_get_max_func_irqs(struct bnxt *bp)
 {
 	if (BNXT_PF(bp))
@@ -6832,6 +6855,39 @@ int bnxt_get_max_rings(struct bnxt *bp, int *max_rx, int *max_tx, bool shared)
 	return bnxt_trim_rings(bp, max_rx, max_tx, cp, shared);
 }
 
+static int bnxt_get_dflt_rings(struct bnxt *bp, int *max_rx, int *max_tx,
+			       bool shared)
+{
+	int rc;
+
+	rc = bnxt_get_max_rings(bp, max_rx, max_tx, shared);
+	if (rc)
+		return rc;
+
+	if (bp->flags & BNXT_FLAG_ROCE_CAP) {
+		int max_cp, max_stat, max_irq;
+
+		/* Reserve minimum resources for RoCE */
+		max_cp = bnxt_get_max_func_cp_rings(bp);
+		max_stat = bnxt_get_max_func_stat_ctxs(bp);
+		max_irq = bnxt_get_max_func_irqs(bp);
+		if (max_cp <= BNXT_MIN_ROCE_CP_RINGS ||
+		    max_irq <= BNXT_MIN_ROCE_CP_RINGS ||
+		    max_stat <= BNXT_MIN_ROCE_STAT_CTXS)
+			return 0;
+
+		max_cp -= BNXT_MIN_ROCE_CP_RINGS;
+		max_irq -= BNXT_MIN_ROCE_CP_RINGS;
+		max_stat -= BNXT_MIN_ROCE_STAT_CTXS;
+		max_cp = min_t(int, max_cp, max_irq);
+		max_cp = min_t(int, max_cp, max_stat);
+		rc = bnxt_trim_rings(bp, max_rx, max_tx, max_cp, shared);
+		if (rc)
+			rc = 0;
+	}
+	return rc;
+}
+
 static int bnxt_set_dflt_rings(struct bnxt *bp)
 {
 	int dflt_rings, max_rx_rings, max_tx_rings, rc;
@@ -6840,7 +6896,7 @@ static int bnxt_set_dflt_rings(struct bnxt *bp)
 	if (sh)
 		bp->flags |= BNXT_FLAG_SHARED_RINGS;
 	dflt_rings = netif_get_num_default_rss_queues();
-	rc = bnxt_get_max_rings(bp, &max_rx_rings, &max_tx_rings, sh);
+	rc = bnxt_get_dflt_rings(bp, &max_rx_rings, &max_tx_rings, sh);
 	if (rc)
 		return rc;
 	bp->rx_nr_rings = min_t(int, dflt_rings, max_rx_rings);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 43a4b17..d796836 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -387,6 +387,9 @@ struct rx_tpa_end_cmp_ext {
 #define DB_KEY_TX_PUSH						(0x4 << 28)
 #define DB_LONG_TX_PUSH						(0x2 << 24)
 
+#define BNXT_MIN_ROCE_CP_RINGS	2
+#define BNXT_MIN_ROCE_STAT_CTXS	1
+
 #define INVALID_HW_RING_ID	((u16)-1)
 
 /* The hardware supports certain page sizes.  Use the supported page sizes
@@ -953,6 +956,10 @@ struct bnxt {
 	#define BNXT_FLAG_PORT_STATS	0x400
 	#define BNXT_FLAG_UDP_RSS_CAP	0x800
 	#define BNXT_FLAG_EEE_CAP	0x1000
+	#define BNXT_FLAG_ROCEV1_CAP	0x8000
+	#define BNXT_FLAG_ROCEV2_CAP	0x10000
+	#define BNXT_FLAG_ROCE_CAP	(BNXT_FLAG_ROCEV1_CAP |	\
+					 BNXT_FLAG_ROCEV2_CAP)
 	#define BNXT_FLAG_CHIP_NITRO_A0	0x1000000
 
 	#define BNXT_FLAG_ALL_CONFIG_FEATS (BNXT_FLAG_TPA |		\
@@ -1234,6 +1241,8 @@ static inline void bnxt_disable_poll(struct bnxt_napi *bnapi)
 int hwrm_send_message(struct bnxt *, void *, u32, int);
 int hwrm_send_message_silent(struct bnxt *, void *, u32, int);
 int bnxt_hwrm_set_coal(struct bnxt *);
+unsigned int bnxt_get_max_func_stat_ctxs(struct bnxt *bp);
+unsigned int bnxt_get_max_func_cp_rings(struct bnxt *bp);
 void bnxt_set_max_func_irqs(struct bnxt *bp, unsigned int max);
 void bnxt_tx_disable(struct bnxt *bp);
 void bnxt_tx_enable(struct bnxt *bp);
-- 
1.8.3.1
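
The arithmetic in bnxt_get_dflt_rings() can be followed in isolation. Below is a minimal sketch that assumes the same minimums as the patch (2 completion rings, 1 stat context). The function name net_cp_budget() and the -1 "no reservation" return value are inventions of the sketch; the driver instead returns early and keeps the full maximum.

```c
#define MIN_ROCE_CP_RINGS  2	/* same value as BNXT_MIN_ROCE_CP_RINGS */
#define MIN_ROCE_STAT_CTXS 1	/* same value as BNXT_MIN_ROCE_STAT_CTXS */

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Return the completion-ring budget left for networking after setting
 * aside the RoCE minimums, or -1 when the function is too small to
 * reserve anything (the caller then keeps the unreduced maximum). */
static int net_cp_budget(int max_cp, int max_irq, int max_stat)
{
	if (max_cp <= MIN_ROCE_CP_RINGS ||
	    max_irq <= MIN_ROCE_CP_RINGS ||
	    max_stat <= MIN_ROCE_STAT_CTXS)
		return -1;

	max_cp -= MIN_ROCE_CP_RINGS;
	max_irq -= MIN_ROCE_CP_RINGS;
	max_stat -= MIN_ROCE_STAT_CTXS;

	/* The tightest of the three resources bounds the ring count */
	return min_int(min_int(max_cp, max_irq), max_stat);
}
```

With 16 of each resource the budget is 14, because completion rings and IRQs each lose 2 while stat contexts lose only 1. The result then feeds a trim step, which in the patch is bnxt_trim_rings().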


* [PATCH net-next 2/7] bnxt_en: Enable MSIX early in bnxt_init_one().
From: Michael Chan @ 2016-12-06 17:09 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	selvin.xavier-dY08KVG/lbpWk0Htik3J/w,
	somnath.kotur-dY08KVG/lbpWk0Htik3J/w,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1481044178-25193-1-git-send-email-michael.chan-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

To better support the new RDMA driver, we need to move pci_enable_msix()
from bnxt_open() to bnxt_init_one().  This way, MSIX vectors are available
to the RDMA driver whether the network device is up or down.

Part of the existing bnxt_setup_int_mode() function is now refactored into
a new bnxt_init_int_mode().  bnxt_init_int_mode() is called during
bnxt_init_one() to enable MSIX.  The remaining logic in
bnxt_setup_int_mode() to map the IRQs to the completion rings is called
during bnxt_open().

Signed-off-by: Somnath Kotur <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Signed-off-by: Michael Chan <michael.chan-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 183 +++++++++++++++++++-----------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |   1 +
 2 files changed, 115 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6cdfe3e..9178bf8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4743,6 +4743,80 @@ static int bnxt_trim_rings(struct bnxt *bp, int *rx, int *tx, int max,
 	return 0;
 }
 
+static void bnxt_setup_msix(struct bnxt *bp)
+{
+	const int len = sizeof(bp->irq_tbl[0].name);
+	struct net_device *dev = bp->dev;
+	int tcs, i;
+
+	tcs = netdev_get_num_tc(dev);
+	if (tcs > 1) {
+		bp->tx_nr_rings_per_tc = bp->tx_nr_rings / tcs;
+		if (bp->tx_nr_rings_per_tc == 0) {
+			netdev_reset_tc(dev);
+			bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
+		} else {
+			int i, off, count;
+
+			bp->tx_nr_rings = bp->tx_nr_rings_per_tc * tcs;
+			for (i = 0; i < tcs; i++) {
+				count = bp->tx_nr_rings_per_tc;
+				off = i * count;
+				netdev_set_tc_queue(dev, i, count, off);
+			}
+		}
+	}
+
+	for (i = 0; i < bp->cp_nr_rings; i++) {
+		char *attr;
+
+		if (bp->flags & BNXT_FLAG_SHARED_RINGS)
+			attr = "TxRx";
+		else if (i < bp->rx_nr_rings)
+			attr = "rx";
+		else
+			attr = "tx";
+
+		snprintf(bp->irq_tbl[i].name, len, "%s-%s-%d", dev->name, attr,
+			 i);
+		bp->irq_tbl[i].handler = bnxt_msix;
+	}
+}
+
+static void bnxt_setup_inta(struct bnxt *bp)
+{
+	const int len = sizeof(bp->irq_tbl[0].name);
+
+	if (netdev_get_num_tc(bp->dev))
+		netdev_reset_tc(bp->dev);
+
+	snprintf(bp->irq_tbl[0].name, len, "%s-%s-%d", bp->dev->name, "TxRx",
+		 0);
+	bp->irq_tbl[0].handler = bnxt_inta;
+}
+
+static int bnxt_setup_int_mode(struct bnxt *bp)
+{
+	int rc;
+
+	if (bp->flags & BNXT_FLAG_USING_MSIX)
+		bnxt_setup_msix(bp);
+	else
+		bnxt_setup_inta(bp);
+
+	rc = bnxt_set_real_num_queues(bp);
+	return rc;
+}
+
+static unsigned int bnxt_get_max_func_irqs(struct bnxt *bp)
+{
+	if (BNXT_PF(bp))
+		return bp->pf.max_irqs;
+#if defined(CONFIG_BNXT_SRIOV)
+	return bp->vf.max_irqs;
+#endif
+}
+
 void bnxt_set_max_func_irqs(struct bnxt *bp, unsigned int max_irqs)
 {
 	if (BNXT_PF(bp))
@@ -4753,16 +4827,12 @@ void bnxt_set_max_func_irqs(struct bnxt *bp, unsigned int max_irqs)
 #endif
 }
 
-static int bnxt_setup_msix(struct bnxt *bp)
+static int bnxt_init_msix(struct bnxt *bp)
 {
-	struct msix_entry *msix_ent;
-	struct net_device *dev = bp->dev;
 	int i, total_vecs, rc = 0, min = 1;
-	const int len = sizeof(bp->irq_tbl[0].name);
-
-	bp->flags &= ~BNXT_FLAG_USING_MSIX;
-	total_vecs = bp->cp_nr_rings;
+	struct msix_entry *msix_ent;
 
+	total_vecs = bnxt_get_max_func_irqs(bp);
 	msix_ent = kcalloc(total_vecs, sizeof(struct msix_entry), GFP_KERNEL);
 	if (!msix_ent)
 		return -ENOMEM;
@@ -4783,8 +4853,10 @@ static int bnxt_setup_msix(struct bnxt *bp)
 
 	bp->irq_tbl = kcalloc(total_vecs, sizeof(struct bnxt_irq), GFP_KERNEL);
 	if (bp->irq_tbl) {
-		int tcs;
+		for (i = 0; i < total_vecs; i++)
+			bp->irq_tbl[i].vector = msix_ent[i].vector;
 
+		bp->total_irqs = total_vecs;
 		/* Trim rings based upon num of vectors allocated */
 		rc = bnxt_trim_rings(bp, &bp->rx_nr_rings, &bp->tx_nr_rings,
 				     total_vecs, min == 1);
@@ -4792,43 +4864,10 @@ static int bnxt_setup_msix(struct bnxt *bp)
 			goto msix_setup_exit;
 
 		bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
-		tcs = netdev_get_num_tc(dev);
-		if (tcs > 1) {
-			bp->tx_nr_rings_per_tc = bp->tx_nr_rings / tcs;
-			if (bp->tx_nr_rings_per_tc == 0) {
-				netdev_reset_tc(dev);
-				bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
-			} else {
-				int i, off, count;
+		bp->cp_nr_rings = (min == 1) ?
+				  max_t(int, bp->tx_nr_rings, bp->rx_nr_rings) :
+				  bp->tx_nr_rings + bp->rx_nr_rings;
 
-				bp->tx_nr_rings = bp->tx_nr_rings_per_tc * tcs;
-				for (i = 0; i < tcs; i++) {
-					count = bp->tx_nr_rings_per_tc;
-					off = i * count;
-					netdev_set_tc_queue(dev, i, count, off);
-				}
-			}
-		}
-		bp->cp_nr_rings = total_vecs;
-
-		for (i = 0; i < bp->cp_nr_rings; i++) {
-			char *attr;
-
-			bp->irq_tbl[i].vector = msix_ent[i].vector;
-			if (bp->flags & BNXT_FLAG_SHARED_RINGS)
-				attr = "TxRx";
-			else if (i < bp->rx_nr_rings)
-				attr = "rx";
-			else
-				attr = "tx";
-
-			snprintf(bp->irq_tbl[i].name, len,
-				 "%s-%s-%d", dev->name, attr, i);
-			bp->irq_tbl[i].handler = bnxt_msix;
-		}
-		rc = bnxt_set_real_num_queues(bp);
-		if (rc)
-			goto msix_setup_exit;
 	} else {
 		rc = -ENOMEM;
 		goto msix_setup_exit;
@@ -4838,52 +4877,54 @@ static int bnxt_setup_msix(struct bnxt *bp)
 	return 0;
 
 msix_setup_exit:
-	netdev_err(bp->dev, "bnxt_setup_msix err: %x\n", rc);
+	netdev_err(bp->dev, "bnxt_init_msix err: %x\n", rc);
+	kfree(bp->irq_tbl);
+	bp->irq_tbl = NULL;
 	pci_disable_msix(bp->pdev);
 	kfree(msix_ent);
 	return rc;
 }
 
-static int bnxt_setup_inta(struct bnxt *bp)
+static int bnxt_init_inta(struct bnxt *bp)
 {
-	int rc;
-	const int len = sizeof(bp->irq_tbl[0].name);
-
-	if (netdev_get_num_tc(bp->dev))
-		netdev_reset_tc(bp->dev);
-
 	bp->irq_tbl = kcalloc(1, sizeof(struct bnxt_irq), GFP_KERNEL);
-	if (!bp->irq_tbl) {
-		rc = -ENOMEM;
-		return rc;
-	}
+	if (!bp->irq_tbl)
+		return -ENOMEM;
+
+	bp->total_irqs = 1;
 	bp->rx_nr_rings = 1;
 	bp->tx_nr_rings = 1;
 	bp->cp_nr_rings = 1;
 	bp->tx_nr_rings_per_tc = bp->tx_nr_rings;
 	bp->flags |= BNXT_FLAG_SHARED_RINGS;
 	bp->irq_tbl[0].vector = bp->pdev->irq;
-	snprintf(bp->irq_tbl[0].name, len,
-		 "%s-%s-%d", bp->dev->name, "TxRx", 0);
-	bp->irq_tbl[0].handler = bnxt_inta;
-	rc = bnxt_set_real_num_queues(bp);
-	return rc;
+	return 0;
 }
 
-static int bnxt_setup_int_mode(struct bnxt *bp)
+static int bnxt_init_int_mode(struct bnxt *bp)
 {
 	int rc = 0;
 
 	if (bp->flags & BNXT_FLAG_MSIX_CAP)
-		rc = bnxt_setup_msix(bp);
+		rc = bnxt_init_msix(bp);
 
 	if (!(bp->flags & BNXT_FLAG_USING_MSIX) && BNXT_PF(bp)) {
 		/* fallback to INTA */
-		rc = bnxt_setup_inta(bp);
+		rc = bnxt_init_inta(bp);
 	}
 	return rc;
 }
 
+static void bnxt_clear_int_mode(struct bnxt *bp)
+{
+	if (bp->flags & BNXT_FLAG_USING_MSIX)
+		pci_disable_msix(bp->pdev);
+
+	kfree(bp->irq_tbl);
+	bp->irq_tbl = NULL;
+	bp->flags &= ~BNXT_FLAG_USING_MSIX;
+}
+
 static void bnxt_free_irq(struct bnxt *bp)
 {
 	struct bnxt_irq *irq;
@@ -4902,10 +4943,6 @@ static void bnxt_free_irq(struct bnxt *bp)
 			free_irq(irq->vector, bp->bnapi[i]);
 		irq->requested = 0;
 	}
-	if (bp->flags & BNXT_FLAG_USING_MSIX)
-		pci_disable_msix(bp->pdev);
-	kfree(bp->irq_tbl);
-	bp->irq_tbl = NULL;
 }
 
 static int bnxt_request_irq(struct bnxt *bp)
@@ -6695,6 +6732,7 @@ static void bnxt_remove_one(struct pci_dev *pdev)
 	cancel_work_sync(&bp->sp_task);
 	bp->sp_event = 0;
 
+	bnxt_clear_int_mode(bp);
 	bnxt_hwrm_func_drv_unrgtr(bp);
 	bnxt_free_hwrm_resources(bp);
 	bnxt_dcb_free(bp);
@@ -6990,10 +7028,14 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		goto init_err;
 
-	rc = register_netdev(dev);
+	rc = bnxt_init_int_mode(bp);
 	if (rc)
 		goto init_err;
 
+	rc = register_netdev(dev);
+	if (rc)
+		goto init_err_clr_int;
+
 	netdev_info(dev, "%s found at mem %lx, node addr %pM\n",
 		    board_info[ent->driver_data].name,
 		    (long)pci_resource_start(pdev, 0), dev->dev_addr);
@@ -7002,6 +7044,9 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	return 0;
 
+init_err_clr_int:
+	bnxt_clear_int_mode(bp);
+
 init_err:
 	pci_iounmap(pdev, bp->bar0);
 	pci_release_regions(pdev);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 8327d0d..1461355 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -1024,6 +1024,7 @@ struct bnxt {
 #define BNXT_STATE_FN_RST_DONE	2
 
 	struct bnxt_irq	*irq_tbl;
+	int			total_irqs;
 	u8			mac_addr[ETH_ALEN];
 
 #ifdef CONFIG_BNXT_DCB
-- 
1.8.3.1
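
One detail of bnxt_init_msix() worth spelling out is how the completion ring count is derived once MSIX vectors are allocated up front rather than sized to cp_nr_rings. A minimal sketch of that rule follows; cp_rings_needed() is a hypothetical helper, and "shared" corresponds to the (min == 1) case in the patch.

```c
#include <stdbool.h>

/* With shared rings, one completion ring serves an RX/TX pair, so the
 * larger of the two counts is enough. Without sharing, every RX and TX
 * ring needs its own completion ring. */
static int cp_rings_needed(int rx_rings, int tx_rings, bool shared)
{
	if (shared)
		return rx_rings > tx_rings ? rx_rings : tx_rings;
	return rx_rings + tx_rings;
}
```

For example, 4 RX and 2 TX rings need 4 completion rings when shared and 6 otherwise. Any vectors beyond that count stay allocated but unused by the network device, which is what makes them available to the RDMA driver.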


* Re: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Paolo Abeni @ 2016-12-06 17:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481020451.6225.38.camel@redhat.com>

On Tue, 2016-12-06 at 11:34 +0100, Paolo Abeni wrote:
> On Mon, 2016-12-05 at 09:57 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > In UDP recvmsg() path we currently access 3 cache lines from an skb
> > while holding receive queue lock, plus another one if packet is
> > dequeued, since we need to change skb->next->prev
> > 
> > 1st cache line (contains ->next/prev pointers, offsets 0x00 and 0x08)
> > 2nd cache line (skb->len & skb->peeked, offsets 0x80 and 0x8e)
> > 3rd cache line (skb->truesize/users, offsets 0xe0 and 0xe4)
> > 
> > skb->peeked is only needed to make sure 0-length packets are properly
> > handled while MSG_PEEK is operated.
> > 
> > I had first the intent to remove skb->peeked but the "MSG_PEEK at
> > non-zero offset" support added by Sam Kumar makes this not possible.
> > 
> > This patch avoids one cache line miss during the locked section, when
> > skb->len and skb->peeked do not have to be read.
> > 
> > It also avoids the skb_set_peeked() cost for non empty UDP datagrams.
> > 
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> Thank you for all the good work.
> 
> After all your improvement, I see the cacheline miss in inet_recvmsg()
> as a major perf offender for the user space process in the udp flood
> scenario due to skc_rxhash sharing the same sk_drops cacheline.
> 
> Using an udp-specific drop counter (and an sk_drops accessor to wrap
> sk_drops access where needed), we could avoid such cache miss. With that
> - patch for udp.h only below - I get 3% improvement on top of all the
> pending udp patches, and the gain should be more relevant after the 2
> queues rework. What do you think ?

Here is what I'm experimenting with.

The 'pcflag' change is not strictly needed, but it shrinks the udp_sock
struct a bit, so that the newly added cacheline does not create
additional holes (with my kconfig, at least). I can use a separate patch
for that chunk.
---
diff --git a/include/linux/udp.h b/include/linux/udp.h
index d1fd8cd..a21baaf 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -49,7 +49,11 @@ struct udp_sock {
 	unsigned int	 corkflag;	/* Cork is required */
 	__u8		 encap_type;	/* Is this an Encapsulation socket? */
 	unsigned char	 no_check6_tx:1,/* Send zero UDP6 checksums on TX? */
-			 no_check6_rx:1;/* Allow zero UDP6 checksums on RX? */
+			 no_check6_rx:1,/* Allow zero UDP6 checksums on RX? */
+			 pcflag:6;	/* UDP-Lite specific, moved here to */
+					/* fill an hole, marks socket as */
+					/* UDP-Lite if > 0    */
+
 	/*
 	 * Following member retains the information to create a UDP header
 	 * when the socket is uncorked.
@@ -64,8 +68,7 @@ struct udp_sock {
 #define UDPLITE_BIT      0x1  		/* set by udplite proto init function */
 #define UDPLITE_SEND_CC  0x2  		/* set via udplite setsockopt         */
 #define UDPLITE_RECV_CC  0x4		/* set via udplite setsocktopt        */
-	__u8		 pcflag;        /* marks socket as UDP-Lite if > 0    */
-	__u8		 unused[3];
+
 	/*
 	 * For encapsulation sockets.
 	 */
@@ -79,6 +82,9 @@ struct udp_sock {
 	int			(*gro_complete)(struct sock *sk,
 						struct sk_buff *skb,
 						int nhoff);
+
+	/* since we are prone to drops, avoid dirtying any sk cacheline */
+	atomic_t		drops ____cacheline_aligned_in_smp;
 };
 
 static inline struct udp_sock *udp_sk(const struct sock *sk)
@@ -106,6 +112,17 @@ static inline bool udp_get_no_check6_rx(struct sock *sk)
 	return udp_sk(sk)->no_check6_rx;
 }
 
+static inline int udp_drops_read(const struct sock *sk)
+{
+	return atomic_read(&udp_sk(sk)->drops);
+}
+
+static inline void
+udp_skb_set_dropcount(const struct sock *sk, struct sk_buff *skb)
+{
+	SOCK_SKB_CB(skb)->dropcount = udp_drops_read(sk);
+}
+
 #define udp_portaddr_for_each_entry(__sk, list) \
 	hlist_for_each_entry(__sk, list, __sk_common.skc_portaddr_node)
 
diff --git a/include/net/sock.h b/include/net/sock.h
index ed75dec..113e495 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2138,6 +2138,8 @@ struct sock_skb_cb {
 	SOCK_SKB_CB(skb)->dropcount = atomic_read(&sk->sk_drops);
 }
 
+int sk_drops_read(const struct sock *sk);
+
 static inline void sk_drops_add(struct sock *sk, const struct sk_buff *skb)
 {
 	int segs = max_t(u16, 1, skb_shinfo(skb)->gso_segs);
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index 6b10573..dc41727 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -9,6 +9,7 @@
 #include <net/sock.h>
 #include <linux/kernel.h>
 #include <linux/tcp.h>
+#include <linux/udp.h>
 #include <linux/workqueue.h>
 
 #include <linux/inet_diag.h>
@@ -19,6 +20,14 @@
 static DEFINE_MUTEX(sock_diag_table_mutex);
 static struct workqueue_struct *broadcast_wq;
 
+int sk_drops_read(const struct sock *sk)
+{
+	if (sk->sk_protocol == IPPROTO_UDP)
+		return udp_drops_read(sk);
+	else
+		return atomic_read(&sk->sk_drops);
+}
+
 static u64 sock_gen_cookie(struct sock *sk)
 {
 	while (1) {
@@ -67,7 +76,7 @@ int sock_diag_put_meminfo(struct sock *sk, struct sk_buff *skb, int attrtype)
 	mem[SK_MEMINFO_WMEM_QUEUED] = sk->sk_wmem_queued;
 	mem[SK_MEMINFO_OPTMEM] = atomic_read(&sk->sk_omem_alloc);
 	mem[SK_MEMINFO_BACKLOG] = sk->sk_backlog.len;
-	mem[SK_MEMINFO_DROPS] = atomic_read(&sk->sk_drops);
+	mem[SK_MEMINFO_DROPS] = sk_drops_read(sk);
 
 	return nla_put(skb, attrtype, sizeof(mem), &mem);
 }
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index fbd6b69..d7c4980 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1174,6 +1174,11 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
 	return ret;
 }
 
+static void udp_drops_inc(struct sock *sk)
+{
+	atomic_inc(&udp_sk(sk)->drops);
+}
+
 /* fully reclaim rmem/fwd memory allocated for skb */
 static void udp_rmem_release(struct sock *sk, int size, int partial)
 {
@@ -1244,7 +1249,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 	/* no need to setup a destructor, we will explicitly release the
 	 * forward allocated memory on dequeue
 	 */
-	sock_skb_set_dropcount(sk, skb);
+	udp_skb_set_dropcount(sk, skb);
 
 	__skb_queue_tail(list, skb);
 	spin_unlock(&list->lock);
@@ -1258,7 +1263,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 	atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
 
 drop:
-	atomic_inc(&sk->sk_drops);
+	udp_drops_inc(sk);
 	return err;
 }
 EXPORT_SYMBOL_GPL(__udp_enqueue_schedule_skb);
@@ -1319,7 +1324,7 @@ static int first_packet_length(struct sock *sk)
 				IS_UDPLITE(sk));
 		__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS,
 				IS_UDPLITE(sk));
-		atomic_inc(&sk->sk_drops);
+		udp_drops_inc(sk);
 		__skb_unlink(skb, rcvq);
 		total += skb->truesize;
 		kfree_skb(skb);
@@ -1417,7 +1422,7 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int noblock,
 
 	if (unlikely(err)) {
 		if (!peeked) {
-			atomic_inc(&sk->sk_drops);
+			udp_drops_inc(sk);
 			UDP_INC_STATS(sock_net(sk),
 				      UDP_MIB_INERRORS, is_udplite);
 		}
@@ -1714,7 +1719,7 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	__UDP_INC_STATS(sock_net(sk), UDP_MIB_CSUMERRORS, is_udplite);
 drop:
 	__UDP_INC_STATS(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	atomic_inc(&sk->sk_drops);
+	udp_drops_inc(sk);
 	kfree_skb(skb);
 	return -1;
 }
@@ -1772,7 +1777,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		nskb = skb_clone(skb, GFP_ATOMIC);
 
 		if (unlikely(!nskb)) {
-			atomic_inc(&sk->sk_drops);
+			udp_drops_inc(sk);
 			__UDP_INC_STATS(net, UDP_MIB_RCVBUFERRORS,
 					IS_UDPLITE(sk));
 			__UDP_INC_STATS(net, UDP_MIB_INERRORS,
@@ -2491,7 +2496,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
 		from_kuid_munged(seq_user_ns(f), sock_i_uid(sp)),
 		0, sock_i_ino(sp),
 		atomic_read(&sp->sk_refcnt), sp,
-		atomic_read(&sp->sk_drops));
+		udp_drops_read(sp));
 }
 
 int udp4_seq_show(struct seq_file *seq, void *v)
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index c5d76d2..9f46dff 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -1038,5 +1038,5 @@ void ip6_dgram_sock_seq_show(struct seq_file *seq, struct sock *sp,
 		   0,
 		   sock_i_ino(sp),
 		   atomic_read(&sp->sk_refcnt), sp,
-		   atomic_read(&sp->sk_drops));
+		   sk_drops_read(sp));
 }
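
The layout idea behind the new udp_sock field can be tried in user space. The following is a minimal C11 sketch, not kernel code: it assumes a 64-byte cache line, uses alignas() where the patch uses ____cacheline_aligned_in_smp, and all demo_* names are hypothetical stand-ins.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHELINE 64	/* assumption: typical x86-64 line size */

/* Stand-in for struct sock: a read-mostly field sharing a line with
 * the generic drop counter. */
struct demo_sock {
	atomic_int rx_hash;
	atomic_int generic_drops;
};

/* Stand-in for struct udp_sock: the protocol-private counter starts on
 * its own cache line, so bumping it does not dirty the line holding
 * rx_hash. */
struct demo_udp_sock {
	struct demo_sock sk;
	alignas(CACHELINE) atomic_int drops;
};

static void demo_drops_inc(struct demo_udp_sock *up)
{
	atomic_fetch_add_explicit(&up->drops, 1, memory_order_relaxed);
}

static int demo_drops_read(struct demo_udp_sock *up)
{
	return atomic_load_explicit(&up->drops, memory_order_relaxed);
}
```

Relaxed ordering is a choice of the sketch, on the assumption that a statistics counter needs no ordering against other memory accesses. The trade-off shown is the one discussed in the mail: the structure grows by up to one cache line in exchange for keeping writes away from fields the receive path reads.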


* [GIT] Networking
From: David Miller @ 2016-12-06 17:04 UTC (permalink / raw)
  To: torvalds; +Cc: akpm, netdev, linux-kernel


1) When dcbnl_cee_fill() fails to push a new netlink attribute, it
   returns 0 instead of an error code.  From Pan Bian.

2) Two suffix handling fixes to FIB trie code, from Alexander Duyck.

3) bnxt_hwrm_stat_ctx_alloc() goes through all the trouble of setting
   and maintaining a return code 'rc' but fails to actually return it.
   Also from Pan Bian.

4) ping socket ICMP handler needs to validate ICMP header length, from
   Kees Cook.

5) caif_sktinit_module() has this interesting logic:

	int err = sock_register(...);
	if (!err)
		return err;
	return 0;

   Just return sock_register()'s return value directly which is the only
   possible correct thing to do.

6) Two bnx2x driver fixes from Yuval Mintz: return a reasonable estimate
   from get_ringparam() ethtool op when interface is down, and avoid trying
   to use UDP port based tunneling on 577xx chips.

7) Fix ep93xx_eth crash on module unload from Florian Fainelli.

8) Missing uapi exports, from Stephen Hemminger.

9) Don't schedule work from sk_destruct(), because the socket will be
   freed upon return from that function.  From Herbert Xu.

10) Buggy drivers, of which we know there is at least one, can send a huge
    packet into the TCP stack but forget to set the gso_size in the SKB,
    which causes all kinds of problems.

    Correct this when it happens, and emit a one-time warning with
    the device name included so that it can be diagnosed more easily.

    From Marcelo Ricardo Leitner.

11) virtio-net doing DMA off the stack causes hiccups with VMAP_STACK,
    fix from Andy Lutomirski.

12) Fix fec driver compilation with CONFIG_M5272, from Nikita
    Yushchenko.

13) mlx5 fixes from Kamal Heib, Saeed Mahameed, and Mohamad Haj Yahia.
    (erroneously flushing queues on error, module parameter validation,
    etc.)
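
The effect of the logic quoted in item 5 can be checked with a small stand-alone sketch. Here the parameter err stands in for sock_register()'s return value, and both function names are hypothetical.

```c
/* Shape of the original code: every path ends up returning 0, so a
 * registration failure is silently reported as success. */
static int init_buggy(int err)
{
	if (!err)
		return err;
	return 0;
}

/* Shape of the fix: hand the registration result straight back. */
static int init_fixed(int err)
{
	return err;
}
```

Passing a negative errno value such as -17 yields 0 from the first function and -17 from the second; for a successful registration both return 0.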

Please pull, thanks a lot!

The following changes since commit 8dc0f265d39a3933f4c1f846c7c694f12a2ab88a:

  Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2016-12-02 13:34:37 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 32f16e142d7acabad68ef27c123d0caf1548aac3:

  Merge branch 'mlx5-fixes' (2016-12-06 11:44:45 -0500)

----------------------------------------------------------------
Alexander Duyck (2):
      ipv4: Drop leaf from suffix pull/push functions
      ipv4: Drop suffix update from resize code

Andy Lutomirski (1):
      virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()

David S. Miller (4):
      Merge tag 'batadv-net-for-davem-20161202' of git://git.open-mesh.org/linux-merge
      Merge branch 'fib-suffix-length-fixes'
      Merge branch 'bnx2x-fixes'
      Merge branch 'mlx5-fixes'

Florian Fainelli (1):
      net: ep93xx_eth: Do not crash unloading module

Herbert Xu (1):
      netlink: Do not schedule work from sk_destruct

Jonas Gorski (1):
      uapi glibc compat: fix outer guard of net device flags enum

Kamal Heib (3):
      net/mlx5: Verify module parameters
      net/mlx5: Remove duplicate pci dev name print
      net/mlx5: Fix query ISSI flow

Kees Cook (1):
      net: ping: check minimum size on ICMP header length

Marcelo Ricardo Leitner (1):
      tcp: warn on bogus MSS and try to amend it

Mintz, Yuval (2):
      bnx2x: Correct ringparam estimate when DOWN
      bnx2x: Prevent tunnel config for 577xx

Mohamad Haj Yahia (1):
      net/mlx5e: Change the SQ/RQ operational state to positive logic

Nikita Yushchenko (1):
      net: fec: fix compile with CONFIG_M5272

Niklas Cassel (1):
      net: stmmac: clear reset value of snps, wr_osr_lmt/snps, rd_osr_lmt before writing

Pan Bian (11):
      net: dcb: set error code on failures
      netdev: broadcom: propagate error code
      net: bridge: set error code on failure
      net: usb: set error code when usb_alloc_urb fails
      atm: lanai: set error code when ioremap fails
      net: caif: remove ineffective check
      net: irda: set error code on failures
      atm: fix improper return value
      net: ethernet: qlogic: set error code on failure
      net: bnx2x: fix improper return value
      isdn: hisax: set error code on failure

Saeed Mahameed (2):
      net/mlx5e: Don't notify HW when filling the edge of ICO SQ
      net/mlx5e: Don't flush SQ on error

Suraj Deshmukh (1):
      net: af_mpls.c add space before open parenthesis

Sven Eckelmann (1):
      batman-adv: Check for alloc errors when preparing TT local data

Venkat Duvvuru (1):
      be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.

stephen hemminger (2):
      uapi: export tc_skbmod.h
      uapi: export nf_log.h

 drivers/atm/eni.c                                     |  2 +-
 drivers/atm/lanai.c                                   |  1 +
 drivers/isdn/hisax/hfc4s8s_l1.c                       |  1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c   |  8 ++++++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c      |  5 +++--
 drivers/net/ethernet/broadcom/bnxt/bnxt.c             |  2 +-
 drivers/net/ethernet/cirrus/ep93xx_eth.c              |  4 ++++
 drivers/net/ethernet/emulex/benet/be_cmds.c           |  3 ++-
 drivers/net/ethernet/freescale/fec_main.c             | 13 ++++++++++---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c         |  5 -----
 drivers/net/ethernet/mellanox/mlx5/core/en.h          |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c     | 15 +++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c       |  8 ++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c       |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c     |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/main.c        | 42 +++++++++++++++++++++++++-----------------
 drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h   | 15 ++++++++++-----
 drivers/net/ethernet/qlogic/qed/qed_ll2.c             |  1 +
 drivers/net/ethernet/stmicro/stmmac/dwmac1000_dma.c   |  2 ++
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c      |  2 ++
 drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c |  6 ++++--
 drivers/net/irda/irda-usb.c                           |  1 +
 drivers/net/usb/lan78xx.c                             |  1 +
 drivers/net/virtio_net.c                              | 19 ++++++++++++++-----
 include/uapi/linux/if.h                               |  4 ++--
 include/uapi/linux/netfilter/Kbuild                   |  1 +
 include/uapi/linux/tc_act/Kbuild                      |  1 +
 net/batman-adv/translation-table.c                    |  4 ++--
 net/bridge/br_sysfs_br.c                              |  1 +
 net/caif/caif_socket.c                                |  5 +----
 net/dcb/dcbnl.c                                       |  1 +
 net/ipv4/fib_trie.c                                   | 68 +++++++++++++++++++++++++++++++++++---------------------------------
 net/ipv4/ping.c                                       |  4 ++++
 net/ipv4/tcp_input.c                                  | 22 +++++++++++++++++++++-
 net/mpls/af_mpls.c                                    |  2 +-
 net/netlink/af_netlink.c                              | 32 +++++++++++++++-----------------
 36 files changed, 194 insertions(+), 117 deletions(-)

^ permalink raw reply

* Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
From: Andy Lutomirski @ 2016-12-06 17:01 UTC (permalink / raw)
  To: Tejun Heo
  Cc: John Stultz, Alexei Starovoitov, Andy Lutomirski,
	Mickaël Salaün, Daniel Mack, David S. Miller,
	kafai-b10kYP2dOMg, Florian Westphal, Harald Hoyer,
	Network Development, Sargun Dhillon, Pablo Neira Ayuso, lkml,
	Li Zefan, Jonathan Corbet, open list:CONTROL GROUP (CGROUP),
	Android Kernel Team, Rom Lemarchand, Colin Cross
In-Reply-To: <20161206165519.GA17648-qYNAdHglDFBN0TnZuCh8vA@public.gmane.org>

On Tue, Dec 6, 2016 at 8:55 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Hello,
>
> On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote:
>> I really don't know.  The cgroupfs interface is a bit unfortunate in
>> that it doesn't really express the constraints.  To safely migrate a
>> task, ISTM you ought to have some form of privilege over the task
>> *and* some form of privilege over the cgroup.  cgroupfs only handles
>> the latter.
>>
>> CAP_CGROUP_MIGRATE ought to be okay.  Or maybe cgroupfs needs to gain
>> a concept of "dangerous" cgroups and further restrict them and
>> CAP_SYS_RESOURCE should be fine for non-dangerous cgroups?  I think I
>> favor the latter, but it might be nice to hear from Tejun first.
>
> If we can't do CAP_SYS_RESOURCE due to overlaps, let's go with a
> separate CAP.  While for android and cgroup v1, it's nice to have a
> finer grained CAP for security control, privilege isolation in cgroup
> should also primarily be done through hierarchical delegation.  It
> doesn't make sense to have another system in parallel.
>
> We can't do it properly on v1 because some controllers aren't properly
> hierarchical and delegation model isn't well defined.  e.g. nothing
> prevents a process from being pulled across different subtrees with
> the same delegation, but v2 can do it properly.  All that's necessary
> is to make the CAP test OR'd to other perm checks instead of AND'ing
> so that the CAP just allows overriding restrictions expressed through
> delegation but it's normally possible to move processes around in
> one's own delegated subtree.

How would one be granted the right to move processes around in one's
own subtree?

Are you imagining that, if you're in /a/b and you want to move a
process that's currently in /a/b/c to /a/b/d then you're allowed to
because the target process is in your tree?  If so, I doubt this has
the security properties you want -- namely, if you can cooperate with
anyone in /, even if they're unprivileged, you can break it.

^ permalink raw reply

* Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
From: Tejun Heo @ 2016-12-06 16:57 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Andy Lutomirski, John Stultz, Alexei Starovoitov, Andy Lutomirski,
	Mickaël Salaün, Daniel Mack, David S. Miller,
	kafai-b10kYP2dOMg, Florian Westphal, Harald Hoyer,
	Network Development, Sargun Dhillon, Pablo Neira Ayuso, lkml,
	Li Zefan, Jonathan Corbet, open list:CONTROL GROUP (CGROUP),
	Android Kernel Team, Rom Lemarchand
In-Reply-To: <20161206020011.GA22261-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>

Hello, Serge.

On Mon, Dec 05, 2016 at 08:00:11PM -0600, Serge E. Hallyn wrote:
> > I really don't know.  The cgroupfs interface is a bit unfortunate in
> > that it doesn't really express the constraints.  To safely migrate a
> > task, ISTM you ought to have some form of privilege over the task
> > *and* some form of privilege over the cgroup.
> 
> Agreed.  The problem is that the privilege required should depend on
> the controller (I guess).  For memory and cpuset, CAP_SYS_NICE seems
> right.  Perhaps CAP_SYS_RESOURCE would be needed for some..  but then,
> as I look through the lists (capabilities(7) and the list of controllers),
> it seems like CAP_SYS_NICE works for everything.  What else would we need?
> Maybe CAP_NET_ADMIN for net_cls and net_prio?  CAP_SYS_RESOURCE|CAP_SYS_ADMIN
> for pids?

Please see my other reply but I don't think it's a good idea to have
these extra checks on the side when there already is hierarchical
delegation mechanism which should be able to handle both resource
control and process management delegation.

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH] net: return value of skb_linearize should be handled in Linux kernel
From: Yuval Shaia @ 2016-12-06 16:57 UTC (permalink / raw)
  To: Zhouyi Zhou
  Cc: faisal.latif-ral2JQCrhuEAvxtiuMwx3w,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w,
	jeffrey.t.kirsher-ral2JQCrhuEAvxtiuMwx3w,
	QLogic-Storage-Upstream-h88ZbnxC6KDQT0dZR+AlfA,
	jejb-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA,
	jth-DgEjT+Ai2ygdnm+yROfE0A, jon.maloy-IzeFyvvaP7pWk0Htik3J/w,
	ying.xue-CWA4WttNNZF54TAoqtyWWQ, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	intel-wired-lan-qjLDD68F18P21nG7glBr7A,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	fcoe-devel-s9riP+hp16TNLxjTenLetw,
	tipc-discussion-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f
In-Reply-To: <1481008233-16777-1-git-send-email-zhouzhouyi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On Tue, Dec 06, 2016 at 03:10:33PM +0800, Zhouyi Zhou wrote:
> kmalloc_reserve may fail to allocate memory inside skb_linearize, 
> which means skb_linearize's return value should not be ignored. 
> The following patch corrects the uses of skb_linearize.
> 
> Compiled in x86_64

FWIW compiled also on SPARC

Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
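
For reference, the pattern the patch applies at each call site can be
sketched in plain userspace C.  This is only an illustration under
stated assumptions: sketch_buf, sketch_linearize() and
sketch_recv_frame() are hypothetical stand-ins, not kernel APIs.  The
only thing taken from the patch is that the linearize step can fail
with -ENOMEM and that the caller must then drop the buffer and bail out.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-part buffer: a linear head plus one extra fragment. */
struct sketch_buf {
	unsigned char *head;
	size_t head_len;
	unsigned char *frag;
	size_t frag_len;
};

/* Merge the fragment into the head.  Returns 0 or -ENOMEM. */
static int sketch_linearize(struct sketch_buf *b)
{
	unsigned char *p;

	if (!b->frag_len)
		return 0;	/* already linear */

	p = realloc(b->head, b->head_len + b->frag_len);
	if (!p)
		return -ENOMEM;	/* head and frag are left untouched */

	memcpy(p + b->head_len, b->frag, b->frag_len);
	free(b->frag);
	b->head = p;
	b->head_len += b->frag_len;
	b->frag = NULL;
	b->frag_len = 0;
	return 0;
}

/* Caller side: never touch the data if linearizing failed. */
static int sketch_recv_frame(struct sketch_buf *b)
{
	int err = sketch_linearize(b);

	if (err) {
		free(b->head);
		free(b->frag);
		return err;
	}
	/* ... header parsing on b->head would go here ... */
	return 0;
}
```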

> 
> Signed-off-by: Zhouyi Zhou <zhouzhouyi-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> ---
>  drivers/infiniband/hw/nes/nes_nic.c           | 5 +++--
>  drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c | 6 +++++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 +--
>  drivers/scsi/bnx2fc/bnx2fc_fcoe.c             | 7 +++++--
>  drivers/scsi/fcoe/fcoe.c                      | 5 ++++-
>  net/tipc/link.c                               | 3 ++-
>  net/tipc/name_distr.c                         | 5 ++++-
>  7 files changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/nes/nes_nic.c b/drivers/infiniband/hw/nes/nes_nic.c
> index 2b27d13..69372ea 100644
> --- a/drivers/infiniband/hw/nes/nes_nic.c
> +++ b/drivers/infiniband/hw/nes/nes_nic.c
> @@ -662,10 +662,11 @@ static int nes_netdev_start_xmit(struct sk_buff *skb, struct net_device *netdev)
>  				nesnic->sq_head &= nesnic->sq_size-1;
>  			}
>  		} else {
> -			nesvnic->linearized_skbs++;
>  			hoffset = skb_transport_header(skb) - skb->data;
>  			nhoffset = skb_network_header(skb) - skb->data;
> -			skb_linearize(skb);
> +			if (skb_linearize(skb))
> +				return NETDEV_TX_BUSY;
> +			nesvnic->linearized_skbs++;
>  			skb_set_transport_header(skb, hoffset);
>  			skb_set_network_header(skb, nhoffset);
>  			if (!nes_nic_send(skb, netdev))
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
> index 2a653ec..ab787cb 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_fcoe.c
> @@ -490,7 +490,11 @@ int ixgbe_fcoe_ddp(struct ixgbe_adapter *adapter,
>  	 */
>  	if ((fh->fh_r_ctl == FC_RCTL_DD_SOL_DATA) &&
>  	    (fctl & FC_FC_END_SEQ)) {
> -		skb_linearize(skb);
> +		int err = 0;
> +
> +		err = skb_linearize(skb);
> +		if (err)
> +			return err;
>  		crc = (struct fcoe_crc_eof *)skb_put(skb, sizeof(*crc));
>  		crc->fcoe_eof = FC_EOF_T;
>  	}
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index fee1f29..4926d48 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -2173,8 +2173,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>  				total_rx_bytes += ddp_bytes;
>  				total_rx_packets += DIV_ROUND_UP(ddp_bytes,
>  								 mss);
> -			}
> -			if (!ddp_bytes) {
> +			} else {
>  				dev_kfree_skb_any(skb);
>  				continue;
>  			}
> diff --git a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
> index f9ddb61..197d02e 100644
> --- a/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
> +++ b/drivers/scsi/bnx2fc/bnx2fc_fcoe.c
> @@ -542,8 +542,11 @@ static void bnx2fc_recv_frame(struct sk_buff *skb)
>  		return;
>  	}
>  
> -	if (skb_is_nonlinear(skb))
> -		skb_linearize(skb);
> +	if (skb_linearize(skb)) {
> +		kfree_skb(skb);
> +		return;
> +	}
> +
>  	mac = eth_hdr(skb)->h_source;
>  	dest_mac = eth_hdr(skb)->h_dest;
>  
> diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
> index 9bd41a3..f691b97 100644
> --- a/drivers/scsi/fcoe/fcoe.c
> +++ b/drivers/scsi/fcoe/fcoe.c
> @@ -1685,7 +1685,10 @@ static void fcoe_recv_frame(struct sk_buff *skb)
>  			skb->dev ? skb->dev->name : "<NULL>");
>  
>  	port = lport_priv(lport);
> -	skb_linearize(skb); /* check for skb_is_nonlinear is within skb_linearize */
> +	if (skb_linearize(skb)) {
> +		kfree_skb(skb);
> +		return;
> +	}
>  
>  	/*
>  	 * Frame length checks and setting up the header pointers
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index bda89bf..077c570 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1446,7 +1446,8 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
>  	if (tipc_own_addr(l->net) > msg_prevnode(hdr))
>  		l->net_plane = msg_net_plane(hdr);
>  
> -	skb_linearize(skb);
> +	if (skb_linearize(skb))
> +		goto exit;
>  	hdr = buf_msg(skb);
>  	data = msg_data(hdr);
>  
> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> index c1cfd92..4e05d2a 100644
> --- a/net/tipc/name_distr.c
> +++ b/net/tipc/name_distr.c
> @@ -356,7 +356,10 @@ void tipc_named_rcv(struct net *net, struct sk_buff_head *inputq)
>  
>  	spin_lock_bh(&tn->nametbl_lock);
>  	for (skb = skb_dequeue(inputq); skb; skb = skb_dequeue(inputq)) {
> -		skb_linearize(skb);
> +		if (skb_linearize(skb)) {
> +			kfree_skb(skb);
> +			continue;
> +		}
>  		msg = buf_msg(skb);
>  		mtype = msg_type(msg);
>  		item = (struct distr_item *)msg_data(msg);
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups
From: Tejun Heo @ 2016-12-06 16:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: John Stultz, Alexei Starovoitov, Andy Lutomirski,
	Mickaël Salaün, Daniel Mack, David S. Miller, kafai,
	Florian Westphal, Harald Hoyer, Network Development,
	Sargun Dhillon, Pablo Neira Ayuso, lkml, Li Zefan,
	Jonathan Corbet, open list:CONTROL GROUP (CGROUP),
	Android Kernel Team, Rom Lemarchand, Colin Cross
In-Reply-To: <CALCETrWuKpmXQqoQmcy4Va8abdpfCnMBYHWSD0nK5pocj2nfXA@mail.gmail.com>

Hello,

On Mon, Dec 05, 2016 at 04:36:51PM -0800, Andy Lutomirski wrote:
> I really don't know.  The cgroupfs interface is a bit unfortunate in
> that it doesn't really express the constraints.  To safely migrate a
> task, ISTM you ought to have some form of privilege over the task
> *and* some form of privilege over the cgroup.  cgroupfs only handles
> the latter.
> 
> CAP_CGROUP_MIGRATE ought to be okay.  Or maybe cgroupfs needs to gain
> a concept of "dangerous" cgroups and further restrict them and
> CAP_SYS_RESOURCE should be fine for non-dangerous cgroups?  I think I
> favor the latter, but it might be nice to hear from Tejun first.

If we can't do CAP_SYS_RESOURCE due to overlaps, let's go with a
separate CAP.  While for android and cgroup v1, it's nice to have a
finer grained CAP for security control, privilege isolation in cgroup
should also primarily be done through hierarchical delegation.  It
doesn't make sense to have another system in parallel.

We can't do it properly on v1 because some controllers aren't properly
hierarchical and delegation model isn't well defined.  e.g. nothing
prevents a process from being pulled across different subtrees with
the same delegation, but v2 can do it properly.  All that's necessary
is to make the CAP test OR'd to other perm checks instead of AND'ing
so that the CAP just allows overriding restrictions expressed through
delegation but it's normally possible to move processes around in
one's own delegated subtree.
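
A minimal sketch of the difference between the two ways of combining
the checks, written as plain C rather than actual cgroup code.  All
names here (sketch_migrate_req and the two helpers) are hypothetical,
and "in delegated subtree" is reduced to a boolean on purpose.

```c
#include <stdbool.h>

/* Hypothetical summary of a migration request; not a kernel structure. */
struct sketch_migrate_req {
	bool has_cap;			/* caller holds the migration capability */
	bool src_in_delegated_subtree;	/* task's current cgroup is delegated to caller */
	bool dst_in_delegated_subtree;	/* target cgroup is delegated to caller */
};

/* AND'ed: the capability is required on top of the delegation checks. */
static bool sketch_may_migrate_and(const struct sketch_migrate_req *r)
{
	return r->has_cap &&
	       r->src_in_delegated_subtree && r->dst_in_delegated_subtree;
}

/* OR'ed: the capability only overrides the delegation checks. */
static bool sketch_may_migrate_or(const struct sketch_migrate_req *r)
{
	return r->has_cap ||
	       (r->src_in_delegated_subtree && r->dst_in_delegated_subtree);
}
```

With the OR'ed form, an unprivileged caller can still move tasks
between cgroups it was delegated, while the AND'ed form refuses that
same request unless the capability is also held.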

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH v2 net-next 3/4] mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active
From: Saeed Mahameed @ 2016-12-06 16:50 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Linux Netdev List, Alexei Starovoitov, Brenden Blanco,
	Daniel Borkmann, David Miller, Jesper Dangaard Brouer,
	Saeed Mahameed, Tariq Toukan, Kernel Team
In-Reply-To: <20161205195519.GA29784@kafai-mba.local>

On Mon, Dec 5, 2016 at 9:55 PM, Martin KaFai Lau <kafai@fb.com> wrote:
> On Mon, Dec 05, 2016 at 02:54:06AM +0200, Saeed Mahameed wrote:
>> On Sun, Dec 4, 2016 at 5:17 AM, Martin KaFai Lau <kafai@fb.com> wrote:
>> > Reserve XDP_PACKET_HEADROOM and honor bpf_xdp_adjust_head()
>> > when XDP prog is active.  This patch only affects the code
>> > path when XDP is active.
>> >
>> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
>> > ---
>>
>> Hi Martin, sorry for the late review, I have some comments below.
>>
>> >  drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 17 +++++++++++++++--
>> >  drivers/net/ethernet/mellanox/mlx4/en_rx.c     | 23 +++++++++++++++++------
>> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c     |  9 +++++----
>> >  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  3 ++-
>> >  4 files changed, 39 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> > index 311c14153b8b..094a13b52cf6 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> > @@ -51,7 +51,8 @@
>> >  #include "mlx4_en.h"
>> >  #include "en_port.h"
>> >
>> > -#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN)))
>> > +#define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
>> > +                                  XDP_PACKET_HEADROOM))
>> >
>> >  int mlx4_en_setup_tc(struct net_device *dev, u8 up)
>> >  {
>> > @@ -1551,6 +1552,7 @@ int mlx4_en_start_port(struct net_device *dev)
>> >         struct mlx4_en_tx_ring *tx_ring;
>> >         int rx_index = 0;
>> >         int err = 0;
>> > +       int mtu;
>> >         int i, t;
>> >         int j;
>> >         u8 mc_list[16] = {0};
>> > @@ -1684,8 +1686,12 @@ int mlx4_en_start_port(struct net_device *dev)
>> >         }
>> >
>> >         /* Configure port */
>> > +       mtu = priv->rx_skb_size + ETH_FCS_LEN;
>> > +       if (priv->tx_ring_num[TX_XDP])
>> > +               mtu += XDP_PACKET_HEADROOM;
>> > +
>>
>> Why would the physical MTU care about the headroom you reserve for the XDP prog?
>> This is the wire MTU and it shouldn't be changed; please keep it as
>> before.  Any room you reserve in the packet buffers is needed only
>> for the FWD case or the modify case (HW or wire should not care about it).
>
> Thanks for your feedback!

Just doing my job :))

>
> FWD:
> packet received from a port
> => process by a XDP prog
> => XDP_TX out to the same port.
>
> For example, suppose the received packet has a 1500-byte payload and the
> XDP prog encapsulates it in an IPv6 header (+40 bytes).  In testing, it
> cannot be sent out because the HW/wire MTU is 1500.
>
> Even if the wire MTU info were passed to the XDP prog, there is not much
> an XDP prog could do here other than dropping the packet.
>
> Hence, this patch guarantees to the XDP prog that it can always send
> out what it has received + XDP_PACKET_HEADROOM.
>

I am still not convinced!  This goes against common sense: it means
that the XDP prog can send packets larger than the MTU seen on the
netdev!

Anyway, if a packet of size (MTU + XDP_PACKET_HEADROOM) was sent from
the XDP ring and the HW somehow allowed it out (with the code you
provided :)), it would most likely be dropped at the other end.

I still think the XDP prog should not be allowed to forward packets
larger than the MTU seen on the netdev, and you shouldn't modify the
wire MTU just for this case.
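
To put numbers on the example quoted above: a 1500-byte packet plus a
40-byte IPv6 header is 1540 bytes, which no longer fits a 1500-byte
MTU.  A minimal userspace sketch of that arithmetic follows; the helper
names are hypothetical and this is not driver code.

```c
#include <stdbool.h>

/* L3 length after the XDP prog prepends hdr_len bytes of encapsulation. */
static unsigned int sketch_len_after_encap(unsigned int l3_len,
					   unsigned int hdr_len)
{
	return l3_len + hdr_len;
}

/* True if the packet still fits the MTU configured on the netdev. */
static bool sketch_fits_mtu(unsigned int l3_len, unsigned int mtu)
{
	return l3_len <= mtu;
}
```

A check of this kind would drop the 1540-byte packet on the sender
instead of relying on the peer to drop it.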

>>
>> >         err = mlx4_SET_PORT_general(mdev->dev, priv->port,
>> > -                                   priv->rx_skb_size + ETH_FCS_LEN,
>> > +                                   mtu,
>> >                                     priv->prof->tx_pause,
>> >                                     priv->prof->tx_ppp,
>> >                                     priv->prof->rx_pause,
>> > @@ -2255,6 +2261,13 @@ static bool mlx4_en_check_xdp_mtu(struct net_device *dev, int mtu)
>> >  {
>> >         struct mlx4_en_priv *priv = netdev_priv(dev);
>> >
>> > +       if (mtu + XDP_PACKET_HEADROOM > priv->max_mtu) {
>> > +               en_err(priv,
>> > +                      "Device max mtu:%d does not allow %d bytes reserved headroom for XDP prog\n",
>> > +                      priv->max_mtu, XDP_PACKET_HEADROOM);
>> > +               return false;
>> > +       }
>> > +
>> >         if (mtu > MLX4_EN_MAX_XDP_MTU) {
>> >                 en_err(priv, "mtu:%d > max:%d when XDP prog is attached\n",
>> >                        mtu, MLX4_EN_MAX_XDP_MTU);
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > index 23e9d04d1ef4..324771ac929e 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> > @@ -96,7 +96,6 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>> >         struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
>> >         const struct mlx4_en_frag_info *frag_info;
>> >         struct page *page;
>> > -       dma_addr_t dma;
>> >         int i;
>> >
>> >         for (i = 0; i < priv->num_frags; i++) {
>> > @@ -115,9 +114,10 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>> >
>> >         for (i = 0; i < priv->num_frags; i++) {
>> >                 frags[i] = ring_alloc[i];
>> > -               dma = ring_alloc[i].dma + ring_alloc[i].page_offset;
>> > +               frags[i].page_offset += priv->frag_info[i].rx_headroom;
>>
>> I don't see any need for headroom on frag_info other than frag0 (which is
>> where the packet starts).
>> What is the meaning of headroom for a frag in the middle of a packet?
>>
>> if you agree with me then, you can use XDP_PACKET_HEADROOM as is where
>> needed (i.e frag0 page offset) and remove
>> "priv->frag_info[i].rx_headroom"
>>
>> ...
>>
>> After going through the code a little bit i see that this code is
>> shared between XDP and common path, and you didn't want to add boolean
>> conditions.
>>
>> Ok i see what you did here.
>>
>> Maybe we can pass headroom as a function parameter and split frag0
>> handling from the rest ?
>> If it is too much then i am ok with the code as it is,
> Right, this patch does the boolean check (XDP active or not) early on
> in mlx4_en_calc_rx_buf() (i.e. out of the fast path) and store
> the result in priv->frag_info[0].rx_headroom.
>
> Just want to ensure I understand your comment correctly.
> You prefer not to store the boolean test result in frag_info[0].rx_headroom
> since it is redundant to !!priv->tx_ring_num[TX_XDP] and rx_headroom is also
> confusing for frag[1-3].
>
> Instead, do the XDP [in]active test before calling mlx4_en_alloc_frags()
> and then only adjust frags[0].page_offset by +XDP_PACKET_HEADROOM if is needed.
> It could be done either by passing an extra argument to mlx4_en_alloc_frags()
> or completely separate mlx4_en_alloc_frags().  I am fine with this also.
>

Correct, but if this change will add extra checks to the data path
then I am ok with the current code.

>
>>
>> > +               rx_desc->data[i].addr = cpu_to_be64(frags[i].dma +
>> > +                                                   frags[i].page_offset);
>> >                 ring_alloc[i] = page_alloc[i];
>> > -               rx_desc->data[i].addr = cpu_to_be64(dma);
>> >         }
>> >
>> >         return 0;
>> > @@ -250,7 +250,8 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
>> >
>> >         if (ring->page_cache.index > 0) {
>> >                 frags[0] = ring->page_cache.buf[--ring->page_cache.index];
>> > -               rx_desc->data[0].addr = cpu_to_be64(frags[0].dma);
>> > +               rx_desc->data[0].addr = cpu_to_be64(frags[0].dma +
>> > +                                                   frags[0].page_offset);
>> >                 return 0;
>> >         }
>> >
>> > @@ -889,6 +890,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>> >                 if (xdp_prog) {
>> >                         struct xdp_buff xdp;
>> >                         dma_addr_t dma;
>> > +                       void *pg_addr, *orig_data;
>> >                         u32 act;
>> >
>> >                         dma = be64_to_cpu(rx_desc->data[0].addr);
>> > @@ -896,11 +898,18 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>> >                                                 priv->frag_info[0].frag_size,
>> >                                                 DMA_FROM_DEVICE);
>> >
>> > -                       xdp.data = page_address(frags[0].page) +
>> > -                                                       frags[0].page_offset;
>> > +                       pg_addr = page_address(frags[0].page);
>> > +                       orig_data = pg_addr + frags[0].page_offset;
>> > +                       xdp.data = orig_data;
>> >                         xdp.data_end = xdp.data + length;
>> >
>> >                         act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> > +
>> > +                       if (xdp.data != orig_data) {
>> > +                               length = xdp.data_end - xdp.data;
>> > +                               frags[0].page_offset = xdp.data - pg_addr;
>> > +                       }
>> > +
>> >
>>
>> is this needed only for XDP FWD case ?
> No. It is also for PASS.
>

I see.

>> is this the only way to detect that the user modified the packet
>> headers (comparing pointers, before and after) ?
> Yes
>
>>
>> if the answer is yes, it should be faster to unconditionally reset
>> packet offset and length on XDP_FWD:
>> case XDP_FWD:
>>    length = xdp.data_end - xdp.data;
>>    frags[0].page_offset = xdp.data - pg_addr;
>>
>>
>> >                         switch (act) {
>> >                         case XDP_PASS:
>> >                                 break;
>> > @@ -1180,6 +1189,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>> >                  */
>> >                 priv->frag_info[0].frag_stride = PAGE_SIZE;
>> >                 priv->frag_info[0].dma_dir = PCI_DMA_BIDIRECTIONAL;
>> > +               priv->frag_info[0].rx_headroom = XDP_PACKET_HEADROOM;
>> >                 i = 1;
>> >         } else {
>> >                 int buf_size = 0;
>> > @@ -1194,6 +1204,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>> >                                 ALIGN(priv->frag_info[i].frag_size,
>> >                                       SMP_CACHE_BYTES);
>> >                         priv->frag_info[i].dma_dir = PCI_DMA_FROMDEVICE;
>> > +                       priv->frag_info[i].rx_headroom = 0;
>>
>> IMHO, redundant. as you see here frag0 and other frags handling are
>> separated, maybe we can do the same in mlx4_en_alloc_frags.
>>
>> >                         buf_size += priv->frag_info[i].frag_size;
>> >                         i++;
>> >                 }
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > index 4b597dca5c52..9e5f38cefe5f 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
>> > @@ -354,7 +354,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
>> >         struct mlx4_en_rx_alloc frame = {
>> >                 .page = tx_info->page,
>> >                 .dma = tx_info->map0_dma,
>> > -               .page_offset = 0,
>> > +               .page_offset = XDP_PACKET_HEADROOM,
>> >                 .page_size = PAGE_SIZE,
>> >         };
>> >
>> > @@ -1132,7 +1132,7 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>> >         tx_info->page = frame->page;
>> >         frame->page = NULL;
>> >         tx_info->map0_dma = dma;
>> > -       tx_info->map0_byte_count = length;
>> > +       tx_info->map0_byte_count = length + frame->page_offset;
>>
>> Didn't you already take care of length by the following code:
>>                        if (xdp.data != orig_data) {
>>                                length = xdp.data_end - xdp.data;
>>                                frags[0].page_offset = xdp.data - pg_addr;
>>                         }
>>
> Before this patch, length always assumes the data starts at the beginning
> of the page and dma is the start of the page.  Hence, adding
> frame->page_offset back to the length here.
>
> However, if I read the code correctly, I think the map0_byte_count (before or
> after this patch) does not matter since it is only used in dma_unmap_page() and
> PAGE_SIZE is always used in dma_unmap_page() for this code path.  Hence, I think
> we can just set map0_byte_count to PAGE_SIZE here.
>

Right, in mlx4_alloc_pages we always map with PAGE_SIZE << order:

	dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE << order,
			   frag_info->dma_dir);

For XDP, order is always 0, so you can safely set it to PAGE_SIZE.
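
Put differently, the mapped length depends only on the allocation
order, never on the packet length.  A tiny sketch of that arithmetic,
assuming a 4096-byte page (SKETCH_PAGE_SIZE and sketch_map_len() are
made up for illustration):

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Length covered by a mapping of 2^order contiguous pages. */
static size_t sketch_map_len(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}
```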

>> and here  frame->page_offset is not really page offset, it can only be
>> XDP_PACKET_HEADROOM.
> Note that the XDP prog can call bpf_xdp_adjust_head() to add a header.
> The XDP prog can extend up to XDP_PACKET_HEADROOM (256) bytes but it
> can also (and usually does) add only a 40-byte IPv6 header and then XDP_TX it out.
>

I see.
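
For the record, here is how I read the offset/length bookkeeping after
a head adjustment, as a simplified userspace sketch.  The sketch_*
names are hypothetical, sketch_adjust_head() only mimics the idea of
bpf_xdp_adjust_head() and is not its real implementation, and the
256-byte headroom is the value mentioned above.

```c
#include <stddef.h>

struct sketch_xdp_buff {
	unsigned char *data;
	unsigned char *data_end;
};

struct sketch_frag {
	size_t page_offset;
	size_t length;
};

/* Move the packet start by delta bytes; negative delta grows the head. */
static int sketch_adjust_head(struct sketch_xdp_buff *xdp,
			      unsigned char *page, int delta)
{
	ptrdiff_t off = (xdp->data - page) + delta;

	if (off < 0 || page + off >= xdp->data_end)
		return -1;	/* would leave the page or empty the packet */
	xdp->data = page + off;
	return 0;
}

/* Recompute what the driver tracks once the prog has run. */
static struct sketch_frag sketch_refresh(const struct sketch_xdp_buff *xdp,
					 const unsigned char *page)
{
	struct sketch_frag f = {
		.page_offset = (size_t)(xdp->data - page),
		.length = (size_t)(xdp->data_end - xdp->data),
	};
	return f;
}
```

With a 1500-byte packet placed at offset 256, adding a 40-byte header
gives page_offset 216 and length 1540, so page_offset is no longer
equal to the headroom constant.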

>>
>> >         tx_info->nr_txbb = nr_txbb;
>> >         tx_info->nr_bytes = max_t(unsigned int, length, ETH_ZLEN);
>> >         tx_info->data_offset = (void *)data - (void *)tx_desc;
>> > @@ -1141,9 +1141,10 @@ netdev_tx_t mlx4_en_xmit_frame(struct mlx4_en_rx_ring *rx_ring,
>> >         tx_info->linear = 1;
>> >         tx_info->inl = 0;
>> >
>> > -       dma_sync_single_for_device(priv->ddev, dma, length, PCI_DMA_TODEVICE);
>> > +       dma_sync_single_range_for_device(priv->ddev, dma, frame->page_offset,
>> > +                                        length, PCI_DMA_TODEVICE);
>> >
>> > -       data->addr = cpu_to_be64(dma);
>> > +       data->addr = cpu_to_be64(dma + frame->page_offset);
>> >         data->lkey = ring->mr_key;
>> >         dma_wmb();
>> >         data->byte_count = cpu_to_be32(length);
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>> > index 20a936428f4a..ba1c6cd0cc79 100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
>> > @@ -475,7 +475,8 @@ struct mlx4_en_frag_info {
>> >         u16 frag_prefix_size;
>> >         u32 frag_stride;
>> >         enum dma_data_direction dma_dir;
>> > -       int order;
>> > +       u16 order;
>> > +       u16 rx_headroom;
>> >  };
>> >
>> >  #ifdef CONFIG_MLX4_EN_DCB
>> > --
>> > 2.5.1
>> >

^ permalink raw reply

* Re: Gigabit ethernet driver for Alacritechs SLIC devices (v4)
From: David Miller @ 2016-12-06 16:46 UTC (permalink / raw)
  To: gregkh
  Cc: LinoSanfilippo, charrer, liodot, andrew, roszenrami,
	markus.boehme, f.fainelli, devel, linux-kernel, netdev
In-Reply-To: <20161206164039.GA7091@kroah.com>

From: Greg KH <gregkh@linuxfoundation.org>
Date: Tue, 6 Dec 2016 17:40:39 +0100

> On Tue, Dec 06, 2016 at 11:30:04AM -0500, David Miller wrote:
>> From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
>> Date: Mon,  5 Dec 2016 23:07:15 +0100
>> 
>> > this is the fourth version of the slicoss gigabit ethernet driver (which is a
>> > rework of the driver from Alacritech which can currently be found under
>> > drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
>> > Kalahari cards, for both copper and fiber.
>> > 
>> > If this code is accepted the staging version can be removed.
>> > 
>> > The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).
>> 
>> I've applied this series, nice work.
>> 
>> But realize that if you use SLICOSS as the Kconfig symbol to select
>> this driver, then while the staging driver is still in the tree it
>> will enable both drivers which both want to attach to the same exact
>> device IDs.
> 
> If you have taken this in your tree now, I will go delete the staging
> driver from the staging tree, so we will not have that issue.  Should I
> do that now?

Please do.

^ permalink raw reply

* Re: [PATCH v4 09/13] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Grygorii Strashko @ 2016-12-06 16:45 UTC (permalink / raw)
  To: Richard Cochran
  Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA, Mugunthan V N,
	Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Murali Karicheri, Wingman Kwok,
	Thomas Gleixner
In-Reply-To: <20161206134037.GA15946-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>



On 12/06/2016 07:40 AM, Richard Cochran wrote:
> On Mon, Dec 05, 2016 at 02:05:21PM -0600, Grygorii Strashko wrote:
>> @@ -372,34 +354,27 @@ void cpts_tx_timestamp(struct cpts *cpts, struct sk_buff *skb)
>>  }
>>  EXPORT_SYMBOL_GPL(cpts_tx_timestamp);
>>  
>> -int cpts_register(struct device *dev, struct cpts *cpts,
>> -		  u32 mult, u32 shift)
>> +int cpts_register(struct cpts *cpts)
>>  {
>>  	int err, i;
>>  
>> -	cpts->info = cpts_info;
>> -	spin_lock_init(&cpts->lock);
>> -
>> -	cpts->cc.read = cpts_systim_read;
>> -	cpts->cc.mask = CLOCKSOURCE_MASK(32);
>> -	cpts->cc_mult = mult;
>> -	cpts->cc.mult = mult;
>> -	cpts->cc.shift = shift;
>> -
>>  	INIT_LIST_HEAD(&cpts->events);
>>  	INIT_LIST_HEAD(&cpts->pool);
>>  	for (i = 0; i < CPTS_MAX_EVENTS; i++)
>>  		list_add(&cpts->pool_data[i].list, &cpts->pool);
>>  
>> -	cpts_clk_init(dev, cpts);
>> +	clk_enable(cpts->refclk);
>> +
>>  	cpts_write32(cpts, CPTS_EN, control);
>>  	cpts_write32(cpts, TS_PEND_EN, int_enable);
>>  
>> +	/* reinitialize cc.mult to original value as it can be modified
>> +	 * by cpts_ptp_adjfreq().
>> +	 */
>> +	cpts->cc.mult = cpts->cc_mult;
> 
> This still isn't quite right.  First of all, you shouldn't clobber the
> learned cc.mult value in cpts_register().  Presumably, if PTP had been
> run on this port before, then the learned frequency is approximately
> correct, and it should be left alone.
> 
> [ BTW, resetting the timecounter here makes no sense either.  Why
>   reset the clock just because the interface goes down?  ]
> 

Huh. This is how it works now (even before my changes) - this is just refactoring!
(the only really new thing is cpts_calc_mult_shift()).

Also, this is how cpts is supported now as part of cpsw (and keystone):
configure cpsw (cpts)
- ifup
   cpsw (*soft_reset*, full reconfiguration of cpsw)
  (start cpts) - cpts/ptp active

- ifdown
   if last netdev - shutdown/poweroff cpsw (cpts)

In other words, cpts/ptp is expected to work only as long as at least one cpsw netdev is active.

Also there are additional questions such as:
- is there a guarantee that the cpsw port will be connected to the same network after ifup?
- should it be possible to reset cc.mult if its value is kept from the previous run?

> Secondly, you have made the initialization order of these fields hard
> to follow.  With the whole series applied:
> 
> probe()
> 	cpts_create()
> 		cpts_of_parse()
> 		{
> 			/* Set cc_mult but not cc.mult! */
> 			set cc_mult
> 			set cc.shift
> 		}
> 		cpts_calc_mult_shift()
> 		{
> 			/* Set them both. */
> 			cpts->cc_mult = mult;
> 			cpts->cc.mult = mult; 

^^ this assignment of cpts->cc.mult is not required.

> 			cpts->cc.shift = shift;
			

only in case they were not set in DT before
(I have a requirement to support both DT and cpts_calc_mult_shift(), and
 that introduces a bit of complexity)

Also, I've tried not to add more fields in struct cpts.

> 		}
> /* later on */
> cpts_register()
> 	cpts->cc.mult = cpts->cc_mult;
> 
> There is no need for such complexity.  Simply set cc.mult in
> cpts_create() _once_, immediately after the call to
> cpts_calc_mult_shift().
> 
> You can remove the assignment from cpts_calc_mult_shift() and
> cpts_register().

Just to clarify: do you propose to get rid of cpts->cc_mult altogether?

static int cpts_ptp_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
{
...
	if (ppb < 0) {
		neg_adj = 1;
		ppb = -ppb;
	}
	mult = cpts->cc_mult;
		^^^^^^^^^^^^^^
	adj = mult;
	adj *= ppb;
	diff = div_u64(adj, 1000000000ULL);
...
	cpts->cc.mult = neg_adj ? mult - diff : mult + diff;

Honestly, I'd prefer not to change the functional behavior of the ptp clock as part of
this series.
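
A minimal user-space sketch of the arithmetic in the excerpt above, to show
why both values are kept. The struct and function names here are hypothetical
stand-ins, not the driver's; only the cc_mult / cc.mult roles and the ppb
scaling are taken from the quoted code. Because each adjustment starts from
the nominal cc_mult, repeated adjustments do not accumulate.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two fields discussed in this thread. */
struct cc_state {
	uint32_t cc_mult; /* nominal multiplier, not touched by adjustments */
	uint32_t mult;    /* working multiplier (cc.mult in the thread) */
};

/* Scale the nominal multiplier by ppb parts per billion. */
static void adjust_mult(struct cc_state *s, int32_t ppb)
{
	/* Widen before negating so INT32_MIN cannot overflow. */
	uint64_t abs_ppb = ppb < 0 ? (uint64_t)(-(int64_t)ppb) : (uint64_t)ppb;
	uint64_t adj = (uint64_t)s->cc_mult * abs_ppb;
	uint32_t diff = (uint32_t)(adj / 1000000000ULL);

	/* Always computed from cc_mult, never from the previous mult. */
	s->mult = ppb < 0 ? s->cc_mult - diff : s->cc_mult + diff;
}
```

Dropping cc_mult would mean deriving each new value from the already adjusted
cc.mult, which is a different behavior from the one sketched here.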

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [PATCH net V2 0/6] Mellanox 100G mlx5 fixes 2016-12-04
From: David Miller @ 2016-12-06 16:45 UTC (permalink / raw)
  To: saeedm; +Cc: netdev
In-Reply-To: <1481038368-29677-1-git-send-email-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Tue,  6 Dec 2016 17:32:42 +0200

> Some bug fixes for mlx5 core and mlx5e driver.
> 
> v1->v2:
>  - replace "uint" with "unsigned int"

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v3 net-next v3 0/4] net: dsa: mv88e6xxx: rework reset and PPU code
From: David Miller @ 2016-12-06 16:33 UTC (permalink / raw)
  To: vivien.didelot
  Cc: netdev, linux-kernel, kernel, f.fainelli, andrew, eichest,
	richardcochran
In-Reply-To: <20161205223028.20308-1-vivien.didelot@savoirfairelinux.com>

From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date: Mon,  5 Dec 2016 17:30:24 -0500

> Old Marvell chips (like 88E6060) don't have a PHY Polling Unit (PPU).
> 
> Next chips (like 88E6185) have a PPU, which has exclusive access to the
> PHY registers, thus must be disabled before access.
> 
> Newer chips (like 88E6352) have an indirect mechanism to access the PHY
> registers at any time, thus lose control over the PPU (always enabled).
> 
> Here's a summary:
> 
> Model | PPU? | Has PPU ctrl?  | PPU state readable? | PHY access
> ----- | ---- | -------------- | ------------------- | ----------
>  6060 | no   | no             | no                  | direct
>  6185 | yes  | yes, PPUEn bit | yes, PPUState 2-bit | direct w/ PPU dis.
>  6352 | yes  | no             | yes, PPUState 1-bit | indirect
>  6390 | yes  | no             | yes, InitState bit  | indirect
> 
> Depending on the PPU control, a switch may have to restart the PPU when
> resetting the switch. Once the switch is reset, we must wait for the PPU
> state to be active polling again before accessing the registers.
> 
> For that purpose, add new operations to the chips to enable/disable the
> PPU, and execute software reset. With these new ops in place, rework the
> switch reset code and finally get rid of the MV88E6XXX_FLAG_PPU* flags.

Series applied, thanks Vivien.

And thanks for the detailed, informative, header postings like this one.
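
The per-chip operations described in the cover letter can be pictured with a
small sketch like the one below. All names are hypothetical and this is not
the mv88e6xxx code; it only assumes that a chip with PPU control (the 6185
row of the table) provides both callbacks, while chips without PPU control
leave them NULL and are accessed without any PPU handling.

```c
#include <stddef.h>

struct chip;

/* Optional per-chip callbacks; NULL means "no PPU control on this chip". */
struct chip_ops {
	int (*ppu_disable)(struct chip *chip);
	int (*ppu_enable)(struct chip *chip);
};

struct chip {
	const struct chip_ops *ops;
	int ppu_enabled;
	int ppu_toggles; /* counts enable/disable calls, for illustration */
	int phy_reg;     /* stand-in for a PHY register */
};

static int demo_ppu_disable(struct chip *chip)
{
	chip->ppu_enabled = 0;
	chip->ppu_toggles++;
	return 0;
}

static int demo_ppu_enable(struct chip *chip)
{
	chip->ppu_enabled = 1;
	chip->ppu_toggles++;
	return 0;
}

static const struct chip_ops ops_with_ppu_ctrl = {
	.ppu_disable = demo_ppu_disable,
	.ppu_enable = demo_ppu_enable,
};

static const struct chip_ops ops_without_ppu_ctrl = { NULL, NULL };

/* Disable the PPU around the access only if the chip has PPU control. */
static int chip_phy_read(struct chip *chip, int *val)
{
	int err;

	if (chip->ops->ppu_disable) {
		err = chip->ops->ppu_disable(chip);
		if (err)
			return err;
	}

	*val = chip->phy_reg; /* stand-in for the real register access */

	if (chip->ops->ppu_enable)
		return chip->ops->ppu_enable(chip);
	return 0;
}
```

Testing for a NULL callback takes the place of a per-chip flag, which is the
general idea behind replacing flags with ops.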

^ permalink raw reply

* Re: [PATCH V4 net-next] net: hns: Fix to conditionally convey RX checksum flag to stack
From: David Miller @ 2016-12-06 16:43 UTC (permalink / raw)
  To: salil.mehta; +Cc: yisen.zhuang, mehta.salil.lnk, netdev, linux-kernel, linuxarm
In-Reply-To: <20161206110946.734344-1-salil.mehta@huawei.com>

From: Salil Mehta <salil.mehta@huawei.com>
Date: Tue, 6 Dec 2016 11:09:46 +0000

> This patch introduces the RX checksum function to check the
> status of the hardware calculated checksum and its error and
> appropriately convey status to the upper stack in skb->ip_summed
> field.
> 
> In hardware, we only support checksum for the following
> protocols:
> 1) IPv4,
> 2) TCP(over IPv4 or IPv6),
> 3) UDP(over IPv4 or IPv6),
> 4) SCTP(over IPv4 or IPv6)
> but we support many L3(IPv4, IPv6, MPLS, PPPoE etc) and
> L4(TCP, UDP, GRE, SCTP, IGMP, ICMP etc.) protocols.
> 
> Hardware limitation:
> Our present hardware RX Descriptor lacks L3/L4 checksum
> "Status & Error" bit (which usually can be used to indicate whether
> checksum was calculated by the hardware and if there was any error
> encountered during checksum calculation).
> 
> Software workaround:
> We do get info within the RX descriptor about the kind of
> L3/L4 protocol coming in the packet and the error status. These
> errors might not just be checksum errors but could be related to
> version, length of IPv4, UDP, TCP etc.
> Because there is no way of knowing whether an L3/L4 error is due
> to a bad checksum or to some other L3/L4 problem, we will not (cannot)
> convey hardware checksum status (CHECKSUM_UNNECESSARY) for such
> cases to the upper stack and will not maintain the RX L3/L4 checksum
> counters either.
> 
> Signed-off-by: Salil Mehta <salil.mehta@huawei.com>

Applied.
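
The decision described in the commit message can be sketched as a small pure
function. The enum and function names below are hypothetical, and the sketch
assumes the RX descriptor has already been decoded into an L3 type, an L4
type and two error flags; it is not the hns driver code.

```c
enum l3_type { L3_IPV4, L3_IPV6, L3_OTHER };
enum l4_type { L4_TCP, L4_UDP, L4_SCTP, L4_OTHER };
enum csum_status { CSUM_NONE, CSUM_UNNECESSARY };

/*
 * Report "checksum already verified" only for protocol combinations the
 * hardware checksums, and only when no L3/L4 error is flagged. An error
 * flag may or may not be a checksum error, so in that case the stack is
 * left to verify the checksum itself.
 */
static enum csum_status rx_csum_status(enum l3_type l3, enum l4_type l4,
				       int l3_err, int l4_err)
{
	if (l3 != L3_IPV4 && l3 != L3_IPV6)
		return CSUM_NONE;
	if (l4 != L4_TCP && l4 != L4_UDP && l4 != L4_SCTP)
		return CSUM_NONE;
	if (l3_err || l4_err)
		return CSUM_NONE;
	return CSUM_UNNECESSARY;
}
```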

^ permalink raw reply

* Re: [patch net v4] net: fec: fix compile with CONFIG_M5272
From: David Miller @ 2016-12-06 16:41 UTC (permalink / raw)
  To: nikita.yoush
  Cc: fugang.duan, troy.kisky, andrew, eric, tremyfr, johannes, netdev,
	cphealy, fabio.estevam, linux-kernel
In-Reply-To: <1481005613-5147-1-git-send-email-nikita.yoush@cogentembedded.com>

From: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Date: Tue,  6 Dec 2016 09:26:53 +0300

> Commit 80cca775cdc4 ("net: fec: cache statistics while device is down")
> introduced unconditional statistics-related actions.
> 
> However, when the driver is compiled with CONFIG_M5272, statistics-related
> definitions do not exist, which results in build errors.
> 
> Fix that by adding explicit handling of !defined(CONFIG_M5272) case.
> 
> Fixes: 80cca775cdc4 ("net: fec: cache statistics while device is down")
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] be2net: Add DEVSEC privilege to SET_HSW_CONFIG command.
From: David Miller @ 2016-12-06 16:41 UTC (permalink / raw)
  To: suresh.reddy; +Cc: netdev, venkatkumar.duvvuru
In-Reply-To: <20161206053350.23203-1-suresh.reddy@broadcom.com>

From: Suresh Reddy <suresh.reddy@broadcom.com>
Date: Tue,  6 Dec 2016 00:33:50 -0500

> From: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
> 
> OPCODE_COMMON_GET_FN_PRIVILEGES is returning only DEVSEC
> privilege (Unrestricted Administrative Privilege) for Lancer NIC functions.
> So, driver is failing SET_HSW_CONFIG command, as DEVSEC privilege was not
> set in the privilege bitmap. This patch fixes the problem by setting DEVSEC
> privilege in SET_HSW_CONFIG’s privilege bitmap.
> 
> Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
> Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>

Applied.

^ permalink raw reply

* Re: Gigabit ethernet driver for Alacritechs SLIC devices (v4)
From: Greg KH @ 2016-12-06 16:40 UTC (permalink / raw)
  To: David Miller
  Cc: LinoSanfilippo, charrer, liodot, andrew, roszenrami,
	markus.boehme, f.fainelli, devel, linux-kernel, netdev
In-Reply-To: <20161206.113004.1297109434722530511.davem@davemloft.net>

On Tue, Dec 06, 2016 at 11:30:04AM -0500, David Miller wrote:
> From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Date: Mon,  5 Dec 2016 23:07:15 +0100
> 
> > this is the fourth version of the slicoss gigabit ethernet driver (which is a
> > rework of the driver from Alacritech which can currently be found under
> > drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
> > Kalahari cards, for both copper and fiber.
> > 
> > If this code is accepted the staging version can be removed.
> > 
> > The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).
> 
> I've applied this series, nice work.
> 
> But realize that if you use SLICOSS as the Kconfig symbol to select
> this driver, then while the staging driver is still in the tree it
> will enable both drivers which both want to attach to the same exact
> device IDs.

If you have taken this in your tree now, I will go delete the staging
driver from the staging tree, so we will not have that issue.  Should I
do that now?

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] virtio-net: Fix DMA-from-the-stack in virtnet_set_mac_address()
From: David Miller @ 2016-12-06 16:39 UTC (permalink / raw)
  To: luto; +Cc: netdev, linux-kernel, virtualization, mst, jasowang, labbott
In-Reply-To: <fe889e578d5dffa9ae0834b449a35fcfd1e10694.1480990173.git.luto@kernel.org>

From: Andy Lutomirski <luto@kernel.org>
Date: Mon,  5 Dec 2016 18:10:58 -0800

> With CONFIG_VMAP_STACK=y, virtnet_set_mac_address() can be passed a
> pointer to the stack and it will OOPS.  Copy the address to the heap
> to prevent the crash.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Reported-by: zbyszek@in.waw.pl
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

Applied, thanks.
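
The shape of the fix, copying the caller's buffer to the heap before handing
it to the device, looks roughly like this user-space analogue. The names are
hypothetical and malloc()/memcpy() stand in for the kernel allocator; the
device hand-off is simulated by a plain copy, so this only illustrates the
ownership pattern, not the virtio-net code.

```c
#include <stdlib.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical stand-in for passing a buffer to the device. */
static int device_set_mac(const unsigned char *buf, size_t len,
			  unsigned char *device_mac)
{
	memcpy(device_mac, buf, len);
	return 0;
}

/*
 * The caller's buffer may live on the stack, so never pass it to the
 * device directly; hand over a heap copy instead.
 */
static int set_mac_address(const unsigned char *mac,
			   unsigned char *device_mac)
{
	unsigned char *heap_copy;
	int err;

	heap_copy = malloc(MAC_LEN);
	if (!heap_copy)
		return -1;
	memcpy(heap_copy, mac, MAC_LEN);

	err = device_set_mac(heap_copy, MAC_LEN, device_mac);

	free(heap_copy);
	return err;
}
```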

^ permalink raw reply

* Re: [PATCH] net: ethernet: ti: cpsw: fix early budget split
From: David Miller @ 2016-12-06 16:37 UTC (permalink / raw)
  To: ivan.khoronzhuk
  Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <1480988700-17046-1-git-send-email-ivan.khoronzhuk@linaro.org>

From: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Date: Tue,  6 Dec 2016 03:45:00 +0200

> The budget split function requires the phy speed to be known.
> During ndo open, phy speed identification is postponed until the
> link is up. Hence, move the budget split to the appropriate callback,
> which runs when the link is up.
> 
> Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
> Fixes: 8feb0a196507 ("net: ethernet: ti: cpsw: split tx budget according between channels")
> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>

Applied.

^ permalink raw reply

* Re: [PATCH] drivers: net: cpsw-phy-sel: Clear RGMII_IDMODE on "rgmii" links
From: David Miller @ 2016-12-06 16:36 UTC (permalink / raw)
  To: alex.g
  Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel,
	gokhan
In-Reply-To: <1480988033-16535-1-git-send-email-alex.g@adaptrum.com>

From: Alexandru Gagniuc <alex.g@adaptrum.com>
Date: Mon,  5 Dec 2016 17:33:53 -0800

> Support for setting the RGMII_IDMODE bit was added in commit:
> "drivers: net: cpsw-phy-sel: add support to configure rgmii internal delay"
> However, that commit did not add the symmetrical clearing of the bit
> by way of setting it in "mask". Add it here.
> 
> Note that the documentation marks clearing this bit as "reserved",
> however, according to TI, support for delaying the clock does exist in
> the MAC, although it is not officially supported.
> We tested this on a board with an RGMII to RGMII link that will not
> work unless this bit is cleared.
> 
> Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com>

Commits must be referenced by both short-form SHA1-ID as well as
the commit header text.

And since this change is fixing that commit, you should also provide
a proper "Fixes: " tag on the line right before your signoff.

Thanks.
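
For readers following the patch description, here is a sketch of why a bit
has to appear in "mask" before it can be cleared by a read-modify-write
update. The macro values and the helper name are hypothetical, chosen only
for illustration; they are not taken from the cpsw-phy-sel driver.

```c
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
#define MODE_MASK    0x3u        /* interface mode field */
#define MODE_RGMII   0x2u
#define IDMODE_BIT   (1u << 4)   /* stand-in for RGMII_IDMODE */

/* Clear the bits named in mask, then OR in the requested mode bits. */
static uint32_t apply_mode(uint32_t reg, uint32_t mask, uint32_t mode)
{
	reg &= ~mask;
	reg |= mode;
	return reg;
}
```

With a mask of MODE_MASK alone, IDMODE_BIT can be set through "mode" but a
later call that omits it leaves the stale bit in place. Passing
MODE_MASK | IDMODE_BIT as the mask makes setting and clearing symmetrical.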

^ permalink raw reply

* Re: [PATCH next] Revert "dctcp: update cwnd on congestion event"
From: David Miller @ 2016-12-06 16:34 UTC (permalink / raw)
  To: fw; +Cc: netdev, ncardwell
In-Reply-To: <1480980180-23349-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Tue,  6 Dec 2016 00:23:00 +0100

> Neal Cardwell says:
>  If I am reading the code correctly, then I would have two concerns:
>  1) Has that been tested? That seems like an extremely dramatic
>     decrease in cwnd. For example, if the cwnd is 80, and there are 40
>     ACKs, and half the ACKs are ECE marked, then my back-of-the-envelope
>     calculations seem to suggest that after just 11 ACKs the cwnd would be
>     down to a minimal value of 2 [..]
>  2) That seems to contradict another passage in the draft [..] where it
>     says:
>        Just as specified in [RFC3168], DCTCP does not react to congestion
>        indications more than once for every window of data.
> 
> Neal is right.  Fortunately we don't have to complicate this by testing
> vs. current rtt estimate, we can just revert the patch.
> 
> Normal stack already handles this for us: receiving ACKs with ECE
> set causes a call to tcp_enter_cwr(), from there on the ssthresh gets
> adjusted and prr will take care of cwnd adjustment.
> 
> Fixes: 4780566784b396 ("dctcp: update cwnd on congestion event")
> Cc: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied.

^ permalink raw reply

* Re: Gigabit ethernet driver for Alacritechs SLIC devices (v4)
From: David Miller @ 2016-12-06 16:30 UTC (permalink / raw)
  To: LinoSanfilippo
  Cc: devel, andrew, f.fainelli, roszenrami, gregkh, linux-kernel,
	liodot, netdev
In-Reply-To: <1480975637-18245-1-git-send-email-LinoSanfilippo@gmx.de>

From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Date: Mon,  5 Dec 2016 23:07:15 +0100

> this is the fourth version of the slicoss gigabit ethernet driver (which is a
> rework of the driver from Alacritech which can currently be found under
> drivers/staging/slicoss). The driver is supposed to support Mojave, Oasis and
> Kalahari cards, for both copper and fiber.
> 
> If this code is accepted the staging version can be removed.
> 
> The driver has been tested on a SEN2104ET adapter (4 Port PCIe copper).

I've applied this series, nice work.

But realize that if you use SLICOSS as the Kconfig symbol to select
this driver, then while the staging driver is still in the tree it
will enable both drivers which both want to attach to the same exact
device IDs.

^ permalink raw reply

* Re: [PATCH net v4] tcp: warn on bogus MSS and try to amend it
From: David Miller @ 2016-12-06 16:01 UTC (permalink / raw)
  To: marcelo.leitner
  Cc: netdev, jmaxwell37, alexandre.sidorenko, kuznet, jmorris,
	yoshfuji, kaber, tlfalcon, brking, eric.dumazet
In-Reply-To: <2056cf96b896aa473ff017b9f223904a14bfed86.1480969929.git.marcelo.leitner@gmail.com>

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Mon,  5 Dec 2016 18:37:13 -0200

> There have been some reports lately about TCP connection stalls caused
> by NIC drivers that aren't setting gso_size on aggregated packets on rx
> path. This causes TCP to assume that the MSS is actually the size of the
> aggregated packet, which is invalid.
> 
> Although the proper fix is to be done at each driver, it's often hard
> and cumbersome for one to debug, come to such root cause and report/fix
> it.
> 
> This patch amends this situation in two ways. First, it adds a warning
> when this situation occurs, so it gives a hint to those trying to
> debug this. It also limits the maximum probed MSS to the advertised MSS,
> as it should never be any higher than that.
> 
> The result is that the connection may not have the best performance ever
> but it shouldn't stall, and the admin will have a hint on what to look
> for.
> 
> Tested with virtio by forcing gso_size to 0.
> 
> v2: updated msg per David's suggestion
> v3: use skb_iif to find the interface and also log its name, per Eric
>     Dumazet's suggestion. As the skb may be backlogged and the interface
>     gone by then, we need to check if the number still has a meaning.
> v4: use helper tcp_gro_dev_warn() and avoid pr_warn_once inside __once, per
>     David's suggestion
> 
> Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Applied, thanks Marcelo.
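
A self-contained sketch of the clamping idea from the commit message. The
struct, its fields and the function name are hypothetical stand-ins for the
socket state, and a counter replaces the one-time kernel warning; this is an
illustration of the technique, not the code that was applied.

```c
/* Hypothetical stand-in for the per-connection receive state. */
struct rx_state {
	unsigned int rcv_mss;  /* current estimate of the peer's MSS */
	unsigned int advmss;   /* MSS we advertised to the peer */
	unsigned int warnings; /* stand-in for the one-time warning */
};

/*
 * gso_size == 0 on an aggregated packet means the driver did not record
 * the segment size, so the only length available is the aggregate one.
 */
static void measure_rcv_mss(struct rx_state *st, unsigned int gso_size,
			    unsigned int skb_len)
{
	unsigned int len = gso_size ? gso_size : skb_len;

	if (len >= st->rcv_mss) {
		/* Never let the estimate exceed what we advertised. */
		st->rcv_mss = len < st->advmss ? len : st->advmss;
		if (st->rcv_mss != len)
			st->warnings++;
	}
}
```

A well-behaved driver reports a gso_size no larger than the advertised MSS,
so the clamp and the warning only trigger in the misbehaving case.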

^ permalink raw reply

* Re: [PATCH resend 0/8] irda: w83977af_ir: Neatening
From: David Miller @ 2016-12-06 15:59 UTC (permalink / raw)
  To: joe; +Cc: netdev, arnd, sergei.shtylyov, samuel, linux-kernel
In-Reply-To: <cover.1480963809.git.joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon,  5 Dec 2016 11:00:40 -0800

> On top of Arnd's overly long udelay patch because I noticed a
> misindented block.

Joe, this doesn't apply cleanly to net-next, please respin.

Thank you.

^ permalink raw reply

