Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH spi for-5.4 2/5] spi: Add a PTP system timestamp to the transfer structure
From: Vladimir Oltean @ 2019-08-18 18:25 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190818182600.3047-1-olteanv@gmail.com>

SPI is one of the interfaces used to access devices which have a POSIX
clock driver (real time clocks, 1588 timers etc).

Since there are lots of sources of timing jitter when retrieving the
time from such a device (queuing delays, locking contention, running in
interruptible context etc), introduce a PTP system timestamp structure
in struct spi_transfer. This is to be used by SPI device drivers when
they need to know the exact time at which the underlying device's time
was snapshotted.

Because SPI controllers may have jitter even between frames, also
introduce a field which specifies to the controller driver specifically
which byte needs to be snapshotted.

Add a default implementation of the PTP system timestamping in the SPI
core. There are 3 entry points from the core towards the SPI controller
drivers:
- transfer_one: The driver is passed individual spi_transfers to
  execute. This is the easiest to timestamp.
- transfer_one_message: The core passes the driver an entire spi_message
  (a potential batch of spi_transfers). The core puts the same pre and
  post timestamp to all transfers within a message. This is not ideal,
  but nothing better can be done by default anyway, since the core has
  no insight into how the driver batches the transfers.
- transfer: Like transfer_one_message, but for unqueued drivers (i.e.
  the driver implements its own queue scheduling).

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi.c       | 62 +++++++++++++++++++++++++++++++++++++++++
 include/linux/spi/spi.h | 38 +++++++++++++++++++++++++
 2 files changed, 100 insertions(+)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index d96e04627982..cf5c5435edcd 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1171,6 +1171,11 @@ static int spi_transfer_one_message(struct spi_controller *ctlr,
 		spi_statistics_add_transfer_stats(statm, xfer, ctlr);
 		spi_statistics_add_transfer_stats(stats, xfer, ctlr);
 
+		if (!ctlr->ptp_sts_supported) {
+			xfer->ptp_sts_word_pre = 0;
+			ptp_read_system_prets(xfer->ptp_sts);
+		}
+
 		if (xfer->tx_buf || xfer->rx_buf) {
 			reinit_completion(&ctlr->xfer_completion);
 
@@ -1197,6 +1202,11 @@ static int spi_transfer_one_message(struct spi_controller *ctlr,
 					xfer->len);
 		}
 
+		if (!ctlr->ptp_sts_supported) {
+			ptp_read_system_postts(xfer->ptp_sts);
+			xfer->ptp_sts_word_post = xfer->len;
+		}
+
 		trace_spi_transfer_stop(msg, xfer);
 
 		if (msg->status != -EINPROGRESS)
@@ -1265,6 +1275,7 @@ EXPORT_SYMBOL_GPL(spi_finalize_current_transfer);
  */
 static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 {
+	struct spi_transfer *xfer;
 	struct spi_message *mesg;
 	bool was_busy = false;
 	unsigned long flags;
@@ -1391,6 +1402,13 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 		goto out;
 	}
 
+	if (!ctlr->ptp_sts_supported) {
+		list_for_each_entry(xfer, &mesg->transfers, transfer_list) {
+			xfer->ptp_sts_word_pre = 0;
+			ptp_read_system_prets(xfer->ptp_sts);
+		}
+	}
+
 	ret = ctlr->transfer_one_message(ctlr, mesg);
 	if (ret) {
 		dev_err(&ctlr->dev,
@@ -1418,6 +1436,34 @@ static void spi_pump_messages(struct kthread_work *work)
 	__spi_pump_messages(ctlr, true);
 }
 
+/**
+ * spi_xfer_ptp_sts_word - helper for drivers to retrieve the pointer to the
+ *			   word in the TX buffer which must be timestamped.
+ *			   The SPI slave does not provide a pointer directly
+ *			   because the TX and RX buffers may be reallocated
+ *			   (see @spi_map_msg).
+ * @xfer: Pointer to the transfer being timestamped
+ * @pre: If true, returns the pointer to @ptp_sts_word_pre, otherwise returns
+ *	the pointer to @ptp_sts_word_post.
+ */
+const void *spi_xfer_ptp_sts_word(struct spi_transfer *xfer, bool pre)
+{
+	unsigned int bytes_per_word;
+
+	if (xfer->bits_per_word <= 8)
+		bytes_per_word = 1;
+	else if (xfer->bits_per_word <= 16)
+		bytes_per_word = 2;
+	else
+		bytes_per_word = 4;
+
+	if (pre)
+		return xfer->tx_buf + xfer->ptp_sts_word_pre * bytes_per_word;
+
+	return xfer->tx_buf + xfer->ptp_sts_word_post * bytes_per_word;
+}
+EXPORT_SYMBOL_GPL(spi_xfer_ptp_sts_word);
+
 /**
  * spi_set_thread_rt - set the controller to pump at realtime priority
  * @ctlr: controller to boost priority of
@@ -1503,6 +1549,7 @@ EXPORT_SYMBOL_GPL(spi_get_next_queued_message);
  */
 void spi_finalize_current_message(struct spi_controller *ctlr)
 {
+	struct spi_transfer *xfer;
 	struct spi_message *mesg;
 	unsigned long flags;
 	int ret;
@@ -1511,6 +1558,13 @@ void spi_finalize_current_message(struct spi_controller *ctlr)
 	mesg = ctlr->cur_msg;
 	spin_unlock_irqrestore(&ctlr->queue_lock, flags);
 
+	if (!ctlr->ptp_sts_supported) {
+		list_for_each_entry(xfer, &mesg->transfers, transfer_list) {
+			ptp_read_system_postts(xfer->ptp_sts);
+			xfer->ptp_sts_word_post = xfer->len;
+		}
+	}
+
 	spi_unmap_msg(ctlr, mesg);
 
 	if (ctlr->cur_msg_prepared && ctlr->unprepare_message) {
@@ -3270,6 +3324,7 @@ static int __spi_validate(struct spi_device *spi, struct spi_message *message)
 static int __spi_async(struct spi_device *spi, struct spi_message *message)
 {
 	struct spi_controller *ctlr = spi->controller;
+	struct spi_transfer *xfer;
 
 	/*
 	 * Some controllers do not support doing regular SPI transfers. Return
@@ -3285,6 +3340,13 @@ static int __spi_async(struct spi_device *spi, struct spi_message *message)
 
 	trace_spi_message_submit(message);
 
+	if (!ctlr->ptp_sts_supported) {
+		list_for_each_entry(xfer, &message->transfers, transfer_list) {
+			xfer->ptp_sts_word_pre = 0;
+			ptp_read_system_prets(xfer->ptp_sts);
+		}
+	}
+
 	return ctlr->transfer(spi, message);
 }
 
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index af4f265d0f67..bb7553c6e5d0 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -13,6 +13,7 @@
 #include <linux/completion.h>
 #include <linux/scatterlist.h>
 #include <linux/gpio/consumer.h>
+#include <linux/ptp_clock_kernel.h>
 
 struct dma_chan;
 struct property_entry;
@@ -409,6 +410,12 @@ static inline void spi_unregister_driver(struct spi_driver *sdrv)
  * @fw_translate_cs: If the boot firmware uses different numbering scheme
  *	what Linux expects, this optional hook can be used to translate
  *	between the two.
+ * @ptp_sts_supported: If the driver sets this to true, it must provide a
+ *	time snapshot in @spi_transfer->ptp_sts as close as possible to the
+ *	moment in time when @spi_transfer->ptp_sts_word_pre and
+ *	@spi_transfer->ptp_sts_word_post were transmitted.
+ *	If the driver does not set this, the SPI core takes the snapshot as
+ *	close to the driver hand-over as possible.
  *
  * Each SPI controller can communicate with one or more @spi_device
  * children.  These make a small bus, sharing MOSI, MISO and SCK signals
@@ -604,6 +611,12 @@ struct spi_controller {
 	void			*dummy_tx;
 
 	int (*fw_translate_cs)(struct spi_controller *ctlr, unsigned cs);
+
+	/*
+	 * Driver sets this field to indicate it is able to snapshot SPI
+	 * transfers (needed e.g. for reading the time of POSIX clocks)
+	 */
+	bool			ptp_sts_supported;
 };
 
 static inline void *spi_controller_get_devdata(struct spi_controller *ctlr)
@@ -644,6 +657,9 @@ extern struct spi_message *spi_get_next_queued_message(struct spi_controller *ct
 extern void spi_finalize_current_message(struct spi_controller *ctlr);
 extern void spi_finalize_current_transfer(struct spi_controller *ctlr);
 
+/* Helper calls for driver to get which buffer pointer must be timestamped */
+extern const void *spi_xfer_ptp_sts_word(struct spi_transfer *xfer, bool pre);
+
 /* the spi driver core manages memory for the spi_controller classdev */
 extern struct spi_controller *__spi_alloc_controller(struct device *host,
 						unsigned int size, bool slave);
@@ -753,6 +769,24 @@ extern void spi_res_release(struct spi_controller *ctlr,
  * @transfer_list: transfers are sequenced through @spi_message.transfers
  * @tx_sg: Scatterlist for transmit, currently not for client use
  * @rx_sg: Scatterlist for receive, currently not for client use
+ * @ptp_sts_word_pre: The word (subject to bits_per_word semantics) offset
+ *	within @tx_buf for which the SPI device is requesting that the time
+ *	snapshot for this transfer begins. Upon completing the SPI transfer,
+ *	this value may have changed compared to what was requested, depending
+ *	on the available snapshotting resolution (DMA transfer,
+ *	@ptp_sts_supported is false, etc).
+ * @ptp_sts_word_post: See @ptp_sts_word_post. The two can be equal (meaning
+ *	that a single byte should be snapshotted). The core will set
+ *	@ptp_sts_word_pre to 0, and @ptp_sts_word_post to the length of the
+ *	transfer, if @ptp_sts_supported is false for this controller. This is
+ *	done purposefully (instead of setting to spi_transfer->len - 1) to
+ *	denote that a transfer-level snapshot taken from within the driver may
+ *	still be of higher quality.
+ * @ptp_sts: Pointer to a memory location held by the SPI slave device where a
+ *	PTP system timestamp structure may lie. If drivers use PIO or their
+ *	hardware has some sort of assist for retrieving exact transfer timing,
+ *	they can (and should) assert @ptp_sts_supported and populate this
+ *	structure using the ptp_read_system_*ts helper functions.
  *
  * SPI transfers always write the same number of bytes as they read.
  * Protocol drivers should always provide @rx_buf and/or @tx_buf.
@@ -842,6 +876,10 @@ struct spi_transfer {
 
 	u32		effective_speed_hz;
 
+	unsigned int	ptp_sts_word_pre;
+	unsigned int	ptp_sts_word_post;
+	struct ptp_system_timestamp *ptp_sts;
+
 	struct list_head transfer_list;
 };
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH spi for-5.4 5/5] spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode transfer
From: Vladimir Oltean @ 2019-08-18 18:26 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190818182600.3047-1-olteanv@gmail.com>

While the poll mode helps reduce the overall latency in transmitting a
SPI message in the EOQ and TCFQ modes, the transmission can still have
jitter due to the CPU needing to service interrupts.

The transmission latency does not matter except in situations where the
SPI transfer represents the readout of a POSIX clock. In that case,
even with the byte-level PTP system timestamping in place, a pending IRQ
might find its way to be processed on the local CPU exactly during the
window when the transfer is snapshotted.

Disabling interrupts ensures the above situation never happens. When it
does, it manifests itself as random delay spikes, which throw off the
servo loop of phc2sys and make it lose lock.

Short phc2sys summary after 58 minutes of running without this patch:

  offset: min -26251 max 16416 mean -21.8672 std dev 863.416
  delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
  lost servo lock 3 times

Summary of the same phc2sys service running for 120 minutes with the
patch:

  offset: min -378 max 381 mean -0.0083089 std dev 101.495
  delay: min 4720 max 5920 mean 5129.38 std dev 154.899
  lost servo lock 0 times

Disable interrupts unconditionally if running in poll mode.
Two aspects:
- If the DSPI driver is in IRQ mode, then disabling interrupts becomes a
  contradiction in terms. Poll mode is recommendable for predictable
  latency.
- In theory it should be possible to disable interrupts only for SPI
  transfers that represent an interaction with a POSIX clock. The driver
  can sense this by looking at transfer->ptp_sts. However enabling this
  unconditionally makes issues much more visible (and not just in fringe
  cases), were they to ever appear.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index ea7169d18e09..c94574a20c8a 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -182,6 +182,8 @@ struct fsl_dspi {
 	int					irq;
 	struct clk				*clk;
 
+	/* Used to disable IRQs and preemption */
+	spinlock_t				lock;
 	struct ptp_system_timestamp		*ptp_sts;
 	const void				*ptp_sts_word_pre;
 	const void				*ptp_sts_word_post;
@@ -739,6 +741,7 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 	struct spi_device *spi = message->spi;
 	enum dspi_trans_mode trans_mode;
 	struct spi_transfer *transfer;
+	unsigned long flags = 0;
 	int status = 0;
 
 	message->actual_length = 0;
@@ -797,6 +800,9 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 				     SPI_FRAME_EBITS(transfer->bits_per_word) |
 				     SPI_CTARE_DTCP(1));
 
+		if (!dspi->irq)
+			spin_lock_irqsave(&dspi->lock, flags);
+
 		dspi->take_snapshot_pre = (dspi->tx == dspi->ptp_sts_word_pre);
 
 		if (dspi->take_snapshot_pre)
@@ -829,6 +835,9 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 			do {
 				status = dspi_poll(dspi);
 			} while (status == -EINPROGRESS);
+
+			spin_unlock_irqrestore(&dspi->lock, flags);
+
 		} else if (trans_mode != DSPI_DMA_MODE) {
 			status = wait_event_interruptible(dspi->waitq,
 							  dspi->waitflags);
@@ -1153,6 +1162,7 @@ static int dspi_probe(struct platform_device *pdev)
 	}
 
 	init_waitqueue_head(&dspi->waitq);
+	spin_lock_init(&dspi->lock);
 
 poll_mode:
 	if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
-- 
2.17.1


^ permalink raw reply related

* [PATCH spi for-5.4 3/5] spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
From: Vladimir Oltean @ 2019-08-18 18:25 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190818182600.3047-1-olteanv@gmail.com>

On platforms like LS1021A which use TCFQ mode, an interrupt needs to be
processed after each byte is TXed/RXed. I tried to make the DSPI
implementation on this SoC operate in other, more efficient modes (EOQ,
DMA) but it looks like it simply isn't possible.

Therefore allow the driver to operate in poll mode, to ease a bit of
this absurd amount of IRQ load generated in TCFQ mode. Doing so reduces
both the net time it takes to transmit a SPI message, as well as the
inter-frame jitter that occurs while doing so.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 81 ++++++++++++++++++++++++++++----------
 1 file changed, 61 insertions(+), 20 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 238bbe172b79..4daf8c3d07b7 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -647,15 +647,11 @@ static void dspi_eoq_read(struct fsl_dspi *dspi)
 		dspi_push_rx(dspi, fifo_read(dspi));
 }
 
-static irqreturn_t dspi_interrupt(int irq, void *dev_id)
+static int dspi_rxtx(struct fsl_dspi *dspi)
 {
-	struct fsl_dspi *dspi = (struct fsl_dspi *)dev_id;
 	struct spi_message *msg = dspi->cur_msg;
-	u32 spi_sr, spi_tcr;
 	u16 spi_tcnt;
-
-	regmap_read(dspi->regmap, SPI_SR, &spi_sr);
-	regmap_write(dspi->regmap, SPI_SR, spi_sr);
+	u32 spi_tcr;
 
 	/* Get transfer counter (in number of SPI transfers). It was
 	 * reset to 0 when transfer(s) were started.
@@ -670,17 +666,52 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
 	else
 		dspi_tcfq_read(dspi);
 
-	if (!dspi->len) {
-		dspi->waitflags = 1;
-		wake_up_interruptible(&dspi->waitq);
-		return IRQ_HANDLED;
-	}
+	if (!dspi->len)
+		/* Success! */
+		return 0;
 
 	if (dspi->devtype_data->trans_mode == DSPI_EOQ_MODE)
 		dspi_eoq_write(dspi);
 	else
 		dspi_tcfq_write(dspi);
 
+	return -EINPROGRESS;
+}
+
+static int dspi_poll(struct fsl_dspi *dspi)
+{
+	int tries = 1000;
+	u32 spi_sr;
+
+	do {
+		regmap_read(dspi->regmap, SPI_SR, &spi_sr);
+		regmap_write(dspi->regmap, SPI_SR, spi_sr);
+
+		if (spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF))
+			break;
+	} while (--tries);
+
+	if (!tries)
+		return -ETIMEDOUT;
+
+	return dspi_rxtx(dspi);
+}
+
+static irqreturn_t dspi_interrupt(int irq, void *dev_id)
+{
+	struct fsl_dspi *dspi = (struct fsl_dspi *)dev_id;
+	u32 spi_sr;
+
+	regmap_read(dspi->regmap, SPI_SR, &spi_sr);
+	regmap_write(dspi->regmap, SPI_SR, spi_sr);
+
+	dspi_rxtx(dspi);
+
+	if (!dspi->len) {
+		dspi->waitflags = 1;
+		wake_up_interruptible(&dspi->waitq);
+	}
+
 	return IRQ_HANDLED;
 }
 
@@ -768,13 +799,18 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 			goto out;
 		}
 
-		if (trans_mode != DSPI_DMA_MODE) {
-			if (wait_event_interruptible(dspi->waitq,
-						     dspi->waitflags))
-				dev_err(&dspi->pdev->dev,
-					"wait transfer complete fail!\n");
+		if (!dspi->irq) {
+			do {
+				status = dspi_poll(dspi);
+			} while (status == -EINPROGRESS);
+		} else if (trans_mode != DSPI_DMA_MODE) {
+			status = wait_event_interruptible(dspi->waitq,
+							  dspi->waitflags);
 			dspi->waitflags = 0;
 		}
+		if (status)
+			dev_err(&dspi->pdev->dev,
+				"Waiting for transfer to complete failed!\n");
 
 		if (transfer->delay_usecs)
 			udelay(transfer->delay_usecs);
@@ -1074,10 +1110,13 @@ static int dspi_probe(struct platform_device *pdev)
 		goto out_ctlr_put;
 
 	dspi_init(dspi);
+
 	dspi->irq = platform_get_irq(pdev, 0);
-	if (dspi->irq < 0) {
-		ret = dspi->irq;
-		goto out_clk_put;
+	if (dspi->irq <= 0) {
+		dev_info(&pdev->dev,
+			 "can't get platform irq, using poll mode\n");
+		dspi->irq = 0;
+		goto poll_mode;
 	}
 
 	ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt,
@@ -1087,6 +1126,9 @@ static int dspi_probe(struct platform_device *pdev)
 		goto out_clk_put;
 	}
 
+	init_waitqueue_head(&dspi->waitq);
+
+poll_mode:
 	if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
 		ret = dspi_request_dma(dspi, res->start);
 		if (ret < 0) {
@@ -1098,7 +1140,6 @@ static int dspi_probe(struct platform_device *pdev)
 	ctlr->max_speed_hz =
 		clk_get_rate(dspi->clk) / dspi->devtype_data->max_clock_factor;
 
-	init_waitqueue_head(&dspi->waitq);
 	platform_set_drvdata(pdev, ctlr);
 
 	ret = spi_register_controller(ctlr);
-- 
2.17.1


^ permalink raw reply related

* [PATCH spi for-5.4 4/5] spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode
From: Vladimir Oltean @ 2019-08-18 18:25 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190818182600.3047-1-olteanv@gmail.com>

In this mode, the DSPI controller uses PIO to transfer word by word. In
comparison, in EOQ mode the 4-word deep FIFO is being used, hence the
current logic will need some adaptation for which I do not have the
hardware (Coldfire) to test. It is not clear what is the timing of DMA
transfers and whether timestamping in the driver brings any overall
performance increase compared to regular timestamping done in the core.

Tested on LS1021A.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 4daf8c3d07b7..ea7169d18e09 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -129,6 +129,7 @@ enum dspi_trans_mode {
 struct fsl_dspi_devtype_data {
 	enum dspi_trans_mode	trans_mode;
 	u8			max_clock_factor;
+	bool			ptp_sts_supported;
 	bool			xspi_mode;
 };
 
@@ -140,12 +141,14 @@ static const struct fsl_dspi_devtype_data vf610_data = {
 static const struct fsl_dspi_devtype_data ls1021a_v1_data = {
 	.trans_mode		= DSPI_TCFQ_MODE,
 	.max_clock_factor	= 8,
+	.ptp_sts_supported	= true,
 	.xspi_mode		= true,
 };
 
 static const struct fsl_dspi_devtype_data ls2085a_data = {
 	.trans_mode		= DSPI_TCFQ_MODE,
 	.max_clock_factor	= 8,
+	.ptp_sts_supported	= true,
 };
 
 static const struct fsl_dspi_devtype_data coldfire_data = {
@@ -179,6 +182,11 @@ struct fsl_dspi {
 	int					irq;
 	struct clk				*clk;
 
+	struct ptp_system_timestamp		*ptp_sts;
+	const void				*ptp_sts_word_pre;
+	const void				*ptp_sts_word_post;
+	bool					take_snapshot_pre;
+	bool					take_snapshot_post;
 	struct spi_transfer			*cur_transfer;
 	struct spi_message			*cur_msg;
 	struct chip_data			*cur_chip;
@@ -653,6 +661,9 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 	u16 spi_tcnt;
 	u32 spi_tcr;
 
+	if (dspi->take_snapshot_post)
+		ptp_read_system_postts(dspi->ptp_sts);
+
 	/* Get transfer counter (in number of SPI transfers). It was
 	 * reset to 0 when transfer(s) were started.
 	 */
@@ -670,6 +681,12 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 		/* Success! */
 		return 0;
 
+	dspi->take_snapshot_pre = (dspi->tx == dspi->ptp_sts_word_pre);
+	dspi->take_snapshot_post = (dspi->tx == dspi->ptp_sts_word_post);
+
+	if (dspi->take_snapshot_pre)
+		ptp_read_system_prets(dspi->ptp_sts);
+
 	if (dspi->devtype_data->trans_mode == DSPI_EOQ_MODE)
 		dspi_eoq_write(dspi);
 	else
@@ -764,6 +781,10 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 			dspi->bytes_per_word = 2;
 		else
 			dspi->bytes_per_word = 4;
+		dspi->ptp_sts = transfer->ptp_sts;
+		dspi->ptp_sts_word_pre = spi_xfer_ptp_sts_word(transfer, true);
+		dspi->ptp_sts_word_post = spi_xfer_ptp_sts_word(transfer,
+								false);
 
 		regmap_update_bits(dspi->regmap, SPI_MCR,
 				   SPI_MCR_CLR_TXF | SPI_MCR_CLR_RXF,
@@ -776,6 +797,11 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
 				     SPI_FRAME_EBITS(transfer->bits_per_word) |
 				     SPI_CTARE_DTCP(1));
 
+		dspi->take_snapshot_pre = (dspi->tx == dspi->ptp_sts_word_pre);
+
+		if (dspi->take_snapshot_pre)
+			ptp_read_system_prets(dspi->ptp_sts);
+
 		trans_mode = dspi->devtype_data->trans_mode;
 		switch (trans_mode) {
 		case DSPI_EOQ_MODE:
@@ -1140,6 +1166,8 @@ static int dspi_probe(struct platform_device *pdev)
 	ctlr->max_speed_hz =
 		clk_get_rate(dspi->clk) / dspi->devtype_data->max_clock_factor;
 
+	ctlr->ptp_sts_supported = dspi->devtype_data->ptp_sts_supported;
+
 	platform_set_drvdata(pdev, ctlr);
 
 	ret = spi_register_controller(ctlr);
-- 
2.17.1


^ permalink raw reply related

* [PATCH spi for-5.4 0/5] Deterministic SPI latency with NXP DSPI driver
From: Vladimir Oltean @ 2019-08-18 18:25 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean

This patchset proposes an interface from the SPI subsystem for
software timestamping SPI transfers. There is a default implementation
provided in the core, as well as a mechanism for SPI slave drivers to
check which byte was in fact timestamped post-facto. The patchset also
adds the first user of this interface (the NXP DSPI driver in TCFQ mode).

The interface is somewhat similar to Hubert Feurstein's proposal for the
MDIO subsystem: https://lkml.org/lkml/2019/8/16/638

Original cover letter below. Also provided at the end some results with
an extra test (J - phc2sys using the timestamps taken by the SPI core).

===========================================================

Continuing the discussion created by Hubert Feurstein around the
mv88e6xxx driver for MDIO-controlled switches
(https://lkml.org/lkml/2019/8/2/1364), this patchset takes a similar
approach for the NXP LS1021A-TSN board, which has a SPI-controlled DSA
switch (SJA1105).

The patchset is motivated by some experiments done with a logic
analyzer, trying to understand the source of latency (and especially of
the jitter). SJA1105 SPI messages for reading the PTP clock are 12 bytes
in length: 4 for the SPI header and 8 for the timestamp. When looking at
the messages with a scope, there's jitter basically everywhere: between
bits of a frame and between frames in a transfer. The inter-bit jitter
is hardware and impacts us to a lesser extend (is smaller and caused by
the PVT stability of the oscillators, PLLs, etc). We will focus on the
latency between consecutive SPI frames within a 12-byte transfer.

As a preface, revisions of the DSPI controller IP are integrated in many
Freescale/NXP devices. As a result, the driver has 3 modes of operation:
- TCFQ (Transfer Complete Flag mode): The controller signals software
  that data has been sent/received after each individual word.
- EOQ (End of Queue mode): The driver can implement batching by making
  use of the controller's 4-word deep FIFO.
- DMA (Direct Memory Access mode): The SPI controller's FIFO is no
  longer in direct interaction with the driver, but is used to trigger
  the RX and TX channels of the eDMA module on the SoC.

In LS1021A, the driver works in the least efficient mode of the 3
(TCFQ). There is a well-known errata that the DSPI controller is broken
in conjunction with the eDMA module. As for the EOQ mode, I have tried
unsuccessfully for a few days to make use of the 4 entry FIFO, and the
hardware simply fails to reliably acknowledge the transmission when the
FIFO gets full. So it looks like we're stuck with the TCFQ mode.

The problem with phc2sys on the LS1021A-TSN board is that in order for
the gettime64() call to complete on the sja1105, the system has to
service 12 IRQs. Intuitively that is excessive and is the main source of
jitter, but let's not get ahead of ourselves.

An outline of the experiments that were done (unless otherwise
mentioned, all of these ran for 120 seconds):

A. First I have measured the (poor) performance of phc2sys under current
   conditions. (DSPI driver in IRQ mode, no PTP system timestamping)

   offset: min -53310 max 16107 mean -1737.18 std dev 11444.3
   delay: min 163680 max 237360 mean 201149 std dev 22446.6
   lost servo lock 1 times

B. I switched the .gettime64 callback to .gettimex64, snapshotting the
   PTP system timestamp within the sja1105 driver.

   offset: min -48923 max 64217 mean -904.137 std dev 17358.1
   delay: min 149600 max 203840 mean 169045 std dev 17993.3
   lost servo lock 8 times

C. I patched "struct spi_transfer" to contain the PTP system timestamp,
   and from the sja1105 driver, I passed this structure to be
   snapshotted by the SPI controller's driver (spi-fsl-dspi). This is
   the "transfer-level" snapshot.

   offset: min -64979 max 38979 mean -416.197 std dev 15367.9
   delay: min 125120 max 168320 mean 150286 std dev 17675.3
   lost servo lock 10 times

D. I changed the placement of the transfer snapshotting within the DSPI
   driver, from "transfer-level" to "byte-level".

   offset: min -9021 max 7149 mean -0.418803 std dev 3529.81
   delay: min 7840 max 23920 mean 14493.7 std dev 5982.17
   lost servo lock 0 times

E. I moved the DSPI driver to poll mode. I went back to collecting the
   PTP system timestamps from the sja1105 driver (same as B).

   offset: min -4199 max 46643 mean 418.214 std dev 4554.01
   delay: min 84000 max 194000 mean 99463.2 std dev 12936.5
   lost servo lock 1 times

F. Transfer-level snapshotting in the DSPI driver (same as C), but in
   poll mode.

   offset: min -24244 max 1115 mean -230.478 std dev 2297.28
   delay: min 69440 max 119040 mean 70312.9 std dev 8065.34
   lost servo lock 1 times

G. Byte-level snapshotting (same as D) but in poll mode.

   offset: min -314 max 288 mean -2.48718 std dev 118.045
   delay: min 4880 max 6000 mean 5118.63 std dev 507.258
   lost servo lock 0 times

   This seemed suspiciously good to me, so I let it run for longer
   (58 minutes):

   offset: min -26251 max 16416 mean -21.8672 std dev 863.416
   delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
   lost servo lock 3 times

H. Transfer-level snapshotting (same as F), but with IRQs disabled.
   This ran for 86 minutes.

   offset: min -1927 max 1843 mean -0.209203 std dev 529.398
   delay: min 85440 max 93680 mean 88245 std dev 1454.71
   lost servo lock 0 times

I. Byte-level snapshotting (same as G), but with IRQs disabled.
   This ran for 102 minutes.

   offset: min -378 max 381 mean -0.0083089 std dev 101.495
   delay: min 4720 max 5920 mean 5129.38 std dev 154.899
   lost servo lock 0 times

J. Default snapshotting taken by the SPI core, with the DSPI driver
   running in poll mode, IRQs enabled. This ran for 274 minutes.

   offset: min -42568 max 44576 mean 2.91646 std dev 947.467
   delay: min 58480 max 171040 mean 80750.7 std dev 2001.61
   lost servo lock 3 times

As a result, this patchset proposes the implementation of scenario I.
The others were done through temporary patches which are not presented
here due to the difficulty of presenting a coherent git history without
resorting to reverts etc. The gist of each experiment should be clear
though.

The raw data is available for dissection at
https://drive.google.com/open?id=1r9raU9ZeqOqkqts6Lb-ISf5ubLDLP3wk.
The logic analyzer captures can be opened with a free-as-in-beer program
provided by Saleae: https://www.saleae.com/downloads/.

In the capture data one can find the MOSI, SCK SPI signals, as well as a
debug GPIO which was toggled at the same time as the PTP system
timestamp was taken, to give the viewer an impression of what the
software is capturing compared to the actual timing of the SPI transfer.

Attached are also some close-up screenshots of transfers where there is
a clear and huge delay in-between frames of the same 12-byte SPI
transfer. As it turns out, these were all caused by the CPU getting
interrupted by some other IRQ. Approaches H and I are the only ones that
get rid of these glitches. In theory, the byte-level snapshotting should
be less vulnerable to an IRQ interrupting the SPI transfer (because the
time window is much smaller) but as the 58 minutes experiment shows, it
is not immune.

Vladimir Oltean (5):
  spi: Use an abbreviated pointer to ctlr->cur_msg in
    __spi_pump_messages
  spi: Add a PTP system timestamp to the transfer structure
  spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
  spi: spi-fsl-dspi: Implement the PTP system timestamping for TCFQ mode
  spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode
    transfer

 drivers/spi/spi-fsl-dspi.c | 117 +++++++++++++++++++++++++++++++------
 drivers/spi/spi.c          |  85 +++++++++++++++++++++++----
 include/linux/spi/spi.h    |  38 ++++++++++++
 3 files changed, 210 insertions(+), 30 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH spi for-5.4 1/5] spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages
From: Vladimir Oltean @ 2019-08-18 18:25 UTC (permalink / raw)
  To: broonie, h.feurstein, mlichvar, richardcochran, andrew,
	f.fainelli
  Cc: linux-spi, netdev, Vladimir Oltean
In-Reply-To: <20190818182600.3047-1-olteanv@gmail.com>

This helps a bit with line fitting now (the list_first_entry call) as
well as during the next patch which needs to iterate through all
transfers of ctlr->cur_msg so it timestamps them.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index aef55acb5ccd..d96e04627982 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1265,8 +1265,9 @@ EXPORT_SYMBOL_GPL(spi_finalize_current_transfer);
  */
 static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 {
-	unsigned long flags;
+	struct spi_message *mesg;
 	bool was_busy = false;
+	unsigned long flags;
 	int ret;
 
 	/* Lock queue */
@@ -1325,10 +1326,10 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 	}
 
 	/* Extract head of queue */
-	ctlr->cur_msg =
-		list_first_entry(&ctlr->queue, struct spi_message, queue);
+	mesg = list_first_entry(&ctlr->queue, struct spi_message, queue);
+	ctlr->cur_msg = mesg;
 
-	list_del_init(&ctlr->cur_msg->queue);
+	list_del_init(&mesg->queue);
 	if (ctlr->busy)
 		was_busy = true;
 	else
@@ -1361,7 +1362,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 			if (ctlr->auto_runtime_pm)
 				pm_runtime_put(ctlr->dev.parent);
 
-			ctlr->cur_msg->status = ret;
+			mesg->status = ret;
 			spi_finalize_current_message(ctlr);
 
 			mutex_unlock(&ctlr->io_mutex);
@@ -1369,28 +1370,28 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 		}
 	}
 
-	trace_spi_message_start(ctlr->cur_msg);
+	trace_spi_message_start(mesg);
 
 	if (ctlr->prepare_message) {
-		ret = ctlr->prepare_message(ctlr, ctlr->cur_msg);
+		ret = ctlr->prepare_message(ctlr, mesg);
 		if (ret) {
 			dev_err(&ctlr->dev, "failed to prepare message: %d\n",
 				ret);
-			ctlr->cur_msg->status = ret;
+			mesg->status = ret;
 			spi_finalize_current_message(ctlr);
 			goto out;
 		}
 		ctlr->cur_msg_prepared = true;
 	}
 
-	ret = spi_map_msg(ctlr, ctlr->cur_msg);
+	ret = spi_map_msg(ctlr, mesg);
 	if (ret) {
-		ctlr->cur_msg->status = ret;
+		mesg->status = ret;
 		spi_finalize_current_message(ctlr);
 		goto out;
 	}
 
-	ret = ctlr->transfer_one_message(ctlr, ctlr->cur_msg);
+	ret = ctlr->transfer_one_message(ctlr, mesg);
 	if (ret) {
 		dev_err(&ctlr->dev,
 			"failed to transfer one message from queue\n");
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH RFC v2 net-next 0/4] mv88e6xxx: setting 2500base-x mode for CPU/DSA port in dts
From: Vivien Didelot @ 2019-08-18 17:43 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Andrew Lunn, Florian Fainelli, Vladimir Oltean,
	Marek Behún
In-Reply-To: <20190817155025.GB9013@t480s.localdomain>

Hi Marek,

On Sat, 17 Aug 2019 15:50:25 -0400, Vivien Didelot <vivien.didelot@gmail.com> wrote:
> > here is another proposal for supporting setting 2500base-x mode for
> > CPU/DSA ports in device tree correctly.
> > 
> > The changes from v1 are that instead of adding .port_setup() and
> > .port_teardown() methods to the DSA operations struct we instead, for
> > CPU/DSA ports, call dsa_port_enable() from dsa_port_setup(), but only
> > after the port is registered (and required phylink/devlink structures
> > exist).
> > 
> > The .port_enable/.port_disable methods are now only meant to be used
> > for user ports, when the slave interface is brought up/down. This
> > proposal changes that in such a way that these methods are also called
> > for CPU/DSA ports, but only just after the switch is set up (and just
> > before the switch is tore down).
> > 
> > If we went this way, we would have to patch the other DSA drivers to
> > check if user port is being given in their respective .port_enable
> > and .port_disable implmentations.
> > 
> > What do you think about this?
> 
> This looks much better. Let me pass through all patches of this RFC so that
> I can include bits I would like to see in your next series.

I went ahead and sent a series which enables and disables all ports
in DSA, I hope you don't mind. You can now send a single patch on
top of it focusing on the 2500base-x issue with all the details.


Thank you,

	Vivien

^ permalink raw reply

* [PATCH net-next 6/6] net: dsa: mv88e6xxx: wrap SERDES IRQ in power function
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190818173548.19631-1-vivien.didelot@gmail.com>

Now that mv88e6xxx_serdes_power is only called after driver setup,
we can wrap the SERDES IRQ code directly within it for clarity.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c72b3db75c54..d0bf98c10b2b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2057,10 +2057,26 @@ static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
 static int mv88e6xxx_serdes_power(struct mv88e6xxx_chip *chip, int port,
 				  bool on)
 {
-	if (chip->info->ops->serdes_power)
-		return chip->info->ops->serdes_power(chip, port, on);
+	int err;
 
-	return 0;
+	if (!chip->info->ops->serdes_power)
+		return 0;
+
+	if (on) {
+		err = chip->info->ops->serdes_power(chip, port, true);
+		if (err)
+			return err;
+
+		if (chip->info->ops->serdes_irq_setup)
+			err = chip->info->ops->serdes_irq_setup(chip, port);
+	} else {
+		if (chip->info->ops->serdes_irq_free)
+			chip->info->ops->serdes_irq_free(chip, port);
+
+		err = chip->info->ops->serdes_power(chip, port, false);
+	}
+
+	return err;
 }
 
 static int mv88e6xxx_setup_upstream_port(struct mv88e6xxx_chip *chip, int port)
@@ -2258,12 +2274,7 @@ static int mv88e6xxx_port_enable(struct dsa_switch *ds, int port,
 	int err;
 
 	mv88e6xxx_reg_lock(chip);
-
 	err = mv88e6xxx_serdes_power(chip, port, true);
-
-	if (!err && chip->info->ops->serdes_irq_setup)
-		err = chip->info->ops->serdes_irq_setup(chip, port);
-
 	mv88e6xxx_reg_unlock(chip);
 
 	return err;
@@ -2274,13 +2285,8 @@ static void mv88e6xxx_port_disable(struct dsa_switch *ds, int port)
 	struct mv88e6xxx_chip *chip = ds->priv;
 
 	mv88e6xxx_reg_lock(chip);
-
-	if (chip->info->ops->serdes_irq_free)
-		chip->info->ops->serdes_irq_free(chip, port);
-
 	if (mv88e6xxx_serdes_power(chip, port, false))
 		dev_err(chip->dev, "failed to power off SERDES\n");
-
 	mv88e6xxx_reg_unlock(chip);
 }
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next 5/6] net: dsa: mv88e6xxx: enable SERDES after setup
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190818173548.19631-1-vivien.didelot@gmail.com>

SERDES is powered on for CPU and DSA ports and powered down for unused
ports at setup time. But now that DSA calls mv88e6xxx_port_enable
and mv88e6xxx_port_disable for all ports, the SERDES power can now
be handled after setup inconditionally for all ports.

Using the port enable and disable callbacks also have the benefit to
handle the SERDES IRQ for non user ports as well.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 35 ++++----------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 27e1622bb03b..c72b3db75c54 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2151,16 +2151,6 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
 	if (err)
 		return err;
 
-	/* Enable the SERDES interface for DSA and CPU ports. Normal
-	 * ports SERDES are enabled when the port is enabled, thus
-	 * saving a bit of power.
-	 */
-	if ((dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))) {
-		err = mv88e6xxx_serdes_power(chip, port, true);
-		if (err)
-			return err;
-	}
-
 	/* Port Control 2: don't force a good FCS, set the maximum frame size to
 	 * 10240 bytes, disable 802.1q tags checking, don't discard tagged or
 	 * untagged frames on this port, do a destination address lookup on all
@@ -2267,9 +2257,6 @@ static int mv88e6xxx_port_enable(struct dsa_switch *ds, int port,
 	struct mv88e6xxx_chip *chip = ds->priv;
 	int err;
 
-	if (!dsa_is_user_port(ds, port))
-		return 0;
-
 	mv88e6xxx_reg_lock(chip);
 
 	err = mv88e6xxx_serdes_power(chip, port, true);
@@ -2286,9 +2273,6 @@ static void mv88e6xxx_port_disable(struct dsa_switch *ds, int port)
 {
 	struct mv88e6xxx_chip *chip = ds->priv;
 
-	if (!dsa_is_user_port(ds, port))
-		return;
-
 	mv88e6xxx_reg_lock(chip);
 
 	if (chip->info->ops->serdes_irq_free)
@@ -2461,27 +2445,16 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
 
 	/* Setup Switch Port Registers */
 	for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
+		if (dsa_is_unused_port(ds, i))
+			continue;
+
 		/* Prevent the use of an invalid port. */
-		if (mv88e6xxx_is_invalid_port(chip, i) &&
-		    !dsa_is_unused_port(ds, i)) {
+		if (mv88e6xxx_is_invalid_port(chip, i)) {
 			dev_err(chip->dev, "port %d is invalid\n", i);
 			err = -EINVAL;
 			goto unlock;
 		}
 
-		if (dsa_is_unused_port(ds, i)) {
-			err = mv88e6xxx_port_set_state(chip, i,
-						       BR_STATE_DISABLED);
-			if (err)
-				goto unlock;
-
-			err = mv88e6xxx_serdes_power(chip, i, false);
-			if (err)
-				goto unlock;
-
-			continue;
-		}
-
 		err = mv88e6xxx_setup_port(chip, i);
 		if (err)
 			goto unlock;
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next 4/6] net: dsa: mv88e6xxx: do not change STP state on port disabling
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190818173548.19631-1-vivien.didelot@gmail.com>

When disabling a port, that is not for the driver to decide what to
do with the STP state. This is already handled by the DSA layer.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5e557545df6d..27e1622bb03b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2291,9 +2291,6 @@ static void mv88e6xxx_port_disable(struct dsa_switch *ds, int port)
 
 	mv88e6xxx_reg_lock(chip);
 
-	if (mv88e6xxx_port_set_state(chip, port, BR_STATE_DISABLED))
-		dev_err(chip->dev, "failed to disable port\n");
-
 	if (chip->info->ops->serdes_irq_free)
 		chip->info->ops->serdes_irq_free(chip, port);
 
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next 3/6] net: dsa: enable and disable all ports
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190818173548.19631-1-vivien.didelot@gmail.com>

Call the .port_enable and .port_disable functions for all ports,
not only the user ports, so that drivers may optimize the power
consumption of all ports after a successful setup.

Unused ports are now disabled on setup. CPU and DSA ports are now
enabled on setup and disabled on teardown. User ports were already
enabled at slave creation and disabled at slave destruction.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 net/dsa/dsa2.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 405552ac4c08..8c4eccb0cfe6 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -264,6 +264,7 @@ static int dsa_port_setup(struct dsa_port *dp)
 
 	switch (dp->type) {
 	case DSA_PORT_TYPE_UNUSED:
+		dsa_port_disable(dp);
 		break;
 	case DSA_PORT_TYPE_CPU:
 		memset(dlp, 0, sizeof(*dlp));
@@ -274,6 +275,10 @@ static int dsa_port_setup(struct dsa_port *dp)
 			return err;
 
 		err = dsa_port_link_register_of(dp);
+		if (err)
+			return err;
+
+		err = dsa_port_enable(dp, NULL);
 		if (err)
 			return err;
 		break;
@@ -286,6 +291,10 @@ static int dsa_port_setup(struct dsa_port *dp)
 			return err;
 
 		err = dsa_port_link_register_of(dp);
+		if (err)
+			return err;
+
+		err = dsa_port_enable(dp, NULL);
 		if (err)
 			return err;
 		break;
@@ -317,11 +326,13 @@ static void dsa_port_teardown(struct dsa_port *dp)
 	case DSA_PORT_TYPE_UNUSED:
 		break;
 	case DSA_PORT_TYPE_CPU:
+		dsa_port_disable(dp);
 		dsa_tag_driver_put(dp->tag_ops);
 		devlink_port_unregister(dlp);
 		dsa_port_link_unregister_of(dp);
 		break;
 	case DSA_PORT_TYPE_DSA:
+		dsa_port_disable(dp);
 		devlink_port_unregister(dlp);
 		dsa_port_link_unregister_of(dp);
 		break;
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next 2/6] net: dsa: do not enable or disable non user ports
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190818173548.19631-1-vivien.didelot@gmail.com>

The .port_enable and .port_disable operations are currently only
called for user ports, hence assuming they have a slave device. In
preparation for using these operations for other port types as well,
simply guard all implementations against non user ports and return
directly in such case.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 drivers/net/dsa/b53/b53_common.c       | 10 +++++++++-
 drivers/net/dsa/bcm_sf2.c              |  6 ++++++
 drivers/net/dsa/lan9303-core.c         |  6 ++++++
 drivers/net/dsa/lantiq_gswip.c         |  6 ++++++
 drivers/net/dsa/microchip/ksz_common.c |  6 ++++++
 drivers/net/dsa/mt7530.c               |  6 ++++++
 drivers/net/dsa/mv88e6xxx/chip.c       |  6 ++++++
 7 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 907af62846ba..1639ea7b7dab 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -510,10 +510,15 @@ EXPORT_SYMBOL(b53_imp_vlan_setup);
 int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
 {
 	struct b53_device *dev = ds->priv;
-	unsigned int cpu_port = ds->ports[port].cpu_dp->index;
+	unsigned int cpu_port;
 	int ret = 0;
 	u16 pvlan;
 
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
+	cpu_port = ds->ports[port].cpu_dp->index;
+
 	if (dev->ops->irq_enable)
 		ret = dev->ops->irq_enable(dev, port);
 	if (ret)
@@ -547,6 +552,9 @@ void b53_disable_port(struct dsa_switch *ds, int port)
 	struct b53_device *dev = ds->priv;
 	u8 reg;
 
+	if (!dsa_is_user_port(ds, port))
+		return;
+
 	/* Disable Tx/Rx for the port */
 	b53_read8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), &reg);
 	reg |= PORT_CTRL_RX_DISABLE | PORT_CTRL_TX_DISABLE;
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 49f99436018a..3d06262817bd 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -157,6 +157,9 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int port,
 	unsigned int i;
 	u32 reg;
 
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
 	/* Clear the memory power down */
 	reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL);
 	reg &= ~P_TXQ_PSM_VDD(port);
@@ -222,6 +225,9 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, int port)
 	struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
 	u32 reg;
 
+	if (!dsa_is_user_port(ds, port))
+		return;
+
 	/* Disable learning while in WoL mode */
 	if (priv->wol_ports_mask & (1 << port)) {
 		reg = core_readl(priv, CORE_DIS_LEARN);
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 7a2063e7737a..bbec86b9418e 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -1079,6 +1079,9 @@ static int lan9303_port_enable(struct dsa_switch *ds, int port,
 {
 	struct lan9303 *chip = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
 	return lan9303_enable_processing_port(chip, port);
 }
 
@@ -1086,6 +1089,9 @@ static void lan9303_port_disable(struct dsa_switch *ds, int port)
 {
 	struct lan9303 *chip = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return;
+
 	lan9303_disable_processing_port(chip, port);
 	lan9303_phy_write(ds, chip->phy_addr_base + port, MII_BMCR, BMCR_PDOWN);
 }
diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 2175ec13bb2c..a69c9b9878b7 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -642,6 +642,9 @@ static int gswip_port_enable(struct dsa_switch *ds, int port,
 	struct gswip_priv *priv = ds->priv;
 	int err;
 
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
 	if (!dsa_is_cpu_port(ds, port)) {
 		err = gswip_add_single_port_br(priv, port, true);
 		if (err)
@@ -678,6 +681,9 @@ static void gswip_port_disable(struct dsa_switch *ds, int port)
 {
 	struct gswip_priv *priv = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return;
+
 	if (!dsa_is_cpu_port(ds, port)) {
 		gswip_mdio_mask(priv, GSWIP_MDIO_PHY_LINK_DOWN,
 				GSWIP_MDIO_PHY_LINK_MASK,
diff --git a/drivers/net/dsa/microchip/ksz_common.c b/drivers/net/dsa/microchip/ksz_common.c
index b45c7b972cec..b0b870f0c252 100644
--- a/drivers/net/dsa/microchip/ksz_common.c
+++ b/drivers/net/dsa/microchip/ksz_common.c
@@ -361,6 +361,9 @@ int ksz_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
 {
 	struct ksz_device *dev = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
 	/* setup slave port */
 	dev->dev_ops->port_setup(dev, port, false);
 	if (dev->dev_ops->phy_setup)
@@ -378,6 +381,9 @@ void ksz_disable_port(struct dsa_switch *ds, int port)
 {
 	struct ksz_device *dev = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return;
+
 	dev->on_ports &= ~(1 << port);
 	dev->live_ports &= ~(1 << port);
 
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 3181e95586d6..c48e29486b10 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -726,6 +726,9 @@ mt7530_port_enable(struct dsa_switch *ds, int port,
 {
 	struct mt7530_priv *priv = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
 	mutex_lock(&priv->reg_mutex);
 
 	/* Setup the MAC for the user port */
@@ -751,6 +754,9 @@ mt7530_port_disable(struct dsa_switch *ds, int port)
 {
 	struct mt7530_priv *priv = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return;
+
 	mutex_lock(&priv->reg_mutex);
 
 	/* Clear up all port matrix which could be restored in the next
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 9b3ad22a5b98..5e557545df6d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2267,6 +2267,9 @@ static int mv88e6xxx_port_enable(struct dsa_switch *ds, int port,
 	struct mv88e6xxx_chip *chip = ds->priv;
 	int err;
 
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
 	mv88e6xxx_reg_lock(chip);
 
 	err = mv88e6xxx_serdes_power(chip, port, true);
@@ -2283,6 +2286,9 @@ static void mv88e6xxx_port_disable(struct dsa_switch *ds, int port)
 {
 	struct mv88e6xxx_chip *chip = ds->priv;
 
+	if (!dsa_is_user_port(ds, port))
+		return;
+
 	mv88e6xxx_reg_lock(chip);
 
 	if (mv88e6xxx_port_set_state(chip, port, BR_STATE_DISABLED))
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next 1/6] net: dsa: use a single switch statement for port setup
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot
In-Reply-To: <20190818173548.19631-1-vivien.didelot@gmail.com>

It is currently difficult to read the different steps involved in the
setup and teardown of ports in the DSA code. Keep it simple with a
single switch statement for each port type: UNUSED, CPU, DSA, or USER.

Also no need to call devlink_port_unregister from within dsa_port_setup
as this step is inconditionally handled by dsa_port_teardown on error.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
---
 net/dsa/dsa2.c | 87 ++++++++++++++++++++++----------------------------
 1 file changed, 39 insertions(+), 48 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 3abd173ebacb..405552ac4c08 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -254,88 +254,79 @@ static void dsa_tree_teardown_default_cpu(struct dsa_switch_tree *dst)
 
 static int dsa_port_setup(struct dsa_port *dp)
 {
-	enum devlink_port_flavour flavour;
 	struct dsa_switch *ds = dp->ds;
 	struct dsa_switch_tree *dst = ds->dst;
-	int err = 0;
-
-	if (dp->type == DSA_PORT_TYPE_UNUSED)
-		return 0;
-
-	memset(&dp->devlink_port, 0, sizeof(dp->devlink_port));
-	dp->mac = of_get_mac_address(dp->dn);
-
-	switch (dp->type) {
-	case DSA_PORT_TYPE_CPU:
-		flavour = DEVLINK_PORT_FLAVOUR_CPU;
-		break;
-	case DSA_PORT_TYPE_DSA:
-		flavour = DEVLINK_PORT_FLAVOUR_DSA;
-		break;
-	case DSA_PORT_TYPE_USER: /* fall-through */
-	default:
-		flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
-		break;
-	}
-
-	/* dp->index is used now as port_number. However
-	 * CPU and DSA ports should have separate numbering
-	 * independent from front panel port numbers.
-	 */
-	devlink_port_attrs_set(&dp->devlink_port, flavour,
-			       dp->index, false, 0,
-			       (const char *) &dst->index, sizeof(dst->index));
-	err = devlink_port_register(ds->devlink, &dp->devlink_port,
-				    dp->index);
-	if (err)
-		return err;
+	const unsigned char *id = (const unsigned char *)&dst->index;
+	const unsigned char len = sizeof(dst->index);
+	struct devlink_port *dlp = &dp->devlink_port;
+	struct devlink *dl = ds->devlink;
+	int err;
 
 	switch (dp->type) {
 	case DSA_PORT_TYPE_UNUSED:
 		break;
 	case DSA_PORT_TYPE_CPU:
+		memset(dlp, 0, sizeof(*dlp));
+		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_CPU,
+				       dp->index, false, 0, id, len);
+		err = devlink_port_register(dl, dlp, dp->index);
+		if (err)
+			return err;
+
 		err = dsa_port_link_register_of(dp);
 		if (err)
-			dev_err(ds->dev, "failed to setup link for port %d.%d\n",
-				ds->index, dp->index);
+			return err;
 		break;
 	case DSA_PORT_TYPE_DSA:
+		memset(dlp, 0, sizeof(*dlp));
+		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_DSA,
+				       dp->index, false, 0, id, len);
+		err = devlink_port_register(dl, dlp, dp->index);
+		if (err)
+			return err;
+
 		err = dsa_port_link_register_of(dp);
 		if (err)
-			dev_err(ds->dev, "failed to setup link for port %d.%d\n",
-				ds->index, dp->index);
+			return err;
 		break;
 	case DSA_PORT_TYPE_USER:
+		memset(dlp, 0, sizeof(*dlp));
+		devlink_port_attrs_set(dlp, DEVLINK_PORT_FLAVOUR_PHYSICAL,
+				       dp->index, false, 0, id, len);
+		err = devlink_port_register(dl, dlp, dp->index);
+		if (err)
+			return err;
+
+		dp->mac = of_get_mac_address(dp->dn);
 		err = dsa_slave_create(dp);
 		if (err)
-			dev_err(ds->dev, "failed to create slave for port %d.%d\n",
-				ds->index, dp->index);
-		else
-			devlink_port_type_eth_set(&dp->devlink_port, dp->slave);
+			return err;
+
+		devlink_port_type_eth_set(dlp, dp->slave);
 		break;
 	}
 
-	if (err)
-		devlink_port_unregister(&dp->devlink_port);
-
-	return err;
+	return 0;
 }
 
 static void dsa_port_teardown(struct dsa_port *dp)
 {
-	if (dp->type != DSA_PORT_TYPE_UNUSED)
-		devlink_port_unregister(&dp->devlink_port);
+	struct devlink_port *dlp = &dp->devlink_port;
 
 	switch (dp->type) {
 	case DSA_PORT_TYPE_UNUSED:
 		break;
 	case DSA_PORT_TYPE_CPU:
 		dsa_tag_driver_put(dp->tag_ops);
-		/* fall-through */
+		devlink_port_unregister(dlp);
+		dsa_port_link_unregister_of(dp);
+		break;
 	case DSA_PORT_TYPE_DSA:
+		devlink_port_unregister(dlp);
 		dsa_port_link_unregister_of(dp);
 		break;
 	case DSA_PORT_TYPE_USER:
+		devlink_port_unregister(dlp);
 		if (dp->slave) {
 			dsa_slave_destroy(dp->slave);
 			dp->slave = NULL;
-- 
2.22.0


^ permalink raw reply related

* [PATCH net-next 0/6] net: dsa: enable and disable all ports
From: Vivien Didelot @ 2019-08-18 17:35 UTC (permalink / raw)
  To: netdev; +Cc: marek.behun, davem, f.fainelli, andrew, Vivien Didelot

The DSA stack currently calls the .port_enable and .port_disable switch
callbacks for slave ports only. However, it is useful to call them for all
port types. For example this allows some drivers to delay the optimization
of power consumption after the switch is setup. This can also help reducing
the setup code of drivers a bit.

The first DSA core patches enable and disable all ports of a switch, regardless
their type. The last mv88e6xxx patches remove redundant code from the driver
setup and the said callbacks, now that they handle SERDES power for all ports.

Vivien Didelot (6):
  net: dsa: use a single switch statement for port setup
  net: dsa: do not enable or disable non user ports
  net: dsa: enable and disable all ports
  net: dsa: mv88e6xxx: do not change STP state on port disabling
  net: dsa: mv88e6xxx: enable SERDES after setup
  net: dsa: mv88e6xxx: wrap SERDES IRQ in power function

 drivers/net/dsa/b53/b53_common.c       | 10 ++-
 drivers/net/dsa/bcm_sf2.c              |  6 ++
 drivers/net/dsa/lan9303-core.c         |  6 ++
 drivers/net/dsa/lantiq_gswip.c         |  6 ++
 drivers/net/dsa/microchip/ksz_common.c |  6 ++
 drivers/net/dsa/mt7530.c               |  6 ++
 drivers/net/dsa/mv88e6xxx/chip.c       | 64 ++++++-----------
 net/dsa/dsa2.c                         | 98 +++++++++++++-------------
 8 files changed, 112 insertions(+), 90 deletions(-)

-- 
2.22.0

^ permalink raw reply

* Re: [PATCH net-next] net: openvswitch: Set OvS recirc_id from tc chain
From: Paul Blakey @ 2019-08-18 16:00 UTC (permalink / raw)
  To: Pravin B Shelar, netdev, David S. Miller, Justin Pettit,
	Simon Horman, Marcelo Ricardo Leitner, Vlad Buslov, Paul Blakey
  Cc: Jiri Pirko, Roi Dayan, Yossi Kuperman, Rony Efraim, Oz Shlomo

What do you guys say about the following diff on top of the last one?
Use static key, and also have OVS_DP_CMD_SET command probe/enable the feature.

This will allow userspace to probe the feature, and selectivly enable it via the
OVS_DP_CMD_SET command.

Thansk,
Paul.


---
 include/uapi/linux/openvswitch.h |  3 +++
 net/openvswitch/datapath.c       | 29 +++++++++++++++++++++++++----
 net/openvswitch/datapath.h       |  2 ++
 net/openvswitch/flow.c           |  6 ++++--
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index f271f1e..1887a45 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -123,6 +123,9 @@ struct ovs_vport_stats {
 /* Allow datapath to associate multiple Netlink PIDs to each vport */
 #define OVS_DP_F_VPORT_PIDS	(1 << 1)
 
+/* Allow tc offload recirc sharing */
+#define OVS_DP_F_TC_RECIRC_SHARING	(1 << 2)
+
 /* Fixed logical ports. */
 #define OVSP_LOCAL      ((__u32)0)
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 892287d..589b4f1 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1541,10 +1541,27 @@ static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *in
 	dp->user_features = 0;
 }
 
-static void ovs_dp_change(struct datapath *dp, struct nlattr *a[])
+DEFINE_STATIC_KEY_FALSE(tc_recirc_sharing_support);
+
+static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
 {
+	u32 user_features;
+
 	if (a[OVS_DP_ATTR_USER_FEATURES])
-		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+		user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+
+#if !IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
+	if (user_features & OVS_DP_F_TC_RECIRC_SHARING)
+		return -EOPNOTSUPP;
+#endif
+	dp->user_features = user_features;
+
+	if (dp->user_features & OVS_DP_F_TC_RECIRC_SHARING)
+		static_branch_enable(&tc_recirc_sharing_support);
+	else
+		static_branch_disable(&tc_recirc_sharing_support);
+
+	return 0;
 }
 
 static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
@@ -1606,7 +1623,9 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portids = a[OVS_DP_ATTR_UPCALL_PID];
 
-	ovs_dp_change(dp, a);
+	err = ovs_dp_change(dp, a);
+	if (err)
+		goto err_destroy_meters;
 
 	/* So far only local changes have been made, now need the lock. */
 	ovs_lock();
@@ -1732,7 +1751,9 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto err_unlock_free;
 
-	ovs_dp_change(dp, info->attrs);
+	err = ovs_dp_change(dp, info->attrs);
+	if (err)
+		goto err_unlock_free;
 
 	err = ovs_dp_cmd_fill_info(dp, reply, info->snd_portid,
 				   info->snd_seq, 0, OVS_DP_CMD_SET);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 751d34a..81e85dd 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -218,6 +218,8 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
 extern struct notifier_block ovs_dp_device_notifier;
 extern struct genl_family dp_vport_genl_family;
 
+DECLARE_STATIC_KEY_FALSE(tc_recirc_sharing_support);
+
 void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key);
 void ovs_dp_detach_port(struct vport *);
 int ovs_dp_upcall(struct datapath *, struct sk_buff *,
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 0287ead..c0ac7c9 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -853,8 +853,10 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
 	key->mac_proto = res;
 
 #if IS_ENABLED(CONFIG_NET_TC_SKB_EXT)
-	tc_ext = skb_ext_find(skb, TC_SKB_EXT);
-	key->recirc_id = tc_ext ? tc_ext->chain : 0;
+	if (static_branch_unlikely(&tc_recirc_sharing_support)) {
+		tc_ext = skb_ext_find(skb, TC_SKB_EXT);
+		key->recirc_id = tc_ext ? tc_ext->chain : 0;
+	}
 #else
 	key->recirc_id = 0;
 #endif
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH net 1/1] bnx2x: Fix VF's VLAN reconfiguration in reload.
From: Manish Chopra @ 2019-08-18 14:25 UTC (permalink / raw)
  To: davem; +Cc: netdev, aelior, skalluru

Commit 04f05230c5c13 ("bnx2x: Remove configured vlans as
part of unload sequence."), introduced a regression in driver
that as a part of VF's reload flow, VLANs created on the VF
doesn't get re-configured in hardware as vlan metadata/info
was not getting cleared for the VFs which causes vlan PING to stop.

This patch clears the vlan metadata/info so that VLANs gets
re-configured back in the hardware in VF's reload flow and
PING/traffic continues for VLANs created over the VFs.

Fixes: 04f05230c5c13 ("bnx2x: Remove configured vlans as part of unload sequence.")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Sudarsana Kalluru <skalluru@marvell.com>
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |  7 ++++---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |  2 ++
 .../net/ethernet/broadcom/bnx2x/bnx2x_main.c    | 17 ++++++++++++-----
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index e47ea92e2ae3..d10b421ed1f1 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -3057,12 +3057,13 @@ int bnx2x_nic_unload(struct bnx2x *bp, int unload_mode, bool keep_link)
 	/* if VF indicate to PF this function is going down (PF will delete sp
 	 * elements and clear initializations
 	 */
-	if (IS_VF(bp))
+	if (IS_VF(bp)) {
+		bnx2x_clear_vlan_info(bp);
 		bnx2x_vfpf_close_vf(bp);
-	else if (unload_mode != UNLOAD_RECOVERY)
+	} else if (unload_mode != UNLOAD_RECOVERY) {
 		/* if this is a normal/close unload need to clean up chip*/
 		bnx2x_chip_cleanup(bp, unload_mode, keep_link);
-	else {
+	} else {
 		/* Send the UNLOAD_REQUEST to the MCP */
 		bnx2x_send_unload_req(bp, unload_mode);
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
index c2f6e44e9a3f..8b08cb18e363 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h
@@ -425,6 +425,8 @@ void bnx2x_set_reset_global(struct bnx2x *bp);
 void bnx2x_disable_close_the_gate(struct bnx2x *bp);
 int bnx2x_init_hw_func_cnic(struct bnx2x *bp);
 
+void bnx2x_clear_vlan_info(struct bnx2x *bp);
+
 /**
  * bnx2x_sp_event - handle ramrods completion.
  *
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 2cc14db8f0ec..192ff8d5da32 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8482,11 +8482,21 @@ int bnx2x_set_vlan_one(struct bnx2x *bp, u16 vlan,
 	return rc;
 }
 
+void bnx2x_clear_vlan_info(struct bnx2x *bp)
+{
+	struct bnx2x_vlan_entry *vlan;
+
+	/* Mark that hw forgot all entries */
+	list_for_each_entry(vlan, &bp->vlan_reg, link)
+		vlan->hw = false;
+
+	bp->vlan_cnt = 0;
+}
+
 static int bnx2x_del_all_vlans(struct bnx2x *bp)
 {
 	struct bnx2x_vlan_mac_obj *vlan_obj = &bp->sp_objs[0].vlan_obj;
 	unsigned long ramrod_flags = 0, vlan_flags = 0;
-	struct bnx2x_vlan_entry *vlan;
 	int rc;
 
 	__set_bit(RAMROD_COMP_WAIT, &ramrod_flags);
@@ -8495,10 +8505,7 @@ static int bnx2x_del_all_vlans(struct bnx2x *bp)
 	if (rc)
 		return rc;
 
-	/* Mark that hw forgot all entries */
-	list_for_each_entry(vlan, &bp->vlan_reg, link)
-		vlan->hw = false;
-	bp->vlan_cnt = 0;
+	bnx2x_clear_vlan_info(bp);
 
 	return 0;
 }
-- 
2.18.1


^ permalink raw reply related

* Re: kernel BUG at include/linux/skbuff.h:LINE! (2)
From: Dmitry Vyukov @ 2019-08-18 14:13 UTC (permalink / raw)
  To: Xin Long
  Cc: syzbot, davem, LKML, linux-sctp, Marcelo Ricardo Leitner,
	network dev, Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <CADvbK_e+em+LiQOttfA9nckA4EPFuW_Q-cBmXx3S5pw5X+tQfw@mail.gmail.com>

On Sun, Aug 18, 2019 at 7:07 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> On Sat, Aug 17, 2019 at 2:38 AM syzbot
> <syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    459c5fb4 Merge branch 'mscc-PTP-support'
> > git tree:       net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13f2d33c600000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=eb349eeee854e389c36d
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=111849e2600000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1442c25a600000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
> >
> > ------------[ cut here ]------------
> > kernel BUG at include/linux/skbuff.h:2225!
> > invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 0 PID: 9030 Comm: syz-executor649 Not tainted 5.3.0-rc3+ #134
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> > RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> > RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> > RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> > Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> > 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> > 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> > RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> > RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> > RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> > RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> > R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> > R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> > FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
> > Call Trace:
> >   sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
> >   sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
> >   sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
> >   sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
> >   sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
> >   ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
> >   ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
> >   NF_HOOK include/linux/netfilter.h:305 [inline]
> >   NF_HOOK include/linux/netfilter.h:299 [inline]
> >   ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
> >   dst_input include/net/dst.h:442 [inline]
> >   ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
> Looks skb_list_del_init() should be called in ip6_sublist_rcv_finish,
> as does in ip_sublist_rcv_finish().

This was recently introduced, right? Only in net-next and linux-next.
Otherwise, is it a remote DoS? If so and if it's present in any
releases, may need a CVE.

^ permalink raw reply

* Re: kernel BUG at include/linux/skbuff.h:LINE! (2)
From: Xin Long @ 2019-08-18 14:06 UTC (permalink / raw)
  To: syzbot
  Cc: davem, LKML, linux-sctp, Marcelo Ricardo Leitner, network dev,
	Neil Horman, syzkaller-bugs, Vlad Yasevich
In-Reply-To: <0000000000008182a50590404a02@google.com>

On Sat, Aug 17, 2019 at 2:38 AM syzbot
<syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    459c5fb4 Merge branch 'mscc-PTP-support'
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=13f2d33c600000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=eb349eeee854e389c36d
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=111849e2600000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1442c25a600000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
>
> ------------[ cut here ]------------
> kernel BUG at include/linux/skbuff.h:2225!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9030 Comm: syz-executor649 Not tainted 5.3.0-rc3+ #134
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
> Call Trace:
>   sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
>   sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
>   sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
>   sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
>   sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
>   ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
>   ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
>   NF_HOOK include/linux/netfilter.h:305 [inline]
>   NF_HOOK include/linux/netfilter.h:299 [inline]
>   ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
>   dst_input include/net/dst.h:442 [inline]
>   ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
Looks skb_list_del_init() should be called in ip6_sublist_rcv_finish,
as does in ip_sublist_rcv_finish().

>   ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
>   ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
>   ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
>   __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
>   __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
>   __netif_receive_skb_list net/core/dev.c:5149 [inline]
>   netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
>   gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
>   gro_normal_list net/core/dev.c:5755 [inline]
>   gro_normal_one net/core/dev.c:5769 [inline]
>   napi_frags_finish net/core/dev.c:5782 [inline]
>   napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
>   tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
>   tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
>   call_write_iter include/linux/fs.h:1870 [inline]
>   do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
>   do_iter_write fs/read_write.c:970 [inline]
>   do_iter_write+0x184/0x610 fs/read_write.c:951
>   vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
>   do_writev+0x15b/0x330 fs/read_write.c:1058
>   __do_sys_writev fs/read_write.c:1131 [inline]
>   __se_sys_writev fs/read_write.c:1128 [inline]
>   __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
>   do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x441b10
> Code: 05 48 3d 01 f0 ff ff 0f 83 5d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
> 00 66 90 83 3d 01 95 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 34 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
> RSP: 002b:00007ffe63706b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> RAX: ffffffffffffffda RBX: 00007ffe63706ba0 RCX: 0000000000441b10
> RDX: 0000000000000001 RSI: 00007ffe63706bd0 RDI: 00000000000000f0
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000004
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000122cb
> R13: 0000000000402960 R14: 0000000000000000 R15: 0000000000000000
> Modules linked in:
> ---[ end trace c37566c1c02066db ]---
> RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> RIP: 0010:__skb_pull include/linux/skbuff.h:2222 [inline]
> RIP: 0010:skb_pull_inline include/linux/skbuff.h:2231 [inline]
> RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> Code: 9d c8 00 00 00 49 89 dc 49 89 9d c8 00 00 00 e8 9c e5 dd fb 4c 89 e0
> 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 eb ea e8 86 e5 dd fb <0f> 0b e8 df 13
> 18 fc e9 44 ff ff ff e8 d5 13 18 fc eb 8a e8 ee 13
> RSP: 0018:ffff88808ac96e10 EFLAGS: 00010293
> RAX: ffff88809c546000 RBX: 0000000000000004 RCX: ffffffff8594a3a6
> RDX: 0000000000000000 RSI: ffffffff8594a3fa RDI: 0000000000000004
> RBP: ffff88808ac96e30 R08: ffff88809c546000 R09: fffffbfff14a8f4f
> R10: fffffbfff14a8f4e R11: ffffffff8a547a77 R12: 0000000095e28bcc
> R13: ffff88808ac97478 R14: 00000000ffff8880 R15: ffff88808ac97478
> FS:  0000555556549880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000100 CR3: 0000000089c3c000 CR4: 00000000001406f0
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH 1/3] nvmem: mxs-ocotp: update MODULE_AUTHOR() email address
From: Srinivas Kandagatla @ 2019-08-18  9:08 UTC (permalink / raw)
  To: Stefan Wahren, Jean Delvare, Guenter Roeck, David S. Miller,
	Shawn Guo, Sascha Hauer, Pengutronix Kernel Team, Fabio Estevam,
	NXP Linux Team
  Cc: linux-hwmon, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <1565720249-6549-1-git-send-email-wahrenst@gmx.net>



On 13/08/2019 19:17, Stefan Wahren wrote:
> The email address listed in MODULE_AUTHOR() will be disabled in the
> near future. Replace it with my private one.
> 
> Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
> =2D--
>   drivers/nvmem/mxs-ocotp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 

Applied thanks.

--srini

^ permalink raw reply

* Question on xen-netfront code to fix a potential ring buffer corruption
From: Dongli Zhang @ 2019-08-18  8:31 UTC (permalink / raw)
  To: xen-devel; +Cc: netdev, jgross, Joe Jin

Hi,

Would you please help confirm why the condition at line 908 is ">="?

In my opinion, we would only hit "skb_shinfo(skb)->nr_frag == MAX_SKB_FRAGS" at
line 908.

890 static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
891                                   struct sk_buff *skb,
892                                   struct sk_buff_head *list)
893 {
894         RING_IDX cons = queue->rx.rsp_cons;
895         struct sk_buff *nskb;
896 
897         while ((nskb = __skb_dequeue(list))) {
898                 struct xen_netif_rx_response *rx =
899                         RING_GET_RESPONSE(&queue->rx, ++cons);
900                 skb_frag_t *nfrag = &skb_shinfo(nskb)->frags[0];
901 
902                 if (skb_shinfo(skb)->nr_frags == MAX_SKB_FRAGS) {
903                         unsigned int pull_to = NETFRONT_SKB_CB(skb)->pull_to;
904 
905                         BUG_ON(pull_to < skb_headlen(skb));
906                         __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
907                 }
908                 if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
909                         queue->rx.rsp_cons = ++cons;
910                         kfree_skb(nskb);
911                         return ~0U;
912                 }
913 
914                 skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
915                                 skb_frag_page(nfrag),
916                                 rx->offset, rx->status, PAGE_SIZE);
917 
918                 skb_shinfo(nskb)->nr_frags = 0;
919                 kfree_skb(nskb);
920         }
921 
922         return cons;
923 }


The reason that I ask about this is because I am considering below patch to
avoid a potential xen-netfront ring buffer corruption.

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 8d33970..48a2162 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -906,7 +906,7 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
                        __pskb_pull_tail(skb, pull_to - skb_headlen(skb));
                }
                if (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)) {
-                       queue->rx.rsp_cons = ++cons;
+                       queue->rx.rsp_cons = cons + skb_queue_len(list) + 1;
                        kfree_skb(nskb);
                        return ~0U;
                }


If there is skb left in list when we return ~0U, queue->rx.rsp_cons may be set
incorrectly.

While I am trying to make up a case that would hit the corruption, I could not
explain why (unlikely(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS)), but not
just "==". Perhaps __pskb_pull_tail() may fail although pull_to is less than
skb_headlen(skb).

Thank you very much!

Dongli Zhang

^ permalink raw reply related

* Re: [PATCH v2] virtio-net: lower min ring num_free for efficiency
From: Michael S. Tsirkin @ 2019-08-18  7:10 UTC (permalink / raw)
  To: ? jiang
  Cc: jasowang@redhat.com, davem@davemloft.net, ast@kernel.org,
	daniel@iogearbox.net, jakub.kicinski@netronome.com,
	hawk@kernel.org, john.fastabend@gmail.com, kafai@fb.com,
	songliubraving@fb.com, yhs@fb.com,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, xdp-newbies@vger.kernel.org,
	bpf@vger.kernel.org, jiangran.jr@alibaba-inc.com
In-Reply-To: <BYAPR14MB32058F4B2AD162F5421BB9B4A6AC0@BYAPR14MB3205.namprd14.prod.outlook.com>

On Thu, Aug 15, 2019 at 09:42:40AM +0000, ? jiang wrote:
> This change lowers ring buffer reclaim threshold from 1/2*queue to budget
> for better performance. According to our test with qemu + dpdk, packet
> dropping happens when the guest is not able to provide free buffer in
> avail ring timely with default 1/2*queue. The value in the patch has been
> tested and does show better performance.
> 
> Test setup: iperf3 to generate packets to guest (total 30mins, pps 400k, UDP)
> avg packets drop before: 2842
> avg packets drop after: 360(-87.3%)
> 
> Signed-off-by: jiangkidd <jiangkidd@hotmail.com>

To add to that:

Further, current code suffers from a starvation problem: the amount of
work done by try_fill_recv is not bounded by the budget parameter, thus
(with large queues) once in a while userspace gets blocked for a long
time while queue is being refilled. Trigger refills earlier to make sure
the amount of work to do is limited.

With this addition to the log:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0d4115c9e20b..bc08be7925eb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  		}
>  	}
>  
> -	if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) {
> +	if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) {
>  		if (!try_fill_recv(vi, rq, GFP_ATOMIC))
>  			schedule_delayed_work(&vi->refill, 0);
>  	}
> -- 
> 2.11.0

^ permalink raw reply

* Re: [PATCH bpf-next v4 0/4] bpf: support cloning sk storage on accept()
From: Daniel Borkmann @ 2019-08-17 21:32 UTC (permalink / raw)
  To: Stanislav Fomichev, netdev, bpf
  Cc: davem, ast, Martin KaFai Lau, Yonghong Song
In-Reply-To: <20190814173751.31806-1-sdf@google.com>

On 8/14/19 7:37 PM, Stanislav Fomichev wrote:
> Currently there is no way to propagate sk storage from the listener
> socket to a newly accepted one. Consider the following use case:
> 
>          fd = socket();
>          setsockopt(fd, SOL_IP, IP_TOS,...);
>          /* ^^^ setsockopt BPF program triggers here and saves something
>           * into sk storage of the listener.
>           */
>          listen(fd, ...);
>          while (client = accept(fd)) {
>                  /* At this point all association between listener
>                   * socket and newly accepted one is gone. New
>                   * socket will not have any sk storage attached.
>                   */
>          }
> 
> Let's add new BPF_F_CLONE flag that can be specified when creating
> a socket storage map. This new flag indicates that map contents
> should be cloned when the socket is cloned.
> 
> v4:
> * drop 'goto err' in bpf_sk_storage_clone (Yonghong Song)
> * add comment about race with bpf_sk_storage_map_free to the
>    bpf_sk_storage_clone side as well (Daniel Borkmann)
> 
> v3:
> * make sure BPF_F_NO_PREALLOC is always present when creating
>    a map (Martin KaFai Lau)
> * don't call bpf_sk_storage_free explicitly, rely on
>    sk_free_unlock_clone to do the cleanup (Martin KaFai Lau)
> 
> v2:
> * remove spinlocks around selem_link_map/sk (Martin KaFai Lau)
> * BPF_F_CLONE on a map, not selem (Martin KaFai Lau)
> * hold a map while cloning (Martin KaFai Lau)
> * use BTF maps in selftests (Yonghong Song)
> * do proper cleanup selftests; don't call close(-1) (Yonghong Song)
> * export bpf_map_inc_not_zero
> 
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
> 
> Stanislav Fomichev (4):
>    bpf: export bpf_map_inc_not_zero
>    bpf: support cloning sk storage on accept()
>    bpf: sync bpf.h to tools/
>    selftests/bpf: add sockopt clone/inheritance test
> 
>   include/linux/bpf.h                           |   2 +
>   include/net/bpf_sk_storage.h                  |  10 +
>   include/uapi/linux/bpf.h                      |   3 +
>   kernel/bpf/syscall.c                          |  16 +-
>   net/core/bpf_sk_storage.c                     | 104 ++++++-
>   net/core/sock.c                               |   9 +-
>   tools/include/uapi/linux/bpf.h                |   3 +
>   tools/testing/selftests/bpf/.gitignore        |   1 +
>   tools/testing/selftests/bpf/Makefile          |   3 +-
>   .../selftests/bpf/progs/sockopt_inherit.c     |  97 +++++++
>   .../selftests/bpf/test_sockopt_inherit.c      | 253 ++++++++++++++++++
>   11 files changed, 491 insertions(+), 10 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
>   create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c
> 

Applied, thanks!

^ permalink raw reply

* Re: [bpf-next,v2] selftests/bpf: fix race in test_tcp_rtt test
From: Daniel Borkmann @ 2019-08-17 21:31 UTC (permalink / raw)
  To: Petar Penkov, netdev, bpf; +Cc: davem, ast, sdf, Petar Penkov
In-Reply-To: <20190816170825.22500-1-ppenkov.kernel@gmail.com>

On 8/16/19 7:08 PM, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> There is a race in this test between receiving the ACK for the
> single-byte packet sent in the test, and reading the values from the
> map.
> 
> This patch fixes this by having the client wait until there are no more
> unacknowledged packets.
> 
> Before:
> for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> < trimmed error messages >
> 993
> 
> After:
> for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
> done | grep -c PASSED
> 10000
> 
> Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
> Signed-off-by: Petar Penkov <ppenkov@google.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Daniel Borkmann @ 2019-08-17 21:30 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: andrii.nakryiko, kernel-team, Michael Holzheu, Naveen N . Rao,
	David S . Miller, Michal Rostecki, John Fastabend, Sargun Dhillon
In-Reply-To: <20190816054543.2215626-1-andriin@fb.com>

On 8/16/19 7:45 AM, Andrii Nakryiko wrote:
> bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> definitions essential to almost every BPF program. Which makes them
> useful not just for selftests. To be able to expose them as part of
> libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> BSD-2-Clause. This patch updates licensing of those two files.
> 
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Hechao Li <hechaol@fb.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Acked-by: Andrey Ignatov <rdna@fb.com>
> Acked-by: Yonghong Song <yhs@fb.com>
> Acked-by: Lawrence Brakmo <brakmo@fb.com>
> Acked-by: Adam Barth <arb@fb.com>
> Acked-by: Roman Gushchin <guro@fb.com>
> Acked-by: Josef Bacik <jbacik@fb.com>
> Acked-by: Joe Stringer <joe@wand.net.nz>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Acked-by: David Ahern <dsahern@gmail.com>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Acked-by: Lorenz Bauer <lmb@cloudflare.com>
> Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Petar Penkov <ppenkov@google.com>
> Acked-by: Teng Qin <palmtenor@gmail.com>
> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Michal Rostecki <mrostecki@opensuse.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH bpf-next v2] net: Don't call XDP_SETUP_PROG when nothing is changed
From: Daniel Borkmann @ 2019-08-17 21:29 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Alexei Starovoitov, Jakub Kicinski
  Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, David S. Miller,
	Björn Töpel, Saeed Mahameed, Jesper Dangaard Brouer,
	John Fastabend, Martin KaFai Lau, Song Liu, Yonghong Song
In-Reply-To: <20190814143352.3759-1-maximmi@mellanox.com>

On 8/14/19 4:34 PM, Maxim Mikityanskiy wrote:
> Don't uninstall an XDP program when none is installed, and don't install
> an XDP program that has the same ID as the one already installed.
> 
> dev_change_xdp_fd doesn't perform any checks in case it uninstalls an
> XDP program. It means that the driver's ndo_bpf can be called with
> XDP_SETUP_PROG asking to set it to NULL even if it's already NULL. This
> case happens if the user runs `ip link set eth0 xdp off` when there is
> no XDP program attached.
> 
> The symmetrical case is possible when the user tries to set the program
> that is already set.
> 
> The drivers typically perform some heavy operations on XDP_SETUP_PROG,
> so they all have to handle these cases internally to return early if
> they happen. This patch puts this check into the kernel code, so that
> all drivers will benefit from it.
> 
> Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>

Applied, thanks!

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox