* Re: [PATCH v2 2/5] spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
From: Mark Brown @ 2019-08-23 10:59 UTC (permalink / raw)
To: Vladimir Oltean; +Cc: linux-spi, lkml, devicetree, netdev
In-Reply-To: <20190823105044.GO23391@sirena.co.uk>
[-- Attachment #1: Type: text/plain, Size: 786 bytes --]
On Fri, Aug 23, 2019 at 11:50:44AM +0100, Mark Brown wrote:
> On Fri, Aug 23, 2019 at 01:30:27PM +0300, Vladimir Oltean wrote:
> > Did you see this?
> > https://lkml.org/lkml/2019/8/22/1542
> I'm not online enough to readily follow that link right now, I
> did apply another patch for a similar issue though. If that's
> a different version of the same change please don't do that,
> sending multiple conflicting versions of the same thing creates
> conflicts and makes everything harder to work with.
Oh, I guess this was due to there being an existing refactoring
in -next that meant the fix wouldn't apply directly. I sorted
that out now I think, but in general the same thing applies -
it's better to put fixes before anything else in the series,
it'll flag up when reviewing.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Applied "spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours" to the spi tree
From: Mark Brown @ 2019-08-23 11:02 UTC (permalink / raw)
To: Vladimir Oltean
Cc: broonie, devicetree, linux-kernel, linux-spi, Mark Brown, netdev
In-Reply-To: <20190822211514.19288-3-olteanv@gmail.com>
The patch
spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
has been applied to the spi tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-5.4
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From 37b4100180641968056cb4e034cebc38338e8652 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Fri, 23 Aug 2019 00:15:11 +0300
Subject: [PATCH] spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not
ours
The DSPI interrupt can be shared between two controllers at least on the
LX2160A. In that case, the driver for one controller might misbehave and
consume the other's interrupt. Fix this by actually checking if any of
the bits in the status register have been asserted.
Fixes: 13aed2392741 ("spi: spi-fsl-dspi: use IRQF_SHARED mode to request IRQ")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190822211514.19288-3-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
drivers/spi/spi-fsl-dspi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index c90db7db4121..6ef2279a3699 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -659,7 +659,7 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
regmap_write(dspi->regmap, SPI_SR, spi_sr);
if (!(spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)))
- return IRQ_HANDLED;
+ return IRQ_NONE;
/* Get transfer counter (in number of SPI transfers). It was
* reset to 0 when transfer(s) were started.
--
2.20.1
^ permalink raw reply related
* Applied "spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing" to the spi tree
From: Mark Brown @ 2019-08-23 11:02 UTC (permalink / raw)
To: Vladimir Oltean
Cc: broonie, devicetree, linux-kernel, linux-spi, Mark Brown, netdev
In-Reply-To: <20190822211514.19288-5-olteanv@gmail.com>
The patch
spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
has been applied to the spi tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-5.4
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From c55be305915974db160ce6472722ff74f45b8d4e Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Fri, 23 Aug 2019 00:15:13 +0300
Subject: [PATCH] spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is
missing
On platforms like LS1021A which use TCFQ mode, an interrupt needs to be
processed after each byte is TXed/RXed. I tried to make the DSPI
implementation on this SoC operate in other, more efficient modes (EOQ,
DMA) but it looks like it simply isn't possible.
Therefore allow the driver to operate in poll mode, to ease a bit of
this absurd amount of IRQ load generated in TCFQ mode. Doing so reduces
both the net time it takes to transmit a SPI message, as well as the
inter-frame jitter that occurs while doing so.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190822211514.19288-5-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
drivers/spi/spi-fsl-dspi.c | 87 ++++++++++++++++++++++++++++----------
1 file changed, 64 insertions(+), 23 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 6d2c7984ab0e..77db43f1290f 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -647,19 +647,12 @@ static void dspi_eoq_read(struct fsl_dspi *dspi)
dspi_push_rx(dspi, fifo_read(dspi));
}
-static irqreturn_t dspi_interrupt(int irq, void *dev_id)
+static int dspi_rxtx(struct fsl_dspi *dspi)
{
- struct fsl_dspi *dspi = (struct fsl_dspi *)dev_id;
struct spi_message *msg = dspi->cur_msg;
enum dspi_trans_mode trans_mode;
- u32 spi_sr, spi_tcr;
u16 spi_tcnt;
-
- regmap_read(dspi->regmap, SPI_SR, &spi_sr);
- regmap_write(dspi->regmap, SPI_SR, spi_sr);
-
- if (!(spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)))
- return IRQ_NONE;
+ u32 spi_tcr;
/* Get transfer counter (in number of SPI transfers). It was
* reset to 0 when transfer(s) were started.
@@ -675,17 +668,55 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
else if (trans_mode == DSPI_TCFQ_MODE)
dspi_tcfq_read(dspi);
- if (!dspi->len) {
- dspi->waitflags = 1;
- wake_up_interruptible(&dspi->waitq);
- return IRQ_HANDLED;
- }
+ if (!dspi->len)
+ /* Success! */
+ return 0;
if (trans_mode == DSPI_EOQ_MODE)
dspi_eoq_write(dspi);
else if (trans_mode == DSPI_TCFQ_MODE)
dspi_tcfq_write(dspi);
+ return -EINPROGRESS;
+}
+
+static int dspi_poll(struct fsl_dspi *dspi)
+{
+ int tries = 1000;
+ u32 spi_sr;
+
+ do {
+ regmap_read(dspi->regmap, SPI_SR, &spi_sr);
+ regmap_write(dspi->regmap, SPI_SR, spi_sr);
+
+ if (spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF))
+ break;
+ } while (--tries);
+
+ if (!tries)
+ return -ETIMEDOUT;
+
+ return dspi_rxtx(dspi);
+}
+
+static irqreturn_t dspi_interrupt(int irq, void *dev_id)
+{
+ struct fsl_dspi *dspi = (struct fsl_dspi *)dev_id;
+ u32 spi_sr;
+
+ regmap_read(dspi->regmap, SPI_SR, &spi_sr);
+ regmap_write(dspi->regmap, SPI_SR, spi_sr);
+
+ if (!(spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)))
+ return IRQ_NONE;
+
+ dspi_rxtx(dspi);
+
+ if (!dspi->len) {
+ dspi->waitflags = 1;
+ wake_up_interruptible(&dspi->waitq);
+ }
+
return IRQ_HANDLED;
}
@@ -773,13 +804,18 @@ static int dspi_transfer_one_message(struct spi_controller *ctlr,
goto out;
}
- if (trans_mode != DSPI_DMA_MODE) {
- if (wait_event_interruptible(dspi->waitq,
- dspi->waitflags))
- dev_err(&dspi->pdev->dev,
- "wait transfer complete fail!\n");
+ if (!dspi->irq) {
+ do {
+ status = dspi_poll(dspi);
+ } while (status == -EINPROGRESS);
+ } else if (trans_mode != DSPI_DMA_MODE) {
+ status = wait_event_interruptible(dspi->waitq,
+ dspi->waitflags);
dspi->waitflags = 0;
}
+ if (status)
+ dev_err(&dspi->pdev->dev,
+ "Waiting for transfer to complete failed!\n");
if (transfer->delay_usecs)
udelay(transfer->delay_usecs);
@@ -1079,10 +1115,13 @@ static int dspi_probe(struct platform_device *pdev)
goto out_ctlr_put;
dspi_init(dspi);
+
dspi->irq = platform_get_irq(pdev, 0);
- if (dspi->irq < 0) {
- ret = dspi->irq;
- goto out_clk_put;
+ if (dspi->irq <= 0) {
+ dev_info(&pdev->dev,
+ "can't get platform irq, using poll mode\n");
+ dspi->irq = 0;
+ goto poll_mode;
}
ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt,
@@ -1092,6 +1131,9 @@ static int dspi_probe(struct platform_device *pdev)
goto out_clk_put;
}
+ init_waitqueue_head(&dspi->waitq);
+
+poll_mode:
if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
ret = dspi_request_dma(dspi, res->start);
if (ret < 0) {
@@ -1103,7 +1145,6 @@ static int dspi_probe(struct platform_device *pdev)
ctlr->max_speed_hz =
clk_get_rate(dspi->clk) / dspi->devtype_data->max_clock_factor;
- init_waitqueue_head(&dspi->waitq);
platform_set_drvdata(pdev, ctlr);
ret = spi_register_controller(ctlr);
--
2.20.1
^ permalink raw reply related
* Applied "spi: spi-fsl-dspi: Remove impossible to reach error check" to the spi tree
From: Mark Brown @ 2019-08-23 11:02 UTC (permalink / raw)
To: Vladimir Oltean
Cc: broonie, devicetree, linux-kernel, linux-spi, Mark Brown, netdev
In-Reply-To: <20190822211514.19288-4-olteanv@gmail.com>
The patch
spi: spi-fsl-dspi: Remove impossible to reach error check
has been applied to the spi tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-5.4
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From 1eaeba70738e723be1e5787bdfd9a30f7471d730 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Fri, 23 Aug 2019 00:15:12 +0300
Subject: [PATCH] spi: spi-fsl-dspi: Remove impossible to reach error check
dspi->devtype_data is under the total control of the driver. Therefore,
a bad value is a driver bug and checking it at runtime (and during an
ISR, at that!) is pointless.
The second "else if" check is only for clarity (instead of a broader
"else") in case other transfer modes are added in the future. But the
printing is dead code and can be removed.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190822211514.19288-4-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
drivers/spi/spi-fsl-dspi.c | 24 ++++--------------------
1 file changed, 4 insertions(+), 20 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 6ef2279a3699..6d2c7984ab0e 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -670,18 +670,10 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
msg->actual_length += spi_tcnt * dspi->bytes_per_word;
trans_mode = dspi->devtype_data->trans_mode;
- switch (trans_mode) {
- case DSPI_EOQ_MODE:
+ if (trans_mode == DSPI_EOQ_MODE)
dspi_eoq_read(dspi);
- break;
- case DSPI_TCFQ_MODE:
+ else if (trans_mode == DSPI_TCFQ_MODE)
dspi_tcfq_read(dspi);
- break;
- default:
- dev_err(&dspi->pdev->dev, "unsupported trans_mode %u\n",
- trans_mode);
- return IRQ_HANDLED;
- }
if (!dspi->len) {
dspi->waitflags = 1;
@@ -689,18 +681,10 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
}
- switch (trans_mode) {
- case DSPI_EOQ_MODE:
+ if (trans_mode == DSPI_EOQ_MODE)
dspi_eoq_write(dspi);
- break;
- case DSPI_TCFQ_MODE:
+ else if (trans_mode == DSPI_TCFQ_MODE)
dspi_tcfq_write(dspi);
- break;
- default:
- dev_err(&dspi->pdev->dev,
- "unsupported trans_mode %u\n",
- trans_mode);
- }
return IRQ_HANDLED;
}
--
2.20.1
^ permalink raw reply related
* Applied "spi: spi-fsl-dspi: Reduce indentation level in dspi_interrupt" to the spi tree
From: Mark Brown @ 2019-08-23 11:02 UTC (permalink / raw)
To: Vladimir Oltean
Cc: broonie, devicetree, linux-kernel, linux-spi, Mark Brown, netdev
In-Reply-To: <20190822211514.19288-2-olteanv@gmail.com>
The patch
spi: spi-fsl-dspi: Reduce indentation level in dspi_interrupt
has been applied to the spi tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-5.4
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From 12fb61a973935c63f2580b3b053017cc14b51f42 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Fri, 23 Aug 2019 00:15:10 +0300
Subject: [PATCH] spi: spi-fsl-dspi: Reduce indentation level in dspi_interrupt
If the entire function depends on the SPI status register having the
interrupt bits asserted, then just check it and exit early if those bits
aren't set (such as in the case of the shared IRQ being triggered for
the other peripheral). Cosmetic patch.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190822211514.19288-2-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
drivers/spi/spi-fsl-dspi.c | 79 +++++++++++++++++++-------------------
1 file changed, 40 insertions(+), 39 deletions(-)
diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 790cb02fc181..c90db7db4121 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -658,47 +658,48 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
regmap_read(dspi->regmap, SPI_SR, &spi_sr);
regmap_write(dspi->regmap, SPI_SR, spi_sr);
+ if (!(spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)))
+ return IRQ_HANDLED;
+
+ /* Get transfer counter (in number of SPI transfers). It was
+ * reset to 0 when transfer(s) were started.
+ */
+ regmap_read(dspi->regmap, SPI_TCR, &spi_tcr);
+ spi_tcnt = SPI_TCR_GET_TCNT(spi_tcr);
+ /* Update total number of bytes that were transferred */
+ msg->actual_length += spi_tcnt * dspi->bytes_per_word;
+
+ trans_mode = dspi->devtype_data->trans_mode;
+ switch (trans_mode) {
+ case DSPI_EOQ_MODE:
+ dspi_eoq_read(dspi);
+ break;
+ case DSPI_TCFQ_MODE:
+ dspi_tcfq_read(dspi);
+ break;
+ default:
+ dev_err(&dspi->pdev->dev, "unsupported trans_mode %u\n",
+ trans_mode);
+ return IRQ_HANDLED;
+ }
- if (spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)) {
- /* Get transfer counter (in number of SPI transfers). It was
- * reset to 0 when transfer(s) were started.
- */
- regmap_read(dspi->regmap, SPI_TCR, &spi_tcr);
- spi_tcnt = SPI_TCR_GET_TCNT(spi_tcr);
- /* Update total number of bytes that were transferred */
- msg->actual_length += spi_tcnt * dspi->bytes_per_word;
-
- trans_mode = dspi->devtype_data->trans_mode;
- switch (trans_mode) {
- case DSPI_EOQ_MODE:
- dspi_eoq_read(dspi);
- break;
- case DSPI_TCFQ_MODE:
- dspi_tcfq_read(dspi);
- break;
- default:
- dev_err(&dspi->pdev->dev, "unsupported trans_mode %u\n",
- trans_mode);
- return IRQ_HANDLED;
- }
+ if (!dspi->len) {
+ dspi->waitflags = 1;
+ wake_up_interruptible(&dspi->waitq);
+ return IRQ_HANDLED;
+ }
- if (!dspi->len) {
- dspi->waitflags = 1;
- wake_up_interruptible(&dspi->waitq);
- } else {
- switch (trans_mode) {
- case DSPI_EOQ_MODE:
- dspi_eoq_write(dspi);
- break;
- case DSPI_TCFQ_MODE:
- dspi_tcfq_write(dspi);
- break;
- default:
- dev_err(&dspi->pdev->dev,
- "unsupported trans_mode %u\n",
- trans_mode);
- }
- }
+ switch (trans_mode) {
+ case DSPI_EOQ_MODE:
+ dspi_eoq_write(dspi);
+ break;
+ case DSPI_TCFQ_MODE:
+ dspi_tcfq_write(dspi);
+ break;
+ default:
+ dev_err(&dspi->pdev->dev,
+ "unsupported trans_mode %u\n",
+ trans_mode);
}
return IRQ_HANDLED;
--
2.20.1
^ permalink raw reply related
* Re: [PATCH][next] mac80211: minstrel_ht: fix infinite loop because supported is not being shifted
From: Felix Fietkau @ 2019-08-23 11:07 UTC (permalink / raw)
To: Colin King, Johannes Berg, David S . Miller, linux-wireless,
netdev
Cc: kernel-janitors, linux-kernel
In-Reply-To: <20190822122034.28664-1-colin.king@canonical.com>
On 2019-08-22 14:20, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> Currently the for-loop will spin forever if variable supported is
> non-zero because supported is never changed. Fix this by adding in
> the missing right shift of supported.
>
> Addresses-Coverity: ("Infinite loop")
> Fixes: 48cb39522a9d ("mac80211: minstrel_ht: improve rate probing for devices with static fallback")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Thanks,
- Felix
^ permalink raw reply
* Re: [PATCH] wimax/i2400m: fix calculation of index, remove sizeof
From: Dan Carpenter @ 2019-08-23 11:23 UTC (permalink / raw)
To: Colin King
Cc: Inaky Perez-Gonzalez, linux-wimax, David S . Miller, netdev,
kernel-janitors, linux-kernel
In-Reply-To: <20190823085230.6225-1-colin.king@canonical.com>
On Fri, Aug 23, 2019 at 09:52:30AM +0100, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> The subtraction of the two pointers is automatically scaled by the
> size of the size of the object the pointers point to, so the division
> by sizeof(*i2400m->barker) is incorrect. Fix this by removing the
> division. Also make index an unsigned int to clean up a checkpatch
> warning.
>
> Addresses-Coverity: ("Extra sizeof expression")
> Fixes: aba3792ac2d7 ("wimax/i2400m: rework bootrom initialization to be more flexible")
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
> drivers/net/wimax/i2400m/fw.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/wimax/i2400m/fw.c b/drivers/net/wimax/i2400m/fw.c
> index 489cba9b284d..599a703af6eb 100644
> --- a/drivers/net/wimax/i2400m/fw.c
> +++ b/drivers/net/wimax/i2400m/fw.c
> @@ -399,8 +399,7 @@ int i2400m_is_boot_barker(struct i2400m *i2400m,
> * associated with the device. */
> if (i2400m->barker
> && !memcmp(buf, i2400m->barker, sizeof(i2400m->barker->data))) {
> - unsigned index = (i2400m->barker - i2400m_barker_db)
> - / sizeof(*i2400m->barker);
> + unsigned int index = i2400m->barker - i2400m_barker_db;
> d_printf(2, dev, "boot barker cache-confirmed #%u/%08x\n",
> index, le32_to_cpu(i2400m->barker->data[0]));
It's only used for this debug output. You may as well just delete it.
> return 0;
regards,
dan carpenter
^ permalink raw reply
* Re: [PATCH] wimax/i2400m: fix calculation of index, remove sizeof
From: Colin Ian King @ 2019-08-23 11:27 UTC (permalink / raw)
To: Dan Carpenter
Cc: Inaky Perez-Gonzalez, linux-wimax, David S . Miller, netdev,
kernel-janitors, linux-kernel
In-Reply-To: <20190823112337.GB23408@kadam>
On 23/08/2019 12:23, Dan Carpenter wrote:
> On Fri, Aug 23, 2019 at 09:52:30AM +0100, Colin King wrote:
>> From: Colin Ian King <colin.king@canonical.com>
>>
>> The subtraction of the two pointers is automatically scaled by the
>> size of the size of the object the pointers point to, so the division
>> by sizeof(*i2400m->barker) is incorrect. Fix this by removing the
>> division. Also make index an unsigned int to clean up a checkpatch
>> warning.
>>
>> Addresses-Coverity: ("Extra sizeof expression")
>> Fixes: aba3792ac2d7 ("wimax/i2400m: rework bootrom initialization to be more flexible")
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
>> ---
>> drivers/net/wimax/i2400m/fw.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/wimax/i2400m/fw.c b/drivers/net/wimax/i2400m/fw.c
>> index 489cba9b284d..599a703af6eb 100644
>> --- a/drivers/net/wimax/i2400m/fw.c
>> +++ b/drivers/net/wimax/i2400m/fw.c
>> @@ -399,8 +399,7 @@ int i2400m_is_boot_barker(struct i2400m *i2400m,
>> * associated with the device. */
>> if (i2400m->barker
>> && !memcmp(buf, i2400m->barker, sizeof(i2400m->barker->data))) {
>> - unsigned index = (i2400m->barker - i2400m_barker_db)
>> - / sizeof(*i2400m->barker);
>> + unsigned int index = i2400m->barker - i2400m_barker_db;
>> d_printf(2, dev, "boot barker cache-confirmed #%u/%08x\n",
>> index, le32_to_cpu(i2400m->barker->data[0]));
>
> It's only used for this debug output. You may as well just delete it.
>
>> return 0;
Deleting wrong debug code vs fixing debug code? I'd rather go for the
latter.
>
> regards,
> dan carpenter
>
^ permalink raw reply
* Re: [RFC bpf-next 0/5] Convert iproute2 to use libbpf (WIP)
From: Toke Høiland-Jørgensen @ 2019-08-23 11:29 UTC (permalink / raw)
To: Andrii Nakryiko, Daniel Borkmann
Cc: Stephen Hemminger, Alexei Starovoitov, Martin KaFai Lau, Song Liu,
Yonghong Song, David Miller, Jesper Dangaard Brouer, Networking,
bpf
In-Reply-To: <CAEf4Bzab_w0AXy5P9mG14mcyJVgUCzuuNda5FpU5wSEwUciGfg@mail.gmail.com>
[ ... snip ...]
> E.g., today's API is essentially three steps:
>
> 1. open and parse ELF: collect relos, programs, map definitions
> 2. load: create maps from collected defs, do program/global data/CO-RE
> relocs, load and verify BPF programs
> 3. attach programs one by one.
>
> Between step 1 and 2 user has flexibility to create more maps, set up
> map-in-map, etc. Between 2 and 3 you can fill in global data, fill in
> tail call maps, etc. That's already pretty flexible. But we can tune
> and break apart those steps even further, if necessary.
Today, steps 1 and 2 can be collapsed into a single call to
bpf_prog_load_xattr(). As Jesper's mail explains, for XDP we don't
generally want to do all the fancy rewriting stuff, we just want a
simple way to load a program and get reusable pinning of maps.
Preferably in a way that is compatible with the iproute2 loader.
So I really think we need two things:
(1) a flexible API that splits up all the various steps in a way that
allows programs to inject their own map definitions before
relocations and loading
(2) a simple convenience wrapper that loads an object file, does
something sensible with pinning and map-in-map definitions, and loads
everything into the kernel.
I'd go so far as to say that (2) should even support system-wide
configuration, similar to the /etc/iproute2/bpf_pinning file. E.g., an
/etc/libbpf/pinning.conf file that sets the default pinning directory,
and makes it possible to set up pin-value-to-subdir mappings like what
iproute2 does today.
Having (2) makes it more likely that all the different custom loaders
will be compatible with each other, while still allowing people to do
their own custom thing with (1). And of course, (2) could be implemented
in terms of (1) internally in libbpf.
In my ideal world, (2) would just use the definition format already in
iproute2 (this is basically what I implemented already), but if you guys
don't want to put this into libbpf, I can probably live with the default
format being BTF-based instead. Which would mean that iproute2 I would
end up with a flow like this:
- When given an elf file, try to run it through the "standard loader"
(2). If this works, great, proceed to program attach.
- If using (2) fails because it doesn't understand the map definition,
fall back to a compatibility loader that parses the legacy iproute2
map definition format and uses (1) to load that.
Does the above make sense? :)
-Toke
^ permalink raw reply
* [PATCH net-next] net: ipv6: fix listify ip6_rcv_finish in case of forwarding
From: Xin Long @ 2019-08-23 11:33 UTC (permalink / raw)
To: network dev, linux-sctp
Cc: Marcelo Ricardo Leitner, Neil Horman, davem,
Jesper Dangaard Brouer, Edward Cree, Dmitry Vyukov,
syzkaller-bugs
We need a similar fix for ipv6 as Commit 0761680d5215 ("net: ipv4: fix
listify ip_rcv_finish in case of forwarding") does for ipv4.
This issue can be reprocuded by syzbot since Commit 323ebb61e32b ("net:
use listified RX for handling GRO_NORMAL skbs") on net-next. The call
trace was:
kernel BUG at include/linux/skbuff.h:2225!
RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
Call Trace:
sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
NF_HOOK include/linux/netfilter.h:305 [inline]
NF_HOOK include/linux/netfilter.h:299 [inline]
ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
dst_input include/net/dst.h:442 [inline]
ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
__netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
__netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
__netif_receive_skb_list net/core/dev.c:5149 [inline]
netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
gro_normal_list net/core/dev.c:5755 [inline]
gro_normal_one net/core/dev.c:5769 [inline]
napi_frags_finish net/core/dev.c:5782 [inline]
napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/ipv6/ip6_input.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index fa014d5..d432d00 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -80,8 +80,10 @@ static void ip6_sublist_rcv_finish(struct list_head *head)
{
struct sk_buff *skb, *next;
- list_for_each_entry_safe(skb, next, head, list)
+ list_for_each_entry_safe(skb, next, head, list) {
+ skb_list_del_init(skb);
dst_input(skb);
+ }
}
static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
--
2.1.0
^ permalink raw reply related
* Re: [PATCH 1/2] rtnetlink: gate MAC address with an LSM hook
From: Jeffrey Vander Stoep @ 2019-08-23 11:41 UTC (permalink / raw)
To: David Miller; +Cc: netdev, LSM List, selinux
In-Reply-To: <20190822.161913.326746900077543343.davem@davemloft.net>
On Fri, Aug 23, 2019 at 1:19 AM David Miller <davem@davemloft.net> wrote:
>
> From: Jeff Vander Stoep <jeffv@google.com>
> Date: Wed, 21 Aug 2019 15:45:47 +0200
>
> > MAC addresses are often considered sensitive because they are
> > usually unique and can be used to identify/track a device or
> > user [1].
> >
> > The MAC address is accessible via the RTM_NEWLINK message type of a
> > netlink route socket[2]. Ideally we could grant/deny access to the
> > MAC address on a case-by-case basis without blocking the entire
> > RTM_NEWLINK message type which contains a lot of other useful
> > information. This can be achieved using a new LSM hook on the netlink
> > message receive path. Using this new hook, individual LSMs can select
> > which processes are allowed access to the real MAC, otherwise a
> > default value of zeros is returned. Offloading access control
> > decisions like this to an LSM is convenient because it preserves the
> > status quo for most Linux users while giving the various LSMs
> > flexibility to make finer grained decisions on access to sensitive
> > data based on policy.
> >
> > [1] https://adamdrake.com/mac-addresses-udids-and-privacy.html
> > [2] Other access vectors like ioctl(SIOCGIFHWADDR) are already covered
> > by existing LSM hooks.
> >
> > Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
>
> I'm sure the MAC address will escape into userspace via other means,
> dumping pieces of networking config in other contexts, etc. I mean,
> if I can get a link dump, I can dump the neighbor table as well.
These are already gated by existing LSM hooks and capability checks.
They are not allowed on mandatory access control systems unless explicitly
granted.
>
> I kinda think this is all very silly whack-a-mole kind of stuff, to
> be quite honest.
We evaluated mechanisms for the MAC to reach unprivileged apps.
A number of researchers have published on this as well such as:
https://www.usenix.org/conference/usenixsecurity19/presentation/reardon
Three "leaks" were identified, two have already been fixed.
-ioctl(SIOCGIFHWADDR). Fixed using finer grained LSM checks
on socket ioctls (similar to this change).
-IPv6 IP addresses. Fixed by no longer including the MAC as part
of the IP address.
-RTM_NEWLINK netlink route messages. The last mole to be whacked.
>
> And like others have said, tomorrow you'll be like "oh crap, we should
> block X too" and we'll get another hook, another config knob, another
> rulset update, etc.
This seems like an issue inherent with permissions/capabilities. I don’t
think we should abandon the concept of permissions because someone
can forget to add a check. Likewise, if someone adds new code to the
kernel and omits a capable(CAP_NET_*) check, I would expect it to be
fixed like any other bug without the idea of capability checks being tossed
out.
We need to do something because this information is being abused. Any
recommendations? This seemed like the simplest approach, but I can
definitely appreciate that it has downsides.
I could make this really generic by adding a single hook to the end of
sock_msgrecv() which would allow an LSM to modify the message to omit
the MAC address and any other information that we deem as sensitive in the
future. Basically what Casey was suggesting. Thoughts on that approach?
Thanks for your help on this.
^ permalink raw reply
* Re: [PATCH v2 2/5] spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours
From: Vladimir Oltean @ 2019-08-23 12:06 UTC (permalink / raw)
To: Mark Brown; +Cc: linux-spi, lkml, devicetree, netdev
In-Reply-To: <20190823105949.GQ23391@sirena.co.uk>
Hi Mark,
On Fri, 23 Aug 2019 at 13:59, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 23, 2019 at 11:50:44AM +0100, Mark Brown wrote:
> > On Fri, Aug 23, 2019 at 01:30:27PM +0300, Vladimir Oltean wrote:
>
> > > Did you see this?
> > > https://lkml.org/lkml/2019/8/22/1542
>
> > I'm not online enough to readily follow that link right now, I
> > did apply another patch for a similar issue though. If that's
> > a different version of the same change please don't do that,
> > sending multiple conflicting versions of the same thing creates
> > conflicts and makes everything harder to work with.
>
> Oh, I guess this was due to there being an existing refactoring
> in -next that meant the fix wouldn't apply directly. I sorted
> that out now I think, but in general the same thing applies -
> it's better to put fixes before anything else in the series,
> it'll flag up when reviewing.
I try to require as little attention span as possible from you and I
apologize that I'm sending patches like a noob, but I'm not used to
this sort of interaction with a maintainer. It's taking me some time
to adjust expectations.
- You left change requests in the initial patchset I submitted, but
you partially applied the series anyway. You didn't give me a chance
to respin the whole series and put the shared IRQ fix on top, so it
applies on old trees as well. No problem, I sent two versions of the
patch.
- On my previous series you left this comment:
> Please don't include all the extra tags you've got in your subject
> lines. In my inbox this series looks like:
>
> 790 N T 08/18 Vladimir Oltean ( 16K) ├─>[PATCH spi for-5.4 01/14] spi: spi-f
>
> so I can't tell what it's actually about just from looking at it. Just
> [PATCH 01/14] would be enough, putting a target version in or versioning
> the patch series can be OK but you usually don't use a target version
> for -next and adding spi in there is redundant given that it's also in
> the changelog.
So I didn't put any target version in the patch titles this time,
although arguably it would have been clearer to you that there's a
patch for-5.4 and another version of it for-4.20 (which i *think* is
how I should submit a fix, I don't see any branch for inclusion in
stable trees per se).
No problem, I explained in the cover letters that one patchset is for
-next and the other is for inclusion in stable trees. Maintainers do
read cover letters, right?
Message from the -next cover letter:
> The series also contains a bug fix for the shared IRQ of the DSPI
> driver. I am going to respin a version of it as a separate patch for
> inclusion in stable trees, independent of this patchset.
Message from the fix's cover letter:
> This patch is taken out of the "Poll mode for NXP DSPI driver" series
> and respun against the "for-4.20" branch.
> $(git describe --tags 13aed2392741) shows:
> v4.20-rc1-18-g13aed2392741
Yes, I did send a cover letter for a single patch. I thought it's
harder to miss than a note hidden under patch 2/5 of one series, and
in the note section of the other's. I think you could have also made
an argument about me not doing it the other way around. In the end,
you can not read a note just as well as not read a cover letter, and
there's little I can do.
No problem, you missed the link between the two. I sent you a link to
the lkml archive. You said "I'm not online enough to readily follow
that link right now". Please teach me - I really don't know - how can
I make links between patchsets easier for you to follow, if you don't
read cover letters and you can't access lkml? I promise I'll use that
method next time.
> I do frequently catch up on my mail on flights or while otherwise
> travelling so this is even more pressing for me than just being about
> making things a bit easier to read.
Maybe you simply should do something else while traveling, just saying.
Regards,
-Vladimir
^ permalink raw reply
* Re: [PATCH net-next] net: ipv6: fix listify ip6_rcv_finish in case of forwarding
From: Edward Cree @ 2019-08-23 12:07 UTC (permalink / raw)
To: Xin Long, network dev, linux-sctp
Cc: Marcelo Ricardo Leitner, Neil Horman, davem,
Jesper Dangaard Brouer, Dmitry Vyukov, syzkaller-bugs
In-Reply-To: <e355527b374f6ce70fcc286457f87592cd8f3dcc.1566559983.git.lucien.xin@gmail.com>
On 23/08/2019 12:33, Xin Long wrote:
> We need a similar fix for ipv6 as Commit 0761680d5215 ("net: ipv4: fix
> listify ip_rcv_finish in case of forwarding") does for ipv4.
>
> This issue can be reprocuded by syzbot since Commit 323ebb61e32b ("net:
> use listified RX for handling GRO_NORMAL skbs") on net-next. The call
> trace was:
>
> kernel BUG at include/linux/skbuff.h:2225!
> RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
> RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
> Call Trace:
> sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
> sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
> sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
> sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
> sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
> ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
> ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
> NF_HOOK include/linux/netfilter.h:305 [inline]
> NF_HOOK include/linux/netfilter.h:299 [inline]
> ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
> dst_input include/net/dst.h:442 [inline]
> ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
> ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
> ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
> ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
> __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
> __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
> __netif_receive_skb_list net/core/dev.c:5149 [inline]
> netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
> gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
> gro_normal_list net/core/dev.c:5755 [inline]
> gro_normal_one net/core/dev.c:5769 [inline]
> napi_frags_finish net/core/dev.c:5782 [inline]
> napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
> tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
> tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
>
> Fixes: d8269e2cbf90 ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
> Fixes: 323ebb61e32b ("net: use listified RX for handling GRO_NORMAL skbs")
> Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
> Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Thanks for catching this.
> ---
> net/ipv6/ip6_input.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index fa014d5..d432d00 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -80,8 +80,10 @@ static void ip6_sublist_rcv_finish(struct list_head *head)
> {
> struct sk_buff *skb, *next;
>
> - list_for_each_entry_safe(skb, next, head, list)
> + list_for_each_entry_safe(skb, next, head, list) {
> + skb_list_del_init(skb);
> dst_input(skb);
> + }
> }
>
> static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
^ permalink raw reply
* Re: [PATCH net v3] ixgbe: fix double clean of tx descriptors with xdp
From: Eelco Chaudron @ 2019-08-23 12:10 UTC (permalink / raw)
To: Ilya Maximets
Cc: netdev, linux-kernel, bpf, David S. Miller, Björn Töpel,
Magnus Karlsson, Jakub Kicinski, Alexei Starovoitov,
Daniel Borkmann, Jeff Kirsher, intel-wired-lan, William Tu,
Alexander Duyck
In-Reply-To: <20190822171237.20798-1-i.maximets@samsung.com>
On 22 Aug 2019, at 19:12, Ilya Maximets wrote:
> Tx code doesn't clear the descriptors' status after cleaning.
> So, if the budget is larger than number of used elems in a ring, some
> descriptors will be accounted twice and xsk_umem_complete_tx will move
> prod_tail far beyond the prod_head breaking the completion queue ring.
>
> Fix that by limiting the number of descriptors to clean by the number
> of used descriptors in the tx ring.
>
> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> 'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
> 'next_to_clean' and 'next_to_use' indexes.
>
> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>
> Version 3:
> * Reverted some refactoring made for v2.
> * Eliminated 'budget' for tx clean.
> * prefetch returned.
>
> Version 2:
> * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
> 'ixgbe_xsk_clean_tx_ring()'.
>
> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 29
> ++++++++------------
> 1 file changed, 11 insertions(+), 18 deletions(-)
>
Did some test with and without the fix applied. For PVP the results are
a little different depending on the packet size (note this is a single
run, no deviation).
For the same physical port in and out it’s faster! Note this was OVS
AF_XDP using a XENA tester at 10G wire speed.
+--------------------------------------------------------------------------------+
| Physical to Virtual to Physical test, L3 flows[port redirect]
|
+-----------------+--------------------------------------------------------------+
| | Packet size
|
+-----------------+--------+--------+--------+--------+--------+--------+--------+
| Number of flows | 64 | 128 | 256 | 512 | 768 | 1024
| 1514 |
+-----------------+--------+--------+--------+--------+--------+--------+--------+
| [NO FIX] 1000 | 739161 | 700091 | 690034 | 659894 | 618128 | 594223
| 537504 |
+-----------------+--------+--------+--------+--------+--------+--------+--------+
| [FIX] 1000 | 742317 | 708391 | 689952 | 658034 | 626056 | 587653
| 530885 |
+-----------------+--------+--------+--------+--------+--------+--------+--------+
+--------------------------------------------------------------------------------------+
| Physical loopback test, L3 flows[port redirect]
|
+-----------------+--------------------------------------------------------------------+
| | Packet size
|
+-----------------+---------+---------+---------+---------+---------+---------+--------+
| Number of flows | 64 | 128 | 256 | 512 | 768 |
1024 | 1514 |
+-----------------+---------+---------+---------+---------+---------+---------+--------+
| [NO FIX] 1000 | 2573298 | 2227578 | 2514318 | 2298204 | 1081861 |
1015173 | 788081 |
+-----------------+---------+---------+---------+---------+---------+---------+--------+
| [FIX] 1000 | 3343188 | 3234993 | 3151833 | 2349597 | 1586276 |
1197304 | 814854 |
+-----------------+---------+---------+---------+---------+---------+---------+--------+
Tested-by: Eelco Chaudron <echaudro@redhat.com>
^ permalink raw reply
* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Horatiu Vultur @ 2019-08-23 12:26 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: roopa, davem, UNGLinuxDriver, alexandre.belloni, allan.nielsen,
netdev, linux-kernel, bridge
In-Reply-To: <b2c52206-82d1-ef28-aeec-a5dcdbe9df6c@cumulusnetworks.com>
The 08/23/2019 01:26, Nikolay Aleksandrov wrote:
> External E-Mail
>
>
> On 8/23/19 1:09 AM, Nikolay Aleksandrov wrote:
> > On 22/08/2019 22:07, Horatiu Vultur wrote:
> >> Current implementation of the SW bridge is setting the interfaces in
> >> promisc mode when they are added to bridge if learning of the frames is
> >> enabled.
> >> In case of Ocelot which has HW capabilities to switch frames, it is not
> >> needed to set the ports in promisc mode because the HW already capable of
> >> doing that. Therefore add NETIF_F_HW_BRIDGE feature to indicate that the
> >> HW has bridge capabilities. Therefore the SW bridge doesn't need to set
> >> the ports in promisc mode to do the switching.
> >> This optimization takes places only if all the interfaces that are part
> >> of the bridge have this flag and have the same network driver.
> >>
> >> If the bridge interfaces is added in promisc mode then also the ports part
> >> of the bridge are set in promisc mode.
> >>
> >> Horatiu Vultur (3):
> >> net: Add HW_BRIDGE offload feature
> >> net: mscc: Use NETIF_F_HW_BRIDGE
> >> net: mscc: Implement promisc mode.
> >>
> >> drivers/net/ethernet/mscc/ocelot.c | 26 ++++++++++++++++++++++++--
> >> include/linux/netdev_features.h | 3 +++
> >> net/bridge/br_if.c | 29 ++++++++++++++++++++++++++++-
> >> net/core/ethtool.c | 1 +
> >> 4 files changed, 56 insertions(+), 3 deletions(-)
> >>
> >
>
Hi Nikolay,
> Just to clarify:
> > IMO the name is misleading.
> - that's not mandatory or anything, just saying people might get confused when they see it
Naming is hard, I properly need to go back and find a better name once
the patch is more mature and the problem/purpose is better understood.
But you are right, this is a bad name, I will find a better one.
>
> > Why do the devices have to be from the same driver ?
After seeing yours and Andrews comments I realize that I try to do two
things, but I have only explained one of them.
Here is what I was trying to do:
A. Prevent ports from going into promisc mode, if it is not needed.
B. Prevent adding the CPU to the flood-mask (in Ocelot we have a
flood-mask controlling who should be included when flooding due to
destination unknown).
We have been thinking of these two as the same thing, but they are in
fact quite different. It is because of "B" we had to require all the
devices to be controlled by the same Switch.
For item "A" I do not think we need this restriction. In this patch we
will continue only focusing on item "A".
Sorry for the confusion. I will do a new patch that does not have this
restriction (which will also make it simpler).
It just needs to check for the flag NETIF_F_HW_BRIDGE and if it is not
set then set the port in promisc mode.
To solve item "B", the network driver needs to detect if there is a
foreign interfaces added to the bridge. If that is the case then to add
the CPU port to the flooding mask otherwise no. But this part will be in
a different patch series.
> > This is too specific targeting some devices.
Maybe I was wrong to mention specific HW in the commit message. The
purpose of the patch was to add an optimization (not to copy all the
frames to the CPU) for HW that is capable of learning and flooding the
frames.
I would expect that most switching HW would benefit from this.
> > The bridge should not care what's the port device, it should be the other way
Not sure I understand how that could be done. Are you suggesting that
the flag belongs to another structure? If you have something specific in
mind, then please let us know.
> That was mostly a rhetorical question, it's obvious why but please add an explanation
> at least in the commit message and please fix the typos in the comment. Also HW
> is capable of doing switching, this needs some clarification that the whole process
> stays in HW IIUC. More details here would be great.
Yes, I will add more details in the commit message and in code.
> > around, so adding device-specific code to the bridge is not ok. Isn't there a solution
> > where you can use NETDEV_JOIN and handle it all from your driver ?
> > Would all HW-learned entries be hidden from user-space in this case ?
Yes, they would. But if the network driver will call
'call_switchdev_notifiers' with SWITCHDEV_FDB_ADD_TO_BRIDGE and
SWITCHDEV_FDB_DEL_TO_BRIDGE then the SW will see these entries.
> >
> I.e. isn't there a way to do this without introducing a new feature flag ?
I do not have any better ideas.
>
>
--
/Horatiu
^ permalink raw reply
* [PATCH net-next] net/mlx5: Fix return code in case of hyperv wrong size read
From: Eran Ben Elisha @ 2019-08-23 12:34 UTC (permalink / raw)
To: netdev, David S. Miller; +Cc: Saeed Mahameed, Haiyang Zhang, Eran Ben Elisha
Return code value could be non deterministic in case of wrong size read.
With this patch, if such error occurs, set rc to be -EIO.
In addition, mlx5_hv_config_common() supports reading of
HV_CONFIG_BLOCK_SIZE_MAX bytes only, fix to early return error with
bad input.
Fixes: 913d14e86657 ("net/mlx5: Add wrappers for HyperV PCIe operations")
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
index cf08d02703fb..583dc7e2aca8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c
@@ -12,7 +12,7 @@ static int mlx5_hv_config_common(struct mlx5_core_dev *dev, void *buf, int len,
int bytes_returned;
int block_id;
- if (offset % HV_CONFIG_BLOCK_SIZE_MAX || len % HV_CONFIG_BLOCK_SIZE_MAX)
+ if (offset % HV_CONFIG_BLOCK_SIZE_MAX || len != HV_CONFIG_BLOCK_SIZE_MAX)
return -EINVAL;
block_id = offset / HV_CONFIG_BLOCK_SIZE_MAX;
@@ -25,8 +25,8 @@ static int mlx5_hv_config_common(struct mlx5_core_dev *dev, void *buf, int len,
HV_CONFIG_BLOCK_SIZE_MAX, block_id);
/* Make sure len bytes were read successfully */
- if (read)
- rc |= !(len == bytes_returned);
+ if (read && !rc && len != bytes_returned)
+ rc = -EIO;
if (rc) {
mlx5_core_err(dev, "Failed to %s hv config, err = %d, len = %d, offset = %d\n",
--
2.17.1
^ permalink raw reply related
* Re: [PATCH 1/3] net: Add HW_BRIDGE offload feature
From: Horatiu Vultur @ 2019-08-23 12:39 UTC (permalink / raw)
To: Andrew Lunn
Cc: roopa, nikolay, davem, UNGLinuxDriver, alexandre.belloni,
allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <20190822200817.GD21295@lunn.ch>
The 08/22/2019 22:08, Andrew Lunn wrote:
> External E-Mail
>
>
> > +/* Determin if the SW bridge can be offloaded to HW. Return true if all
> > + * the interfaces of the bridge have the feature NETIF_F_HW_SWITCHDEV set
> > + * and have the same netdev_ops.
> > + */
>
> Hi Horatiu
>
> Why do you need these restrictions. The HW bridge should be able to
> learn that a destination MAC address can be reached via the SW
> bridge. The software bridge can then forward it out the correct
> interface.
>
> Or are you saying your hardware cannot learn from frames which come
> from the CPU?
>
> Andrew
>
Hi Andrew,
I do not believe that our HW can learn from frames which comes from the
CPU, at least not in the way they are injected today. But in case of Ocelot
(and the next chip we are working on), we have other issues in mixing with
foreign interfaces which is why we have the check in
ocelot_netdevice_dev_check.
More important, as we responded to Nikolay, we properly introduced this
restriction for the wrong reasons.
In SW bridge I will remove all these restrictions and only set ports in
promisc mode only if NETIF_F_HW_BRIDGE is not set.
Then in the network driver I can see if a foreign interface is added to
the bridge, and when that happens I can set the port in promisc mode.
Then the frames will be flooded to the SW bridge which eventually will
send to the foreign interface.
--
/Horatiu
^ permalink raw reply
* Re: [PATCHv2 net] ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is set
From: David Ahern @ 2019-08-23 12:41 UTC (permalink / raw)
To: Hangbin Liu; +Cc: netdev, Madhu Challa, David S . Miller, Jianlin Shi
In-Reply-To: <20190822082012.GE18865@dhcp-12-139.nay.redhat.com>
On 8/22/19 4:20 AM, Hangbin Liu wrote:
> On Mon, Aug 19, 2019 at 10:33:58PM -0400, David Ahern wrote:
>> On 8/19/19 10:19 PM, Hangbin Liu wrote:
>>> But in ipv6_add_addr() it will check the address type and reject multicast
>>> address directly. So this feature is never worked for IPv6.
>>
>> If true, that is really disappointing.
>>
>> We need to get a functional test script started for various address cases.
>
> Do you mean an `ip addr add` testing for all kinds of address types?
>
yes.
^ permalink raw reply
* Re: IPv6 routes inserted into the kernel with 'route' end up with invalid type
From: David Ahern @ 2019-08-23 12:54 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, David S. Miller
Cc: netdev, bird-users, Tom Bird, Maria Jan Matejka, Ondrej Zajicek
In-Reply-To: <87mug0m4rp.fsf@toke.dk>
On 8/23/19 8:43 AM, Toke Høiland-Jørgensen wrote:
> Hi David
>
> Tom noticed[0] that on newer kernels, the Bird routing daemon rejects IPv6
> routes received from the kernel if those routes were inserted with the
> old 'route' utility (i.e., when they're inserted through the ioctl
> interface).
>
> We tracked it down to the routes having an rtm_type of RTN_UNKNOWN, and
> a bit of git archaeology points suggestively at this commit:
>
> e8478e80e5a ("net/ipv6: Save route type in rt6_info")
>
> The same setup works with older kernels, so this seems like it's a
> regression, the age of 'route' notwithstanding. Any good ideas for the
> proper way to fix this?
>
Should be fixed by:
commit c7036d97acd2527cef145b5ef9ad1a37ed21bbe6
Author: David Ahern <dsahern@gmail.com>
Date: Wed Jun 19 10:50:24 2019 -0700
ipv6: Default fib6_type to RTN_UNICAST when not set
A user reported that routes are getting installed with type 0
(RTN_UNSPEC)
where before the routes were RTN_UNICAST. One example is from accel-ppp
which apparently still uses the ioctl interface and does not set
rtmsg_type. Another is the netlink interface where ipv6 does not require
rtm_type to be set (v4 does). Prior to the commit in the Fixes tag the
ipv6 stack converted type 0 to RTN_UNICAST, so restore that behavior.
Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
^ permalink raw reply
* Re: IPv6 routes inserted into the kernel with 'route' end up with invalid type
From: Toke Høiland-Jørgensen @ 2019-08-23 12:59 UTC (permalink / raw)
To: David Ahern, David S. Miller
Cc: netdev, bird-users, Tom Bird, Maria Jan Matejka, Ondrej Zajicek
In-Reply-To: <423380ec-b999-d620-9bd6-78c2dabfde99@gmail.com>
David Ahern <dsahern@gmail.com> writes:
> On 8/23/19 8:43 AM, Toke Høiland-Jørgensen wrote:
>> Hi David
>>
>> Tom noticed[0] that on newer kernels, the Bird routing daemon rejects IPv6
>> routes received from the kernel if those routes were inserted with the
>> old 'route' utility (i.e., when they're inserted through the ioctl
>> interface).
>>
>> We tracked it down to the routes having an rtm_type of RTN_UNKNOWN, and
>> a bit of git archaeology points suggestively at this commit:
>>
>> e8478e80e5a ("net/ipv6: Save route type in rt6_info")
>>
>> The same setup works with older kernels, so this seems like it's a
>> regression, the age of 'route' notwithstanding. Any good ideas for the
>> proper way to fix this?
>>
>
> Should be fixed by:
>
> commit c7036d97acd2527cef145b5ef9ad1a37ed21bbe6
> Author: David Ahern <dsahern@gmail.com>
> Date: Wed Jun 19 10:50:24 2019 -0700
>
> ipv6: Default fib6_type to RTN_UNICAST when not set
>
> A user reported that routes are getting installed with type 0
> (RTN_UNSPEC)
> where before the routes were RTN_UNICAST. One example is from accel-ppp
> which apparently still uses the ioctl interface and does not set
> rtmsg_type. Another is the netlink interface where ipv6 does not require
> rtm_type to be set (v4 does). Prior to the commit in the Fixes tag the
> ipv6 stack converted type 0 to RTN_UNICAST, so restore that behavior.
>
> Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info")
> Signed-off-by: David Ahern <dsahern@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
Ah, great! Guess that hasn't made its way to the stable and distribution
kernels yet. Thanks for the pointer! :)
-Toke
^ permalink raw reply
* Re: [PATCH net-next v5] sched: Add dualpi2 qdisc
From: Tilmans, Olivier (Nokia - BE/Antwerp) @ 2019-08-23 12:59 UTC (permalink / raw)
To: Dave Taht
Cc: Eric Dumazet, Stephen Hemminger, Olga Albisser,
De Schepper, Koen (Nokia - BE/Antwerp), Bob Briscoe, Henrik Steen,
Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <CAA93jw5_LN_-zhHh=zZA8r6Zvv1CvA_AikT_rCgWyT8ytQM_rg@mail.gmail.com>
> 1) Since we're still duking it out over the meaning of the bits - not
> just the SCE thing, but as best as I can
> tell (but could be wrong) the NQB idea wants to put something into the
> l4s fast queue? Or is NQB supposed to
> be a third queue?
We can add support for NQB in the future, by expanding the
dualpi2_skb_classify() function. This is however out of scope at the
moment as NQB is not yet adopted by the TSV WG. I'd guess we may want more
than just the NQB DSCP codepoint in the L queue, which then warrant
another way to classify traffic, e.g., using tc filter hints.
> In those cases, the ecn_mask should just be mask.
That is actually what it is at the moment: a mask on the two ecn bits.
> 2) Is the intent to make the drop probability 0 by default? (10 in the
> pie rfc, not mentioned in the l4s rfc as yet)
I assume you are referring to §5.1 of the PIE RFC, i.e., the switch to
pure drop once the computed marking probability is >10%?
The default for dualpi2 is also to enter a pure-drop mode on overload.
More precisely, we define overload as reaching a marking probability of
100% in the L queue, meaning an internal PI probability of 50% (as it
gets mutiplied by the coupling factor which defaults to 2).
This is equivalent to a PIE probability of 25% (as the classic queue gets a
squared probability).
This drop mode means that packets in both queues will be subject to
random drops with a PI^2 probability. Additionally, all packets not
dropped in the L queue are CE marked.
We used to have a parameter to configure this overload threshold (IIRC
it was still in the pre-v4 patches), but found no real use for lowering
it, hence its removal.
Note that the drop on overload can be disabled, resulting in increased
latencies in both queues, 100% CE marking in the L queue, and eventually
a taildrop behaviour once the packet limit is reached.
> 3) has this been tested on a hw mq system as yet? (10gigE is typically
> 64 queues)
Yes, in a setup where 1/32/64/128VMs were behind an Intel X540-*, which indeed
has 64 internal queues. The VMs use a mix of long/short cubic/DCTCP connections
towards another server. I could not think about another use-case where a 10G
software switch would prove to be a bottleneck, i.e., where a queue would
happen.
The qdisc is however not optimized for mq systems, could it cause performance
degradation if the server was severely resource constrained?
Also, ensuring it was able to saturate 10G meant gro was required on the
hypervisor, thus that the step threshold of dualpi2 has to be increased to
compensate for those large bursts. Maybe that is where being mq-aware would
help, i.e., by instantiating one dualpi2 instance per HW queue?
The AQM scheme itself is CPU friendly (lighter than PIE)--i.e., computing the
probability takes <10 arithmetic ops and 5 comparisons once every 16ms, while
enqueue/dequeue can involve ~10 comparisons and at most 2 rng calls)--so
should not increase the load too much issues if it was duplicated.
Best,
Olivier
^ permalink raw reply
* Re: [PATCH net v3] ixgbe: fix double clean of tx descriptors with xdp
From: William Tu @ 2019-08-23 13:19 UTC (permalink / raw)
To: Björn Töpel
Cc: Alexander Duyck, Ilya Maximets, Netdev, LKML, bpf,
David S. Miller, Magnus Karlsson, Jakub Kicinski,
Alexei Starovoitov, Daniel Borkmann, Jeff Kirsher,
intel-wired-lan, Eelco Chaudron
In-Reply-To: <217e90f5-206a-bb80-6699-f6dd750b57d9@intel.com>
On Thu, Aug 22, 2019 at 11:10 PM Björn Töpel <bjorn.topel@intel.com> wrote:
>
> On 2019-08-22 19:32, William Tu wrote:
> > On Thu, Aug 22, 2019 at 10:21 AM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> >>
> >> On Thu, Aug 22, 2019 at 10:12 AM Ilya Maximets <i.maximets@samsung.com> wrote:
> >>>
> >>> Tx code doesn't clear the descriptors' status after cleaning.
> >>> So, if the budget is larger than number of used elems in a ring, some
> >>> descriptors will be accounted twice and xsk_umem_complete_tx will move
> >>> prod_tail far beyond the prod_head breaking the completion queue ring.
> >>>
> >>> Fix that by limiting the number of descriptors to clean by the number
> >>> of used descriptors in the tx ring.
> >>>
> >>> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> >>> 'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use
> >>> 'next_to_clean' and 'next_to_use' indexes.
> >>>
> >>> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> >>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> >>> ---
> >>>
> >>> Version 3:
> >>> * Reverted some refactoring made for v2.
> >>> * Eliminated 'budget' for tx clean.
> >>> * prefetch returned.
> >>>
> >>> Version 2:
> >>> * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
> >>> 'ixgbe_xsk_clean_tx_ring()'.
> >>>
> >>> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 29 ++++++++------------
> >>> 1 file changed, 11 insertions(+), 18 deletions(-)
> >>
> >> Thanks for addressing my concerns.
> >>
> >> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > Thanks.
> >
> > Tested-by: William Tu <u9012063@gmail.com>
> >
>
> Will, thanks for testing! For this patch, did you notice any performance
> degradation?
>
No noticeable performance degradation seen this time.
Thanks,
William
^ permalink raw reply
* Re: [PATCH 0/3] Add NETIF_F_HW_BRIDGE feature
From: Andrew Lunn @ 2019-08-23 13:25 UTC (permalink / raw)
To: Horatiu Vultur
Cc: Nikolay Aleksandrov, roopa, davem, UNGLinuxDriver,
alexandre.belloni, allan.nielsen, netdev, linux-kernel, bridge
In-Reply-To: <20190823122657.njk2tcgur2zu74i7@soft-dev3.microsemi.net>
> > > Why do the devices have to be from the same driver ?
> After seeing yours and Andrews comments I realize that I try to do two
> things, but I have only explained one of them.
>
> Here is what I was trying to do:
> A. Prevent ports from going into promisc mode, if it is not needed.
The switch definition is promisc is a bit odd. You really need to
split it into two use cases.
The Linux interface is not a member of a bridge. In this case, promisc
mode would mean all frames ingressing the port should be forwarded to
the CPU. Without promisc, you can program the hardware to just accept
frames with the interfaces MAC address. So this is just the usual
behaviour of an interface.
When the interface is part of the bridge, then you can turn on all the
learning and not forward frames to the CPU, unless the CPU asks for
them. But you need to watch out for various flags. By default, you
should flood to the CPU, unknown destinations to the CPU etc. But some
of these can be turned off by flags.
> B. Prevent adding the CPU to the flood-mask (in Ocelot we have a
> flood-mask controlling who should be included when flooding due to
> destination unknown).
So destination unknown should be flooded to the CPU. The CPU might
know where to send the frame.
> To solve item "B", the network driver needs to detect if there is a
> foreign interfaces added to the bridge. If that is the case then to add
> the CPU port to the flooding mask otherwise no.
It is not just a foreign interface. What about the MAC address on the
bridge interface?
> > > This is too specific targeting some devices.
> Maybe I was wrong to mention specific HW in the commit message. The
> purpose of the patch was to add an optimization (not to copy all the
> frames to the CPU) for HW that is capable of learning and flooding the
> frames.
To some extent, this is also tied to your hardware not learning MAC
addresses from frames passed from the CPU. You should also consider
fixing this. The SW bridge does send out notifications when it
adds/removes MAC addresses to its tables. You probably want to receive
this modifications, and use them to program your hardware to forward
frames to the CPU when needed.
Andrew
^ permalink raw reply
* Re: [PATCH bpf-next v5 03/11] xsk: add support to allow unaligned chunk placement
From: Laatz, Kevin @ 2019-08-23 13:35 UTC (permalink / raw)
To: Jonathan Lemon
Cc: netdev, ast, daniel, bjorn.topel, magnus.karlsson, jakub.kicinski,
saeedm, maximmi, stephen, bruce.richardson, ciara.loftus, bpf,
intel-wired-lan
In-Reply-To: <3AEEC88E-8D45-41C5-AFBF-51512826B1A7@gmail.com>
On 22/08/2019 19:43, Jonathan Lemon wrote:
> On 21 Aug 2019, at 18:44, Kevin Laatz wrote:
>> Currently, addresses are chunk size aligned. This means, we are very
>> restricted in terms of where we can place chunk within the umem. For
>> example, if we have a chunk size of 2k, then our chunks can only be
>> placed
>> at 0,2k,4k,6k,8k... and so on (ie. every 2k starting from 0).
>>
>> This patch introduces the ability to use unaligned chunks. With these
>> changes, we are no longer bound to having to place chunks at a 2k (or
>> whatever your chunk size is) interval. Since we are no longer dealing
>> with
>> aligned chunks, they can now cross page boundaries. Checks for page
>> contiguity have been added in order to keep track of which pages are
>> followed by a physically contiguous page.
>>
>> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
>> Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>
>> ---
>> v2:
>> - Add checks for the flags coming from userspace
>> - Fix how we get chunk_size in xsk_diag.c
>> - Add defines for masking the new descriptor format
>> - Modified the rx functions to use new descriptor format
>> - Modified the tx functions to use new descriptor format
>>
>> v3:
>> - Add helper function to do address/offset masking/addition
>>
>> v4:
>> - fixed page_start calculation in __xsk_rcv_memcpy().
>> - move offset handling to the xdp_umem_get_* functions
>> - modified the len field in xdp_umem_reg struct. We now use 16 bits
>> from
>> this for the flags field.
>> - removed next_pg_contig field from xdp_umem_page struct. Using low 12
>> bits of addr to store flags instead.
>> - other minor changes based on review comments
>>
>> v5:
>> - Added accessors for getting addr and offset
>> - Added helper function to add offset to addr
>> - Fixed offset handling in xsk_rcv
>> - Removed bitfields from xdp_umem_reg
>> - Added struct size checking for xdp_umem_reg in xsk_setsockopt to
>> handle
>> different versions of the struct.
>> - fix conflicts after 'bpf-af-xdp-wakeup' was merged.
>> ---
>> include/net/xdp_sock.h | 75 +++++++++++++++++++++++++++--
>> include/uapi/linux/if_xdp.h | 9 ++++
>> net/xdp/xdp_umem.c | 19 ++++++--
>> net/xdp/xsk.c | 96 +++++++++++++++++++++++++++++--------
>> net/xdp/xsk_diag.c | 2 +-
>> net/xdp/xsk_queue.h | 68 ++++++++++++++++++++++----
>> 6 files changed, 232 insertions(+), 37 deletions(-)
>>
>>
[...]
>> @@ -196,17 +221,17 @@ int xsk_generic_rcv(struct xdp_sock *xs, struct
>> xdp_buff *xdp)
>> goto out_unlock;
>> }
>>
>> - if (!xskq_peek_addr(xs->umem->fq, &addr) ||
>> + if (!xskq_peek_addr(xs->umem->fq, &addr, xs->umem) ||
>> len > xs->umem->chunk_size_nohr - XDP_PACKET_HEADROOM) {
>> err = -ENOSPC;
>> goto out_drop;
>> }
>>
>> - addr += xs->umem->headroom;
>> -
>> - buffer = xdp_umem_get_data(xs->umem, addr);
>> + buffer = xdp_umem_get_data(xs->umem, addr + offset);
>> memcpy(buffer, xdp->data_meta, len + metalen);
>> - addr += metalen;
>> + offset += metalen;
>> +
>> + addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
>> err = xskq_produce_batch_desc(xs->rx, addr, len);
>> if (err)
>> goto out_drop;
>
> Can't just add address and offset any longer. This should read:
>
> addr = xsk_umem_adjust_offset(xs->umem, addr, offset);
> buffer = xdp_umem_get_data(xs->umem, addr);
>
> addr = xsk_umem_adjust_offset(xs->umem, addr, metalen);
>
>
> so that offset and then metalen are added. (or preserve the
> address across the calls like memcpy_addr earlier).
Will fix this, thanks!
-Kevin
^ permalink raw reply
* [PATCH net] ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
From: Sabrina Dubroca @ 2019-08-23 13:44 UTC (permalink / raw)
To: netdev; +Cc: Sabrina Dubroca
Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
ignoring the specific error value. This results in addrconf_add_dev
returning ENOBUFS in all cases, which is unfortunate in cases such as:
# ip link add dummyX type dummy
# ip link set dummyX mtu 1200 up
# ip addr add 2000::/64 dev dummyX
RTNETLINK answers: No buffer space available
Commit a317a2f19da7 ("ipv6: fail early when creating netdev named all
or default") introduced error returns in ipv6_add_dev. Before that,
that function would simply return NULL for all failures.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
---
net/ipv6/addrconf.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index ced995f3fec4..6a576ff92c39 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -478,7 +478,7 @@ static struct inet6_dev *ipv6_find_idev(struct net_device *dev)
if (!idev) {
idev = ipv6_add_dev(dev);
if (IS_ERR(idev))
- return NULL;
+ return idev;
}
if (dev->flags&IFF_UP)
@@ -2466,8 +2466,8 @@ static struct inet6_dev *addrconf_add_dev(struct net_device *dev)
ASSERT_RTNL();
idev = ipv6_find_idev(dev);
- if (!idev)
- return ERR_PTR(-ENOBUFS);
+ if (IS_ERR(idev))
+ return idev;
if (idev->cnf.disable_ipv6)
return ERR_PTR(-EACCES);
@@ -3159,7 +3159,7 @@ static void init_loopback(struct net_device *dev)
ASSERT_RTNL();
idev = ipv6_find_idev(dev);
- if (!idev) {
+ if (IS_ERR(idev)) {
pr_debug("%s: add_dev failed\n", __func__);
return;
}
@@ -3374,7 +3374,7 @@ static void addrconf_sit_config(struct net_device *dev)
*/
idev = ipv6_find_idev(dev);
- if (!idev) {
+ if (IS_ERR(idev)) {
pr_debug("%s: add_dev failed\n", __func__);
return;
}
@@ -3399,7 +3399,7 @@ static void addrconf_gre_config(struct net_device *dev)
ASSERT_RTNL();
idev = ipv6_find_idev(dev);
- if (!idev) {
+ if (IS_ERR(idev)) {
pr_debug("%s: add_dev failed\n", __func__);
return;
}
@@ -4773,8 +4773,8 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh,
IFA_F_MCAUTOJOIN | IFA_F_OPTIMISTIC;
idev = ipv6_find_idev(dev);
- if (!idev)
- return -ENOBUFS;
+ if (IS_ERR(idev))
+ return PTR_ERR(idev);
if (!ipv6_allow_optimistic_dad(net, idev))
cfg.ifa_flags &= ~IFA_F_OPTIMISTIC;
--
2.22.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox