* [PATCH 10/12] ide: remove the PCI_DMA_BUS_IS_PHYS check
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
We now have ways to deal with drainage in the block layer, and libata has
been using them for ages. We also want to get rid of PCI_DMA_BUS_IS_PHYS
now, so just always halve max_sg_entries (and thus the maximum transfer
size) for IDE on PCI; anyone who cares about performance on PCI
controllers should have switched to libata long ago.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/ide/ide-probe.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 8d8ed036ca0a..56d7bc228cb3 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -796,8 +796,7 @@ static int ide_init_queue(ide_drive_t *drive)
* This will be fixed once we teach pci_map_sg() about our boundary
* requirements, hopefully soon. *FIXME*
*/
- if (!PCI_DMA_BUS_IS_PHYS)
- max_sg_entries >>= 1;
+ max_sg_entries >>= 1;
#endif /* CONFIG_PCI */
blk_queue_max_segments(q, max_sg_entries);
--
2.17.0
* [PATCH 09/12] ide: kill ide_toggle_bounce
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
ide_toggle_bounce selected various strange block bounce limits, including
not bouncing at all as soon as an iommu is present in the system. Given
that the dma_map routines now handle any required bounce buffering except
for ISA DMA, and that the IDE code already must handle either ISA DMA or
highmem, at least on iommu-equipped systems, we can get rid of the block
layer bounce limit setting entirely.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/ide/ide-dma.c | 2 --
drivers/ide/ide-lib.c | 26 --------------------------
drivers/ide/ide-probe.c | 3 ---
include/linux/ide.h | 2 --
4 files changed, 33 deletions(-)
diff --git a/drivers/ide/ide-dma.c b/drivers/ide/ide-dma.c
index 54d4d78ca46a..6f344654ef22 100644
--- a/drivers/ide/ide-dma.c
+++ b/drivers/ide/ide-dma.c
@@ -180,7 +180,6 @@ EXPORT_SYMBOL_GPL(ide_dma_unmap_sg);
void ide_dma_off_quietly(ide_drive_t *drive)
{
drive->dev_flags &= ~IDE_DFLAG_USING_DMA;
- ide_toggle_bounce(drive, 0);
drive->hwif->dma_ops->dma_host_set(drive, 0);
}
@@ -211,7 +210,6 @@ EXPORT_SYMBOL(ide_dma_off);
void ide_dma_on(ide_drive_t *drive)
{
drive->dev_flags |= IDE_DFLAG_USING_DMA;
- ide_toggle_bounce(drive, 1);
drive->hwif->dma_ops->dma_host_set(drive, 1);
}
diff --git a/drivers/ide/ide-lib.c b/drivers/ide/ide-lib.c
index e1180fa46196..78cb79eddc8b 100644
--- a/drivers/ide/ide-lib.c
+++ b/drivers/ide/ide-lib.c
@@ -6,32 +6,6 @@
#include <linux/ide.h>
#include <linux/bitops.h>
-/**
- * ide_toggle_bounce - handle bounce buffering
- * @drive: drive to update
- * @on: on/off boolean
- *
- * Enable or disable bounce buffering for the device. Drives move
- * between PIO and DMA and that changes the rules we need.
- */
-
-void ide_toggle_bounce(ide_drive_t *drive, int on)
-{
- u64 addr = BLK_BOUNCE_HIGH; /* dma64_addr_t */
-
- if (!PCI_DMA_BUS_IS_PHYS) {
- addr = BLK_BOUNCE_ANY;
- } else if (on && drive->media == ide_disk) {
- struct device *dev = drive->hwif->dev;
-
- if (dev && dev->dma_mask)
- addr = *dev->dma_mask;
- }
-
- if (drive->queue)
- blk_queue_bounce_limit(drive->queue, addr);
-}
-
u64 ide_get_lba_addr(struct ide_cmd *cmd, int lba48)
{
struct ide_taskfile *tf = &cmd->tf;
diff --git a/drivers/ide/ide-probe.c b/drivers/ide/ide-probe.c
index 2019e66eada7..8d8ed036ca0a 100644
--- a/drivers/ide/ide-probe.c
+++ b/drivers/ide/ide-probe.c
@@ -805,9 +805,6 @@ static int ide_init_queue(ide_drive_t *drive)
/* assign drive queue */
drive->queue = q;
- /* needs drive->queue to be set */
- ide_toggle_bounce(drive, 1);
-
return 0;
}
diff --git a/include/linux/ide.h b/include/linux/ide.h
index ca9d34feb572..11f0dd03a4b4 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -1508,8 +1508,6 @@ static inline void ide_set_hwifdata (ide_hwif_t * hwif, void *data)
hwif->hwif_data = data;
}
-extern void ide_toggle_bounce(ide_drive_t *drive, int on);
-
u64 ide_get_lba_addr(struct ide_cmd *, int);
u8 ide_dump_status(ide_drive_t *, const char *, u8);
--
2.17.0
* [PATCH 08/12] mmc: reduce use of block bounce buffers
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
We can rely on the dma-mapping code to handle any DMA limit that is
bigger than the ISA DMA mask for us (using either an iommu or swiotlb),
so remove setting the block layer bounce limit for anything but bouncing
highmem pages.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/mmc/core/queue.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 56e9a803db21..60a02a763d01 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -351,17 +351,14 @@ static const struct blk_mq_ops mmc_mq_ops = {
static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
{
struct mmc_host *host = card->host;
- u64 limit = BLK_BOUNCE_HIGH;
-
- if (mmc_dev(host)->dma_mask && *mmc_dev(host)->dma_mask)
- limit = (u64)dma_max_pfn(mmc_dev(host)) << PAGE_SHIFT;
blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
if (mmc_can_erase(card))
mmc_queue_setup_discard(mq->queue, card);
- blk_queue_bounce_limit(mq->queue, limit);
+ if (!mmc_dev(host)->dma_mask || !*mmc_dev(host)->dma_mask)
+ blk_queue_bounce_limit(mq->queue, BLK_BOUNCE_HIGH);
blk_queue_max_hw_sectors(mq->queue,
min(host->max_blk_count, host->max_req_size / 512));
blk_queue_max_segments(mq->queue, host->max_segs);
--
2.17.0
* [PATCH 07/12] scsi: reduce use of block bounce buffers
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
We can rely on the dma-mapping code to handle any DMA limit that is
bigger than the ISA DMA mask for us (using either an iommu or swiotlb),
so remove setting the block layer bounce limit for anything but the
unchecked_isa_dma case, or bouncing for highmem pages.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/scsi/scsi_lib.c | 24 ++----------------------
1 file changed, 2 insertions(+), 22 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0dfec0dedd5e..ebe2cbb48b80 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2142,27 +2142,6 @@ static int scsi_map_queues(struct blk_mq_tag_set *set)
return blk_mq_map_queues(set);
}
-static u64 scsi_calculate_bounce_limit(struct Scsi_Host *shost)
-{
- struct device *host_dev;
- u64 bounce_limit = 0xffffffff;
-
- if (shost->unchecked_isa_dma)
- return BLK_BOUNCE_ISA;
- /*
- * Platforms with virtual-DMA translation
- * hardware have no practical limit.
- */
- if (!PCI_DMA_BUS_IS_PHYS)
- return BLK_BOUNCE_ANY;
-
- host_dev = scsi_get_device(shost);
- if (host_dev && host_dev->dma_mask)
- bounce_limit = (u64)dma_max_pfn(host_dev) << PAGE_SHIFT;
-
- return bounce_limit;
-}
-
void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
{
struct device *dev = shost->dma_dev;
@@ -2182,7 +2161,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
}
blk_queue_max_hw_sectors(q, shost->max_sectors);
- blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
+ if (shost->unchecked_isa_dma)
+ blk_queue_bounce_limit(q, BLK_BOUNCE_ISA);
blk_queue_segment_boundary(q, shost->dma_boundary);
dma_set_seg_boundary(dev, shost->dma_boundary);
--
2.17.0
* [PATCH 06/12] memstick: don't call blk_queue_bounce_limit
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
All in-tree host drivers set up a proper DMA mask and use the dma-mapping
helpers, which means they can deal with any address we throw at them.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/memstick/core/ms_block.c | 5 -----
drivers/memstick/core/mspro_block.c | 5 -----
2 files changed, 10 deletions(-)
diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 57b13dfbd21e..b2d025f42d14 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -2096,12 +2096,8 @@ static int msb_init_disk(struct memstick_dev *card)
struct msb_data *msb = memstick_get_drvdata(card);
struct memstick_host *host = card->host;
int rc;
- u64 limit = BLK_BOUNCE_HIGH;
unsigned long capacity;
- if (host->dev.dma_mask && *(host->dev.dma_mask))
- limit = *(host->dev.dma_mask);
-
mutex_lock(&msb_disk_lock);
msb->disk_id = idr_alloc(&msb_disk_idr, card, 0, 256, GFP_KERNEL);
mutex_unlock(&msb_disk_lock);
@@ -2123,7 +2119,6 @@ static int msb_init_disk(struct memstick_dev *card)
msb->queue->queuedata = card;
- blk_queue_bounce_limit(msb->queue, limit);
blk_queue_max_hw_sectors(msb->queue, MS_BLOCK_MAX_PAGES);
blk_queue_max_segments(msb->queue, MS_BLOCK_MAX_SEGS);
blk_queue_max_segment_size(msb->queue,
diff --git a/drivers/memstick/core/mspro_block.c b/drivers/memstick/core/mspro_block.c
index 8897962781bb..a2fadc605750 100644
--- a/drivers/memstick/core/mspro_block.c
+++ b/drivers/memstick/core/mspro_block.c
@@ -1175,12 +1175,8 @@ static int mspro_block_init_disk(struct memstick_dev *card)
struct mspro_sys_info *sys_info = NULL;
struct mspro_sys_attr *s_attr = NULL;
int rc, disk_id;
- u64 limit = BLK_BOUNCE_HIGH;
unsigned long capacity;
- if (host->dev.dma_mask && *(host->dev.dma_mask))
- limit = *(host->dev.dma_mask);
-
for (rc = 0; msb->attr_group.attrs[rc]; ++rc) {
s_attr = mspro_from_sysfs_attr(msb->attr_group.attrs[rc]);
@@ -1219,7 +1215,6 @@ static int mspro_block_init_disk(struct memstick_dev *card)
msb->queue->queuedata = card;
- blk_queue_bounce_limit(msb->queue, limit);
blk_queue_max_hw_sectors(msb->queue, MSPRO_BLOCK_MAX_PAGES);
blk_queue_max_segments(msb->queue, MSPRO_BLOCK_MAX_SEGS);
blk_queue_max_segment_size(msb->queue,
--
2.17.0
* [PATCH 05/12] sata_nv: don't use block layer bounce buffers
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
sata_nv sets the block bounce limit to the reduced DMA mask for ATAPI
devices, which means that the iommu or swiotlb already takes care of
the bounce buffering, and the block bouncing can be removed.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/ata/sata_nv.c | 62 +++++++++++++++++--------------------------
1 file changed, 24 insertions(+), 38 deletions(-)
diff --git a/drivers/ata/sata_nv.c b/drivers/ata/sata_nv.c
index 8c683ddd0f58..b6e9ad6d33c9 100644
--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -740,32 +740,16 @@ static int nv_adma_slave_config(struct scsi_device *sdev)
sdev1 = ap->host->ports[1]->link.device[0].sdev;
if ((port0->flags & NV_ADMA_ATAPI_SETUP_COMPLETE) ||
(port1->flags & NV_ADMA_ATAPI_SETUP_COMPLETE)) {
- /** We have to set the DMA mask to 32-bit if either port is in
- ATAPI mode, since they are on the same PCI device which is
- used for DMA mapping. If we set the mask we also need to set
- the bounce limit on both ports to ensure that the block
- layer doesn't feed addresses that cause DMA mapping to
- choke. If either SCSI device is not allocated yet, it's OK
- since that port will discover its correct setting when it
- does get allocated.
- Note: Setting 32-bit mask should not fail. */
- if (sdev0)
- blk_queue_bounce_limit(sdev0->request_queue,
- ATA_DMA_MASK);
- if (sdev1)
- blk_queue_bounce_limit(sdev1->request_queue,
- ATA_DMA_MASK);
-
- dma_set_mask(&pdev->dev, ATA_DMA_MASK);
+ /*
+ * We have to set the DMA mask to 32-bit if either port is in
+ * ATAPI mode, since they are on the same PCI device which is
+ * used for DMA mapping. If either SCSI device is not allocated
+ * yet, it's OK since that port will discover its correct
+ * setting when it does get allocated.
+ */
+ rc = dma_set_mask(&pdev->dev, ATA_DMA_MASK);
} else {
- /** This shouldn't fail as it was set to this value before */
- dma_set_mask(&pdev->dev, pp->adma_dma_mask);
- if (sdev0)
- blk_queue_bounce_limit(sdev0->request_queue,
- pp->adma_dma_mask);
- if (sdev1)
- blk_queue_bounce_limit(sdev1->request_queue,
- pp->adma_dma_mask);
+ rc = dma_set_mask(&pdev->dev, pp->adma_dma_mask);
}
blk_queue_segment_boundary(sdev->request_queue, segment_boundary);
@@ -1131,12 +1115,11 @@ static int nv_adma_port_start(struct ata_port *ap)
VPRINTK("ENTER\n");
- /* Ensure DMA mask is set to 32-bit before allocating legacy PRD and
- pad buffers */
- rc = dma_set_mask(&pdev->dev, DMA_BIT_MASK(32));
- if (rc)
- return rc;
- rc = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32));
+ /*
+ * Ensure DMA mask is set to 32-bit before allocating legacy PRD and
+ * pad buffers.
+ */
+ rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
if (rc)
return rc;
@@ -1156,13 +1139,16 @@ static int nv_adma_port_start(struct ata_port *ap)
pp->notifier_clear_block = pp->gen_block +
NV_ADMA_NOTIFIER_CLEAR + (4 * ap->port_no);
- /* Now that the legacy PRD and padding buffer are allocated we can
- safely raise the DMA mask to allocate the CPB/APRD table.
- These are allowed to fail since we store the value that ends up
- being used to set as the bounce limit in slave_config later if
- needed. */
- dma_set_mask(&pdev->dev, DMA_BIT_MASK(64));
- dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64));
+ /*
+ * Now that the legacy PRD and padding buffer are allocated we can
+ * try to raise the DMA mask to allocate the CPB/APRD table.
+ */
+ rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+ if (rc) {
+ rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
+ if (rc)
+ return rc;
+ }
pp->adma_dma_mask = *dev->dma_mask;
mem = dmam_alloc_coherent(dev, NV_ADMA_PORT_PRIV_DMA_SZ,
--
2.17.0
* [PATCH 04/12] DAC960: don't use block layer bounce buffers
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
DAC960 just sets the block bounce limit to the DMA mask, which means
that the iommu or swiotlb already takes care of the bounce buffering,
and the block bouncing can be removed.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/block/DAC960.c | 9 ++-------
drivers/block/DAC960.h | 1 -
2 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
index f781eff7d23e..c9ba48519d0f 100644
--- a/drivers/block/DAC960.c
+++ b/drivers/block/DAC960.c
@@ -1179,7 +1179,6 @@ static bool DAC960_V1_EnableMemoryMailboxInterface(DAC960_Controller_T
if (pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(32)))
return DAC960_Failure(Controller, "DMA mask out of range");
- Controller->BounceBufferLimit = DMA_BIT_MASK(32);
if ((hw_type == DAC960_PD_Controller) || (hw_type == DAC960_P_Controller)) {
CommandMailboxesSize = 0;
@@ -1380,11 +1379,8 @@ static bool DAC960_V2_EnableMemoryMailboxInterface(DAC960_Controller_T
dma_addr_t CommandMailboxDMA;
DAC960_V2_CommandStatus_T CommandStatus;
- if (!pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(64)))
- Controller->BounceBufferLimit = DMA_BIT_MASK(64);
- else if (!pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(32)))
- Controller->BounceBufferLimit = DMA_BIT_MASK(32);
- else
+ if (pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(64)) &&
+ pci_set_dma_mask(Controller->PCIDevice, DMA_BIT_MASK(32)))
return DAC960_Failure(Controller, "DMA mask out of range");
/* This is a temporary dma mapping, used only in the scope of this function */
@@ -2540,7 +2536,6 @@ static bool DAC960_RegisterBlockDevice(DAC960_Controller_T *Controller)
continue;
}
Controller->RequestQueue[n] = RequestQueue;
- blk_queue_bounce_limit(RequestQueue, Controller->BounceBufferLimit);
RequestQueue->queuedata = Controller;
blk_queue_max_segments(RequestQueue, Controller->DriverScatterGatherLimit);
blk_queue_max_hw_sectors(RequestQueue, Controller->MaxBlocksPerCommand);
diff --git a/drivers/block/DAC960.h b/drivers/block/DAC960.h
index 21aff470d268..1439e651928b 100644
--- a/drivers/block/DAC960.h
+++ b/drivers/block/DAC960.h
@@ -2295,7 +2295,6 @@ typedef struct DAC960_Controller
unsigned short MaxBlocksPerCommand;
unsigned short ControllerScatterGatherLimit;
unsigned short DriverScatterGatherLimit;
- u64 BounceBufferLimit;
unsigned int CombinedStatusBufferLength;
unsigned int InitialStatusLength;
unsigned int CurrentStatusLength;
--
2.17.0
* [PATCH 03/12] mtip32xx: don't use block layer bounce buffers
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
mtip32xx just sets the block bounce limit to the DMA mask, which means
that the iommu or swiotlb already takes care of the bounce buffering,
and the block bouncing can be removed.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/block/mtip32xx/mtip32xx.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 769c551e3d71..b03bb27dcc58 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3862,7 +3862,6 @@ static int mtip_block_initialize(struct driver_data *dd)
blk_queue_max_hw_sectors(dd->queue, 0xffff);
blk_queue_max_segment_size(dd->queue, 0x400000);
blk_queue_io_min(dd->queue, 4096);
- blk_queue_bounce_limit(dd->queue, dd->pdev->dma_mask);
/* Signal trim support */
if (dd->trim_supp == true) {
--
2.17.0
* [PATCH 02/12] storvsc: don't set a bounce limit
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
The default already is to never bounce, so the call is a no-op.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/scsi/storvsc_drv.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 8c51d628b52e..5f2d177c3bd9 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1382,9 +1382,6 @@ static int storvsc_device_alloc(struct scsi_device *sdevice)
static int storvsc_device_configure(struct scsi_device *sdevice)
{
-
- blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
-
blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
/* Ensure there are no gaps in presented sgls */
--
2.17.0
* [PATCH 01/12] iscsi_tcp: don't set a bounce limit
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180416085032.7367-1-hch-jcswGhMUV9g@public.gmane.org>
The default already is to never bounce, so the call is a no-op.
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
drivers/scsi/iscsi_tcp.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 2ba4b68fdb73..b025a0b74341 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -962,7 +962,6 @@ static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
if (conn->datadgst_en)
sdev->request_queue->backing_dev_info->capabilities
|= BDI_CAP_STABLE_WRITES;
- blk_queue_bounce_limit(sdev->request_queue, BLK_BOUNCE_ANY);
blk_queue_dma_alignment(sdev->request_queue, 0);
return 0;
}
--
2.17.0
* remove PCI_DMA_BUS_IS_PHYS
From: Christoph Hellwig @ 2018-04-16 8:50 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
linux-arch-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-ide-u79uwXL29TY76Z2rM5mHXA,
linux-scsi-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA
Hi all,
this series tries to get rid of the global PCI_DMA_BUS_IS_PHYS flag,
which causes the block layer and networking code to bounce buffer memory
above the DMA mask in some cases. It is a leftover from the i386 + highmem
days and is obsolete now that we have swiotlb or iommus, so the dma ops
implementations can always handle the memory passed to them (well, minus
the ISA DMA case, which will require further attention).
* KASAN: use-after-free Read in llc_conn_ac_send_sabme_cmd_p_set_x
From: syzbot @ 2018-04-16 8:50 UTC
To: davem, keescook, linux-kernel, netdev, syzkaller-bugs,
xiyou.wangcong
Hello,
syzbot hit the following crash on upstream commit
18b7fd1c93e5204355ddbf2608a097d64df81b88 (Sat Apr 14 15:50:50 2018 +0000)
Merge branch 'akpm' (patches from Andrew)
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=6e181fc95081c2cf9051
Unfortunately, I don't have any reproducer for this crash yet.
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=5257422885093376
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-8852471259444315113
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for
details.
If you forward the report, please keep this part and the footer.
XFS (loop1): Invalid superblock magic number
==================================================================
BUG: KASAN: use-after-free in
llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
Read of size 1 at addr ffff88018be1a290 by task syz-executor7/13726
CPU: 0 PID: 13726 Comm: syz-executor7 Not tainted 4.16.0+ #3
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
llc_conn_service net/llc/llc_conn.c:400 [inline]
llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75
llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891
sk_backlog_rcv include/net/sock.h:909 [inline]
__release_sock+0x12f/0x3a0 net/core/sock.c:2335
release_sock+0xa4/0x2b0 net/core/sock.c:2850
llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204
sock_release+0x96/0x1b0 net/socket.c:594
sock_close+0x16/0x20 net/socket.c:1149
__fput+0x34d/0x890 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:243
task_work_run+0x1e4/0x290 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x1aee/0x2730 kernel/exit.c:865
do_group_exit+0x16f/0x430 kernel/exit.c:968
SYSC_exit_group kernel/exit.c:979 [inline]
SyS_exit_group+0x1d/0x20 kernel/exit.c:977
do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x455319
RSP: 002b:00007ffc740e5db8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000000000c4 RCX: 0000000000455319
RDX: 00000000000274e8 RSI: 0000000000730500 RDI: 0000000000000000
RBP: 0000000000000013 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000013
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001380
Allocated by task 13728:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
kmalloc include/linux/slab.h:512 [inline]
kzalloc include/linux/slab.h:701 [inline]
llc_sap_alloc net/llc/llc_core.c:35 [inline]
llc_sap_open+0x193/0x4d0 net/llc/llc_core.c:102
llc_ui_bind+0xb8c/0xef0 net/llc/af_llc.c:354
__sys_bind+0x331/0x440 net/socket.c:1484
SYSC_bind net/socket.c:1495 [inline]
SyS_bind+0x24/0x30 net/socket.c:1493
do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Freed by task 13726:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kfree+0xd9/0x260 mm/slab.c:3813
llc_sap_close+0x1d8/0x2d0 net/llc/llc_core.c:132
llc_sap_put include/net/llc.h:124 [inline]
llc_sap_remove_socket+0x460/0x5b0 net/llc/llc_conn.c:760
llc_ui_release+0x1de/0x220 net/llc/af_llc.c:203
sock_release+0x96/0x1b0 net/socket.c:594
sock_close+0x16/0x20 net/socket.c:1149
__fput+0x34d/0x890 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:243
task_work_run+0x1e4/0x290 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x1aee/0x2730 kernel/exit.c:865
do_group_exit+0x16f/0x430 kernel/exit.c:968
SYSC_exit_group kernel/exit.c:979 [inline]
SyS_exit_group+0x1d/0x20 kernel/exit.c:977
do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
The buggy address belongs to the object at ffff88018be1a280
which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 16 bytes inside of
2048-byte region [ffff88018be1a280, ffff88018be1aa80)
The buggy address belongs to the page:
page:ffffea00062f8680 count:1 mapcount:0 mapping:ffff88018be1a280 index:0x0
compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff88018be1a280 0000000000000000 0000000100000003
raw: ffffea00062f7ea0 ffffea00062f88a0 ffff8801dac00c40 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88018be1a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88018be1a200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff88018be1a280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88018be1a300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88018be1a380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.
syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged into any tree, please reply to this email with:
#syz fix: exact-commit-title
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug
report.
Note: all commands must start from beginning of the line in the email body.
* [PATCH net] net: mvpp2: Fix TCAM filter reserved range
From: Maxime Chevallier @ 2018-04-16 8:07 UTC
To: davem
Cc: Maxime Chevallier, netdev, linux-kernel, Antoine Tenart,
thomas.petazzoni, gregory.clement, miquel.raynal, nadavh, stefanc,
ymarkman, mw
Marvell's PPv2 controller has a Packet Header parser, which uses a
fixed-size TCAM array of filter entries.
The mvpp2 driver reserves some ranges among the 256 TCAM entries to
perform MAC and VID filtering. The rest of the TCAM ids are freely usable
for other features, such as IPv4 proto matching.
This commit fixes the MVPP2_PE_LAST_FREE_TID define, which sets the end
of the "free range": it previously included the MAC range. Other features
could therefore use entries dedicated to MAC filtering, lowering the
number of unicast/multicast addresses that can be filtered before the
port switches to promiscuous mode.
Fixes: 10fea26ce2aa ("net: mvpp2: Add support for unicast filtering")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
drivers/net/ethernet/marvell/mvpp2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c
index 54a038943c06..9deb79b6dcc8 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -663,7 +663,7 @@ enum mvpp2_tag_type {
#define MVPP2_PE_VID_FILT_RANGE_END (MVPP2_PRS_TCAM_SRAM_SIZE - 31)
#define MVPP2_PE_VID_FILT_RANGE_START (MVPP2_PE_VID_FILT_RANGE_END - \
MVPP2_PRS_VLAN_FILT_RANGE_SIZE + 1)
-#define MVPP2_PE_LAST_FREE_TID (MVPP2_PE_VID_FILT_RANGE_START - 1)
+#define MVPP2_PE_LAST_FREE_TID (MVPP2_PE_MAC_RANGE_START - 1)
#define MVPP2_PE_IP6_EXT_PROTO_UN (MVPP2_PRS_TCAM_SRAM_SIZE - 30)
#define MVPP2_PE_IP6_ADDR_UN (MVPP2_PRS_TCAM_SRAM_SIZE - 29)
#define MVPP2_PE_IP4_ADDR_UN (MVPP2_PRS_TCAM_SRAM_SIZE - 28)
--
2.11.0
* [PATCH 2/2] net: socionext: reset hardware in ndo_stop
From: jassisinghbrar @ 2018-04-16 7:39 UTC
To: netdev; +Cc: davem, masahisa.kojima, ard.biesheuvel, Jassi Brar
From: Masahisa KOJIMA <masahisa.kojima@linaro.org>
When the interface is brought down, netsec_netdev_stop() sets the
head/tail of the descriptor ring addresses to 0.
But the netsec hardware still keeps the previous descriptor
ring addresses, so the driver and the hardware are inconsistent
when the interface is brought up again later.
To address this inconsistency, call netsec_reset_hardware()
when the interface is brought down.
In addition, to minimize the reset process,
add a flag that decides whether the driver loads the netsec microcode.
Even if the driver resets the netsec hardware, the netsec microcode
stays resident in RAM, so it is enough to load the microcode only
at initialization.
This patch is critical for installation over the network.
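The state mismatch described above can be modelled in a few lines. This is a hypothetical userspace sketch: every struct, field and function name is illustrative, not taken from netsec.c. It assumes that a reset clears the ring registers, that the microcode survives a reset, and that DMA cannot start without resident microcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the driver/hardware state; not the netsec driver. */
struct fake_netsec {
	uint64_t drv_ring_addr;  /* ring address the driver believes in */
	uint64_t hw_ring_addr;   /* ring address the hardware still holds */
	bool ucode_resident;     /* microcode stays in RAM across resets */
	int ucode_loads;         /* number of microcode loads performed */
};

static int fake_reset_hardware(struct fake_netsec *p, bool load_ucode)
{
	p->hw_ring_addr = 0;     /* reset clears the ring registers */
	if (load_ucode) {
		p->ucode_loads++;
		p->ucode_resident = true;
	}
	/* DMA cannot be started without microcode in this model. */
	return p->ucode_resident ? 0 : -1;
}

static void fake_open(struct fake_netsec *p, uint64_t ring)
{
	assert(ring != 0);
	p->drv_ring_addr = ring;
	p->hw_ring_addr = ring;
}

/* reset == false models the old ndo_stop, reset == true the patched one. */
static int fake_stop(struct fake_netsec *p, bool reset)
{
	p->drv_ring_addr = 0;    /* driver forgets the ring */
	return reset ? fake_reset_hardware(p, false) : 0;
}

static bool fake_consistent(const struct fake_netsec *p)
{
	return p->drv_ring_addr == p->hw_ring_addr;
}
```

Passing false on the stop path mirrors the patch's choice. The microcode is loaded once at init and is not reloaded on every down/up cycle.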
Signed-off-by: Masahisa KOJIMA <masahisa.kojima@linaro.org>
Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
---
drivers/net/ethernet/socionext/netsec.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index f6fe70e..aa50331 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1057,7 +1057,8 @@ static int netsec_netdev_load_microcode(struct netsec_priv *priv)
return 0;
}
-static int netsec_reset_hardware(struct netsec_priv *priv)
+static int netsec_reset_hardware(struct netsec_priv *priv,
+ bool load_ucode)
{
u32 value;
int err;
@@ -1102,11 +1103,14 @@ static int netsec_reset_hardware(struct netsec_priv *priv)
netsec_write(priv, NETSEC_REG_NRM_RX_CONFIG,
1 << NETSEC_REG_DESC_ENDIAN);
- err = netsec_netdev_load_microcode(priv);
- if (err) {
- netif_err(priv, probe, priv->ndev,
- "%s: failed to load microcode (%d)\n", __func__, err);
- return err;
+ if (load_ucode) {
+ err = netsec_netdev_load_microcode(priv);
+ if (err) {
+ netif_err(priv, probe, priv->ndev,
+ "%s: failed to load microcode (%d)\n",
+ __func__, err);
+ return err;
+ }
}
/* start DMA engines */
@@ -1328,6 +1332,7 @@ static int netsec_netdev_open(struct net_device *ndev)
static int netsec_netdev_stop(struct net_device *ndev)
{
+ int ret;
struct netsec_priv *priv = netdev_priv(ndev);
netif_stop_queue(priv->ndev);
@@ -1343,12 +1348,14 @@ static int netsec_netdev_stop(struct net_device *ndev)
netsec_uninit_pkt_dring(priv, NETSEC_RING_TX);
netsec_uninit_pkt_dring(priv, NETSEC_RING_RX);
+ ret = netsec_reset_hardware(priv, false);
+
phy_stop(ndev->phydev);
phy_disconnect(ndev->phydev);
pm_runtime_put_sync(priv->dev);
- return 0;
+ return ret;
}
static int netsec_netdev_init(struct net_device *ndev)
@@ -1364,7 +1371,7 @@ static int netsec_netdev_init(struct net_device *ndev)
if (ret)
goto err1;
- ret = netsec_reset_hardware(priv);
+ ret = netsec_reset_hardware(priv, true);
if (ret)
goto err2;
--
2.7.4
^ permalink raw reply related
* [PATCH 1/2] net: netsec: enable tx-irq during open callback
From: jassisinghbrar @ 2018-04-16 7:22 UTC (permalink / raw)
To: netdev; +Cc: davem, masahisa.kojima, ard.biesheuvel, Jassi Brar
From: Jassi Brar <jaswinder.singh@linaro.org>
Enable the TX irq as well during ndo_open(), as we cannot count on
RX arriving early enough to trigger NAPI. This patch is critical
for installation over the network.
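The reasoning comes down to an interrupt mask. If only the RX bit is enabled, a TX completion by itself raises no interrupt, so NAPI is not scheduled until some RX traffic arrives. This minimal sketch uses hypothetical bit values; the real NETSEC_IRQ_* constants are defined in netsec.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values standing in for NETSEC_IRQ_RX/NETSEC_IRQ_TX. */
#define IRQ_RX (1u << 0)
#define IRQ_TX (1u << 1)

static_assert((IRQ_RX & IRQ_TX) == 0, "IRQ bits must be distinct");

/* True if a hardware event with bits `pending` raises an interrupt
 * (and would therefore schedule NAPI) under the `enabled` mask. */
static bool irq_fires(uint32_t enabled, uint32_t pending)
{
	return (enabled & pending) != 0;
}
```

With the old mask (IRQ_RX), a pending IRQ_TX event stays silent. With IRQ_RX | IRQ_TX, it fires.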
Fixes: 533dd11a12f6 ("net: socionext: Add Synquacer NetSec driver")
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
---
drivers/net/ethernet/socionext/netsec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index f4c0b02..f6fe70e 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -1313,8 +1313,8 @@ static int netsec_netdev_open(struct net_device *ndev)
napi_enable(&priv->napi);
netif_start_queue(ndev);
- /* Enable RX intr. */
- netsec_write(priv, NETSEC_REG_INTEN_SET, NETSEC_IRQ_RX);
+ /* Enable TX+RX intr. */
+ netsec_write(priv, NETSEC_REG_INTEN_SET, NETSEC_IRQ_RX | NETSEC_IRQ_TX);
return 0;
err3:
--
2.7.4
^ permalink raw reply related
* Re: net: hang in unregister_netdevice: waiting for lo to become free
From: Dmitry Vyukov @ 2018-04-16 7:35 UTC (permalink / raw)
To: Dan Streetman
Cc: Tommi Rantala, Neil Horman, Xin Long, David Ahern,
Daniel Borkmann, Cong Wang, David Miller, Eric Dumazet,
Willem de Bruijn, Jakub Kicinski, Rasmus Villemoes, netdev, LKML,
Alexey Kuznetsov, Hideaki YOSHIFUJI, syzkaller, Dan Streetman,
Eric W. Biederman, Alexey Kodanev
In-Reply-To: <CACT4Y+ao0Z_RT+D+sqb6ysnPVnUZ5DfrHstuFjEAt0mr+xt_4Q@mail.gmail.com>
On Fri, Apr 13, 2018 at 5:54 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Fri, Apr 13, 2018 at 2:43 PM, Dan Streetman <ddstreet@ieee.org> wrote:
>> On Thu, Apr 12, 2018 at 8:15 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Wed, Feb 21, 2018 at 3:53 PM, Tommi Rantala
>>> <tommi.t.rantala@nokia.com> wrote:
>>>> On 20.02.2018 18:26, Neil Horman wrote:
>>>>>
>>>>> On Tue, Feb 20, 2018 at 09:14:41AM +0100, Dmitry Vyukov wrote:
>>>>>>
>>>>>> On Tue, Feb 20, 2018 at 8:56 AM, Tommi Rantala
>>>>>> <tommi.t.rantala@nokia.com> wrote:
>>>>>>>
>>>>>>> On 19.02.2018 20:59, Dmitry Vyukov wrote:
>>>>>>>>
>>>>>>>> Is this meant to be fixed already? I am still seeing this on the
>>>>>>>> latest upstream tree.
>>>>>>>>
>>>>>>>
>>>>>>> These two commits are in v4.16-rc1:
>>>>>>>
>>>>>>> commit 4a31a6b19f9ddf498c81f5c9b089742b7472a6f8
>>>>>>> Author: Tommi Rantala <tommi.t.rantala@nokia.com>
>>>>>>> Date: Mon Feb 5 21:48:14 2018 +0200
>>>>>>>
>>>>>>> sctp: fix dst refcnt leak in sctp_v4_get_dst
>>>>>>> ...
>>>>>>> Fixes: 410f03831 ("sctp: add routing output fallback")
>>>>>>> Fixes: 0ca50d12f ("sctp: fix src address selection if using
>>>>>>> secondary
>>>>>>> addresses")
>>>>>>>
>>>>>>>
>>>>>>> commit 957d761cf91cdbb175ad7d8f5472336a4d54dbf2
>>>>>>> Author: Alexey Kodanev <alexey.kodanev@oracle.com>
>>>>>>> Date: Mon Feb 5 15:10:35 2018 +0300
>>>>>>>
>>>>>>> sctp: fix dst refcnt leak in sctp_v6_get_dst()
>>>>>>> ...
>>>>>>> Fixes: dbc2b5e9a09e ("sctp: fix src address selection if using
>>>>>>> secondary
>>>>>>> addresses for ipv6")
>>>>>>>
>>>>>>>
>>>>>>> I guess we missed something if it's still reproducible.
>>>>>>>
>>>>>>> I can check it later this week, unless someone else beat me to it.
>>>>>>
>>>>>>
>>>>>> Hi Tommi,
>>>>>>
>>>>>> Hmmm, I can't claim that it's exactly the same bug. Perhaps it's
>>>>>> another one then. But I am still seeing these:
>>>>>>
>>>>>> [ 58.799130] unregister_netdevice: waiting for lo to become free.
>>>>>> Usage count = 4
>>>>>> [ 60.847138] unregister_netdevice: waiting for lo to become free.
>>>>>> Usage count = 4
>>>>>> [ 62.895093] unregister_netdevice: waiting for lo to become free.
>>>>>> Usage count = 4
>>>>>> [ 64.943103] unregister_netdevice: waiting for lo to become free.
>>>>>> Usage count = 4
>>>>>>
>>>>>> on upstream tree pulled ~12 hours ago.
>>>>>>
>>>>> Can you write a systemtap script to probe dev_hold, and dev_put, printing
>>>>> out a
>>>>> backtrace if the device name matches "lo". That should tell us
>>>>> definitively if
>>>>> the problem is in the same location or not
>>>>
>>>>
>>>> Hi Dmitry, I tested with the reproducer and the kernel .config file that you
>>>> sent in the first email in this thread:
>>>>
>>>> With 4.16-rc2 unable to reproduce.
>>>>
>>>> With 4.15-rc9 bug reproducible, and I get "unregister_netdevice: waiting for
>>>> lo to become free. Usage count = 3"
>>>>
>>>> With 4.15-rc9 and Alexey's "sctp: fix dst refcnt leak in sctp_v6_get_dst()"
>>>> cherry-picked on top, unable to reproduce.
>>>>
>>>>
>>>> Is syzkaller doing something else now to trigger the bug...?
>>>> Can you still trigger the bug with the same reproducer?
>>>
>>> Hi Neil, Tommi,
>>>
>>> Reviving this old thread about "unregister_netdevice: waiting for lo
>>> to become free. Usage count = 3" hangs.
>>> I still did not have time to deep dive into what happens there (too
>>> many bugs coming from syzbot). But this still actively happens and I
>>> suspect accounts to a significant portion of various hang reports,
>>> which are quite unpleasant.
>>>
>>> One idea that could make it all simpler:
>>>
>>> Is this wait loop in netdev_wait_allrefs() supposed to wait for any
>>> prolonged periods of time under any non-buggy conditions? E.g. more
>>> than 1-2 minutes?
>>> If it only supposed to wait briefly for things that already supposed
>>> to be shutting down, and we add a WARNING there after some timeout,
>>> then syzbot will report all info how/when it happens, hopefully
>>> extracting reproducers, and all the nice things.
>>> But this WARNING should not have any false positives under any
>>> realistic conditions (e.g. waiting for arrival of remote packets with
>>> large timeouts).
>>>
>>> Looking at some task hung reports, it seems that this code holds some
>>> mutexes, takes workqueue thread and prevents any progress with
>>> destruction of other devices (and net namespace creation/destruction),
>>> so I guess it should not wait for any indefinite periods of time?
>>
>> I'm working on this currently:
>> https://bugs.launchpad.net/ubuntu/zesty/+source/linux/+bug/1711407
>>
>> I added a summary of what I've found to be the cause (or at least, one
>> possible cause) of this:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407/comments/72
>>
>> I'm working on a patch to work around the main side-effect of this,
>> which is hanging while holding the global net mutex. Hangs will still
>> happen (e.g. if a dst leaks) but should not affect anything else,
>> other than a leak of the dst and its net namespace.
>>
>> Fixing the dst leaks is important too, of course, but a dst leak (or
>> other cause) shouldn't break the entire system.
>
> Leaking some memory is definitely better than hanging the system.
>
> So I've made syzkaller to recognize "unregister_netdevice: waiting for
> (.*) to become free" as a kernel bug:
> https://github.com/google/syzkaller/commit/7a67784ca8bdc3b26cce2f0ec9a40d2dd9ec9396
> Unfortunately it does not make it catch these bugs because creating a
> net namespace per test is too damn slow, so namespaces are reused for
> lots of tests and when/if it's eventually destroyed it's already too
> late to find root cause.
>
> But I've run a one-off experiment with prompt net namespace
> destruction and syzkaller was able to easily extract a C reproducer:
> https://gist.githubusercontent.com/dvyukov/d571e8fff24e127ca48a8c4790d42bfa/raw/52050e93ba9afbb5126b9d7bb39b7e71a82af016/gistfile1.txt
>
> On upstream 16e205cf42da1f497b10a4a24f563e6c0d574eec with this config:
> https://gist.githubusercontent.com/dvyukov/9663c57443adb21f2795b92ef0829d62/raw/bbea0652e23746096dd56855a28f6c681aebcdee/gistfile1.txt
>
> this gives me:
>
> [ 83.183198] unregister_netdevice: waiting for lo to become free.
> Usage count = 9
> [ 85.231202] unregister_netdevice: waiting for lo to become free.
> Usage count = 9
> ...
> [ 523.511205] unregister_netdevice: waiting for lo to become free.
> Usage count = 9
> ...
>
> This is generated from this syzkaller program:
>
> r0 = socket$inet6(0xa, 0x1, 0x84)
> setsockopt$inet6_IPV6_XFRM_POLICY(r0, 0x29, 0x23,
> &(0x7f0000000380)={{{@in6=@remote={0xfe, 0x80, [], 0xbb},
> @in=@dev={0xac, 0x14, 0x14}, 0x0, 0x0, 0x0, 0x0, 0xa}, {}, {}, 0x0,
> 0x0, 0x1}, {{@in=@local={0xac, 0x14, 0x14, 0xaa}, 0x0, 0x32}, 0x0,
> @in=@local={0xac, 0x14, 0x14, 0xaa}, 0x3504}}, 0xe8)
> bind$inet6(r0, &(0x7f0000000000)={0xa, 0x4e20}, 0x1c)
> connect$inet(r0, &(0x7f0000000040)={0x2, 0x4e20, @dev={0xac, 0x14,
> 0x14, 0xd}}, 0x10)
> syz_emit_ethernet(0x3e, &(0x7f00000001c0)={@local={[0xaa, 0xaa, 0xaa,
> 0xaa, 0xaa], 0xaa}, @dev={[0xaa, 0xaa, 0xaa, 0xaa, 0xaa]}, [],
> {@ipv6={0x86dd, {0x0, 0x6, "50a09c", 0x8, 0xffffff11, 0x0,
> @remote={0xfe, 0x80, [], 0xbb}, @local={0xfe, 0x80, [], 0xaa}, {[],
> @udp={0x0, 0x4e20, 0x8}}}}}}, &(0x7f0000000040))
>
> So this seems to be related to IPv6 and/or xfrm and is potentially
> caused by external packets (that syz_emit_ethernet call).
Here is another repro which seems to be a different bug (note that it
requires fault injection):
https://gist.githubusercontent.com/dvyukov/1c56623016cc4c24a69d433c5114ad5b/raw/530478f571b195193101b912aa646948528baa8e/gistfile1.txt
Dan, do you mind taking a look at them? Fixing these should eliminate
root causes of these hangs/leaks.
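The idea of warning once the reference-count wait has gone on too long can be tried out in userspace first. The sketch below is a hypothetical model, not netdev_wait_allrefs(). Each loop iteration stands for one sleep-and-recheck cycle. The one-shot message stands in for a WARN() that a fuzzer could report with a backtrace. max_iters exists only so the model terminates, since the real loop has no such bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a refcount wait loop. */
struct wait_result {
	int iterations;
	bool warned;
};

/* refs_at(i) returns the reference count observed on iteration i. */
static struct wait_result wait_allrefs(int (*refs_at)(int), int warn_after,
				       int max_iters)
{
	struct wait_result r = { 0, false };

	assert(warn_after > 0 && max_iters > 0);
	while (r.iterations < max_iters && refs_at(r.iterations) != 0) {
		r.iterations++;
		if (!r.warned && r.iterations >= warn_after) {
			/* One-shot report instead of repeating forever. */
			fprintf(stderr, "refs still held after %d waits\n",
				r.iterations);
			r.warned = true;
		}
	}
	return r;
}

/* Example traces: references dropped promptly vs. a leaked reference. */
static int refs_drop_quickly(int i)
{
	return i < 2 ? 1 : 0;
}

static int refs_leaked(int i)
{
	(void)i;
	return 4;
}
```

The open question in the thread still applies. The threshold is only safe if no legitimate, non-buggy teardown ever waits that long.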
^ permalink raw reply
* Re: XDP performance regression due to CONFIG_RETPOLINE Spectre V2
From: Jesper Dangaard Brouer @ 2018-04-16 6:02 UTC (permalink / raw)
To: David Woodhouse
Cc: Christoph Hellwig, Tushar Dave, xdp-newbies@vger.kernel.org,
netdev@vger.kernel.org, William Tu, Björn Töpel,
Karlsson, Magnus, Alexander Duyck, Arnaldo Carvalho de Melo,
brouer
In-Reply-To: <1523734166.15648.5.camel@infradead.org>
On Sat, 14 Apr 2018 21:29:26 +0200
David Woodhouse <dwmw2@infradead.org> wrote:
> On Fri, 2018-04-13 at 19:26 +0200, Christoph Hellwig wrote:
> > On Fri, Apr 13, 2018 at 10:12:41AM -0700, Tushar Dave wrote:
> > > I guess there is nothing we need to do!
> > >
> > > On x86, in case of no intel iommu or iommu is disabled, you end up in
> > > swiotlb for DMA API calls when system has 4G memory.
> > > However, AFAICT, for 64bit DMA capable devices swiotlb DMA APIs do not
> > > use bounce buffer until and unless you have swiotlb=force specified in
> > > kernel commandline.
> >
> > Sure. But that means every sync_*_to_device and sync_*_to_cpu now
> > involves an indirect call to do exactly nothing, which in the workload
> > Jesper is looking at is causing a huge performance degradation due to
> > retpolines.
Yes, exactly.
>
> We should look at using the
>
> if (dma_ops == swiotlb_dma_ops)
> swiotlb_map_page()
> else
> dma_ops->map_page()
>
> trick for this. Perhaps with alternatives so that when an Intel or AMD
> IOMMU is detected, it's *that* which is checked for as the special
> case.
Yes, this trick is basically what I'm asking for :-)
It did sound like Hellwig first wanted to avoid/fix x86 ending up
defaulting to swiotlb. Then we just have to do the same trick with
the new default fall-through dma_ops.
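The pointer-comparison trick quoted above can be sketched as follows. The code compares the ops pointer against the expected common case and calls that function directly; any other ops table falls back to the indirect call. This is a simplified userspace model. The struct and function names are hypothetical, and the kernel's struct dma_map_ops has many more members with different signatures.

```c
#include <assert.h>

/* Hypothetical, simplified ops table. */
struct map_ops {
	unsigned long (*map_page)(unsigned long addr);
};

static int direct_calls, indirect_calls;

static unsigned long swiotlb_like_map_page(unsigned long addr)
{
	return addr;  /* identity mapping: no bounce buffer needed */
}

static unsigned long iommu_like_map_page(unsigned long addr)
{
	return addr + 0x1000;  /* pretend translation */
}

static const struct map_ops swiotlb_like_ops = {
	.map_page = swiotlb_like_map_page,
};

static const struct map_ops iommu_like_ops = {
	.map_page = iommu_like_map_page,
};

/* The common case becomes a direct call, which avoids a retpoline;
 * everything else still goes through the function pointer. */
static unsigned long map_page(const struct map_ops *ops, unsigned long addr)
{
	assert(ops);
	if (ops == &swiotlb_like_ops) {
		direct_calls++;
		return swiotlb_like_map_page(addr);
	}
	indirect_calls++;
	return ops->map_page(addr);
}
```

Which table counts as the "special case" is a policy choice. As suggested in the thread, it could be switched (for example with alternatives) to whichever IOMMU ops the system detected.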
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* KASAN: use-after-free Read in tipc_nametbl_stop
From: syzbot @ 2018-04-16 5:57 UTC (permalink / raw)
To: davem, jon.maloy, linux-kernel, netdev, syzkaller-bugs,
tipc-discussion, ying.xue
Hello,
syzbot hit the following crash on net-next commit
5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +0000)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
syzbot dashboard link:
https://syzkaller.appspot.com/bug?extid=d64b64afc55660106556
So far this crash happened 5 times on net-next, upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6319968803094528
syzkaller reproducer:
https://syzkaller.appspot.com/x/repro.syz?id=6099825221173248
Raw console output:
https://syzkaller.appspot.com/x/log.txt?id=4953018151731200
Kernel config:
https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d64b64afc55660106556@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for details.
If you forward the report, please keep this part and the footer.
Failed to remove local publication {0,0,0}/206417777
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
IPVS: ftp: loaded support on port[0] = 21
==================================================================
BUG: KASAN: use-after-free in tipc_service_delete net/tipc/name_table.c:751 [inline]
BUG: KASAN: use-after-free in tipc_nametbl_stop+0x94e/0xd70 net/tipc/name_table.c:780
Read of size 8 at addr ffff8801c4c25130 by task kworker/u4:2/30
CPU: 0 PID: 30 Comm: kworker/u4:2 Not tainted 4.16.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
tipc_service_delete net/tipc/name_table.c:751 [inline]
tipc_nametbl_stop+0x94e/0xd70 net/tipc/name_table.c:780
tipc_exit_net+0x2d/0x40 net/tipc/core.c:103
ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
Allocated by task 4535:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
kmalloc include/linux/slab.h:512 [inline]
kzalloc include/linux/slab.h:701 [inline]
tipc_service_create_range net/tipc/name_table.c:183 [inline]
tipc_service_insert_publ net/tipc/name_table.c:207 [inline]
tipc_nametbl_insert_publ+0x569/0x1910 net/tipc/name_table.c:371
tipc_nametbl_publish+0x6c3/0xba0 net/tipc/name_table.c:618
tipc_sk_publish+0x22a/0x510 net/tipc/socket.c:2604
tipc_bind+0x206/0x330 net/tipc/socket.c:647
__sys_bind+0x331/0x440 net/socket.c:1484
SYSC_bind net/socket.c:1495 [inline]
SyS_bind+0x24/0x30 net/socket.c:1493
do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
Freed by task 30:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kfree+0xd9/0x260 mm/slab.c:3813
tipc_service_remove_publ.isra.8+0x909/0xc30 net/tipc/name_table.c:283
tipc_service_delete net/tipc/name_table.c:753 [inline]
tipc_nametbl_stop+0x746/0xd70 net/tipc/name_table.c:780
tipc_exit_net+0x2d/0x40 net/tipc/core.c:103
ops_exit_list.isra.7+0xb0/0x160 net/core/net_namespace.c:152
cleanup_net+0x51d/0xb20 net/core/net_namespace.c:523
process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
kthread+0x345/0x410 kernel/kthread.c:238
ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
The buggy address belongs to the object at ffff8801c4c25100
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 48 bytes inside of
64-byte region [ffff8801c4c25100, ffff8801c4c25140)
The buggy address belongs to the page:
page:ffffea0007130940 count:1 mapcount:0 mapping:ffff8801c4c25000 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801c4c25000 0000000000000000 0000000100000020
raw: ffffea0006ccf860 ffffea00070840a0 ffff8801dac00340 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8801c4c25000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8801c4c25080: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> ffff8801c4c25100: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
^
ffff8801c4c25180: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff8801c4c25200: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkaller@googlegroups.com.
syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is
merged into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug report.
Note: all commands must start from beginning of the line in the email body.
^ permalink raw reply
* Re: Donation
From: M. M. Fridman @ 2018-04-14 1:52 UTC (permalink / raw)
--
I Mikhail Fridman. has selected you specially as one of my beneficiaries
for my Charitable Donation, Just as I have declared on May 23, 2016 to
give
my fortune as charity.
Check the link below for confirmation:
http://www.ibtimes.co.uk/russias-second-wealthiest-man-mikhail-fridman-plans-leaving-14-2bn-fortune-charity-1561604
Reply as soon as possible with further directives.
Best Regards,
Mikhail Fridman.
^ permalink raw reply
* Re: tcp hang when socket fills up ?
From: Dominique Martinet @ 2018-04-16 4:03 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michal Kubecek, netdev
In-Reply-To: <20180416035546.GA5388@nautica>
Dominique Martinet wrote on Mon, Apr 16, 2018:
> . . . Oh, there is something interesting there, the connection doesn't
> come up with -G?
Hm, sorry, I take this last part back. I cannot reproduce -G not working
reliably.
I'll dig around the conntrack table a bit more.
--
Dominique Martinet | Asmadeus
^ permalink raw reply
* Re: tcp hang when socket fills up ?
From: Dominique Martinet @ 2018-04-16 3:55 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Michal Kubecek, netdev
In-Reply-To: <38b6a690-e01d-471f-ce85-e6e6a8acd26d@gmail.com>
Eric Dumazet wrote on Sun, Apr 15, 2018:
> Are you sure you do not have some iptables/netfilter stuff ?
I have a basic firewall setup with default rules, e.g. it starts with
-m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
in the INPUT chain...
That said, I just dropped it on the server to check, and that seems to
work around the issue?!
When logging everything that gets dropped, it appears to decide at some
point that the connection is no longer established, but only if
tcp_timestamp is enabled; just, err, how?
And sure enough, if I restore the firewall while a connection is up,
that connection just hangs; at some point conntrack no longer considers
it connected (but it worked for a while!)
Here's the kind of logs I get from iptables:
IN=wlp1s0 OUT= MAC=00:c2:c6:b4:7e:c7:a4:12:42:b5:5d:fc:08:00 SRC=client DST=server LEN=52 TOS=0x00 PREC=0x00 TTL=52 ID=17038 DF PROTO=TCP SPT=41558 DPT=15609 WINDOW=1212 RES=0x00 ACK URGP=0
> ss -temoi might give us more info
hang
ESTAB 0 81406 server:15609 client:41558 users:(("socat",pid=17818,fd=5)) timer:(on,48sec,11) uid:1000 ino:137253 sk:6a <->
skmem:(r0,rb369280,t0,tb147456,f2050,w104446,o0,bl0,d1) ts sack
reno wscale:7,7 rto:15168 backoff:6 rtt:36.829/6.492 ato:40
mss:1374 pmtu:1500 rcvmss:1248 advmss:1448 cwnd:1 ssthresh:16
bytes_acked:32004 bytes_received:4189 segs_out:84 segs_in:55
data_segs_out:77 data_segs_in:18 send 298.5Kbps lastsnd:12483
lastrcv:27801 lastack:27726 pacing_rate 19.1Mbps delivery_rate
4.1Mbps busy:28492ms unacked:31 retrans:1/6 lost:31 rcv_rtt:29
rcv_space:29200 rcv_ssthresh:39184 notsent:38812 minrtt:25.152
working (tcp_timestamp=0)
ESTAB 0 36 server:15080 client:32979 users:(("socat",pid=17047,fd=5)) timer:(on,226ms,0) uid:1000 ino:90917 sk:23 <->
skmem:(r0,rb369280,t0,tb1170432,f1792,w2304,o0,bl0,d3) sack reno
wscale:7,7 rto:230 rtt:29.413/5.345 ato:64 mss:1386 pmtu:1500
rcvmss:1248 advmss:1460 cwnd:4 ssthresh:3 bytes_acked:17391762
bytes_received:62397 segs_out:13964 segs_in:8642
data_segs_out:13895 data_segs_in:1494 send 1.5Mbps lastsnd:4
lastrcv:5 lastack:5 pacing_rate 1.8Mbps delivery_rate 1.2Mbps
busy:56718ms unacked:1 retrans:0/11 rcv_rtt:9112.95 rcv_space:29233
rcv_ssthresh:41680 minrtt:25.95
working (no iptables)
ESTAB 0 0 server:61460 client:20468 users:(("socat",pid=17880,fd=5)) uid:1000 ino:129982 sk:6f <->
skmem:(r0,rb369280,t0,tb1852416,f0,w0,o0,bl0,d1) ts sack reno
wscale:7,7 rto:244 rtt:43.752/7.726 ato:40 mss:1374 pmtu:1500
rcvmss:1248 advmss:1448 cwnd:10 bytes_acked:2617302
bytes_received:5441 segs_out:1929 segs_in:976 data_segs_out:1919
data_segs_in:41 send 2.5Mbps lastsnd:2734 lastrcv:2734 lastack:2705
pacing_rate 5.0Mbps delivery_rate 12.7Mbps busy:1884ms rcv_rtt:30
rcv_space:29200 rcv_ssthresh:39184 minrtt:26.156
> Really it looks like at some point, all incoming packets are shown by
> tcpdump but do not reach the TCP socket anymore.
>
> (segs_in: might be steady, look at the d0 counter shown by ss -temoi
> (dX : drop counters, sk->sk_drops)
segs_in does not increase with replays; the d1 seems stable.
> While running your experiment, try on the server.
>
> perf record -a -g -e skb:kfree_skb sleep 30
> perf report
While I understand what that should do, I am not sure why I do not get
any graph so that doesn't help tell what called kfree_skb and thus what
decided to drop the packet (although we no longer really need that
now..)
perf script just shows kfree_skb e.g.
swapper 0 [001] 237244.869321: skb:kfree_skb: skbaddr=0xffff8800360fda00 protocol=2048 location=0xffffffff817a1a77
9458e3 kfree_skb (/usr/lib/debug/lib/modules/4.16.0-300.fc28.x86_64/vmlinux)
---
So I guess the ultimate question is why conntrack suddenly decides
that an established connection isn't established anymore, despite ss
listing it as established..
I'm discovering `conntrack(8)`, but what strikes me as interesting is
that even that points at the connection being established (looking at a
new connection after iptables started dropping packets)
# conntrack -L | grep 21308
tcp 6 267 ESTABLISHED src=server dst=client sport=21308 dport=37552 src=client dst=server sport=37552 dport=21308 [ASSURED] mark=0 use=1
compared to another that isn't dropped (the old connection without
tcp_timestamp)
tcp 6 299 ESTABLISHED src=server dst=client sport=15080 dport=32979 src=client dst=server sport=32979 dport=15080 [ASSURED] mark=0 use=1
The expect/dying/unconfirmed tables all are empty.
. . . Oh, there is something interesting there, the connection doesn't
come up with -G?
working:
conntrack -G --protonum tcp --src server --dst client --sport 15080 --dport 32979
tcp 6 299 ESTABLISHED src=server dst=client sport=15080 dport=32979 src=client dst=server sport=32979 dport=15080 [ASSURED] mark=0 use=3
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
hang:
# conntrack -G --protonum tcp --src server --dst client --sport 21308 --dport 37552
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.
So something happened that makes it show up in -L (table dump) but not
when querying...?
And only when there is enough traffic: I have previously kept such a
connection open for hours without the workaround just fine, as long as
I made sure not to display more than a screen at a time.
Thanks again,
--
Dominique Martinet | Asmadeus
^ permalink raw reply
* Re:Re: [PATCH net] net: Fix one possible memleak in ip_setup_cork
From: Gao Feng @ 2018-04-16 3:05 UTC (permalink / raw)
To: davem@davemloft.net; +Cc: kuznet, netdev@vger.kernel.org
In-Reply-To: <20180415.225556.1579770058342079388.davem@davemloft.net>
At 2018-04-16 10:55:56, "David Miller" <davem@davemloft.net> wrote:
>From: gfree.wind@vip.163.com
>Date: Mon, 16 Apr 2018 10:16:45 +0800
>
>> From: Gao Feng <gfree.wind@vip.163.com>
>>
>> It would allocate memory in this function when the cork->opt is NULL. But
>> the memory isn't freed if failed in the latter rt check, and return error
>> directly. It causes the memleak if its caller is ip_make_skb which also
>> doesn't free the cork->opt when meet a error.
>>
>> Now move the rt check ahead to avoid the memleak.
>>
>> Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
>
>Why did you post this patch twice?
Sorry, that was my input error. I typed "yes" instead of "all" the first time I ran git-send-email.
Then I corrected it the second time.
Best Regards
Feng
^ permalink raw reply
* Re: [PATCH net] net: Fix one possible memleak in ip_setup_cork
From: David Miller @ 2018-04-16 2:55 UTC (permalink / raw)
To: gfree.wind; +Cc: kuznet, netdev
In-Reply-To: <1523845005-6353-2-git-send-email-gfree.wind@vip.163.com>
From: gfree.wind@vip.163.com
Date: Mon, 16 Apr 2018 10:16:45 +0800
> From: Gao Feng <gfree.wind@vip.163.com>
>
> It would allocate memory in this function when the cork->opt is NULL. But
> the memory isn't freed if failed in the latter rt check, and return error
> directly. It causes the memleak if its caller is ip_make_skb which also
> doesn't free the cork->opt when meet a error.
>
> Now move the rt check ahead to avoid the memleak.
>
> Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Why did you post this patch twice?
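The ordering fix in the quoted patch is to do the rt check before allocating cork->opt, so the error path has nothing to leak. It can be modelled in userspace. The names, sizes and error codes below are illustrative and do not come from net/ipv4/ip_output.c. The model assumes, as the patch description states, that the caller does not free cork->opt when an error is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the allocation/validation ordering. */
struct cork_model {
	void *opt;
};

static int live_allocs;  /* allocations made and not yet freed */

static void *model_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_allocs++;
	return p;
}

/* Before: allocate, then validate. On failure the allocation is left in
 * cork->opt, and a caller that ignores cork on error leaks it. */
static int setup_cork_before(struct cork_model *cork, bool rt_ok)
{
	assert(cork);
	if (!cork->opt) {
		cork->opt = model_alloc();
		if (!cork->opt)
			return -ENOBUFS;
	}
	if (!rt_ok)
		return -EFAULT;
	return 0;
}

/* After: validate first, so the error path allocates nothing. */
static int setup_cork_after(struct cork_model *cork, bool rt_ok)
{
	assert(cork);
	if (!rt_ok)
		return -EFAULT;
	if (!cork->opt) {
		cork->opt = model_alloc();
		if (!cork->opt)
			return -ENOBUFS;
	}
	return 0;
}
```

Validating before allocating is generally the simplest way to keep error paths leak-free when callers do not clean up.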
^ permalink raw reply
* [PATCH] net: mediatek: use of_device_get_match_data()
From: Ryder Lee @ 2018-04-16 2:33 UTC (permalink / raw)
To: David S. Miller
Cc: Sean Wang, netdev, linux-kernel, linux-arm-kernel, linux-mediatek,
Ryder Lee
In-Reply-To: <31f944ab8dfcc1d7b6f03b35657a2a34825b5246.1523347340.git.ryder.lee@mediatek.com>
Using of_device_get_match_data() reduces the code size a bit.
Also, the only way to call mtk_probe() is to match an entry in
of_mtk_match[], so match cannot be NULL.
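The argument rests on how a match-table lookup behaves. The lookup returns the matching entry's data pointer, or NULL when nothing matches. If probe is only ever reached through a table match, the NULL case cannot occur there. The sketch below is a hypothetical userspace model of that lookup. The compatible strings, structs and get_match_data() are illustrative stand-ins, not the kernel's struct of_device_id or of_device_get_match_data().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an OF match table. */
struct match_entry {
	const char *compatible;
	const void *data;
};

struct soc_data_model {
	int caps;
};

static const struct soc_data_model soc_a = { .caps = 1 };
static const struct soc_data_model soc_b = { .caps = 2 };

static const struct match_entry match_table[] = {
	{ "vendor,soc-a-eth", &soc_a },  /* hypothetical compatibles */
	{ "vendor,soc-b-eth", &soc_b },
	{ NULL, NULL }                   /* sentinel */
};

/* Return the data of the entry matching `compatible`, or NULL. */
static const void *get_match_data(const char *compatible)
{
	assert(compatible);
	for (const struct match_entry *m = match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```

Returning the data pointer directly removes both the intermediate match variable and the cast, which is where the code-size saving comes from.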
Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index e0b72bf..d8ebf0a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2503,7 +2503,6 @@ static int mtk_probe(struct platform_device *pdev)
{
struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
struct device_node *mac_np;
- const struct of_device_id *match;
struct mtk_eth *eth;
int err;
int i;
@@ -2512,8 +2511,7 @@ static int mtk_probe(struct platform_device *pdev)
if (!eth)
return -ENOMEM;
- match = of_match_device(of_mtk_match, &pdev->dev);
- eth->soc = (struct mtk_soc_data *)match->data;
+ eth->soc = of_device_get_match_data(&pdev->dev);
eth->dev = &pdev->dev;
eth->base = devm_ioremap_resource(&pdev->dev, res);
--
1.9.1
^ permalink raw reply related
* linux-next: build failure after merge of the bpf tree
From: Stephen Rothwell @ 2018-04-16 2:30 UTC (permalink / raw)
To: Daniel Borkmann, Alexei Starovoitov, Networking
Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
John Fastabend
[-- Attachment #1: Type: text/plain, Size: 2927 bytes --]
Hi all,
After merging the bpf tree, today's linux-next build (arm
multi_v7_defconfig) failed like this:
kernel/bpf/core.o: In function `sock_map_release':
core.c:(.text+0xd04): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
kernel/events/core.o: In function `sock_map_release':
core.c:(.text+0x85cc): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
block/blk-core.o: In function `sock_map_release':
blk-core.c:(.text+0x58e8): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
drivers/net/virtio_net.o: In function `sock_map_release':
virtio_net.c:(.text+0x53ec): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/dev.o: In function `sock_map_release':
dev.c:(.text+0x6c68): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/rtnetlink.o: In function `sock_map_release':
rtnetlink.c:(.text+0x63e0): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/filter.o: In function `sock_map_release':
filter.c:(.text+0x8c8c): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/core/sock_reuseport.o: In function `sock_map_release':
sock_reuseport.c:(.text+0x398): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/bpf/test_run.o: In function `sock_map_release':
test_run.c:(.text+0x3dc): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
net/packet/af_packet.o: In function `sock_map_release':
af_packet.c:(.text+0x6958): multiple definition of `sock_map_release'
kernel/sysctl.o:sysctl.c:(.text+0x1a50): first defined here
Caused by commit
9b2e8bbc4e7a ("bpf: sockmap, map_release does not hold refcnt for pinned maps")
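A stub with a function body in a header that lacks static inline emits an external definition in every object file that includes the header. That is what produces the "multiple definition" errors above. The sketch below shows the header-stub pattern in a single file. HAVE_SOCKMAP_MODEL, struct bpf_map_model, sock_map_release_model() and map_release_model() are all hypothetical stand-ins for the real config guard, struct bpf_map and sock_map_release().

```c
#include <assert.h>

struct bpf_map_model {
	int refcnt;
};

#ifdef HAVE_SOCKMAP_MODEL
/* Real implementation lives in exactly one .c file. */
void sock_map_release_model(struct bpf_map_model *map);
#else
/* static inline: each translation unit gets its own private copy, so
 * including this header from many .c files does not clash at link time. */
static inline void sock_map_release_model(struct bpf_map_model *map)
{
	(void)map;  /* nothing to release when the feature is compiled out */
}
#endif

/* Hypothetical caller that builds the same way in both configurations. */
static int map_release_model(struct bpf_map_model *map)
{
	assert(map);
	sock_map_release_model(map);
	return --map->refcnt;
}
```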
I applied the following patch for today:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 16 Apr 2018 12:27:24 +1000
Subject: [PATCH] fix for "bpf: sockmap, map_release does not hold refcnt for
pinned maps"
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
include/linux/bpf.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f46561de5154..3b6c2b66f414 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -660,7 +660,7 @@ static inline int sock_map_prog(struct bpf_map *map,
return -EOPNOTSUPP;
}
-void sock_map_release(struct bpf_map *map) {}
+static inline void sock_map_release(struct bpf_map *map) {}
#endif
/* verifier prototypes for helper functions called from eBPF programs */
--
2.16.3
--
Cheers,
Stephen Rothwell
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply related