Linux-HyperV List
 help / color / mirror / Atom feed
* Re: [RFC PATCH v2 04/12] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Juergen Gross @ 2019-05-31 11:48 UTC (permalink / raw)
  To: Nadav Amit, Peter Zijlstra, Andy Lutomirski
  Cc: Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner, x86,
	linux-kernel, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin, Paolo Bonzini, Dave Hansen, Boris Ostrovsky,
	linux-hyperv, virtualization, kvm, xen-devel
In-Reply-To: <20190531063645.4697-5-namit@vmware.com>

On 31/05/2019 08:36, Nadav Amit wrote:
> To improve TLB shootdown performance, flush the remote and local TLBs
> concurrently. Introduce flush_tlb_multi() that does so. The current
> flush_tlb_others() interface is kept, since paravirtual interfaces need
> to be adapted first before it can be removed. This is left for future
> work. In such PV environments, TLB flushes are not performed, at this
> time, concurrently.
> 
> Add a static key to tell whether this new interface is supported.
> 
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: linux-hyperv@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: kvm@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  arch/x86/hyperv/mmu.c                 |  2 +
>  arch/x86/include/asm/paravirt.h       |  8 +++
>  arch/x86/include/asm/paravirt_types.h |  6 ++
>  arch/x86/include/asm/tlbflush.h       |  6 ++
>  arch/x86/kernel/kvm.c                 |  1 +
>  arch/x86/kernel/paravirt.c            |  3 +
>  arch/x86/mm/tlb.c                     | 80 +++++++++++++++++++++++----
>  arch/x86/xen/mmu_pv.c                 |  2 +
>  8 files changed, 96 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e65d7fe6489f..ca28b400c87c 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -233,4 +233,6 @@ void hyperv_setup_mmu_ops(void)
>  	pr_info("Using hypercall for remote TLB flush\n");
>  	pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
>  	pv_ops.mmu.tlb_remove_table = tlb_remove_table;
> +
> +	static_key_disable(&flush_tlb_multi_enabled.key);
>  }
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index c25c38a05c1c..192be7254457 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -47,6 +47,8 @@ static inline void slow_down_io(void)
>  #endif
>  }
>  
> +DECLARE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);
> +
>  static inline void __flush_tlb(void)
>  {
>  	PVOP_VCALL0(mmu.flush_tlb_user);
> @@ -62,6 +64,12 @@ static inline void __flush_tlb_one_user(unsigned long addr)
>  	PVOP_VCALL1(mmu.flush_tlb_one_user, addr);
>  }
>  
> +static inline void flush_tlb_multi(const struct cpumask *cpumask,
> +				   const struct flush_tlb_info *info)
> +{
> +	PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info);
> +}
> +
>  static inline void flush_tlb_others(const struct cpumask *cpumask,
>  				    const struct flush_tlb_info *info)
>  {
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 946f8f1f1efc..3a156e63c57d 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -211,6 +211,12 @@ struct pv_mmu_ops {
>  	void (*flush_tlb_user)(void);
>  	void (*flush_tlb_kernel)(void);
>  	void (*flush_tlb_one_user)(unsigned long addr);
> +	/*
> +	 * flush_tlb_multi() is the preferred interface. When it is used,
> +	 * flush_tlb_others() should return false.

Didn't you want to remove/change this comment?


Juergen

^ permalink raw reply

* Re: [RFC PATCH v2 04/12] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-05-31 19:44 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Peter Zijlstra, Andy Lutomirski, Borislav Petkov, Dave Hansen,
	Ingo Molnar, Thomas Gleixner, the arch/x86 maintainers, LKML,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Paolo Bonzini, Dave Hansen, Boris Ostrovsky,
	linux-hyperv@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	xen-devel@lists.xenproject.org
In-Reply-To: <a847ee9c-4faf-c8b4-43bb-cc30e0980796@suse.com>

> On May 31, 2019, at 4:48 AM, Juergen Gross <jgross@suse.com> wrote:
> 
> On 31/05/2019 08:36, Nadav Amit wrote:
>> 
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -211,6 +211,12 @@ struct pv_mmu_ops {
>> 	void (*flush_tlb_user)(void);
>> 	void (*flush_tlb_kernel)(void);
>> 	void (*flush_tlb_one_user)(unsigned long addr);
>> +	/*
>> +	 * flush_tlb_multi() is the preferred interface. When it is used,
>> +	 * flush_tlb_others() should return false.
> 
> Didn't you want to remove/change this comment?

Yes! Sorry for that. Fixed now.

^ permalink raw reply

* RE: [PATCH] PCI: hv: Fix build error without CONFIG_SYSFS
From: Michael Kelley @ 2019-06-01 22:59 UTC (permalink / raw)
  To: YueHaibing, bhelgaas@google.com, Stephen Hemminger,
	sashal@kernel.org, Dexuan Cui, linux-hyperv@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
In-Reply-To: <20190531150923.12376-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>  Sent: Friday, May 31, 2019 8:09 AM
> 
> while building without CONFIG_SYSFS, fails as below:
> 
> drivers/pci/controller/pci-hyperv.o: In function 'hv_pci_assign_slots':
> pci-hyperv.c:(.text+0x40a): undefined reference to 'pci_create_slot'
> drivers/pci/controller/pci-hyperv.o: In function 'pci_devices_present_work':
> pci-hyperv.c:(.text+0xc02): undefined reference to 'pci_destroy_slot'
> drivers/pci/controller/pci-hyperv.o: In function 'hv_pci_remove':
> pci-hyperv.c:(.text+0xe50): undefined reference to 'pci_destroy_slot'
> drivers/pci/controller/pci-hyperv.o: In function 'hv_eject_device_work':
> pci-hyperv.c:(.text+0x11f9): undefined reference to 'pci_destroy_slot'
> 
> Select SYSFS while PCI_HYPERV is set to fix this.
> 

I'm wondering if is the right way to fix the problem.  Conceptually
is it possible to setup & operate virtual PCI devices like 
pci-hyperv.c does, even if sysfs is not present?  Or is it right to
always required sysfs?

The function pci_dev_assign_slot() in slot.c has a null implementation
in include/linux/pci.h when CONFIG_SYSFS is not defined, which
seems to be trying to solve the same problem for that function.  And
if CONFIG_HOTPLUG_PCI is defined but CONFIG_SYSFS is not,
pci_hp_create_module_link() and pci_hp_remove_module_link()
look like they would have the same problem.  Maybe there should
be degenerate implementations of pci_create_slot() and
pci_destroy_slot() for cases when CONFIG_SYSFS is not defined?

But I'll admit I don't know the full story behind how PCI slots
are represented and used, so maybe I'm off base.  I just noticed
the inconsistency in how other functions in slot.c are handled.

Thoughts?

Michael

^ permalink raw reply

* [PATCH] revert async probing of VMBus network devices.
From: Stephen Hemminger @ 2019-06-05 18:51 UTC (permalink / raw)
  To: netdev; +Cc: linux-hyperv, Stephen Hemminger

Doing asynchronous probing can lead to reordered network device names.
And because udev doesn't have any useful information to construct a
persistent name, this causes VM's to sporadically boot with reordered
device names and no connectivity.

This shows up on the Ubuntu image on larger VM's where 30% of the
time eth0 and eth1 get swapped.

Note: udev MAC address policy is disabled on Azure images
because the netvsc and PCI VF will have the same mac address.

Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 06393b215102..1a2c32111106 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2411,9 +2411,6 @@ static struct  hv_driver netvsc_drv = {
 	.id_table = id_table,
 	.probe = netvsc_probe,
 	.remove = netvsc_remove,
-	.driver = {
-		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
-	},
 };
 
 /*
-- 
2.20.1


^ permalink raw reply related

* [PATCH] revert async probing of VMBus scsi device
From: Stephen Hemminger @ 2019-06-05 18:52 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-hyperv, Stephen Hemminger

Doing asynchronous probing can lead to reordered device names
which is leads to failed mounts.

Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/scsi/storvsc_drv.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 8472de1007ff..56dcaa43b652 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1942,9 +1942,6 @@ static struct hv_driver storvsc_drv = {
 	.id_table = id_table,
 	.probe = storvsc_probe,
 	.remove = storvsc_remove,
-	.driver = {
-		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
-	},
 };
 
 #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
-- 
2.20.1


^ permalink raw reply related

* RE: [PATCH] revert async probing of VMBus network devices.
From: Haiyang Zhang @ 2019-06-05 18:54 UTC (permalink / raw)
  To: Stephen Hemminger, netdev@vger.kernel.org
  Cc: linux-hyperv@vger.kernel.org, Stephen Hemminger
In-Reply-To: <20190605185114.12456-1-sthemmin@microsoft.com>



> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-
> owner@vger.kernel.org> On Behalf Of Stephen Hemminger
> Sent: Wednesday, June 5, 2019 2:51 PM
> To: netdev@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org; Stephen Hemminger
> <sthemmin@microsoft.com>
> Subject: [PATCH] revert async probing of VMBus network devices.
> 
> Doing asynchronous probing can lead to reordered network device names.
> And because udev doesn't have any useful information to construct a
> persistent name, this causes VM's to sporadically boot with reordered device
> names and no connectivity.
> 
> This shows up on the Ubuntu image on larger VM's where 30% of the time
> eth0 and eth1 get swapped.
> 
> Note: udev MAC address policy is disabled on Azure images because the
> netvsc and PCI VF will have the same mac address.
> 
> Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv
> drivers")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

^ permalink raw reply

* RE: [PATCH] revert async probing of VMBus scsi device
From: Haiyang Zhang @ 2019-06-05 18:54 UTC (permalink / raw)
  To: Stephen Hemminger, linux-scsi@vger.kernel.org
  Cc: linux-hyperv@vger.kernel.org, Stephen Hemminger
In-Reply-To: <20190605185205.12583-1-sthemmin@microsoft.com>



> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-
> owner@vger.kernel.org> On Behalf Of Stephen Hemminger
> Sent: Wednesday, June 5, 2019 2:52 PM
> To: linux-scsi@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org; Stephen Hemminger
> <sthemmin@microsoft.com>
> Subject: [PATCH] revert async probing of VMBus scsi device
> 
> Doing asynchronous probing can lead to reordered device names which is
> leads to failed mounts.
> 
> Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv
> drivers")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> ---

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

^ permalink raw reply

* Re: [PATCH] revert async probing of VMBus scsi device
From: Christoph Hellwig @ 2019-06-05 18:56 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-scsi, linux-hyperv, Stephen Hemminger
In-Reply-To: <20190605185205.12583-1-sthemmin@microsoft.com>

On Wed, Jun 05, 2019 at 11:52:05AM -0700, Stephen Hemminger wrote:
> Doing asynchronous probing can lead to reordered device names
> which is leads to failed mounts.

Which is true for every device, and why we use UUIDs or label for
mounts that are supposed to be stable.

^ permalink raw reply

* Re: [PATCH] revert async probing of VMBus scsi device
From: Stephen Hemminger @ 2019-06-05 19:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, linux-hyperv, Stephen Hemminger
In-Reply-To: <20190605185637.GA31439@infradead.org>

On Wed, 5 Jun 2019 11:56:37 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Jun 05, 2019 at 11:52:05AM -0700, Stephen Hemminger wrote:
> > Doing asynchronous probing can lead to reordered device names
> > which is leads to failed mounts.  
> 
> Which is true for every device, and why we use UUIDs or label for
> mounts that are supposed to be stable.

Not everyone is smart enough to do that.

^ permalink raw reply

* Re: [PATCH] revert async probing of VMBus scsi device
From: Christoph Hellwig @ 2019-06-05 19:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Christoph Hellwig, linux-scsi, linux-hyperv, Stephen Hemminger
In-Reply-To: <20190605120640.00358689@hermes.lan>

On Wed, Jun 05, 2019 at 12:06:40PM -0700, Stephen Hemminger wrote:
> > Which is true for every device, and why we use UUIDs or label for
> > mounts that are supposed to be stable.
> 
> Not everyone is smart enough to do that.

Sure.  But they should not get a way out for just one specific driver.

^ permalink raw reply

* [PATCH 04/13] mmc: also set max_segment_size in the device
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

If we only set the max_segment_size on the queue an IOMMU merge might
create bigger segments again, so limit the IOMMU merges as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/mmc/core/queue.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index b5b9c6142f08..92900a095796 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -377,6 +377,8 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
 	blk_queue_max_segment_size(mq->queue,
 			round_down(host->max_seg_size, block_size));
 
+	dma_set_max_seg_size(mmc_dev(host), queue_max_segment_size(mq->queue));
+
 	INIT_WORK(&mq->recovery_work, mmc_mq_recovery_handler);
 	INIT_WORK(&mq->complete_work, mmc_blk_mq_complete_work);
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH 03/13] mtip32xx: also set max_segment_size in the device
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

If we only set the max_segment_size on the queue an IOMMU merge might
create bigger segments again, so limit the IOMMU merges as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/mtip32xx/mtip32xx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index bacfdac7161c..a14b09ab3a41 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3676,6 +3676,7 @@ static int mtip_block_initialize(struct driver_data *dd)
 	blk_queue_physical_block_size(dd->queue, 4096);
 	blk_queue_max_hw_sectors(dd->queue, 0xffff);
 	blk_queue_max_segment_size(dd->queue, 0x400000);
+	dma_set_max_seg_size(&dd->pdev->dev, 0x400000);
 	blk_queue_io_min(dd->queue, 4096);
 
 	/* Set the capacity of the device in 512 byte sectors. */
-- 
2.20.1


^ permalink raw reply related

* [PATCH 07/13] storvsc: set virt_boundary_mask in the scsi host template
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/storvsc_drv.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 8472de1007ff..e61051c026f6 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1434,9 +1434,6 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
 {
 	blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
 
-	/* Ensure there are no gaps in presented sgls */
-	blk_queue_virt_boundary(sdevice->request_queue, PAGE_SIZE - 1);
-
 	sdevice->no_write_same = 1;
 
 	/*
@@ -1709,6 +1706,8 @@ static struct scsi_host_template scsi_driver = {
 	.this_id =		-1,
 	/* Make sure we dont get a sg segment crosses a page boundary */
 	.dma_boundary =		PAGE_SIZE-1,
+	/* Ensure there are no gaps in presented sgls */
+	.virt_boundary_mask =	PAGE_SIZE-1,
 	.no_write_same =	1,
 	.track_queue_depth =	1,
 };
-- 
2.20.1


^ permalink raw reply related

* [PATCH 02/13] rsxx: don't call dma_set_max_seg_size
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This driver does never uses dma_map_sg, so the setting is rather
pointless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/rsxx/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/block/rsxx/core.c b/drivers/block/rsxx/core.c
index de9b2d2f8654..76b73ddf8fd7 100644
--- a/drivers/block/rsxx/core.c
+++ b/drivers/block/rsxx/core.c
@@ -767,7 +767,6 @@ static int rsxx_pci_probe(struct pci_dev *dev,
 		goto failed_enable;
 
 	pci_set_master(dev);
-	dma_set_max_seg_size(&dev->dev, RSXX_HW_BLK_SIZE);
 
 	st = dma_set_mask(&dev->dev, DMA_BIT_MASK(64));
 	if (st) {
-- 
2.20.1


^ permalink raw reply related

* [PATCH 06/13] ufshcd: set max_segment_size in the scsi host template
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

We need to also mirror the value to the device to ensure IOMMU merging
doesn't undo it, and the SCSI host level parameter will ensure that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/ufs/ufshcd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 8c1c551f2b42..4e524ade489e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4586,8 +4586,6 @@ static int ufshcd_slave_configure(struct scsi_device *sdev)
 	struct request_queue *q = sdev->request_queue;
 
 	blk_queue_update_dma_pad(q, PRDT_DATA_BYTE_COUNT_PAD - 1);
-	blk_queue_max_segment_size(q, PRDT_DATA_BYTE_COUNT_MAX);
-
 	return 0;
 }
 
@@ -6990,6 +6988,7 @@ static struct scsi_host_template ufshcd_driver_template = {
 	.sg_tablesize		= SG_ALL,
 	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
 	.can_queue		= UFSHCD_CAN_QUEUE,
+	.max_segment_size	= PRDT_DATA_BYTE_COUNT_MAX,
 	.max_host_blocked	= 1,
 	.track_queue_depth	= 1,
 	.sdev_groups		= ufshcd_driver_groups,
-- 
2.20.1


^ permalink raw reply related

* [PATCH 11/13] mpt3sas: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer.  Note that the effect is global, as the IOMMU merging
is based off a paramters in struct device.  We could still turn if off
if no PCIe devices are present, but I don't know how to find that out.

Also remove the bogus nomerges flag, merges do take the virt_boundary
into account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 1ccfbc7eebe0..03a0df2a3379 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2361,14 +2361,6 @@ scsih_slave_configure(struct scsi_device *sdev)
 		pcie_device_put(pcie_device);
 		spin_unlock_irqrestore(&ioc->pcie_device_lock, flags);
 		scsih_change_queue_depth(sdev, qdepth);
-		/* Enable QUEUE_FLAG_NOMERGES flag, so that IOs won't be
-		 ** merged and can eliminate holes created during merging
-		 ** operation.
-		 **/
-		blk_queue_flag_set(QUEUE_FLAG_NOMERGES,
-				sdev->request_queue);
-		blk_queue_virt_boundary(sdev->request_queue,
-				ioc->page_size - 1);
 		return 0;
 	}
 
@@ -10472,6 +10464,9 @@ _scsih_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	shost->transportt = mpt3sas_transport_template;
 	shost->unique_id = ioc->id;
 
+	/* XXX: only strictly needed if NVMe devices are attached */
+	shost->virt_boundary_mask = ioc->page_size - 1;
+
 	if (ioc->is_mcpu_endpoint) {
 		/* mCPU MPI support 64K max IO */
 		shost->max_sectors = 128;
-- 
2.20.1


^ permalink raw reply related

* [PATCH 05/13] scsi: add a host / host template field for the virt boundary
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This allows drivers setting it up easily instead of branching out to
block layer calls in slave_alloc, and ensures the upgraded
max_segment_size setting gets picked up by the DMA layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/hosts.c     | 3 +++
 drivers/scsi/scsi_lib.c  | 3 ++-
 include/scsi/scsi_host.h | 3 +++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index ff0d8c6a8d0c..55522b7162d3 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -462,6 +462,9 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 	else
 		shost->dma_boundary = 0xffffffff;
 
+	if (sht->virt_boundary_mask)
+		shost->virt_boundary_mask = sht->virt_boundary_mask;
+
 	device_initialize(&shost->shost_gendev);
 	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
 	shost->shost_gendev.bus = &scsi_bus_type;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 65d0a10c76ad..d333bb6b1c59 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1775,7 +1775,8 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 	dma_set_seg_boundary(dev, shost->dma_boundary);
 
 	blk_queue_max_segment_size(q, shost->max_segment_size);
-	dma_set_max_seg_size(dev, shost->max_segment_size);
+	blk_queue_virt_boundary(q, shost->virt_boundary_mask);
+	dma_set_max_seg_size(dev, queue_max_segment_size(q));
 
 	/*
 	 * Set a reasonable default alignment:  The larger of 32-byte (dword),
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index a5fcdad4a03e..cc139dbd71e5 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -369,6 +369,8 @@ struct scsi_host_template {
 	 */
 	unsigned long dma_boundary;
 
+	unsigned long virt_boundary_mask;
+
 	/*
 	 * This specifies "machine infinity" for host templates which don't
 	 * limit the transfer size.  Note this limit represents an absolute
@@ -587,6 +589,7 @@ struct Scsi_Host {
 	unsigned int max_sectors;
 	unsigned int max_segment_size;
 	unsigned long dma_boundary;
+	unsigned long virt_boundary_mask;
 	/*
 	 * In scsi-mq mode, the number of hardware queues supported by the LLD.
 	 *
-- 
2.20.1


^ permalink raw reply related

* [PATCH 13/13] uas: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/usb/storage/uas.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 047c5922618f..d20919e7bbf4 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -789,29 +789,9 @@ static int uas_slave_alloc(struct scsi_device *sdev)
 {
 	struct uas_dev_info *devinfo =
 		(struct uas_dev_info *)sdev->host->hostdata;
-	int maxp;
 
 	sdev->hostdata = devinfo;
 
-	/*
-	 * We have two requirements here. We must satisfy the requirements
-	 * of the physical HC and the demands of the protocol, as we
-	 * definitely want no additional memory allocation in this path
-	 * ruling out using bounce buffers.
-	 *
-	 * For a transmission on USB to continue we must never send
-	 * a package that is smaller than maxpacket. Hence the length of each
-         * scatterlist element except the last must be divisible by the
-         * Bulk maxpacket value.
-	 * If the HC does not ensure that through SG,
-	 * the upper layer must do that. We must assume nothing
-	 * about the capabilities off the HC, so we use the most
-	 * pessimistic requirement.
-	 */
-
-	maxp = usb_maxpacket(devinfo->udev, devinfo->data_in_pipe, 0);
-	blk_queue_virt_boundary(sdev->request_queue, maxp - 1);
-
 	/*
 	 * The protocol has no requirements on alignment in the strict sense.
 	 * Controllers may or may not have alignment restrictions.
@@ -1004,6 +984,22 @@ static int uas_probe(struct usb_interface *intf, const struct usb_device_id *id)
 	 */
 	shost->can_queue = devinfo->qdepth - 2;
 
+	/*
+	 * We have two requirements here. We must satisfy the requirements of
+	 * the physical HC and the demands of the protocol, as we definitely
+	 * want no additional memory allocation in this path ruling out using
+	 * bounce buffers.
+	 *
+	 * For a transmission on USB to continue we must never send a package
+	 * that is smaller than maxpacket.  Hence the length of each scatterlist
+	 * element except the last must be divisible by the Bulk maxpacket
+	 * value.  If the HC does not ensure that through SG, the upper layer
+	 * must do that.  We must assume nothing about the capabilities off the
+	 * HC, so we use the most pessimistic requirement.
+	 */
+	shost->virt_boundary_mask =
+		usb_maxpacket(udev, devinfo->data_in_pipe, 0) - 1;
+
 	usb_set_intfdata(intf, shost);
 	result = scsi_add_host(shost, &intf->dev);
 	if (result)
-- 
2.20.1


^ permalink raw reply related

* [PATCH 12/13] usb-storage: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/usb/storage/scsiglue.c | 10 ----------
 drivers/usb/storage/usb.c      | 10 ++++++++++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 59190d88fa9f..02c3b66b3f78 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -65,7 +65,6 @@ static const char* host_info(struct Scsi_Host *host)
 static int slave_alloc (struct scsi_device *sdev)
 {
 	struct us_data *us = host_to_us(sdev->host);
-	int maxp;
 
 	/*
 	 * Set the INQUIRY transfer length to 36.  We don't use any of
@@ -74,15 +73,6 @@ static int slave_alloc (struct scsi_device *sdev)
 	 */
 	sdev->inquiry_len = 36;
 
-	/*
-	 * USB has unusual scatter-gather requirements: the length of each
-	 * scatterlist element except the last must be divisible by the
-	 * Bulk maxpacket value.  Fortunately this value is always a
-	 * power of 2.  Inform the block layer about this requirement.
-	 */
-	maxp = usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0);
-	blk_queue_virt_boundary(sdev->request_queue, maxp - 1);
-
 	/*
 	 * Some host controllers may have alignment requirements.
 	 * We'll play it safe by requiring 512-byte alignment always.
diff --git a/drivers/usb/storage/usb.c b/drivers/usb/storage/usb.c
index 9a79cd9762f3..b0f23f4f58e3 100644
--- a/drivers/usb/storage/usb.c
+++ b/drivers/usb/storage/usb.c
@@ -1050,6 +1050,16 @@ int usb_stor_probe2(struct us_data *us)
 	usb_autopm_get_interface_no_resume(us->pusb_intf);
 	snprintf(us->scsi_name, sizeof(us->scsi_name), "usb-storage %s",
 					dev_name(&us->pusb_intf->dev));
+
+	/*
+	 * USB has unusual scatter-gather requirements: the length of each
+	 * scatterlist element except the last must be divisible by the
+	 * Bulk maxpacket value.  Fortunately this value is always a
+	 * power of 2.  Inform the block layer about this requirement.
+	 */
+	us_to_host(us)->virt_boundary_mask =
+		usb_maxpacket(us->pusb_dev, us->recv_bulk_pipe, 0) - 1;
+
 	result = scsi_add_host(us_to_host(us), dev);
 	if (result) {
 		dev_warn(dev,
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH] revert async probing of VMBus scsi device
From: Stephen Hemminger @ 2019-06-05 19:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi, linux-hyperv, Stephen Hemminger
In-Reply-To: <20190605190722.GA19684@infradead.org>

On Wed, 5 Jun 2019 12:07:23 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Jun 05, 2019 at 12:06:40PM -0700, Stephen Hemminger wrote:
> > > Which is true for every device, and why we use UUIDs or label for
> > > mounts that are supposed to be stable.  
> > 
> > Not everyone is smart enough to do that.  
> 
> Sure.  But they should not get a way out for just one specific driver.

There are people running new kernels on 6 year old distributions.
Was every distribution smart enough then? If you think so, then
this not necessary.

^ permalink raw reply

* [PATCH 08/13] IB/iser: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c | 35 +++++-------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 9c185a8dabd3..841b66397a57 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -613,6 +613,7 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
 	struct Scsi_Host *shost;
 	struct iser_conn *iser_conn = NULL;
 	struct ib_conn *ib_conn;
+	struct ib_device *ib_dev;
 	u32 max_fr_sectors;
 
 	shost = iscsi_host_alloc(&iscsi_iser_sht, 0, 0);
@@ -643,16 +644,19 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
 		}
 
 		ib_conn = &iser_conn->ib_conn;
+		ib_dev = ib_conn->device->ib_device;
 		if (ib_conn->pi_support) {
-			u32 sig_caps = ib_conn->device->ib_device->attrs.sig_prot_cap;
+			u32 sig_caps = ib_dev->attrs.sig_prot_cap;
 
 			scsi_host_set_prot(shost, iser_dif_prot_caps(sig_caps));
 			scsi_host_set_guard(shost, SHOST_DIX_GUARD_IP |
 						   SHOST_DIX_GUARD_CRC);
 		}
 
-		if (iscsi_host_add(shost,
-				   ib_conn->device->ib_device->dev.parent)) {
+		if (!(ib_dev->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG))
+			shost->virt_boundary_mask = ~MASK_4K;
+
+		if (iscsi_host_add(shost, ib_dev->dev.parent)) {
 			mutex_unlock(&iser_conn->state_mutex);
 			goto free_host;
 		}
@@ -958,30 +962,6 @@ static umode_t iser_attr_is_visible(int param_type, int param)
 	return 0;
 }
 
-static int iscsi_iser_slave_alloc(struct scsi_device *sdev)
-{
-	struct iscsi_session *session;
-	struct iser_conn *iser_conn;
-	struct ib_device *ib_dev;
-
-	mutex_lock(&unbind_iser_conn_mutex);
-
-	session = starget_to_session(scsi_target(sdev))->dd_data;
-	iser_conn = session->leadconn->dd_data;
-	if (!iser_conn) {
-		mutex_unlock(&unbind_iser_conn_mutex);
-		return -ENOTCONN;
-	}
-	ib_dev = iser_conn->ib_conn.device->ib_device;
-
-	if (!(ib_dev->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG))
-		blk_queue_virt_boundary(sdev->request_queue, ~MASK_4K);
-
-	mutex_unlock(&unbind_iser_conn_mutex);
-
-	return 0;
-}
-
 static struct scsi_host_template iscsi_iser_sht = {
 	.module                 = THIS_MODULE,
 	.name                   = "iSCSI Initiator over iSER",
@@ -994,7 +974,6 @@ static struct scsi_host_template iscsi_iser_sht = {
 	.eh_device_reset_handler= iscsi_eh_device_reset,
 	.eh_target_reset_handler = iscsi_eh_recover_target,
 	.target_alloc		= iscsi_target_alloc,
-	.slave_alloc            = iscsi_iser_slave_alloc,
 	.proc_name              = "iscsi_iser",
 	.this_id                = -1,
 	.track_queue_depth	= 1,
-- 
2.20.1


^ permalink raw reply related

* [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer.  Note that the effect is global, as the IOMMU merging
is based off a paramters in struct device.  We could still turn if off
if no PCIe devices are present, but I don't know how to find that out.

Also remove the bogus nomerges flag, merges do take the virt_boundary
into account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/megaraid/megaraid_sas_base.c   | 46 +++++----------------
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  7 ++++
 2 files changed, 18 insertions(+), 35 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 3dd1df472dc6..20b3b3f8bc16 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1870,39 +1870,6 @@ void megasas_set_dynamic_target_properties(struct scsi_device *sdev,
 	}
 }
 
-/*
- * megasas_set_nvme_device_properties -
- * set nomerges=2
- * set virtual page boundary = 4K (current mr_nvme_pg_size is 4K).
- * set maximum io transfer = MDTS of NVME device provided by MR firmware.
- *
- * MR firmware provides value in KB. Caller of this function converts
- * kb into bytes.
- *
- * e.a MDTS=5 means 2^5 * nvme page size. (In case of 4K page size,
- * MR firmware provides value 128 as (32 * 4K) = 128K.
- *
- * @sdev:				scsi device
- * @max_io_size:				maximum io transfer size
- *
- */
-static inline void
-megasas_set_nvme_device_properties(struct scsi_device *sdev, u32 max_io_size)
-{
-	struct megasas_instance *instance;
-	u32 mr_nvme_pg_size;
-
-	instance = (struct megasas_instance *)sdev->host->hostdata;
-	mr_nvme_pg_size = max_t(u32, instance->nvme_page_size,
-				MR_DEFAULT_NVME_PAGE_SIZE);
-
-	blk_queue_max_hw_sectors(sdev->request_queue, (max_io_size / 512));
-
-	blk_queue_flag_set(QUEUE_FLAG_NOMERGES, sdev->request_queue);
-	blk_queue_virt_boundary(sdev->request_queue, mr_nvme_pg_size - 1);
-}
-
-
 /*
  * megasas_set_static_target_properties -
  * Device property set by driver are static and it is not required to be
@@ -1961,8 +1928,10 @@ static void megasas_set_static_target_properties(struct scsi_device *sdev,
 		max_io_size_kb = le32_to_cpu(instance->tgt_prop->max_io_size_kb);
 	}
 
-	if (instance->nvme_page_size && max_io_size_kb)
-		megasas_set_nvme_device_properties(sdev, (max_io_size_kb << 10));
+	if (instance->nvme_page_size && max_io_size_kb) {
+		blk_queue_max_hw_sectors(sdev->request_queue,
+				(max_io_size_kb << 10) / 512);
+	}
 
 	scsi_change_queue_depth(sdev, device_qd);
 
@@ -6258,6 +6227,7 @@ static int megasas_start_aen(struct megasas_instance *instance)
 static int megasas_io_attach(struct megasas_instance *instance)
 {
 	struct Scsi_Host *host = instance->host;
+	u32 nvme_page_size = instance->nvme_page_size;
 
 	/*
 	 * Export parameters required by SCSI mid-layer
@@ -6298,6 +6268,12 @@ static int megasas_io_attach(struct megasas_instance *instance)
 	host->max_lun = MEGASAS_MAX_LUN;
 	host->max_cmd_len = 16;
 
+	if (nvme_page_size) {
+		if (nvme_page_size > MR_DEFAULT_NVME_PAGE_SIZE)
+			nvme_page_size = MR_DEFAULT_NVME_PAGE_SIZE;
+		host->virt_boundary_mask = nvme_page_size - 1;
+	}
+
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 4dfa0685a86c..a9ff3a648e7b 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -1935,6 +1935,13 @@ megasas_is_prp_possible(struct megasas_instance *instance,
 			build_prp = true;
 	}
 
+/*
+ * XXX: All the code following should go away.  The block layer guarantees
+ * merging according to the virt boundary.  And while we might have had some
+ * issues with that in the past we fixed them, and any new bug should be fixed
+ * in the core code as well.
+ */
+
 /*
  * Below code detects gaps/holes in IO data buffers.
  * What does holes/gaps mean?
-- 
2.20.1


^ permalink raw reply related

* [PATCH 09/13] IB/srp: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

This ensures all proper DMA layer handling is taken care of by the
SCSI midlayer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index be9ddcad8f28..944fe8eee1ea 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3061,20 +3061,6 @@ static int srp_target_alloc(struct scsi_target *starget)
 	return 0;
 }
 
-static int srp_slave_alloc(struct scsi_device *sdev)
-{
-	struct Scsi_Host *shost = sdev->host;
-	struct srp_target_port *target = host_to_target(shost);
-	struct srp_device *srp_dev = target->srp_host->srp_dev;
-	struct ib_device *ibdev = srp_dev->dev;
-
-	if (!(ibdev->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG))
-		blk_queue_virt_boundary(sdev->request_queue,
-					~srp_dev->mr_page_mask);
-
-	return 0;
-}
-
 static int srp_slave_configure(struct scsi_device *sdev)
 {
 	struct Scsi_Host *shost = sdev->host;
@@ -3277,7 +3263,6 @@ static struct scsi_host_template srp_template = {
 	.name				= "InfiniBand SRP initiator",
 	.proc_name			= DRV_NAME,
 	.target_alloc			= srp_target_alloc,
-	.slave_alloc			= srp_slave_alloc,
 	.slave_configure		= srp_slave_configure,
 	.info				= srp_target_info,
 	.queuecommand			= srp_queuecommand,
@@ -3812,6 +3797,9 @@ static ssize_t srp_create_target(struct device *dev,
 	target_host->max_cmd_len = sizeof ((struct srp_cmd *) (void *) 0L)->cdb;
 	target_host->max_segment_size = ib_dma_max_seg_size(ibdev);
 
+	if (!(ibdev->attrs.device_cap_flags & IB_DEVICE_SG_GAPS_REG))
+		target_host->virt_boundary_mask = ~srp_dev->mr_page_mask;
+
 	target = host_to_target(target_host);
 
 	target->net		= kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
-- 
2.20.1


^ permalink raw reply related

* properly communicate queue limits to the DMA layer
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel

Hi Jens,

we've always had a bit of a problem communicating the block layer
queue limits to the DMA layer, which needs to respect them when
an IOMMU that could merge segments is used.  Unfortunately most
drivers don't get this right.  Oddly enough we've been mostly
getting away with it, although lately dma-debug has been catching
a few of those issues.

The segment merging fix for devices with PRP-like structures seems
to have escalated this a bit.  The first patch fixes the actual
report from Sebastian, while the rest fix every drivers that appears
to have the problem based on a code audit looking for drivers using
blk_queue_max_segment_size, blk_queue_segment_boundary or
blk_queue_virt_boundary and calling dma_map_sg eventually.  Note
that for SCSI drivers I've taken the blk_queue_virt_boundary setting
to the SCSI core, similar to how we did it for the other two settings
a while ago.  This also deals with the fact that the DMA layer
settings are on a per-device granularity, so the per-device settings
in a few SCSI drivers can't actually work in an IOMMU environment.

It would be nice to eventually pass these limits as arguments to
dma_map_sg, but that is a far too big series for the 5.2 merge
window.

^ permalink raw reply

* [PATCH 01/13] nvme-pci: don't limit DMA segement size
From: Christoph Hellwig @ 2019-06-05 19:08 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605190836.32354-1-hch@lst.de>

NVMe uses PRPs (or optionally unlimited SGLs) for data transfers and
has no specific limit for a single DMA segement.  Limiting the size
will cause problems because the block layer assumes PRP-ish devices
using a virt boundary mask don't have a segment limit.  And while this
is true, we also really need to tell the DMA mapping layer about it,
otherwise dma-debug will trip over it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Sebastian Ott <sebott@linux.ibm.com>
---
 drivers/nvme/host/pci.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index f562154551ce..524d6bd6d095 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2513,6 +2513,12 @@ static void nvme_reset_work(struct work_struct *work)
 	 */
 	dev->ctrl.max_hw_sectors = NVME_MAX_KB_SZ << 1;
 	dev->ctrl.max_segments = NVME_MAX_SEGS;
+
+	/*
+	 * Don't limit the IOMMU merged segment size.
+	 */
+	dma_set_max_seg_size(dev->dev, 0xffffffff);
+
 	mutex_unlock(&dev->shutdown_lock);
 
 	/*
-- 
2.20.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox