From mboxrd@z Thu Jan 1 00:00:00 1970
From: Laurence Oberman
Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
Date: Mon, 13 Jun 2016 21:56:27 -0400 (EDT)
Message-ID: <1964187258.42093298.1465869387551.JavaMail.zimbra@redhat.com>
References: <19156300.41876496.1465771227395.JavaMail.zimbra@redhat.com>
 <2d316ddf-9a2a-3aba-cf2d-fcdaafbaa848@sandisk.com>
 <20160613140747.GL5408@leon.nu>
 <946373818.41993264.1465827597452.JavaMail.zimbra@redhat.com>
 <887623939.42004497.1465831339845.JavaMail.zimbra@redhat.com>
 <450384210.42057823.1465857004662.JavaMail.zimbra@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <450384210.42057823.1465857004662.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: Bart Van Assche , Yishai Hadas , linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-rdma@vger.kernel.org

----- Original Message -----
> From: "Laurence Oberman"
> To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: "Bart Van Assche" , "Yishai Hadas" , linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Monday, June 13, 2016 6:30:04 PM
> Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
>
> ----- Original Message -----
> > From: "Laurence Oberman"
> > To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> > Cc: "Bart Van Assche" , "Yishai Hadas" , linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Monday, June 13, 2016 11:22:19 AM
> > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
> >
> > ----- Original Message -----
> > > From: "Laurence Oberman"
> > > To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> > > Cc: "Bart Van Assche" , "Yishai Hadas" , linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > Sent: Monday, June 13, 2016 10:19:57 AM
> > > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
> > >
> > > ----- Original Message -----
> > > > From: "Leon Romanovsky"
> > > > To: "Bart Van Assche"
> > > > Cc: "Yishai Hadas" , "Laurence Oberman" , linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > Sent: Monday, June 13, 2016 10:07:47 AM
> > > > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
> > > >
> > > > On Sun, Jun 12, 2016 at 11:32:53PM -0700, Bart Van Assche wrote:
> > > > > On 06/12/2016 03:40 PM, Laurence Oberman wrote:
> > > > > > Jun 8 10:12:52 jumpclient kernel: mlx5_core 0000:08:00.1: swiotlb buffer is full (sz: 266240 bytes)
> > > > > > Jun 8 10:12:52 jumpclient kernel: swiotlb: coherent allocation failed for device 0000:08:00.1 size=266240
> > > > >
> > > > > Hello,
> > > > >
> > > > > I think the above means that the coherent memory allocation succeeded
> > > > > but that the test dev_addr + size - 1 <= DMA_BIT_MASK(32) failed. Can
> > > > > someone from Mellanox tell us whether or not it would be safe to set
> > > > > coherent_dma_mask to DMA_BIT_MASK(64) for the mlx4 and mlx5 drivers?
> > > >
> > > > Bart and Laurence,
> > > > We are actually doing it for the mlx5 driver.
> > > >
> > > > 926 static int mlx5_pci_init(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
> > > > <...>
> > > > 961         err = set_dma_caps(pdev);
> > > >
> > > > 187 static int set_dma_caps(struct pci_dev *pdev)
> > > > <...>
> > > > 201         err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
> > > > 202         if (err) {
> > > > 203                 dev_warn(&pdev->dev,
> > > > 204                          "Warning: couldn't set 64-bit consistent PCI DMA mask\n");
> > > > 205                 err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
> > > > 206                 if (err) {
> > > > 207                         dev_err(&pdev->dev,
> > > > 208                                 "Can't set consistent PCI DMA mask, aborting\n");
> > > > 209                         return err;
> > > > 210                 }
> > > > 211         }
> > > >
> > > > 118 static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
> > > > 119 {
> > > > 120         return dma_set_coherent_mask(&dev->dev, mask);
> > > > 121 }
> > > >
> > > > > Thanks,
> > > > >
> > > > > Bart.
> > >
> > > Hi Leon,
> > >
> > > OK, I see it now:
> > >
> > > static int set_dma_caps(struct pci_dev *pdev)
> > > {
> > >         int err;
> > >
> > >         err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
> > >         if (err) {
> > >
> > > Thanks
> > > Laurence
> >
> > Replying to my own email.
> >
> > Leon, what is the implication of the mapping failure?
> > It only shows up in the reconnect path when I am restarting controllers. With
> > the messaging and stack dump masked I still see the failure, but it seems
> > transparent in that all the paths come back.
> >
> > [ 1595.167812] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1595.379133] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1595.460627] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1598.121096] scsi host1: reconnect attempt 3 failed (-48)
> > [ 1608.187869] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1615.911705] scsi host1: reconnect attempt 4 failed (-12)
> > [ 1641.446017] scsi host1: ib_srp: Got failed path rec status -110
> > [ 1641.482947] scsi host1: ib_srp: Path record query failed
> > [ 1641.513454] scsi host1: reconnect attempt 5 failed (-110)
> > [ 1662.330883] scsi host1: ib_srp: Got failed path rec status -110
> > [ 1662.361224] scsi host1: ib_srp: Path record query failed
> > [ 1662.390768] scsi host1: reconnect attempt 6 failed (-110)
> > [ 1683.892311] scsi host1: ib_srp: Got failed path rec status -110
> > [ 1683.922653] scsi host1: ib_srp: Path record query failed
> > [ 1683.952717] scsi host1: reconnect attempt 7 failed (-110)
> >
> > SM port is up
> >
> > Entering MASTER state
> >
> > [ 1705.254048] scsi host1: REJ reason 0x8
> > [ 1705.274869] scsi host1: reconnect attempt 8 failed (-104)
> > [ 1723.264914] scsi host1: REJ reason 0x8
> > [ 1723.285193] scsi host1: reconnect attempt 9 failed (-104)
> > [ 1743.658091] scsi host1: REJ reason 0x8
> > [ 1743.678562] scsi host1: reconnect attempt 10 failed (-104)
> > [ 1761.911512] scsi host1: REJ reason 0x8
> > [ 1761.932006] scsi host1: reconnect attempt 11 failed (-104)
> > [ 1782.209020] scsi host1: ib_srp: reconnect succeeded
>
> Hi Leon,
>
> The calling relationship looks like this.
>
> swiotlb_alloc_coherent()
>         u64 dma_mask = DMA_BIT_MASK(32);
>
> This may get overwritten here, I assume with a 64-bit mask, right?
>
>         if (hwdev && hwdev->coherent_dma_mask)
>                 dma_mask = hwdev->coherent_dma_mask;    ***** Is this now DMA_BIT_MASK(64)?
>
> We fail here and then fall back to map_single(), but that fails as well, so we see the warning:
>
>         ret = (void *)__get_free_pages(flags, order);
>
> It seems to be a non-critical event, in that we do reconnect when we are able to.
> I am missing how we recover here. I assume the next try passes.
> I will add some instrumentation to figure this out.
>
>         if (!ret) {
>                 /*
>                  * We are either out of memory or the device can't DMA to
>                  * GFP_DMA memory; fall back on map_single(), which
>                  * will grab memory from the lowest available address range.
>                  */
>                 phys_addr_t paddr = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
>                 if (paddr == SWIOTLB_MAP_ERROR)
>                         goto err_warn;
>
>                 ret = phys_to_virt(paddr);
>                 dev_addr = phys_to_dma(hwdev, paddr);
>
>                 /* Confirm address can be DMA'd by device */
>                 if (dev_addr + size - 1 > dma_mask) {
>                         printk("hwdev DMA mask = 0x%016Lx, dev_addr = 0x%016Lx\n",
>                                (unsigned long long)dma_mask,
>                                (unsigned long long)dev_addr);
>
>                         /* DMA_TO_DEVICE to avoid memcpy in unmap_single */
>                         swiotlb_tbl_unmap_single(hwdev, paddr,
>                                                  size, DMA_TO_DEVICE);
>                         goto err_warn;
>                 }
>         }

Tracing swiotlb_tbl_map_single(), the slot search wraps around and takes this jump:

        do {
                while (iommu_is_span_boundary(index, nslots, offset_slots, max_slots)) {
                        index += stride;
                        if (index >= io_tlb_nslabs)
                                index = 0;
                        if (index == wrap)
                                goto not_found;    ------------------------> take this jump
                }

[ 986.838508] RHDEBUG: wrap=56 index=56

not_found:
        spin_unlock_irqrestore(&io_tlb_lock, flags);
        if (printk_ratelimit()) {
                dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
                printk("RHDEBUG: wrap=%u index=%u\n", wrap, index);
        }
        return SWIOTLB_MAP_ERROR;

[ 990.484449] RHDEBUG: SWIOTLB_MAP_ERROR ffffffffffffffff

Then we take this branch in swiotlb_alloc_coherent() to get to the err_warn label:

        phys_addr_t paddr = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
        if (paddr == SWIOTLB_MAP_ERROR) {
                printk("RHDEBUG: SWIOTLB_MAP_ERROR %llx\n", paddr);
                goto err_warn;
        }
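For reference, here is a minimal sketch (not the mlx5 source; the function name example_set_dma_caps is made up) of the usual pattern a PCI driver uses to request 64-bit streaming and coherent DMA masks with a 32-bit fallback. It is equivalent to what set_dma_caps() above does with the separate pci_set_dma_mask()/pci_set_consistent_dma_mask() calls, and the coherent mask set here is the one swiotlb_alloc_coherent() checks against.

/*
 * Minimal sketch only, not the mlx5 code: ask for 64-bit DMA for both
 * streaming mappings and coherent allocations, falling back to 32-bit.
 * The function name is made up for illustration.
 */
#include <linux/pci.h>
#include <linux/dma-mapping.h>

static int example_set_dma_caps(struct pci_dev *pdev)
{
        int err;

        /* One call sets both the streaming and the coherent mask. */
        err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
        if (err) {
                dev_warn(&pdev->dev,
                         "64-bit DMA not available, falling back to 32-bit\n");
                /* With a 32-bit coherent mask, coherent buffers must sit
                 * below 4 GiB, which is the dev_addr + size - 1 > dma_mask
                 * check quoted from swiotlb_alloc_coherent() above. */
                err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
                if (err)
                        dev_err(&pdev->dev, "No usable DMA mask, aborting\n");
        }
        return err;
}

The combined helper only saves the two separate pci_set_*_dma_mask() calls; the resulting masks are the same as in the mlx5 snippet quoted earlier in the thread.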