public inbox for linux-rdma@vger.kernel.org
From: Laurence Oberman <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
Cc: Bart Van Assche
	<bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>,
	Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
Date: Mon, 13 Jun 2016 21:56:27 -0400 (EDT)
Message-ID: <1964187258.42093298.1465869387551.JavaMail.zimbra@redhat.com>
In-Reply-To: <450384210.42057823.1465857004662.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>



----- Original Message -----
> From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Yishai Hadas" <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Sent: Monday, June 13, 2016 6:30:04 PM
> Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
> 
> 
> 
> ----- Original Message -----
> > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> > Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Yishai Hadas"
> > <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Sent: Monday, June 13, 2016 11:22:19 AM
> > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in
> > swiotlb_alloc_coherent()
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "Laurence Oberman" <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > To: leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> > > Cc: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>, "Yishai Hadas"
> > > <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > Sent: Monday, June 13, 2016 10:19:57 AM
> > > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in
> > > swiotlb_alloc_coherent()
> > > 
> > > 
> > > 
> > > ----- Original Message -----
> > > > From: "Leon Romanovsky" <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > > > To: "Bart Van Assche" <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > > > Cc: "Yishai Hadas" <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>, "Laurence Oberman"
> > > > <loberman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > Sent: Monday, June 13, 2016 10:07:47 AM
> > > > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in
> > > > swiotlb_alloc_coherent()
> > > > 
> > > > On Sun, Jun 12, 2016 at 11:32:53PM -0700, Bart Van Assche wrote:
> > > > > On 06/12/2016 03:40 PM, Laurence Oberman wrote:
> > > > > >Jun  8 10:12:52 jumpclient kernel: mlx5_core 0000:08:00.1: swiotlb buffer is full (sz: 266240 bytes)
> > > > > >Jun  8 10:12:52 jumpclient kernel: swiotlb: coherent allocation failed for device 0000:08:00.1 size=266240
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > I think the above means that the coherent memory allocation succeeded but
> > > > > that the test dev_addr + size - 1 <= DMA_BIT_MASK(32) failed. Can someone
> > > > > from Mellanox tell us whether or not it would be safe to set
> > > > > coherent_dma_mask to DMA_BIT_MASK(64) for the mlx4 and mlx5 drivers?
> > > > 
> > > > Bart and Laurence,
> > > > We are actually doing it for the mlx5 driver.
> > > > 
> > > > static int mlx5_pci_init(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
> > > > <...>
> > > >         err = set_dma_caps(pdev);
> > > > 
> > > > static int set_dma_caps(struct pci_dev *pdev)
> > > > <...>
> > > >         err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
> > > >         if (err) {
> > > >                 dev_warn(&pdev->dev,
> > > >                          "Warning: couldn't set 64-bit consistent PCI DMA mask\n");
> > > >                 err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
> > > >                 if (err) {
> > > >                         dev_err(&pdev->dev,
> > > >                                 "Can't set consistent PCI DMA mask, aborting\n");
> > > >                         return err;
> > > >                 }
> > > >         }
> > > > 
> > > > static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
> > > > {
> > > >         return dma_set_coherent_mask(&dev->dev, mask);
> > > > }
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Bart.
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > > > in
> > > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > 
> > > Hi Leon,
> > > 
> > > OK I see it now
> > > 
> > > static int set_dma_caps(struct pci_dev *pdev)
> > > {
> > >         int err;
> > > 
> > >         err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
> > >         if (err) {
> > > 
> > > Thanks
> > > Laurence
> > > 
> > 
> > Replying to my own email.
> > Leon, what is the implication of the mapping failure?
> > It only happens in the reconnect stack, when I am restarting controllers.
> > With the messaging and stack dump masked I still see the failure, but it
> > seems transparent in that all the paths come back.
> > 
> > [ 1595.167812] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1595.379133] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1595.460627] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1598.121096] scsi host1: reconnect attempt 3 failed (-48)
> > [ 1608.187869] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
> > [ 1615.911705] scsi host1: reconnect attempt 4 failed (-12)
> > [ 1641.446017] scsi host1: ib_srp: Got failed path rec status -110
> > [ 1641.482947] scsi host1: ib_srp: Path record query failed
> > [ 1641.513454] scsi host1: reconnect attempt 5 failed (-110)
> > [ 1662.330883] scsi host1: ib_srp: Got failed path rec status -110
> > [ 1662.361224] scsi host1: ib_srp: Path record query failed
> > [ 1662.390768] scsi host1: reconnect attempt 6 failed (-110)
> > [ 1683.892311] scsi host1: ib_srp: Got failed path rec status -110
> > [ 1683.922653] scsi host1: ib_srp: Path record query failed
> > [ 1683.952717] scsi host1: reconnect attempt 7 failed (-110)
> > SM port is up
> > 
> > Entering MASTER state
> > 
> > [ 1705.254048] scsi host1:   REJ reason 0x8
> > [ 1705.274869] scsi host1: reconnect attempt 8 failed (-104)
> > [ 1723.264914] scsi host1:   REJ reason 0x8
> > [ 1723.285193] scsi host1: reconnect attempt 9 failed (-104)
> > [ 1743.658091] scsi host1:   REJ reason 0x8
> > [ 1743.678562] scsi host1: reconnect attempt 10 failed (-104)
> > [ 1761.911512] scsi host1:   REJ reason 0x8
> > [ 1761.932006] scsi host1: reconnect attempt 11 failed (-104)
> > [ 1782.209020] scsi host1: ib_srp: reconnect succeeded
> > 
> > 
> 
> Hi Leon
> 
> The calling relationship looks like this.
> 
> swiotlb_alloc_coherent
>         u64 dma_mask = DMA_BIT_MASK(32);
> 
>         This may get overwritten here, I assume with a 64-bit mask, right?
> 
>         if (hwdev && hwdev->coherent_dma_mask)
>                 dma_mask = hwdev->coherent_dma_mask;    ***** Is this now DMA_BIT_MASK(64)?
> 
> 
> We fail here and then try map_single(), but that fails too, so we see the warning:
>         ret = (void *)__get_free_pages(flags, order);
> 
> It seems to be a non-critical event, in that when we are able to reconnect we
> do.
> I am missing how we recover here. I assume we pass on the next try.
> I will add some instrumentation to figure this out.
> 
> 
> if (!ret) {
> 		/*
> 		 * We are either out of memory or the device can't DMA to
> 		 * GFP_DMA memory; fall back on map_single(), which
> 		 * will grab memory from the lowest available address range.
> 		 */
> 		phys_addr_t paddr = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
> 		if (paddr == SWIOTLB_MAP_ERROR)
> 			goto err_warn;
> 
> 		ret = phys_to_virt(paddr);
> 		dev_addr = phys_to_dma(hwdev, paddr);
> 
> 		/* Confirm address can be DMA'd by device */
> 		if (dev_addr + size - 1 > dma_mask) {
> 			printk("hwdev DMA mask = 0x%016Lx, dev_addr = 0x%016Lx\n",
> 			       (unsigned long long)dma_mask,
> 			       (unsigned long long)dev_addr);
> 
> 			/* DMA_TO_DEVICE to avoid memcpy in unmap_single */
> 			swiotlb_tbl_unmap_single(hwdev, paddr,
> 						 size, DMA_TO_DEVICE);
> 			goto err_warn;
> 		}
> 	}
> 
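
For reference, the address check at the end of the quoted swiotlb_alloc_coherent() excerpt can be tried in isolation. This is only a sketch under stated assumptions: dma_range_ok() is a hypothetical helper name, not a kernel function, and DMA_BIT_MASK() is written out the way the kernel headers define it. It does not guard against dev_addr + size overflowing 64 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Written out as in the kernel headers; n == 64 is special-cased so the
 * shift never equals the width of the type. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: true when the last byte of the buffer is still
 * addressable under dma_mask. This is the inverse of the
 * "dev_addr + size - 1 > dma_mask" test in the quoted code. */
static bool dma_range_ok(uint64_t dev_addr, size_t size, uint64_t dma_mask)
{
    assert(size > 0);
    return dev_addr + size - 1 <= dma_mask;
}
```

With a 64-bit coherent mask, which is what set_dma_caps() asks for first, a 266240-byte buffer passes this check even just below the 4 GiB boundary, while the same buffer fails it under DMA_BIT_MASK(32).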

Tracing

        do {
                while (iommu_is_span_boundary(index, nslots, offset_slots,
                                              max_slots)) {
                        index += stride;
                        if (index >= io_tlb_nslabs)
                                index = 0;
                        if (index == wrap)
                                goto not_found; ------------------------> take this jump
                }


[  986.838508] RHDEBUG: wrap=56 index=56

not_found:
        spin_unlock_irqrestore(&io_tlb_lock, flags);
        if (printk_ratelimit()) {
                dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
                printk("RHDEBUG: wrap=%u index=%u\n",wrap,index);
        }
        return SWIOTLB_MAP_ERROR;

[  990.484449] RHDEBUG: SWIOTLB_MAP_ERROR ffffffffffffffff
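
The wrap-around search traced above can be modelled outside the kernel. What follows is a minimal sketch, not the kernel code: find_slots() and free_run are hypothetical names, free_run[i] stands in for io_tlb_list[i] (the number of contiguous free slots starting at slot i), and the iommu_is_span_boundary() test is left out. It shows how the scan comes back to its starting index (index == wrap) and gives up when no entry is large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the slot search.
 * Returns the first index whose free run can hold nslots,
 * or -1 once the scan has wrapped back to where it started. */
static long find_slots(const unsigned int *free_run, size_t nslabs,
                       size_t start, unsigned int nslots, size_t stride)
{
    size_t index, wrap;

    assert(nslabs > 0 && nslots > 0 && stride > 0);

    /* Align the start down to a multiple of stride so the scan is
     * guaranteed to land on wrap again after resetting to 0. */
    index = wrap = ((start % nslabs) / stride) * stride;

    do {
        if (free_run[index] >= nslots)
            return (long)index;      /* large enough run found */
        index += stride;
        if (index >= nslabs)
            index = 0;               /* wrap around the table */
    } while (index != wrap);

    return -1;                       /* corresponds to not_found */
}
```

If every free run is capped at 128 slots (an assumption about the segment size in this kernel), a request for 130 slots returns -1 from any starting index, no matter how much of the table is free.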

Then we take this branch to reach the err_warn label:

                phys_addr_t paddr = map_single(hwdev, 0, size, DMA_FROM_DEVICE);
                if (paddr == SWIOTLB_MAP_ERROR) {
                        printk("RHDEBUG: SWIOTLB_MAP_ERROR %llx\n",paddr);
                        goto err_warn;
                }
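
One possible reading of the 266240-byte figure, offered as a sketch rather than a diagnosis: assuming IO_TLB_SHIFT is 11 (2 KiB slots) and IO_TLB_SEGSIZE is 128, as in include/linux/swiotlb.h for kernels of this period, the slot arithmetic can be checked as below. The helper names slots_for(), max_segment_bytes() and fits_in_one_segment() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values; verify them against the kernel under test. */
#define IO_TLB_SHIFT   11    /* one bounce-buffer slot = 2 KiB */
#define IO_TLB_SEGSIZE 128   /* slots in one contiguous segment */

/* Round a byte count up to whole slots. */
static size_t slots_for(size_t size)
{
    assert(size > 0);
    return (size + ((size_t)1 << IO_TLB_SHIFT) - 1) >> IO_TLB_SHIFT;
}

/* Largest request one contiguous segment can hold, in bytes. */
static size_t max_segment_bytes(void)
{
    return (size_t)IO_TLB_SEGSIZE << IO_TLB_SHIFT;
}

/* True when the request fits inside a single segment. */
static bool fits_in_one_segment(size_t size)
{
    return slots_for(size) <= IO_TLB_SEGSIZE;
}
```

Under those assumptions the 266240-byte request needs 130 slots, while one segment holds 128 slots (262144 bytes), so the bounce buffer could report "full" even when most of it is free.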

