linux-nvme.lists.infradead.org archive mirror
From: yizhan@redhat.com (Yi Zhang)
Subject: mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller
Date: Tue, 14 Mar 2017 21:35:32 +0800
Message-ID: <860db62d-ae93-d94c-e5fb-88e7b643f737@redhat.com>
In-Reply-To: <56e8ccd3-8116-89a1-2f65-eb61a91c5f84@mellanox.com>



On 03/13/2017 02:16 AM, Max Gurtovoy wrote:
>
>
> On 3/10/2017 6:52 PM, Leon Romanovsky wrote:
>> On Thu, Mar 09, 2017 at 12:20:14PM +0800, Yi Zhang wrote:
>>>
>>>> I'm using CX5-LX device and have not seen any issues with it.
>>>>
>>>> Would it be possible to retest with kmemleak?
>>>>
>>> Here is the device I used.
>>>
>>> Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>>>
>>> The issue can always be reproduced after about 1000 iterations.
>>>
>>> I also noticed something strange in the log:
>>>
>>> before the OOM occurred, most of the log entries are about "adding queue",
>>> and after the OOM occurred, most of them are about "nvmet_rdma: freeing
>>> queue".
>>>
>>> It seems the release work, "schedule_work(&queue->release_work);", is not
>>> executed in a timely manner. I'm not sure whether this is what causes
>>> the OOM.
>>
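One rough way to quantify the imbalance described above is to count both message types in a saved kernel log. This is only a sketch: the helper name count_queue_events and the idea of saving the log to a file first are assumptions for illustration, and the match strings are simply the two messages quoted above.

```shell
#!/bin/sh
# Hypothetical helper: compare queue setup vs. teardown messages in a saved log.
# Usage: count_queue_events LOGFILE
count_queue_events() {
    log=$1
    # Bail out early if the log file cannot be read.
    [ -r "$log" ] || return 1
    # grep -c prints the number of matching lines; it exits non-zero when
    # nothing matches, so "|| true" keeps this usable under "set -e".
    added=$(grep -c 'adding queue' "$log" || true)
    freed=$(grep -c 'freeing queue' "$log" || true)
    # "pending" is the number of queues added but not yet reported as freed.
    echo "added=$added freed=$freed pending=$((added - freed))"
}
```

A "pending" value that keeps growing across successive log snapshots would be consistent with teardown lagging behind setup, although it does not by itself prove that this causes the OOM.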
>> Sagi,
>> The release function is placed in the global workqueue. I'm not familiar
>> with the NVMe design and I don't know all the details, but maybe the
>> proper way would be to create a dedicated workqueue with the
>> WQ_MEM_RECLAIM flag to ensure progress?
>>
>
> Hi,
>
> I was able to reproduce it in my lab with a ConnectX-3. I added a
> dedicated workqueue with high priority, but the bug still happens.
> If I add a "sleep 1" after
> "echo 1 >/sys/block/nvme0n1/device/reset_controller", the test passes.
> So there is no leak IMO; rather, the allocation process is much faster
> than the destruction of the resources.
> In the initiator we don't wait for the RDMA_CM_EVENT_DISCONNECTED event
> after we call rdma_disconnect, and we try to connect again immediately.
> Maybe we need to slow down the storm of connect requests from the
> initiator somehow, to give the target time to settle.
>
> Max.
>
>
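For reference, the reset loop discussed above can be sketched as the script below, with an optional delay between resets so that the failing (no delay) and passing ("sleep 1") cases can be compared. The function name stress_reset, the argument layout, and the default of 1000 iterations are assumptions for illustration; only the sysfs path and the one-second sleep come from this thread.

```shell
#!/bin/sh
# Sketch of the reset_controller stress loop; names and defaults are illustrative.
# Usage: stress_reset [RESET_FILE] [ITERATIONS] [DELAY_SECONDS]
stress_reset() {
    reset_file=${1:-/sys/block/nvme0n1/device/reset_controller}
    iterations=${2:-1000}
    delay=${3:-0}
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # Writing 1 to the sysfs attribute requests a controller reset.
        echo 1 > "$reset_file" || return 1
        # A non-zero delay gives the target time to tear down old queues.
        if [ "$delay" != "0" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "completed $i resets"
}
```

Calling stress_reset with no arguments corresponds to the back-to-back case; passing 1 as the third argument corresponds to the "sleep 1" variant.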
Hi Sagi,
Let's use this mail thread to track the OOM issue. :)

Thanks
Yi
>>>
>>> Here is the log before/after OOM
>>> http://pastebin.com/Zb6w4nEv
>>>
>>>> _______________________________________________
>>>> Linux-nvme mailing list
>>>> Linux-nvme at lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-nvme

Thread overview: 23+ messages
     [not found] <1908657724.31179983.1488539944957.JavaMail.zimbra@redhat.com>
2017-03-03 11:55 ` mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller Yi Zhang
2017-03-05  8:12   ` Leon Romanovsky
2017-03-08 15:48     ` Christoph Hellwig
2017-03-09  8:42       ` Leon Romanovsky
2017-03-09  8:46     ` Leon Romanovsky
2017-03-09 10:33       ` Yi Zhang
2017-03-06 11:23   ` Sagi Grimberg
2017-03-09  4:20     ` Yi Zhang
2017-03-09 11:42       ` Max Gurtovoy
2017-03-10  8:12         ` Yi Zhang
2017-03-10 16:52       ` Leon Romanovsky
2017-03-12 18:16         ` Max Gurtovoy
2017-03-14 13:35           ` Yi Zhang [this message]
2017-03-14 16:52             ` Max Gurtovoy
2017-03-15  7:48               ` Yi Zhang
2017-03-16 16:51                 ` Sagi Grimberg
2017-03-18 11:51                   ` Yi Zhang
2017-03-18 17:50                     ` Sagi Grimberg
2017-03-19  7:01                   ` Leon Romanovsky
2017-05-18 17:01                     ` Yi Zhang
2017-05-19 16:17                       ` Yi Zhang
2017-06-04 15:49                         ` Sagi Grimberg
2017-06-15  8:45                           ` Yi Zhang
