From: yizhan@redhat.com (Yi Zhang)
Subject: mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller
Date: Tue, 14 Mar 2017 21:35:32 +0800
Message-ID: <860db62d-ae93-d94c-e5fb-88e7b643f737@redhat.com>
In-Reply-To: <56e8ccd3-8116-89a1-2f65-eb61a91c5f84@mellanox.com>
On 03/13/2017 02:16 AM, Max Gurtovoy wrote:
>
>
> On 3/10/2017 6:52 PM, Leon Romanovsky wrote:
>> On Thu, Mar 09, 2017 at 12:20:14PM +0800, Yi Zhang wrote:
>>>
>>>> I'm using CX5-LX device and have not seen any issues with it.
>>>>
>>>> Would it be possible to retest with kmemleak?
>>>>
>>> Here is the device I used.
>>>
>>> Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>>>
>>> The issue can always be reproduced within about 1000 iterations.
>>>
>>> Another thing: I found one strange pattern in the log.
>>>
>>> Before the OOM occurred, most of the log entries are about "adding queue",
>>> and after the OOM occurred, most of them are about "nvmet_rdma: freeing
>>> queue".
>>>
>>> It seems the release work, "schedule_work(&queue->release_work);", is not
>>> executed in a timely manner; I am not sure whether this causes the OOM.
>>
>> Sagi,
>> The release function is placed on the global workqueue. I'm not familiar
>> with the NVMe design and I don't know all the details, but maybe the
>> proper way would be to create a dedicated workqueue with the
>> WQ_MEM_RECLAIM flag to ensure forward progress?
>>
>
> Hi,
>
> I was able to reproduce it in my lab with ConnectX-3. I added a dedicated
> workqueue with high priority, but the bug still happens.
> If I add a "sleep 1" after
> "echo 1 > /sys/block/nvme0n1/device/reset_controller", the test passes. So
> there is no leak IMO, but the allocation process is much faster than the
> destruction of the resources.
> In the initiator we don't wait for the RDMA_CM_EVENT_DISCONNECTED event
> after we call rdma_disconnect, and we immediately try to connect again.
> Maybe we need to slow down the storm of connect requests from the
> initiator somehow, to give the target time to settle.
>
> Max.
>
>
Hi Sagi
Let's use this mail thread to track the OOM issue. :)
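
For reference, the reset loop Max describes above can be sketched roughly as
follows. This is only a minimal sketch, not the exact test script: the function
name reset_loop and its three arguments (path, count, delay) are hypothetical
names introduced here. The sysfs path, the "echo 1" write, the "sleep 1"
workaround and the figure of about 1000 iterations come from this thread.

```shell
#!/bin/sh
# Sketch of the reset_controller stress loop (assumed shape, not the real test).
# reset_loop PATH COUNT DELAY
#   PATH  - file to write "1" to, e.g. the reset_controller sysfs attribute
#   COUNT - number of resets to issue
#   DELAY - seconds to sleep after each reset; 0 means back-to-back resets
reset_loop() {
    path=$1
    count=$2
    delay=$3
    i=1
    while [ "$i" -le "$count" ]; do
        # Trigger one controller reset; stop at the first failed write.
        echo 1 > "$path" || return 1
        # A non-zero delay mirrors the "sleep 1" workaround from the thread.
        if [ "$delay" != "0" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "completed $count resets"
}
```

On the initiator this would be called as, for example,
"reset_loop /sys/block/nvme0n1/device/reset_controller 1000 0" for the
back-to-back case, or with a delay of 1 to match the run that passes.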
Thanks
Yi
>>>
>>> Here is the log before/after OOM
>>> http://pastebin.com/Zb6w4nEv
>>>
>>>> _______________________________________________
>>>> Linux-nvme mailing list
>>>> Linux-nvme at lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-nvme
Thread overview: 23+ messages
[not found] <1908657724.31179983.1488539944957.JavaMail.zimbra@redhat.com>
2017-03-03 11:55 ` mlx4_core 0000:07:00.0: swiotlb buffer is full and OOM observed during stress test on reset_controller Yi Zhang
2017-03-05 8:12 ` Leon Romanovsky
2017-03-08 15:48 ` Christoph Hellwig
2017-03-09 8:42 ` Leon Romanovsky
2017-03-09 8:46 ` Leon Romanovsky
2017-03-09 10:33 ` Yi Zhang
2017-03-06 11:23 ` Sagi Grimberg
2017-03-09 4:20 ` Yi Zhang
2017-03-09 11:42 ` Max Gurtovoy
2017-03-10 8:12 ` Yi Zhang
2017-03-10 16:52 ` Leon Romanovsky
2017-03-12 18:16 ` Max Gurtovoy
2017-03-14 13:35 ` Yi Zhang [this message]
2017-03-14 16:52 ` Max Gurtovoy
2017-03-15 7:48 ` Yi Zhang
2017-03-16 16:51 ` Sagi Grimberg
2017-03-18 11:51 ` Yi Zhang
2017-03-18 17:50 ` Sagi Grimberg
2017-03-19 7:01 ` Leon Romanovsky
2017-05-18 17:01 ` Yi Zhang
2017-05-19 16:17 ` Yi Zhang
2017-06-04 15:49 ` Sagi Grimberg
2017-06-15 8:45 ` Yi Zhang