From: Qing Huang <qing.huang@oracle.com>
To: Tariq Toukan <tariqt@mellanox.com>,
Eric Dumazet <eric.dumazet@gmail.com>,
davem@davemloft.net, haakon.bugge@oracle.com,
yanjun.zhu@oracle.com
Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org, gi-oh.kim@profitbricks.com
Subject: Re: [PATCH v3] mlx4_core: allocate ICM memory in page size chunks
Date: Tue, 22 May 2018 18:41:39 -0700
Message-ID: <4b7a4f67-2c08-a60d-81cd-f12db42622ec@oracle.com>
In-Reply-To: <35ba0f14-7b24-96ff-6b2d-610a4b2980c2@mellanox.com>
On 5/22/2018 8:33 AM, Tariq Toukan wrote:
>
>
> On 18/05/2018 12:45 AM, Qing Huang wrote:
>>
>>
>> On 5/17/2018 2:14 PM, Eric Dumazet wrote:
>>> On 05/17/2018 01:53 PM, Qing Huang wrote:
>>>> When a system is under memory pressure (high usage with fragments),
>>>> the original 256KB ICM chunk allocations will likely trigger kernel
>>>> memory management to enter slow path doing memory compact/migration
>>>> ops in order to complete high order memory allocations.
>>>>
>>>> When that happens, user processes calling uverb APIs may get stuck
>>>> for more than 120s easily even though there are a lot of free pages
>>>> in smaller chunks available in the system.
>>>>
>>>> Syslog:
>>>> ...
>>>> Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
>>>> oracle_205573_e:205573 blocked for more than 120 seconds.
>>>> ...
>>>>
>>> NACK on this patch.
>>>
>>> You have been asked repeatedly to use kvmalloc()
>>>
>>> This is not a minor suggestion.
>>>
>>> Take a look
>>> at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d8c13f2271ec5178c52fbde072ec7b562651ed9d
>>>
>>
>> Would you please take a look at how table->icm is being used in the
>> mlx4 driver? It is metadata used for referencing individual pointer
>> variables, not a data fragment or an in/out buffer. It has no need for
>> physically contiguous memory.
>>
>> Thanks.
>>
>
> NACK.
>
> This would cause a degradation when iterating the entries of table->icm.
> For example, in mlx4_table_get_range.
E.g.

int mlx4_table_get_range(struct mlx4_dev *dev, struct mlx4_icm_table *table,
			 u32 start, u32 end)
{
	int inc = MLX4_TABLE_CHUNK_SIZE / table->obj_size;
	int err;
	u32 i;

	for (i = start; i <= end; i += inc) {
		err = mlx4_table_get(dev, table, i);
		if (err)
			goto fail;
	}

	return 0;
...
}
For example, an mtt object is 8 bytes, so a 4KB ICM block holds 512 mtt
objects. You have to allocate 512 more mtt objects before the table->icm
pointer advances by 1 to fetch the next pointer value. So 256K mtt objects
(512 objects per block times 512 pointers per page, assuming 8-byte
pointers and 4KB pages) are needed before the table->icm pointer crosses
a page boundary in these call stacks.
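
A rough sketch of the arithmetic above, in plain userspace C rather than driver code. The constant and function names are made up for illustration and are not taken from mlx4; the 4KB page size and 8-byte pointer size are assumptions that hold on a typical 64-bit system.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only: 4KB ICM chunks, 4KB pages,
 * 8-byte mtt objects, 8-byte table->icm pointers (64-bit kernel). */
enum {
	CHUNK_SIZE   = 4096,
	PAGE_BYTES   = 4096,
	MTT_OBJ_SIZE = 8,
	ICM_PTR_SIZE = 8,
};

/* Objects covered by one ICM chunk, i.e. by one table->icm[] slot. */
size_t objs_per_chunk(size_t chunk_size, size_t obj_size)
{
	assert(obj_size != 0);
	return chunk_size / obj_size;
}

/* Objects covered by one page worth of table->icm[] pointers. */
size_t objs_per_icm_page(size_t page_size, size_t ptr_size,
			 size_t chunk_size, size_t obj_size)
{
	assert(ptr_size != 0);
	return (page_size / ptr_size) * objs_per_chunk(chunk_size, obj_size);
}
```

With the assumed constants, objs_per_chunk(CHUNK_SIZE, MTT_OBJ_SIZE) gives 512 and objs_per_icm_page(PAGE_BYTES, ICM_PTR_SIZE, CHUNK_SIZE, MTT_OBJ_SIZE) gives 262144, i.e. 256K.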
Considering that mlx4_table_get_range() is only used in the control path,
there is no significant gain from using kvzalloc vs. vzalloc for
table->icm.
Anyway, if a user makes sure the mlx4 driver is loaded very early and
does not remove and reload it afterwards, we should have enough
contiguous physical memory (without wasting it) for the table->icm
allocation. I will replace vzalloc with kvzalloc and send a v4 patch.
Thanks,
Qing
>
> Thanks,
> Tariq
>
>>> And you'll understand some people care about this.
>>>
>>> Strongly.
>>>
>>> Thanks.
>>>
>>