From mboxrd@z Thu Jan  1 00:00:00 1970
From: Qing Huang
Subject: Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
Date: Wed, 30 May 2018 13:30:55 -0700
Message-ID:
References: <20180530041152.113393-1-edumazet@google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir, Daniel Jurgens,
	Zhu Yanjun, Tariq Toukan
To: Eric Dumazet, "David S . Miller"
Return-path:
Received: from aserp2130.oracle.com ([141.146.126.79]:54480 "EHLO
	aserp2130.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753776AbeE3UbD (ORCPT);
	Wed, 30 May 2018 16:31:03 -0400
In-Reply-To: <20180530041152.113393-1-edumazet@google.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID:

On 5/29/2018 9:11 PM, Eric Dumazet wrote:
> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought a regression caught in our regression suite, thanks to KASAN.

If the KASAN-reported issue was really caused by the smaller chunk sizes,
then changing the allocation order dynamically will eventually hit the
same issue.

> Note that mlx4_alloc_icm() is already able to try high order allocations
> and fall back to low-order allocations under high memory pressure.
>
> We only have to tweak gfp_mask a bit, to help falling back faster,
> without risking OOM killings.
>
> BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
> Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585
>
> CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G O
> Call Trace:
>  [] dump_stack+0x4d/0x72
>  [] print_address_description+0x6f/0x260
>  [] kasan_report+0x257/0x370
>  [] __asan_report_load4_noabort+0x19/0x20
>  [] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
>  [] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
>  [] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
>  [] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
>  [] seq_read+0xa9c/0x1230
>  [] proc_reg_read+0xc1/0x180
>  [] __vfs_read+0xe8/0x730
>  [] vfs_read+0xf7/0x300
>  [] SyS_read+0xd2/0x1b0
>  [] do_syscall_64+0x186/0x420
>  [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f851a7bb30d
> RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
> RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
> RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
> R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
> R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000
>
> Allocated by task 4488:
>  save_stack+0x46/0xd0
>  kasan_kmalloc+0xad/0xe0
>  __kmalloc+0x101/0x5e0
>  ib_register_device+0xc03/0x1250 [ib_core]
>  mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
>  mlx4_add_device+0xa9/0x340 [mlx4_core]
>  mlx4_register_interface+0x16e/0x390 [mlx4_core]
>  xhci_pci_remove+0x7a/0x180 [xhci_pci]
>  do_one_initcall+0xa0/0x230
>  do_init_module+0x1b9/0x5a4
>  load_module+0x63e6/0x94c0
>  SYSC_init_module+0x1a4/0x1c0
>  SyS_init_module+0xe/0x10
>  do_syscall_64+0x186/0x420
>  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> Freed by task 0:
> (stack is not available)
>
> The buggy address belongs to the object at ffff8817df584f40
>  which belongs to the cache kmalloc-32 of size 32
> The buggy address is located 8 bytes to the right of
>  32-byte region [ffff8817df584f40, ffff8817df584f60)
> The buggy address belongs to the page:
> page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
> flags: 0x880000000000100(slab)
> raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
> raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
> page dumped because: kasan: bad access detected
> page->mem_cgroup:ffff883ff78d26c0
>
> Memory state around the buggy address:
>  ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
>  ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
>> ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
>                                                           ^
>  ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet
> Cc: John Sperbeck
> Cc: Tarick Bedeir
> Cc: Qing Huang
> Cc: Daniel Jurgens
> Cc: Zhu Yanjun
> Cc: Tariq Toukan
> ---
>  drivers/net/ethernet/mellanox/mlx4/icm.c | 17 +++++++++++------
>  1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..cae33d5c7dbd9ba7929adcf2127b104f6796fa5a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
>  #include "fw.h"
>  
>  /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
>   */
>  enum {
> -	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> -	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
> +	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
>  };
>  
>  static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>  	struct mlx4_icm *icm;
>  	struct mlx4_icm_chunk *chunk = NULL;
>  	int cur_order;
> +	gfp_t mask;
>  	int ret;
>  
>  	/* We use sg_set_buf for coherent allocs, which assumes low memory */
> @@ -178,13 +180,16 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>  		while (1 << cur_order > npages)
>  			--cur_order;
>  
> +		mask = gfp_mask;
> +		if (cur_order)
> +			mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
>  		if (coherent)
>  			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
>  						      &chunk->mem[chunk->npages],
> -						      cur_order, gfp_mask);
> +						      cur_order, mask);
>  		else
>  			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
> -						   cur_order, gfp_mask,
> +						   cur_order, mask,
>  						   dev->numa_node);
>  
>  		if (ret) {
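
For reference, the fallback path under discussion behaves roughly like
the sketch below with the patch applied. This is a simplified,
illustrative rendering of mlx4_alloc_icm()'s inner loop, not the exact
driver code: error handling, the coherent-allocation branch, and chunk
bookkeeping are omitted.

	while (npages > 0) {
		gfp_t mask = gfp_mask;

		/* Cap the order so we never allocate past npages. */
		while (1 << cur_order > npages)
			--cur_order;

		/*
		 * High-order attempts are opportunistic: with
		 * __GFP_DIRECT_RECLAIM cleared and __GFP_NORETRY set,
		 * a fragmented system fails them quickly instead of
		 * compacting, retrying, or waking the OOM killer.
		 */
		if (cur_order)
			mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;

		ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
					   cur_order, mask, dev->numa_node);
		if (ret) {
			if (--cur_order < 0)
				goto fail;	/* even order 0 failed */
			continue;		/* retry at half the size */
		}

		npages -= 1 << cur_order;
	}

Only the final order-0 attempt keeps the caller's original gfp_mask and
may block on direct reclaim; that is what lets the high-order attempts
fail fast without risking an OOM kill.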