* [PATCH/RFC] mxl4_core: Scale size of MTT table with system RAM
@ 2012-03-05 18:09 Roland Dreier
From: Roland Dreier @ 2012-03-05 18:09 UTC (permalink / raw)
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
From: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
The current driver defaults to 1M MTT segments, where each segment holds
8 MTT entries. This limits the total memory registered to 8M * PAGE_SIZE
which is 32GB with 4K pages. Since systems that have much more memory
are pretty common now (at least among systems with InfiniBand hardware),
this limit ends up getting hit in practice quite a bit.
Handle this by having the driver allocate at least enough MTT entries to
cover 2 * totalram pages.
Signed-off-by: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
---
Albert, if you could try this on one of your 192GB systems and see if
you still are able to register enough memory, that would be great.
(Of course please remove any local hacks you have to work around the
problem in any other way. And actually I'd be curious to know how
much you're bumping up num_mtt and/or log_mtts_per_seg to use 192GB
right now ... I'd like to validate the (2*totalram) heuristic)
Thanks!
drivers/net/ethernet/mellanox/mlx4/mlx4.h | 2 +-
drivers/net/ethernet/mellanox/mlx4/profile.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index c92269f..c846152 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -399,7 +399,7 @@ struct mlx4_profile {
 	int num_cq;
 	int num_mcg;
 	int num_mpt;
-	int num_mtt;
+	unsigned num_mtt;
 };
 
 struct mlx4_fw {
diff --git a/drivers/net/ethernet/mellanox/mlx4/profile.c b/drivers/net/ethernet/mellanox/mlx4/profile.c
index 1129677..06e5ade 100644
--- a/drivers/net/ethernet/mellanox/mlx4/profile.c
+++ b/drivers/net/ethernet/mellanox/mlx4/profile.c
@@ -83,12 +83,31 @@ u64 mlx4_make_profile(struct mlx4_dev *dev,
 	u64 total_size = 0;
 	struct mlx4_resource *profile;
 	struct mlx4_resource tmp;
+	struct sysinfo si;
 	int i, j;
 
 	profile = kcalloc(MLX4_RES_NUM, sizeof(*profile), GFP_KERNEL);
 	if (!profile)
 		return -ENOMEM;
 
+	/*
+	 * We want to scale the number of MTTs with the size of the
+	 * system memory, since it makes sense to register a lot of
+	 * memory on a system with a lot of memory. As a heuristic,
+	 * make sure we have enough MTTs to cover twice the system
+	 * memory (with PAGE_SIZE entries).
+	 *
+	 * This number has to be a power of two and fit into 32 bits
+	 * due to device limitations, so cap this at 2^31 as well.
+	 * That limits us to 8TB of memory registration per HCA with
+	 * 4KB pages, which is probably OK for the next few months.
+	 */
+	si_meminfo(&si);
+	request->num_mtt =
+		roundup_pow_of_two(max_t(unsigned, request->num_mtt,
+					 min(1UL << 31,
+					     si.totalram >> (log_mtts_per_seg - 1))));
+
 	profile[MLX4_RES_QP].size = dev_cap->qpc_entry_sz;
 	profile[MLX4_RES_RDMARC].size = dev_cap->rdmarc_entry_sz;
 	profile[MLX4_RES_ALTC].size = dev_cap->altc_entry_sz;
--
1.7.9
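
For readers who want to check the numbers in the changelog, here is a rough userspace model of the sizing rule. It is only a sketch: `roundup_pow2`, `scaled_num_mtt` and `mtt_coverage_bytes` are hypothetical helper names, not driver functions, and the model uses 64-bit arithmetic throughout rather than the `unsigned` cast the patch applies.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (returns 1 for v == 0). */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of the patch's heuristic.
 *   totalram_pages   - system RAM in PAGE_SIZE units (si.totalram)
 *   requested        - current num_mtt default, counted in segments
 *   log_mtts_per_seg - log2 of MTT entries per segment (3 for 8 entries)
 */
static uint64_t scaled_num_mtt(uint64_t totalram_pages, uint64_t requested,
			       unsigned log_mtts_per_seg)
{
	uint64_t cap = UINT64_C(1) << 31;
	uint64_t want;

	assert(log_mtts_per_seg >= 1);

	/* Segments needed to cover 2 * totalram pages. */
	want = totalram_pages >> (log_mtts_per_seg - 1);
	if (want > cap)
		want = cap;		/* min(1UL << 31, ...) */
	if (want < requested)
		want = requested;	/* never shrink the default */
	return roundup_pow2(want);
}

/* Bytes of memory that num_mtt segments can map. */
static uint64_t mtt_coverage_bytes(uint64_t num_mtt, unsigned log_mtts_per_seg,
				   uint64_t page_size)
{
	return (num_mtt << log_mtts_per_seg) * page_size;
}
```

With the old default of 1M segments and 8 entries per segment, `mtt_coverage_bytes(1 << 20, 3, 4096)` gives the 32GB limit from the changelog. For a nominal 192GB machine (50331648 pages of 4KB), `scaled_num_mtt` returns 16777216 segments, which would cover 512GB. Real `si.totalram` values are somewhat below the nominal RAM size, so treat these as ballpark figures.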
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH/RFC] mxl4_core: Scale size of MTT table with system RAM
@ 2012-03-05 18:09 Roland Dreier
0 siblings, 0 replies; 6+ messages in thread
From: Roland Dreier @ 2012-03-05 18:09 UTC (permalink / raw)
To: Albert Strasheim; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
* Re: [PATCH/RFC] mxl4_core: Scale size of MTT table with system RAM
@ 2012-03-12 21:35 ` Or Gerlitz
From: Or Gerlitz @ 2012-03-12 21:35 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Mon, Mar 5, 2012 at 8:09 PM, Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> Handle this by having the driver allocate at least enough MTT entries to
> cover 2 * totalram pages.
just curious, why we want to cover > totalram? also the commit title
has "mxl4" instead of "mlx4"
Or.
* Re: [PATCH/RFC] mxl4_core: Scale size of MTT table with system RAM
@ 2012-03-12 23:29 ` Roland Dreier
From: Roland Dreier @ 2012-03-12 23:29 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Mon, Mar 12, 2012 at 2:35 PM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> just curious, why we want to cover > totalram? also the commit title
> has "mxl4" instead of "mlx4"
It's just a heuristic, but I figured some app might want to register essentially
all of memory, and then we want some more to cover other users. The amount
of memory used for unused MTT space is pretty small, I think.
I fixed the mxl4->mlx4, thanks.
- R.
* Re: [PATCH/RFC] mxl4_core: Scale size of MTT table with system RAM
@ 2012-03-13 6:27 ` Or Gerlitz
From: Or Gerlitz @ 2012-03-13 6:27 UTC (permalink / raw)
To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 3/13/2012 1:29 AM, Roland Dreier wrote:
> It's just a heuristic, but I figured some app might want to register
> essentially all of memory, and then we want some more to cover other
> users. The amount of memory used for unused MTT space is pretty small,
> I think.
If an app registered essentially all memory, then practically nothing
else left for other users, or you refer to the case where the driver
consumed an MTT segment made of 8 elements on behalf of a certain app
but didn't use all the MTT entries of that segment?
> I fixed the mxl4->mlx4, thanks.
sure,
Or.
* Re: [PATCH/RFC] mxl4_core: Scale size of MTT table with system RAM
@ 2012-03-13 17:59 ` Roland Dreier
From: Roland Dreier @ 2012-03-13 17:59 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On Mon, Mar 12, 2012 at 11:27 PM, Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> If an app registered essentially all memory, then practically nothing else
> left for other users, or you refer to the case where the driver consumed an
> MTT segment made of 8 elements on behalf of a certain app but didn't use all
> the MTT entries of that segment?
Yeah, if we register a lot of small buffers we might waste a lot of MTTs. Also
I was thinking that a multi-threaded app might register
overlapping regions.
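
To put a number on the small-buffer case, here is a minimal sketch that assumes MTT space is handed out in whole segments. The helper names are made up for illustration, and the sketch ignores any further power-of-two rounding a buddy allocator may apply on top.

```c
#include <stdint.h>

/* MTT entries consumed when a registration of `pages` pages is
 * rounded up to a whole number of segments (assumed granularity). */
static uint64_t mtt_entries_consumed(uint64_t pages, unsigned log_mtts_per_seg)
{
	uint64_t per_seg = UINT64_C(1) << log_mtts_per_seg;
	uint64_t segs = (pages + per_seg - 1) / per_seg;

	return segs * per_seg;
}

/* Entries allocated but not backing any page of the registration. */
static uint64_t mtt_entries_wasted(uint64_t pages, unsigned log_mtts_per_seg)
{
	return mtt_entries_consumed(pages, log_mtts_per_seg) - pages;
}
```

Under that assumption a one-page buffer with 8 entries per segment consumes 8 entries and wastes 7, so an application registering many single-page buffers could burn through MTT space several times faster than its memory footprint suggests.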
Thinking about it more, I wonder if we'll be able to allocate enough contiguous
memory for the buddy bitmaps on really big memory systems. This might
force us to improve the buddy allocator so it doesn't rely on contiguous
bitmaps...
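
A back-of-the-envelope sketch of the bitmap sizes, assuming a classic buddy layout with one bit per block at every order and bitmaps rounded up to 64-bit words. The function names are hypothetical and the driver's actual layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for the bitmap tracking blocks of the given order, when the
 * allocator manages 2^max_order order-0 blocks (segments). */
static uint64_t buddy_bitmap_bytes(unsigned max_order, unsigned order)
{
	uint64_t bits, words;

	assert(order <= max_order && max_order < 64);

	bits = UINT64_C(1) << (max_order - order);
	words = (bits + 63) / 64;	/* round up to 64-bit words */
	return words * 8;
}

/* Total bytes across the bitmaps for orders 0..max_order. */
static uint64_t buddy_bitmaps_total_bytes(unsigned max_order)
{
	uint64_t total = 0;

	for (unsigned o = 0; o <= max_order; o++)
		total += buddy_bitmap_bytes(max_order, o);
	return total;
}
```

The order-0 bitmap is the largest single allocation. With 2^24 segments it comes to 2MB under these assumptions, and at the 2^31 cap it would be 256MB, which is the kind of contiguous allocation that is unlikely to succeed.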