* [Patch V3 4/9] openvswitch: Replace cpu_to_node() with cpu_to_mem() to support memoryless node
[not found] <1439781546-7217-1-git-send-email-jiang.liu@linux.intel.com>
@ 2015-08-17 3:19 ` Jiang Liu
2015-08-18 0:14 ` Pravin Shelar
2015-08-17 3:19 ` [Patch V3 5/9] i40e: Use numa_mem_id() to better " Jiang Liu
2015-08-17 3:19 ` [Patch V3 6/9] i40evf: " Jiang Liu
2 siblings, 1 reply; 12+ messages in thread
From: Jiang Liu @ 2015-08-17 3:19 UTC (permalink / raw)
To: Andrew Morton, Mel Gorman, David Rientjes, Mike Galbraith,
Peter Zijlstra, Rafael J . Wysocki, Tang Chen, Tejun Heo,
Pravin Shelar, David S. Miller
Cc: Jiang Liu, Tony Luck, linux-mm, linux-hotplug, linux-kernel, x86,
netdev, dev
Function ovs_flow_stats_update() allocates memory with the __GFP_THISNODE
flag set, which may cause permanent memory allocation failure on a
memoryless node. So replace numa_node_id() with numa_mem_id(), the
current-CPU forms of cpu_to_node() and cpu_to_mem(), to better support
memoryless nodes. For a node with memory, numa_mem_id() returns the same
node as numa_node_id().
This change only affects performance and shouldn't affect functionality.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
net/openvswitch/flow.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc7b0aba994a..e50a5681d0c2 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -69,7 +69,7 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
const struct sk_buff *skb)
{
struct flow_stats *stats;
- int node = numa_node_id();
+ int node = numa_mem_id();
int len = skb->len + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
stats = rcu_dereference(flow->stats[node]);
--
1.7.10.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 12+ messages in thread
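As an illustration of the commit message above, the following userspace sketch models why a strict node-local allocation keyed on the CPU's own node can fail permanently on a memoryless node, while one keyed on the nearest node with memory succeeds. It is a toy model, not kernel code: the topology tables and the helpers `model_cpu_to_node()`, `model_cpu_to_mem()`, and `model_alloc_thisnode()` are hypothetical stand-ins for `cpu_to_node()`, `cpu_to_mem()`, and an allocation made with `__GFP_THISNODE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_NODES 2
#define MODEL_NR_CPUS  4

/* Hypothetical topology: node 0 has memory, node 1 is memoryless. */
static const bool model_node_has_memory[MODEL_NR_NODES] = { true, false };

/* CPUs 0-1 sit on node 0, CPUs 2-3 on the memoryless node 1. */
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

/* Assumed distance table (smaller is closer). */
static const int model_distance[MODEL_NR_NODES][MODEL_NR_NODES] = {
	{ 10, 20 },
	{ 20, 10 },
};

/* Stand-in for cpu_to_node(): the node the CPU belongs to. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return model_cpu_node[cpu];
}

/* Stand-in for cpu_to_mem(): the nearest node that actually has memory. */
static int model_cpu_to_mem(int cpu)
{
	int home = model_cpu_to_node(cpu);
	int best = -1;

	for (int nid = 0; nid < MODEL_NR_NODES; nid++) {
		if (!model_node_has_memory[nid])
			continue;
		if (best < 0 || model_distance[home][nid] < model_distance[home][best])
			best = nid;
	}
	return best; /* -1 only if no node has memory at all */
}

/*
 * Stand-in for a __GFP_THISNODE allocation: there is no fallback to other
 * nodes, so it can only succeed when the requested node has memory.
 */
static bool model_alloc_thisnode(int nid)
{
	return nid >= 0 && nid < MODEL_NR_NODES && model_node_has_memory[nid];
}
```

In this model a CPU on node 1 fails every strict allocation against its own node and succeeds once it asks for node 0 instead; for CPUs on node 0 both lookups agree, which matches the statement that nodes with memory are unaffected.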
* [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
[not found] <1439781546-7217-1-git-send-email-jiang.liu@linux.intel.com>
2015-08-17 3:19 ` [Patch V3 4/9] openvswitch: Replace cpu_to_node() with cpu_to_mem() to support memoryless node Jiang Liu
@ 2015-08-17 3:19 ` Jiang Liu
2015-08-18 0:35 ` David Rientjes
2015-08-19 22:38 ` [Intel-wired-lan] " Patil, Kiran
2015-08-17 3:19 ` [Patch V3 6/9] i40evf: " Jiang Liu
2 siblings, 2 replies; 12+ messages in thread
From: Jiang Liu @ 2015-08-17 3:19 UTC (permalink / raw)
To: Andrew Morton, Mel Gorman, David Rientjes, Mike Galbraith,
Peter Zijlstra, Rafael J . Wysocki, Tang Chen, Tejun Heo,
Jeff Kirsher, Jesse Brandeburg, Shannon Nelson, Carolyn Wyborny,
Don Skidmore, Matthew Vick, John Ronciak, Mitch Williams
Cc: Jiang Liu, Tony Luck, linux-mm, linux-hotplug, linux-kernel, x86,
intel-wired-lan, netdev
Function i40e_clean_rx_irq() tries to reuse memory pages allocated
from the nearest node. To better support memoryless nodes, use
numa_mem_id() instead of numa_node_id() to get the nearest node with
memory.
This change should only affect performance.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 9a4f2bc70cd2..a8f618cb8eb0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1516,7 +1516,7 @@ static int i40e_clean_rx_irq_ps(struct i40e_ring *rx_ring, int budget)
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
u16 rx_packet_len, rx_header_len, rx_sph, rx_hbo;
u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
- const int current_node = numa_node_id();
+ const int current_node = numa_mem_id();
struct i40e_vsi *vsi = rx_ring->vsi;
u16 i = rx_ring->next_to_clean;
union i40e_rx_desc *rx_desc;
--
1.7.10.4
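To illustrate why the choice of node ID matters for page reuse, the sketch below models the reuse test as a userspace function. It assumes, as a simplification, that a receive page is recycled only when the driver holds the sole reference and the page sits on the node recorded in `current_node`; `struct model_page` and `model_can_reuse_page()` are hypothetical names and not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for the fields the reuse test looks at. */
struct model_page {
	int refcount; /* number of references currently held on the page */
	int nid;      /* NUMA node the page was allocated from */
};

/*
 * Assumed reuse rule: recycle the page only if nobody else holds a
 * reference and the page is local to the node we treat as "current".
 */
static bool model_can_reuse_page(const struct model_page *page, int current_node)
{
	assert(page != NULL);
	return page->refcount == 1 && page->nid == current_node;
}
```

Under this assumption, a CPU on a memoryless node only ever sees pages from some other node. Comparing against the CPU's own node ID would then never match, so no page would be recycled, whereas comparing against the nearest node with memory lets reuse happen. Either way the result is correct; only the recycling rate differs.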
* [Patch V3 6/9] i40evf: Use numa_mem_id() to better support memoryless node
[not found] <1439781546-7217-1-git-send-email-jiang.liu@linux.intel.com>
2015-08-17 3:19 ` [Patch V3 4/9] openvswitch: Replace cpu_to_node() with cpu_to_mem() to support memoryless node Jiang Liu
2015-08-17 3:19 ` [Patch V3 5/9] i40e: Use numa_mem_id() to better " Jiang Liu
@ 2015-08-17 3:19 ` Jiang Liu
2015-08-17 19:03 ` [Intel-wired-lan] " Patil, Kiran
2 siblings, 1 reply; 12+ messages in thread
From: Jiang Liu @ 2015-08-17 3:19 UTC (permalink / raw)
To: Andrew Morton, Mel Gorman, David Rientjes, Mike Galbraith,
Peter Zijlstra, Rafael J . Wysocki, Tang Chen, Tejun Heo,
Jeff Kirsher, Jesse Brandeburg, Shannon Nelson, Carolyn Wyborny,
Don Skidmore, Matthew Vick, John Ronciak, Mitch Williams
Cc: Jiang Liu, Tony Luck, linux-mm, linux-hotplug, linux-kernel, x86,
intel-wired-lan, netdev
Function i40e_clean_rx_irq_ps() tries to reuse memory pages allocated
from the nearest node. To better support memoryless nodes, use
numa_mem_id() instead of numa_node_id() to get the nearest node with
memory.
This change should only affect performance.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
---
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 395f32f226c0..19ca96d8bd97 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1003,7 +1003,7 @@ static int i40e_clean_rx_irq_ps(struct i40e_ring *rx_ring, int budget)
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
u16 rx_packet_len, rx_header_len, rx_sph, rx_hbo;
u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
- const int current_node = numa_node_id();
+ const int current_node = numa_mem_id();
struct i40e_vsi *vsi = rx_ring->vsi;
u16 i = rx_ring->next_to_clean;
union i40e_rx_desc *rx_desc;
--
1.7.10.4
* RE: [Intel-wired-lan] [Patch V3 6/9] i40evf: Use numa_mem_id() to better support memoryless node
2015-08-17 3:19 ` [Patch V3 6/9] i40evf: " Jiang Liu
@ 2015-08-17 19:03 ` Patil, Kiran
0 siblings, 0 replies; 12+ messages in thread
From: Patil, Kiran @ 2015-08-17 19:03 UTC (permalink / raw)
To: Jiang Liu, Andrew Morton, Mel Gorman, David Rientjes,
Mike Galbraith, Peter Zijlstra, Wysocki, Rafael J, Tang Chen,
Tejun Heo, Kirsher, Jeffrey T, Brandeburg, Jesse, Nelson, Shannon,
Wyborny, Carolyn, Skidmore, Donald C, Vick, Matthew,
Ronciak, John, Williams, Mitch A
Cc: Luck, Tony, netdev@vger.kernel.org, x86@kernel.org,
linux-hotplug@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, intel-wired-lan@lists.osuosl.org
ACK.
Thanks,
-- Kiran P.
* Re: [Patch V3 4/9] openvswitch: Replace cpu_to_node() with cpu_to_mem() to support memoryless node
2015-08-17 3:19 ` [Patch V3 4/9] openvswitch: Replace cpu_to_node() with cpu_to_mem() to support memoryless node Jiang Liu
@ 2015-08-18 0:14 ` Pravin Shelar
0 siblings, 0 replies; 12+ messages in thread
From: Pravin Shelar @ 2015-08-18 0:14 UTC (permalink / raw)
To: Jiang Liu
Cc: Andrew Morton, Mel Gorman, David Rientjes, Mike Galbraith,
Peter Zijlstra, Rafael J . Wysocki, Tang Chen, Tejun Heo,
David S. Miller, Tony Luck, linux-mm, linux-hotplug, LKML, x86,
netdev, dev@openvswitch.org
On Sun, Aug 16, 2015 at 8:19 PM, Jiang Liu <jiang.liu@linux.intel.com> wrote:
> Function ovs_flow_stats_update() allocates memory with __GFP_THISNODE
> flag set, which may cause permanent memory allocation failure on
> memoryless node. So replace cpu_to_node() with cpu_to_mem() to better
> support memoryless node. For node with memory, cpu_to_mem() is the same
> as cpu_to_node().
>
> This change only affects performance and shouldn't affect functionality.
>
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
* Re: [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
2015-08-17 3:19 ` [Patch V3 5/9] i40e: Use numa_mem_id() to better " Jiang Liu
@ 2015-08-18 0:35 ` David Rientjes
2015-08-19 22:38 ` [Intel-wired-lan] " Patil, Kiran
1 sibling, 0 replies; 12+ messages in thread
From: David Rientjes @ 2015-08-18 0:35 UTC (permalink / raw)
To: Jiang Liu
Cc: Andrew Morton, Mel Gorman, Mike Galbraith, Peter Zijlstra,
Rafael J . Wysocki, Tang Chen, Tejun Heo, Jeff Kirsher,
Jesse Brandeburg, Shannon Nelson, Carolyn Wyborny, Don Skidmore,
Matthew Vick, John Ronciak, Mitch Williams, Tony Luck, linux-mm,
linux-hotplug, linux-kernel, x86, intel-wired-lan, netdev
On Mon, 17 Aug 2015, Jiang Liu wrote:
> Function i40e_clean_rx_irq() tries to reuse memory pages allocated
s/i40e_clean_rx_irq/i40e_clean_rx_irq_ps/
> from the nearest node. To better support memoryless node, use
> numa_mem_id() instead of numa_node_id() to get the nearest node with
> memory.
>
Out of curiosity, what prevents the CPU from being preempted, so that
current_node no longer matches numa_mem_id()?
> This change should only affect performance.
>
> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 9a4f2bc70cd2..a8f618cb8eb0 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1516,7 +1516,7 @@ static int i40e_clean_rx_irq_ps(struct i40e_ring *rx_ring, int budget)
> unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> u16 rx_packet_len, rx_header_len, rx_sph, rx_hbo;
> u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
> - const int current_node = numa_node_id();
> + const int current_node = numa_mem_id();
> struct i40e_vsi *vsi = rx_ring->vsi;
> u16 i = rx_ring->next_to_clean;
> union i40e_rx_desc *rx_desc;
* RE: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
2015-08-17 3:19 ` [Patch V3 5/9] i40e: Use numa_mem_id() to better " Jiang Liu
2015-08-18 0:35 ` David Rientjes
@ 2015-08-19 22:38 ` Patil, Kiran
2015-08-20 0:18 ` David Rientjes
1 sibling, 1 reply; 12+ messages in thread
From: Patil, Kiran @ 2015-08-19 22:38 UTC (permalink / raw)
To: Jiang Liu, Andrew Morton, Mel Gorman, David Rientjes,
Mike Galbraith, Peter Zijlstra, Wysocki, Rafael J, Tang Chen,
Tejun Heo, Kirsher, Jeffrey T, Brandeburg, Jesse, Nelson, Shannon,
Wyborny, Carolyn, Skidmore, Donald C, Vick, Matthew,
Ronciak, John, Williams, Mitch A
Cc: Luck, Tony, netdev@vger.kernel.org, x86@kernel.org,
linux-hotplug@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, intel-wired-lan@lists.osuosl.org
Acked-by: Kiran Patil <kiran.patil@intel.com>
* RE: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
2015-08-19 22:38 ` [Intel-wired-lan] " Patil, Kiran
@ 2015-08-20 0:18 ` David Rientjes
2015-10-08 20:20 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2015-08-20 0:18 UTC (permalink / raw)
To: Patil, Kiran
Cc: Jiang Liu, Andrew Morton, Mel Gorman, Mike Galbraith,
Peter Zijlstra, Wysocki, Rafael J, Tang Chen, Tejun Heo,
Kirsher, Jeffrey T, Brandeburg, Jesse, Nelson, Shannon,
Wyborny, Carolyn, Skidmore, Donald C, Vick, Matthew,
Ronciak, John, Williams, Mitch A, Luck, Tony,
netdev@vger.kernel.org, x86@kernel.org
On Wed, 19 Aug 2015, Patil, Kiran wrote:
> Acked-by: Kiran Patil <kiran.patil@intel.com>
Where's the call to preempt_disable() to prevent kernels with preemption
from making numa_node_id() invalid during this iteration?
* Re: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
2015-08-20 0:18 ` David Rientjes
@ 2015-10-08 20:20 ` Andrew Morton
2015-10-09 5:52 ` Jiang Liu
0 siblings, 1 reply; 12+ messages in thread
From: Andrew Morton @ 2015-10-08 20:20 UTC (permalink / raw)
To: David Rientjes
Cc: Patil, Kiran, Jiang Liu, Mel Gorman, Mike Galbraith,
Peter Zijlstra, Wysocki, Rafael J, Tang Chen, Tejun Heo,
Kirsher, Jeffrey T, Brandeburg, Jesse, Nelson, Shannon,
Wyborny, Carolyn, Skidmore, Donald C, Vick, Matthew,
Ronciak, John, Williams, Mitch A, Luck, Tony,
netdev@vger.kernel.org, x86@kernel.org,
linux-hotplug@vger.kernel.org, "li
On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> On Wed, 19 Aug 2015, Patil, Kiran wrote:
>
> > Acked-by: Kiran Patil <kiran.patil@intel.com>
>
> Where's the call to preempt_disable() to prevent kernels with preemption
> from making numa_node_id() invalid during this iteration?
David asked this question twice, received no answer and now the patch
is in the maintainer tree, destined for mainline.
If I were asked this question I would respond:
  The use of numa_mem_id() is racy and best-effort. If the unlikely
  race occurs, the memory allocation will occur on the wrong node, the
  overall result being very slightly suboptimal performance. The
  existing use of numa_node_id() suffers from the same issue.
But I'm not the person proposing the patch. Please don't just ignore
reviewer comments!
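The best-effort behaviour described above can be sketched in userspace. The model below is a toy built on stated assumptions, not kernel code: `model_task`, `model_numa_mem_id()`, and `model_alloc_pages_node()` are hypothetical stand-ins. It shows that a node ID sampled before a migration still names a valid node, so the later allocation succeeds but is no longer local.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical topology: CPUs 0-1 are nearest to node 0, CPUs 2-3 to node 1. */
static const int model_cpu_mem_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

/* A task is modelled only by the CPU it currently runs on. */
struct model_task {
	int cpu;
};

/* Stand-in for numa_mem_id(): nearest node with memory for the current CPU. */
static int model_numa_mem_id(const struct model_task *task)
{
	assert(task->cpu >= 0 && task->cpu < MODEL_NR_CPUS);
	return model_cpu_mem_node[task->cpu];
}

/* Result of a modelled allocation: which node served it, and was it local. */
struct model_alloc_result {
	int nid;
	bool local;
};

/*
 * Stand-in for alloc_pages_node(): the request is served from the node it
 * names, whether or not the task still runs next to that node.
 */
static struct model_alloc_result
model_alloc_pages_node(const struct model_task *task, int nid)
{
	struct model_alloc_result res = {
		.nid = nid,
		.local = (nid == model_numa_mem_id(task)),
	};
	return res;
}
```

If the task samples the node while on CPU 0 and is then moved to CPU 2 before allocating, the model returns memory from node 0 with `local` set to false: a valid allocation that is merely remote, which is the "very slightly suboptimal" outcome.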
* Re: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
2015-10-08 20:20 ` Andrew Morton
@ 2015-10-09 5:52 ` Jiang Liu
2015-10-09 9:08 ` Kamezawa Hiroyuki
0 siblings, 1 reply; 12+ messages in thread
From: Jiang Liu @ 2015-10-09 5:52 UTC (permalink / raw)
To: Andrew Morton, David Rientjes
Cc: Patil, Kiran, Mel Gorman, Mike Galbraith, Peter Zijlstra,
Wysocki, Rafael J, Tang Chen, Tejun Heo, Kirsher, Jeffrey T,
Brandeburg, Jesse, Nelson, Shannon, Wyborny, Carolyn,
Skidmore, Donald C, Vick, Matthew, Ronciak, John,
Williams, Mitch A, Luck, Tony, netdev@vger.kernel.org,
x86@kernel.org, linux-hotplug@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel
On 2015/10/9 4:20, Andrew Morton wrote:
> On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
>
>> On Wed, 19 Aug 2015, Patil, Kiran wrote:
>>
>>> Acked-by: Kiran Patil <kiran.patil@intel.com>
>>
>> Where's the call to preempt_disable() to prevent kernels with preemption
>> from making numa_node_id() invalid during this iteration?
>
> David asked this question twice, received no answer and now the patch
> is in the maintainer tree, destined for mainline.
>
> If I was asked this question I would respond
>
> The use of numa_mem_id() is racy and best-effort. If the unlikely
> race occurs, the memory allocation will occur on the wrong node, the
> overall result being very slightly suboptimal performance. The
> existing use of numa_node_id() suffers from the same issue.
>
> But I'm not the person proposing the patch. Please don't just ignore
> reviewer comments!
Hi Andrew,
Apologies for the slow response, which was due to personal reasons!
And thanks for answering David's question. To be honest,
I didn't know how to answer this question before. Actually, this
question has puzzled me for a long time when dealing with memory
hot-removal. In the normal case, it only causes sub-optimal memory
allocation if a scheduling event happens between querying the NUMA node id
and calling alloc_pages_node(). But what happens if the system runs into
the following execution sequence?
1) node = numa_mem_id();
2) memory hot-removal event triggers
2.1) remove affected memory
2.2) reset pgdat to zero if the node becomes empty after memory removal
3) alloc_pages_node(), which may access the zeroed pgdat structure.
I haven't found a mechanism that protects the system from the above sequence yet,
so I have been puzzled for a long time already :(. Does stop_machine() protect
the system from such an execution sequence?
Thanks!
Gerry
* Re: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
2015-10-09 5:52 ` Jiang Liu
@ 2015-10-09 9:08 ` Kamezawa Hiroyuki
2015-10-09 9:25 ` Jiang Liu
0 siblings, 1 reply; 12+ messages in thread
From: Kamezawa Hiroyuki @ 2015-10-09 9:08 UTC (permalink / raw)
To: Jiang Liu, Andrew Morton, David Rientjes
Cc: Patil, Kiran, Mel Gorman, Mike Galbraith, Peter Zijlstra,
Wysocki, Rafael J, Tang Chen, Tejun Heo, Kirsher, Jeffrey T,
Brandeburg, Jesse, Nelson, Shannon, Wyborny, Carolyn,
Skidmore, Donald C, Vick, Matthew, Ronciak, John,
Williams, Mitch A, Luck, Tony, netdev@vger.kernel.org,
x86@kernel.org, linux-hotplug@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel
On 2015/10/09 14:52, Jiang Liu wrote:
> On 2015/10/9 4:20, Andrew Morton wrote:
>> On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
>>
>>> On Wed, 19 Aug 2015, Patil, Kiran wrote:
>>>
>>>> Acked-by: Kiran Patil <kiran.patil@intel.com>
>>>
>>> Where's the call to preempt_disable() to prevent kernels with preemption
>>> from making numa_node_id() invalid during this iteration?
>>
>> David asked this question twice, received no answer and now the patch
>> is in the maintainer tree, destined for mainline.
>>
>> If I was asked this question I would respond
>>
>> The use of numa_mem_id() is racy and best-effort. If the unlikely
>> race occurs, the memory allocation will occur on the wrong node, the
>> overall result being very slightly suboptimal performance. The
>> existing use of numa_node_id() suffers from the same issue.
>>
>> But I'm not the person proposing the patch. Please don't just ignore
>> reviewer comments!
> Hi Andrew,
> Apologize for the slow response due to personal reasons!
> And thanks for answering the question from David. To be honest,
> I didn't know how to answer this question before. Actually this
> question has puzzled me for a long time when dealing with memory
> hot-removal. For normal cases, it only causes sub-optimal memory
> allocation if schedule event happens between querying NUMA node id
> and calling alloc_pages_node(). But what happens if system run into
> following execution sequence?
> 1) node = numa_mem_id();
> 2) memory hot-removal event triggers
> 2.1) remove affected memory
> 2.2) reset pgdat to zero if node becomes empty after memory removal
I'm sorry if I misunderstand something.
After commit b0dc3a342af36f95a68fe229b8f0f73552c5ca08, there is no memset().
> 3) alloc_pages_node(), which may access zero-ed pgdat structure.
?
>
> I haven't found a mechanism to protect system from above sequence yet,
> so puzzled for a long time already:(. Does stop_machine() protect
> system from such a execution sequence?
To access a pgdat, one of the pgdat's zones should be on a per-pgdat zonelist.
Now, __build_all_zonelists() is called under stop_machine(). That's the reason
why you're asking what stop_machine() does. And, as you know, stop_machine() is not
protecting anything. The caller may fall back into a removed zone.
Then, let's think.
First, please note that "pgdat" is not removed (and cannot be removed), so
accessing the pgdat's memory will not cause a segmentation fault.
Only the contents are a problem. At removal, the zone's page-related information
and the pgdat's page-related information are cleared.
alloc_pages() uses the zonelist/zoneref/cache to walk each zone without accessing
the pgdat itself. I think accessing the zonelist is safe because it's an array updated
under stop_machine().
So, the question is whether alloc_pages() can work correctly even if a zone contains no pages.
I think it should work.
(Note: zones are included in the pgdat. So, zeroing the pgdat means zeroing the zones and other
structures. That will not work.)
So, what problem do you see now?
I'm sorry, I can't chase old discussions.
Thanks,
-Kame
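The argument above, that the allocator walks a zonelist rather than the pgdat and should cope with a zone that holds no pages, can be sketched as a userspace model. Everything here is hypothetical (`model_zone`, `model_zonelist`, `model_alloc_from_zonelist()`); the real allocator is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zone: only tracks which node owns it and its free pages. */
struct model_zone {
	int nid;
	unsigned long free_pages;
};

#define MODEL_MAX_ZONES 4

/* Hypothetical zonelist: zones in preference order, NULL-terminated. */
struct model_zonelist {
	struct model_zone *zones[MODEL_MAX_ZONES + 1];
};

/*
 * Walk the zonelist in order and take one page from the first zone that
 * still has free pages. A zone emptied by hot-removal stays a valid
 * structure; it is simply skipped. Returns the node that served the
 * request, or -1 if every zone is empty.
 */
static int model_alloc_from_zonelist(struct model_zonelist *zl)
{
	assert(zl != NULL);
	for (size_t i = 0; i < MODEL_MAX_ZONES && zl->zones[i] != NULL; i++) {
		struct model_zone *zone = zl->zones[i];

		if (zone->free_pages == 0)
			continue; /* stale entry for an emptied zone */
		zone->free_pages--;
		return zone->nid;
	}
	return -1;
}
```

In the model, a stale first entry whose zone has been emptied costs one extra loop iteration and the request falls through to the next zone. That only holds because the zone structure itself remains readable, which is why zeroing it would be a different matter.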
* Re: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node
2015-10-09 9:08 ` Kamezawa Hiroyuki
@ 2015-10-09 9:25 ` Jiang Liu
0 siblings, 0 replies; 12+ messages in thread
From: Jiang Liu @ 2015-10-09 9:25 UTC (permalink / raw)
To: Kamezawa Hiroyuki, Andrew Morton, David Rientjes
Cc: Patil, Kiran, Mel Gorman, Mike Galbraith, Peter Zijlstra,
Wysocki, Rafael J, Tang Chen, Tejun Heo, Kirsher, Jeffrey T,
Brandeburg, Jesse, Nelson, Shannon, Wyborny, Carolyn,
Skidmore, Donald C, Vick, Matthew, Ronciak, John,
Williams, Mitch A, Luck, Tony, netdev@vger.kernel.org,
x86@kernel.org, linux-hotplug@vger.kernel.org,
"linux-kernel@vger.kernel.org" <linux-kernel
On 2015/10/9 17:08, Kamezawa Hiroyuki wrote:
> On 2015/10/09 14:52, Jiang Liu wrote:
>> On 2015/10/9 4:20, Andrew Morton wrote:
>>> On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes
>>> <rientjes@google.com> wrote:
>>>
>>>> On Wed, 19 Aug 2015, Patil, Kiran wrote:
>>>>
>>>>> Acked-by: Kiran Patil <kiran.patil@intel.com>
>>>>
>>>> Where's the call to preempt_disable() to prevent kernels with
>>>> preemption
>>>> from making numa_node_id() invalid during this iteration?
>>>
>>> David asked this question twice, received no answer and now the patch
>>> is in the maintainer tree, destined for mainline.
>>>
>>> If I was asked this question I would respond
>>>
>>> The use of numa_mem_id() is racy and best-effort. If the unlikely
>>> race occurs, the memory allocation will occur on the wrong node, the
>>> overall result being very slightly suboptimal performance. The
>>> existing use of numa_node_id() suffers from the same issue.
>>>
>>> But I'm not the person proposing the patch. Please don't just ignore
>>> reviewer comments!
>> Hi Andrew,
>> Apologize for the slow response due to personal reasons!
>> And thanks for answering the question from David. To be honest,
>> I didn't know how to answer this question before. Actually this
>> question has puzzled me for a long time when dealing with memory
>> hot-removal. For normal cases, it only causes sub-optimal memory
>> allocation if schedule event happens between querying NUMA node id
>> and calling alloc_pages_node(). But what happens if system run into
>> following execution sequence?
>> 1) node = numa_mem_id();
>> 2) memory hot-removal event triggers
>> 2.1) remove affected memory
>> 2.2) reset pgdat to zero if node becomes empty after memory removal
>
> I'm sorry if I misunderstand something.
> After commit b0dc3a342af36f95a68fe229b8f0f73552c5ca08, there is no
> memset().
Hi Kamezawa,
Thanks for the information. That commit resolves the issue
I was puzzling over. With this change applied, things should work
as expected. It seems it would be better to enhance __build_all_zonelists()
to handle those offlined empty nodes too, but that really doesn't
make too much difference :)
Thanks for the info again!
Thanks!
Gerry