[PATCH net-next] sfc: allocate rx pages on the same node as the interrupt

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net-next] sfc: allocate rx pages on the same node as the interrupt
@ 2016-05-10 14:21 Edward Cree
  2016-05-10 14:55 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Edward Cree @ 2016-05-10 14:21 UTC (permalink / raw)
  To: linux-net-drivers, David Miller, netdev; +Cc: Daniel Pieczko

From: Daniel Pieczko <dpieczko@solarflare.com>

When the interrupt servicing a channel is on a NUMA node that is
not local to the device, performance is improved by allocating
rx pages on the node local to the interrupt (remote to the device)

The performance-optimal case, where interrupts and applications
are pinned to CPUs on the same node as the device, is not altered
by this change.

This change gave a 1% improvement in transaction rate using Nginx
with all interrupts and Nginx threads on the node remote to the
device. It also gave a small reduction in round-trip latency,
again with the interrupt and application on a different node to
the device.

Allocating rx pages based on the channel->irq_node value is only
valid for the initial driver-load interrupt affinities; if an
interrupt is moved later, the wrong node may be used for the
allocation.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        | 38 +++++++++++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/net_driver.h |  3 +++
 drivers/net/ethernet/sfc/rx.c         | 18 ++++++++++++++---
 3 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 0705ec86..cb1552c 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -445,6 +445,7 @@ efx_alloc_channel(struct efx_nic *efx, int i, struct efx_channel *old_channel)
 	channel->efx = efx;
 	channel->channel = i;
 	channel->type = &efx_default_channel_type;
+	channel->irq_mem_node = NUMA_NO_NODE;
 
 	for (j = 0; j < EFX_TXQ_TYPES; j++) {
 		tx_queue = &channel->tx_queue[j];
@@ -1486,6 +1487,38 @@ static int efx_probe_interrupts(struct efx_nic *efx)
 	return 0;
 }
 
+#ifndef CONFIG_SMP
+static void efx_set_interrupt_affinity(struct efx_nic *efx __always_unused)
+{
+}
+
+static void efx_clear_interrupt_affinity(struct efx_nic *efx __always_unused)
+{
+}
+#else
+static void efx_set_interrupt_affinity(struct efx_nic *efx)
+{
+	struct efx_channel *channel;
+	unsigned int cpu;
+
+	efx_for_each_channel(channel, efx) {
+		cpu = cpumask_local_spread(channel->channel,
+					   pcibus_to_node(efx->pci_dev->bus));
+
+		irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
+		channel->irq_mem_node = cpu_to_mem(cpu);
+	}
+}
+
+static void efx_clear_interrupt_affinity(struct efx_nic *efx)
+{
+	struct efx_channel *channel;
+
+	efx_for_each_channel(channel, efx)
+		irq_set_affinity_hint(channel->irq, NULL);
+}
+#endif /* CONFIG_SMP */
+
 static int efx_soft_enable_interrupts(struct efx_nic *efx)
 {
 	struct efx_channel *channel, *end_channel;
@@ -2934,6 +2967,7 @@ static void efx_pci_remove_main(struct efx_nic *efx)
 	cancel_work_sync(&efx->reset_work);
 
 	efx_disable_interrupts(efx);
+	efx_clear_interrupt_affinity(efx);
 	efx_nic_fini_interrupt(efx);
 	efx_fini_port(efx);
 	efx->type->fini(efx);
@@ -3083,6 +3117,9 @@ static int efx_pci_probe_main(struct efx_nic *efx)
 	rc = efx_nic_init_interrupt(efx);
 	if (rc)
 		goto fail5;
+
+	efx_set_interrupt_affinity(efx);
+
 	rc = efx_enable_interrupts(efx);
 	if (rc)
 		goto fail6;
@@ -3090,6 +3127,7 @@ static int efx_pci_probe_main(struct efx_nic *efx)
 	return 0;
 
  fail6:
+	efx_clear_interrupt_affinity(efx);
 	efx_nic_fini_interrupt(efx);
  fail5:
 	efx_fini_port(efx);
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 38c4223..28c3673 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -423,6 +423,7 @@ enum efx_sync_events_state {
  * @sync_events_state: Current state of sync events on this channel
  * @sync_timestamp_major: Major part of the last ptp sync event
  * @sync_timestamp_minor: Minor part of the last ptp sync event
+ * @irq_mem_node: Memory NUMA node of interrupt
  */
 struct efx_channel {
 	struct efx_nic *efx;
@@ -468,6 +469,8 @@ struct efx_channel {
 	enum efx_sync_events_state sync_events_state;
 	u32 sync_timestamp_major;
 	u32 sync_timestamp_minor;
+
+	int irq_mem_node;
 };
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 8956995..eed0c3b 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -163,9 +163,21 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 	do {
 		page = efx_reuse_page(rx_queue);
 		if (page == NULL) {
-			page = alloc_pages(__GFP_COLD | __GFP_COMP |
-					   (atomic ? GFP_ATOMIC : GFP_KERNEL),
-					   efx->rx_buffer_order);
+			/* GFP_ATOMIC may fail because of various reasons,
+			 * and we re-schedule rx_fill from non-atomic
+			 * context in such a case.  So, use __GFP_NO_WARN
+			 * in case of atomic.
+			 */
+			struct efx_channel *channel;
+
+			channel = efx_rx_queue_channel(rx_queue);
+			page = alloc_pages_node(channel->irq_mem_node,
+						__GFP_COMP |
+						(atomic ?
+						 (GFP_ATOMIC | __GFP_NOWARN)
+						 : GFP_KERNEL),
+						efx->rx_buffer_order);
+
 			if (unlikely(page == NULL))
 				return -ENOMEM;
 			dma_addr =

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] sfc: allocate rx pages on the same node as the interrupt
  2016-05-10 14:21 [PATCH net-next] sfc: allocate rx pages on the same node as the interrupt Edward Cree
@ 2016-05-10 14:55 ` Eric Dumazet
  2016-05-10 14:57   ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2016-05-10 14:55 UTC (permalink / raw)
  To: Edward Cree; +Cc: linux-net-drivers, David Miller, netdev, Daniel Pieczko

On Tue, 2016-05-10 at 15:21 +0100, Edward Cree wrote:
> From: Daniel Pieczko <dpieczko@solarflare.com>
> 
> When the interrupt servicing a channel is on a NUMA node that is
> not local to the device, performance is improved by allocating
> rx pages on the node local to the interrupt (remote to the device)
> 
> The performance-optimal case, where interrupts and applications
> are pinned to CPUs on the same node as the device, is not altered
> by this change.
> 
> This change gave a 1% improvement in transaction rate using Nginx
> with all interrupts and Nginx threads on the node remote to the
> device. It also gave a small reduction in round-trip latency,
> again with the interrupt and application on a different node to
> the device.

Yes, I advocated for such changes in the past for mlx4 NIC.

But your patch makes no sense to me.

alloc_pages() by default would run on the cpu servicing the IRQ, so
would automatically provide pages on the local node.

If you care only of the initial pages allocated with GFP_KERNEL at
device start, really that is a small detail as they should be consumed
and replaced quite fast.

If you worry that "wrong" pages would be reused over and over,
you could make sure that efx_reuse_page() wont reuse a page on the wrong
node.

page_to_nid(page) != numa_mem_id() 

Note that this could happen even if IRQ are not changed since
alloc_pages() could in stress situations give you a page from a remote
node.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] sfc: allocate rx pages on the same node as the interrupt
  2016-05-10 14:55 ` Eric Dumazet
@ 2016-05-10 14:57   ` Eric Dumazet
  2016-05-10 14:59     ` Eric Dumazet
  2016-05-12 10:35     ` Daniel Pieczko (dpieczko)
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2016-05-10 14:57 UTC (permalink / raw)
  To: Edward Cree; +Cc: linux-net-drivers, David Miller, netdev, Daniel Pieczko

On Tue, 2016-05-10 at 07:55 -0700, Eric Dumazet wrote:
> On Tue, 2016-05-10 at 15:21 +0100, Edward Cree wrote:
> > From: Daniel Pieczko <dpieczko@solarflare.com>
> > 
> > When the interrupt servicing a channel is on a NUMA node that is
> > not local to the device, performance is improved by allocating
> > rx pages on the node local to the interrupt (remote to the device)
> > 
> > The performance-optimal case, where interrupts and applications
> > are pinned to CPUs on the same node as the device, is not altered
> > by this change.
> > 
> > This change gave a 1% improvement in transaction rate using Nginx
> > with all interrupts and Nginx threads on the node remote to the
> > device. It also gave a small reduction in round-trip latency,
> > again with the interrupt and application on a different node to
> > the device.
> 
> Yes, I advocated for such changes in the past for mlx4 NIC.
> 
> But your patch makes no sense to me.
> 
> alloc_pages() by default would run on the cpu servicing the IRQ, so
> would automatically provide pages on the local node.
> 
> If you care only of the initial pages allocated with GFP_KERNEL at
> device start, really that is a small detail as they should be consumed
> and replaced quite fast.
> 
> If you worry that "wrong" pages would be reused over and over,
> you could make sure that efx_reuse_page() wont reuse a page on the wrong
> node.
> 
> page_to_nid(page) != numa_mem_id() 
> 
> Note that this could happen even if IRQ are not changed since
> alloc_pages() could in stress situations give you a page from a remote
> node.

Something like this (untested) ?

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 8956995b2fe713b377f205a55fada36c8a04253a..90665a8992e29cceb29dd465b52f934dff3b3d4e 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -124,7 +124,7 @@ static struct page *efx_reuse_page(struct efx_rx_queue *rx_queue)
 		++rx_queue->page_remove;
 
 	/* If page_count is 1 then we hold the only reference to this page. */
-	if (page_count(page) == 1) {
+	if (page_count(page) == 1 && page_to_nid(page) == numa_mem_id()) {
 		++rx_queue->page_recycle_count;
 		return page;
 	} else {

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] sfc: allocate rx pages on the same node as the interrupt
  2016-05-10 14:57   ` Eric Dumazet
@ 2016-05-10 14:59     ` Eric Dumazet
  2016-05-12 10:35     ` Daniel Pieczko (dpieczko)
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2016-05-10 14:59 UTC (permalink / raw)
  To: Edward Cree; +Cc: linux-net-drivers, David Miller, netdev, Daniel Pieczko

On Tue, 2016-05-10 at 07:57 -0700, Eric Dumazet wrote:
> On Tue, 2016-05-10 at 07:55 -0700, Eric Dumazet wrote:
> > On Tue, 2016-05-10 at 15:21 +0100, Edward Cree wrote:
> > > From: Daniel Pieczko <dpieczko@solarflare.com>
> > > 
> > > When the interrupt servicing a channel is on a NUMA node that is
> > > not local to the device, performance is improved by allocating
> > > rx pages on the node local to the interrupt (remote to the device)
> > > 
> > > The performance-optimal case, where interrupts and applications
> > > are pinned to CPUs on the same node as the device, is not altered
> > > by this change.
> > > 
> > > This change gave a 1% improvement in transaction rate using Nginx
> > > with all interrupts and Nginx threads on the node remote to the
> > > device. It also gave a small reduction in round-trip latency,
> > > again with the interrupt and application on a different node to
> > > the device.
> > 
> > Yes, I advocated for such changes in the past for mlx4 NIC.
> > 
> > But your patch makes no sense to me.
> > 
> > alloc_pages() by default would run on the cpu servicing the IRQ, so
> > would automatically provide pages on the local node.
> > 
> > If you care only of the initial pages allocated with GFP_KERNEL at
> > device start, really that is a small detail as they should be consumed
> > and replaced quite fast.
> > 
> > If you worry that "wrong" pages would be reused over and over,
> > you could make sure that efx_reuse_page() wont reuse a page on the wrong
> > node.
> > 
> > page_to_nid(page) != numa_mem_id() 
> > 
> > Note that this could happen even if IRQ are not changed since
> > alloc_pages() could in stress situations give you a page from a remote
> > node.
> 
> Something like this (untested) ?
> 
> diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
> index 8956995b2fe713b377f205a55fada36c8a04253a..90665a8992e29cceb29dd465b52f934dff3b3d4e 100644
> --- a/drivers/net/ethernet/sfc/rx.c
> +++ b/drivers/net/ethernet/sfc/rx.c
> @@ -124,7 +124,7 @@ static struct page *efx_reuse_page(struct efx_rx_queue *rx_queue)
>  		++rx_queue->page_remove;
>  
>  	/* If page_count is 1 then we hold the only reference to this page. */
> -	if (page_count(page) == 1) {
> +	if (page_count(page) == 1 && page_to_nid(page) == numa_mem_id()) {
>  		++rx_queue->page_recycle_count;
>  		return page;
>  	} else {
> 

You could also factorize code from Intel drivers in some common helper

static inline bool igb_page_is_reserved(struct page *page)
{
        return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next] sfc: allocate rx pages on the same node as the interrupt
  2016-05-10 14:57   ` Eric Dumazet
  2016-05-10 14:59     ` Eric Dumazet
@ 2016-05-12 10:35     ` Daniel Pieczko (dpieczko)
  1 sibling, 0 replies; 5+ messages in thread
From: Daniel Pieczko (dpieczko) @ 2016-05-12 10:35 UTC (permalink / raw)
  To: Eric Dumazet, Edward Cree; +Cc: linux-net-drivers, David Miller, netdev

On 10/05/16 15:57, Eric Dumazet wrote:
> On Tue, 2016-05-10 at 07:55 -0700, Eric Dumazet wrote:
>> If you care only of the initial pages allocated with GFP_KERNEL at
>> device start, really that is a small detail as they should be consumed
>> and replaced quite fast.

This patch was addressing just that situation.

>> If you worry that "wrong" pages would be reused over and over,
>> you could make sure that efx_reuse_page() wont reuse a page on the wrong
>> node.
>>
>> page_to_nid(page) != numa_mem_id()

Yes, that would solve the general problem and your patch looks correct
for the code as it is right now.

However, we have some pending changes to our rx recycle ring that
haven't been submitted upstream yet which will alter the way that pages
are reused, so the solution may no longer be so simple.  I think we
avoided making the simple change of checking page_to_nid() right now
because of these future changes to rx page recycling.

So I think that the original patch in this submission is valuable in
addressing the initial startup case - that won't be changed by the other
rx page recycling changes.

I have no preference on whether we take Eric's follow-up patch to check
page_to_nid() right now or wait until the other rx changes are submitted
and then fix it up as required.

Daniel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-05-12 10:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-05-10 14:21 [PATCH net-next] sfc: allocate rx pages on the same node as the interrupt Edward Cree
2016-05-10 14:55 ` Eric Dumazet
2016-05-10 14:57   ` Eric Dumazet
2016-05-10 14:59     ` Eric Dumazet
2016-05-12 10:35     ` Daniel Pieczko (dpieczko)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.