* [RFC PATCH net-next 0/1] ibmveth: Implement BQL
From: Nick Child @ 2022-10-24 21:38 UTC
To: netdev; +Cc: nick.child, dave.taht, Nick Child

Hello,

Labeled as RFC because I am unsure whether adding Byte Queue Limits
(BQL) is positively affecting the ibmveth driver. BQL is common among
network drivers so I would like to incorporate it into the virtual
ethernet driver, ibmveth. But I am having trouble measuring its
effects.

From my understanding (and please correct me if I am wrong), BQL uses
the number of bytes sent to the NIC to approximate the minimum number
of bytes that must be enqueued to a netdev_queue without starving the
NIC. As a result, bufferbloat in the networking queues is minimized,
which may allow for smaller latencies.

After performing various netperf tests under differing loads and
priorities, I do not see any performance effect when comparing the
driver with and without BQL. The ibmveth driver is a virtual driver
which has an abstracted view of the NIC, so I am comfortable with not
seeing any performance deltas. That being said, I would like to know
whether BQL is actually being enforced in some way. In other words, I
would like to observe a change in the number of queued bytes with the
BQL implementation in place. Does anyone know of a mechanism to
measure the length of a netdev_queue?

I tried creating a BPF script[1] to track the bytes in a netdev_queue
but again am not seeing any difference with and without BQL. I do not
believe anything is wrong with BQL (it is more likely that my tracing
is bad) but I would like to have some evidence of BQL having a
positive effect on the device. Any recommendations or advice would be
greatly appreciated.

Thanks.

[1] https://github.com/nick-child-ibm/bpf_scripts/blob/main/bpftrace_queued_bytes.bt

Nick Child (1):
  ibmveth: Implement BQL

 drivers/net/ethernet/ibm/ibmveth.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.31.1
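For context on what the patch below hooks up: a driver using BQL reports
bytes to the stack twice, once when they are handed to the hardware and
once when the hardware confirms transmission, and the stack derives a
per-queue byte limit from the completion rate. The following is a minimal
sketch of that standard pattern, not ibmveth code; the driver function
names are hypothetical, while netdev_get_tx_queue(), netdev_tx_sent_queue()
and netdev_tx_completed_queue() are the real helpers used in the patch.

/* Minimal sketch of the standard BQL hooks in a driver tx path.
 * my_dev_xmit() and my_dev_tx_complete() are hypothetical names;
 * the netdev_tx_*_queue() helpers are the real API from
 * <linux/netdevice.h>.
 */
static netdev_tx_t my_dev_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct netdev_queue *txq;

	txq = netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

	/* ... map buffers and post the descriptor to the NIC ... */

	/* account these bytes as in flight to the hardware */
	netdev_tx_sent_queue(txq, skb->len);

	return NETDEV_TX_OK;
}

/* normally called from a tx-completion irq or NAPI poll */
static void my_dev_tx_complete(struct net_device *dev, int queue,
			       unsigned int pkts, unsigned int bytes)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, queue);

	/* BQL recomputes the queue's byte limit from the completion rate */
	netdev_tx_completed_queue(txq, pkts, bytes);
}

Because ibmveth gets no completion interrupt from firmware, the patch
below collapses the two calls into the xmit path, reporting completion
as soon as H_SEND_LOGICAL_LAN returns.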
* [RFC PATCH net-next 1/1] ibmveth: Implement BQL
From: Nick Child @ 2022-10-24 21:38 UTC
To: netdev; +Cc: nick.child, dave.taht, Nick Child

Byte Queue Limits allows for dynamic size management of transmit
queues. By informing the higher-level networking layers about in-flight
transmissions, the number of queued bytes will be optimized for better
performance and minimal bufferbloat.

Since the ibmveth driver does not have an irq to receive packet
completion information from firmware, assume the packet has completed
transmission when the hypervisor call H_SEND_LOGICAL_LAN returns.

Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---
 drivers/net/ethernet/ibm/ibmveth.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 3b14dc93f59d..1290848e2e2b 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -578,6 +578,7 @@ static int ibmveth_open(struct net_device *netdev)
 	}
 
 	for (i = 0; i < netdev->real_num_tx_queues; i++) {
+		netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
 		if (ibmveth_allocate_tx_ltb(adapter, i))
 			goto out_free_tx_ltb;
 	}
@@ -1027,8 +1028,10 @@ static int ibmveth_set_channels(struct net_device *netdev,
 			continue;
 
 		rc = ibmveth_allocate_tx_ltb(adapter, i);
-		if (!rc)
+		if (!rc) {
+			netdev_tx_reset_queue(netdev_get_tx_queue(netdev, i));
 			continue;
+		}
 
 		/* if something goes wrong, free everything we just allocated */
 		netdev_err(netdev, "Failed to allocate more tx queues, returning to %d queues\n",
@@ -1125,6 +1128,7 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	union ibmveth_buf_desc desc;
 	int i, queue_num = skb_get_queue_mapping(skb);
 	unsigned long mss = 0;
+	struct netdev_queue *txq = netdev_get_tx_queue(netdev, queue_num);
 
 	if (ibmveth_is_packet_unsupported(skb, netdev))
 		goto out;
@@ -1202,6 +1206,7 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 	/* finish writing to long_term_buff before VIOS accessing it */
 	dma_wmb();
 
+	netdev_tx_sent_queue(txq, skb->len);
 	if (ibmveth_send(adapter, desc.desc, mss)) {
 		adapter->tx_send_failed++;
 		netdev->stats.tx_dropped++;
@@ -1209,6 +1214,7 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,
 		netdev->stats.tx_packets++;
 		netdev->stats.tx_bytes += skb->len;
 	}
+	netdev_tx_completed_queue(txq, 1, skb->len);
 
 out:
 	dev_consume_skb_any(skb);
-- 
2.31.1
* Re: [RFC PATCH net-next 0/1] ibmveth: Implement BQL
From: Jakub Kicinski @ 2022-10-25 18:41 UTC
To: Nick Child; +Cc: netdev, nick.child, dave.taht

On Mon, 24 Oct 2022 16:38:27 -0500 Nick Child wrote:
> Labeled as RFC because I am unsure whether adding Byte Queue Limits
> (BQL) is positively affecting the ibmveth driver. BQL is common among
> network drivers so I would like to incorporate it into the virtual
> ethernet driver, ibmveth. But I am having trouble measuring its
> effects.
>
> From my understanding (and please correct me if I am wrong), BQL uses
> the number of bytes sent to the NIC to approximate the minimum number
> of bytes that must be enqueued to a netdev_queue without starving the
> NIC. As a result, bufferbloat in the networking queues is minimized,
> which may allow for smaller latencies.
>
> After performing various netperf tests under differing loads and
> priorities, I do not see any performance effect when comparing the
> driver with and without BQL. The ibmveth driver is a virtual driver
> which has an abstracted view of the NIC, so I am comfortable with not
> seeing any performance deltas. That being said, I would like to know
> whether BQL is actually being enforced in some way. In other words, I
> would like to observe a change in the number of queued bytes with the
> BQL implementation in place. Does anyone know of a mechanism to
> measure the length of a netdev_queue?
>
> I tried creating a BPF script[1] to track the bytes in a netdev_queue
> but again am not seeing any difference with and without BQL. I do not
> believe anything is wrong with BQL (it is more likely that my tracing
> is bad) but I would like to have some evidence of BQL having a
> positive effect on the device. Any recommendations or advice would be
> greatly appreciated.

What qdisc are you using and what "netperf tests" are you running?
* Re: [RFC PATCH net-next 0/1] ibmveth: Implement BQL
From: Nick Child @ 2022-10-25 20:03 UTC
To: Jakub Kicinski; +Cc: netdev, nick.child, dave.taht

On 10/25/22 13:41, Jakub Kicinski wrote:
> On Mon, 24 Oct 2022 16:38:27 -0500 Nick Child wrote:
>> Does anyone know of a mechanism to measure the length of a
>> netdev_queue?
>>
>> I tried creating a BPF script[1] to track the bytes in a netdev_queue
>> but again am not seeing any difference with and without BQL. I do not
>> believe anything is wrong with BQL (it is more likely that my tracing
>> is bad) but I would like to have some evidence of BQL having a
>> positive effect on the device. Any recommendations or advice would be
>> greatly appreciated.
>
> What qdisc are you using and what "netperf tests" are you running?

The qdisc is the default pfifo_fast. I have tried the netperf tests
described in the patchset which introduced BQL[1]. More specifically,
100 low-priority netperf TCP_STREAMs with 1 high-priority TCP_RR. The
author of that patchset also listed data for the number of queued bytes
but did not explain how he obtained those measurements.

Additionally, I have tried using flent[2] (a wrapper for netperf) to
run performance measurements when the system is under considerable
load. In particular, I tried the flent rrul_prio (Realtime Response
Under Load - Test Prio Queue) and rtt_fair (RTT Fair Realtime Response
Under Load) tests.

Again, a positive effect on performance is not as much of a concern for
me as knowing that BQL is enforcing queue size limits.

Thanks for your help,
Nick

[1] https://lwn.net/Articles/469652/
[2] https://flent.org/
* Re: [RFC PATCH net-next 0/1] ibmveth: Implement BQL
From: Jakub Kicinski @ 2022-10-25 22:10 UTC
To: Nick Child; +Cc: netdev, nick.child, dave.taht

On Tue, 25 Oct 2022 15:03:03 -0500 Nick Child wrote:
> The qdisc is the default pfifo_fast.

You need a more advanced qdisc to see an effect. Try fq.
BQL tries to keep the NIC queue (fifo) as short as possible
to hold packets in the qdisc. But if the qdisc is also just
a fifo there's no practical difference.

I have no practical experience with BQL on virtualized NICs,
though, so I am unsure what gains you should expect to see.
* Re: [RFC PATCH net-next 0/1] ibmveth: Implement BQL
From: Dave Taht @ 2022-10-26 0:08 UTC
To: Jakub Kicinski; +Cc: Nick Child, netdev, nick.child

On Tue, Oct 25, 2022 at 3:10 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 25 Oct 2022 15:03:03 -0500 Nick Child wrote:
> > The qdisc is the default pfifo_fast.
>
> You need a more advanced qdisc to see an effect. Try fq.
> BQL tries to keep the NIC queue (fifo) as short as possible
> to hold packets in the qdisc. But if the qdisc is also just
> a fifo there's no practical difference.
>
> I have no practical experience with BQL on virtualized NICs,
> though, so I am unsure what gains you should expect to see.

fq_codel would be a better choice of underlying qdisc for a test, and
in this environment you'd need to pound the interface flat with
hundreds of flows, preferably in both directions.

My questions are:

If the ring buffers never fill, why do you need to allocate so many
buffers in the first place?

If BQL never engages, what's the bottleneck elsewhere? XMIT_MORE?

Now the only tool for monitoring BQL I know of is bqlmon.

-- 
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC
* Re: [RFC PATCH net-next 0/1] ibmveth: Implement BQL
From: Nick Child @ 2022-10-26 21:10 UTC
To: Dave Taht, Jakub Kicinski; +Cc: netdev, nick.child

On 10/25/22 19:08, Dave Taht wrote:
> On Tue, Oct 25, 2022 at 3:10 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Tue, 25 Oct 2022 15:03:03 -0500 Nick Child wrote:
>>> The qdisc is the default pfifo_fast.
>>
>> You need a more advanced qdisc to see an effect. Try fq.
>> BQL tries to keep the NIC queue (fifo) as short as possible
>> to hold packets in the qdisc. But if the qdisc is also just
>> a fifo there's no practical difference.
>>
>> I have no practical experience with BQL on virtualized NICs,
>> though, so I am unsure what gains you should expect to see.

I understand. I think that is why I am trying to investigate this
further, because the whole virtualization aspect could undermine
everything that BQL is trying to accomplish. That being said, I could
also be shining my flashlight in the wrong places. Hence the reason
for the RFC.

> fq_codel would be a better choice of underlying qdisc for a test, and
> in this environment you'd need to pound the interface flat with
> hundreds of flows, preferably in both directions.

Enabling fq_codel and restarting the tests, I am still not seeing any
noticeable difference in bytes sitting in the netdev_queue (but it is
possible my tracing is incorrect). I also tried reducing the number of
queues, disabling TSO, and even running 100-500 parallel iperf
connections. I can see the throughput and latency taking a hit with
more connections, so I assume the systems are saturated.

> My questions are:
>
> If the ring buffers never fill, why do you need to allocate so many
> buffers in the first place?

The reasoning for 16 tx queues was mostly to allow for more parallel
calls to the device's xmit function. After hearing your points about
resource issues, I will send a patch to reduce this number to 8 queues.

> If BQL never engages, what's the bottleneck elsewhere? XMIT_MORE?

I suppose the question I am trying to pose is: how do we know that BQL
is engaging?

> Now the only tool for monitoring BQL I know of is bqlmon.

bqlmon is useful for tracking the BQL `limit` value assigned to a queue
(IOW `watch /sys/class/net/<device>/queues/tx*/byte_queue_limits/limit`)
but whether or not this value is being applied to an active network
connection is what I would like to figure out.

Thanks again for the feedback and for helping me out with this.

Nick Child
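As one possible handle on the measurement question raised above:
besides bqlmon and the `limit` file, the kernel exposes an `inflight`
attribute in the same byte_queue_limits sysfs directory, which shows
bytes that have been reported as sent but not yet completed. A rough
polling sketch follows (the program name, one-second interval, and
argument handling are arbitrary choices for illustration):

/* Poll BQL state for one tx queue, e.g. ./bqlpoll <iface> <txq index>.
 * The "limit" and "inflight" files live under
 * /sys/class/net/<dev>/queues/tx-<n>/byte_queue_limits/.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static long read_val(const char *path)
{
	char buf[64];
	long val = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		val = atol(buf);
	fclose(f);
	return val;
}

int main(int argc, char **argv)
{
	char limit[256], inflight[256];

	if (argc < 3) {
		fprintf(stderr, "usage: %s <iface> <txq index>\n", argv[0]);
		return 1;
	}
	snprintf(limit, sizeof(limit),
		 "/sys/class/net/%s/queues/tx-%s/byte_queue_limits/limit",
		 argv[1], argv[2]);
	snprintf(inflight, sizeof(inflight),
		 "/sys/class/net/%s/queues/tx-%s/byte_queue_limits/inflight",
		 argv[1], argv[2]);

	for (;;) {
		printf("limit=%ld inflight=%ld\n",
		       read_val(limit), read_val(inflight));
		sleep(1);
	}
	return 0;
}

Note that with the patch as posted, netdev_tx_sent_queue() and
netdev_tx_completed_queue() are called back-to-back in the xmit path,
so inflight may read as zero almost all of the time; a persistently
nonzero inflight under load would be one sign that BQL has outstanding
bytes to limit against.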