* RE: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-08 22:30 ` Horatiu Vultur
@ 2022-03-08 22:46 ` David Laight
2022-03-09 9:11 ` 'Horatiu Vultur'
2022-03-09 0:40 ` Jakub Kicinski
2022-03-09 13:11 ` Andrew Lunn
2 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2022-03-08 22:46 UTC (permalink / raw)
To: 'Horatiu Vultur', Andrew Lunn
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
UNGLinuxDriver@microchip.com, davem@davemloft.net,
kuba@kernel.org
From: Horatiu Vultur
> Sent: 08 March 2022 22:30
>
> The 03/08/2022 22:36, Andrew Lunn wrote:
> >
> > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > {
> > > - u32 val;
> > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > + int ret = 0;
> > >
> > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > + if (time_after(jiffies, time)) {
> > > + ret = -ETIMEDOUT;
> > > + break;
> > > + }
> >
> > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > explicitly supports that.
>
> I have tried but it didn't improve. It was the same as before.
How many times round the loop is it going ?
It might be that by the time readx_poll_timeout_atomic()
gets around to reading the register the fifo is actually ready.
The there is the delay between detecting 'ready' and writing
the next data.
That delay might be cumulative and affect performance.
OTOH spinning waiting for fifo space is just plain horrid.
Reminds me of 3C509 drivers :-)
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-08 22:46 ` David Laight
@ 2022-03-09 9:11 ` 'Horatiu Vultur'
2022-03-09 9:57 ` David Laight
0 siblings, 1 reply; 12+ messages in thread
From: 'Horatiu Vultur' @ 2022-03-09 9:11 UTC (permalink / raw)
To: David Laight
Cc: Andrew Lunn, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
UNGLinuxDriver@microchip.com, davem@davemloft.net,
kuba@kernel.org
The 03/08/2022 22:46, David Laight wrote:
>
> From: Horatiu Vultur
> > Sent: 08 March 2022 22:30
> >
> > The 03/08/2022 22:36, Andrew Lunn wrote:
> > >
> > > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > {
> > > > - u32 val;
> > > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > + int ret = 0;
> > > >
> > > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > + if (time_after(jiffies, time)) {
> > > > + ret = -ETIMEDOUT;
> > > > + break;
> > > > + }
> > >
> > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > explicitly supports that.
> >
> > I have tried but it didn't improve. It was the same as before.
>
> How many times round the loop is it going ?
In the tests that I have done, I have never seen entering in the loop.
>
> It might be that by the time readx_poll_timeout_atomic()
> gets around to reading the register the fifo is actually ready.
>
> The there is the delay between detecting 'ready' and writing
> the next data.
> That delay might be cumulative and affect performance.
There is also a small delay between writting the word and checking if
more words can be written.
>
> OTOH spinning waiting for fifo space is just plain horrid.
Yes, I agree with you.
> Reminds me of 3C509 drivers :-)
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
--
/Horatiu
^ permalink raw reply [flat|nested] 12+ messages in thread* RE: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-09 9:11 ` 'Horatiu Vultur'
@ 2022-03-09 9:57 ` David Laight
2022-03-09 12:16 ` 'Horatiu Vultur'
0 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2022-03-09 9:57 UTC (permalink / raw)
To: 'Horatiu Vultur'
Cc: Andrew Lunn, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
UNGLinuxDriver@microchip.com, davem@davemloft.net,
kuba@kernel.org
From: 'Horatiu Vultur'
> Sent: 09 March 2022 09:11
>
> The 03/08/2022 22:46, David Laight wrote:
> >
> > From: Horatiu Vultur
> > > Sent: 08 March 2022 22:30
> > >
> > > The 03/08/2022 22:36, Andrew Lunn wrote:
> > > >
> > > > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > > {
> > > > > - u32 val;
> > > > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > > + int ret = 0;
> > > > >
> > > > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > > + if (time_after(jiffies, time)) {
> > > > > + ret = -ETIMEDOUT;
> > > > > + break;
> > > > > + }
> > > >
> > > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > > explicitly supports that.
> > >
> > > I have tried but it didn't improve. It was the same as before.
> >
> > How many times round the loop is it going ?
>
> In the tests that I have done, I have never seen entering in the loop.
In which case I'd do an initial status check before even
faffing with 'jiffies'.
It might even be that the status read is so slow that space
is always available by the time it is processed.
PCIe reads can be horribly slow.
Into our fgpa they end up being slower than old ISA bus cycles.
Probably several thousand cpu clocks.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-09 9:57 ` David Laight
@ 2022-03-09 12:16 ` 'Horatiu Vultur'
0 siblings, 0 replies; 12+ messages in thread
From: 'Horatiu Vultur' @ 2022-03-09 12:16 UTC (permalink / raw)
To: David Laight
Cc: Andrew Lunn, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
UNGLinuxDriver@microchip.com, davem@davemloft.net,
kuba@kernel.org
The 03/09/2022 09:57, David Laight wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> From: 'Horatiu Vultur'
> > Sent: 09 March 2022 09:11
> >
> > The 03/08/2022 22:46, David Laight wrote:
> > >
> > > From: Horatiu Vultur
> > > > Sent: 08 March 2022 22:30
> > > >
> > > > The 03/08/2022 22:36, Andrew Lunn wrote:
> > > > >
> > > > > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > > > {
> > > > > > - u32 val;
> > > > > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > > > + int ret = 0;
> > > > > >
> > > > > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > > > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > > > + if (time_after(jiffies, time)) {
> > > > > > + ret = -ETIMEDOUT;
> > > > > > + break;
> > > > > > + }
> > > > >
> > > > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > > > explicitly supports that.
> > > >
> > > > I have tried but it didn't improve. It was the same as before.
> > >
> > > How many times round the loop is it going ?
> >
> > In the tests that I have done, I have never seen entering in the loop.
>
> In which case I'd do an initial status check before even
> faffing with 'jiffies'.
That is a good idea. I will do this change in the next version.
>
> It might even be that the status read is so slow that space
> is always available by the time it is processed.
> PCIe reads can be horribly slow.
> Into our fgpa they end up being slower than old ISA bus cycles.
> Probably several thousand cpu clocks.
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
--
/Horatiu
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-08 22:30 ` Horatiu Vultur
2022-03-08 22:46 ` David Laight
@ 2022-03-09 0:40 ` Jakub Kicinski
2022-03-09 9:14 ` Horatiu Vultur
2022-03-09 13:11 ` Andrew Lunn
2 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2022-03-09 0:40 UTC (permalink / raw)
To: Horatiu Vultur; +Cc: Andrew Lunn, netdev, linux-kernel, UNGLinuxDriver, davem
On Tue, 8 Mar 2022 23:30:00 +0100 Horatiu Vultur wrote:
> > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > {
> > > - u32 val;
> > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > + int ret = 0;
> > >
> > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > + if (time_after(jiffies, time)) {
> > > + ret = -ETIMEDOUT;
> > > + break;
> > > + }
> >
> > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > explicitly supports that.
>
> I have tried but it didn't improve. It was the same as before.
Huh, is ktime_get() super expensive on that platform?
jiffies vs ktime seems to be the main difference?
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-09 0:40 ` Jakub Kicinski
@ 2022-03-09 9:14 ` Horatiu Vultur
0 siblings, 0 replies; 12+ messages in thread
From: Horatiu Vultur @ 2022-03-09 9:14 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Andrew Lunn, netdev, linux-kernel, UNGLinuxDriver, davem
The 03/08/2022 16:40, Jakub Kicinski wrote:
>
> On Tue, 8 Mar 2022 23:30:00 +0100 Horatiu Vultur wrote:
> > > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > {
> > > > - u32 val;
> > > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > + int ret = 0;
> > > >
> > > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > + if (time_after(jiffies, time)) {
> > > > + ret = -ETIMEDOUT;
> > > > + break;
> > > > + }
> > >
> > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > explicitly supports that.
> >
> > I have tried but it didn't improve. It was the same as before.
>
> Huh, is ktime_get() super expensive on that platform?
Hm.. it looks like. Just adding ktime_get() before the while loop, then
the performance drops like before.
I am using SOC_LAN966 which has an ARMv7 CPU. So I am not sure how
expensive is ktime_get().
> jiffies vs ktime seems to be the main difference?
--
/Horatiu
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-08 22:30 ` Horatiu Vultur
2022-03-08 22:46 ` David Laight
2022-03-09 0:40 ` Jakub Kicinski
@ 2022-03-09 13:11 ` Andrew Lunn
2022-03-09 22:05 ` Horatiu Vultur
2 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2022-03-09 13:11 UTC (permalink / raw)
To: Horatiu Vultur; +Cc: netdev, linux-kernel, UNGLinuxDriver, davem, kuba
On Tue, Mar 08, 2022 at 11:30:00PM +0100, Horatiu Vultur wrote:
> The 03/08/2022 22:36, Andrew Lunn wrote:
> >
> > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > {
> > > - u32 val;
> > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > + int ret = 0;
> > >
> > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > + if (time_after(jiffies, time)) {
> > > + ret = -ETIMEDOUT;
> > > + break;
> > > + }
> >
> > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > explicitly supports that.
>
> I have tried but it didn't improve. It was the same as before.
The reason i recommend ipoll.h is that your implementation has the
usual bug, which iopoll does not have. Since you are using _atomic()
it is less of an issue, but it still exists.
while (!(lan_rd(lan966x, QS_INJ_STATUS) &
QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
Say you take an interrupt here
if (time_after(jiffies, time)) {
ret = -ETIMEDOUT;
break;
}
The interrupt takes a while, so that by the time you get to
time_after(), you have reached your timeout. So -ETIMEDOUT is
returned. But in fact, the hardware has done its thing, and if you
where to read the status the bit would be set, and you should actually
return O.K, not an error.
iopoll does another check of the status before deciding to return
-ETIMEDOUT or O.K.
If you decide to simply check the status directly after the write, i
suggest you then use readx_poll_timeout_atomic() if you do need to
poll.
Andrew
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-09 13:11 ` Andrew Lunn
@ 2022-03-09 22:05 ` Horatiu Vultur
2022-03-10 0:37 ` Andrew Lunn
0 siblings, 1 reply; 12+ messages in thread
From: Horatiu Vultur @ 2022-03-09 22:05 UTC (permalink / raw)
To: Andrew Lunn; +Cc: netdev, linux-kernel, UNGLinuxDriver, davem, kuba
The 03/09/2022 14:11, Andrew Lunn wrote:
>
> On Tue, Mar 08, 2022 at 11:30:00PM +0100, Horatiu Vultur wrote:
> > The 03/08/2022 22:36, Andrew Lunn wrote:
> > >
> > > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > {
> > > > - u32 val;
> > > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > + int ret = 0;
> > > >
> > > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > + if (time_after(jiffies, time)) {
> > > > + ret = -ETIMEDOUT;
> > > > + break;
> > > > + }
> > >
> > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > explicitly supports that.
> >
> > I have tried but it didn't improve. It was the same as before.
>
> The reason i recommend ipoll.h is that your implementation has the
> usual bug, which iopoll does not have. Since you are using _atomic()
> it is less of an issue, but it still exists.
>
> while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
>
> Say you take an interrupt here
>
> if (time_after(jiffies, time)) {
> ret = -ETIMEDOUT;
> break;
> }
>
>
> The interrupt takes a while, so that by the time you get to
> time_after(), you have reached your timeout. So -ETIMEDOUT is
> returned. But in fact, the hardware has done its thing, and if you
> where to read the status the bit would be set, and you should actually
> return O.K, not an error.
That is a good catch and really nice explanation!
Then if I add also the check at the end, then it should be also OK.
>
> iopoll does another check of the status before deciding to return
> -ETIMEDOUT or O.K.
>
> If you decide to simply check the status directly after the write, i
> suggest you then use readx_poll_timeout_atomic() if you do need to
> poll.
>
> Andrew
--
/Horatiu
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH net-next] net: lan966x: Improve the CPU TX bitrate.
2022-03-09 22:05 ` Horatiu Vultur
@ 2022-03-10 0:37 ` Andrew Lunn
0 siblings, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2022-03-10 0:37 UTC (permalink / raw)
To: Horatiu Vultur; +Cc: netdev, linux-kernel, UNGLinuxDriver, davem, kuba
On Wed, Mar 09, 2022 at 11:05:16PM +0100, Horatiu Vultur wrote:
> The 03/09/2022 14:11, Andrew Lunn wrote:
> >
> > On Tue, Mar 08, 2022 at 11:30:00PM +0100, Horatiu Vultur wrote:
> > > The 03/08/2022 22:36, Andrew Lunn wrote:
> > > >
> > > > > static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > > {
> > > > > - u32 val;
> > > > > + unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > > + int ret = 0;
> > > > >
> > > > > - return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > > - QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > > - READL_SLEEP_US, READL_TIMEOUT_US);
> > > > > + while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > > + QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > > + if (time_after(jiffies, time)) {
> > > > > + ret = -ETIMEDOUT;
> > > > > + break;
> > > > > + }
> > > >
> > > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > > explicitly supports that.
> > >
> > > I have tried but it didn't improve. It was the same as before.
> >
> > The reason i recommend ipoll.h is that your implementation has the
> > usual bug, which iopoll does not have. Since you are using _atomic()
> > it is less of an issue, but it still exists.
> >
> > while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> >
> > Say you take an interrupt here
> >
> > if (time_after(jiffies, time)) {
> > ret = -ETIMEDOUT;
> > break;
> > }
> >
> >
> > The interrupt takes a while, so that by the time you get to
> > time_after(), you have reached your timeout. So -ETIMEDOUT is
> > returned. But in fact, the hardware has done its thing, and if you
> > where to read the status the bit would be set, and you should actually
> > return O.K, not an error.
>
> That is a good catch and really nice explanation!
> Then if I add also the check at the end, then it should be also OK.
You are then just repeating code which is already in the core. That is
generally not liked. If you find reading the status once works 99% of
the time, then i suggest you call readx_poll_timeout_atomic() when the
status does indicate you need to poll.
Andrew
^ permalink raw reply [flat|nested] 12+ messages in thread