netdev.vger.kernel.org archive mirror
From: Jakub Kicinski <kuba@kernel.org>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>,
	davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com, Herbert Xu <herbert@gondor.apana.org.au>,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code
Date: Mon, 3 Apr 2023 08:56:01 -0700
Message-ID: <20230403085601.44f04cd2@kernel.org>
In-Reply-To: <CAKgT0UeDy6B0QJt126tykUfu+cB2VK0YOoMOYcL1JQFmxtgG0A@mail.gmail.com>

On Mon, 3 Apr 2023 08:18:04 -0700 Alexander Duyck wrote:
> On Sat, Apr 1, 2023 at 11:58 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > One more question: Don't we need a read memory barrier here to ensure
> > > get_desc is up-to-date?  
> >
> > CC: Alex, maybe I should not be posting after 10pm, with the missing v2
> > and sparse CC list.. :|
> >
> > I was thinking about this too yesterday. AFAICT this implementation
> > could indeed result in waking even tho the queue is full on non-x86.
> > That's why the drivers have an extra check at the start of .xmit? :(  
> 
> The extra check at the start is more historical than anything else.
> Logic like that has been there since the e1000 days. I think it
> addressed items like pktgen which I think didn't make use of the
> stop/wake flags way back when. I'll add in Herbert who was the original
> author for this code so he can add some additional history if needed.

Thanks for the pointer, you weren't kidding with the 2.6.19, that seems
to be when the code was added to e1000 :) Looks fairly similar to the
current code minus the BQL.

> > I *think* that the right ordering would be:
> >
> > c1. WRITE cons
> > c2. mb()  # A
> > c3. READ stopped
> > c4. rmb() # C
> > c5. READ prod, cons  
> 
> What would the extra rmb() get you? The mb() will have already flushed
> out any writes and if stopped is set the tail should have already been
> written before setting it.

I don't think in terms of flushes. Let me add line numbers to the
producer and the consumer.

 c1. WRITE cons
 c2. mb()  # A
 c3. READ stopped
 c4. rmb() # C
 c5. READ prod, cons  

 p1. WRITE prod
 p2. READ prod, cons
 p3. mb()  # B
 p4. WRITE stopped
 p5. READ prod, cons

The way I think about it, the mb() orders c1 and c3 vs p2 and p4. The
rmb() orders c3 and c5 vs p1 and p4. Let me impenitently add Paul..
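
To make the two listings easier to compare, here is a minimal userspace
sketch that transcribes steps p1-p5 and c1-c5 into C11 atomics. It is
not the kernel code and not the macros from this series. The names
(struct ring, ring_produce, ring_complete) and the thresholds are
hypothetical, mb() is approximated with a seq_cst fence and rmb() with
an acquire fence, and the wake condition in c5 is an assumption. It only
shows where each barrier sits; it makes no claim that the ordering is
sufficient, which is the open question here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u	/* hypothetical ring size */
#define STOP_THRS 2u	/* hypothetical: stop when fewer slots are free */
#define WAKE_THRS 4u	/* hypothetical: wake when at least this many are free */

struct ring {
	atomic_uint prod;	/* written only by the producer (xmit) */
	atomic_uint cons;	/* written only by the consumer (completion) */
	atomic_bool stopped;
};

static void ring_init(struct ring *r)
{
	atomic_init(&r->prod, 0u);
	atomic_init(&r->cons, 0u);
	atomic_init(&r->stopped, false);
}

/* "READ prod, cons": number of free slots as seen by the caller. */
static unsigned int ring_free(struct ring *r)
{
	unsigned int prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	unsigned int cons = atomic_load_explicit(&r->cons, memory_order_relaxed);

	return RING_SIZE - (prod - cons);
}

/* Producer side, steps p1-p5. Returns true if the queue is left stopped. */
static bool ring_produce(struct ring *r, unsigned int n)
{
	unsigned int prod;

	assert(n <= ring_free(r));	/* caller must not overfill the ring */

	/* p1. WRITE prod */
	prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	atomic_store_explicit(&r->prod, prod + n, memory_order_relaxed);
	/* p2. READ prod, cons */
	if (ring_free(r) >= STOP_THRS)
		return false;
	/* p3. mb()  # B */
	atomic_thread_fence(memory_order_seq_cst);
	/* p4. WRITE stopped */
	atomic_store_explicit(&r->stopped, true, memory_order_relaxed);
	/* p5. READ prod, cons: undo the stop if space was freed meanwhile */
	if (ring_free(r) >= WAKE_THRS) {
		atomic_store_explicit(&r->stopped, false, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Consumer side, steps c1-c5. Returns true if it woke the queue. */
static bool ring_complete(struct ring *r, unsigned int n)
{
	unsigned int cons;

	/* c1. WRITE cons */
	cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
	atomic_store_explicit(&r->cons, cons + n, memory_order_relaxed);
	/* c2. mb()  # A */
	atomic_thread_fence(memory_order_seq_cst);
	/* c3. READ stopped */
	if (!atomic_load_explicit(&r->stopped, memory_order_relaxed))
		return false;
	/* c4. rmb() # C */
	atomic_thread_fence(memory_order_acquire);
	/* c5. READ prod, cons: wake only with enough room (assumed policy) */
	if (ring_free(r) < WAKE_THRS)
		return false;
	atomic_store_explicit(&r->stopped, false, memory_order_relaxed);
	return true;
}
```

Run single-threaded, this only exercises the stop/wake thresholds. The
barriers matter once ring_produce() and ring_complete() run on
different CPUs.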

> One other thing to keep in mind is that the wake gives itself a pretty
> good runway. We are talking about enough to transmit at least 2
> frames. So if another consumer is stopping it we aren't waking it
> unless there is enough space for yet another frame after the current
> consumer.

Ack, the race is very unlikely, basically the completing CPU would have
to take an expensive IRQ between checking the descriptor count and
checking if stopped -- to let the sending CPU queue multiple frames.

But in theory the race is there, right?

> > And on the producer side (existing):
> >
> > p1. WRITE prod
> > p2. READ prod, cons
> > p3. mb()  # B
> > p4. WRITE stopped
> > p5. READ prod, cons
> >
> > But I'm slightly afraid to change it, it's been working for over
> > a decade :D  
> 
> I wouldn't change it. The code has predated BQL in the e1000 driver
> and has been that way since the inception of it I believe in 2.6.19.
> 
> > One neat thing that I noticed, which we could potentially exploit
> > if we were to touch this code is that BQL already has a smp_mb()
> > on the consumer side. So on any kernel config and driver which support
> > BQL we can use that instead of adding another barrier at #A.
> >
> > It would actually be a neat optimization because right now, AFAICT,
> > completion will fire the # A -like barrier almost every time.  
> 
> Yeah, the fact is the barrier in the wake path may actually be
> redundant if BQL is enabled. My advice is if you are wanting to get a
> better idea of how this was setup you could take a look at the e1000
> driver in the 2.6.19 kernel as that was where this code originated and
> I am pretty certain it predates anything in any of the other Intel
> drivers other than maybe e100.

