From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@datenfreihafen.org>,
linux-wpan - ML <linux-wpan@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
"open list:NETWORKING [GENERAL]" <netdev@vger.kernel.org>,
David Girault <david.girault@qorvo.com>,
Romuald Despres <romuald.despres@qorvo.com>,
Frederic Blain <frederic.blain@qorvo.com>,
Nicolas Schodet <nico@ni.fr.eu.org>,
Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Subject: Re: [PATCH wpan-next v2 13/14] net: mac802154: Introduce a tx queue flushing mechanism
Date: Tue, 29 Mar 2022 18:29:26 +0200 [thread overview]
Message-ID: <20220329182926.37de2779@xps13> (raw)
In-Reply-To: <CAB_54W4d+qUd2nA2Av3N0OfCf5stDjG3YQ97kebGiTrPAbLqZg@mail.gmail.com>
Hi Alexander,
alex.aring@gmail.com wrote on Sun, 27 Mar 2022 12:45:20 -0400:
> Hi,
>
> On Fri, Mar 18, 2022 at 2:11 PM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Alexander,
> >
> > alex.aring@gmail.com wrote on Sun, 13 Mar 2022 16:43:52 -0400:
> >
> > > Hi,
> > >
> > > On Fri, Mar 4, 2022 at 5:54 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > > I had a second look at it and it appears to me that the issue was
> > > > already there and is structural. We just did not really cared about it
> > > > because we didn't bother with synchronization issues.
> > > >
> > >
> > > I am not sure if I understand correctly. We stop the queue at some
> > > specific moment and we need to make sure that xmit_do() is not called
> > > or can't be called anymore.
> > >
> > > I was thinking about:
> > >
> > > void ieee802154_disable_queue(struct ieee802154_hw *hw)
> > > {
> > > struct ieee802154_local *local = hw_to_local(hw);
> > > struct ieee802154_sub_if_data *sdata;
> > >
> > > rcu_read_lock();
> > > list_for_each_entry_rcu(sdata, &local->interfaces, list) {
> > > if (!sdata->dev)
> > > continue;
> > >
> > > netif_tx_disable(sdata->dev);
> > > }
> > > rcu_read_unlock();
> > > }
> > > EXPORT_SYMBOL(ieee802154_stop_queue);
> > >
> > > From my quick view is that "netif_tx_disable()" ensures by holding
> > > locks and other things and doing netif_tx_stop_queue() it we can be
> > > sure there will be no xmit_do() going on while it's called and
> > > afterwards. It can be that there are still transmissions on the
> > > transceiver that are on the way, but then your atomic counter and
> > > wait_event() will come in place.
> >
> > I went for a deeper investigation to understand how the net core
> > was calling our callbacks. And it appeared to go through
> > dev_hard_start_xmit() and come from __dev_queue_xmit(). This means
> > the ieee802154 callback could only be called once at a time
> > because it is protected by the network device transmit lock
> > (netif_tx_lock()). Which makes the logic safe and not racy as I
> > initially thought. This was the missing peace in my mental model I
> > believe.
> >
>
> You forget here that you want to stop all transmission and to wait
> that all transmissions are done from a sleepable context.
>
> > > We need to be sure there will be nothing queued anymore for
> > > transmission what (in my opinion) tx_disable() does. from any context.
> > >
> > > We might need to review some netif callbacks... I have in my mind for
> > > example stop(), maybe netif_tx_stop_queue() is enough (because the
> > > context is like netif_tx_disable(), helding similar locks, etc.) but
> > > we might want to be sure that nothing is going on anymore by using
> > > your wait_event() with counter.
> >
> > I don't see a real reason anymore to use the tx_disable() call. Is
> > there any reason this could be needed that I don't have in mind? Right
> > now the only thing that I see is that it could delay a little bit the
> > moment where we actually stop the queue because we would be waiting for
> > the lock to be released after the skb has been offloaded to hardware.
> > Perhaps maybe we would let another frame to be transmitted before we
> > actually get the lock.
> >
> > > Is there any problem which I don't see?
> >
> > One question however, as I understand, if userspace tries to send more
> > packets, I believe the "if (!stopped)" condition will be false and the
> > xmit call will simply be skipped, ending with a -ENETDOWN error [1]. Is
> > it what we want? I initially thought we could actually queue patches and
> > wait for the queue to be re-enabled again, but it does not look easy.
> >
>
> The problem is here that netif_tx_stop_queue() will only set some
> flags and this will be checked there. [0]
> Now you want to do from a sleepable context:
>
> 1. stop queue (net core functionality check [0])
> 2. wait until all ongoing transmissions are done (softmac layer atomic
> counter handled in xmit_do())
>
> Example situation for the race:
>
> cpu0:
> - checked _already_ the if queue is stopped [0] it was not the case
> BUT not incremented the atomic counter yet to signal that a
> transmission is going on (which is done later in xmit_do)
>
> While cpu0 is in the above mentioned state cpu1 is in the following state:
>
> - mlme message will transmitted
> - stop the queue by setting flag [0] (note the check in cpu0 already
> happened and a transmission is on the way)
> - make a wait_event($ONGOING_TRANSMISSION_COUNTER) which will not wait
> - (it's zero because and does not wait because cpu0 didn't incremented
> the ongoing transmission counter yet)
>
> ---
>
> This will end in that both cpu0 and cpu1 start transmissions... but
> this is only "we completed the spi transfer to the transceiver" the
> framebuffer is written and transmission is started. That the
> transceiver actually transmits the frame is completely handled on the
> transceiver side, on the Linux side we only need to care about that we
> don't overwrite the framebuffer while a transmission is going on. This
> can happen here, e.g. cpu0 writes the framebuffer first, then cpu1
> will overwrite the framebuffer because we start another transmission
> (writing framebuffer) while the transceiver still reads the
> framebuffer for the cpu0 transmission. In short it will break.
>
> If we want to start transmissions from any sleepable context we cannot
> use "netif_tx_stop_queue()" because it does not guarantee that
> xmit_do() is still going on, "netif_tx_disable()" will do it because
> it will held the xmit_lock while setting the flag and we don't run
> into the above problem.
>
> Is this more clear?
Crystal clear. Thanks for taking the time to explain it. I am now
convinced of the usefulness of calling netif_tx_disable() (and create
our own ieee802154 helper to call it).
> I think it was never clear what I really meant by
> this race, I hope the above example helped. Also "netif_tx_disable()"
> was my first hit to find netif_tx_disable()what we need, but maybe
> there exists better options?
> To be sure, I mean we need "netif_tx_disable()" only for any sleepable
> context e.g. we trigger mlme transmission and take control of all
> transmissions and be sure nothing is going on anymore, then we need to
> have still the wait_event(). After this is done we can be sure no
> transmission is going on and we can take over until we are done (mlme
> sync tx handling) and can call wake_queue() again.
>
> - Alex
>
> [0] https://elixir.bootlin.com/linux/v5.17/source/net/core/dev.c#L4114
Thanks,
Miquèl
next prev parent reply other threads:[~2022-03-29 16:29 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-02-07 14:47 [PATCH wpan-next v2 00/14] ieee802154: Synchronous Tx API Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 01/14] net: ieee802154: Move the logic restarting the queue upon transmission Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 02/14] net: mac802154: Create a transmit error helper Miquel Raynal
2022-02-20 23:31 ` Alexander Aring
2022-02-21 20:22 ` Alexander Aring
2022-02-22 8:43 ` Miquel Raynal
2022-02-24 1:53 ` Alexander Aring
2022-02-07 14:47 ` [PATCH wpan-next v2 03/14] net: ieee802154: at86rf230: Call _xmit_error() when a transmission fails Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 04/14] net: ieee802154: atusb: " Miquel Raynal
2022-02-20 23:35 ` Alexander Aring
2022-02-24 2:00 ` Alexander Aring
2022-02-24 14:43 ` Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 05/14] net: ieee802154: ca8210: " Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 06/14] net: mac802154: Stop exporting ieee802154_wake/stop_queue() Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 07/14] net: mac802154: Rename the synchronous xmit worker Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 08/14] net: mac802154: Rename the main tx_work struct Miquel Raynal
2022-02-07 14:47 ` [PATCH wpan-next v2 09/14] net: mac802154: Follow the count of ongoing transmissions Miquel Raynal
2022-02-07 14:48 ` [PATCH wpan-next v2 10/14] net: mac802154: Hold the transmit queue when relevant Miquel Raynal
2022-02-07 14:48 ` [PATCH wpan-next v2 11/14] net: mac802154: Create a hot tx path Miquel Raynal
2022-02-07 14:48 ` [PATCH wpan-next v2 12/14] net: mac802154: Add a warning in the hot path Miquel Raynal
2022-02-07 14:48 ` [PATCH wpan-next v2 13/14] net: mac802154: Introduce a tx queue flushing mechanism Miquel Raynal
2022-02-20 23:49 ` Alexander Aring
2022-03-03 18:17 ` Miquel Raynal
2022-03-04 10:54 ` Miquel Raynal
2022-03-13 20:43 ` Alexander Aring
2022-03-18 18:11 ` Miquel Raynal
2022-03-27 16:45 ` Alexander Aring
2022-03-29 16:29 ` Miquel Raynal [this message]
2022-02-07 14:48 ` [PATCH wpan-next v2 14/14] net: mac802154: Introduce a synchronous API for MLME commands Miquel Raynal
2022-02-20 23:52 ` Alexander Aring
2022-02-21 20:33 ` Alexander Aring
2022-02-21 20:33 ` Alexander Aring
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220329182926.37de2779@xps13 \
--to=miquel.raynal@bootlin.com \
--cc=alex.aring@gmail.com \
--cc=davem@davemloft.net \
--cc=david.girault@qorvo.com \
--cc=frederic.blain@qorvo.com \
--cc=kuba@kernel.org \
--cc=linux-wpan@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=nico@ni.fr.eu.org \
--cc=romuald.despres@qorvo.com \
--cc=stefan@datenfreihafen.org \
--cc=thomas.petazzoni@bootlin.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).