From: <Parthiban.Veerasooran@microchip.com>
To: <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<UNGLinuxDriver@microchip.com>, <jacob.e.keller@intel.com>,
<andrew+netdev@lunn.ch>, <davem@davemloft.net>,
<edumazet@google.com>, <kuba@kernel.org>
Subject: Re: [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers
Date: Thu, 28 Nov 2024 05:43:22 +0000 [thread overview]
Message-ID: <b3e23d57-3b3b-474c-ae45-cbbf4eaaef3a@microchip.com> (raw)
In-Reply-To: <7f5fd10d-aaf9-4259-9505-3fbabc3ba102@redhat.com>
Hi Paolo,
On 27/11/24 5:41 pm, Paolo Abeni wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>
> On 11/27/24 11:49, Parthiban.Veerasooran@microchip.com wrote:
>> On 26/11/24 4:11 pm, Paolo Abeni wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>> On 11/22/24 11:21, Parthiban Veerasooran wrote:
>>>> There are two skb pointers to manage tx skb's enqueued from n/w stack.
>>>> waiting_tx_skb pointer points to the tx skb which needs to be processed
>>>> and ongoing_tx_skb pointer points to the tx skb which is being processed.
>>>>
>>>> SPI thread prepares the tx data chunks from the tx skb pointed by the
>>>> ongoing_tx_skb pointer. When the tx skb pointed by the ongoing_tx_skb is
>>>> processed, the tx skb pointed by the waiting_tx_skb is assigned to
>>>> ongoing_tx_skb and the waiting_tx_skb pointer is assigned with NULL.
>>>> Whenever there is a new tx skb from n/w stack, it will be assigned to
>>>> waiting_tx_skb pointer if it is NULL. Enqueuing and processing of a tx skb
>>>> handled in two different threads.
>>>>
>>>> Consider a scenario where the SPI thread processed an ongoing_tx_skb and
>>>> it moves next tx skb from waiting_tx_skb pointer to ongoing_tx_skb pointer
>>>> without doing any NULL check. At this time, if the waiting_tx_skb pointer
>>>> is NULL then ongoing_tx_skb pointer is also assigned with NULL. After
>>>> that, if a new tx skb is assigned to waiting_tx_skb pointer by the n/w
>>>> stack and there is a chance to overwrite the tx skb pointer with NULL in
>>>> the SPI thread. Finally one of the tx skb will be left as unhandled,
>>>> resulting packet missing and memory leak.
>>>> To overcome the above issue, protect the moving of tx skb reference from
>>>> waiting_tx_skb pointer to ongoing_tx_skb pointer so that the other thread
>>>> can't access the waiting_tx_skb pointer until the current thread completes
>>>> moving the tx skb reference safely.
>>>
>>> A mutex looks overkill. Why don't you use a spinlock? why locking only
>>> one side (the writer) would be enough?
>> Ah my bad, missed to protect tc6->waiting_tx_skb = skb. So that it will
>> become like below,
>>
>> mutex_lock(&tc6->tx_skb_lock);
>> tc6->waiting_tx_skb = skb;
>> mutex_unlock(&tc6->tx_skb_lock);
>>
>> As both are not called from atomic context and they are allowed to
>> sleep, I used mutex rather than spinlock.
>>>
>>> Could you please report the exact sequence of events in a time diagram
>>> leading to the bug, something alike the following?
>>>
>>> CPU0 CPU1
>>> oa_tc6_start_xmit
>>> ...
>>> oa_tc6_spi_thread_handler
>>> ...
>> Good case:
>> ----------
>> Consider waiting_tx_skb is NULL.
>>
>> Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler)
>> --------------------------- -----------------------------------
>> - if waiting_tx_skb is NULL
>> - waiting_tx_skb = skb
>> - if ongoing_tx_skb is NULL
>> - ongoing_tx_skb = waiting_tx_skb
>> - waiting_tx_skb = NULL
>> ...
>> - ongoing_tx_skb = NULL
>> - if waiting_tx_skb is NULL
>> - waiting_tx_skb = skb
>> - if ongoing_tx_skb is NULL
>> - ongoing_tx_skb = waiting_tx_skb
>> - waiting_tx_skb = NULL
>> ...
>> - ongoing_tx_skb = NULL
>> ....
>>
>> Bad case:
>> ---------
>> Consider waiting_tx_skb is NULL.
>>
>> Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler)
>> --------------------------- -----------------------------------
>> - if waiting_tx_skb is NULL
>> - waiting_tx_skb = skb
>> - if ongoing_tx_skb is NULL
>
> AFAICS, if 'waiting_tx_skb == NULL and Thread2 is in
> oa_tc6_spi_thread_handler()/oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
> then ongoing_tx_skb can not be NULL, due to the previous check in:
>
> https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1064
>
> This looks like a single reader/single write scenarios that does not
> need any lock to ensure consistency.
>
> Do you observe any memory leak in real life scenarios?
>
> BTW it looks like both oa_tc6_start_xmit and oa_tc6_spi_thread_handler
> are possibly lacking memory barriers to avoid missing wake-ups.
Actually the transmit flow control is done using the TXC reported from
MAC-PHY and it is done in the below for loop. TXC is Transmit Credit
Count represents the rooms available to place the tx chunks in the MAC-PHY.
https://elixir.bootlin.com/linux/v6.12/source/drivers/net/ethernet/oa_tc6.c#L1004
- Consider a scenario where the TXC reported from the previous transfer
is 10 and ongoing_tx_skb holds an tx ethernet frame which can be
transported in 20 TXCs and waiting_tx_skb is still NULL.
tx_credits = 10; /* 21 are filled in the previous transfer */
ongoing_tx_skb = 20;
waiting_tx_skb = NULL; /* Still NULL */
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true.
- After oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = 10;
waiting_tx_skb = NULL; /* Still NULL */
- Perform SPI transfer.
- Process SPI rx buffer to get the TXC from footers.
- Now let's assume previously filled 21 TXCs are freed so we are good to
transport the next remaining 10 tx chunks from ongoing_tx_skb.
tx_credits = 21;
ongoing_tx_skb = 10;
waiting_tx_skb = NULL;
- So, (tc6->ongoing_tx_skb || tc6->waiting_tx_skb) becomes true again.
- In the oa_tc6_prepare_spi_tx_buf_for_tx_skbs()
ongoing_tx_skb = NULL;
waiting_tx_skb = NULL;
Now the below bad case might happen,
Thread1 (oa_tc6_start_xmit) Thread2 (oa_tc6_spi_thread_handler)
--------------------------- -----------------------------------
- if waiting_tx_skb is NULL
- if ongoing_tx_skb is NULL
- ongoing_tx_skb = waiting_tx_skb
- waiting_tx_skb = skb
- waiting_tx_skb = NULL
...
- ongoing_tx_skb = NULL
- if waiting_tx_skb is NULL
- waiting_tx_skb = skb
Hope this clarifies.
Best regards,
Parthiban V
>
> Cheers,
>
> Paolo
>
next prev parent reply other threads:[~2024-11-28 5:43 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-22 10:21 [PATCH net v2 0/2] Fixes on the OPEN Alliance TC6 10BASE-T1x MAC-PHY support generic lib Parthiban Veerasooran
2024-11-22 10:21 ` [PATCH net v2 1/2] net: ethernet: oa_tc6: fix infinite loop error when tx credits becomes 0 Parthiban Veerasooran
2024-11-26 10:44 ` Paolo Abeni
2024-11-27 8:43 ` Parthiban.Veerasooran
2024-11-22 10:21 ` [PATCH net v2 2/2] net: ethernet: oa_tc6: fix tx skb race condition between reference pointers Parthiban Veerasooran
2024-11-26 10:41 ` Paolo Abeni
2024-11-27 10:49 ` Parthiban.Veerasooran
2024-11-27 12:11 ` Paolo Abeni
2024-11-28 5:43 ` Parthiban.Veerasooran [this message]
2024-12-02 3:37 ` Parthiban.Veerasooran
2024-12-04 2:52 ` Jakub Kicinski
2024-12-04 8:28 ` Parthiban.Veerasooran
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b3e23d57-3b3b-474c-ae45-cbbf4eaaef3a@microchip.com \
--to=parthiban.veerasooran@microchip.com \
--cc=UNGLinuxDriver@microchip.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=jacob.e.keller@intel.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.