From: Valentin Schneider <vschneid@redhat.com>
To: Eric Dumazet <edumazet@google.com>, Juri Lelli <juri.lelli@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
Paolo Abeni <pabeni@redhat.com>, netdev <netdev@vger.kernel.org>
Subject: Re: Question on tw_timer TIMER_PINNED
Date: Tue, 03 Oct 2023 16:09:24 +0200 [thread overview]
Message-ID: <xhsmhwmw3ol0r.mognet@vschneid.remote.csb> (raw)
In-Reply-To: <CANn89iJFyqckr3x=nwbExs3B1u=MXv9izL=2ByxOf20su2fhhg@mail.gmail.com>
Hi,
On 06/09/23 14:10, Eric Dumazet wrote:
> On Wed, Sep 6, 2023 at 1:58 PM Juri Lelli <juri.lelli@redhat.com> wrote:
>>
>> Hi Eric,
>>
>> I'm bothering you with a question about timewait_sock tw_timer, as I
>> believe you are one of the last persons touching it sometime ago. Please
>> feel free to redirect if I failed to git blame it correctly.
>>
>> At my end, latency spikes (entering the kernel) have been reported when
>> running latency sensitive applications in the field (essentially a
>> polling userspace application that doesn't want any interruption at
>> all). I think I've been able to track down one of such interruptions to
>> the servicing of tw_timer_handler. This system isolates application CPUs
>> dynamically, so what I think it happens is that at some point tw_timer
>> is armed on a CPU, and it is PINNED to that CPU, meanwhile (before the
>> 60s timeout) such CPU is 'isolated' and the latency sensitive app
>> started on it. After 60s the timer fires and interrupts the app
>> generating a spike.
>>
>> I'm not very familiar with this part of the kernel and from staring
>> at code for a while I had mixed feeling about the need to keep tw_timer
>> as TIMER_PINNED. Could you please shed some light on it? Is it a strict
>> functional requirement or maybe a nice to have performance (locality I'd
>> guess) improvement? Could we in principle make it !PINNED (so that it
>> can be moved/queued away and prevent interruptions)?
>>
>
> It is a functional requirement in current implementation.
>
> cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
> changelog has some details about it.
>
> Can this be changed to non pinned ? Probably, but with some care.
>
> You could simply disable tw completely, it is a best effort mechanism.
>
So it's looking like doing that is not acceptable for our use-case, as
we still want timewait sockets for the traffic happening on the
housekepeing (non-isolated) CPUs.
I had a look at these commits to figure out what it would take to make it
not pinned:
cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")
and I'm struggling to understand why we want the timer to be armed before
inet_twsk_hashdance(). I found this discussion on LKML:
https://lore.kernel.org/all/56941035.9040000@fastly.com/
And I can see that __inet_lookup_established() and tw_timer_handler()
both operate on __tw_common.skc_nulls_node and __tw_common.skc_refcnt, but:
- the timer has its own count in the refcount
- sk_nulls_for_each_rcu() is (on paper) safe to run concurrently with
tw_timer_handler
`\
inet_twsk_kill()
`\
sk_nulls_del_node_init_rcu()
So I'm thinking we could let the timer be armed after the *hashdance(), so
it wouldn't need to be pinned anymore, but that's pretty much a revert of
ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")
which fixed a race.
Now this is the first time I poke my nose into this area and I can't
properly reason how said race is laid out. I'm sorry for asking about such
an old commit, but would you have any pointers on that?
Thanks
prev parent reply other threads:[~2023-10-03 14:09 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-06 11:58 Question on tw_timer TIMER_PINNED Juri Lelli
2023-09-06 12:10 ` Eric Dumazet
2023-09-06 13:04 ` Juri Lelli
2023-10-03 14:09 ` Valentin Schneider [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xhsmhwmw3ol0r.mognet@vschneid.remote.csb \
--to=vschneid@redhat.com \
--cc=edumazet@google.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).