netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Valentin Schneider <vschneid@redhat.com>
To: Eric Dumazet <edumazet@google.com>, Juri Lelli <juri.lelli@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Paolo Abeni <pabeni@redhat.com>, netdev <netdev@vger.kernel.org>
Subject: Re: Question on tw_timer TIMER_PINNED
Date: Tue, 03 Oct 2023 16:09:24 +0200	[thread overview]
Message-ID: <xhsmhwmw3ol0r.mognet@vschneid.remote.csb> (raw)
In-Reply-To: <CANn89iJFyqckr3x=nwbExs3B1u=MXv9izL=2ByxOf20su2fhhg@mail.gmail.com>

Hi,

On 06/09/23 14:10, Eric Dumazet wrote:
> On Wed, Sep 6, 2023 at 1:58 PM Juri Lelli <juri.lelli@redhat.com> wrote:
>>
>> Hi Eric,
>>
>> I'm bothering you with a question about timewait_sock tw_timer, as I
>> believe you are one of the last persons touching it sometime ago. Please
>> feel free to redirect if I failed to git blame it correctly.
>>
>> At my end, latency spikes (entering the kernel) have been reported when
>> running latency sensitive applications in the field (essentially a
>> polling userspace application that doesn't want any interruption at
>> all). I think I've been able to track down one of such interruptions to
>> the servicing of tw_timer_handler. This system isolates application CPUs
>> dynamically, so what I think it happens is that at some point tw_timer
>> is armed on a CPU, and it is PINNED to that CPU, meanwhile (before the
>> 60s timeout) such CPU is 'isolated' and the latency sensitive app
>> started on it. After 60s the timer fires and interrupts the app
>> generating a spike.
>>
>> I'm not very familiar with this part of the kernel and from staring
>> at code for a while I had mixed feeling about the need to keep tw_timer
>> as TIMER_PINNED. Could you please shed some light on it? Is it a strict
>> functional requirement or maybe a nice to have performance (locality I'd
>> guess) improvement? Could we in principle make it !PINNED (so that it
>> can be moved/queued away and prevent interruptions)?
>>
>
> It is a functional requirement in current implementation.
>
> cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
> changelog has some details about it.
>
> Can this be changed to non pinned ? Probably, but with some care.
>
> You could simply disable tw completely, it is a best effort mechanism.
>

So it's looking like doing that is not acceptable for our use-case, as
we still want timewait sockets for the traffic happening on the
housekepeing (non-isolated) CPUs.


I had a look at these commits to figure out what it would take to make it
not pinned:

  cfac7f836a71 ("tcp/dccp: block bh before arming time_wait timer")
  ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")

and I'm struggling to understand why we want the timer to be armed before
inet_twsk_hashdance(). I found this discussion on LKML:

  https://lore.kernel.org/all/56941035.9040000@fastly.com/

And I can see that __inet_lookup_established() and tw_timer_handler()
both operate on __tw_common.skc_nulls_node and __tw_common.skc_refcnt, but:
- the timer has its own count in the refcount
- sk_nulls_for_each_rcu() is (on paper) safe to run concurrently with
  tw_timer_handler
  `\
    inet_twsk_kill()
    `\
      sk_nulls_del_node_init_rcu()

So I'm thinking we could let the timer be armed after the *hashdance(), so
it wouldn't need to be pinned anymore, but that's pretty much a revert of
  ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")
which fixed a race.

Now this is the first time I poke my nose into this area and I can't
properly reason how said race is laid out. I'm sorry for asking about such
an old commit, but would you have any pointers on that?

Thanks


      parent reply	other threads:[~2023-10-03 14:09 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-06 11:58 Question on tw_timer TIMER_PINNED Juri Lelli
2023-09-06 12:10 ` Eric Dumazet
2023-09-06 13:04   ` Juri Lelli
2023-10-03 14:09   ` Valentin Schneider [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xhsmhwmw3ol0r.mognet@vschneid.remote.csb \
    --to=vschneid@redhat.com \
    --cc=edumazet@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).