From: Florian Westphal <fw@strlen.de>
To: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>,
	netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	mleitner@redhat.com, juri.lelli@redhat.com, vschneid@redhat.com,
	tglozar@redhat.com, dsahern@kernel.org, bigeasy@linutronix.de,
	tglx@linutronix.de
Subject: Re: [PATCH net-next v6 1/3] net: tcp/dcpp: prepare for tw_timer un-pinning
Date: Mon, 3 Jun 2024 15:31:58 +0200
Message-ID: <20240603133158.GC8496@breakpoint.cc>
In-Reply-To: <20240603112152.GB8496@breakpoint.cc>

Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <edumazet@google.com> wrote:
> > On Mon, Jun 3, 2024 at 11:37 AM Florian Westphal <fw@strlen.de> wrote:
> > > +       spin_lock(lock);
> > > +       if (timer_shutdown(&tw->tw_timer)) {
> > > +               /* releases @lock */
> > > +               __inet_twsk_kill(tw, lock);
> > > +       } else {
> > 
> > If we do not have a sync variant here, I think that inet_twsk_purge()
> > could return while ongoing timers are alive.
> 
> Yes.
> 
> We can't use sync variant, it would deadlock on ehash spinlock.
> 
> > tcp_sk_exit_batch() would then possibly hit :
> > 
> > WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount));
> > 
> > The alive timers release tw->tw_dr->tw_refcount at the end of
> > inet_twsk_kill()
> 
> Theoretically, the tw socket can already be unlinked from the tw hash
> (so inet_twsk_purge won't encounter it) while its timer is still running.
> 
> The only solution I see is to schedule() in tcp_sk_exit_batch() until
> tw_refcount has dropped to the expected value, i.e. something like
> 
> static void tcp_wait_for_tw_timers(struct net *n)
> {
> 	while (refcount_read(&n->ipv4.tcp_death_row.tw_refcount) > 1)
> 		schedule();
> }
> 
> Any better idea?

Actually, I think we can solve this in a much simpler way.

Instead of replacing:

void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
	if (del_timer_sync(&tw->tw_timer))
		inet_twsk_kill(tw);
	inet_twsk_put(tw);
}

with:

	spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash);
	spin_lock(lock);
	if (timer_shutdown(&tw->tw_timer)) {

(which gets us into the tcp_sk_exit_batch() trouble Eric pointed out),
we can simply add an "empty" ehash lock/unlock pair before calling
del_timer_sync():

void inet_twsk_deschedule_put(struct inet_timewait_sock *tw)
{
+	struct inet_hashinfo *hashinfo = tw->tw_dr->hashinfo;
+	spinlock_t *lock = inet_ehash_lockp(hashinfo, tw->tw_hash);
+
+	spin_lock(lock);
+	spin_unlock(lock);
+
	if (del_timer_sync(&tw->tw_timer))
		inet_twsk_kill(tw);
	inet_twsk_put(tw);
}

Rationale:
inet_twsk_deschedule_put() cannot be called before hashdance_schedule
calls refcount_set(&tw->tw_refcnt, 3).

Before that, any refcount_inc_not_zero() fails, so we never get into
inet_twsk_deschedule_put().

hashdance_schedule holds the ehash lock when it sets the tw refcount,
and it releases the lock only after the timer is up and running.

If inet_twsk_deschedule_put() is called while hashdance_schedule
is not yet done, the spin_lock()/spin_unlock() pair guarantees that
the timer is armed once spin_unlock() returns.

I think this is much better than a schedule() loop waiting for the tw_dr
refcount to drop; it mainly needs a comment explaining what the empty
lock/unlock pair is for.

Thoughts?


Thread overview: 11+ messages
2024-06-03  9:36 [PATCH net-next v6 0/3] net: tcp: un-pin tw timer Florian Westphal
2024-06-03  9:36 ` [PATCH net-next v6 1/3] net: tcp/dcpp: prepare for tw_timer un-pinning Florian Westphal
2024-06-03 10:30   ` Eric Dumazet
2024-06-03 11:21     ` Florian Westphal
2024-06-03 11:53       ` Eric Dumazet
2024-06-03 12:10       ` Sebastian Andrzej Siewior
2024-06-03 13:31       ` Florian Westphal [this message]
2024-06-03 13:50         ` Eric Dumazet
2024-06-03 10:53   ` shaozhengchao
2024-06-03  9:36 ` [PATCH net-next v6 2/3] net: tcp: un-pin the tw_timer Florian Westphal
2024-06-03  9:36 ` [PATCH net-next v6 3/3] tcp: move inet_twsk_schedule helper out of header Florian Westphal
