netdev.vger.kernel.org archive mirror
From: Cong Wang <xiyou.wangcong@gmail.com>
To: Petr Machata <petrm@mellanox.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Jiri Pirko <jiri@mellanox.com>,
	Ido Schimmel <idosch@mellanox.com>
Subject: Re: [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks
Date: Mon, 13 Jul 2020 19:51:41 -0700	[thread overview]
Message-ID: <CAM_iQpUi-aKBLF5MkkSkCBchHeK5a_8OEDw3eXHZ4yPo=_hvsQ@mail.gmail.com> (raw)
In-Reply-To: <875zavh1re.fsf@mellanox.com>

On Fri, Jul 10, 2020 at 7:40 AM Petr Machata <petrm@mellanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
> > On Wed, Jul 8, 2020 at 5:13 PM Petr Machata <petrm@mellanox.com> wrote:
> >>
> >>
> >> Petr Machata <petrm@mellanox.com> writes:
> >>
> >> > Cong Wang <xiyou.wangcong@gmail.com> writes:
> >> >
> >> > I'll think about it some more. For now I will at least fix the lack of
> >> > locking.
> >>
> >> I guess I could store smp_processor_id() that acquired the lock in
> >> struct qdisc_skb_head. Do a trylock instead of lock, and on fail check
> >> the stored value. I'll need to be careful about the race between
> >> unsuccessful trylock and the test, and about making sure CPU ID doesn't
> >> change after it is read. I'll probe this tomorrow.
> >
> > Like __netif_tx_lock(), right? Seems doable.
>
> Good to see it actually used, I wasn't sure if the idea made sense :)
>
> Unfortunately it is not enough.
>
> Consider two threads (A, B) and two netdevices (eth0, eth1):
>
> - "A" takes eth0's root lock and proceeds to classification
> - "B" takes eth1's root lock and proceeds to classification
> - "A" invokes mirror to eth1, waits on lock held by "B"
> - "B" invokes mirror to eth0, waits on lock held by "A"
> - Some say they are still waiting to this day.

Sure, AA or ABBA deadlock.

>
> So one option that I see is to just stash the mirrored packet in a queue
> instead of delivering it right away:
>
> - s/netif_receive_skb/netif_rx/ in act_mirred
>
> - Reuse the RX queue for TX packets as well, differentiating the two by
>   a bit in SKB CB. Then process_backlog() would call either
>   __netif_receive_skb() or dev_queue_transmit().
>
> - Drop mirred_rec_level guard.

I don't think I follow you. The root qdisc lock is taken on egress, which has
nothing to do with ingress, so I don't see how netif_rx() is even involved.

>
> This seems to work, but I might be missing something non-obvious, such
> as CB actually being used for something already in that context. I would
> really rather not introduce a second backlog queue just for mirred
> though.
>
> Since mirred_rec_level does not kick in anymore, the same packet can end
> up being forwarded from the backlog queue, to the qdisc, and back to the
> backlog queue, forever. But that seems OK, that's what the admin
> configured, so that's what's happening.
>
> If this is not a good idea for some reason, this might work as well:
>
> - Convert the current root lock to an rw lock. Convert all current
>   lockers to write lock (which should be safe), except of enqueue, which
>   will take read lock. That will allow many concurrent threads to enter
>   enqueue, or one thread several times, but it will exclude all other
>   users.

Are you sure we can parallelize enqueue()? They all need to move the
skb into some queue, which cannot be done in parallel under just a read
lock. Even the "lockless" qdisc takes a spinlock, r->producer_lock,
for enqueue().


>
>   So this guards configuration access to the qdisc tree, makes sure
>   qdiscs don't go away from under one's feet.
>
> - Introduce another spin lock to guard the private data of the qdisc
>   tree, counters etc., things that even two concurrent enqueue
>   operations shouldn't trample on. Enqueue takes this spin lock after
>   read-locking the root lock. act_mirred drops it before injecting the
>   packet and takes it again afterwards.
>
> Any opinions y'all?

I thought about forbidding mirroring/redirecting to the same device,
but there might be legitimate use cases for that. So I don't have
any other ideas yet; perhaps there is some way to refactor
dev_queue_xmit() to avoid this deadlock.

Thanks.


Thread overview: 31+ messages
2020-06-26 22:45 [PATCH net-next v1 0/5] TC: Introduce qevents Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 1/5] net: sched: Pass root lock to Qdisc_ops.enqueue Petr Machata
2020-07-06 19:21   ` Cong Wang
2020-07-07 15:25     ` Petr Machata
2020-07-07 19:41       ` Cong Wang
2020-06-26 22:45 ` [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks Petr Machata
2020-07-06 19:48   ` Cong Wang
2020-07-07 15:22     ` Petr Machata
2020-07-07 19:13       ` Cong Wang
2020-07-08 12:35         ` Petr Machata
2020-07-08 16:21           ` Petr Machata
2020-07-08 19:09             ` Cong Wang
2020-07-08 19:04           ` Cong Wang
2020-07-08 21:04             ` Petr Machata
2020-07-09  0:13               ` Petr Machata
2020-07-09 19:37                 ` Cong Wang
2020-07-10 14:40                   ` Petr Machata
2020-07-14  2:51                     ` Cong Wang [this message]
2020-07-14  9:12                       ` Petr Machata
2020-07-07 19:48   ` Cong Wang
2020-07-08  9:19     ` Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 3/5] net: sched: sch_red: Split init and change callbacks Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 4/5] net: sched: sch_red: Add qevents "early_drop" and "mark" Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 5/5] selftests: forwarding: Add a RED test for SW datapath Petr Machata
2020-06-26 22:45 ` [PATCH iproute2-next v1 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
2020-06-26 22:45   ` [PATCH iproute2-next v1 2/4] tc: Add helpers to support qevent handling Petr Machata
2020-06-26 22:45   ` [PATCH iproute2-next v1 3/4] man: tc: Describe qevents Petr Machata
2020-06-26 22:45   ` [PATCH iproute2-next v1 4/4] tc: q_red: Add support for qevents "mark" and "early_drop" Petr Machata
2020-06-26 22:56 ` [PATCH net-next v1 0/5] TC: Introduce qevents Stephen Hemminger
2020-06-29 13:21   ` Petr Machata
2020-06-30  0:15 ` David Miller
