From: Cong Wang <xiyou.wangcong@gmail.com>
To: Petr Machata <petrm@mellanox.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>,
David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Eric Dumazet <eric.dumazet@gmail.com>,
Jiri Pirko <jiri@mellanox.com>,
Ido Schimmel <idosch@mellanox.com>
Subject: Re: [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks
Date: Mon, 13 Jul 2020 19:51:41 -0700
Message-ID: <CAM_iQpUi-aKBLF5MkkSkCBchHeK5a_8OEDw3eXHZ4yPo=_hvsQ@mail.gmail.com>
In-Reply-To: <875zavh1re.fsf@mellanox.com>
On Fri, Jul 10, 2020 at 7:40 AM Petr Machata <petrm@mellanox.com> wrote:
>
>
> Cong Wang <xiyou.wangcong@gmail.com> writes:
>
> > On Wed, Jul 8, 2020 at 5:13 PM Petr Machata <petrm@mellanox.com> wrote:
> >>
> >>
> >> Petr Machata <petrm@mellanox.com> writes:
> >>
> >> > Cong Wang <xiyou.wangcong@gmail.com> writes:
> >> >
> >> > I'll think about it some more. For now I will at least fix the lack of
> >> > locking.
> >>
> >> I guess I could store smp_processor_id() that acquired the lock in
> >> struct qdisc_skb_head. Do a trylock instead of lock, and on fail check
> >> the stored value. I'll need to be careful about the race between
> >> unsuccessful trylock and the test, and about making sure CPU ID doesn't
> >> change after it is read. I'll probe this tomorrow.
> >
> > Like __netif_tx_lock(), right? Seems doable.
>
> Good to see it actually used, I wasn't sure if the idea made sense :)
>
> Unfortunately it is not enough.
>
> Consider two threads (A, B) and two netdevices (eth0, eth1):
>
> - "A" takes eth0's root lock and proceeds to classification
> - "B" takes eth1's root lock and proceeds to classification
> - "A" invokes mirror to eth1, waits on lock held by "B"
> - "B" invokes mirror to eth0, waits on lock held by "A"
> - Some say they are still waiting to this day.
Sure, AA or ABBA deadlock.
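
To make the two cases concrete, here is a minimal user-space sketch of
the "remember which CPU took the lock" idea discussed above. It is only
a single-threaded model under stated assumptions: struct owned_lock,
owned_trylock() and the result names are hypothetical and are not
kernel APIs, and real code would need an atomic trylock plus a stable
CPU id (for example with preemption disabled) across the check.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for a root lock that remembers its holder. */
struct owned_lock {
	bool locked;
	int owner;	/* CPU id of the holder, or NO_OWNER */
};

enum lock_result {
	LOCK_ACQUIRED,	/* lock was free and is now ours */
	LOCK_RECURSION,	/* AA: this CPU already holds the lock */
	LOCK_CONTENDED,	/* held by another CPU; waiting here may be ABBA */
};

/* Single-threaded model only, no atomics: try to take the lock and,
 * on failure, compare the stored owner with the calling CPU. */
static enum lock_result owned_trylock(struct owned_lock *l, int cpu)
{
	if (!l->locked) {
		l->locked = true;
		l->owner = cpu;
		return LOCK_ACQUIRED;
	}
	return l->owner == cpu ? LOCK_RECURSION : LOCK_CONTENDED;
}

static void owned_unlock(struct owned_lock *l)
{
	l->owner = NO_OWNER;
	l->locked = false;
}
```

With CPU 0 holding eth0's lock and CPU 1 holding eth1's lock, a second
attempt by CPU 0 on eth0 reports LOCK_RECURSION, while CPU 0 on eth1
and CPU 1 on eth0 both report LOCK_CONTENDED. In this model the owner
check catches the AA case but cannot tell ABBA apart from ordinary
contention.
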
>
> So one option that I see is to just stash the mirrored packet in a queue
> instead of delivering it right away:
>
> - s/netif_receive_skb/netif_rx/ in act_mirred
>
> - Reuse the RX queue for TX packets as well, differentiating the two by
> a bit in SKB CB. Then process_backlog() would call either
> __netif_receive_skb() or dev_queue_transmit().
>
> - Drop mirred_rec_level guard.
I don't think I follow you. The root qdisc lock is on egress, which has
nothing to do with ingress, so I don't see how netif_rx() is even involved.
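
For reference, the deferral being proposed in the quoted text could be
modelled roughly as below. This is a user-space sketch under stated
assumptions, not the act_mirred or process_backlog() code: struct pkt,
struct backlog, backlog_push() and backlog_process() are hypothetical
names, the dir field stands in for the "bit in SKB CB", and the two
counters stand in for the calls to __netif_receive_skb() and
dev_queue_transmit().

```c
#include <stddef.h>

#define BACKLOG_MAX 8

/* Stand-in for the direction bit that would live in the SKB CB. */
enum pkt_dir { PKT_RX, PKT_TX };

struct pkt {
	int id;
	enum pkt_dir dir;
};

/* Fixed-size FIFO; head and tail only ever increase. */
struct backlog {
	struct pkt slots[BACKLOG_MAX];
	size_t head, tail;
};

/* Counters standing in for the RX and TX delivery paths. */
struct counters {
	int rx, tx;
};

/* Called where the packet would otherwise be delivered immediately,
 * i.e. while the caller may still hold a qdisc root lock.
 * Returns 0 on success, -1 if the backlog is full. */
static int backlog_push(struct backlog *q, struct pkt p)
{
	if (q->tail - q->head == BACKLOG_MAX)
		return -1;
	q->slots[q->tail++ % BACKLOG_MAX] = p;
	return 0;
}

/* Called later, with no qdisc lock held: drain the queue and
 * dispatch each packet according to its direction bit. */
static void backlog_process(struct backlog *q, struct counters *c)
{
	while (q->head != q->tail) {
		struct pkt p = q->slots[q->head++ % BACKLOG_MAX];

		if (p.dir == PKT_TX)
			c->tx++;
		else
			c->rx++;
	}
}
```

The point of the model is only the ordering: the push happens inside
the locked section, and the dispatch happens after it, so the second
device's lock is never taken while the first one is held.
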
>
> This seems to work, but I might be missing something non-obvious, such
> as CB actually being used for something already in that context. I would
> really rather not introduce a second backlog queue just for mirred
> though.
>
> Since mirred_rec_level does not kick in anymore, the same packet can end
> up being forwarded from the backlog queue, to the qdisc, and back to the
> backlog queue, forever. But that seems OK, that's what the admin
> configured, so that's what's happening.
>
> If this is not a good idea for some reason, this might work as well:
>
> - Convert the current root lock to an rw lock. Convert all current
> lockers to write lock (which should be safe), except for enqueue, which
> will take read lock. That will allow many concurrent threads to enter
> enqueue, or one thread several times, but it will exclude all other
> users.
Are you sure we can parallelize enqueue()? Every enqueue has to move
the skb into some queue, and that cannot be done in parallel under
just a read lock. Even the "lockless" qdisc takes a spinlock,
r->producer_lock, for enqueue().
>
> So this guards configuration access to the qdisc tree, makes sure
> qdiscs don't go away from under one's feet.
>
> - Introduce another spin lock to guard the private data of the qdisc
> tree, counters etc., things that even two concurrent enqueue
> operations shouldn't trample on. Enqueue takes this spin lock after
> read-locking the root lock. act_mirred drops it before injecting the
> packet and takes it again afterwards.
>
> Any opinions y'all?
I thought about forbidding mirroring/redirecting to the same device,
but there might be some legitimate use cases for it. So I don't
have any other ideas yet; perhaps there is some way to refactor
dev_queue_xmit() to avoid this deadlock.
Thanks.
Thread overview: 31+ messages
2020-06-26 22:45 [PATCH net-next v1 0/5] TC: Introduce qevents Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 1/5] net: sched: Pass root lock to Qdisc_ops.enqueue Petr Machata
2020-07-06 19:21 ` Cong Wang
2020-07-07 15:25 ` Petr Machata
2020-07-07 19:41 ` Cong Wang
2020-06-26 22:45 ` [PATCH net-next v1 2/5] net: sched: Introduce helpers for qevent blocks Petr Machata
2020-07-06 19:48 ` Cong Wang
2020-07-07 15:22 ` Petr Machata
2020-07-07 19:13 ` Cong Wang
2020-07-08 12:35 ` Petr Machata
2020-07-08 16:21 ` Petr Machata
2020-07-08 19:09 ` Cong Wang
2020-07-08 19:04 ` Cong Wang
2020-07-08 21:04 ` Petr Machata
2020-07-09 0:13 ` Petr Machata
2020-07-09 19:37 ` Cong Wang
2020-07-10 14:40 ` Petr Machata
2020-07-14 2:51 ` Cong Wang [this message]
2020-07-14 9:12 ` Petr Machata
2020-07-07 19:48 ` Cong Wang
2020-07-08 9:19 ` Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 3/5] net: sched: sch_red: Split init and change callbacks Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 4/5] net: sched: sch_red: Add qevents "early_drop" and "mark" Petr Machata
2020-06-26 22:45 ` [PATCH net-next v1 5/5] selftests: forwarding: Add a RED test for SW datapath Petr Machata
2020-06-26 22:45 ` [PATCH iproute2-next v1 1/4] uapi: pkt_sched: Add two new RED attributes Petr Machata
2020-06-26 22:45 ` [PATCH iproute2-next v1 2/4] tc: Add helpers to support qevent handling Petr Machata
2020-06-26 22:45 ` [PATCH iproute2-next v1 3/4] man: tc: Describe qevents Petr Machata
2020-06-26 22:45 ` [PATCH iproute2-next v1 4/4] tc: q_red: Add support for qevents "mark" and "early_drop" Petr Machata
2020-06-26 22:56 ` [PATCH net-next v1 0/5] TC: Introduce qevents Stephen Hemminger
2020-06-29 13:21 ` Petr Machata
2020-06-30 0:15 ` David Miller