From: Brenden Blanco <bblanco@plumgrid.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
Tariq Toukan <tariqt@mellanox.com>,
Tom Herbert <tom@herbertland.com>,
Saeed Mahameed <saeedm@mellanox.com>,
Rana Shahout <rana.shahot@gmail.com>,
Eran Ben Elisha <eranbe@mellanox.com>
Subject: Re: XDP_TX bug report on mlx4
Date: Sun, 18 Sep 2016 19:59:43 -0400
Message-ID: <20160918235943.GA18685@gmail.com>
In-Reply-To: <20160916212443.0eddbeb5@redhat.com>
On Fri, Sep 16, 2016 at 09:24:43PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 16 Sep 2016 12:17:27 -0700
> Brenden Blanco <bblanco@plumgrid.com> wrote:
>
> > On Fri, Sep 16, 2016 at 09:03:40PM +0200, Jesper Dangaard Brouer wrote:
> > > Hi Brenden,
> > >
> > > I've discovered a bug with XDP_TX recycling of pages in the mlx4 driver.
> > >
> > > If I increase the number of RX and TX queues/channels via ethtool cmd:
> > > ethtool -L mlx4p1 rx 10 tx 10
> > >
> > > Then when running the xdp2 program, which does XDP_TX, the kernel will
> > > crash with page errors, because the page refcnt goes to zero or even
> > > negative. I've noticed pages delivered to mlx4_en_rx_recycle() can have
> > > a page refcnt of zero, which is wrong; they should always have 1 (for
> > > XDP).
> > >
> > > Debugging it further, I find that this can happen when mlx4_en_rx_recycle()
> > > is called from mlx4_en_recycle_tx_desc(). This is the TX cleanup function
> > > associated with TX ring queues used only for XDP_TX. Nothing other than
> > > the XDP_TX action should be able to place packets into these TX rings,
> > > which call mlx4_en_recycle_tx_desc().
> >
> > Sounds pretty straightforward, let me look into it.
>
> Here is some debug info I instrumented my kernel with, and I've
> attached my minicom output with a warning and a panic.

Thanks for the info.

I've also spent this weekend trying to debug it (it's pretty easy to
reproduce), but with no conclusive answer. I was investigating the sequence
in mlx4_en_stop_port to see if RX might still be running through the
function, on the theory that XDP_TX might race with
mlx4_en_free_tx_buf. For instance, I tried moving the napi_synchronize
loop to be just below the msleep(10). No improvement.

Unfortunately, I'm out of options, since my one test box has decided not
to reboot itself automatically, and I'll be out of email communication
(for 3 weeks) before anybody can physically resuscitate it (tomorrow).

>
> Enable some driver debug printks via::
> ethtool -s mlx4p1 msglvl drv on
>
> Debug normal situation::
>
[...]
Thread overview: 5+ messages
2016-09-16 19:03 XDP_TX bug report on mlx4 Jesper Dangaard Brouer
2016-09-16 19:17 ` Brenden Blanco
2016-09-16 19:24 ` Jesper Dangaard Brouer
2016-09-18 23:59 ` Brenden Blanco [this message]
2016-10-13 19:46 ` Brenden Blanco