From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>,
Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org
Subject: Re: [PATCH 2/2] ipoib: Ratelimit messages which can flood a host.
Date: Wed, 03 Aug 2016 12:25:25 -0400 [thread overview]
Message-ID: <1470241525.18081.66.camel@redhat.com> (raw)
In-Reply-To: <57A2006B.5070304-6AxghH7DbtA@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 5432 bytes --]
On Wed, 2016-08-03 at 17:32 +0300, Nikolay Borisov wrote:
>
> On 08/03/2016 05:26 PM, Doug Ledford wrote:
> >
> > On Tue, 2016-07-26 at 21:00 +0300, Yuval Shaia wrote:
> > >
> > > On Tue, Jul 26, 2016 at 05:38:49PM +0300, Nikolay Borisov wrote:
> > > >
> > > >
> > > > In case an infiniband outtage occurs the messages modified
> > > > in this patchset can flood the host with a rate of 1 message
> > > > per
> > >
> > > s/patchset/patch
> > >
> > > >
> > > >
> > > > ms which is a lot. Using the ratelimited version of ipoib_warn
> > > > fixes the issue by limiting at most 10 messages in 5 seconds.
> > > >
> > > > Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> > > > ---
> > > > drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > index 7d3281866ffc..dd5b4afbc00b 100644
> > > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > @@ -1034,9 +1034,9 @@ static void ipoib_timeout(struct
> > > > net_device
> > > > *dev)
> > > > {
> > > > struct ipoib_dev_priv *priv = netdev_priv(dev);
> > > >
> > > > - ipoib_warn(priv, "transmit timeout: latency %d
> > > > msecs\n",
> > > > + ipoib_warn_rl(priv, "transmit timeout: latency %d
> > > > msecs\n",
> > > > jiffies_to_msecs(jiffies - dev-
> > > > >trans_start));
> > > > - ipoib_warn(priv, "queue stopped %d, tx_head %u,
> > > > tx_tail
> > > > %u\n",
> > > > + ipoib_warn_rl(priv, "queue stopped %d, tx_head %u,
> > > > tx_tail
> > > > %u\n",
> > >
> > > So this is just to hide the hide the problem,
> >
> > No, that's not true. To hide the problem, he would have to remove
> > the
> > printk entirely. Linus absolutely despises when programmers use
> > BUG_ON() for situations where it is possible for the system to
> > continue, albeit with reduced capability or what not. Similarly,
> > it is
> > entirely unacceptable that a simple issue, such as a hardware queue
> > stoppage on a network device, can render a system mostly or
> > entirely or
> > even just somewhat unusable due to a flood of kernel printk
> > messages.
> >
> > Just so we're clear, I absolutely despise printk floods. There is
> > never a valid reason for them. If someone in the kernel community
> > suggested that we make the default printk implementation be flood
> > proof, I would support it. In the meantime, I'm all for these
> > patches.
> > And if there are other places in the ipoib code that can cause
> > floods
> > too, I would like to see them switched over to use the rate limited
> > version too.
> >
> > Now that I think about it, I'm just as likely to be receptive to a
> > patch that makes the default ipoib_warn() always be rate limited,
> > and
> > if there are exceptions in the ipoib stack where we truly want
> > every
>
> The thing with this approach is that it's entirely possible to miss
> messages if there is one particular ratelimit state.
>
> >
> > message, then change just those places to a different
> > ipoib_warn_no_rate_limit() implementation. I would probably modify
> > the
> > default rate though...10 in 5 seconds is probably too low for some
> > of
> > the messages, as we can legitimately get a burst of 50 in less than
> > a
> > second when doing things like joining lots of multicast groups and
> > there is an issue, or something like that. Maybe 100 in 60 to
> > allow
> > for bursts but to still prevent runaway printks like you are
> > having?
> >
> If you take a closer look at the definition of ipoib_warn_rl you
> would
> see that it's enclosed in it's own {} block, meaning its ratelimit
> state
> is going to apply to this particular place. In essence only the
> "transmit timeout: latency msec" message would be rate limited to 10
> messages in 5 seconds, which I believe is fine. In a flood situation
> you'd get 10 identical messages and then there will be a line saying
> how
> many were suppressed, indicating a flood.
I saw that. You're missing my point. I was referring to keeping the
individual {} block in the define of ipoib_warn() instead of creating a
new ipoib_warn_rl() define, but I was commenting that some of the
messages in the ipoib file really need a higher burst, and since even
though each of these messages would have their own rate limit state,
they still all share a common rate limit config, the config would need
to be bumped up for some of the spammier message spots in ipoib.
>
> >
> > >
> > > the real question is what is
> > > causing this to happen a lot.
> > >
> > > >
> > > >
> > > > netif_queue_stopped(dev),
> > > > priv->tx_head, priv->tx_tail);
> > > > /* XXX reset QP, etc. */
> > > > --
> > > > 2.5.0
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe
> > > > linux-
> > > > rdma" in
> > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.h
> > > > tml
> >
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2016-08-03 16:25 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-26 14:38 [PATCH 1/2] ipoib: Add ratelimited version of ipoib_warn Nikolay Borisov
[not found] ` <1469543929-17659-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 14:38 ` [PATCH 2/2] ipoib: Ratelimit messages which can flood a host Nikolay Borisov
[not found] ` <1469543929-17659-2-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 18:00 ` Yuval Shaia
[not found] ` <20160726180021.GA6144-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
2016-07-26 22:02 ` Nikolay Borisov
[not found] ` <CAJFSNy5RA_PVxz0oPdKamF7Kc+1LFDZ9P0LbQH6EHuvqJO18xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-01 8:11 ` Erez Shitrit
2016-08-03 14:26 ` Doug Ledford
[not found] ` <1470234370.18081.63.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-08-03 14:32 ` Nikolay Borisov
[not found] ` <57A2006B.5070304-6AxghH7DbtA@public.gmane.org>
2016-08-03 16:25 ` Doug Ledford [this message]
2016-08-08 14:14 ` [PATCH] ipoib: Make ipoib_warn ratelimited Nikolay Borisov
[not found] ` <1470665662-24028-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-08-25 14:24 ` Doug Ledford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1470241525.18081.66.camel@redhat.com \
--to=dledford-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org \
--cc=kernel-6AxghH7DbtA@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
--cc=yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).