From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org
Subject: Re: [PATCH 2/2] ipoib: Ratelimit messages which can flood a host.
Date: Wed, 03 Aug 2016 10:26:10 -0400 [thread overview]
Message-ID: <1470234370.18081.63.camel@redhat.com> (raw)
In-Reply-To: <20160726180021.GA6144-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 3658 bytes --]
On Tue, 2016-07-26 at 21:00 +0300, Yuval Shaia wrote:
> On Tue, Jul 26, 2016 at 05:38:49PM +0300, Nikolay Borisov wrote:
> >
> > In case an infiniband outtage occurs the messages modified
> > in this patchset can flood the host with a rate of 1 message per
>
> s/patchset/patch
>
> >
> > ms which is a lot. Using the ratelimited version of ipoib_warn
> > fixes the issue by limiting at most 10 messages in 5 seconds.
> >
> > Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> > ---
> > drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 7d3281866ffc..dd5b4afbc00b 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -1034,9 +1034,9 @@ static void ipoib_timeout(struct net_device
> > *dev)
> > {
> > struct ipoib_dev_priv *priv = netdev_priv(dev);
> >
> > - ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
> > + ipoib_warn_rl(priv, "transmit timeout: latency %d
> > msecs\n",
> > jiffies_to_msecs(jiffies - dev->trans_start));
> > - ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail
> > %u\n",
> > + ipoib_warn_rl(priv, "queue stopped %d, tx_head %u, tx_tail
> > %u\n",
>
> So this is just to hide the hide the problem,
No, that's not true. To hide the problem, he would have to remove the
printk entirely. Linus absolutely despises when programmers use
BUG_ON() for situations where it is possible for the system to
continue, albeit with reduced capability or what not. Similarly, it is
entirely unacceptable that a simple issue, such as a hardware queue
stoppage on a network device, can render a system mostly or entirely or
even just somewhat unusable due to a flood of kernel printk messages.
Just so we're clear, I absolutely despise printk floods. There is
never a valid reason for them. If someone in the kernel community
suggested that we make the default printk implementation be flood
proof, I would support it. In the meantime, I'm all for these patches.
And if there are other places in the ipoib code that can cause floods
too, I would like to see them switched over to use the rate limited
version too.
Now that I think about it, I'm just as likely to be receptive to a
patch that makes the default ipoib_warn() always be rate limited, and
if there are exceptions in the ipoib stack where we truly want every
message, then change just those places to a different
ipoib_warn_no_rate_limit() implementation. I would probably modify the
default rate though...10 in 5 seconds is probably too low for some of
the messages, as we can legitimately get a burst of 50 in less than a
second when doing things like joining lots of multicast groups and
there is an issue, or something like that. Maybe 100 in 60 to allow
for bursts but to still prevent runaway printks like you are having?
> the real question is what is
> causing this to happen a lot.
>
> >
> > netif_queue_stopped(dev),
> > priv->tx_head, priv->tx_tail);
> > /* XXX reset QP, etc. */
> > --
> > 2.5.0
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-
> > rdma" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
next prev parent reply other threads:[~2016-08-03 14:26 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-26 14:38 [PATCH 1/2] ipoib: Add ratelimited version of ipoib_warn Nikolay Borisov
[not found] ` <1469543929-17659-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 14:38 ` [PATCH 2/2] ipoib: Ratelimit messages which can flood a host Nikolay Borisov
[not found] ` <1469543929-17659-2-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 18:00 ` Yuval Shaia
[not found] ` <20160726180021.GA6144-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
2016-07-26 22:02 ` Nikolay Borisov
[not found] ` <CAJFSNy5RA_PVxz0oPdKamF7Kc+1LFDZ9P0LbQH6EHuvqJO18xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-01 8:11 ` Erez Shitrit
2016-08-03 14:26 ` Doug Ledford [this message]
[not found] ` <1470234370.18081.63.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-08-03 14:32 ` Nikolay Borisov
[not found] ` <57A2006B.5070304-6AxghH7DbtA@public.gmane.org>
2016-08-03 16:25 ` Doug Ledford
2016-08-08 14:14 ` [PATCH] ipoib: Make ipoib_warn ratelimited Nikolay Borisov
[not found] ` <1470665662-24028-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-08-25 14:24 ` Doug Ledford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1470234370.18081.63.camel@redhat.com \
--to=dledford-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org \
--cc=kernel-6AxghH7DbtA@public.gmane.org \
--cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
--cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
--cc=yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).