All of lore.kernel.org
 help / color / mirror / Atom feed
From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>,
	Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org
Subject: Re: [PATCH 2/2] ipoib: Ratelimit messages which can flood a host.
Date: Wed, 03 Aug 2016 12:25:25 -0400	[thread overview]
Message-ID: <1470241525.18081.66.camel@redhat.com> (raw)
In-Reply-To: <57A2006B.5070304-6AxghH7DbtA@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 5432 bytes --]

On Wed, 2016-08-03 at 17:32 +0300, Nikolay Borisov wrote:
> 
> On 08/03/2016 05:26 PM, Doug Ledford wrote:
> > 
> > On Tue, 2016-07-26 at 21:00 +0300, Yuval Shaia wrote:
> > > 
> > > On Tue, Jul 26, 2016 at 05:38:49PM +0300, Nikolay Borisov wrote:
> > > > 
> > > > 
> > > > In case an infiniband outtage occurs the messages modified
> > > > in this patchset can flood the host with a rate of 1 message
> > > > per
> > > 
> > > s/patchset/patch
> > > 
> > > > 
> > > > 
> > > > ms which is a lot. Using the ratelimited version of ipoib_warn
> > > > fixes the issue by limiting at most 10 messages in 5 seconds.
> > > > 
> > > > Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
> > > > ---
> > > >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > index 7d3281866ffc..dd5b4afbc00b 100644
> > > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > > @@ -1034,9 +1034,9 @@ static void ipoib_timeout(struct
> > > > net_device
> > > > *dev)
> > > >  {
> > > >  	struct ipoib_dev_priv *priv = netdev_priv(dev);
> > > >  
> > > > -	ipoib_warn(priv, "transmit timeout: latency %d
> > > > msecs\n",
> > > > +	ipoib_warn_rl(priv, "transmit timeout: latency %d
> > > > msecs\n",
> > > >  		   jiffies_to_msecs(jiffies - dev-
> > > > >trans_start));
> > > > -	ipoib_warn(priv, "queue stopped %d, tx_head %u,
> > > > tx_tail
> > > > %u\n",
> > > > +	ipoib_warn_rl(priv, "queue stopped %d, tx_head %u,
> > > > tx_tail
> > > > %u\n",
> > > 
> > > So this is just to hide the hide the problem,
> > 
> > No, that's not true.  To hide the problem, he would have to remove
> > the
> > printk entirely.  Linus absolutely despises when programmers use
> > BUG_ON() for situations where it is possible for the system to
> > continue, albeit with reduced capability or what not.  Similarly,
> > it is
> > entirely unacceptable that a simple issue, such as a hardware queue
> > stoppage on a network device, can render a system mostly or
> > entirely or
> > even just somewhat unusable due to a flood of kernel printk
> > messages.
> > 
> > Just so we're clear, I absolutely despise printk floods.  There is
> > never a valid reason for them.  If someone in the kernel community
> > suggested that we make the default printk implementation be flood
> > proof, I would support it.  In the meantime, I'm all for these
> > patches.
> >  And if there are other places in the ipoib code that can cause
> > floods
> > too, I would like to see them switched over to use the rate limited
> > version too.
> > 
> > Now that I think about it, I'm just as likely to be receptive to a
> > patch that makes the default ipoib_warn() always be rate limited,
> > and
> > if there are exceptions in the ipoib stack where we truly want
> > every
> 
> The thing with this approach is that it's entirely possible to miss
> messages if there is one particular ratelimit state.
> 
> > 
> > message, then change just those places to a different
> > ipoib_warn_no_rate_limit() implementation.  I would probably modify
> > the
> > default rate though...10 in 5 seconds is probably too low for some
> > of
> > the messages, as we can legitimately get a burst of 50 in less than
> > a
> > second when doing things like joining lots of multicast groups and
> > there is an issue, or something like that.  Maybe 100 in 60 to
> > allow
> > for bursts but to still prevent runaway printks like you are
> > having?
> > 
> If you take a closer look at the definition of ipoib_warn_rl you
> would
> see that it's enclosed in it's own {} block, meaning its ratelimit
> state
> is going to apply to this particular place. In essence only the
> "transmit timeout: latency msec" message would be rate limited to 10
> messages in 5 seconds, which I believe is fine. In a flood situation
> you'd get 10 identical messages and then there will be a line saying
> how
> many were suppressed, indicating a flood.

I saw that.  You're missing my point.  I was referring to keeping the
individual {} block in the define of ipoib_warn() instead of creating a
new ipoib_warn_rl() define, but I was commenting that some of the
messages in the ipoib file really need a higher burst, and since even
though each of these messages would have their own rate limit state,
they still all share a common rate limit config, the config would need
to be bumped up for some of the spammier message spots in ipoib.

> 
> > 
> > > 
> > >  the real question is what is
> > > causing this to happen a lot.
> > > 
> > > > 
> > > > 
> > > >  		   netif_queue_stopped(dev),
> > > >  		   priv->tx_head, priv->tx_tail);
> > > >  	/* XXX reset QP, etc. */
> > > > -- 
> > > > 2.5.0
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe
> > > > linux-
> > > > rdma" in
> > > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.h
> > > > tml
> > 

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  parent reply	other threads:[~2016-08-03 16:25 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-26 14:38 [PATCH 1/2] ipoib: Add ratelimited version of ipoib_warn Nikolay Borisov
     [not found] ` <1469543929-17659-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 14:38   ` [PATCH 2/2] ipoib: Ratelimit messages which can flood a host Nikolay Borisov
     [not found]     ` <1469543929-17659-2-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 18:00       ` Yuval Shaia
     [not found]         ` <20160726180021.GA6144-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
2016-07-26 22:02           ` Nikolay Borisov
     [not found]             ` <CAJFSNy5RA_PVxz0oPdKamF7Kc+1LFDZ9P0LbQH6EHuvqJO18xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-01  8:11               ` Erez Shitrit
2016-08-03 14:26           ` Doug Ledford
     [not found]             ` <1470234370.18081.63.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-08-03 14:32               ` Nikolay Borisov
     [not found]                 ` <57A2006B.5070304-6AxghH7DbtA@public.gmane.org>
2016-08-03 16:25                   ` Doug Ledford [this message]
2016-08-08 14:14               ` [PATCH] ipoib: Make ipoib_warn ratelimited Nikolay Borisov
     [not found]                 ` <1470665662-24028-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-08-25 14:24                   ` Doug Ledford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1470241525.18081.66.camel@redhat.com \
    --to=dledford-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
    --cc=guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org \
    --cc=kernel-6AxghH7DbtA@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.