All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
To: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org
Subject: Re: [PATCH 2/2] ipoib: Ratelimit messages which can flood a host.
Date: Wed, 3 Aug 2016 17:32:11 +0300	[thread overview]
Message-ID: <57A2006B.5070304@kyup.com> (raw)
In-Reply-To: <1470234370.18081.63.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>



On 08/03/2016 05:26 PM, Doug Ledford wrote:
> On Tue, 2016-07-26 at 21:00 +0300, Yuval Shaia wrote:
>> On Tue, Jul 26, 2016 at 05:38:49PM +0300, Nikolay Borisov wrote:
>>>
>>> In case an infiniband outtage occurs the messages modified
>>> in this patchset can flood the host with a rate of 1 message per
>>
>> s/patchset/patch
>>
>>>
>>> ms which is a lot. Using the ratelimited version of ipoib_warn
>>> fixes the issue by limiting at most 10 messages in 5 seconds.
>>>
>>> Signed-off-by: Nikolay Borisov <kernel-6AxghH7DbtA@public.gmane.org>
>>> ---
>>>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
>>> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>>> index 7d3281866ffc..dd5b4afbc00b 100644
>>> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
>>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>>> @@ -1034,9 +1034,9 @@ static void ipoib_timeout(struct net_device
>>> *dev)
>>>  {
>>>  	struct ipoib_dev_priv *priv = netdev_priv(dev);
>>>  
>>> -	ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
>>> +	ipoib_warn_rl(priv, "transmit timeout: latency %d
>>> msecs\n",
>>>  		   jiffies_to_msecs(jiffies - dev->trans_start));
>>> -	ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail
>>> %u\n",
>>> +	ipoib_warn_rl(priv, "queue stopped %d, tx_head %u, tx_tail
>>> %u\n",
>>
>> So this is just to hide the hide the problem,
> 
> No, that's not true.  To hide the problem, he would have to remove the
> printk entirely.  Linus absolutely despises when programmers use
> BUG_ON() for situations where it is possible for the system to
> continue, albeit with reduced capability or what not.  Similarly, it is
> entirely unacceptable that a simple issue, such as a hardware queue
> stoppage on a network device, can render a system mostly or entirely or
> even just somewhat unusable due to a flood of kernel printk messages.
> 
> Just so we're clear, I absolutely despise printk floods.  There is
> never a valid reason for them.  If someone in the kernel community
> suggested that we make the default printk implementation be flood
> proof, I would support it.  In the meantime, I'm all for these patches.
>  And if there are other places in the ipoib code that can cause floods
> too, I would like to see them switched over to use the rate limited
> version too.
> 
> Now that I think about it, I'm just as likely to be receptive to a
> patch that makes the default ipoib_warn() always be rate limited, and
> if there are exceptions in the ipoib stack where we truly want every

The thing with this approach is that it's entirely possible to miss
messages if there is one particular ratelimit state.

> message, then change just those places to a different
> ipoib_warn_no_rate_limit() implementation.  I would probably modify the
> default rate though...10 in 5 seconds is probably too low for some of
> the messages, as we can legitimately get a burst of 50 in less than a
> second when doing things like joining lots of multicast groups and
> there is an issue, or something like that.  Maybe 100 in 60 to allow
> for bursts but to still prevent runaway printks like you are having?
> 
If you take a closer look at the definition of ipoib_warn_rl you would
see that it's enclosed in it's own {} block, meaning its ratelimit state
is going to apply to this particular place. In essence only the
"transmit timeout: latency msec" message would be rate limited to 10
messages in 5 seconds, which I believe is fine. In a flood situation
you'd get 10 identical messages and then there will be a line saying how
many were suppressed, indicating a flood.


>>  the real question is what is
>> causing this to happen a lot.
>>
>>>
>>>  		   netif_queue_stopped(dev),
>>>  		   priv->tx_head, priv->tx_tail);
>>>  	/* XXX reset QP, etc. */
>>> -- 
>>> 2.5.0
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-
>>> rdma" in
>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2016-08-03 14:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-26 14:38 [PATCH 1/2] ipoib: Add ratelimited version of ipoib_warn Nikolay Borisov
     [not found] ` <1469543929-17659-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 14:38   ` [PATCH 2/2] ipoib: Ratelimit messages which can flood a host Nikolay Borisov
     [not found]     ` <1469543929-17659-2-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-07-26 18:00       ` Yuval Shaia
     [not found]         ` <20160726180021.GA6144-Hxa29pjIrETlQW142y8m19+IiqhCXseY@public.gmane.org>
2016-07-26 22:02           ` Nikolay Borisov
     [not found]             ` <CAJFSNy5RA_PVxz0oPdKamF7Kc+1LFDZ9P0LbQH6EHuvqJO18xQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2016-08-01  8:11               ` Erez Shitrit
2016-08-03 14:26           ` Doug Ledford
     [not found]             ` <1470234370.18081.63.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-08-03 14:32               ` Nikolay Borisov [this message]
     [not found]                 ` <57A2006B.5070304-6AxghH7DbtA@public.gmane.org>
2016-08-03 16:25                   ` Doug Ledford
2016-08-08 14:14               ` [PATCH] ipoib: Make ipoib_warn ratelimited Nikolay Borisov
     [not found]                 ` <1470665662-24028-1-git-send-email-kernel-6AxghH7DbtA@public.gmane.org>
2016-08-25 14:24                   ` Doug Ledford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57A2006B.5070304@kyup.com \
    --to=kernel-6axghh7dbta@public.gmane.org \
    --cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=guysh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org \
    --cc=sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.