From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Jonathan Toppins
<jtoppins-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
Vlastimil Babka <vbabka-AlSwsSmVLrQ@public.gmane.org>,
Mel Gorman
<mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt@public.gmane.org>,
Hillf Danton <hillf.zj-gPhfCIXyaqCqndwCJWfcng@public.gmane.org>,
open list <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH] mm: ratelimit PFNs busy info message
Date: Wed, 02 Aug 2017 14:05:16 -0400
Message-ID: <1501697116.109555.9.camel@redhat.com>
In-Reply-To: <499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Wed, 2017-08-02 at 13:44 -0400, Jonathan Toppins wrote:
> The RDMA subsystem can generate several thousand of these messages per
> second eventually leading to a kernel crash. Ratelimit these messages
> to prevent this crash.
>
> Signed-off-by: Jonathan Toppins <jtoppins-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Reviewed-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Tested-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
> mm/page_alloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6d30e914afb6..07b7d3060b21 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7666,7 +7666,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  
>  	/* Make sure the range is really isolated. */
>  	if (test_pages_isolated(outer_start, end, false)) {
> -		pr_info("%s: [%lx, %lx) PFNs busy\n",
> +		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
>  			__func__, outer_start, end);
>  		ret = -EBUSY;
>  		goto done;
FWIW, I've been carrying a version of this patch for several kernel versions.
I don't remember when the messages started, but we have one (and only one)
class of machines, the Dell PE R730xd, that generates them. When it
happens without a rate limit, we get RCU timeouts and kernel oopses.
With the rate limit, we just get a lot of annoying kernel messages, but
the machine continues on, recovers, and eventually the memory
operations all succeed.
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
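
For readers unfamiliar with what the one-line change buys, here is a
minimal userspace sketch of interval/burst rate limiting, the general
scheme behind pr_info_ratelimited(). It is not the kernel's code: the
struct and function names (rl_state, rl_allow, simulate) are hypothetical,
and time is passed in explicitly as milliseconds so the behaviour is easy
to check. To my knowledge the kernel defaults are a 5 second interval and
a burst of 10 (DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST), which
is what the usage note below assumes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; names are illustrative, not the kernel's. */
struct rl_state {
	long interval_ms;   /* length of one rate-limit window */
	int burst;          /* messages allowed per window */
	long window_start;  /* start of the current window, in ms */
	int printed;        /* messages let through in this window */
	int missed;         /* messages suppressed in this window */
};

/* Return true if a message offered at time now_ms may be emitted. */
static bool rl_allow(struct rl_state *rs, long now_ms)
{
	assert(rs->burst > 0 && rs->interval_ms > 0);

	/* Window expired: start a new one and reset the counters. */
	if (now_ms - rs->window_start >= rs->interval_ms) {
		rs->window_start = now_ms;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	/* Over the burst limit: drop the message but count it. */
	rs->missed++;
	return false;
}

/* Offer one message every step_ms for total_ms; return how many got out. */
static int simulate(struct rl_state *rs, long total_ms, long step_ms)
{
	int emitted = 0;

	for (long t = 0; t < total_ms; t += step_ms)
		if (rl_allow(rs, t))
			emitted++;
	return emitted;
}
```

With an interval of 5000 ms and a burst of 10, offering one message per
millisecond for ten seconds lets 20 of the 10,000 messages through, ten
per window. That matches the behaviour described above: the log still
shows the condition, but the console is no longer flooded.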