Linux EDAC development
From: Shiju Jose <shiju.jose@huawei.com>
To: Linux regressions mailing list <regressions@lists.linux.dev>,
	"rostedt@goodmis.org" <rostedt@goodmis.org>
Cc: "mhiramat@kernel.org" <mhiramat@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-trace-kernel@vger.kernel.org" 
	<linux-trace-kernel@vger.kernel.org>,
	tanxiaofei <tanxiaofei@huawei.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Linuxarm <linuxarm@huawei.com>,
	"mchehab@kernel.org" <mchehab@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: RE: [RFC PATCH V2 1/1] rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely
Date: Thu, 16 Feb 2023 13:40:08 +0000
Message-ID: <1e759e44f5e64b4e99096afd9e89b6dc@huawei.com>
In-Reply-To: <286293b4-5ae6-1348-9d69-7049ef5adf35@leemhuis.info>

Hello,

>-----Original Message-----
>From: Linux regression tracking (Thorsten Leemhuis)
><regressions@leemhuis.info>
>Sent: 16 February 2023 11:48
>To: rostedt@goodmis.org
>Cc: mhiramat@kernel.org; linux-kernel@vger.kernel.org;
>linux-trace-kernel@vger.kernel.org; tanxiaofei <tanxiaofei@huawei.com>;
>Jonathan Cameron <jonathan.cameron@huawei.com>;
>Linuxarm <linuxarm@huawei.com>;
>Linux kernel regressions list <regressions@lists.linux.dev>;
>Shiju Jose <shiju.jose@huawei.com>; mchehab@kernel.org;
>linux-edac@vger.kernel.org
>Subject: Re: [RFC PATCH V2 1/1] rasdaemon: Fix poll() on per_cpu
>trace_pipe_raw blocks indefinitely
>
>Hi, this is your Linux kernel regression tracker.

The kernel fix for this issue is already in mainline. Please see commit
3e46d910d8acf94e5360126593b68bf4fee4c4a1
("tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw").

>
>On 04.02.23 20:33, shiju.jose@huawei.com wrote:
>> From: Shiju Jose <shiju.jose@huawei.com>
>>
>> Error events have not been received in rasdaemon since kernel 6.1-rc6.
>> This issue was first detected and reported while testing the CXL
>> error events in rasdaemon.
>
>Thanks for working on this. This submission looks stalled, unless I missed
>something. This is unfortunate, as this afaics is fixing a regression (caused by a
>commit from Steven). Hence it would be good to get this fixed rather sooner
>than later. Or is the RFC in the subject the reason why there was no progress? Is
>it maybe time to remove it?

I have opened a pull request for this rasdaemon patch here:
https://github.com/mchehab/rasdaemon/pull/86

>
>> Debugging showed that poll() on trace_pipe_raw in ras-events.c does
>> not return, and that this issue appears after commit
>> 42fb0a1e84ff525ebe560e2baf9451ab69127e2b ("tracing/ring-buffer: Have
>> polling block on watermark").
>>
>> This was also verified using a test application for poll() and
>> select() on trace_pipe_raw.
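
For illustration, a check of this kind can be sketched as below. This is not
the test application mentioned above. The helper name wait_for_trace_data(),
its timeout parameter and its return convention are assumptions made for the
example. It opens one per_cpu trace_pipe_raw file and reports whether poll()
sees readable data before the timeout expires.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll one trace_pipe_raw file for readable data.
 * Returns 1 if data is readable, 0 if the timeout expired first, or a
 * negative errno value if open() or poll() failed.
 */
static int wait_for_trace_data(const char *pipe_path, int timeout_ms)
{
	struct pollfd pfd;
	int fd, ready, ret;

	fd = open(pipe_path, O_RDONLY | O_NONBLOCK);
	if (fd < 0)
		return -errno;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	ready = poll(&pfd, 1, timeout_ms);
	if (ready < 0)
		ret = -errno;		/* poll() itself failed */
	else if (ready == 0)
		ret = 0;		/* timed out, nothing readable */
	else
		ret = (pfd.revents & POLLIN) ? 1 : -EIO;

	close(fd);
	return ret;
}
```

With an affected kernel and the default buffer_percent, a call such as
wait_for_trace_data(".../per_cpu/cpu0/trace_pipe_raw", 5000) would be expected
to return 0 even after events were generated. A finite timeout is used here so
that the check itself cannot hang.
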
>>
>> There is also a bug reported on this issue,
>> https://lore.kernel.org/all/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/
>
>> This issue occurs in the per_cpu case, which calls
>> ring_buffer_poll_wait() in kernel/trace/ring_buffer.c with
>> buffer_percent > 0 and then waits until that percentage of pages is
>> available. The default value of buffer_percent is 50, set in
>> kernel/trace/trace.c. However, poll() does not return even when the
>> percentage-of-pages condition is met.
>>
>> As a fix, rasdaemon sets buffer_percent to 0 through
>> /sys/kernel/debug/tracing/instances/rasdaemon/buffer_percent, so that
>> the task wakes up as soon as data is added to any of the per-CPU
>> buffers and poll() on per_cpu/cpuX/trace_pipe_raw does not block
>> indefinitely.
>>
>> This patch depends on the kernel RFC patch
>> "tracing: Fix poll() and select() do not work on per_cpu trace_pipe and
>> trace_pipe_raw".
>
>BTW, this patch afaics should have these tags:
>
>Fixes: 42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
>Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
>Link: https://lore.kernel.org/r/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/

Yes, I had included the link in the patch header.

>
>And likely a
>
>Cc: <stable@vger.kernel.org> # 6.1.x
>
>Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>--
>Everything you wanna know about Linux kernel regression tracking:
>https://linux-regtracking.leemhuis.info/about/#tldr
>If I did something stupid, please tell me, as explained on that page.
>
>#regzbot poke
>#regzbot ^backmonitor:
>https://lore.kernel.org/r/31eb3b12-3350-90a4-a0d9-d1494db7cf74@oracle.com/
>
>> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
>>
>> Changes:
>> RFC V1 -> RFC V2
>> 1. Rename the patch header subject.
>> 2. Changes for backward compatibility with old kernels.
>> ---
>>  ras-events.c | 22 ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/ras-events.c b/ras-events.c
>> index 3691311..e505a0e 100644
>> --- a/ras-events.c
>> +++ b/ras-events.c
>> @@ -383,6 +383,8 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
>>  	int warnonce[n_cpus];
>>  	char pipe_raw[PATH_MAX];
>>  	int legacy_kernel = 0;
>> +	int fd;
>> +	char buf[10];
>>  #if 0
>>  	int need_sleep = 0;
>>  #endif
>> @@ -402,6 +404,26 @@ static int read_ras_event_all_cpus(struct pthread_data *pdata,
>>  		return -ENOMEM;
>>  	}
>>
>> +	/* Fix for poll() on the per_cpu trace_pipe and trace_pipe_raw blocks
>> +	 * indefinitely with the default buffer_percent in the kernel trace system,
>> +	 * which is introduced by the following change in the kernel.
>> +	 * https://lore.kernel.org/all/20221020231427.41be3f26@gandalf.local.home/T/#u.
>> +	 * Set buffer_percent to 0 so that poll() will return immediately
>> +	 * when the trace data is available in the ras per_cpu trace pipe_raw
>> +	 */
>> +	fd = open_trace(pdata[0].ras, "buffer_percent", O_WRONLY);
>> +	if (fd >= 0) {
>> +		/* For the backward compatabilty to the old kernel, do not return
>> +		 * if fail to set the buffer_percent.
>> +		 */
>> +		snprintf(buf, sizeof(buf), "0");
>> +		size = write(fd, buf, strlen(buf));
>> +		if (size <= 0)
>> +			log(TERM, LOG_WARNING, "can't write to buffer_percent\n");
>> +		close(fd);
>> +	} else
>> +		log(TERM, LOG_WARNING, "Can't open buffer_percent\n");
>> +
>>  	for (i = 0; i < (n_cpus + 1); i++)
>>  		fds[i].fd = -1;
>>

Thanks,
Shiju

Thread overview: 4+ messages
2023-02-04 19:33 [RFC PATCH V2 1/1] rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely shiju.jose
2023-02-16 11:47 ` Linux regression tracking (Thorsten Leemhuis)
2023-02-16 13:40   ` Shiju Jose [this message]
2023-02-16 13:55     ` Linux regression tracking (Thorsten Leemhuis)
