public inbox for linux-riscv@lists.infradead.org
* potential riscv special bug maybe found
@ 2023-03-09 15:18 Bo YU
  2023-08-09 14:47 ` Aurelien Jarno
  0 siblings, 1 reply; 4+ messages in thread
From: Bo YU @ 2023-03-09 15:18 UTC (permalink / raw)
  To: tsu.yubo, linux-riscv; +Cc: Andreas Gruenbacher


[-- Attachment #1.1: Type: text/plain, Size: 1301 bytes --]

Hi,

I am sorry if this is noise.

Some days ago I noticed that strace 6.2 failed to build on riscv64 because
of failing test cases[0]. One program from strace reproduces the problem:

```
./tests/read-write 
```

It hangs.

In fact, the issue has existed since 5.18. I ran `git bisect` and found
that the issue was introduced by this commit[1]:

commit 631f871f071746789e9242e514ab0f49067fa97a
Author: Andreas Gruenbacher <agruenba@redhat.com>
Date:   Tue Nov 9 12:56:06 2021 +0100

     fs/iomap: Fix buffered write page prefaulting

I do not think the commit itself is at fault, because it does not
affect any architecture other than riscv. After I reverted it, all
strace test cases pass (one case still fails on qemu, but that is
another story).

I tried to debug this but did not get anywhere.
Any help would be appreciated.

PS:
This is the output of `cat /proc/${read-write-pid}/stack` during the hang:
[<0>] generic_perform_write+0x12e/0x1ec
[<0>] ext4_buffered_write_iter+0x5e/0xe6
[<0>] ext4_file_write_iter+0xb4/0x67c
[<0>] vfs_write+0x1d2/0x308
[<0>] ksys_write+0x56/0xc6
[<0>] sys_write+0xe/0x16
[<0>] check_syscall_nr+0x3c/0x3c

[0]: https://github.com/strace/strace/issues/242
[1]: https://lkml.org/lkml/2021/11/23/641

-- 
Regards,
--
   Bo YU


[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 161 bytes --]

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: potential riscv special bug maybe found
  2023-03-09 15:18 potential riscv special bug maybe found Bo YU
@ 2023-08-09 14:47 ` Aurelien Jarno
  2023-08-10 11:09   ` Alexandre Ghiti
  0 siblings, 1 reply; 4+ messages in thread
From: Aurelien Jarno @ 2023-08-09 14:47 UTC (permalink / raw)
  To: linux-riscv; +Cc: Bo YU, Andreas Gruenbacher


[-- Attachment #1.1.1: Type: text/plain, Size: 1316 bytes --]

Hi,

Please find attached a simpler reproducer extracted from strace, which
should make the issue easier to reproduce. It hangs on riscv64 and needs
to be killed with -9, while it works fine on amd64.

Regards
Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                     http://aurel32.net

[-- Attachment #1.1.2: read-write.c --]
[-- Type: text/x-csrc, Size: 2277 bytes --]

/*
 * Check decoding and dumping of read and write syscalls.
 *
 * Copyright (c) 2016 Dmitry V. Levin <ldv@strace.io>
 * Copyright (c) 2016-2021 The strace developers.
 * All rights reserved.
 *
 * SPDX-License-Identifier: GPL-2.0-or-later
 */

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>

# define LENGTH_OF(arg) ((unsigned int) sizeof(arg) - 1)
# define ARRAY_SIZE(a_)        (sizeof(a_) / sizeof((a_)[0]))

static void *
tail_alloc(const size_t size)
{
	const size_t page_size = sysconf(_SC_PAGESIZE);
	const size_t len = (size + page_size - 1) & -page_size;
	const size_t alloc_size = len + 6 * page_size;

	void *p = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (MAP_FAILED == p) {
		perror("mmap");
		exit(1);
	}

	void *start_work = p + 3 * page_size;
	void *tail_guard = start_work + len;

	if (munmap(p, page_size) ||
	    munmap(p + 2 * page_size, page_size) ||
	    munmap(tail_guard, page_size) ||
	    munmap(tail_guard + 2 * page_size, page_size)) {
		perror("munmap");
		exit(1);
	}

	memset(start_work, 0xff, len);
	return tail_guard - size;
}

static void
fill_memory_ex(void *ptr, size_t size, unsigned char start,
	       unsigned int period)
{
	unsigned char *p = ptr;

	for (typeof(size) i = 0; i < size; ++i) {
		p[i] = start + i % period;
	}
}

int
main(void)
{
	static const char tmp[] = "read-write-tmpfile";
	long rc;
	long fdr;
	long fdw;

	unlink(tmp);

	fdr = open(tmp, O_CREAT|O_EXCL|O_RDONLY, 0600);
	if (fdr < 0) {
		perror("create");
		exit(1);
	}

	fdw = open(tmp, O_TRUNC|O_WRONLY);
	if (fdw < 0) {
		perror("open");
		exit(1);
	}

	static const char w_c[] = "0123456789abcde";
	const unsigned int w_len = LENGTH_OF(w_c);

	rc = write(fdw, w_c, w_len);
	if (rc != (int) w_len) {
		perror("write");
		exit(1);
	}

	static const size_t six_wide_size = 1 << 20;
	static const size_t fetch_size = 1 << 16;
	const size_t buf_size = six_wide_size + fetch_size;
	const size_t sizes[] = {
		buf_size,
		buf_size + 1,
	};
	char *big_buf = tail_alloc(buf_size);

	fill_memory_ex(big_buf, buf_size, 0, 0x100);

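	/*
	 * The first write fits the buffer exactly. The second asks for one
	 * byte more, and that byte lies in the unmapped guard page that
	 * tail_alloc() leaves right after the buffer.
	 */
	for (size_t i = 0; i < ARRAY_SIZE(sizes); i++) {
		write(fdw, big_buf, sizes[i]);
	}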

	close(fdw);

	return 0;
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: potential riscv special bug maybe found
  2023-08-09 14:47 ` Aurelien Jarno
@ 2023-08-10 11:09   ` Alexandre Ghiti
  2023-08-11  9:32     ` Alexandre Ghiti
  0 siblings, 1 reply; 4+ messages in thread
From: Alexandre Ghiti @ 2023-08-10 11:09 UTC (permalink / raw)
  To: Aurelien Jarno, linux-riscv; +Cc: Bo YU, Andreas Gruenbacher

Hi Aurélien, Bo,


Thanks for the reproducer. I was able to reproduce the problem: the
kernel is stuck trying to copy data from user space. I'm looking into it
right now, as this seems very weird. Note that I will be on vacation at the
end of the week; if I don't have time to fix this, I'll post my findings here.


Thanks again,

Alex



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: potential riscv special bug maybe found
  2023-08-10 11:09   ` Alexandre Ghiti
@ 2023-08-11  9:32     ` Alexandre Ghiti
  0 siblings, 0 replies; 4+ messages in thread
From: Alexandre Ghiti @ 2023-08-11  9:32 UTC (permalink / raw)
  To: Aurelien Jarno, linux-riscv; +Cc: Bo YU, Andreas Gruenbacher

Hi Aurélien, Bo,

So I was able to find the root cause and I'm about to send a fix. In a
nutshell, our copy_[from|to]_user and clear_user routines do not return
the number of bytes effectively written when a "fixup exception"
happens, which causes the hang you both observed.
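
To illustrate why a wrong return value can turn into a hang, here is a minimal user-space sketch in C. It is not the kernel code: `struct user_buf`, `copy_correct`, `copy_buggy`, `simulate_write` and the 8-byte `CHUNK` are hypothetical names made up for this illustration. The "buggy" behaviour (reporting that nothing was copied whenever a fault occurs) is an assumption about the failure mode, not taken from the riscv sources. The loop is only loosely modelled on a buffered write path that retries a chunk when the copy routine reports no progress.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 8	/* stand-in for a page-sized copy unit */

/* Hypothetical model of a user buffer: only `mapped` bytes are readable. */
struct user_buf {
	const unsigned char *data;
	size_t mapped;
};

/* Like copy_from_user(): returns the number of bytes NOT copied. */
typedef size_t (*copy_fn)(unsigned char *dst, const struct user_buf *src,
			  size_t off, size_t n);

/* Correct contract: copy what is readable, report the exact remainder. */
size_t copy_correct(unsigned char *dst, const struct user_buf *src,
		    size_t off, size_t n)
{
	size_t avail = off < src->mapped ? src->mapped - off : 0;
	size_t done = n < avail ? n : avail;

	memcpy(dst, src->data + off, done);
	return n - done;
}

/* Assumed buggy contract: on any fault, claim that nothing was copied. */
size_t copy_buggy(unsigned char *dst, const struct user_buf *src,
		  size_t off, size_t n)
{
	size_t left = copy_correct(dst, src, off, n);

	return left ? n : 0;
}

/*
 * Simplified write loop. Returns the number of bytes consumed, or -1
 * when the same chunk made no progress more than `max_retries` times.
 */
long simulate_write(copy_fn copy, const struct user_buf *src,
		    size_t len, unsigned int max_retries)
{
	unsigned char page[CHUNK];
	size_t written = 0;
	unsigned int retries = 0;

	while (written < len) {
		size_t n = len - written < CHUNK ? len - written : CHUNK;

		/* Prefault step: give up only if not even one byte is readable. */
		if (written >= src->mapped)
			break;

		size_t copied = n - copy(page, src, written, n);
		if (copied == 0) {
			/* No progress reported: retry the same chunk. */
			if (++retries > max_retries)
				return -1;	/* a loop without a budget would spin forever */
			continue;
		}
		retries = 0;
		written += copied;
	}
	return (long) written;
}
```

With a 20-byte readable buffer and a 21-byte request, `simulate_write(copy_correct, ...)` stops after 20 bytes. `simulate_write(copy_buggy, ...)` keeps retrying the last chunk, because part of it is always readable but the copy always reports zero progress, and returns -1 once the retry budget runs out.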

Thank you very much Aurélien for the reproducer, that really helps!

And thanks Bo for the initial report!

Alex



^ permalink raw reply	[flat|nested] 4+ messages in thread
