From: "Ihor Solodrai" <ihor.solodrai@linux.dev>
To: "David Howells" <dhowells@redhat.com>,
"Marc Dionne" <marc.dionne@auristor.com>,
"Steve French" <stfrench@microsoft.com>
Cc: dhowells@redhat.com, "Eric Van Hensbergen" <ericvh@kernel.org>,
"Latchesar Ionkov" <lucho@ionkov.net>,
"Dominique Martinet" <asmadeus@codewreck.org>,
"Christian Schoenebeck" <linux_oss@crudebyte.com>,
"Paulo Alcantara" <pc@manguebit.com>,
"Jeff Layton" <jlayton@kernel.org>,
"Christian Brauner" <brauner@kernel.org>,
v9fs@lists.linux.dev, linux-cifs@vger.kernel.org,
netfs@lists.linux.dev, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, ast@kernel.org,
bpf@vger.kernel.org
Subject: Re: [PATCH] netfs: Add retry stat counters
Date: Tue, 28 Jan 2025 19:11:04 +0000 [thread overview]
Message-ID: <335ad811ae2cf5ebdfc494c185b9f02e9ca40c3e@linux.dev> (raw)
In-Reply-To: <3187377.1738056789@warthog.procyon.org.uk>
January 28, 2025 at 1:33 AM, "David Howells" <dhowells@redhat.com> wrote:
>
> Here's an additional patch to allow stats on the number of retries to be
>
> obtained. This isn't a fix per se.
>
> David
>
> [...]
Hi David, Marc.
I regret to report that this patch didn't fix the hanging in
v9fs_evict_inode when running selftests/bpf.
Here is what I tried to test:
* Checked out latest bpf-next source tree (0fc5dddb9409)
* Applied patch: https://lore.kernel.org/netfs/3173328.1738024385@warthog.procyon.org.uk/
* Applied retry stats patch: https://lore.kernel.org/netfs/3187377.1738056789@warthog.procyon.org.uk/
* Modified tools/testing/selftests/bpf/config to enable /proc/fs/netfs/stats
* Modified CI scripts to collect the stats
* Ran the shell script reproducing the CI testing pipeline
Bash piece starting a process collecting /proc/fs/netfs/stats:
function tail_netfs {
echo -n > /mnt/vmtest/netfs-stats.log
while true; do
echo >> /mnt/vmtest/netfs-stats.log
cat /proc/fs/netfs/stats >> /mnt/vmtest/netfs-stats.log
sleep 1
done
}
export -f tail_netfs
nohup bash -c 'tail_netfs' & disown
Last recored /proc/fs/netfs/stats (note 0 retries):
Reads : DR=0 RA=15184 RF=5 RS=0 WB=0 WBZ=0
Writes : BW=488 WT=0 DW=0 WP=488 2C=0
ZeroOps: ZR=7964 sh=0 sk=0
DownOps: DL=15189 ds=15189 df=0 di=0
CaRdOps: RD=0 rs=0 rf=0
UpldOps: UL=488 us=488 uf=0
CaWrOps: WR=0 ws=0 wf=0
Retries: rq=0 rs=0 wq=0 ws=0
Objs : rr=2 sr=1 foq=1 wsc=0
WbLock : skip=0 wait=0
-- FS-Cache statistics --
Cookies: n=0 v=0 vcol=0 voom=0
Acquire: n=0 ok=0 oom=0
LRU : n=0 exp=0 rmv=0 drp=0 at=0
Invals : n=0
Updates: n=0 rsz=0 rsn=0
Relinqs: n=0 rtr=0 drop=0
NoSpace: nwr=0 ncr=0 cull=0
IO : rd=0 wr=0 mis=0
The stack on hung task hasn't changed:
[ 184.375149] INFO: task modprobe:2759 blocked for more than 20 seconds.
[ 184.376149] Tainted: G OE 6.13.0-gbb67a65a921c-dirty #3
[ 184.376593] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 184.377119] task:modprobe state:D stack:0 pid:2759 tgid:2759 ppid:455 flags:0x00004002
[ 184.377701] Call Trace:
[ 184.377886] <TASK>
[ 184.378039] __schedule+0xa91/0xe80
[ 184.378282] schedule+0x41/0xb0
[ 184.378490] v9fs_evict_inode+0xfe/0x170
[ 184.378754] ? __pfx_var_wake_function+0x10/0x10
[ 184.379070] evict+0x1ef/0x360
[ 184.379288] __dentry_kill+0xb0/0x220
[ 184.379528] ? dput+0x3a/0x1d0
[ 184.379736] dput+0x114/0x1d0
[ 184.379946] __fput+0x136/0x2b0
[ 184.380158] task_work_run+0x89/0xc0
[ 184.380396] do_exit+0x2c6/0x9c0
[ 184.380617] do_group_exit+0xa4/0xb0
[ 184.380870] __x64_sys_exit_group+0x17/0x20
[ 184.381137] x64_sys_call+0x21a0/0x21a0
[ 184.381386] do_syscall_64+0x79/0x120
[ 184.381630] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 184.381969] RIP: 0033:0x7f817bf7c21d
[ 184.382202] RSP: 002b:00007fff92c8d148 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
[ 184.382676] RAX: ffffffffffffffda RBX: 00007f817c092fa8 RCX: 00007f817bf7c21d
[ 184.383138] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001
[ 184.383582] RBP: 00007fff92c8d1a0 R08: 00007fff92c8d0e8 R09: 0000000000000000
[ 184.384042] R10: 00007fff92c8d05f R11: 0000000000000206 R12: 0000000000000001
[ 184.384486] R13: 0000000000000000 R14: 0000000000000001 R15: 00007f817c092fc0
[ 184.384963] </TASK>
[ 184.385112]
[ 184.385112] Showing all locks held in the system:
[ 184.385499] 1 lock held by khungtaskd/32:
[ 184.385793] #0: ffffffff9d195d90 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180
[ 184.386366] 2 locks held by kworker/u8:10/455:
[ 184.386649] #0: ffffa1a240104d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x23a/0x600
[ 184.387357] #1: ffffb06380a23e20 ((work_completion)(&sub_info->work)){+.+.}-{0:0}, at: process_scheduled_works+0x25a/0x600
[ 184.388076]
[ 184.388183] =============================================
I pushed full logs to github:
https://github.com/kernel-patches/bpf/commit/88c0d0e1692b04c0d54b7c1941003758d23e0d6a
I recommend trying to reproduce with steps I shared in my initial report:
https://lore.kernel.org/bpf/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/
I know it may not be very convenient due to all the CI stuff, but you
should be able to use it to iterate on the kernel source locally and
narrow down the problem.
I have everything set up, so you also might share some debugging code
for me to run if you prefer.
Thanks.
---
Not directly related, but it took me a while to figure out how to
collect the netfs stats.
I first added:
CONFIG_NETFS_DEBUG=y
CONFIG_NETFS_STATS=y
But that didn't work, because /proc/fs/netfs/stats is created only
with CONFIG_FSCACHE_STATS (fs/netfs/main.c):
#ifdef CONFIG_FSCACHE_STATS
if (!proc_create_single("fs/netfs/stats", S_IFREG | 0444, NULL,
netfs_stats_show))
goto error_procfile;
#endif
And that depends on CONFIG_FSCACHE=y, so I ended up with:
CONFIG_FSCACHE=y
CONFIG_FSCACHE_STATS=y
CONFIG_NETFS_SUPPORT=y
CONFIG_NETFS_DEBUG=y
CONFIG_NETFS_STATS=y
next prev parent reply other threads:[~2025-01-28 19:11 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-28 0:33 [PATCH] netfs: Fix a number of read-retry hangs David Howells
2025-01-28 9:33 ` [PATCH] netfs: Add retry stat counters David Howells
2025-01-28 19:11 ` Ihor Solodrai [this message]
2025-02-03 18:01 ` Ihor Solodrai
2025-02-10 10:57 ` David Howells
2025-02-10 11:12 ` David Howells
2025-02-10 21:54 ` Ihor Solodrai
2025-02-10 23:18 ` David Howells
2025-02-11 0:54 ` Ihor Solodrai
2025-02-12 9:47 ` [PATCH] netfs: Fix setting NETFS_RREQ_ALL_QUEUED to be after all subreqs queued David Howells
2025-02-12 18:01 ` Ihor Solodrai
2025-01-28 16:51 ` [PATCH] netfs: Fix a number of read-retry hangs Marc Dionne
2025-02-11 1:07 ` Steve French
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=335ad811ae2cf5ebdfc494c185b9f02e9ca40c3e@linux.dev \
--to=ihor.solodrai@linux.dev \
--cc=asmadeus@codewreck.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=dhowells@redhat.com \
--cc=ericvh@kernel.org \
--cc=jlayton@kernel.org \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux_oss@crudebyte.com \
--cc=lucho@ionkov.net \
--cc=marc.dionne@auristor.com \
--cc=netfs@lists.linux.dev \
--cc=pc@manguebit.com \
--cc=stfrench@microsoft.com \
--cc=v9fs@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).