From: dai.ngo@oracle.com
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: chuck.lever@oracle.com, linux-nfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC v4 0/2] nfsd: Initial implementation of NFSv4 Courteous Server
Date: Mon, 27 Sep 2021 23:22:54 -0700 [thread overview]
Message-ID: <4f5cdc04-9584-282c-8add-4b0c1e313f1b@oracle.com> (raw)
In-Reply-To: <20210927203952.GB1704@fieldses.org>
On 9/27/21 1:39 PM, J. Bruce Fields wrote:
> On Mon, Sep 27, 2021 at 04:14:33PM -0400, J. Bruce Fields wrote:
>> The file_rwsem is used for /proc/locks; only the code that produces the
>> /proc/locks output calls down_write, the rest only calls down_read.
>>
>> I assumed that it was OK to nest read acquisitions of a rwsem, but I
>> think that's wrong.
>>
>> I think it should be no big deal to move the lm_expire_lock(.,0) call
>> outside of the file_rwsem?
> You probably want to turn on LOCKDEP for any more testing.
I use 'make menuconfig' to turn on LOCKDEP by enabling
'Kernel hacking/Lock debugging/prove locking correctness'. However
when I boot this kernel, DHCP failed to get an IP address making
the system inaccessible from the network. I'm not sure if it's
related to this error:
Sep 27 23:14:51 nfsvme24 kernel: unchecked MSR access error: WRMSR to 0xe00 (tried to write 0x0000000000010003) at rIP: 0xffffffff8101584d (wrmsrl+0xb/0x1f)
Sep 27 23:14:51 nfsvme24 kernel: Call Trace:
Sep 27 23:14:51 nfsvme24 kernel: uncore_box_ref.part.0+0x60/0x78
Sep 27 23:14:51 nfsvme24 kernel: ? uncore_pci_pmu_register+0xea/0xea
Sep 27 23:14:51 nfsvme24 kernel: uncore_event_cpu_online+0x51/0x107
Sep 27 23:14:51 nfsvme24 kernel: ? uncore_pci_pmu_register+0xea/0xea
Sep 27 23:14:51 nfsvme24 kernel: cpuhp_invoke_callback+0xb2/0x23d
Sep 27 23:14:51 nfsvme24 kernel: ? __schedule+0x5d3/0x625
Sep 27 23:14:51 nfsvme24 kernel: cpuhp_thread_fun+0xc6/0x111
Sep 27 23:14:51 nfsvme24 kernel: ? smpboot_register_percpu_thread+0xcc/0xcc
Sep 27 23:14:51 nfsvme24 kernel: smpboot_thread_fn+0x1b1/0x1c6
Sep 27 23:14:51 nfsvme24 kernel: kthread+0x107/0x10f
Sep 27 23:14:51 nfsvme24 kernel: ? kthread_flush_worker+0x75/0x75
Sep 27 23:14:51 nfsvme24 kernel: ret_from_fork+0x22/0x30
Here is the diff of the working config and non-working (LOCKDEP) config:
[dngo@nfsdev linux]$ diff .config .config-LOCKDEP
245c245
< # CONFIG_KALLSYMS_ALL is not set
---
> CONFIG_KALLSYMS_ALL=y
470a471
> # CONFIG_LIVEPATCH is not set
905,909c906
< CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
< CONFIG_INLINE_READ_UNLOCK=y
< CONFIG_INLINE_READ_UNLOCK_IRQ=y
< CONFIG_INLINE_WRITE_UNLOCK=y
< CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
---
> CONFIG_UNINLINE_SPIN_UNLOCK=y
4463,4470c4460,4475
< # CONFIG_PROVE_LOCKING is not set
< # CONFIG_LOCK_STAT is not set
< # CONFIG_DEBUG_RT_MUTEXES is not set
< # CONFIG_DEBUG_SPINLOCK is not set
< # CONFIG_DEBUG_MUTEXES is not set
< # CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
< # CONFIG_DEBUG_RWSEMS is not set
< # CONFIG_DEBUG_LOCK_ALLOC is not set
---
> CONFIG_PROVE_LOCKING=y
> CONFIG_PROVE_RAW_LOCK_NESTING=y
> CONFIG_LOCK_STAT=y
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
> CONFIG_DEBUG_RWSEMS=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_LOCKDEP=y
> CONFIG_LOCKDEP_BITS=15
> CONFIG_LOCKDEP_CHAINS_BITS=16
> CONFIG_LOCKDEP_STACK_TRACE_BITS=19
> CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
> CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
> CONFIG_DEBUG_LOCKDEP=y
4479c4484,4486
< # CONFIG_DEBUG_IRQFLAGS is not set
---
> CONFIG_TRACE_IRQFLAGS=y
> CONFIG_TRACE_IRQFLAGS_NMI=y
> CONFIG_DEBUG_IRQFLAGS=y
4498a4506
> CONFIG_PROVE_RCU=y
4526a4535
> CONFIG_PREEMPTIRQ_TRACEPOINTS=y
[dngo@nfsdev linux]$
can you share your config with LOCKDEP enabled so I can give
it a try?
Thanks,
-Dai
>
> I wonder if there's any potential issue with the other locks held here
> (st_mutex, rp_mutex).
>
> --b.
>
>> --b.
>>
>> [ 959.807364] ============================================
>> [ 959.807803] WARNING: possible recursive locking detected
>> [ 959.808228] 5.15.0-rc2-00009-g4e5af4d2635a #533 Not tainted
>> [ 959.808675] --------------------------------------------
>> [ 959.809189] nfsd/5675 is trying to acquire lock:
>> [ 959.809664] ffffffff8519e470 (file_rwsem){.+.+}-{0:0}, at: locks_remove_posix+0x37f/0x4e0
>> [ 959.810647]
>> but task is already holding lock:
>> [ 959.811097] ffffffff8519e470 (file_rwsem){.+.+}-{0:0}, at: nfsd4_lock+0xcb9/0x3850 [nfsd]
>> [ 959.812147]
>> other info that might help us debug this:
>> [ 959.812698] Possible unsafe locking scenario:
>>
>> [ 959.813189] CPU0
>> [ 959.813362] ----
>> [ 959.813544] lock(file_rwsem);
>> [ 959.813812] lock(file_rwsem);
>> [ 959.814078]
>> *** DEADLOCK ***
>>
>> [ 959.814386] May be due to missing lock nesting notation
>>
>> [ 959.814968] 3 locks held by nfsd/5675:
>> [ 959.815315] #0: ffff888007d42bc8 (&rp->rp_mutex){+.+.}-{3:3}, at: nfs4_preprocess_seqid_op+0x395/0x730 [nfsd]
>> [ 959.816546] #1: ffff88800f378b70 (&stp->st_mutex#2){+.+.}-{3:3}, at: nfsd4_lock+0x1f91/0x3850 [nfsd]
>> [ 959.817697] #2: ffffffff8519e470 (file_rwsem){.+.+}-{0:0}, at: nfsd4_lock+0xcb9/0x3850 [nfsd]
>> [ 959.818755]
>> stack backtrace:
>> [ 959.819010] CPU: 2 PID: 5675 Comm: nfsd Not tainted 5.15.0-rc2-00009-g4e5af4d2635a #533
>> [ 959.819847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-3.fc34 04/01/2014
>> [ 959.820637] Call Trace:
>> [ 959.820759] dump_stack_lvl+0x45/0x59
>> [ 959.821016] __lock_acquire.cold+0x175/0x3a5
>> [ 959.821316] ? lockdep_hardirqs_on_prepare+0x400/0x400
>> [ 959.821741] lock_acquire+0x1a6/0x4b0
>> [ 959.821976] ? locks_remove_posix+0x37f/0x4e0
>> [ 959.822316] ? lock_release+0x6d0/0x6d0
>> [ 959.822591] ? find_held_lock+0x2c/0x110
>> [ 959.822852] ? lock_is_held_type+0xd5/0x130
>> [ 959.823139] posix_lock_inode+0x143/0x1ab0
>> [ 959.823414] ? locks_remove_posix+0x37f/0x4e0
>> [ 959.823739] ? do_raw_spin_unlock+0x54/0x220
>> [ 959.824031] ? lockdep_init_map_type+0x2c3/0x7a0
>> [ 959.824355] ? locks_remove_flock+0x2e0/0x2e0
>> [ 959.824681] locks_remove_posix+0x37f/0x4e0
>> [ 959.824984] ? do_lock_file_wait+0x2a0/0x2a0
>> [ 959.825287] ? lock_downgrade+0x6a0/0x6a0
>> [ 959.825584] ? nfsd_file_put+0x170/0x170 [nfsd]
>> [ 959.825941] filp_close+0xed/0x130
>> [ 959.826191] nfs4_free_lock_stateid+0xcc/0x190 [nfsd]
>> [ 959.826625] free_ol_stateid_reaplist+0x128/0x1f0 [nfsd]
>> [ 959.827037] release_openowner+0xee/0x150 [nfsd]
>> [ 959.827382] ? release_last_closed_stateid+0x460/0x460 [nfsd]
>> [ 959.827837] ? rwlock_bug.part.0+0x90/0x90
>> [ 959.828115] __destroy_client+0x39f/0x6f0 [nfsd]
>> [ 959.828460] ? nfsd4_cb_recall_release+0x20/0x20 [nfsd]
>> [ 959.828868] nfsd4_fl_expire_lock+0x2bc/0x460 [nfsd]
>> [ 959.829273] posix_lock_inode+0xa46/0x1ab0
>> [ 959.829579] ? lockdep_init_map_type+0x2c3/0x7a0
>> [ 959.829913] ? locks_remove_flock+0x2e0/0x2e0
>> [ 959.830250] ? __init_waitqueue_head+0x70/0xd0
>> [ 959.830568] nfsd4_lock+0xcb9/0x3850 [nfsd]
>> [ 959.830878] ? nfsd4_delegreturn+0x3b0/0x3b0 [nfsd]
>> [ 959.831248] ? percpu_counter_add_batch+0x77/0x130
>> [ 959.831594] nfsd4_proc_compound+0xcee/0x21d0 [nfsd]
>> [ 959.831973] ? nfsd4_release_compoundargs+0x140/0x140 [nfsd]
>> [ 959.832414] nfsd_dispatch+0x4df/0xc50 [nfsd]
>> [ 959.832737] ? nfsd_svc+0xca0/0xca0 [nfsd]
>> [ 959.833051] svc_process_common+0xdeb/0x2480 [sunrpc]
>> [ 959.833462] ? svc_create+0x20/0x20 [sunrpc]
>> [ 959.833830] ? nfsd_svc+0xca0/0xca0 [nfsd]
>> [ 959.834144] ? svc_sock_secure_port+0x36/0x40 [sunrpc]
>> [ 959.834578] ? svc_recv+0xd9c/0x2490 [sunrpc]
>> [ 959.834915] svc_process+0x32e/0x4a0 [sunrpc]
>> [ 959.835249] nfsd+0x306/0x530 [nfsd]
>> [ 959.835499] ? nfsd_shutdown_threads+0x300/0x300 [nfsd]
>> [ 959.835899] kthread+0x391/0x470
>> [ 959.836094] ? _raw_spin_unlock_irq+0x24/0x50
>> [ 959.836394] ? set_kthread_struct+0x100/0x100
>> [ 959.836698] ret_from_fork+0x22/0x30
>> [ 960.750222] nfsd: last server has exited, flushing export cache
>> [ 960.880355] NFSD: Using nfsdcld client tracking operations.
>> [ 960.880956] NFSD: starting 15-second grace period (net f0000098)
>> [ 1403.405511] nfsd: last server has exited, flushing export cache
>> [ 1403.656335] NFSD: Using nfsdcld client tracking operations.
>> [ 1403.657585] NFSD: starting 15-second grace period (net f0000098)
>> [ 1445.741596] nfsd: last server has exited, flushing export cache
>> [ 1445.981980] NFSD: Using nfsdcld client tracking operations.
>> [ 1445.983143] NFSD: starting 15-second grace period (net f0000098)
>> [ 1450.025112] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000098)
>> [ 1472.325551] nfsd: last server has exited, flushing export cache
>> [ 1472.583073] NFSD: Using nfsdcld client tracking operations.
>> [ 1472.583998] NFSD: starting 15-second grace period (net f0000098)
>> [ 1473.175582] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000098)
>> [ 1494.637499] nfsd: last server has exited, flushing export cache
>> [ 1494.885795] NFSD: Using nfsdcld client tracking operations.
>> [ 1494.886484] NFSD: starting 15-second grace period (net f0000098)
>> [ 1495.393667] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000098)
>> [ 1516.781474] nfsd: last server has exited, flushing export cache
>> [ 1516.902903] NFSD: Using nfsdcld client tracking operations.
>> [ 1516.903460] NFSD: starting 15-second grace period (net f0000098)
>> [ 1538.045156] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000098)
>> [ 1559.125362] nfsd: last server has exited, flushing export cache
>> [ 1559.362856] NFSD: Using nfsdcld client tracking operations.
>> [ 1559.363658] NFSD: starting 15-second grace period (net f0000098)
>> [ 1559.480531] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000098)
>> [ 1583.745353] nfsd: last server has exited, flushing export cache
>> [ 1583.876877] NFSD: Using nfsdcld client tracking operations.
>> [ 1583.877573] NFSD: starting 15-second grace period (net f0000098)
>> [ 1586.401321] nfsd: last server has exited, flushing export cache
>> [ 1586.525629] NFSD: Using nfsdcld client tracking operations.
>> [ 1586.526388] NFSD: starting 15-second grace period (net f0000098)
>> [ 1625.993218] nfsd: last server has exited, flushing export cache
>> [ 1626.442627] NFSD: Using nfsdcld client tracking operations.
>> [ 1626.444397] NFSD: starting 15-second grace period (net f0000098)
>> [ 1627.117214] nfsd: last server has exited, flushing export cache
>> [ 1627.351487] NFSD: Using nfsdcld client tracking operations.
>> [ 1627.352663] NFSD: starting 15-second grace period (net f0000098)
>> [ 1627.854410] NFSD: all clients done reclaiming, ending NFSv4 grace period (net f0000098)
>> [ 3285.818905] clocksource: timekeeping watchdog on CPU3: acpi_pm retried 2 times before success
next prev parent reply other threads:[~2021-09-28 6:23 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-24 20:52 [PATCH RFC v4 0/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
2021-09-24 20:52 ` [PATCH RFC v4 1/2] fs/lock: add new callback, lm_expire_lock, to lock_manager_operations Dai Ngo
2021-09-24 20:52 ` [PATCH RFC v4 2/2] nfsd: Initial implementation of NFSv4 Courteous Server Dai Ngo
2021-09-27 20:14 ` [PATCH RFC v4 0/2] " J. Bruce Fields
2021-09-27 20:39 ` J. Bruce Fields
2021-09-28 6:22 ` dai.ngo [this message]
2021-09-28 14:10 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4f5cdc04-9584-282c-8add-4b0c1e313f1b@oracle.com \
--to=dai.ngo@oracle.com \
--cc=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).