* [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
@ 2011-03-26 4:12 Tetsuo Handa
2011-03-28 17:12 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Tetsuo Handa @ 2011-03-26 4:12 UTC (permalink / raw)
To: peterz, mingo; +Cc: linux-kernel
I got a freeze between "seqlock_t rename_lock" and "DEFINE_BRLOCK(vfsmount_lock)"
in 2.6.38. The bug was already fixed in the linux-2.6 git tree.
http://www.spinics.net/lists/linux-fsdevel/msg43078.html
This bug is timing dependent and lockdep shows nothing before the freeze.
I think that lockdep cannot detect this bug because no lock primitives are
called within read_seqbegin().
/* Lock out other writers and update the count.
 * Acts like a normal spin_lock/unlock.
 * Don't need preempt_disable() because that is in the spin_lock already.
 */
static inline void write_seqlock(seqlock_t *sl)
{
	spin_lock(&sl->lock);
	++sl->sequence;
	smp_wmb();
}

static inline void write_sequnlock(seqlock_t *sl)
{
	smp_wmb();
	sl->sequence++;
	spin_unlock(&sl->lock);
}

static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
{
	unsigned ret;

repeat:
	ret = sl->sequence;
	smp_rmb();
	if (unlikely(ret & 1)) {
		cpu_relax();
		goto repeat;
	}
	return ret;
}
read_seqbegin() waits until the writer (if any) calls write_sequnlock().
In effect, read_seqbegin() behaves like a normal spin_lock() immediately
followed by a spin_unlock(), except that lockdep never sees it.
--- linux-2.6.38.1.orig/include/linux/seqlock.h
+++ linux-2.6.38.1/include/linux/seqlock.h
@@ -86,6 +86,11 @@ static inline int write_tryseqlock(seqlo
 static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
 {
 	unsigned ret;
+#ifdef CONFIG_PROVE_LOCKING
+	unsigned long flags;
+	spin_lock_irqsave(&((seqlock_t *) sl)->lock, flags);
+	spin_unlock_irqrestore(&((seqlock_t *) sl)->lock, flags);
+#endif
 
 repeat:
 	ret = sl->sequence;
Therefore, if we apply the patch above, lockdep can detect this bug.
[ 179.457160] =======================================================
[ 179.458004] [ INFO: possible circular locking dependency detected ]
[ 179.458004] 2.6.38.1 #2
[ 179.458004] -------------------------------------------------------
[ 179.458004] ccs_execute_han/3717 is trying to acquire lock:
[ 179.458004] (vfsmount_lock){++++..}, at: [<c04d9910>] vfsmount_lock_local_lock+0x0/0x60
[ 179.458004]
[ 179.458004] but task is already holding lock:
[ 179.458004] (rename_lock){+.+...}, at: [<c04d37dc>] __d_path+0x3c/0x80
[ 179.458004]
[ 179.458004] which lock already depends on the new lock.
[ 179.458004]
[ 179.458004]
[ 179.458004] the existing dependency chain (in reverse order) is:
[ 179.458004]
[ 179.458004] -> #1 (rename_lock){+.+...}:
[ 179.458004] [<c046d5f4>] __lock_acquire+0x244/0x6d0
[ 179.458004] [<c046dafb>] lock_acquire+0x7b/0xa0
[ 179.458004] [<c06c50a5>] _raw_spin_lock_irqsave+0x45/0x80
[ 179.458004] [<c04d308b>] is_subdir+0x2b/0xd0
[ 179.458004] [<c04dac29>] sys_pivot_root+0x219/0x290
[ 179.458004] [<c0402c50>] sysenter_do_call+0x12/0x36
[ 179.458004]
[ 179.458004] -> #0 (vfsmount_lock){++++..}:
[ 179.458004] [<c046d39e>] validate_chain+0x10ee/0x1100
[ 179.458004] [<c046d5f4>] __lock_acquire+0x244/0x6d0
[ 179.458004] [<c046dafb>] lock_acquire+0x7b/0xa0
[ 179.458004] [<c04d9943>] vfsmount_lock_local_lock+0x33/0x60
[ 179.458004] [<c04d284c>] prepend_path+0x1c/0x150
[ 179.458004] [<c04d37f4>] __d_path+0x54/0x80
[ 179.458004] [<e090e1f2>] ccs_realpath_from_path+0x122/0x3e0 [ccsecurity]
[ 179.458004] [<e090eca4>] ccs_get_exe+0x54/0x60 [ccsecurity]
[ 179.458004] [<e090ccdb>] ccs_manager+0x6b/0x1e0 [ccsecurity]
[ 179.458004] [<e090d547>] ccs_write_control+0xc7/0x250 [ccsecurity]
[ 179.458004] [<e090dec8>] ccs_write+0x8/0x10 [ccsecurity]
[ 179.458004] [<c050145d>] proc_reg_write+0x5d/0x90
[ 179.458004] [<c04c0256>] vfs_write+0x96/0x140
[ 179.458004] [<c04c080d>] sys_write+0x3d/0x70
[ 179.458004] [<c0402c50>] sysenter_do_call+0x12/0x36
[ 179.458004]
[ 179.458004] other info that might help us debug this:
[ 179.458004]
[ 179.458004] 4 locks held by ccs_execute_han/3717:
[ 179.458004] #0: (&head->io_sem){+.+.+.}, at: [<e090d4ea>] ccs_write_control+0x6a/0x250 [ccsecurity]
[ 179.458004] #1: (&ccs_ss){.+.+.+}, at: [<e090d4f7>] ccs_write_control+0x77/0x250 [ccsecurity]
[ 179.458004] #2: (&mm->mmap_sem){++++++}, at: [<e090ec72>] ccs_get_exe+0x22/0x60 [ccsecurity]
[ 179.458004] #3: (rename_lock){+.+...}, at: [<c04d37dc>] __d_path+0x3c/0x80
[ 179.458004]
[ 179.458004] stack backtrace:
[ 179.458004] Pid: 3717, comm: ccs_execute_han Not tainted 2.6.38.1 #2
[ 179.458004] Call Trace:
[ 179.458004] [<c046bceb>] ? print_circular_bug+0xbb/0xc0
[ 179.458004] [<c046d39e>] ? validate_chain+0x10ee/0x1100
[ 179.458004] [<c0431216>] ? scheduler_tick+0x146/0x1f0
[ 179.458004] [<c046d5f4>] ? __lock_acquire+0x244/0x6d0
[ 179.458004] [<c046dafb>] ? lock_acquire+0x7b/0xa0
[ 179.458004] [<c04d9910>] ? vfsmount_lock_local_lock+0x0/0x60
[ 179.458004] [<c04d9943>] ? vfsmount_lock_local_lock+0x33/0x60
[ 179.458004] [<c04d9910>] ? vfsmount_lock_local_lock+0x0/0x60
[ 179.458004] [<c04d284c>] ? prepend_path+0x1c/0x150
[ 179.458004] [<c04d37f4>] ? __d_path+0x54/0x80
[ 179.458004] [<e090e1f2>] ? ccs_realpath_from_path+0x122/0x3e0 [ccsecurity]
[ 179.458004] [<e090ec72>] ? ccs_get_exe+0x22/0x60 [ccsecurity]
[ 179.458004] [<c06c42af>] ? down_read+0x7f/0x90
[ 179.458004] [<e090eca4>] ? ccs_get_exe+0x54/0x60 [ccsecurity]
[ 179.458004] [<e090ccdb>] ? ccs_manager+0x6b/0x1e0 [ccsecurity]
[ 179.458004] [<e090d4f7>] ? ccs_write_control+0x77/0x250 [ccsecurity]
[ 179.458004] [<e090d547>] ? ccs_write_control+0xc7/0x250 [ccsecurity]
[ 179.458004] [<e090d4f7>] ? ccs_write_control+0x77/0x250 [ccsecurity]
[ 179.458004] [<e090dec0>] ? ccs_write+0x0/0x10 [ccsecurity]
[ 179.458004] [<e090dec8>] ? ccs_write+0x8/0x10 [ccsecurity]
[ 179.458004] [<c050145d>] ? proc_reg_write+0x5d/0x90
[ 179.458004] [<c04c0256>] ? vfs_write+0x96/0x140
[ 179.458004] [<c04af2e7>] ? sys_mmap_pgoff+0x77/0x100
[ 179.458004] [<c0501400>] ? proc_reg_write+0x0/0x90
[ 179.458004] [<c04c080d>] ? sys_write+0x3d/0x70
[ 179.458004] [<c0402c50>] ? sysenter_do_call+0x12/0x36
But there is still a problem. It seems to me that lockdep checks for this bug
only when a new locking pattern (one that was not already in the lockdep
database) is added. This freeze can be triggered by running
while :; do newns /sbin/pivot_root /proc/ /proc/sys/; done
on one terminal and running
while :; do /bin/ls -l /proc/*/exe; done
on another terminal. (The "newns" is a program that unshares the mnt namespace
before execve() using CLONE_NEWNS.) But even after applying the patch above,
lockdep does not show the trace above.
lockdep shows the trace above only after I run test programs for TOMOYO (which
causes a locking pattern that was generated by neither
"/sbin/pivot_root /proc/ /proc/sys/" nor "/bin/ls -l /proc/*/exe" to be added
to lockdep database).
I think we want some method for rechecking already-added locking patterns.
Maybe it could run (e.g.) every 60 seconds, or when the stall-checking
mechanisms report the possibility of a stall. (The sysrq key didn't work after
the freeze occurred.)
Also, what should we do about the __read_seqcount_begin() case? Since
seqcount_t does not have a spinlock embedded in the struct, we can't use lock
primitives...?
Regards.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-26 4:12 [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin() Tetsuo Handa
@ 2011-03-28 17:12 ` Steven Rostedt
2011-03-28 21:57 ` Tetsuo Handa
2011-03-29 4:30 ` Tetsuo Handa
2011-03-29 13:11 ` Peter Zijlstra
2011-03-29 13:14 ` Peter Zijlstra
2 siblings, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2011-03-28 17:12 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: peterz, mingo, linux-kernel
On Sat, Mar 26, 2011 at 01:12:17PM +0900, Tetsuo Handa wrote:
> I got a freeze between "seqlock_t rename_lock" and "DEFINE_BRLOCK(vfsmount_lock)"
Egad, that DEFINE_BRLOCK() is a scary macro. Turns into DEFINE_LGLOCK()
and creates a local and global lock.
>
> But there is still a problem. It seems to me that lockdep checks for this bug
> only when a new locking pattern (a locking pattern which was not already added
> to lockdep database) is added. This freeze can be triggered by running
>
> while :; do newns /sbin/pivot_root /proc/ /proc/sys/; done
>
> on one terminal and running
>
> while :; do /bin/ls -l /proc/*/exe; done
>
> on another terminal. (The "newns" is a program that unshares the mnt namespace
> before execve() using CLONE_NEWNS.) But even after applying the patch above,
> lockdep does not show the trace above.
>
> lockdep shows the trace above only after I run test programs for TOMOYO (which
> causes a locking pattern that was generated by neither
> "/sbin/pivot_root /proc/ /proc/sys/" nor "/bin/ls -l /proc/*/exe" to be added
> to lockdep database).
>
> I think that we want some method for rechecking already added locking pattern.
> Maybe it is run by (e.g.) every 60 seconds. Maybe it is run when stall checking
> mechanisms report the possibility of stall. (The sysrq key didn't work after
> the freeze occurred.)
This is where you confused me. Why would a pattern that was previously
checked, suddenly break? If we found A->B and then come across B->A,
lockdep should surely catch it. Maybe I don't understand exactly what
you are trying to say.
>
> Also, what to do with __read_seqcount_begin() case? Since seqcount_t does not
> have a spinlock embedded into the struct, we can't use lock primitives...?
>
We could probably just add a lockdep key to the seqcount, and "acquire"
and "release" it when necessary. I can look into that as I'm playing
around with lockdep now anyway ;)
-- Steve
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-28 17:12 ` Steven Rostedt
@ 2011-03-28 21:57 ` Tetsuo Handa
2011-03-29 4:30 ` Tetsuo Handa
1 sibling, 0 replies; 13+ messages in thread
From: Tetsuo Handa @ 2011-03-28 21:57 UTC (permalink / raw)
To: rostedt; +Cc: peterz, mingo, linux-kernel
Steven Rostedt wrote:
> This is where you confused me. Why would a pattern that was previously
> checked, suddenly break? If we found A->B and then come across B->A,
> lockdep should surely catch it.
I don't know. This bug can be triggered by simply running
while :; do newns /sbin/pivot_root /proc/ /proc/sys/; done
and
while :; do /bin/ls -l /proc/*/exe; done
in parallel. However, lockdep can't catch it even after applying
--- linux-2.6.38.1.orig/include/linux/seqlock.h
+++ linux-2.6.38.1/include/linux/seqlock.h
@@ -86,6 +86,11 @@ static inline int write_tryseqlock(seqlo
 static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
 {
 	unsigned ret;
+#ifdef CONFIG_PROVE_LOCKING
+	unsigned long flags;
+	spin_lock_irqsave(&((seqlock_t *) sl)->lock, flags);
+	spin_unlock_irqrestore(&((seqlock_t *) sl)->lock, flags);
+#endif
 
 repeat:
 	ret = sl->sequence;
. Maybe my usage of spin_lock_irqsave()/spin_unlock_irqrestore() is wrong.
> Maybe I don't understand exactly what you are trying to say.
Applying the patch and running the two loops above in parallel on 2.6.38.1
will show you. Source code for newns is shown below.
---------- newns.c start ----------
#define _GNU_SOURCE /* for clone() */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <sched.h>
#include <errno.h>
#include <stdlib.h>

static int child(void *arg)
{
	char **argv = (char **) arg;
	argv++;
	execvp(argv[0], argv);
	_exit(1);
}

int main(int argc, char *argv[])
{
	char c = 0;
	char *stack = malloc(8192);
	const pid_t pid = clone(child, stack + (8192 / 2), CLONE_NEWNS,
				(void *) argv);
	while (waitpid(pid, NULL, __WALL) == EOF && errno == EINTR)
		c++; /* Dummy. */
	return 0;
}
---------- newns.c end ----------
Regards.
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-28 17:12 ` Steven Rostedt
2011-03-28 21:57 ` Tetsuo Handa
@ 2011-03-29 4:30 ` Tetsuo Handa
2011-03-29 12:49 ` Steven Rostedt
2011-03-29 13:39 ` Peter Zijlstra
1 sibling, 2 replies; 13+ messages in thread
From: Tetsuo Handa @ 2011-03-29 4:30 UTC (permalink / raw)
To: rostedt; +Cc: peterz, mingo, linux-kernel
I made a small test module (which should work on 2.6.38.2).
---------- locktest.c start ----------
#include <linux/module.h>
#include <linux/delay.h> /* msleep() */
#include <linux/seqlock.h>
#include <linux/lglock.h>
#include <linux/proc_fs.h>

static seqlock_t seqlock1;
static DEFINE_BRLOCK(brlock1);

static unsigned int read_seqbegin2(seqlock_t *sl)
{
	unsigned int ret;
#if 1
	unsigned long flags;
	spin_lock_irqsave(&sl->lock, flags);
	spin_unlock_irqrestore(&sl->lock, flags);
#endif
repeat:
	ret = sl->sequence;
	smp_rmb();
	if (unlikely(ret & 1)) {
		cpu_relax();
		goto repeat;
	}
	return ret;
}

static int locktest_open1(struct inode *inode, struct file *file)
{
	write_seqlock(&seqlock1);
	msleep(1000); /* Open /proc/locktest2 while sleeping here. */
	br_read_lock(brlock1);
	br_read_unlock(brlock1);
	write_sequnlock(&seqlock1);
	return -EINVAL;
}

static int locktest_open2(struct inode *inode, struct file *file)
{
	br_write_lock(brlock1);
	read_seqbegin2(&seqlock1);
	br_write_unlock(brlock1);
	return -EINVAL;
}

static int locktest_open3(struct inode *inode, struct file *file)
{
	static DEFINE_MUTEX(mutex1);
	mutex_lock(&mutex1);
	locktest_open1(inode, file);
	mutex_unlock(&mutex1);
	return -EINVAL;
}

static const struct file_operations locktest_operations1 = {
	.open = locktest_open1
};

static const struct file_operations locktest_operations2 = {
	.open = locktest_open2
};

static const struct file_operations locktest_operations3 = {
	.open = locktest_open3
};

static int __init locktest_init(void)
{
	struct proc_dir_entry *entry;
	seqlock_init(&seqlock1);
	entry = create_proc_entry("locktest1", 0666, NULL);
	if (entry)
		entry->proc_fops = &locktest_operations1;
	entry = create_proc_entry("locktest2", 0666, NULL);
	if (entry)
		entry->proc_fops = &locktest_operations2;
	entry = create_proc_entry("locktest3", 0666, NULL);
	if (entry)
		entry->proc_fops = &locktest_operations3;
	return 0;
}

static void __exit locktest_exit(void)
{
	remove_proc_entry("locktest1", NULL);
	remove_proc_entry("locktest2", NULL);
	remove_proc_entry("locktest3", NULL);
}

module_init(locktest_init);
module_exit(locktest_exit);
MODULE_LICENSE("GPL");
---------- locktest.c end ----------
---------- Makefile start ----------
obj-m += locktest.o
---------- Makefile end ----------
Below are my testcases and results (gcc 3.3 / x86_32).
Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.38.2 .
---------- Testcase 1 ----------
# cat /proc/locktest1
# cat /proc/locktest2
# cat /proc/locktest1
# cat /proc/locktest2
# cat /proc/locktest1
# cat /proc/locktest2
showed nothing in dmesg.
---------- Testcase 2 ----------
# cat /proc/locktest1
# cat /proc/locktest2
# cat /proc/locktest3
showed below message.
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.38.2 #1
-------------------------------------------------------
cat/2892 is trying to acquire lock:
(brlock1_lock_dep_map){++++..}, at: [<e08440b0>] brlock1_local_lock+0x0/0x90 [locktest]
but task is already holding lock:
(&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e084451d>] locktest_open1+0xd/0x40 [locktest]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&(&seqlock1)->lock)->rlock){+.+...}:
[<c106b62b>] check_prevs_add+0xab/0x100
[<c106b9b5>] validate_chain+0x305/0x5a0
[<c106dab4>] __lock_acquire+0x2a4/0x900
[<c106f1ba>] lock_acquire+0x7a/0xa0
[<c13a83e4>] _raw_spin_lock_irqsave+0x44/0x80
[<e08444e9>] read_seqbegin2+0x19/0x40 [locktest]
[<e0844562>] locktest_open2+0x12/0x20 [locktest]
[<c1113f95>] proc_reg_open+0x65/0xe0
[<c10caf2b>] __dentry_open+0xbb/0x2a0
[<c10cb20e>] nameidata_to_filp+0x5e/0x70
[<c10d8207>] finish_open+0x77/0xf0
[<c10d86e9>] do_filp_open+0x1a9/0x5c0
[<c10cb50c>] do_sys_open+0x5c/0xe0
[<c10cb5b9>] sys_open+0x29/0x40
[<c13a8f81>] syscall_call+0x7/0xb
-> #0 (brlock1_lock_dep_map){++++..}:
[<c106b4e8>] check_prev_add+0x768/0x800
[<c106b62b>] check_prevs_add+0xab/0x100
[<c106b9b5>] validate_chain+0x305/0x5a0
[<c106dab4>] __lock_acquire+0x2a4/0x900
[<c106f1ba>] lock_acquire+0x7a/0xa0
[<e08440e3>] brlock1_local_lock+0x33/0x90 [locktest]
[<e0844532>] locktest_open1+0x22/0x40 [locktest]
[<e0844596>] locktest_open3+0x26/0x50 [locktest]
[<c1113f95>] proc_reg_open+0x65/0xe0
[<c10caf2b>] __dentry_open+0xbb/0x2a0
[<c10cb20e>] nameidata_to_filp+0x5e/0x70
[<c10d8207>] finish_open+0x77/0xf0
[<c10d86e9>] do_filp_open+0x1a9/0x5c0
[<c10cb50c>] do_sys_open+0x5c/0xe0
[<c10cb5b9>] sys_open+0x29/0x40
[<c13a8f81>] syscall_call+0x7/0xb
other info that might help us debug this:
2 locks held by cat/2892:
#0: (mutex1){+.+...}, at: [<e084458d>] locktest_open3+0x1d/0x50 [locktest]
#1: (&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e084451d>] locktest_open1+0xd/0x40 [locktest]
stack backtrace:
Pid: 2892, comm: cat Not tainted 2.6.38.2 #1
Call Trace:
[<c1069ff6>] ? print_circular_bug+0xc6/0xd0
[<c106b4e8>] ? check_prev_add+0x768/0x800
[<c106b62b>] ? check_prevs_add+0xab/0x100
[<c106b9b5>] ? validate_chain+0x305/0x5a0
[<c106dab4>] ? __lock_acquire+0x2a4/0x900
[<c104a0ed>] ? destroy_timer_on_stack+0xd/0x10
[<c13a6776>] ? schedule_timeout+0xf6/0x1b0
[<c106f1ba>] ? lock_acquire+0x7a/0xa0
[<e08440b0>] ? brlock1_local_lock+0x0/0x90 [locktest]
[<e0844570>] ? locktest_open3+0x0/0x50 [locktest]
[<e08440e3>] ? brlock1_local_lock+0x33/0x90 [locktest]
[<e08440b0>] ? brlock1_local_lock+0x0/0x90 [locktest]
[<c104af75>] ? msleep+0x15/0x20
[<e0844532>] ? locktest_open1+0x22/0x40 [locktest]
[<e0844596>] ? locktest_open3+0x26/0x50 [locktest]
[<c1113f95>] ? proc_reg_open+0x65/0xe0
[<c10caf2b>] ? __dentry_open+0xbb/0x2a0
[<c10cb20e>] ? nameidata_to_filp+0x5e/0x70
[<c1113f30>] ? proc_reg_open+0x0/0xe0
[<c10d8207>] ? finish_open+0x77/0xf0
[<c10d86e9>] ? do_filp_open+0x1a9/0x5c0
[<c106c97b>] ? trace_hardirqs_off+0xb/0x10
[<c13a853d>] ? _raw_spin_unlock+0x1d/0x20
[<c10e5d81>] ? alloc_fd+0xe1/0x1a0
[<c10cb50c>] ? do_sys_open+0x5c/0xe0
[<c10cb5b9>] ? sys_open+0x29/0x40
[<c13a8f81>] ? syscall_call+0x7/0xb
---------- Testcase 3 ----------
# cat /proc/locktest2
# cat /proc/locktest1
showed below message.
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.38.2 #1
-------------------------------------------------------
cat/2891 is trying to acquire lock:
(brlock1_lock_dep_map){++++..}, at: [<e08440b0>] brlock1_local_lock+0x0/0x90 [locktest]
but task is already holding lock:
(&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e084451d>] locktest_open1+0xd/0x40 [locktest]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&(&seqlock1)->lock)->rlock){+.+...}:
[<c106b62b>] check_prevs_add+0xab/0x100
[<c106b9b5>] validate_chain+0x305/0x5a0
[<c106dab4>] __lock_acquire+0x2a4/0x900
[<c106f1ba>] lock_acquire+0x7a/0xa0
[<c13a83e4>] _raw_spin_lock_irqsave+0x44/0x80
[<e08444e9>] read_seqbegin2+0x19/0x40 [locktest]
[<e0844562>] locktest_open2+0x12/0x20 [locktest]
[<c1113f95>] proc_reg_open+0x65/0xe0
[<c10caf2b>] __dentry_open+0xbb/0x2a0
[<c10cb20e>] nameidata_to_filp+0x5e/0x70
[<c10d8207>] finish_open+0x77/0xf0
[<c10d86e9>] do_filp_open+0x1a9/0x5c0
[<c10cb50c>] do_sys_open+0x5c/0xe0
[<c10cb5b9>] sys_open+0x29/0x40
[<c13a8f81>] syscall_call+0x7/0xb
-> #0 (brlock1_lock_dep_map){++++..}:
[<c106b4e8>] check_prev_add+0x768/0x800
[<c106b62b>] check_prevs_add+0xab/0x100
[<c106b9b5>] validate_chain+0x305/0x5a0
[<c106dab4>] __lock_acquire+0x2a4/0x900
[<c106f1ba>] lock_acquire+0x7a/0xa0
[<e08440e3>] brlock1_local_lock+0x33/0x90 [locktest]
[<e0844532>] locktest_open1+0x22/0x40 [locktest]
[<c1113f95>] proc_reg_open+0x65/0xe0
[<c10caf2b>] __dentry_open+0xbb/0x2a0
[<c10cb20e>] nameidata_to_filp+0x5e/0x70
[<c10d8207>] finish_open+0x77/0xf0
[<c10d86e9>] do_filp_open+0x1a9/0x5c0
[<c10cb50c>] do_sys_open+0x5c/0xe0
[<c10cb5b9>] sys_open+0x29/0x40
[<c13a8f81>] syscall_call+0x7/0xb
other info that might help us debug this:
1 lock held by cat/2891:
#0: (&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e084451d>] locktest_open1+0xd/0x40 [locktest]
stack backtrace:
Pid: 2891, comm: cat Not tainted 2.6.38.2 #1
Call Trace:
[<c1069ff6>] ? print_circular_bug+0xc6/0xd0
[<c106b4e8>] ? check_prev_add+0x768/0x800
[<c100590e>] ? dump_trace+0x5e/0xd0
[<c106b62b>] ? check_prevs_add+0xab/0x100
[<c106b9b5>] ? validate_chain+0x305/0x5a0
[<c106d10c>] ? mark_lock+0x21c/0x3c0
[<c106dab4>] ? __lock_acquire+0x2a4/0x900
[<c104a0ed>] ? destroy_timer_on_stack+0xd/0x10
[<c13a6776>] ? schedule_timeout+0xf6/0x1b0
[<c106f1ba>] ? lock_acquire+0x7a/0xa0
[<e08440b0>] ? brlock1_local_lock+0x0/0x90 [locktest]
[<e0844510>] ? locktest_open1+0x0/0x40 [locktest]
[<e08440e3>] ? brlock1_local_lock+0x33/0x90 [locktest]
[<e08440b0>] ? brlock1_local_lock+0x0/0x90 [locktest]
[<c104af75>] ? msleep+0x15/0x20
[<e0844532>] ? locktest_open1+0x22/0x40 [locktest]
[<c1113f95>] ? proc_reg_open+0x65/0xe0
[<c10caf2b>] ? __dentry_open+0xbb/0x2a0
[<c10cb20e>] ? nameidata_to_filp+0x5e/0x70
[<c1113f30>] ? proc_reg_open+0x0/0xe0
[<c10d8207>] ? finish_open+0x77/0xf0
[<c10d86e9>] ? do_filp_open+0x1a9/0x5c0
[<c106c97b>] ? trace_hardirqs_off+0xb/0x10
[<c13a853d>] ? _raw_spin_unlock+0x1d/0x20
[<c10e5d81>] ? alloc_fd+0xe1/0x1a0
[<c10cb50c>] ? do_sys_open+0x5c/0xe0
[<c10cb5b9>] ? sys_open+0x29/0x40
[<c13a8f81>] ? syscall_call+0x7/0xb
---------- Testcase 4 ----------
Starting
# cat /proc/locktest1
from one terminal and starting
# cat /proc/locktest2
(before "cat /proc/locktest1" finishes) from another terminal
showed below message.
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.38.2 #1
-------------------------------------------------------
cat/2893 is trying to acquire lock:
(brlock1_lock_dep_map){++++..}, at: [<e08440b0>] brlock1_local_lock+0x0/0x90 [locktest]
but task is already holding lock:
(&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e084451d>] locktest_open1+0xd/0x40 [locktest]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&(&(&seqlock1)->lock)->rlock){+.+...}:
[<c106b62b>] check_prevs_add+0xab/0x100
[<c106b9b5>] validate_chain+0x305/0x5a0
[<c106dab4>] __lock_acquire+0x2a4/0x900
[<c106f1ba>] lock_acquire+0x7a/0xa0
[<c13a83e4>] _raw_spin_lock_irqsave+0x44/0x80
[<e08444e9>] read_seqbegin2+0x19/0x40 [locktest]
[<e0844562>] locktest_open2+0x12/0x20 [locktest]
[<c1113f95>] proc_reg_open+0x65/0xe0
[<c10caf2b>] __dentry_open+0xbb/0x2a0
[<c10cb20e>] nameidata_to_filp+0x5e/0x70
[<c10d8207>] finish_open+0x77/0xf0
[<c10d86e9>] do_filp_open+0x1a9/0x5c0
[<c10cb50c>] do_sys_open+0x5c/0xe0
[<c10cb5b9>] sys_open+0x29/0x40
[<c13a8f81>] syscall_call+0x7/0xb
-> #0 (brlock1_lock_dep_map){++++..}:
[<c106b4e8>] check_prev_add+0x768/0x800
[<c106b62b>] check_prevs_add+0xab/0x100
[<c106b9b5>] validate_chain+0x305/0x5a0
[<c106dab4>] __lock_acquire+0x2a4/0x900
[<c106f1ba>] lock_acquire+0x7a/0xa0
[<e08440e3>] brlock1_local_lock+0x33/0x90 [locktest]
[<e0844532>] locktest_open1+0x22/0x40 [locktest]
[<c1113f95>] proc_reg_open+0x65/0xe0
[<c10caf2b>] __dentry_open+0xbb/0x2a0
[<c10cb20e>] nameidata_to_filp+0x5e/0x70
[<c10d8207>] finish_open+0x77/0xf0
[<c10d86e9>] do_filp_open+0x1a9/0x5c0
[<c10cb50c>] do_sys_open+0x5c/0xe0
[<c10cb5b9>] sys_open+0x29/0x40
[<c13a8f81>] syscall_call+0x7/0xb
other info that might help us debug this:
1 lock held by cat/2893:
#0: (&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e084451d>] locktest_open1+0xd/0x40 [locktest]
stack backtrace:
Pid: 2893, comm: cat Not tainted 2.6.38.2 #1
Call Trace:
[<c1069ff6>] ? print_circular_bug+0xc6/0xd0
[<c106b4e8>] ? check_prev_add+0x768/0x800
[<c100590e>] ? dump_trace+0x5e/0xd0
[<c106b62b>] ? check_prevs_add+0xab/0x100
[<c106b9b5>] ? validate_chain+0x305/0x5a0
[<c106d10c>] ? mark_lock+0x21c/0x3c0
[<c106dab4>] ? __lock_acquire+0x2a4/0x900
[<c104a0ed>] ? destroy_timer_on_stack+0xd/0x10
[<c13a6776>] ? schedule_timeout+0xf6/0x1b0
[<c106f1ba>] ? lock_acquire+0x7a/0xa0
[<e08440b0>] ? brlock1_local_lock+0x0/0x90 [locktest]
[<e0844510>] ? locktest_open1+0x0/0x40 [locktest]
[<e08440e3>] ? brlock1_local_lock+0x33/0x90 [locktest]
[<e08440b0>] ? brlock1_local_lock+0x0/0x90 [locktest]
[<c104af75>] ? msleep+0x15/0x20
[<e0844532>] ? locktest_open1+0x22/0x40 [locktest]
[<c1113f95>] ? proc_reg_open+0x65/0xe0
[<c10caf2b>] ? __dentry_open+0xbb/0x2a0
[<c10cb20e>] ? nameidata_to_filp+0x5e/0x70
[<c1113f30>] ? proc_reg_open+0x0/0xe0
[<c10d8207>] ? finish_open+0x77/0xf0
[<c10d86e9>] ? do_filp_open+0x1a9/0x5c0
[<c106c97b>] ? trace_hardirqs_off+0xb/0x10
[<c13a853d>] ? _raw_spin_unlock+0x1d/0x20
[<c10e5d81>] ? alloc_fd+0xe1/0x1a0
[<c10cb50c>] ? do_sys_open+0x5c/0xe0
[<c10cb5b9>] ? sys_open+0x29/0x40
[<c13a8f81>] ? syscall_call+0x7/0xb
BUG: spinlock lockup on CPU#1, cat/2894, e0844a44
Pid: 2894, comm: cat Not tainted 2.6.38.2 #1
Call Trace:
[<c11e7d8a>] ? __spin_lock_debug+0xca/0xf0
[<c11e7e19>] ? do_raw_spin_lock+0x69/0xa0
[<c13a8419>] ? _raw_spin_lock_irqsave+0x79/0x80
[<e08444e9>] ? read_seqbegin2+0x19/0x40 [locktest]
[<e0844550>] ? locktest_open2+0x0/0x20 [locktest]
[<e08444e9>] ? read_seqbegin2+0x19/0x40 [locktest]
[<e0844562>] ? locktest_open2+0x12/0x20 [locktest]
[<c1113f95>] ? proc_reg_open+0x65/0xe0
[<c10caf2b>] ? __dentry_open+0xbb/0x2a0
[<c10cb20e>] ? nameidata_to_filp+0x5e/0x70
[<c1113f30>] ? proc_reg_open+0x0/0xe0
[<c10d8207>] ? finish_open+0x77/0xf0
[<c10d86e9>] ? do_filp_open+0x1a9/0x5c0
[<c106c97b>] ? trace_hardirqs_off+0xb/0x10
[<c13a853d>] ? _raw_spin_unlock+0x1d/0x20
[<c10e5d81>] ? alloc_fd+0xe1/0x1a0
[<c10cb50c>] ? do_sys_open+0x5c/0xe0
[<c10cb5b9>] ? sys_open+0x29/0x40
[<c13a8f81>] ? syscall_call+0x7/0xb
sending NMI to all CPUs:
(...snipped...)
---------- Testcase 5 ----------
Same testcase as testcase 4, but a different result.
BUG: spinlock cpu recursion on CPU#0, cat/2894
lock: e0844a44, .magic: dead4ead, .owner: cat/2893, .owner_cpu: 0
Pid: 2894, comm: cat Not tainted 2.6.38.2 #1
Call Trace:
[<c11e7cad>] ? spin_bug+0xad/0xc0
[<c11e7e27>] ? do_raw_spin_lock+0x77/0xa0
[<c13a8419>] ? _raw_spin_lock_irqsave+0x79/0x80
[<e08444e9>] ? read_seqbegin2+0x19/0x40 [locktest]
[<e0844550>] ? locktest_open2+0x0/0x20 [locktest]
[<e08444e9>] ? read_seqbegin2+0x19/0x40 [locktest]
[<e0844562>] ? locktest_open2+0x12/0x20 [locktest]
[<c1113f95>] ? proc_reg_open+0x65/0xe0
[<c10caf2b>] ? __dentry_open+0xbb/0x2a0
[<c10cb20e>] ? nameidata_to_filp+0x5e/0x70
[<c1113f30>] ? proc_reg_open+0x0/0xe0
[<c10d8207>] ? finish_open+0x77/0xf0
[<c10d86e9>] ? do_filp_open+0x1a9/0x5c0
[<c106c97b>] ? trace_hardirqs_off+0xb/0x10
[<c13a853d>] ? _raw_spin_unlock+0x1d/0x20
[<c10e5d81>] ? alloc_fd+0xe1/0x1a0
[<c10cb50c>] ? do_sys_open+0x5c/0xe0
[<c10cb5b9>] ? sys_open+0x29/0x40
[<c13a8f81>] ? syscall_call+0x7/0xb
---------- Testcase 6 ----------
Same testcase as testcase 4, but a different result.
BUG: spinlock lockup on CPU#0, cat/2894, e0844a44
Pid: 2894, comm: cat Not tainted 2.6.38.2 #1
Call Trace:
[<c11e7d8a>] ? __spin_lock_debug+0xca/0xf0
[<c11e7e19>] ? do_raw_spin_lock+0x69/0xa0
[<c13a8419>] ? _raw_spin_lock_irqsave+0x79/0x80
[<e08444e9>] ? read_seqbegin2+0x19/0x40 [locktest]
[<e0844550>] ? locktest_open2+0x0/0x20 [locktest]
[<e08444e9>] ? read_seqbegin2+0x19/0x40 [locktest]
[<e0844562>] ? locktest_open2+0x12/0x20 [locktest]
[<c1113f95>] ? proc_reg_open+0x65/0xe0
[<c10caf2b>] ? __dentry_open+0xbb/0x2a0
[<c10cb20e>] ? nameidata_to_filp+0x5e/0x70
[<c1113f30>] ? proc_reg_open+0x0/0xe0
[<c10d8207>] ? finish_open+0x77/0xf0
[<c10d86e9>] ? do_filp_open+0x1a9/0x5c0
[<c106c97b>] ? trace_hardirqs_off+0xb/0x10
[<c13a853d>] ? _raw_spin_unlock+0x1d/0x20
[<c10e5d81>] ? alloc_fd+0xe1/0x1a0
[<c10cb50c>] ? do_sys_open+0x5c/0xe0
[<c10cb5b9>] ? sys_open+0x29/0x40
[<c13a8f81>] ? syscall_call+0x7/0xb
sending NMI to all CPUs:
NMI backtrace for cpu 0
(...snipped...)
Well... did I misuse the spinlock primitives in read_seqbegin2()?
In any case, lockdep should have caught testcase 1.
Regards.
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-29 4:30 ` Tetsuo Handa
@ 2011-03-29 12:49 ` Steven Rostedt
2011-03-29 13:39 ` Peter Zijlstra
1 sibling, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2011-03-29 12:49 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: peterz, mingo, linux-kernel
On Tue, 2011-03-29 at 13:30 +0900, Tetsuo Handa wrote:
> I made a small test module (which should work on 2.6.38.2).
Thanks! I'll take a look at this later today.
>
> ---------- locktest.c start ----------
[...]
> ---------- Makefile end ----------
>
> Below are my testcases and results (gcc 3.3 / x86_32).
> Kernel config is at http://I-love.SAKURA.ne.jp/tmp/config-2.6.38.2 .
>
> ---------- Testcase 1 ----------
>
> # cat /proc/locktest1
> # cat /proc/locktest2
> # cat /proc/locktest1
> # cat /proc/locktest2
> # cat /proc/locktest1
> # cat /proc/locktest2
>
> showed nothing in dmesg.
>
> ---------- Testcase 2 ----------
>
> # cat /proc/locktest1
> # cat /proc/locktest2
> # cat /proc/locktest3
>
> showed below message.
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.38.2 #1
> -------------------------------------------------------
[...]
> Well... did I misuse spinlock primitives in read_seqbegin2()?
> Anyway, lockdep should catch testcase 1.
Strange. I'll investigate this further.
Thanks,
-- Steve
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-26 4:12 [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin() Tetsuo Handa
2011-03-28 17:12 ` Steven Rostedt
@ 2011-03-29 13:11 ` Peter Zijlstra
2011-03-29 13:14 ` Peter Zijlstra
2 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2011-03-29 13:11 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: mingo, linux-kernel, tglx
On Sat, 2011-03-26 at 13:12 +0900, Tetsuo Handa wrote:
> @@ -86,6 +86,11 @@ static inline int write_tryseqlock(seqlo
> static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
> {
> unsigned ret;
> +#ifdef CONFIG_PROVE_LOCKING
> + unsigned long flags;
> + spin_lock_irqsave(&((seqlock_t *) sl)->lock, flags);
> + spin_unlock_irqrestore(&((seqlock_t *) sl)->lock, flags);
> +#endif
>
> repeat:
> ret = sl->sequence;
That isn't the right way; something like the below would do. However,
there's a reason this isn't done: we use these primitives from the VDSO,
which means they're not permitted to write to any kernel memory at all.
We could sort that by creating a raw_ variant that doesn't get the
annotation, something I think is already done for -rt anyway.
---
include/linux/seqlock.h | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e98cd2e..f2bc19c 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -94,6 +94,7 @@ static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
cpu_relax();
goto repeat;
}
+	rwlock_acquire_read(&sl->lock.dep_map, 0, 0, _RET_IP_);
return ret;
}
@@ -107,6 +108,8 @@ static __always_inline int read_seqretry(const seqlock_t *sl, unsigned start)
{
smp_rmb();
+	rwlock_release(&sl->lock.dep_map, 1, _RET_IP_);
+
return unlikely(sl->sequence != start);
}
@@ -120,6 +123,9 @@ static __always_inline int read_seqretry(const seqlock_t *sl, unsigned start)
typedef struct seqcount {
unsigned sequence;
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ struct lockdep_map dep_map;
+#endif
} seqcount_t;
#define SEQCNT_ZERO { 0 }
@@ -148,6 +154,7 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
cpu_relax();
goto repeat;
}
+	rwlock_acquire_read(&s->dep_map, 0, 0, _RET_IP_);
return ret;
}
@@ -183,7 +190,9 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
*/
static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
{
- return unlikely(s->sequence != start);
+ int ret = unlikely(s->sequence != start);
+ rwlock_release(&s->dep_map, 1, _RET_IP_);
+ return ret;
}
/**
@@ -210,6 +219,7 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
*/
static inline void write_seqcount_begin(seqcount_t *s)
{
+ rwlock_acquire(&s->dep_map, 0, 0, _RET_IP_);
s->sequence++;
smp_wmb();
}
@@ -218,6 +228,7 @@ static inline void write_seqcount_end(seqcount_t *s)
{
smp_wmb();
s->sequence++;
+ rwlock_release(&s->dep_map, 1, _RET_IP_);
}
/**
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-26 4:12 [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin() Tetsuo Handa
2011-03-28 17:12 ` Steven Rostedt
2011-03-29 13:11 ` Peter Zijlstra
@ 2011-03-29 13:14 ` Peter Zijlstra
2 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2011-03-29 13:14 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: mingo, linux-kernel
On Sat, 2011-03-26 at 13:12 +0900, Tetsuo Handa wrote:
>
> But there is still a problem. It seems to me that lockdep checks for this bug
> only when a new locking pattern (one that was not already added to the lockdep
> database) is added.
That's how it works. Why is that a problem?
> This freeze can be triggered by running
>
> while :; do newns /sbin/pivot_root /proc/ /proc/sys/; done
>
> on one terminal and running
>
> while :; do /bin/ls -l /proc/*/exe; done
>
> on another terminal. (The "newns" is a program that unshares the mnt namespace
> before execve() using CLONE_NEWNS.) But even after applying the patch above,
> lockdep does not show the trace above.
Then you're still missing something.
> lockdep shows the trace above only after I run test programs for TOMOYO (which
> causes a locking pattern that was generated by neither
> "/sbin/pivot_root /proc/ /proc/sys/" nor "/bin/ls -l /proc/*/exe" to be added
> to lockdep database).
And tomoyo manages to close the cycle for some reason.
> I think that we want some method for rechecking already-added locking patterns.
> Maybe it is run, e.g., every 60 seconds. Maybe it is run when stall-checking
> mechanisms report the possibility of a stall. (The sysrq key didn't work after
> the freeze occurred.)
I don't get this; why would you need to recheck anything? lockdep does a
full analysis on every addition, so rechecking when nothing has changed should
not yield a different result.
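The "full analysis on every addition" described above amounts to keeping a graph of observed "A was held while acquiring B" dependencies and searching it for a cycle every time a new edge appears. The following is a minimal userspace sketch of that idea (hypothetical code, not lockdep itself; the names `add_dependency()` and `reachable()` are invented for illustration):

```c
#include <assert.h>
#include <string.h>

/* Tiny model of lockdep's dependency graph.  Lock classes are small
 * integers; lock_edge[a][b] records "class a was held while class b
 * was acquired".  (Invented names, for illustration only.) */
#define MAX_CLASSES 8
static int lock_edge[MAX_CLASSES][MAX_CLASSES];

/* Can 'to' be reached from 'from' through recorded dependencies? */
static int reachable(int from, int to, int *seen)
{
	int next;

	if (from == to)
		return 1;
	if (seen[from])
		return 0;
	seen[from] = 1;
	for (next = 0; next < MAX_CLASSES; next++)
		if (lock_edge[from][next] && reachable(next, to, seen))
			return 1;
	return 0;
}

/* Record that 'held' was held while acquiring 'taken'.  Returns 1 when
 * the new edge closes a cycle, i.e. a potential (not actual) deadlock. */
static int add_dependency(int held, int taken)
{
	int seen[MAX_CLASSES];

	memset(seen, 0, sizeof(seen));
	if (reachable(taken, held, seen))
		return 1;	/* inversion: 'taken' already reaches 'held' */
	lock_edge[held][taken] = 1;
	return 0;
}
```

With classes 0 and 1 standing in for seqlock1 and brlock1, recording 1 held while taking 0 (the locktest_open2 path) and later 0 held while taking 1 (the locktest_open1 path) reports the cycle on the second addition, even though both paths ran serially and no deadlock ever occurred.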
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-29 4:30 ` Tetsuo Handa
2011-03-29 12:49 ` Steven Rostedt
@ 2011-03-29 13:39 ` Peter Zijlstra
2011-03-29 17:50 ` Peter Zijlstra
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2011-03-29 13:39 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rostedt, mingo, linux-kernel
On Tue, 2011-03-29 at 13:30 +0900, Tetsuo Handa wrote:
> static int locktest_open1(struct inode *inode, struct file *file)
> {
> write_seqlock(&seqlock1);
> msleep(1000); /* Open /proc/locktest2 while sleeping here. */
> br_read_lock(brlock1);
> br_read_unlock(brlock1);
> write_sequnlock(&seqlock1);
> return -EINVAL;
> }
>
> static int locktest_open2(struct inode *inode, struct file *file)
> {
> br_write_lock(brlock1);
> read_seqbegin2(&seqlock1);
> br_write_unlock(brlock1);
> return -EINVAL;
> }
>
> static int locktest_open3(struct inode *inode, struct file *file)
> {
> static DEFINE_MUTEX(mutex1);
> mutex_lock(&mutex1);
> locktest_open1(inode, file);
> mutex_unlock(&mutex1);
> return -EINVAL;
> }
That's quite horrid as far as test cases go; why bother with the
userspace part at all?
In order to hit your inversion you need to do something like:
cat /proc/locktest1 & cat /proc/locktest2
if you do them serialized you'll never hit that inversion.
That said, there are some outstanding issues with rw_locks and lockdep;
Gautham and I worked on that for a while but we never persevered and
finished it...
http://lkml.org/lkml/2009/5/11/203
And I think you're hitting that case.
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-29 13:39 ` Peter Zijlstra
@ 2011-03-29 17:50 ` Peter Zijlstra
2011-03-30 8:12 ` Tetsuo Handa
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2011-03-29 17:50 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rostedt, mingo, linux-kernel
On Tue, 2011-03-29 at 15:39 +0200, Peter Zijlstra wrote:
>
> That said, there are some outstanding issues with rw_locks and lockdep;
> Gautham and I worked on that for a while but we never persevered and
> finished it...
>
> http://lkml.org/lkml/2009/5/11/203
I just did a quick rebase onto tip/master (compile tested only):
http://programming.kicks-ass.net/sekrit/patches-lockdep.tar.bz2
That series needs testing and a few patches to extend the
lib/locking-selftest* bits to cover the new functionality.
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-29 17:50 ` Peter Zijlstra
@ 2011-03-30 8:12 ` Tetsuo Handa
2011-03-30 9:50 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Tetsuo Handa @ 2011-03-30 8:12 UTC (permalink / raw)
To: peterz; +Cc: rostedt, mingo, linux-kernel
Peter Zijlstra wrote:
> On Tue, 2011-03-29 at 15:39 +0200, Peter Zijlstra wrote:
> >
> > That said, there are some outstanding issues with rw_locks and lockdep;
> > Gautham and I worked on that for a while but we never persevered and
> > finished it...
> >
> > http://lkml.org/lkml/2009/5/11/203
>
> I just did a quick rebase onto tip/master (compile tested only):
>
> http://programming.kicks-ass.net/sekrit/patches-lockdep.tar.bz2
>
> That series needs testing and a few patches to extend the
> lib/locking-selftest* bits to cover the new functionality.
Thanks, but I didn't apply the above tarball to 2.6.38.2 because the lockdep
selftests failed. Instead, I retested using 2.6.39-rc1 without applying it.
> In order to hit your inversion you need to do something like:
>
> cat /proc/locktest1 & cat /proc/locktest2
>
> if you do them serialized you'll never hit that inversion.
Yes, I know. But I think that lockdep should report the possibility of hitting
that inversion even if I do them serialized.
My understanding is that lockdep can report the possibility of deadlock without
actually triggering it, as long as the lockdep annotations are maintained
correctly and each annotation is called at least once.
Am I misunderstanding?
I'm exploring this code in order to let lockdep report the possibility of
hitting the inversion without actually hitting the inversion.
> On Sat, 2011-03-26 at 13:12 +0900, Tetsuo Handa wrote:
> > @@ -86,6 +86,11 @@ static inline int write_tryseqlock(seqlo
> > static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
> > {
> > unsigned ret;
> > +#ifdef CONFIG_PROVE_LOCKING
> > + unsigned long flags;
> > + spin_lock_irqsave(&((seqlock_t *) sl)->lock, flags);
> > + spin_unlock_irqrestore(&((seqlock_t *) sl)->lock, flags);
> > +#endif
> >
> > repeat:
> > ret = sl->sequence;
>
> That isn't the right way, something like the below would do, however
> there's a reason this isn't done, we use these primitives from the VDSO
> which means they're not permitted to write to any kernel memory at all.
So this is not a bug but intentional. Then we want a comment here explaining
why the lockdep annotation is missing.
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -94,6 +94,7 @@ static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
> cpu_relax();
> goto repeat;
> }
> + rwlock_acquire_read(&sl->lock.dep_map, 0, 0, _RET_IP_);
>
> return ret;
> }
> @@ -107,6 +108,8 @@ static __always_inline int read_seqretry(const seqlock_t *sl, unsigned start)
> {
> smp_rmb();
>
> + rwlock_release(&sl->lock.dep_map, 1, _RET_IP_);
> +
> return unlikely(sl->sequence != start);
> }
Excuse me, but the lock embedded into seqlock_t is a spinlock rather than an rwlock.
I assume you meant spin_acquire()/spin_release() rather than
rwlock_acquire_read()/rwlock_release(). Also, I assume you meant to call
spin_acquire() before entering the spin state (as with
static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
preempt_disable();
spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
. Otherwise, lockdep cannot report this bug when it is hit upon the first call
to this function.)
In order to avoid writing kernel memory from VDSO code,
I copied the original function and added lockdep annotation.
---------- locktest.c (V2) start ----------
#include <linux/module.h>
#include <linux/seqlock.h>
#include <linux/lglock.h>
#include <linux/proc_fs.h>
static seqlock_t seqlock1;
static DEFINE_BRLOCK(brlock1);
static __always_inline unsigned read_seqbegin2(seqlock_t *sl)
{
unsigned ret;
spin_acquire(&sl->lock.dep_map, 0, 0, _RET_IP_);
repeat:
ret = sl->sequence;
smp_rmb();
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;
}
return ret;
}
static __always_inline int read_seqretry2(seqlock_t *sl, unsigned start)
{
smp_rmb();
spin_release(&sl->lock.dep_map, 1, _RET_IP_);
return unlikely(sl->sequence != start);
}
static int locktest_open1(struct inode *inode, struct file *file)
{
write_seqlock(&seqlock1);
msleep(1000); /* Open /proc/locktest2 while sleeping here. */
br_read_lock(brlock1);
br_read_unlock(brlock1);
write_sequnlock(&seqlock1);
return -EINVAL;
}
static int locktest_open2(struct inode *inode, struct file *file)
{
unsigned int seq;
br_write_lock(brlock1);
seq = read_seqbegin2(&seqlock1);
read_seqretry2(&seqlock1, seq);
br_write_unlock(brlock1);
return -EINVAL;
}
static int locktest_open3(struct inode *inode, struct file *file)
{
static DEFINE_MUTEX(mutex1);
mutex_lock(&mutex1);
locktest_open1(inode, file);
mutex_unlock(&mutex1);
return -EINVAL;
}
static const struct file_operations locktest_operations1 = {
.open = locktest_open1
};
static const struct file_operations locktest_operations2 = {
.open = locktest_open2
};
static const struct file_operations locktest_operations3 = {
.open = locktest_open3
};
static int __init locktest_init(void)
{
struct proc_dir_entry *entry;
seqlock_init(&seqlock1);
entry = create_proc_entry("locktest1", 0666, NULL);
if (entry)
entry->proc_fops = &locktest_operations1;
entry = create_proc_entry("locktest2", 0666, NULL);
if (entry)
entry->proc_fops = &locktest_operations2;
entry = create_proc_entry("locktest3", 0666, NULL);
if (entry)
entry->proc_fops = &locktest_operations3;
return 0;
}
static void __exit locktest_exit(void)
{
remove_proc_entry("locktest1", NULL);
remove_proc_entry("locktest2", NULL);
remove_proc_entry("locktest3", NULL);
}
module_init(locktest_init);
module_exit(locktest_exit);
MODULE_LICENSE("GPL");
---------- locktest.c (V2) end ----------
(... well, I wonder whether calling spin_release() from read_seqretry2() makes
sense, because it is legal to call it repeatedly, like
while (1) {
seq = read_seqbegin();
do_something();
if (read_seqretry(seq))
continue;
do_something_else();
if (!read_seqretry(seq))
break;
}
whereas it is illegal for read_unlock()...)
But the result was the same.
"cat /proc/locktest1 /proc/locktest2" showed nothing in dmesg, whereas
"cat /proc/locktest2 /proc/locktest1" showed the message below.
[ 221.461750]
[ 221.461752] =======================================================
[ 221.464593] [ INFO: possible circular locking dependency detected ]
[ 221.465538] 2.6.39-rc1-ccs #1
[ 221.465538] -------------------------------------------------------
[ 221.465538] cat/3313 is trying to acquire lock:
[ 221.465538] (brlock1_lock_dep_map){++++..}, at: [<e08540b0>] brlock1_local_lock+0x0/0x90 [locktest]
[ 221.465538]
[ 221.465538] but task is already holding lock:
[ 221.465538] (&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e08544dd>] locktest_open1+0xd/0x40 [locktest]
[ 221.465538]
[ 221.465538] which lock already depends on the new lock.
[ 221.465538]
[ 221.465538]
[ 221.465538] the existing dependency chain (in reverse order) is:
[ 221.465538]
[ 221.465538] -> #1 (&(&(&seqlock1)->lock)->rlock){+.+...}:
[ 221.465538] [<c106c58b>] check_prevs_add+0xab/0x100
[ 221.465538] [<c106c915>] validate_chain+0x305/0x5a0
[ 221.465538] [<c106ea14>] __lock_acquire+0x2a4/0x900
[ 221.465538] [<c107011a>] lock_acquire+0x7a/0xa0
[ 221.465538] [<e0854546>] locktest_open2+0x36/0x70 [locktest]
[ 221.465538] [<c1118585>] proc_reg_open+0x65/0xe0
[ 221.465538] [<c10ce90f>] __dentry_open+0x16f/0x2e0
[ 221.465538] [<c10ceb7e>] nameidata_to_filp+0x5e/0x70
[ 221.465538] [<c10dc358>] do_last+0xf8/0x6c0
[ 221.465538] [<c10dc9c6>] path_openat+0xa6/0x340
[ 221.465538] [<c10dccd4>] do_filp_open+0x74/0x80
[ 221.465538] [<c10cf121>] do_sys_open+0x101/0x1a0
[ 221.465538] [<c10cf1e9>] sys_open+0x29/0x40
[ 221.465538] [<c13b4951>] syscall_call+0x7/0xb
[ 221.465538]
[ 221.465538] -> #0 (brlock1_lock_dep_map){++++..}:
[ 221.465538] [<c106c448>] check_prev_add+0x768/0x800
[ 221.465538] [<c106c58b>] check_prevs_add+0xab/0x100
[ 221.465538] [<c106c915>] validate_chain+0x305/0x5a0
[ 221.465538] [<c106ea14>] __lock_acquire+0x2a4/0x900
[ 221.465538] [<c107011a>] lock_acquire+0x7a/0xa0
[ 221.465538] [<e08540e3>] brlock1_local_lock+0x33/0x90 [locktest]
[ 221.465538] [<e08544f2>] locktest_open1+0x22/0x40 [locktest]
[ 221.465538] [<c1118585>] proc_reg_open+0x65/0xe0
[ 221.465538] [<c10ce90f>] __dentry_open+0x16f/0x2e0
[ 221.465538] [<c10ceb7e>] nameidata_to_filp+0x5e/0x70
[ 221.465538] [<c10dc358>] do_last+0xf8/0x6c0
[ 221.465538] [<c10dc9c6>] path_openat+0xa6/0x340
[ 221.465538] [<c10dccd4>] do_filp_open+0x74/0x80
[ 221.465538] [<c10cf121>] do_sys_open+0x101/0x1a0
[ 221.465538] [<c10cf1e9>] sys_open+0x29/0x40
[ 221.465538] [<c13b4951>] syscall_call+0x7/0xb
[ 221.465538]
[ 221.465538] other info that might help us debug this:
[ 221.465538]
[ 221.465538] 1 lock held by cat/3313:
[ 221.465538] #0: (&(&(&seqlock1)->lock)->rlock){+.+...}, at: [<e08544dd>] locktest_open1+0xd/0x40 [locktest]
[ 221.465538]
[ 221.465538] stack backtrace:
[ 221.465538] Pid: 3313, comm: cat Not tainted 2.6.39-rc1-ccs #1
[ 221.465538] Call Trace:
[ 221.465538] [<c106af56>] print_circular_bug+0xc6/0xd0
[ 221.465538] [<c106c448>] check_prev_add+0x768/0x800
[ 221.465538] [<c107d44b>] ? __module_text_address+0xb/0x50
[ 221.465538] [<c1005d3b>] ? print_context_stack+0x3b/0xa0
[ 221.465538] [<c106c58b>] check_prevs_add+0xab/0x100
[ 221.465538] [<c106c915>] validate_chain+0x305/0x5a0
[ 221.465538] [<c106e06c>] ? mark_lock+0x21c/0x3c0
[ 221.465538] [<c106ea14>] __lock_acquire+0x2a4/0x900
[ 221.465538] [<c1049fed>] ? destroy_timer_on_stack+0xd/0x10
[ 221.465538] [<c13b22b6>] ? schedule_timeout+0xf6/0x1b0
[ 221.465538] [<c107011a>] lock_acquire+0x7a/0xa0
[ 221.465538] [<e08540b0>] ? brlock1_lock_init+0xb0/0xb0 [locktest]
[ 221.465538] [<e08544d0>] ? brlock1_global_unlock+0xa0/0xa0 [locktest]
[ 221.465538] [<e08540e3>] brlock1_local_lock+0x33/0x90 [locktest]
[ 221.465538] [<e08540b0>] ? brlock1_lock_init+0xb0/0xb0 [locktest]
[ 221.465538] [<c104ae45>] ? msleep+0x15/0x20
[ 221.465538] [<e08544f2>] locktest_open1+0x22/0x40 [locktest]
[ 221.465538] [<c1118585>] proc_reg_open+0x65/0xe0
[ 221.465538] [<c10ce90f>] __dentry_open+0x16f/0x2e0
[ 221.465538] [<c10ceb7e>] nameidata_to_filp+0x5e/0x70
[ 221.465538] [<c1118520>] ? proc_reg_mmap+0x80/0x80
[ 221.465538] [<c10dc358>] do_last+0xf8/0x6c0
[ 221.465538] [<c10dc9c6>] path_openat+0xa6/0x340
[ 221.465538] [<c106d8db>] ? trace_hardirqs_off+0xb/0x10
[ 221.465538] [<c10dccd4>] do_filp_open+0x74/0x80
[ 221.465538] [<c13b3fed>] ? _raw_spin_unlock+0x1d/0x20
[ 221.465538] [<c10ea091>] ? alloc_fd+0xe1/0x1a0
[ 221.465538] [<c10cf121>] do_sys_open+0x101/0x1a0
[ 221.465538] [<c10cfe2b>] ? vfs_write+0x10b/0x130
[ 221.465538] [<c10cf1e9>] sys_open+0x29/0x40
[ 221.465538] [<c13b4951>] syscall_call+0x7/0xb
If lockdep does a full analysis on every addition, why can't lockdep catch the
"cat /proc/locktest1 /proc/locktest2" case? (I think the answer is either
"the annotation I added to read_seqbegin2()/read_seqretry2() is wrong" or
"lockdep cannot do a full analysis in some cases".)
Regards.
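As background for the annotation question above, the seqbegin/seqretry protocol being annotated can be modeled in plain userspace C. This is a deliberately simplified, single-threaded sketch (no memory barriers, no spinning, and the `model_*` names are invented), only meant to show why calling the retry check more than once with the same start value is legal:

```c
#include <assert.h>

/* Simplified seqlock model: an odd sequence means a writer is in
 * progress; a reader must retry when the sequence changed across its
 * read section.  (Invented model_* names; not the kernel code.) */
struct seq_model {
	unsigned sequence;
};

static unsigned model_read_seqbegin(const struct seq_model *s)
{
	/* The real read_seqbegin() spins while (sequence & 1); this
	 * single-threaded model just snapshots the counter. */
	return s->sequence;
}

/* Nonzero means the read section raced a writer and must be retried.
 * Unlike read_unlock(), it is harmless to call this several times
 * with the same 'start' snapshot. */
static int model_read_seqretry(const struct seq_model *s, unsigned start)
{
	return s->sequence != start;
}

static void model_write_seqlock(struct seq_model *s)   { s->sequence++; }
static void model_write_sequnlock(struct seq_model *s) { s->sequence++; }
```

A reader that snapshots the counter before a write observes a nonzero retry result afterwards; that read-side window is exactly what the proposed spin_acquire()/spin_release() annotation brackets.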
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-30 8:12 ` Tetsuo Handa
@ 2011-03-30 9:50 ` Peter Zijlstra
2011-03-30 12:17 ` Tetsuo Handa
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2011-03-30 9:50 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rostedt, mingo, linux-kernel
On Wed, 2011-03-30 at 17:12 +0900, Tetsuo Handa wrote:
> Peter Zijlstra wrote:
> > On Tue, 2011-03-29 at 15:39 +0200, Peter Zijlstra wrote:
> > >
> > > That said, there are some outstanding issues with rw_locks and lockdep;
> > > Gautham and I worked on that for a while but we never persevered and
> > > finished it...
> > >
> > > http://lkml.org/lkml/2009/5/11/203
> >
> > I just did a quick rebase onto tip/master (compile tested only):
> >
> > http://programming.kicks-ass.net/sekrit/patches-lockdep.tar.bz2
> >
> > That series needs testing and a few patches to extend the
> > lib/locking-selftest* bits to cover the new functionality.
>
> Thanks, but I didn't apply the above tarball to 2.6.38.2 because the lockdep
> selftests failed.
I probably messed up the last patch; it's basically a complete rewrite,
because lockdep changed significantly between when that series was
written and now.
> > In order to hit your inversion you need to do something like:
> >
> > cat /proc/locktest1 & cat /proc/locktest2
> >
> > if you do them serialized you'll never hit that inversion.
>
> Yes, I know. But I think that lockdep should report the possibility of hitting
> that inversion even if I do them serialized.
True, my bad.
> So this is not a bug but intentional. Then we want a comment here explaining
> why the lockdep annotation is missing.
Nah, ideally we'd fix it by making the VDSO code use another primitive.
> > --- a/include/linux/seqlock.h
> > +++ b/include/linux/seqlock.h
> > @@ -94,6 +94,7 @@ static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
> > cpu_relax();
> > goto repeat;
> > }
> > + rwlock_acquire_read(&sl->lock.dep_map, 0, 0, _RET_IP_);
> >
> > return ret;
> > }
> > @@ -107,6 +108,8 @@ static __always_inline int read_seqretry(const seqlock_t *sl, unsigned start)
> > {
> > smp_rmb();
> >
> > + rwlock_release(&sl->lock.dep_map, 1, _RET_IP_);
> > +
> > return unlikely(sl->sequence != start);
> > }
>
> Excuse me, but the lock embedded into seqlock_t is a spinlock rather than an rwlock.
> I assume you meant spin_acquire()/spin_release() rather than
> rwlock_acquire_read()/rwlock_release().
No, I meant what I wrote ;-) It doesn't matter to lockdep that it's a
spinlock (lockdep doesn't even know that), and in fact rwlock_acquire
(the write version) is identical to spin_acquire(): both acquire the lock
in the exclusive state.
The read side of seqlocks is a recursive read lock, hence
rwlock_acquire_read().
> Also, I assume you meant to call
> spin_acquire() before entering the spin state (as with
>
> static inline void __raw_spin_lock(raw_spinlock_t *lock)
> {
> preempt_disable();
> spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
> LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
> }
>
> . Otherwise, lockdep cannot report this bug when it is hit upon the first call
> to this function.)
Huh no, of course not, a seqlock read side cannot contend in the classic
sense.
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-30 9:50 ` Peter Zijlstra
@ 2011-03-30 12:17 ` Tetsuo Handa
2011-03-31 13:59 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Tetsuo Handa @ 2011-03-30 12:17 UTC (permalink / raw)
To: peterz; +Cc: rostedt, mingo, linux-kernel
Peter Zijlstra wrote:
> > Also, I assume you meant to call
> > spin_acquire() before entering the spin state (as with
> >
> > static inline void __raw_spin_lock(raw_spinlock_t *lock)
> > {
> > preempt_disable();
> > spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
> > LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
> > }
> >
> > . Otherwise, lockdep cannot report this bug when it is hit upon the first call
> > to this function.)
>
> Huh no, of course not, a seqlock read side cannot contend in the classic
> sense.
I couldn't understand what 'contend' means. I think
static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
{
unsigned ret;
repeat:
ret = sl->sequence;
smp_rmb();
if (unlikely(ret & 1)) {
cpu_relax();
goto repeat;
}
return ret;
}
is equivalent (except that above one will not write to any kernel memory) to
static __always_inline unsigned read_seqbegin(seqlock_t *sl)
{
unsigned ret;
unsigned long flags;
spin_lock_irqsave(&sl->lock, flags);
ret = sl->sequence;
spin_unlock_irqrestore(&sl->lock, flags);
return ret;
}
because read_seqbegin() cannot return to the reader until the writer (if there
is one) calls write_sequnlock().
static inline void write_seqlock(seqlock_t *sl)
{
spin_lock(&sl->lock);
++sl->sequence;
smp_wmb();
}
static inline void write_sequnlock(seqlock_t *sl)
{
smp_wmb();
sl->sequence++;
spin_unlock(&sl->lock);
}
Don't we call this situation (a reader thread temporarily behaving like a writer
thread that writes nothing) 'contended'?
Anyway, could you show me read_seqbegin2()/read_seqretry2() for testing with
locktest module?
Regards.
* Re: [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin().
2011-03-30 12:17 ` Tetsuo Handa
@ 2011-03-31 13:59 ` Peter Zijlstra
0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2011-03-31 13:59 UTC (permalink / raw)
To: Tetsuo Handa; +Cc: rostedt, mingo, linux-kernel
On Wed, 2011-03-30 at 21:17 +0900, Tetsuo Handa wrote:
> Peter Zijlstra wrote:
> > > Also, I assume you meant to call
> > > spin_acquire() before entering the spin state (as with
> > >
> > > static inline void __raw_spin_lock(raw_spinlock_t *lock)
> > > {
> > > preempt_disable();
> > > spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
> > > LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
> > > }
> > >
> > > . Otherwise, lockdep cannot report this bug when it is hit upon the first call
> > > to this function.)
> >
> > Huh no, of course not, a seqlock read side cannot contend in the classic
> > sense.
>
> I couldn't understand what 'contend' means. I think
>
> static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
> {
> unsigned ret;
> repeat:
> ret = sl->sequence;
> smp_rmb();
> if (unlikely(ret & 1)) {
> cpu_relax();
> goto repeat;
> }
> return ret;
> }
>
> is equivalent (except that above one will not write to any kernel memory) to
>
> static __always_inline unsigned read_seqbegin(seqlock_t *sl)
> {
> unsigned ret;
> unsigned long flags;
> spin_lock_irqsave(&sl->lock, flags);
> ret = sl->sequence;
> spin_unlock_irqrestore(&sl->lock, flags);
> return ret;
> }
>
> because read_seqbegin() cannot return to the reader until the writer (if there
> is one) calls write_sequnlock().
It more or less is, but conceptually the seqlock read-side is a
non-blocking algorithm and thus doesn't block or contend. The above
initial wait is merely an optimization to avoid having to retry, which
could be more expensive than simply waiting there.
Anyway, all the lockdep contention crap is purely about lockstat and
doesn't matter for dependency tracking.
> Don't we call this situation (a reader thread temporarily behaving like a writer
> thread that writes nothing) 'contended'?
>
> Anyway, could you show me read_seqbegin2()/read_seqretry2() for testing with
> locktest module?
Like I wrote before:
> @@ -94,6 +94,7 @@ static __always_inline unsigned read_seqbegin(const seqlock_t *sl)
> cpu_relax();
> goto repeat;
> }
> + rwlock_acquire_read(&sl->lock.dep_map, 0, 0, _RET_IP_);
>
> return ret;
> }
> @@ -107,6 +108,8 @@ static __always_inline int read_seqretry(const seqlock_t *sl, unsigned start)
> {
> smp_rmb();
>
> + rwlock_release(&sl->lock.dep_map, 1, _RET_IP_);
> +
> return unlikely(sl->sequence != start);
> }
Should do, except that lockdep doesn't properly work for read-recursive
locks, which needs to get fixed.
end of thread, other threads:[~2011-03-31 14:00 UTC | newest]
Thread overview: 13+ messages
2011-03-26 4:12 [RFC] seqlock,lockdep: Add lock primitives to read_seqbegin() Tetsuo Handa
2011-03-28 17:12 ` Steven Rostedt
2011-03-28 21:57 ` Tetsuo Handa
2011-03-29 4:30 ` Tetsuo Handa
2011-03-29 12:49 ` Steven Rostedt
2011-03-29 13:39 ` Peter Zijlstra
2011-03-29 17:50 ` Peter Zijlstra
2011-03-30 8:12 ` Tetsuo Handa
2011-03-30 9:50 ` Peter Zijlstra
2011-03-30 12:17 ` Tetsuo Handa
2011-03-31 13:59 ` Peter Zijlstra
2011-03-29 13:11 ` Peter Zijlstra
2011-03-29 13:14 ` Peter Zijlstra