From: Boaz Harrosh <bharrosh@panasas.com>
To: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Jeff Layton <jlayton@redhat.com>,
Stephen Boyd <sboyd@codeaurora.org>,
<linux-kernel@vger.kernel.org>, <bfields@redhat.com>,
<linux-nfs@vger.kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
Tejun Heo <tj@kernel.org>
Subject: Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
Date: Tue, 24 Jan 2012 11:51:59 +0200 [thread overview]
Message-ID: <4F1E7F3F.3060703@panasas.com> (raw)
In-Reply-To: <20120124074516.GC2420@redhat.com>
On 01/24/2012 09:45 AM, Stanislaw Gruszka wrote:
> On Mon, Jan 23, 2012 at 10:23:11AM -0500, Jeff Layton wrote:
>> On Sun, 22 Jan 2012 00:46:14 -0800
>> Stephen Boyd <sboyd@codeaurora.org> wrote:
>>
>>> On 1/20/2012 10:56 AM, Jeff Layton wrote:
>>>> What I'm finding though is that if I stop and start nfsd multiple
>>>> times, then I eventually get a warning like this on start:
>>>>
>>>> [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
>>>> [ 5157.128742] Hardware name: Bochs
>>>> [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
>>>> [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
>>>> [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G W O 3.3.0-rc1+ #1
>>>> [ 5157.128742] Call Trace:
>>>> [ 5157.128742] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
>>>> [ 5157.128742] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
>>>> [ 5157.128742] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
>>>> [ 5157.128742] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
>>>> [ 5157.128742] [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
>>>> [ 5157.128742] [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
>>>> [ 5157.128742] [<ffffffff81074676>] mod_timer+0xf6/0x450
>>>> [ 5157.128742] [<ffffffff810749e8>] add_timer+0x18/0x20
>>>> [ 5157.128742] [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
>>>> [ 5157.128742] [<ffffffff81084441>] queue_delayed_work+0x21/0x40
>>>> [ 5157.128742] [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
>>>> [ 5157.128742] [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
>>>> [ 5157.128742] [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
>>>> [ 5157.128742] [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
>>>> [ 5157.128742] [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
>>>> [ 5157.128742] [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
>>>> [ 5157.128742] [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
>>>> [ 5157.128742] [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
>>>> [ 5157.128742] [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
>>>> [ 5157.128742] [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
>>>> [ 5157.128742] [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
>>>> [ 5157.128742] [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
>>>> [ 5157.128742] [<ffffffff811b4fdd>] sys_write+0x4d/0x90
>>>> [ 5157.128742] [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
>>>
>>> This one is telling you that the timer you're about to run hasn't been
>>> registered with debug_objects. Most likely the work item hasn't been
>>> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
>>>
>>>>
>>>> ...or this on stop:
>>>>
>>>> [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
>>>> [ 5200.808205] Hardware name: Bochs
>>>> [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
>>>> [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
>>>> [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G W O 3.3.0-rc1+ #1
>>>> [ 5200.822409] Call Trace:
>>>> [ 5200.822726] [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
>>>> [ 5200.823509] [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
>>>> [ 5200.824243] [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
>>>> [ 5200.824935] [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
>>>> [ 5200.825665] [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
>>>> [ 5200.826474] [<ffffffff810737b4>] del_timer+0x24/0x1b0
>>>> [ 5200.827129] [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
>>>> [ 5200.827829] [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
>>>> [ 5200.832905] [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
>>>> [ 5200.833716] [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
>>>> [ 5200.834558] [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
>>>> [ 5200.835434] [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
>>>> [ 5200.836271] [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
>>>> [ 5200.837045] [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
>>>> [ 5200.837778] [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
>>>> [ 5200.838545] [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
>>>> [ 5200.839221] [<ffffffffa0102000>] ? 0xffffffffa0101fff
>>>> [ 5200.839820] [<ffffffff8108a747>] kthread+0xb7/0xc0
>>>> [ 5200.840431] [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
>>>> [ 5200.850883] [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
>>>> [ 5200.851654] [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
>>>> [ 5200.852418] [<ffffffff816a48b0>] ? gs_change+0x13/0x13
>>>
>>> This is similar. Now you're deleting a timer that debug_objects doesn't
>>> know about.
>>>
>>>>
>>>> The problem is that even after staring at this code for a while, I'm
>>>> still clueless as to what this is telling me. It doesn't happen every
>>>> time either, so maybe there's a race of some sort involved.
>>>>
>>>> It's possible that my patch is broken and doing something wrong, but
>>>> I'm starting not to think so. I'm not really using timers directly --
>>>> I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
>>>> that is manipulating the timer. The delayed_work (and hence the timer)
>>>> are part of an rpc_inode.
>>>>
>>>> Can anyone shed a little light on what this is complaining about?
>>>>
>>>
>>> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I
>>> would hope that work item debugging would say the same things and then
>>> we would know that the work item itself wasn't initialized properly.
>>
>> (adding linux-nfs to cc list...)
>>
>> Ok, I think I sort of see what's happening, but I'm not sure if it's a
>> bug in the debug objects code or something else. To answer the question
>> that I didn't before, the kernels I've been testing have this set:
>>
>> CONFIG_DEBUG_OBJECTS_WORK=y
>>
>> ...so workqueue debugging is on but I haven't seen any messages that
>> come from it AFAICT. The rpc_inode objects have their delayed_work
>> fields initialized via a slab "constructor". Here's the rpc_inode_cache
>> creation call:
>>
>> rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
>> sizeof(struct rpc_inode),
>> 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
>> SLAB_MEM_SPREAD),
>> init_once);
>>
>> ...and the init_once() routine does this:
>>
>> INIT_DELAYED_WORK(&rpci->queue_timeout,
>> rpc_timeout_upcall_queue);
>>
>> On the first use of an rpc_inode object, everything works fine. I think
>> the problem comes in when rpc_inode objects get recycled without the
>> INIT_DELAYED_WORK() getting called on it again.
>>
>> Before the object is freed it gets cancel_delayed_work_sync() called on
>> it, but that's apparently not enough to convince the debugobjects code
>> that it's already correctly initialized. As a Q&D check, the following
>> patch seems to stop the warnings. Is there some better way to do this
>> that doesn't require the reinitialization of the delayed work on each
>> inode allocation?
>
> If this is debug objects false positive, it should be rather fixed in
> debug object code itself, not workaround in nfs code, but I do not have
> any idea how to do this.
>
> Stanislaw
>
I don't think so. Look at what INIT_DELAYED_WORK does.
INIT_WORK - Set some start values, register with the objects-debugger if on.
init_timer -
I'd imagine that cancel_delayed_work_sync() would: 1st cancel some of those
registrations, specifically with the objects-debugger. 2nd will leave some
state undefined as left from the last round.
I think this patch is cardinal and you might have found a potential bug.
Also the cost of INIT_DELAYED_WORK is marginal compare to the extra safety.
Recomended-by: Boaz Harrosh <bharrosh@panasas.com>
>> --- a/net/sunrpc/rpc_pipe.c
>> +++ b/net/sunrpc/rpc_pipe.c
>> @@ -178,6 +178,8 @@ rpc_alloc_inode(struct super_block *sb)
>> rpci = (struct rpc_inode *)kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
>> if (!rpci)
>> return NULL;
>> +
>> + INIT_DELAYED_WORK(&rpci->queue_timeout, rpc_timeout_upcall_queue);
>> return &rpci->vfs_inode;
>> }
>>
>> --
>> Jeff Layton <jlayton@redhat.com>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2012-01-24 9:59 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-01-20 18:56 WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0() Jeff Layton
2012-01-22 8:46 ` Stephen Boyd
2012-01-22 12:14 ` Jeff Layton
2012-01-23 15:23 ` Jeff Layton
2012-01-24 7:45 ` Stanislaw Gruszka
2012-01-24 9:51 ` Boaz Harrosh [this message]
2012-01-24 12:36 ` Jeff Layton
2012-01-24 15:01 ` Boaz Harrosh
2012-01-24 16:32 ` Jeff Layton
2012-01-24 17:43 ` Jeff Layton
2012-01-25 14:05 ` Thomas Gleixner
2012-01-25 14:46 ` Jeff Layton
2012-01-25 14:47 ` Boaz Harrosh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F1E7F3F.3060703@panasas.com \
--to=bharrosh@panasas.com \
--cc=bfields@redhat.com \
--cc=jlayton@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=sboyd@codeaurora.org \
--cc=sgruszka@redhat.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.