From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: Orangefs ABI documentation
Date: Sat, 6 Feb 2016 19:42:10 +0000
Message-ID: <20160206194210.GX17997@ZenIV.linux.org.uk>
In-Reply-To: <CAOg9mSQOZP155TnuoJG_YFb19oU35cj8fHTxn8BfJCw+asP-CQ@mail.gmail.com>
On Thu, Feb 04, 2016 at 06:30:26PM -0500, Mike Marshall wrote:
> As for the WARN_ONs, the waitqueue one is easy to hit when the
> client-core stops and restarts, you can see here where precopy_buffers
> started whining about the client-core, you can see that the client
> core restarted when the debug mask got sent back over, and then
> the WARN_ON in waitqueue gets hit:
>
> [ 1239.198976] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.198979] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.198983] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.199175] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199177] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199180] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.199601] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199602] precopy_buffers: Failed to copy-in buffers. Please make
> sure that the pvfs2-client is running. -14
> [ 1239.199604] orangefs_file_write_iter: do_readv_writev failed, rc:-14:.
> [ 1239.248239] dispatch_ioctl_command: client debug mask has been been
> received :0: :0:
> [ 1239.248257] dispatch_ioctl_command: client debug array string has
> been received.
> [ 1239.307842] ------------[ cut here ]------------
> [ 1239.307847] WARNING: CPU: 0 PID: 1347 at
> fs/orangefs/waitqueue.c:208 service_operation+0x59f/0x9b0()
> [ 1239.307848] Modules linked in: bnep bluetooth ip6t_rpfilter rfkill
> ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_mangle iptable_security iptable_raw ppdev parport_pc
> virtio_balloon virtio_console parport 8139too serio_raw pvpanic
> i2c_piix4 uinput qxl drm_kms_helper ttm drm 8139cp i2c_core
> virtio_pci ata_generic virtio virtio_ring mii pata_acpi
> [ 1239.307870] CPU: 0 PID: 1347 Comm: dbench Not tainted
> 4.4.0-161988-g237f828-dirty #49
> [ 1239.307871] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 1239.307872] 0000000000000000 0000000011eac412 ffff88003bf27cd0
> ffffffff8139c84d
> [ 1239.307874] ffff88003bf27d08 ffffffff8108e510 ffff880010968000
> ffff88001096c1d0
> [ 1239.307876] ffff88001096c188 00000000fffffff5 0000000000000000
> ffff88003bf27d18
> [ 1239.307877] Call Trace:
> [ 1239.307881] [<ffffffff8139c84d>] dump_stack+0x19/0x1c
> [ 1239.307884] [<ffffffff8108e510>] warn_slowpath_common+0x80/0xc0
> [ 1239.307886] [<ffffffff8108e65a>] warn_slowpath_null+0x1a/0x20
> [ 1239.307887] [<ffffffff812fe73f>] service_operation+0x59f/0x9b0
> [ 1239.307889] [<ffffffff810c28b0>] ? prepare_to_wait_event+0x100/0x100
> [ 1239.307891] [<ffffffff810c28b0>] ? prepare_to_wait_event+0x100/0x100
> [ 1239.307893] [<ffffffff812fbd12>] orangefs_readdir+0x172/0xd50
> [ 1239.307895] [<ffffffff810c9a2d>] ? trace_hardirqs_on+0xd/0x10
> [ 1239.307898] [<ffffffff8120e4bf>] iterate_dir+0x9f/0x130
> [ 1239.307899] [<ffffffff8120e9d0>] SyS_getdents+0xa0/0x140
> [ 1239.307900] [<ffffffff8120e550>] ? iterate_dir+0x130/0x130
> [ 1239.307903] [<ffffffff8178302f>] entry_SYSCALL_64_fastpath+0x12/0x76
> [ 1239.307904] ---[ end trace 66a9a15ad78b3dea ]---
Bloody hell. OK, something's seriously fishy here. You have
service_operation() called from orangefs_readdir(). It had been
immediately preceded by

	ret = orangefs_readdir_index_get(&bufmap, &buffer_index);

returning a success. Moreover, *all* paths after return from
service_operation() pass through some instance of

	orangefs_readdir_index_put(bufmap, buffer_index);

(possibly hidden behind readdir_handle_dtor(bufmap, &rhandle)).
We'd bloody better _not_ have had bufmap freed by that point, right?
OTOH, this service_operation() has just seen orangefs_get_bufmap_init()
returning 0. Since
int orangefs_get_bufmap_init(void)
{
	return __orangefs_bufmap ? 1 : 0;
}
we have observed NULL __orangefs_bufmap at that point. __orangefs_bufmap
is static, its address is never taken and the only assignments to it are
static void orangefs_bufmap_unref(struct orangefs_bufmap *bufmap)
{
	if (atomic_dec_and_lock(&bufmap->refcnt, &orangefs_bufmap_lock)) {
		__orangefs_bufmap = NULL;
		spin_unlock(&orangefs_bufmap_lock);
		orangefs_bufmap_unmap(bufmap);
		orangefs_bufmap_free(bufmap);
	}
}
and
	bufmap = orangefs_bufmap_alloc(user_desc);
	if (!bufmap)
		goto out;

	ret = orangefs_bufmap_map(bufmap, user_desc);
	if (ret)
		goto out_free_bufmap;

	spin_lock(&orangefs_bufmap_lock);
	if (__orangefs_bufmap) {
		spin_unlock(&orangefs_bufmap_lock);
		gossip_err("orangefs: error: bufmap already initialized.\n");
		ret = -EALREADY;
		goto out_unmap_bufmap;
	}
	__orangefs_bufmap = bufmap;
	spin_unlock(&orangefs_bufmap_lock);
The latter is definitely *not* making it NULL - that's the daemon asking to
install a new one. And the former is unconditionally followed by
kfree(bufmap) (via orangefs_bufmap_free()).

IOW, if this WARN_ON() triggers, the damn thing has already been freed.
And the subsequent crash appears to match that theory.
The really weird thing is that orangefs_readdir_index_get() increments the
refcount, so having hit that __orangefs_bufmap = NULL; while we sat in
service_operation() means that refcounting of bufmap got broken somehow...
orangefs_bufmap_alloc() creates it with refcount 1 and is very shortly
followed either by orangefs_bufmap_free() or by __orangefs_bufmap changing
from NULL to the pointer to the created instance.
orangefs_bufmap_ref() either returns NULL or a non-NULL value equal to
__orangefs_bufmap, in the latter case having bumped its refcount.
orangefs_bufmap_unref() drops the refcount on its argument and, if it has
reached 0, clears __orangefs_bufmap and frees the argument. That's
a bit of a red flag - either it is always called with its argument equal to
__orangefs_bufmap, or we might end up with a leak...
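
For reference, the ref side looks like this (quoting from memory, so check
it against your tree - the exact body may differ):

static struct orangefs_bufmap *orangefs_bufmap_ref(void)
{
	struct orangefs_bufmap *bufmap = NULL;

	spin_lock(&orangefs_bufmap_lock);
	if (__orangefs_bufmap) {
		bufmap = __orangefs_bufmap;
		atomic_inc(&bufmap->refcnt);	/* bump under the lock */
	}
	spin_unlock(&orangefs_bufmap_lock);
	return bufmap;
}

Note that the only way to grab a reference is via __orangefs_bufmap under
orangefs_bufmap_lock, which is what makes the asymmetry in unref above
suspicious.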
Nothing else changes their refcounts. In orangefs_bufmap_{size,shift}_query()
a successful orangefs_bufmap_ref() is immediately followed by
orangefs_bufmap_unref() (incidentally, both might as well have just
grabbed orangefs_bufmap_lock, checked __orangefs_bufmap and picked its
->desc_{size,shift} before dropping the lock - no need to mess with
refcounts in those two; see the sketch below).
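
Something along these lines would do for the size side (a sketch, not even
compile-tested; ->desc_size is assumed from the ->desc_{size,shift} above,
and the shift variant is the obvious analogue):

int orangefs_bufmap_size_query(void)
{
	int size = 0;	/* or whatever callers expect with no bufmap */

	spin_lock(&orangefs_bufmap_lock);
	if (__orangefs_bufmap)
		size = __orangefs_bufmap->desc_size;
	spin_unlock(&orangefs_bufmap_lock);
	return size;
}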
The remaining callers of orangefs_bufmap_ref() are orangefs_bufmap_get() and
orangefs_readdir_index_get(). Those begin by grabbing a reference (and
failing if __orangefs_bufmap was NULL) and eventually either return
zero, having stored the acquired pointer in *mapp, or drop the reference
and return non-zero. A more serious red flag, BTW - *mapp is still set
in the latter case, which might end up with confused callers...
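
FWIW, I'd rather see those end with something like this, so that the
pointer only escapes to the caller on success (a sketch only - the
wait_for_a_slot() arguments are elided since I don't have the exact
signature in front of me, and -EIO stands in for whatever error the
current code reports when the daemon is gone):

	bufmap = orangefs_bufmap_ref();
	if (!bufmap)
		return -EIO;	/* daemon not running */

	ret = wait_for_a_slot(..., buffer_index);
	if (ret) {
		orangefs_bufmap_unref(bufmap);
		return ret;
	}
	*mapp = bufmap;		/* store only on success */
	return 0;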
Let's see... bufmap_get is called in wait_for_direct_io(), where a negative
return is followed by buggering off to
	if (buffer_index >= 0) {
		orangefs_bufmap_put(bufmap, buffer_index);
		gossip_debug(GOSSIP_FILE_DEBUG,
			     "%s(%pU): PUT buffer_index %d\n",
			     __func__, handle, buffer_index);
		buffer_index = -1;
	}
Uh-oh...
	1) can that thing return a positive value?
	2) can it end up storing something in buffer_index and still
failing?
	3) can we get there with a non-negative buffer_index?
OK, (1) and (2) translate into the same question for wait_for_a_slot(),
which returns 0, -EINTR or -ETIMEDOUT and only overwrites buffer_index
when it returns 0. And (3)... Fuck. (3) is possible. Look:
	if (ret == -EAGAIN && op_state_purged(new_op)) {
		orangefs_bufmap_put(bufmap, buffer_index);
		gossip_debug(GOSSIP_FILE_DEBUG,
			     "%s:going to repopulate_shared_memory.\n",
			     __func__);
		goto populate_shared_memory;
	}
in wait_for_direct_io() sends us to populate_shared_memory with buffer_index
unchanged despite the fact that we'd done orangefs_bufmap_put() on it. And
there we hit
	/* get a shared buffer index */
	ret = orangefs_bufmap_get(&bufmap, &buffer_index);
	if (ret < 0) {
		gossip_debug(GOSSIP_FILE_DEBUG,
			     "%s: orangefs_bufmap_get failure (%ld)\n",
			     __func__, (long)ret);
		goto out;
with failure followed by _another_ orangefs_bufmap_put() in the cleanup
code at out:

OK, there's your buggered refcounting. The minimal fix is to slap
buffer_index = -1; right after the orangefs_bufmap_put() in there.
The orangefs_readdir_index_get() caller simply buggers off on failure, so no
problem there; longer term, that's what I'd suggest doing in
wait_for_direct_io() as well. Anyway, could you try this on top of your
for-next and see if you can reproduce either WARN_ON?
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index d865b58..40b3805 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -210,6 +210,7 @@ populate_shared_memory:
 	 */
 	if (ret == -EAGAIN && op_state_purged(new_op)) {
 		orangefs_bufmap_put(bufmap, buffer_index);
+		buffer_index = -1;
 		gossip_debug(GOSSIP_FILE_DEBUG,
 			     "%s:going to repopulate_shared_memory.\n",
 			     __func__);