All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: rafael@kernel.org, jeyu@kernel.org, ngupta@vflare.org,
	sergey.senozhatsky.work@gmail.com, minchan@kernel.org,
	axboe@kernel.dk, mbenes@suse.com, jpoimboe@redhat.com,
	tglx@linutronix.de, keescook@chromium.org, jikos@kernel.org,
	rostedt@goodmis.org, peterz@infradead.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] drivers/base/core: refcount kobject and bus on device attribute read / store
Date: Wed, 23 Jun 2021 18:51:42 +0200	[thread overview]
Message-ID: <YNNmnrjpOGGVXsP2@kroah.com> (raw)
In-Reply-To: <20210623161434.qraapo4xaprte7bs@garbanzo>

On Wed, Jun 23, 2021 at 09:14:34AM -0700, Luis Chamberlain wrote:
> kernfs / sysfs is in not struct device specific, it has no semantics for
> modules or even devices. The aspect of kernfs which deals with locking
> a struct kernfs_node is kernfs_get_active(). The refcount there of
> importance is the call to atomic_inc_unless_negative(&kn->active).
> 
> struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)                   
> {                                                                               
> 	if (unlikely(!kn))
> 		return NULL;                                                    
> 
> 	if (!atomic_inc_unless_negative(&kn->active))                           
> 		return NULL;                                                    
> 
> 	if (kernfs_lockdep(kn))
> 		rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);               
> 	return kn;                                                              
> }
> 
> BTW this was also precisely where I had suggested to extend the
> kernfs_node with an optional kn->owner and if set we try_module_get() to
> prevent the deadlock case if the module exit routine also happens to use
> a lock also used by a sysfs attribute store/read.
> 
> The flow would be (from a real live gdb crash backtrace from an original
> bug report from a customer):
> 
> write system call -->
>   ksys_write ()
>     vfs_write() -->
>       __vfs_write() -->
>        kernfs_fop_write() (note now this is kernfs_fop_write_iter()) -->
>          sysfs_kf_write() -->
>            dev_attr_store() -->
> 	     null reference
> 
> The dereference is because the dev_attr_store() call which we are
> modifying
> 
> LINE-001 static ssize_t dev_attr_store(struct kobject *kobj, struct attribute *attr,                                                                                         
> LINE-002 const char *buf, size_t count)                                                                                                                 
> LINE-003 {
> LINE-004	struct device_attribute *dev_attr = to_dev_attr(attr);
> LINE-005	struct device *dev = kobj_to_dev(kobj);
> LINE-006	ssize_t ret = -EIO;
> LINE-007
> LINE-008	if (dev_attr->store)
> LINE-009		ret = dev_attr->store(dev, dev_attr, buf, count);
> 	...
> 	}
> 
> The race happens because a sysfs store / read can be triggered, the CPU
> could be preempted after LINE-008, and the ->store is gone by LINE-009.
> This begs the question if kernfs_fop_write_iter() or sysfs protects this
> somehow? Let's see.
> 
> For kernfs we have:
> 
> static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) 
> {
> 	struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
> 	...
> 	mutex_lock(&of->mutex);
> 	if (!kernfs_get_active(of->kn)) {
> 		mutex_unlock(&of->mutex);
> 		len = -ENODEV;
> 		goto out_free;
> 	}
> 
> 	ops = kernfs_ops(of->kn);
> 	if (ops->write)
> 		len = ops->write(of, buf, len, iocb->ki_pos);
> 	else
> 		len = -EINVAL;
> 
> 	kernfs_put_active(of->kn);
> 	mutex_unlock(&of->mutex);
> 	...
> }
> 
> And the write call here is a syfs calls:
> 
> static ssize_t sysfs_kf_write(struct kernfs_open_file *of, char *buf,           
> 			      size_t count, loff_t pos)                         
> {                                                                               
> 	const struct sysfs_ops *ops = sysfs_file_ops(of->kn);                   
> 	struct kobject *kobj = of->kn->parent->priv;                            
> 
> 	if (!count)                                                             
> 		return 0;                                                       
> 
> 	return ops->store(kobj, of->kn->priv, buf, count);                      
> }           
> 
> As we have observed already, the active reference obtained through
> kernfs_get_active() was for the struct kernfs_node. Sure, the
> ops->write() is valid, in this case it sysfs_kf_write().
> 
> sysfs isn't doing any active reference check for the kobject device
> attribute as it doesn't care for them. So syfs calls
> dev_attr_store(), but the dev_attr_store() is not preventing the device
> attribute ops to go fishing, and we destroy them while we're destroying
> the device on module removal.

Ah, but sysfs _should_ be doing this properly.

I think the issue is that when we store the kobject pointer in kernfs,
it is NOT incremented.  Look at sysfs_create_dir_ns(), if we call
kobject_get(kobj) right before we call kernfs_create_dir_ns(), and then
properly clean things up later on when we remove the sysfs directory
(i.e. the kobject), it _should_ fix this problem.

Then, we know, whenever show/store/whatever is called, when we cast out
of kernfs the private pointer to a kobject, that the kobject really is
still alive, so we can use it properly.

Can you try that, it should be a much "simpler" change here.

thanks,

greg k-h

  parent reply	other threads:[~2021-06-23 16:51 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-23  0:36 [PATCH v3] drivers/base/core: refcount kobject and bus on device attribute read / store Luis Chamberlain
2021-06-23  8:32 ` Greg KH
2021-06-23 16:14   ` Luis Chamberlain
2021-06-23 16:28     ` Greg KH
2021-06-23 16:51     ` Greg KH [this message]
2021-06-23 17:23       ` Luis Chamberlain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YNNmnrjpOGGVXsP2@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@kernel.dk \
    --cc=jeyu@kernel.org \
    --cc=jikos@kernel.org \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mbenes@suse.com \
    --cc=mcgrof@kernel.org \
    --cc=minchan@kernel.org \
    --cc=ngupta@vflare.org \
    --cc=peterz@infradead.org \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.