public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Fries <david@fries.net>
To: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Thorsten Bschorr <thorsten@bschorr.de>,
	Jonathan ALIBERT <jonathan.alibert@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm
Date: Tue, 17 Mar 2015 23:20:37 -0500	[thread overview]
Message-ID: <20150318042037.GE21067@spacedout.fries.net> (raw)
In-Reply-To: <2283421426366516@web2o.yandex.ru>

On Sat, Mar 14, 2015 at 11:55:16PM +0300, Evgeniy Polyakov wrote:
> Hi David
> 
> 12.03.2015, 03:54, "David Fries" <david@fries.net>:
> > Would that be removing all four refcnt, w1_slave, w1_master,
> > w1_family, w1_cb_block, or just some of them?  It sounds good to me,
> > if that had bugs there would be much more than just the w1 system
> > relying on it.  I don't know enough about that system or have the time
> > to code up that change.
> >
> > I can take another look at and post the reference counting w1_therm
> > fix instead of the mutex version as a near term work around until that
> > is available if you want.
> 
> Please cook up a quick fix for this problem - this bug really hurts people.
> And then we will discuss how 'ideal' life cycle should look

Done, I don't like it, I'm not sure anyone else will either, but I'm
no longer crashing in testing, so that's an improvement.  My
"production" system doesn't use w1_therm, so I only see these bugs in
development testing it.  I've come to the conclusion that in the face
of a slave vanishing, w1_therm can't avoid all the race conditions, so
the real fix must be elsewhere.

>From 51d4024ca667c8b712de462489d125a78e85aa57 Mon Sep 17 00:00:00 2001
From: David Fries <David@Fries.net>
Date: Sat, 7 Mar 2015 22:25:37 -0600
Subject: [PATCH] w1_therm, reduce race conditions in w1_slave_show

After applying this patch commands such as the following in one
process,

slave=28-000002c95fb1
while true; do echo $slave > /sys/devices/w1_bus_master1/w1_master_add; sleep .1; echo $slave > /sys/devices/w1_bus_master1/w1_master_remove; sleep .1; done

and then two at the same time in two other processes,
slave=28-000002c95fb1
while true; do time cat /sys/devices/w1_bus_master1/$slave/w1_slave ; sleep .1; done

then randomly stop all three and repeat.

With this patch I no longer see crashes, but at best this patch
effectively hiding the result of a race condition.  sl->family_data is
being freed and set to NULL in the slave removal while the
w1_slave_show is then dereferencing it, this holds on to the pointer
meaning it's probably clobbering memory now instead of crashing.  I
wonder if that would make RCU be a fit for this?  The original bug
report was pointing the problem as unlocking bus_mutex while waiting
for the temperature conversion, but I was getting sl->family_data set
to NULL more reliable without external power which means bux_mutex was
held for the duration of w1_slave_show, which is not to say that the
original bug report wasn't correct, it is to say that even with the
spinlock, holding bus_mutex on the slave, isn't sufficient to keep the
slave from being removed.

Reported-By: Thorsten Bschorr <thorsten@bschorr.de>
---
 drivers/w1/slaves/w1_therm.c |   70 +++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/drivers/w1/slaves/w1_therm.c b/drivers/w1/slaves/w1_therm.c
index 1f11a20..403285d 100644
--- a/drivers/w1/slaves/w1_therm.c
+++ b/drivers/w1/slaves/w1_therm.c
@@ -59,16 +59,32 @@ MODULE_ALIAS("w1-family-" __stringify(W1_THERM_DS28EA00));
 static int w1_strong_pullup = 1;
 module_param_named(strong_pullup, w1_strong_pullup, int, 0);
 
+struct w1_therm_family_data {
+	uint8_t rom[9];
+	atomic_t refcnt;
+};
+
+/* return the address of the refcnt in the family data */
+#define THERM_REFCNT(family_data) \
+	(&((struct w1_therm_family_data*)family_data)->refcnt)
+
 static int w1_therm_add_slave(struct w1_slave *sl)
 {
-	sl->family_data = kzalloc(9, GFP_KERNEL);
+	sl->family_data = kzalloc(sizeof(struct w1_therm_family_data),
+		GFP_KERNEL);
 	if (!sl->family_data)
 		return -ENOMEM;
+	atomic_set(THERM_REFCNT(sl->family_data), 1);
 	return 0;
 }
 
 static void w1_therm_remove_slave(struct w1_slave *sl)
 {
+	int refcnt = atomic_sub_return(1, THERM_REFCNT(sl->family_data));
+	while(refcnt) {
+		msleep(1000);
+		refcnt = atomic_read(THERM_REFCNT(sl->family_data));
+	}
 	kfree(sl->family_data);
 	sl->family_data = NULL;
 }
@@ -194,13 +210,30 @@ static ssize_t w1_slave_show(struct device *device,
 	struct w1_slave *sl = dev_to_w1_slave(device);
 	struct w1_master *dev = sl->master;
 	u8 rom[9], crc, verdict, external_power;
-	int i, max_trying = 10;
+	int i, ret, max_trying = 10;
 	ssize_t c = PAGE_SIZE;
+	u8 *family_data = sl->family_data;
+
+	ret = mutex_lock_interruptible(&dev->bus_mutex);
+	if (ret != 0)
+		goto post_unlock;
 
-	i = mutex_lock_interruptible(&dev->bus_mutex);
-	if (i != 0)
-		return i;
+	if(!sl->family_data)
+	{
+		ret = -ENODEV;
+		/* Note for anyoe who actually saw this message, it is a known
+		 * problem with either slave drivers or this driver in
+		 * particular and the request is only a canary indication as
+		 * to how many people and how often it is being ran into.
+		 */
+		printk(KERN_NOTICE
+			"%s: %u sl->family_data is NULL please report\n",
+			__FILE__, __LINE__);
+		goto pre_unlock;
+	}
 
+	/* prevent the slave from going away in sleep */
+	atomic_inc(THERM_REFCNT(family_data));
 	memset(rom, 0, sizeof(rom));
 
 	while (max_trying--) {
@@ -230,17 +263,19 @@ static ssize_t w1_slave_show(struct device *device,
 				mutex_unlock(&dev->bus_mutex);
 
 				sleep_rem = msleep_interruptible(tm);
-				if (sleep_rem != 0)
-					return -EINTR;
+				if (sleep_rem != 0) {
+					ret = -EINTR;
+					goto post_unlock;
+				}
 
-				i = mutex_lock_interruptible(&dev->bus_mutex);
-				if (i != 0)
-					return i;
+				ret = mutex_lock_interruptible(&dev->bus_mutex);
+				if (ret != 0)
+					goto post_unlock;
 			} else if (!w1_strong_pullup) {
 				sleep_rem = msleep_interruptible(tm);
 				if (sleep_rem != 0) {
-					mutex_unlock(&dev->bus_mutex);
-					return -EINTR;
+					ret = -EINTR;
+					goto pre_unlock;
 				}
 			}
 
@@ -269,19 +304,24 @@ static ssize_t w1_slave_show(struct device *device,
 	c -= snprintf(buf + PAGE_SIZE - c, c, ": crc=%02x %s\n",
 			   crc, (verdict) ? "YES" : "NO");
 	if (verdict)
-		memcpy(sl->family_data, rom, sizeof(rom));
+		memcpy(family_data, rom, sizeof(rom));
 	else
 		dev_warn(device, "Read failed CRC check\n");
 
 	for (i = 0; i < 9; ++i)
 		c -= snprintf(buf + PAGE_SIZE - c, c, "%02x ",
-			      ((u8 *)sl->family_data)[i]);
+			      ((u8 *)family_data)[i]);
 
 	c -= snprintf(buf + PAGE_SIZE - c, c, "t=%d\n",
 		w1_convert_temp(rom, sl->family->fid));
+	ret = PAGE_SIZE - c;
+
+pre_unlock:
 	mutex_unlock(&dev->bus_mutex);
 
-	return PAGE_SIZE - c;
+post_unlock:
+	atomic_dec(THERM_REFCNT(family_data));
+	return ret;
 }
 
 static int __init w1_therm_init(void)
-- 
1.7.10.4


  reply	other threads:[~2015-03-18  4:20 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CA+1k2cH5Y+RngvEbgE3CW5g+bO3V1ytDJC=u6DLVDZvbEOhu5A@mail.gmail.com>
     [not found] ` <CA+1k2cF1frcEUu-L_cSJxTp=GKExn6Vt2rdCeY=zrhM62FUggw@mail.gmail.com>
     [not found]   ` <CA+1k2cEwtY+U7TS3rpmMp5nEBokO8vwhcOiD0ExVQnuoB=XVLQ@mail.gmail.com>
2015-02-23 17:09     ` Fwd: w1/slaves/w1_therm: null-ptr access of sl->family_data Thorsten Bschorr
2015-02-24  1:37       ` David Fries
2015-02-25  9:28         ` Thorsten Bschorr
2015-02-27  8:43 ` [PATCH] Avoid null-pointer access in w1/slaves/w1_therm Thorsten.Bschorr
2015-02-28 20:17   ` David Fries
     [not found]     ` <369891425174502@web4m.yandex.ru>
2015-03-01  2:17       ` David Fries
2015-03-01 13:04         ` Thorsten Bschorr
2015-03-02  0:17           ` David Fries
2015-03-04 15:36             ` Евгений Поляков
2015-03-08 21:14               ` David Fries
2015-03-09 22:47                 ` Thorsten Bschorr
2015-03-09 23:09                   ` David Fries
2015-03-10  0:05                     ` Thorsten Bschorr
2015-03-10  0:34                       ` Thorsten Bschorr
2015-03-12  0:44                         ` David Fries
2015-03-10 13:52                     ` Evgeniy Polyakov
2015-03-12  0:54                       ` David Fries
2015-03-14 20:55                         ` Evgeniy Polyakov
2015-03-18  4:20                           ` David Fries [this message]
2015-03-18 15:18                             ` Evgeniy Polyakov
2015-03-19  0:09                               ` David Fries
     [not found]                                 ` <CADq11+-tZmQEVUL=sHC64i4auC_5i=+y2yBcMTaJMdD5Z0dE6w@mail.gmail.com>
2015-04-16  3:51                                   ` David Fries
2015-04-16 11:57                                     ` Evgeniy Polyakov
     [not found]                                       ` <CA+1k2cFw+2NTOtbSaJ1S=kBAn2Mj62DTeZo68V9t1Wk-7m7GyA@mail.gmail.com>
2015-04-17 12:55                                         ` Evgeniy Polyakov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150318042037.GE21067@spacedout.fries.net \
    --to=david@fries.net \
    --cc=jonathan.alibert@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=thorsten@bschorr.de \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox