Re: [PATCH] driver core: fix shutdown races with probe/remove(v2)

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Ming Lei <ming.lei@canonical.com>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alan Stern <stern@rowland.harvard.edu>,
	stable@vger.kernel.org
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove(v2)
Date: Wed, 20 Jun 2012 15:37:54 -0700	[thread overview]
Message-ID: <20120620223754.GA5864@kroah.com> (raw)
In-Reply-To: <CACVXFVOr1mnjaEoGpc3O_MDuvi_KHTXtsEBQ7LLoaeWxWRa0aQ@mail.gmail.com>

On Tue, Jun 19, 2012 at 10:00:36AM +0800, Ming Lei wrote:
> On Tue, Jun 19, 2012 at 6:25 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> 
> > Have you read Documentation/stable_kernel_patches.txt?  Please do so and
> 
> I haven't read Documentation/stable_kernel_patches.txt, but read
> Documentation/stable_kernel_rules.txt, :-)

Sorry, you are correct :)

> > see why I can't take this patch for a stable tree.  Note that no one has
> > ever reported this as a bug before, and the original poster ran away
> > never to be heard from again, so I really don't think it was a real
> > problem that people ever saw.
> 
> If Documentation/stable_kernel_rules.txt is the correct doc for stable rule,
> looks reporter requirement isn't listed in the file, but the below can be found:
> 
>           - No "theoretical race condition" issues, unless an explanation of
>          how the race can be exploited is also provided.
> 
> so I marked it as -stable because I have explained how the race can be
> exploited in reality.

Ok, but as this has been there since when, 2.5, I'll refrain from
marking it this way, as no one has reported a real problem like this
before.

> >> >> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> >> ---
> >> >> v2:
> >> >>       - take Alan's suggestion to use device_trylock to avoid
> >> >>       hanging during shutdown by buggy device or driver
> >> >>       - hold parent reference counter
> >> >>
> >> >>  drivers/base/core.c |   32 ++++++++++++++++++++++++++++++++
> >> >>  1 file changed, 32 insertions(+)
> >> >>
> >> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> >> index 346be8b..f2fc989 100644
> >> >> --- a/drivers/base/core.c
> >> >> +++ b/drivers/base/core.c
> >> >> @@ -1796,6 +1796,16 @@ out:
> >> >>  }
> >> >>  EXPORT_SYMBOL_GPL(device_move);
> >> >>
> >> >> +static int __try_lock(struct device *dev)
> >> >> +{
> >> >> +     int i = 0;
> >> >> +
> >> >> +     while (!device_trylock(dev) && i++ < 100)
> >> >> +             msleep(10);
> >> >> +
> >> >> +     return i < 100;
> >> >> +}
> >> >
> >> > That's a totally arbritary time, why does this work and other times do
> >> > not?  And what is this returning, if the lock was grabbed successfully?
> >>
> >>  It is a timeout time and is 1sec now. If the lock can't be held in 1sec, the
> >>  function will return 0, otherwise it will return 1 and indicates that the lock
> >>  has been held successfully.
> >
> > My point is why 1 second?  That's completly arbitrary and means nothing.
> 
> 1 second is a estimated value, just like many other timeout values in kernel.
> 
> For example, the timeout passed to usb_control_msg() is mostly estimated
> first, then may be adjusted later from some new report.
> 
> Another example is one recent xhci fix commit:
> 
> 622eb783fe(xHCI: Increase the timeout for controller save/restore state
> operation)
> 
> the timeout value is just increased arbitrarily to adapt new device.
> 
> I still have more many examples in kernel about timeout value...

Yes, I know this, but now you are putting a limit on the amount of time
a probe function can take, when before, we have never had one.  That's
not something to be taken lightly, and is one I know is not true.

> > Why not just do a real lock and try for forever?
> 
> IMO, there are two advantages not just doing a real lock for forever:
> 
> - avoiding buggy device/driver to hang the system
> - with trylock, we can log the buggy device so that it is a bit
> easier to troubleshoot the buggy drivers, suppose the bug is
> only triggered 1 time in one year or more

No, just fix the driver, I don't want to put a time limit on how long
probe can take, as we never have in the past and I'm sure that whatever
we pick, will be wrong for someone.

I have seen devices that take many seconds, and minutes for some if bad
things happen (i.e. the firmware doesn't download properly).  Don't
break people's working systems.

greg k-h

WARNING: multiple messages have this Message-ID (diff)

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Ming Lei <ming.lei@canonical.com>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	Alan Stern <stern@rowland.harvard.edu>,
	stable@vger.kernel.org
Subject: Re: [PATCH] driver core: fix shutdown races with probe/remove(v2)
Date: Wed, 20 Jun 2012 15:37:54 -0700	[thread overview]
Message-ID: <20120620223754.GA5864@kroah.com> (raw)
In-Reply-To: <CACVXFVOr1mnjaEoGpc3O_MDuvi_KHTXtsEBQ7LLoaeWxWRa0aQ@mail.gmail.com>

On Tue, Jun 19, 2012 at 10:00:36AM +0800, Ming Lei wrote:
> On Tue, Jun 19, 2012 at 6:25 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> 
> > Have you read Documentation/stable_kernel_patches.txt? ï¿½Please do so and
> 
> I haven't read Documentation/stable_kernel_patches.txt, but read
> Documentation/stable_kernel_rules.txt, :-)

Sorry, you are correct :)

> > see why I can't take this patch for a stable tree. ï¿½Note that no one has
> > ever reported this as a bug before, and the original poster ran away
> > never to be heard from again, so I really don't think it was a real
> > problem that people ever saw.
> 
> If Documentation/stable_kernel_rules.txt is the correct doc for stable rule,
> looks reporter requirement isn't listed in the file, but the below can be found:
> 
>           - No "theoretical race condition" issues, unless an explanation of
>          how the race can be exploited is also provided.
> 
> so I marked it as -stable because I have explained how the race can be
> exploited in reality.

Ok, but as this has been there since when, 2.5, I'll refrain from
marking it this way, as no one has reported a real problem like this
before.

> >> >> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> >> ---
> >> >> v2:
> >> >> ï¿½ ï¿½ ï¿½ - take Alan's suggestion to use device_trylock to avoid
> >> >> ï¿½ ï¿½ ï¿½ hanging during shutdown by buggy device or driver
> >> >> ï¿½ ï¿½ ï¿½ - hold parent reference counter
> >> >>
> >> >> ï¿½drivers/base/core.c | ï¿½ 32 ++++++++++++++++++++++++++++++++
> >> >> ï¿½1 file changed, 32 insertions(+)
> >> >>
> >> >> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >> >> index 346be8b..f2fc989 100644
> >> >> --- a/drivers/base/core.c
> >> >> +++ b/drivers/base/core.c
> >> >> @@ -1796,6 +1796,16 @@ out:
> >> >> ï¿½}
> >> >> ï¿½EXPORT_SYMBOL_GPL(device_move);
> >> >>
> >> >> +static int __try_lock(struct device *dev)
> >> >> +{
> >> >> + ï¿½ ï¿½ int i = 0;
> >> >> +
> >> >> + ï¿½ ï¿½ while (!device_trylock(dev) && i++ < 100)
> >> >> + ï¿½ ï¿½ ï¿½ ï¿½ ï¿½ ï¿½ msleep(10);
> >> >> +
> >> >> + ï¿½ ï¿½ return i < 100;
> >> >> +}
> >> >
> >> > That's a totally arbritary time, why does this work and other times do
> >> > not? ï¿½And what is this returning, if the lock was grabbed successfully?
> >>
> >> ï¿½It is a timeout time and is 1sec now. If the lock can't be held in 1sec, the
> >> ï¿½function will return 0, otherwise it will return 1 and indicates that the lock
> >> ï¿½has been held successfully.
> >
> > My point is why 1 second? ï¿½That's completly arbitrary and means nothing.
> 
> 1 second is a estimated value, just like many other timeout values in kernel.
> 
> For example, the timeout passed to usb_control_msg() is mostly estimated
> first, then may be adjusted later from some new report.
> 
> Another example is one recent xhci fix commit:
> 
> 622eb783fe(xHCI: Increase the timeout for controller save/restore state
> operation)
> 
> the timeout value is just increased arbitrarily to adapt new device.
> 
> I still have more many examples in kernel about timeout value...

Yes, I know this, but now you are putting a limit on the amount of time
a probe function can take, when before, we have never had one.  That's
not something to be taken lightly, and is one I know is not true.

> > Why not just do a real lock and try for forever?
> 
> IMO, there are two advantages not just doing a real lock for forever:
> 
> - avoiding buggy device/driver to hang the system
> - with trylock, we can log the buggy device so that it is a bit
> easier to troubleshoot the buggy drivers, suppose the bug is
> only triggered 1 time in one year or more

No, just fix the driver, I don't want to put a time limit on how long
probe can take, as we never have in the past and I'm sure that whatever
we pick, will be wrong for someone.

I have seen devices that take many seconds, and minutes for some if bad
things happen (i.e. the firmware doesn't download properly).  Don't
break people's working systems.

greg k-h

next prev parent reply	other threads:[~2012-06-20 22:38 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-11  5:13 [PATCH] driver core: fix shutdown races with probe/remove(v2) Ming Lei
2012-06-11 14:16 ` Alan Stern
2012-06-11 14:43   ` Ming Lei
2012-06-11 14:43     ` Ming Lei
2012-06-11 16:02     ` Alan Stern
2012-06-11 16:02       ` Alan Stern
2012-06-12  1:02       ` Ming Lei
2012-06-12  1:02         ` Ming Lei
2012-06-15 22:03 ` Greg Kroah-Hartman
2012-06-18  1:52   ` Ming Lei
2012-06-18  1:52     ` Ming Lei
2012-06-18 22:25     ` Greg Kroah-Hartman
2012-06-18 22:25       ` Greg Kroah-Hartman
2012-06-19  2:00       ` Ming Lei
2012-06-19  2:00         ` Ming Lei
2012-06-20 22:37         ` Greg Kroah-Hartman [this message]
2012-06-20 22:37           ` Greg Kroah-Hartman
2012-06-20 23:33           ` Jonathan Nieder
2012-06-21  1:21           ` Ming Lei
2012-06-21  1:21             ` Ming Lei
2012-06-21 13:49             ` Greg Kroah-Hartman
2012-06-21 13:49               ` Greg Kroah-Hartman
2012-06-21 16:04               ` Ming Lei
2012-06-21 16:12                 ` Alan Stern
2012-06-21 16:21                   ` Ming Lei
2012-06-21 16:21                     ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120620223754.GA5864@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=ming.lei@canonical.com \
    --cc=stable@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.