Re: Race condition between "read CFQ stats" and "block device shutdown"

linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: Race condition between "read CFQ stats" and "block device shutdown"
       [not found]       ` <CAOMFOmWWfxCZ7ND_Vso4UkSEOqGm=o-xsqrfrj5MKdf9_jr1gA@mail.gmail.com>
@ 2013-09-26 13:54         ` Tejun Heo
  2013-09-26 14:18           ` Hannes Reinecke
  0 siblings, 1 reply; 6+ messages in thread
From: Tejun Heo @ 2013-09-26 13:54 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: Hannes Reinecke, Cgroups, Jens Axboe, linux-scsi

Hello, (cc'ing linux-scsi)

On Wed, Sep 25, 2013 at 01:37:51PM -0700, Anatol Pomozov wrote:
> Hi
> 
> On Wed, Sep 4, 2013 at 9:07 AM, Tejun Heo <tj@kernel.org> wrote:
> > Hello,
> >
> > On Wed, Sep 04, 2013 at 08:45:33AM -0700, Anatol Pomozov wrote:
> >> I am not an expect in block code, so I have a few questions here:
> >>
> >> - are we sure that this operation is atomic? What if blkg->q becomes
> >> dead right after we checked it, and blkg->q->queue_lock got invalid so
> >> we have the same crash as before?
> >
> > request_queue lock switching is something inherently broken in block
> > layer.  It's unsalvageable.
> 
> Fully agree. The problem that request_queue->queue_lock is a shared
> resource that concurrently modified/accessed. In this case (when one
> thread changes, another thread access it) we need synchronization to
> prevent race conditions. So we need a spin_lock to access queue_lock
> spin_lock, otherwise we have a crash like one above...
> 
> >  Maybe we can drop lock switching once blk-mq is fully merged.
> 
> Could you please provide more information about it? What is the timeline?

I have no idea.  Hopefully, not too far out.  Jens would have better
idea.

> If there is an easy way to fix the race condition I would like to
> help. Please give me some pointer what direction I should move.

The first step would be identifying who are actually making use of
lock switching, why and how much difference it would make for them to
not do that.

> PS Just a little bit of context why I care about this bug. We test a
> large farm that actively uses iscsi. We are going to have a lot of
> iscsi device startup/shutdown. I am testing whether this codepath has
> race conditions and I found one above.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Race condition between "read CFQ stats" and "block device shutdown"
  2013-09-26 13:54         ` Race condition between "read CFQ stats" and "block device shutdown" Tejun Heo
@ 2013-09-26 14:18           ` Hannes Reinecke
  2013-09-26 14:20             ` Tejun Heo
       [not found]             ` <5244423A.2050107-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 2 replies; 6+ messages in thread
From: Hannes Reinecke @ 2013-09-26 14:18 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Anatol Pomozov, Cgroups, Jens Axboe, linux-scsi

On 09/26/2013 03:54 PM, Tejun Heo wrote:
> Hello, (cc'ing linux-scsi)
> 
> On Wed, Sep 25, 2013 at 01:37:51PM -0700, Anatol Pomozov wrote:
>> Hi
>>
>> On Wed, Sep 4, 2013 at 9:07 AM, Tejun Heo <tj@kernel.org> wrote:
>>> Hello,
>>>
>>> On Wed, Sep 04, 2013 at 08:45:33AM -0700, Anatol Pomozov wrote:
>>>> I am not an expect in block code, so I have a few questions here:
>>>>
>>>> - are we sure that this operation is atomic? What if blkg->q becomes
>>>> dead right after we checked it, and blkg->q->queue_lock got invalid so
>>>> we have the same crash as before?
>>>
>>> request_queue lock switching is something inherently broken in block
>>> layer.  It's unsalvageable.
>>
>> Fully agree. The problem that request_queue->queue_lock is a shared
>> resource that concurrently modified/accessed. In this case (when one
>> thread changes, another thread access it) we need synchronization to
>> prevent race conditions. So we need a spin_lock to access queue_lock
>> spin_lock, otherwise we have a crash like one above...
>>
>>>  Maybe we can drop lock switching once blk-mq is fully merged.
>>
>> Could you please provide more information about it? What is the timeline?
> 
> I have no idea.  Hopefully, not too far out.  Jens would have better
> idea.
> 
>> If there is an easy way to fix the race condition I would like to
>> help. Please give me some pointer what direction I should move.
> 
> The first step would be identifying who are actually making use of
> lock switching, why and how much difference it would make for them to
> not do that.
> 
Typically, the lock is being used by the block drivers to
synchronize access between some internal data structures and the
request queue itself. You don't actually _need_ to do it that way,
but removing the lock switching would involve quite some redesign of
these drivers.
Give that most of the are rather oldish I really wouldn't want to
touch them.

However, none of the modern devices should be using this lock
switching, so I would just ignore it.
EG SCSI most definitely doesn't use it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Race condition between "read CFQ stats" and "block device shutdown"
  2013-09-26 14:18           ` Hannes Reinecke
@ 2013-09-26 14:20             ` Tejun Heo
       [not found]             ` <5244423A.2050107-l3A5Bk7waGM@public.gmane.org>
  1 sibling, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2013-09-26 14:20 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: Anatol Pomozov, Cgroups, Jens Axboe, linux-scsi

Hey, Hannes.

On Thu, Sep 26, 2013 at 04:18:34PM +0200, Hannes Reinecke wrote:
> However, none of the modern devices should be using this lock
> switching, so I would just ignore it.
> EG SCSI most definitely doesn't use it.

The kernel is crashing from it, so I don't think ignoring is an
acceptable strategy here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Race condition between "read CFQ stats" and "block device shutdown"
       [not found]             ` <5244423A.2050107-l3A5Bk7waGM@public.gmane.org>
@ 2013-09-26 16:23               ` Anatol Pomozov
  2013-09-26 16:30                 ` Tejun Heo
       [not found]                 ` <CAOMFOmX2f35qWyTr7=1HNu=RMB_LMAmpMbYxSEsX1xgURhx_mg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 2 replies; 6+ messages in thread
From: Anatol Pomozov @ 2013-09-26 16:23 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Tejun Heo, Cgroups, Jens Axboe, linux-scsi-u79uwXL29TY76Z2rM5mHXA

Hi

On Thu, Sep 26, 2013 at 7:18 AM, Hannes Reinecke <hare-l3A5Bk7waGM@public.gmane.org> wrote:
> On 09/26/2013 03:54 PM, Tejun Heo wrote:
>> Hello, (cc'ing linux-scsi)
>>
>> On Wed, Sep 25, 2013 at 01:37:51PM -0700, Anatol Pomozov wrote:
>>> Hi
>>>
>>> On Wed, Sep 4, 2013 at 9:07 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>> Hello,
>>>>
>>>> On Wed, Sep 04, 2013 at 08:45:33AM -0700, Anatol Pomozov wrote:
>>>>> I am not an expect in block code, so I have a few questions here:
>>>>>
>>>>> - are we sure that this operation is atomic? What if blkg->q becomes
>>>>> dead right after we checked it, and blkg->q->queue_lock got invalid so
>>>>> we have the same crash as before?
>>>>
>>>> request_queue lock switching is something inherently broken in block
>>>> layer.  It's unsalvageable.
>>>
>>> Fully agree. The problem that request_queue->queue_lock is a shared
>>> resource that concurrently modified/accessed. In this case (when one
>>> thread changes, another thread access it) we need synchronization to
>>> prevent race conditions. So we need a spin_lock to access queue_lock
>>> spin_lock, otherwise we have a crash like one above...
>>>
>>>>  Maybe we can drop lock switching once blk-mq is fully merged.
>>>
>>> Could you please provide more information about it? What is the timeline?
>>
>> I have no idea.  Hopefully, not too far out.  Jens would have better
>> idea.
>>
>>> If there is an easy way to fix the race condition I would like to
>>> help. Please give me some pointer what direction I should move.
>>
>> The first step would be identifying who are actually making use of
>> lock switching, why and how much difference it would make for them to
>> not do that.
>>
> Typically, the lock is being used by the block drivers to
> synchronize access between some internal data structures and the
> request queue itself. You don't actually _need_ to do it that way,
> but removing the lock switching would involve quite some redesign of
> these drivers.
> Give that most of the are rather oldish I really wouldn't want to
> touch them.

Thanks for the info.

We use modified version of "sbull" block device driver from GKH book.
We use it for testing block device startup/shutdown path + CFQ
manipulation.

The sbull driver uses function
blk_init_queue(..., &dev->qlock);

it passes lock as a second parameter and function makes the lock
swapping. According to your information passing lock to
blk_init_queue() considered as an "old non recommended way" so we
should modify our driver and avoid doing this. I'll take a look.

Thanks.


>
> However, none of the modern devices should be using this lock
> switching, so I would just ignore it.
> EG SCSI most definitely doesn't use it.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                   zSeries & Storage
> hare-l3A5Bk7waGM@public.gmane.org                          +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Race condition between "read CFQ stats" and "block device shutdown"
  2013-09-26 16:23               ` Anatol Pomozov
@ 2013-09-26 16:30                 ` Tejun Heo
       [not found]                 ` <CAOMFOmX2f35qWyTr7=1HNu=RMB_LMAmpMbYxSEsX1xgURhx_mg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2013-09-26 16:30 UTC (permalink / raw)
  To: Anatol Pomozov; +Cc: Hannes Reinecke, Cgroups, Jens Axboe, linux-scsi

Hello,

On Thu, Sep 26, 2013 at 09:23:19AM -0700, Anatol Pomozov wrote:
> We use modified version of "sbull" block device driver from GKH book.
> We use it for testing block device startup/shutdown path + CFQ
> manipulation.
> 
> The sbull driver uses function
> blk_init_queue(..., &dev->qlock);
> 
> it passes lock as a second parameter and function makes the lock
> swapping. According to your information passing lock to
> blk_init_queue() considered as an "old non recommended way" so we
> should modify our driver and avoid doing this. I'll take a look.

I wonder whether we can make request_queue record the passed in lock
and grab it in addition to the queue lock instead of replacing it when
invoking the request function.  This would mean more overhead for the
broken drivers but given how legacy they are at this point, I don't
think that's gonna be a big problem.  if that's likely to work, we may
as well just push all the extra locking to the legacy drivers so that
their request functions take the extra lock and then we can remove the
spinlock argument completely.  I'm not sure whether the locking in
request path is the only thing necessary tho.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Race condition between "read CFQ stats" and "block device shutdown"
       [not found]                 ` <CAOMFOmX2f35qWyTr7=1HNu=RMB_LMAmpMbYxSEsX1xgURhx_mg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-09-27  5:59                   ` Hannes Reinecke
  0 siblings, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2013-09-27  5:59 UTC (permalink / raw)
  To: Anatol Pomozov
  Cc: Tejun Heo, Cgroups, Jens Axboe, linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 09/26/2013 06:23 PM, Anatol Pomozov wrote:
> Hi
> 
> On Thu, Sep 26, 2013 at 7:18 AM, Hannes Reinecke <hare-l3A5Bk7waGM@public.gmane.org> wrote:
>> On 09/26/2013 03:54 PM, Tejun Heo wrote:
>>> Hello, (cc'ing linux-scsi)
>>>
>>> On Wed, Sep 25, 2013 at 01:37:51PM -0700, Anatol Pomozov wrote:
>>>> Hi
>>>>
>>>> On Wed, Sep 4, 2013 at 9:07 AM, Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>>> Hello,
>>>>>
>>>>> On Wed, Sep 04, 2013 at 08:45:33AM -0700, Anatol Pomozov wrote:
>>>>>> I am not an expect in block code, so I have a few questions here:
>>>>>>
>>>>>> - are we sure that this operation is atomic? What if blkg->q becomes
>>>>>> dead right after we checked it, and blkg->q->queue_lock got invalid so
>>>>>> we have the same crash as before?
>>>>>
>>>>> request_queue lock switching is something inherently broken in block
>>>>> layer.  It's unsalvageable.
>>>>
>>>> Fully agree. The problem that request_queue->queue_lock is a shared
>>>> resource that concurrently modified/accessed. In this case (when one
>>>> thread changes, another thread access it) we need synchronization to
>>>> prevent race conditions. So we need a spin_lock to access queue_lock
>>>> spin_lock, otherwise we have a crash like one above...
>>>>
>>>>>  Maybe we can drop lock switching once blk-mq is fully merged.
>>>>
>>>> Could you please provide more information about it? What is the timeline?
>>>
>>> I have no idea.  Hopefully, not too far out.  Jens would have better
>>> idea.
>>>
>>>> If there is an easy way to fix the race condition I would like to
>>>> help. Please give me some pointer what direction I should move.
>>>
>>> The first step would be identifying who are actually making use of
>>> lock switching, why and how much difference it would make for them to
>>> not do that.
>>>
>> Typically, the lock is being used by the block drivers to
>> synchronize access between some internal data structures and the
>> request queue itself. You don't actually _need_ to do it that way,
>> but removing the lock switching would involve quite some redesign of
>> these drivers.
>> Give that most of the are rather oldish I really wouldn't want to
>> touch them.
> 
> Thanks for the info.
> 
> We use modified version of "sbull" block device driver from GKH book.
> We use it for testing block device startup/shutdown path + CFQ
> manipulation.
> 
> The sbull driver uses function
> blk_init_queue(..., &dev->qlock);
> 
> it passes lock as a second parameter and function makes the lock
> swapping. According to your information passing lock to
> blk_init_queue() considered as an "old non recommended way" so we
> should modify our driver and avoid doing this. I'll take a look.
> 
Yep. I would recommend to use the queue_lock directly whenever you
need to pull requests off the request_queue, and use your own lock
to protect any internal structures associated with handling the
request internally.

Yes, this _might_ involve some lock dancing between the queue_lock
and the internal lock, but has the advantage that the internal lock
typically it used far more often than the queue_lock itself.
So you might even get some speed advantage here as you reduce
contention on the queue_lock.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare-l3A5Bk7waGM@public.gmane.org			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-09-27  5:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAOMFOmXJ5ZTYdOvdUt-oxsouhPGRmMshCRhn6AFgmFAGZw5WZA@mail.gmail.com>
     [not found] ` <5226D661.7070301@suse.de>
     [not found]   ` <CAOMFOmUCqXN1uaqBEWH3PStuZXvnvLw=YrARgv7DvqO6Y4bFPQ@mail.gmail.com>
     [not found]     ` <20130904160723.GC26609@mtj.dyndns.org>
     [not found]       ` <CAOMFOmWWfxCZ7ND_Vso4UkSEOqGm=o-xsqrfrj5MKdf9_jr1gA@mail.gmail.com>
2013-09-26 13:54         ` Race condition between "read CFQ stats" and "block device shutdown" Tejun Heo
2013-09-26 14:18           ` Hannes Reinecke
2013-09-26 14:20             ` Tejun Heo
     [not found]             ` <5244423A.2050107-l3A5Bk7waGM@public.gmane.org>
2013-09-26 16:23               ` Anatol Pomozov
2013-09-26 16:30                 ` Tejun Heo
     [not found]                 ` <CAOMFOmX2f35qWyTr7=1HNu=RMB_LMAmpMbYxSEsX1xgURhx_mg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-27  5:59                   ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).