public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH -stable] block: destroy bdi before blockdev is unregistered.
       [not found] ` <20150423160551.45345f96@notabene.brown>
@ 2015-04-27  4:12   ` NeilBrown
  2015-04-27 13:03     ` Christoph Hellwig
                       ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: NeilBrown @ 2015-04-27  4:12 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Azat Khuzhin, Christoph Hellwig, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Jan Kara, lkml

[-- Attachment #1: Type: text/plain, Size: 3587 bytes --]


Because of the peculiar way that md devices are created (automatically
when the device node is opened), a new device can be created and
registered immediately after the
	blk_unregister_region(disk_devt(disk), disk->minors);
call in del_gendisk().

Therefore it is important that all visible artifacts of the previous
device are removed before this call.  In particular, the 'bdi'.

Since:
commit c4db59d31e39ea067c32163ac961e9c80198fd37
Author: Christoph Hellwig <hch@lst.de>
    fs: don't reassign dirty inodes to default_backing_dev_info

moved the
   device_unregister(bdi->dev);
call from bdi_unregister() to bdi_destroy() it has been quite easy to
lose a race and have a new (e.g.) "md127" be created after the
blk_unregister_region() call and before bdi_destroy() is ultimately
called by the final 'put_disk', which must come after del_gendisk().

The new device finds that the bdi name is already registered in sysfs
and complains

> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'

We can fix this by moving the bdi_destroy() call out of
blk_release_queue() (which can happen very late when a refcount
reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
device driver calls it.

Then it is only necessary for md to call blk_cleanup_queue() before
del_gendisk().  As loop.c devices are also created on demand by
opening the device node, we make the same change there.

Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org (v4.0)
Signed-off-by: NeilBrown <neilb@suse.de>

--
Hi Jens,
 if you could check this and forward on to Linus I'd really appreciate it.

Thanks,
NeilBrown


diff --git a/block/blk-core.c b/block/blk-core.c
index fd154b94447a..7871603f0a29 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
+	bdi_destroy(&q->backing_dev_info);
+
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index faaf36ade7eb..2b8fd302f677 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_trace_shutdown(q);
 
-	bdi_destroy(&q->backing_dev_info);
-
 	ida_simple_remove(&blk_queue_ida, q->id);
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ae3fcb4199e9..d7173cb1ea76 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1620,8 +1620,8 @@ out:
 
 static void loop_remove(struct loop_device *lo)
 {
-	del_gendisk(lo->lo_disk);
 	blk_cleanup_queue(lo->lo_queue);
+	del_gendisk(lo->lo_disk);
 	blk_mq_free_tag_set(&lo->tag_set);
 	put_disk(lo->lo_disk);
 	kfree(lo);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4f31e195e26..593a02476c78 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
 	if (mddev->sysfs_state)
 		sysfs_put(mddev->sysfs_state);
 
+	if (mddev->queue)
+		blk_cleanup_queue(mddev->queue);
 	if (mddev->gendisk) {
 		del_gendisk(mddev->gendisk);
 		put_disk(mddev->gendisk);
 	}
-	if (mddev->queue)
-		blk_cleanup_queue(mddev->queue);
 
 	kfree(mddev);
 }

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
@ 2015-04-27 13:03     ` Christoph Hellwig
  2015-04-27 16:27     ` Jens Axboe
  2015-04-28 16:41     ` Mike Snitzer
  2 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2015-04-27 13:03 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
  2015-04-27 13:03     ` Christoph Hellwig
@ 2015-04-27 16:27     ` Jens Axboe
  2015-04-28 16:41     ` Mike Snitzer
  2 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2015-04-27 16:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Azat Khuzhin, Christoph Hellwig, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Jan Kara, lkml

On 04/26/2015 10:12 PM, NeilBrown wrote:
>
> Because of the peculiar way that md devices are created (automatically
> when the device node is opened), a new device can be created and
> registered immediately after the
> 	blk_unregister_region(disk_devt(disk), disk->minors);
> call in del_gendisk().
>
> Therefore it is important that all visible artifacts of the previous
> device are removed before this call.  In particular, the 'bdi'.
>
> Since:
> commit c4db59d31e39ea067c32163ac961e9c80198fd37
> Author: Christoph Hellwig <hch@lst.de>
>      fs: don't reassign dirty inodes to default_backing_dev_info
>
> moved the
>     device_unregister(bdi->dev);
> call from bdi_unregister() to bdi_destroy() it has been quite easy to
> lose a race and have a new (e.g.) "md127" be created after the
> blk_unregister_region() call and before bdi_destroy() is ultimately
> called by the final 'put_disk', which must come after del_gendisk().
>
> The new device finds that the bdi name is already registered in sysfs
> and complains
>
>> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
>> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
>
> We can fix this by moving the bdi_destroy() call out of
> blk_release_queue() (which can happen very late when a refcount
> reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> device driver calls it.
>
> Then it is only necessary for md to call blk_cleanup_queue() before
> del_gendisk().  As loop.c devices are also created on demand by
> opening the device node, we make the same change there.
>
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: stable@vger.kernel.org (v4.0)
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> --
> Hi Jens,
>   if you could check this and forward on to Linus I'd really appreciate it.

Yup, I added it. BTW, that line needs 3 '-', otherwise git am will pick 
up the comments below :-)

Thanks Neil.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
  2015-04-27 13:03     ` Christoph Hellwig
  2015-04-27 16:27     ` Jens Axboe
@ 2015-04-28 16:41     ` Mike Snitzer
  2015-04-28 21:25       ` NeilBrown
  2 siblings, 1 reply; 12+ messages in thread
From: Mike Snitzer @ 2015-04-28 16:41 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

On Mon, Apr 27, 2015 at 12:12 AM, NeilBrown <neilb@suse.de> wrote:
>
> Because of the peculiar way that md devices are created (automatically
> when the device node is opened), a new device can be created and
> registered immediately after the
>         blk_unregister_region(disk_devt(disk), disk->minors);
> call in del_gendisk().
>
> Therefore it is important that all visible artifacts of the previous
> device are removed before this call.  In particular, the 'bdi'.
>
> Since:
> commit c4db59d31e39ea067c32163ac961e9c80198fd37
> Author: Christoph Hellwig <hch@lst.de>
>     fs: don't reassign dirty inodes to default_backing_dev_info
>
> moved the
>    device_unregister(bdi->dev);
> call from bdi_unregister() to bdi_destroy() it has been quite easy to
> lose a race and have a new (e.g.) "md127" be created after the
> blk_unregister_region() call and before bdi_destroy() is ultimately
> called by the final 'put_disk', which must come after del_gendisk().
>
> The new device finds that the bdi name is already registered in sysfs
> and complains
>
>> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
>> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
>
> We can fix this by moving the bdi_destroy() call out of
> blk_release_queue() (which can happen very late when a refcount
> reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> device driver calls it.
>
> Then it is only necessary for md to call blk_cleanup_queue() before
> del_gendisk().  As loop.c devices are also created on demand by
> opening the device node, we make the same change there.
>
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: stable@vger.kernel.org (v4.0)
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> --
> Hi Jens,
>  if you could check this and forward on to Linus I'd really appreciate it.
>
> Thanks,
> NeilBrown
>
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index fd154b94447a..7871603f0a29 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
>                 q->queue_lock = &q->__queue_lock;
>         spin_unlock_irq(lock);
>
> +       bdi_destroy(&q->backing_dev_info);
> +
>         /* @q is and will stay empty, shutdown and put */
>         blk_put_queue(q);
>  }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index faaf36ade7eb..2b8fd302f677 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
>
>         blk_trace_shutdown(q);
>
> -       bdi_destroy(&q->backing_dev_info);
> -
>         ida_simple_remove(&blk_queue_ida, q->id);
>         call_rcu(&q->rcu_head, blk_free_queue_rcu);
>  }
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index ae3fcb4199e9..d7173cb1ea76 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1620,8 +1620,8 @@ out:
>
>  static void loop_remove(struct loop_device *lo)
>  {
> -       del_gendisk(lo->lo_disk);
>         blk_cleanup_queue(lo->lo_queue);
> +       del_gendisk(lo->lo_disk);
>         blk_mq_free_tag_set(&lo->tag_set);
>         put_disk(lo->lo_disk);
>         kfree(lo);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d4f31e195e26..593a02476c78 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
>         if (mddev->sysfs_state)
>                 sysfs_put(mddev->sysfs_state);
>
> +       if (mddev->queue)
> +               blk_cleanup_queue(mddev->queue);
>         if (mddev->gendisk) {
>                 del_gendisk(mddev->gendisk);
>                 put_disk(mddev->gendisk);
>         }
> -       if (mddev->queue)
> -               blk_cleanup_queue(mddev->queue);
>
>         kfree(mddev);
>  }

I've taken this patch into consideration relative to DM, please see:
http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip&id=5dac86fb79697e93297edb6a316b429236d00137

Point is in my private snitzer/wip branch DM now calls
blk_cleanup_queue() before del_gendisk().

With your patch:
1) blk_cleanup_queue -> bdi_destroy -> bdi->dev = NULL;
2) del_gendisk -> bdi_unregister -> WARN_ON_ONCE(!bdi->dev)

So, in testing with DM I'm hitting bdi_unregister's WARN_ON(), are you
seeing this WARN_ON?

[  385.134474] WARNING: CPU: 3 PID: 11231 at mm/backing-dev.c:372
bdi_unregister+0x36/0x40()
[  385.143593] Modules linked in: dm_service_time dm_multipath
target_core_iblock tcm_loop target_core_mod sg iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi core
temp kvm_intel kvm ixgbe igb crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel mdio aesni_intel ptp pps_core
glue_helper ipmi_si i7core_edac iTCO_wdt lrw
dca iTCO_vendor_support edac_core i2c_i801 pcspkr gf128mul
ipmi_msghandler acpi_power_meter shpchp lpc_ich ablk_helper cryptd
mfd_core acpi_cpufreq xfs libcrc32c sr_mo
d cdrom mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit
drm_kms_helper ata_generic ttm pata_acpi sd_mod ata_piix drm
iomemory_vsl(POE) libata megaraid_sas i2c_c
ore skd dm_mirror dm_region_hash dm_log dm_mod
[  385.213736] CPU: 3 PID: 11231 Comm: dmsetup Tainted: P          IOE
  4.1.0-rc1.snitm+ #55
[  385.222958] Hardware name: FUJITSU
PRIMERGY RX300 S6             /D2619, BIOS 6.00 Rev. 1.10.2619.N1
     05/24/2011
[  385.237602]  0000000000000000 000000006cf7a5f2 ffff88031bfcfb68
ffffffff8167b7e6
[  385.245897]  0000000000000000 0000000000000000 ffff88031bfcfba8
ffffffff8107bd5a
[  385.254193]  0000000000000000 ffff88032c4b6c00 0000000000000000
0000000000000000
[  385.262488] Call Trace:
[  385.265218]  [<ffffffff8167b7e6>] dump_stack+0x45/0x57
[  385.270955]  [<ffffffff8107bd5a>] warn_slowpath_common+0x8a/0xc0
[  385.277660]  [<ffffffff8107be8a>] warn_slowpath_null+0x1a/0x20
[  385.284171]  [<ffffffff8119e216>] bdi_unregister+0x36/0x40
[  385.290295]  [<ffffffff812fb8c1>] del_gendisk+0x131/0x2b0
[  385.296326]  [<ffffffffa0000eba>] cleanup_mapped_device+0xda/0x130 [dm_mod]
[  385.304101]  [<ffffffffa000283b>] __dm_destroy+0x19b/0x260 [dm_mod]
[  385.311099]  [<ffffffffa00040c3>] dm_destroy+0x13/0x20 [dm_mod]
[  385.317709]  [<ffffffffa0009d0e>] dev_remove+0x11e/0x180 [dm_mod]
[  385.324516]  [<ffffffffa0009bf0>] ? dev_suspend+0x250/0x250 [dm_mod]
[  385.331614]  [<ffffffffa000a3e5>] ctl_ioctl+0x255/0x500 [dm_mod]
[  385.338319]  [<ffffffff8128c8d8>] ? SYSC_semtimedop+0x298/0xea0
[  385.344931]  [<ffffffffa000a6a3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[  385.351733]  [<ffffffff8120d038>] do_vfs_ioctl+0x2f8/0x4f0
[  385.357857]  [<ffffffff811207e4>] ? __audit_syscall_entry+0xb4/0x110
[  385.364950]  [<ffffffff8102367c>] ? do_audit_syscall_entry+0x6c/0x70
[  385.372041]  [<ffffffff8120d2b1>] SyS_ioctl+0x81/0xa0
[  385.377679]  [<ffffffff81025046>] ? syscall_trace_leave+0xc6/0x120
[  385.384578]  [<ffffffff81682f2e>] system_call_fastpath+0x12/0x71
[  385.391282] ---[ end trace af60d8ac7157d319 ]---

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-28 16:41     ` Mike Snitzer
@ 2015-04-28 21:25       ` NeilBrown
  2015-04-29 13:35         ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2015-04-28 21:25 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

[-- Attachment #1: Type: text/plain, Size: 8089 bytes --]

On Tue, 28 Apr 2015 12:41:18 -0400 Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Apr 27, 2015 at 12:12 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > Because of the peculiar way that md devices are created (automatically
> > when the device node is opened), a new device can be created and
> > registered immediately after the
> >         blk_unregister_region(disk_devt(disk), disk->minors);
> > call in del_gendisk().
> >
> > Therefore it is important that all visible artifacts of the previous
> > device are removed before this call.  In particular, the 'bdi'.
> >
> > Since:
> > commit c4db59d31e39ea067c32163ac961e9c80198fd37
> > Author: Christoph Hellwig <hch@lst.de>
> >     fs: don't reassign dirty inodes to default_backing_dev_info
> >
> > moved the
> >    device_unregister(bdi->dev);
> > call from bdi_unregister() to bdi_destroy() it has been quite easy to
> > lose a race and have a new (e.g.) "md127" be created after the
> > blk_unregister_region() call and before bdi_destroy() is ultimately
> > called by the final 'put_disk', which must come after del_gendisk().
> >
> > The new device finds that the bdi name is already registered in sysfs
> > and complains
> >
> >> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> >> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
> >
> > We can fix this by moving the bdi_destroy() call out of
> > blk_release_queue() (which can happen very late when a refcount
> > reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> > device driver calls it.
> >
> > Then it is only necessary for md to call blk_cleanup_queue() before
> > del_gendisk().  As loop.c devices are also created on demand by
> > opening the device node, we make the same change there.
> >
> > Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> > Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: stable@vger.kernel.org (v4.0)
> > Signed-off-by: NeilBrown <neilb@suse.de>
> >
> > --
> > Hi Jens,
> >  if you could check this and forward on to Linus I'd really appreciate it.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index fd154b94447a..7871603f0a29 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
> >                 q->queue_lock = &q->__queue_lock;
> >         spin_unlock_irq(lock);
> >
> > +       bdi_destroy(&q->backing_dev_info);
> > +
> >         /* @q is and will stay empty, shutdown and put */
> >         blk_put_queue(q);
> >  }
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index faaf36ade7eb..2b8fd302f677 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
> >
> >         blk_trace_shutdown(q);
> >
> > -       bdi_destroy(&q->backing_dev_info);
> > -
> >         ida_simple_remove(&blk_queue_ida, q->id);
> >         call_rcu(&q->rcu_head, blk_free_queue_rcu);
> >  }
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index ae3fcb4199e9..d7173cb1ea76 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -1620,8 +1620,8 @@ out:
> >
> >  static void loop_remove(struct loop_device *lo)
> >  {
> > -       del_gendisk(lo->lo_disk);
> >         blk_cleanup_queue(lo->lo_queue);
> > +       del_gendisk(lo->lo_disk);
> >         blk_mq_free_tag_set(&lo->tag_set);
> >         put_disk(lo->lo_disk);
> >         kfree(lo);
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index d4f31e195e26..593a02476c78 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
> >         if (mddev->sysfs_state)
> >                 sysfs_put(mddev->sysfs_state);
> >
> > +       if (mddev->queue)
> > +               blk_cleanup_queue(mddev->queue);
> >         if (mddev->gendisk) {
> >                 del_gendisk(mddev->gendisk);
> >                 put_disk(mddev->gendisk);
> >         }
> > -       if (mddev->queue)
> > -               blk_cleanup_queue(mddev->queue);
> >
> >         kfree(mddev);
> >  }
> 
> I've taken this patch into consideration relative to DM, please see:
> http://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git/commit/?h=wip&id=5dac86fb79697e93297edb6a316b429236d00137
> 
> Point is in my private snitzer/wip branch DM now calls
> blk_cleanup_queue() before del_gendisk().
> 
> With your patch:
> 1) blk_cleanup_queue -> bdi_destroy -> bdi->dev = NULL;
> 2) del_gendisk -> bdi_unregister -> WARN_ON_ONCE(!bdi->dev)
> 
> So, in testing with DM I'm hitting bdi_unregister's WARN_ON(), are you
> seeing this WARN_ON?

Hmmm.  Yes I am.  I wonder how I missed that.  Thanks!

As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
the test, or the warning.

I wonder if it would make sense to move the bdi_set_min_ratio() call to
bdi_destroy, and discard bdi_unregister??
There is a comment which suggests bdi_unregister might be of use later, but
it might be best to have a clean slate in which to add whatever might be
needed??

NeilBrown

diff --git a/block/genhd.c b/block/genhd.c
index e351fc521053..1d4435478e8a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -657,7 +657,6 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index aff923ae8c4b..d87d8eced064 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -116,7 +116,6 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 880dd7437172..c178d13d6f4c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580df2af..000e7b3b9896 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	flush_delayed_work(&bdi->wb.dwork);
 }
 
-/*
- * Called when the device behind @bdi has been removed or ejected.
- *
- * We can't really do much here except for reducing the dirty ratio at
- * the moment.  In the future we should be able to set a flag so that
- * the filesystem can handle errors at mark_inode_dirty time instead
- * of only at writeback time.
- */
-void bdi_unregister(struct backing_dev_info *bdi)
-{
-	if (WARN_ON_ONCE(!bdi->dev))
-		return;
-
-	bdi_set_min_ratio(bdi, 0);
-}
-EXPORT_SYMBOL(bdi_unregister);
-
 static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 {
 	memset(wb, 0, sizeof(*wb));
@@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	int i;
 
 	bdi_wb_shutdown(bdi);
+	bdi_set_min_ratio(bdi, 0);
 
 	WARN_ON(!list_empty(&bdi->work_list));
 	WARN_ON(delayed_work_pending(&bdi->wb.dwork));

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-28 21:25       ` NeilBrown
@ 2015-04-29 13:35         ` Christoph Hellwig
  2015-04-29 16:02           ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2015-04-29 13:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Mike Snitzer, Jens Axboe, Azat Khuzhin, Christoph Hellwig,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development, Peter Zijlstra

On Wed, Apr 29, 2015 at 07:25:30AM +1000, NeilBrown wrote:
> As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
> the test, or the warning.
> 
> I wonder if it would make sense to move the bdi_set_min_ratio() call to
> bdi_destroy, and discard bdi_unregister??
> There is a comment which suggests bdi_unregister might be of use later, but
> it might be best to have a clean slate in which to add whatever might be
> needed??

This seems fine to me from the block dev point of view.  I don't really
understand the bdi_min_ratio logic, but Peter might have a better idea.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-29 13:35         ` Christoph Hellwig
@ 2015-04-29 16:02           ` Peter Zijlstra
  2015-04-30  0:06             ` NeilBrown
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-04-29 16:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: NeilBrown, Mike Snitzer, Jens Axboe, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

On Wed, Apr 29, 2015 at 03:35:12PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 29, 2015 at 07:25:30AM +1000, NeilBrown wrote:
> > As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
> > the test, or the warning.
> > 
> > I wonder if it would make sense to move the bdi_set_min_ratio() call to
> > bdi_destroy, and discard bdi_unregister??
> > There is a comment which suggests bdi_unregister might be of use later, but
> > it might be best to have a clean slate in which to add whatever might be
> > needed??
> 
> This seems fine to me from the block dev point of view.  I don't really
> understand the bdi_min_ratio logic, but Peter might have a better idea.

Ah, that was a bit of digging, I've not looked at that in ages :-)

So if you look at bdi_dirty_limit()'s comment:

 * The bdi's share of dirty limit will be adapting to its throughput and
 * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set.

So the min_ratio is a minimum guaranteed fraction of the total
throughput.

Now the problem before commit ccb6108f5b0b ("mm/backing-dev.c: reset bdi
min_ratio in bdi_unregister()") was that since bdi_set_min_ratio()
keeps a global sum of bdi->min_ratio, you need to subtract from said
global sum when taking the BDI away. Otherwise we loose/leak a fraction
of the total throughput available (to the other BDIs).

Which is what that bdi_set_min_ratio(bdi, 0) in unregister does. It
resets the min_ratio for the bdi being taken out and frees up the min
allocated bandwidth for the others.

So I think moving that do destroy would be fine; assuming the delay
between unregister and destroy is typically 'short'. Because without
that you can 'leak' this min ratio for extended periods which means the
bandwidth is unavailable for other BDIs.

Does that make sense?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH -stable] block: destroy bdi before blockdev is unregistered.
  2015-04-29 16:02           ` Peter Zijlstra
@ 2015-04-30  0:06             ` NeilBrown
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
  1 sibling, 0 replies; 12+ messages in thread
From: NeilBrown @ 2015-04-30  0:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christoph Hellwig, Mike Snitzer, Jens Axboe, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

On Wed, 29 Apr 2015 18:02:58 +0200 Peter Zijlstra <peterz@infradead.org>
wrote:

> On Wed, Apr 29, 2015 at 03:35:12PM +0200, Christoph Hellwig wrote:
> > On Wed, Apr 29, 2015 at 07:25:30AM +1000, NeilBrown wrote:
> > > As bdi_set_min_ratio doesn't touch bdi->dev, there seems to be no need for
> > > the test, or the warning.
> > > 
> > > I wonder if it would make sense to move the bdi_set_min_ratio() call to
> > > bdi_destroy, and discard bdi_unregister??
> > > There is a comment which suggests bdi_unregister might be of use later, but
> > > it might be best to have a clean slate in which to add whatever might be
> > > needed??
> > 
> > This seems fine to me from the block dev point of view.  I don't really
> > understand the bdi_min_ratio logic, but Peter might have a better idea.
> 
> Ah, that was a bit of digging, I've not looked at that in ages :-)
> 
> So if you look at bdi_dirty_limit()'s comment:
> 
>  * The bdi's share of dirty limit will be adapting to its throughput and
>  * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set.
> 
> So the min_ratio is a minimum guaranteed fraction of the total
> throughput.
> 
> Now the problem before commit ccb6108f5b0b ("mm/backing-dev.c: reset bdi
> min_ratio in bdi_unregister()") was that since bdi_set_min_ratio()
> keeps a global sum of bdi->min_ratio, you need to subtract from said
> global sum when taking the BDI away. Otherwise we loose/leak a fraction
> of the total throughput available (to the other BDIs).
> 
> Which is what that bdi_set_min_ratio(bdi, 0) in unregister does. It
> resets the min_ratio for the bdi being taken out and frees up the min
> allocated bandwidth for the others.
> 
> So I think moving that do destroy would be fine; assuming the delay
> between unregister and destroy is typically 'short'. Because without
> that you can 'leak' this min ratio for extended periods which means the
> bandwidth is unavailable for other BDIs.
> 
> Does that make sense?

Your assessment is almost exactly what I had come up with, so it definitely
makes sense :-)
'destroy' does come very shortly after 'unregister' (and immediately before
'blk_put_queue' which actually frees the struct).  However the driving force
for this patch was a desire to move blk_cleanup_queue(), which calls
'destroy', earlier.  So the net result is that bdi_set_min_ratio will be
called slightly sooner.

Thanks,
NeilBrown

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-04-29 16:02           ` Peter Zijlstra
  2015-04-30  0:06             ` NeilBrown
@ 2015-04-30  0:32             ` NeilBrown
  2015-04-30  8:35               ` Peter Zijlstra
  2015-05-06 16:11               ` [dm-devel] " Dan Williams
  1 sibling, 2 replies; 12+ messages in thread
From: NeilBrown @ 2015-04-30  0:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Peter Zijlstra, Christoph Hellwig, Mike Snitzer, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

[-- Attachment #1: Type: text/plain, Size: 4448 bytes --]


bdi_unregister() now contains very little functionality.

It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
real consequence as bdi->dev isn't needed by anything else in the function,
and it triggers if
   blk_cleanup_queue() -> bdi_destroy()
is called before bdi_unregister, which a subsequent patch will make happen.
So this isn't wanted.

It also calls bdi_set_min_ratio().  This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.

Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.

So bdi_unregister can be discarded with the important content moved to
bdi_destroy, as can the
  writeback_bdi_unregister
event which is already not used.

This is tagged for 'stable' as it is a pre-requisite for a subsequent
patch which moves calls to blk_cleanup_queue() before calls to
del_gendisk().  The commit identified as 'Fixes' removed a lot of
other functionality from bdi_unregister(), and made a change which
necessitated moving the blk_cleanup_queue() calls.

Reported-by: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org (v4.0)
Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
Signed-off-by: NeilBrown <neilb@suse.de>

---

Hi again Jens,
 would you be able to queue this patch *before* the other one:
   block: destroy bdi before blockdev is unregistered.

 If it has to come after I'll need to re-write the text a bit.
 If you could give me the commit hash to reference I'll do that.

diff --git a/block/genhd.c b/block/genhd.c
index e351fc521053..1d4435478e8a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -657,7 +657,6 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index aff923ae8c4b..d87d8eced064 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -116,7 +116,6 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 880dd7437172..c178d13d6f4c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580df2af..000e7b3b9896 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	flush_delayed_work(&bdi->wb.dwork);
 }
 
-/*
- * Called when the device behind @bdi has been removed or ejected.
- *
- * We can't really do much here except for reducing the dirty ratio at
- * the moment.  In the future we should be able to set a flag so that
- * the filesystem can handle errors at mark_inode_dirty time instead
- * of only at writeback time.
- */
-void bdi_unregister(struct backing_dev_info *bdi)
-{
-	if (WARN_ON_ONCE(!bdi->dev))
-		return;
-
-	bdi_set_min_ratio(bdi, 0);
-}
-EXPORT_SYMBOL(bdi_unregister);
-
 static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 {
 	memset(wb, 0, sizeof(*wb));
@@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	int i;
 
 	bdi_wb_shutdown(bdi);
+	bdi_set_min_ratio(bdi, 0);
 
 	WARN_ON(!list_empty(&bdi->work_list));
 	WARN_ON(delayed_work_pending(&bdi->wb.dwork));

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
@ 2015-04-30  8:35               ` Peter Zijlstra
  2015-05-06 16:11               ` [dm-devel] " Dan Williams
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2015-04-30  8:35 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jens Axboe, Christoph Hellwig, Mike Snitzer, Azat Khuzhin,
	Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo, Jan Kara, lkml,
	device-mapper development

On Thu, Apr 30, 2015 at 10:32:33AM +1000, NeilBrown wrote:
> 
> bdi_unregister() now contains very little functionality.
> 
> It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
> real consequence as bdi->dev isn't needed by anything else in the function,
> and it triggers if
>    blk_cleanup_queue() -> bdi_destroy()
> is called before bdi_unregister, which a subsequent patch will make happen.
> So this isn't wanted.
> 
> It also calls bdi_set_min_ratio().  This needs to be called after
> writes through the bdi have all been flushed, and before the bdi is destroyed.
> Calling it early is better than calling it late as it frees up a global
> resource.
> 
> Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
> perfectly fits these requirements.
> 
> So bdi_unregister can be discarded with the important content moved to
> bdi_destroy, as can the
>   writeback_bdi_unregister
> event which is already not used.
> 
> This is tagged for 'stable' as it is a pre-requisite for a subsequent
> patch which moves calls to blk_cleanup_queue() before calls to
> del_gendisk().  The commit identified as 'Fixes' removed a lot of
> other functionality from bdi_unregister(), and made a change which
> necessitated moving the blk_cleanup_queue() calls.
> 
> Reported-by: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: stable@vger.kernel.org (v4.0)
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37

Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info")

> Signed-off-by: NeilBrown <neilb@suse.de>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dm-devel] [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
  2015-04-30  8:35               ` Peter Zijlstra
@ 2015-05-06 16:11               ` Dan Williams
  2015-05-08  5:09                 ` [PATCH v2] " NeilBrown
  1 sibling, 1 reply; 12+ messages in thread
From: Dan Williams @ 2015-05-06 16:11 UTC (permalink / raw)
  To: device-mapper development
  Cc: Jens Axboe, Jan Kara, Mike Snitzer, Azat Khuzhin, Peter Zijlstra,
	lkml, Kernel.org-Linux-RAID, Guoqing Jiang, Tejun Heo,
	Christoph Hellwig, nicholas.w.moulin

On Wed, Apr 29, 2015 at 5:32 PM, NeilBrown <neilb@suse.de> wrote:
>
> bdi_unregister() now contains very little functionality.
>
> It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
> real consequence as bdi->dev isn't needed by anything else in the function,
> and it triggers if
>    blk_cleanup_queue() -> bdi_destroy()
> is called before bdi_unregister, which a subsequent patch will make happen.
> So this isn't wanted.
>
> It also calls bdi_set_min_ratio().  This needs to be called after
> writes through the bdi have all been flushed, and before the bdi is destroyed.
> Calling it early is better than calling it late as it frees up a global
> resource.
>
> Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
> perfectly fits these requirements.
>
> So bdi_unregister can be discarded with the important content moved to
> bdi_destroy, as can the
>   writeback_bdi_unregister
> event which is already not used.
>
> This is tagged for 'stable' as it is a pre-requisite for a subsequent
> patch which moves calls to blk_cleanup_queue() before calls to
> del_gendisk().  The commit identified as 'Fixes' removed a lot of
> other functionality from bdi_unregister(), and made a change which
> necessitated moving the blk_cleanup_queue() calls.
>
> Reported-by: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: stable@vger.kernel.org (v4.0)
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Signed-off-by: NeilBrown <neilb@suse.de>
>
> ---
>
> Hi again Jens,
>  would you be able to queue this patch *before* the other one:
>    block: destroy bdi before blockdev is unregistered.
>
>  If it has to come after I'll need to re-write the text a bit.
>  If you could give me the commit hash to reference I'll do that.

Seems it is after:

http://git.kernel.dk/?p=linux-block.git;a=commit;h=6cd18e71

Also, we gave both patches a try internally after seeing the duplicate
sysfs warning.  You can add:

Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>

...on the re-send.

Thanks Neil!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] block: discard bdi_unregister() in favour of bdi_destroy()
  2015-05-06 16:11               ` [dm-devel] " Dan Williams
@ 2015-05-08  5:09                 ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2015-05-08  5:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Dan Williams, device-mapper development, Jan Kara, Mike Snitzer,
	Azat Khuzhin, Peter Zijlstra, lkml, Kernel.org-Linux-RAID,
	Guoqing Jiang, Tejun Heo, Christoph Hellwig, nicholas.w.moulin

[-- Attachment #1: Type: text/plain, Size: 4274 bytes --]



bdi_unregister() now contains very little functionality.

It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
real consequence as bdi->dev isn't needed by anything else in the function,
and it triggers if
   blk_cleanup_queue() -> bdi_destroy()
is called before bdi_unregister, which happens since
  Commit: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")

So this isn't wanted.

It also calls bdi_set_min_ratio().  This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.

Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.

So bdi_unregister() can be discarded with the important content moved to
bdi_destroy(), as can the
  writeback_bdi_unregister
event which is already not used.

Reported-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org (v4.0)
Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info")
Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.")
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

---

hi Jens,
 this is a revised version of the comment - no code change - make it suitable to add
to your linux-block tree.

Thanks,
NeilBrown


diff --git a/block/genhd.c b/block/genhd.c
index e351fc521053..1d4435478e8a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -657,7 +657,6 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index aff923ae8c4b..d87d8eced064 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -116,7 +116,6 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 880dd7437172..c178d13d6f4c 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580df2af..000e7b3b9896 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	flush_delayed_work(&bdi->wb.dwork);
 }
 
-/*
- * Called when the device behind @bdi has been removed or ejected.
- *
- * We can't really do much here except for reducing the dirty ratio at
- * the moment.  In the future we should be able to set a flag so that
- * the filesystem can handle errors at mark_inode_dirty time instead
- * of only at writeback time.
- */
-void bdi_unregister(struct backing_dev_info *bdi)
-{
-	if (WARN_ON_ONCE(!bdi->dev))
-		return;
-
-	bdi_set_min_ratio(bdi, 0);
-}
-EXPORT_SYMBOL(bdi_unregister);
-
 static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 {
 	memset(wb, 0, sizeof(*wb));
@@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	int i;
 
 	bdi_wb_shutdown(bdi);
+	bdi_set_min_ratio(bdi, 0);
 
 	WARN_ON(!list_empty(&bdi->work_list));
 	WARN_ON(delayed_work_pending(&bdi->wb.dwork));

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-05-08  5:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20150414171537.GH25394@azat>
     [not found] ` <20150423160551.45345f96@notabene.brown>
2015-04-27  4:12   ` [PATCH -stable] block: destroy bdi before blockdev is unregistered NeilBrown
2015-04-27 13:03     ` Christoph Hellwig
2015-04-27 16:27     ` Jens Axboe
2015-04-28 16:41     ` Mike Snitzer
2015-04-28 21:25       ` NeilBrown
2015-04-29 13:35         ` Christoph Hellwig
2015-04-29 16:02           ` Peter Zijlstra
2015-04-30  0:06             ` NeilBrown
2015-04-30  0:32             ` [PATCH stable] block: discard bdi_unregister() in favour of bdi_destroy() NeilBrown
2015-04-30  8:35               ` Peter Zijlstra
2015-05-06 16:11               ` [dm-devel] " Dan Williams
2015-05-08  5:09                 ` [PATCH v2] " NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox