* Re: PM/hibernate swapfile regression
2009-07-21 10:50 ` PM/hibernate swapfile regression Alan Jenkins
@ 2009-07-21 13:41 ` Rafael J. Wysocki
2009-07-21 20:57 ` Rafael J. Wysocki
2009-07-21 21:55 ` Christoph Hellwig
2 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2009-07-21 13:41 UTC (permalink / raw)
To: Alan Jenkins, Jens Axboe
Cc: Heiko Carstens, linux-kernel, Hans-Joachim Picht, pm list,
Arnd Schneider, linux-fsdevel
On Tuesday 21 July 2009, Alan Jenkins wrote:
> Heiko Carstens wrote:
> > On Fri, Jul 17, 2009 at 02:08:46PM +0100, Alan Jenkins wrote:
> >
> >> Rafael J. Wysocki wrote:
> >>
> >>> On Tuesday 14 July 2009, Heiko Carstens wrote:
> >>>
> >>>
> >>>> We've seen this bug [...]
> >>>>
> >>>> Looks like this was introduced with git commit a1bb7d61 "PM/hibernate: fix "swap
> >>>> breaks after hibernation failures"".
> >>>> Calling bdget while holding a spinlock doesn't seem to be a good idea...
> >>>>
> >>>>
> >>> Agreed, sorry for missing that.
> >>>
> >>> Alan, can you please prepare a fix?
> >>>
> >> Here's a quick & dirty patch [...]
> >>
> >
> > Thanks for the patch. Unfortunately Arnd was unable to reproduce the original
> > behaviour. But your patch makes sense anyway.
> > I also tested it and nothing broke. So should this go upstream?
>
> I do want to fix it, but I think there's a better way.
>
> It doesn't really need the sleeping bdget(). All we want is
> atomic_inc(&bdev->bd_inode->i_count). I think we should call it bdcopy().
Makes sense to me, but IMO that requres an ACK from Jens.
Jens, is the patch below fine with you?
Rafael
> ---------->
> From 643014ec079610a8b01dfd78c6949c1e8727195b Mon Sep 17 00:00:00 2001
> From: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> Date: Tue, 21 Jul 2009 10:17:30 +0100
> Subject: [PATCH] PM/hibernate: replace bdget call with bdcopy (simple atomic_inc of i_count)
>
> Create bdcopy(). This function copies an existing reference to a
> block_device. It is safe to call from any context.
>
> Hibernation code wishes to copy a reference to the active swap device.
> Right now it calls bdget() under a spinlock, but this is wrong because
> bdget() can sleep. It doesn't need a full bdget() because we already
> hold a reference to active swap devices (and the spinlock protects
> against swapoff).
>
> Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> CC: linux-fsdevel@vger.kernel.org
> ---
> fs/block_dev.c | 8 ++++++++
> include/linux/fs.h | 1 +
> mm/swapfile.c | 4 ++--
> 3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 3a6d4fb..0b04974 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -564,6 +564,14 @@ struct block_device *bdget(dev_t dev)
>
> EXPORT_SYMBOL(bdget);
>
> +struct block_device *bdcopy(struct block_device *bdev)
> +{
> + atomic_inc(&bdev->bd_inode->i_count);
> + return bdev;
> +}
> +
> +EXPORT_SYMBOL(bdcopy);
> +
> long nr_blockdev_pages(void)
> {
> struct block_device *bdev;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0872372..eeb1091 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1946,6 +1946,7 @@ extern void putname(const char *name);
> extern int register_blkdev(unsigned int, const char *);
> extern void unregister_blkdev(unsigned int, const char *);
> extern struct block_device *bdget(dev_t);
> +extern struct block_device *bdcopy(struct block_device *bdev);
> extern void bd_set_size(struct block_device *, loff_t size);
> extern void bd_forget(struct inode *inode);
> extern void bdput(struct block_device *);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d1ade1a..272ea8e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -753,7 +753,7 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
>
> if (!bdev) {
> if (bdev_p)
> - *bdev_p = bdget(sis->bdev->bd_dev);
> + *bdev_p = bdcopy(sis->bdev);
>
> spin_unlock(&swap_lock);
> return i;
> @@ -765,7 +765,7 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
> struct swap_extent, list);
> if (se->start_block == offset) {
> if (bdev_p)
> - *bdev_p = bdget(sis->bdev->bd_dev);
> + *bdev_p = bdcopy(sis->bdev);
>
> spin_unlock(&swap_lock);
> bdput(bdev);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PM/hibernate swapfile regression
2009-07-21 10:50 ` PM/hibernate swapfile regression Alan Jenkins
2009-07-21 13:41 ` Rafael J. Wysocki
@ 2009-07-21 20:57 ` Rafael J. Wysocki
2009-07-21 21:46 ` Johannes Weiner
2009-07-21 21:55 ` Christoph Hellwig
2 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2009-07-21 20:57 UTC (permalink / raw)
To: Alan Jenkins
Cc: Heiko Carstens, linux-kernel, Hans-Joachim Picht, pm list,
Arnd Schneider, linux-fsdevel
On Tuesday 21 July 2009, Alan Jenkins wrote:
> Heiko Carstens wrote:
> > On Fri, Jul 17, 2009 at 02:08:46PM +0100, Alan Jenkins wrote:
> >
> >> Rafael J. Wysocki wrote:
> >>
> >>> On Tuesday 14 July 2009, Heiko Carstens wrote:
> >>>
> >>>
> >>>> We've seen this bug [...]
> >>>>
> >>>> Looks like this was introduced with git commit a1bb7d61 "PM/hibernate: fix "swap
> >>>> breaks after hibernation failures"".
> >>>> Calling bdget while holding a spinlock doesn't seem to be a good idea...
> >>>>
> >>>>
> >>> Agreed, sorry for missing that.
> >>>
> >>> Alan, can you please prepare a fix?
> >>>
> >> Here's a quick & dirty patch [...]
> >>
> >
> > Thanks for the patch. Unfortunately Arnd was unable to reproduce the original
> > behaviour. But your patch makes sense anyway.
> > I also tested it and nothing broke. So should this go upstream?
>
> I do want to fix it, but I think there's a better way.
>
> It doesn't really need the sleeping bdget(). All we want is
> atomic_inc(&bdev->bd_inode->i_count). I think we should call it bdcopy().
>
> ---------->
> From 643014ec079610a8b01dfd78c6949c1e8727195b Mon Sep 17 00:00:00 2001
> From: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> Date: Tue, 21 Jul 2009 10:17:30 +0100
> Subject: [PATCH] PM/hibernate: replace bdget call with bdcopy (simple atomic_inc of i_count)
>
> Create bdcopy(). This function copies an existing reference to a
> block_device. It is safe to call from any context.
>
> Hibernation code wishes to copy a reference to the active swap device.
> Right now it calls bdget() under a spinlock, but this is wrong because
> bdget() can sleep. It doesn't need a full bdget() because we already
> hold a reference to active swap devices (and the spinlock protects
> against swapoff).
>
> Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> CC: linux-fsdevel@vger.kernel.org
> ---
> fs/block_dev.c | 8 ++++++++
> include/linux/fs.h | 1 +
> mm/swapfile.c | 4 ++--
> 3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 3a6d4fb..0b04974 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -564,6 +564,14 @@ struct block_device *bdget(dev_t dev)
>
> EXPORT_SYMBOL(bdget);
>
> +struct block_device *bdcopy(struct block_device *bdev)
> +{
> + atomic_inc(&bdev->bd_inode->i_count);
> + return bdev;
> +}
Hmm. If you defined bdcopy() as static inline directly in mm/swapfile.c,
the patch would be slightly simpler.
Best,
Rafael
> +
> +EXPORT_SYMBOL(bdcopy);
> +
> long nr_blockdev_pages(void)
> {
> struct block_device *bdev;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 0872372..eeb1091 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1946,6 +1946,7 @@ extern void putname(const char *name);
> extern int register_blkdev(unsigned int, const char *);
> extern void unregister_blkdev(unsigned int, const char *);
> extern struct block_device *bdget(dev_t);
> +extern struct block_device *bdcopy(struct block_device *bdev);
> extern void bd_set_size(struct block_device *, loff_t size);
> extern void bd_forget(struct inode *inode);
> extern void bdput(struct block_device *);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d1ade1a..272ea8e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -753,7 +753,7 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
>
> if (!bdev) {
> if (bdev_p)
> - *bdev_p = bdget(sis->bdev->bd_dev);
> + *bdev_p = bdcopy(sis->bdev);
>
> spin_unlock(&swap_lock);
> return i;
> @@ -765,7 +765,7 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
> struct swap_extent, list);
> if (se->start_block == offset) {
> if (bdev_p)
> - *bdev_p = bdget(sis->bdev->bd_dev);
> + *bdev_p = bdcopy(sis->bdev);
>
> spin_unlock(&swap_lock);
> bdput(bdev);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PM/hibernate swapfile regression
2009-07-21 20:57 ` Rafael J. Wysocki
@ 2009-07-21 21:46 ` Johannes Weiner
0 siblings, 0 replies; 7+ messages in thread
From: Johannes Weiner @ 2009-07-21 21:46 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Alan Jenkins, Heiko Carstens, linux-kernel, Hans-Joachim Picht,
pm list, Arnd Schneider, linux-fsdevel
On Tue, Jul 21, 2009 at 10:57:21PM +0200, Rafael J. Wysocki wrote:
> On Tuesday 21 July 2009, Alan Jenkins wrote:
> > From 643014ec079610a8b01dfd78c6949c1e8727195b Mon Sep 17 00:00:00 2001
> > From: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> > Date: Tue, 21 Jul 2009 10:17:30 +0100
> > Subject: [PATCH] PM/hibernate: replace bdget call with bdcopy (simple atomic_inc of i_count)
> >
> > Create bdcopy(). This function copies an existing reference to a
> > block_device. It is safe to call from any context.
> >
> > Hibernation code wishes to copy a reference to the active swap device.
> > Right now it calls bdget() under a spinlock, but this is wrong because
> > bdget() can sleep. It doesn't need a full bdget() because we already
> > hold a reference to active swap devices (and the spinlock protects
> > against swapoff).
> >
> > Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
> > CC: linux-fsdevel@vger.kernel.org
> > ---
> > fs/block_dev.c | 8 ++++++++
> > include/linux/fs.h | 1 +
> > mm/swapfile.c | 4 ++--
> > 3 files changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > index 3a6d4fb..0b04974 100644
> > --- a/fs/block_dev.c
> > +++ b/fs/block_dev.c
> > @@ -564,6 +564,14 @@ struct block_device *bdget(dev_t dev)
> >
> > EXPORT_SYMBOL(bdget);
> >
> > +struct block_device *bdcopy(struct block_device *bdev)
> > +{
> > + atomic_inc(&bdev->bd_inode->i_count);
> > + return bdev;
> > +}
>
> Hmm. If you defined bdcopy() as static inline directly in mm/swapfile.c,
> the patch would be slightly simpler.
Please don't do that.
This helper seems to be generic enough to live next to the rest of the
bd interface, don't you think? swapfile.c is a mess already...
Hannes
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PM/hibernate swapfile regression
2009-07-21 10:50 ` PM/hibernate swapfile regression Alan Jenkins
2009-07-21 13:41 ` Rafael J. Wysocki
2009-07-21 20:57 ` Rafael J. Wysocki
@ 2009-07-21 21:55 ` Christoph Hellwig
2009-07-25 21:58 ` Rafael J. Wysocki
2 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2009-07-21 21:55 UTC (permalink / raw)
To: Alan Jenkins
Cc: Rafael J. Wysocki, Heiko Carstens, linux-kernel,
Hans-Joachim Picht, pm list, Arnd Schneider, linux-fsdevel
On Tue, Jul 21, 2009 at 11:50:29AM +0100, Alan Jenkins wrote:
> +struct block_device *bdcopy(struct block_device *bdev)
> +{
> + atomic_inc(&bdev->bd_inode->i_count);
> + return bdev;
> +}
> +
> +EXPORT_SYMBOL(bdcopy);
The function name doesn't make any sense. You don't copy anything
here, but you grab a reference to it. A better name would be bdgrab,
mirroing the names of functions like igrab. A kerneldoc comment
documenting it would also be very helpful.
Why do you export it? The swapfile code is not actually modular.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PM/hibernate swapfile regression
2009-07-21 21:55 ` Christoph Hellwig
@ 2009-07-25 21:58 ` Rafael J. Wysocki
2009-07-27 5:12 ` Christoph Hellwig
0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2009-07-25 21:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Alan Jenkins, Heiko Carstens, linux-kernel, Hans-Joachim Picht,
pm list, Arnd Schneider, linux-fsdevel
On Tuesday 21 July 2009, Christoph Hellwig wrote:
> On Tue, Jul 21, 2009 at 11:50:29AM +0100, Alan Jenkins wrote:
> > +struct block_device *bdcopy(struct block_device *bdev)
> > +{
> > + atomic_inc(&bdev->bd_inode->i_count);
> > + return bdev;
> > +}
> > +
> > +EXPORT_SYMBOL(bdcopy);
>
> The function name doesn't make any sense. You don't copy anything
> here, but you grab a reference to it. A better name would be bdgrab,
> mirroing the names of functions like igrab. A kerneldoc comment
> documenting it would also be very helpful.
>
> Why do you export it? The swapfile code is not actually modular.
Thanks for the comments. Does the one below look better?
Rafael
---
From: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Subject: PM / Hibernate: Replace bdget call with simple atomic_inc of i_count
Create bdgrab(). This function copies an existing reference to a
block_device. It is safe to call from any context.
Hibernation code wishes to copy a reference to the active swap device.
Right now it calls bdget() under a spinlock, but this is wrong because
bdget() can sleep. It doesn't need a full bdget() because we already
hold a reference to active swap devices (and the spinlock protects
against swapoff).
Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
fs/block_dev.c | 10 ++++++++++
include/linux/fs.h | 1 +
mm/swapfile.c | 4 ++--
3 files changed, 13 insertions(+), 2 deletions(-)
Index: linux-2.6/fs/block_dev.c
===================================================================
--- linux-2.6.orig/fs/block_dev.c
+++ linux-2.6/fs/block_dev.c
@@ -564,6 +564,16 @@ struct block_device *bdget(dev_t dev)
EXPORT_SYMBOL(bdget);
+/**
+ * bdgrab -- Grab a reference to an already referenced block device
+ * @bdev: Block device to grab a reference to.
+ */
+struct block_device *bdgrab(struct block_device *bdev)
+{
+ atomic_inc(&bdev->bd_inode->i_count);
+ return bdev;
+}
+
long nr_blockdev_pages(void)
{
struct block_device *bdev;
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -1946,6 +1946,7 @@ extern void putname(const char *name);
extern int register_blkdev(unsigned int, const char *);
extern void unregister_blkdev(unsigned int, const char *);
extern struct block_device *bdget(dev_t);
+extern struct block_device *bdgrab(struct block_device *bdev);
extern void bd_set_size(struct block_device *, loff_t size);
extern void bd_forget(struct inode *inode);
extern void bdput(struct block_device *);
Index: linux-2.6/mm/swapfile.c
===================================================================
--- linux-2.6.orig/mm/swapfile.c
+++ linux-2.6/mm/swapfile.c
@@ -753,7 +753,7 @@ int swap_type_of(dev_t device, sector_t
if (!bdev) {
if (bdev_p)
- *bdev_p = bdget(sis->bdev->bd_dev);
+ *bdev_p = bdgrab(sis->bdev);
spin_unlock(&swap_lock);
return i;
@@ -765,7 +765,7 @@ int swap_type_of(dev_t device, sector_t
struct swap_extent, list);
if (se->start_block == offset) {
if (bdev_p)
- *bdev_p = bdget(sis->bdev->bd_dev);
+ *bdev_p = bdgrab(sis->bdev);
spin_unlock(&swap_lock);
bdput(bdev);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: PM/hibernate swapfile regression
2009-07-25 21:58 ` Rafael J. Wysocki
@ 2009-07-27 5:12 ` Christoph Hellwig
0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2009-07-27 5:12 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Christoph Hellwig, Alan Jenkins, Heiko Carstens, linux-kernel,
Hans-Joachim Picht, pm list, Arnd Schneider, linux-fsdevel
On Sat, Jul 25, 2009 at 11:58:25PM +0200, Rafael J. Wysocki wrote:
> Thanks for the comments. Does the one below look better?
Yes, that looks much better to me.
^ permalink raw reply [flat|nested] 7+ messages in thread