* [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
@ 2010-05-07 7:25 Nitin Gupta
2010-05-07 7:25 ` [PATCH 1/3] Add flag to identify block swap devices Nitin Gupta
` (4 more replies)
0 siblings, 5 replies; 25+ messages in thread
From: Nitin Gupta @ 2010-05-07 7:25 UTC (permalink / raw)
To: Greg KH
Cc: Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp, Minchan Kim,
Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen, Andrew Morton,
driverdev, linux-kernel
(tested on mainline but should apply to linux-next cleanly)
* Changelog: v2 vs initial patches
- directly add swap free callback to block_device_operations
instead of using 'notifiers' for various swap events.
ramzswap driver creates RAM based block devices which can be
used (only) as swap disks. Pages swapped to these disks are
compressed and stored in memory itself.
However, these devices do not get any notification when a swap
slot is freed (swap_map[i] reaches 0). So, we cannot free memory
allocated corresponding to this swap slot. Such stale data can
quickly accumulate in (compressed) memory defeating the whole
purpose of such devices.
To overcome this problem, we now add a callback in struct
block_device_operations which is called as soon as a swap
slot is freed.
Nitin Gupta (3):
Add flag to identify block swap devices
Add swap slot free callback to block_device_operations
ramzswap: Handler for swap slot free callback
drivers/staging/ramzswap/TODO | 5 -----
drivers/staging/ramzswap/ramzswap_drv.c | 22 +++++++++++++---------
include/linux/blkdev.h | 2 ++
include/linux/swap.h | 1 +
mm/swapfile.c | 5 +++++
5 files changed, 21 insertions(+), 14 deletions(-)
delete mode 100644 drivers/staging/ramzswap/TODO
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/3] Add flag to identify block swap devices
2010-05-07 7:25 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Nitin Gupta
@ 2010-05-07 7:25 ` Nitin Gupta
2010-05-07 8:03 ` jassi brar
2010-05-07 9:32 ` Nigel Cunningham
2010-05-07 7:25 ` [PATCH 2/3] Add swap slot free callback to block_device_operations Nitin Gupta
` (3 subsequent siblings)
4 siblings, 2 replies; 25+ messages in thread
From: Nitin Gupta @ 2010-05-07 7:25 UTC (permalink / raw)
To: Greg KH
Cc: Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp, Minchan Kim,
Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen, Andrew Morton,
driverdev, linux-kernel
Added SWP_BLKDEV flag to distinguish block and regular file backed
swap devices. We could also check if a swap is entire block device,
rather than a file, by:
S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
but, I think, simply checking this flag is more convenient.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
include/linux/swap.h | 1 +
mm/swapfile.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1f59d93..ec2b7a4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -146,6 +146,7 @@ enum {
SWP_DISCARDING = (1 << 3), /* now discarding a free cluster */
SWP_SOLIDSTATE = (1 << 4), /* blkdev seeks are cheap */
SWP_CONTINUED = (1 << 5), /* swap_map has count continuation */
+ SWP_BLKDEV = (1 << 6), /* its a block device */
/* add others here before... */
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6cd0a8f..ecb069e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1884,6 +1884,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
if (error < 0)
goto bad_swap;
p->bdev = bdev;
+ p->flags |= SWP_BLKDEV;
} else if (S_ISREG(inode->i_mode)) {
p->bdev = inode->i_sb->s_bdev;
mutex_lock(&inode->i_mutex);
--
1.6.6.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/3] Add swap slot free callback to block_device_operations
2010-05-07 7:25 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Nitin Gupta
2010-05-07 7:25 ` [PATCH 1/3] Add flag to identify block swap devices Nitin Gupta
@ 2010-05-07 7:25 ` Nitin Gupta
2010-05-07 9:22 ` Nigel Cunningham
2010-05-07 7:25 ` [PATCH 3/3] ramzswap: Handler for swap slot free callback Nitin Gupta
` (2 subsequent siblings)
4 siblings, 1 reply; 25+ messages in thread
From: Nitin Gupta @ 2010-05-07 7:25 UTC (permalink / raw)
To: Greg KH
Cc: Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp, Minchan Kim,
Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen, Andrew Morton,
driverdev, linux-kernel
This callback is required when RAM based devices are used as swap disks.
One such device is ramzswap which is used as compressed in-memory swap
disk. For such devices, we need a callback as soon as a swap slot is no
longer used to allow freeing memory allocated for this slot. Without this
callback, stale data can quickly accumulate in memory defeating the whole
purpose of such devices.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
include/linux/blkdev.h | 2 ++
mm/swapfile.c | 4 ++++
2 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6690e8b..413284a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1287,6 +1287,8 @@ struct block_device_operations {
unsigned long long);
int (*revalidate_disk) (struct gendisk *);
int (*getgeo)(struct block_device *, struct hd_geometry *);
+ /* this callback is with swap_lock and sometimes page table lock held */
+ void (*swap_slot_free_notify) (struct block_device *, unsigned long);
struct module *owner;
};
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ecb069e..f5ccc47 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -574,6 +574,7 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
/* free if no reference */
if (!usage) {
+ struct gendisk *disk = p->bdev->bd_disk;
if (offset < p->lowest_bit)
p->lowest_bit = offset;
if (offset > p->highest_bit)
@@ -583,6 +584,9 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
swap_list.next = p->type;
nr_swap_pages++;
p->inuse_pages--;
+ if ((p->flags & SWP_BLKDEV) &&
+ disk->fops->swap_slot_free_notify)
+ disk->fops->swap_slot_free_notify(p->bdev, offset);
}
return usage;
--
1.6.6.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/3] ramzswap: Handler for swap slot free callback
2010-05-07 7:25 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Nitin Gupta
2010-05-07 7:25 ` [PATCH 1/3] Add flag to identify block swap devices Nitin Gupta
2010-05-07 7:25 ` [PATCH 2/3] Add swap slot free callback to block_device_operations Nitin Gupta
@ 2010-05-07 7:25 ` Nitin Gupta
2010-05-07 7:44 ` [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Pekka Enberg
2010-05-07 19:55 ` Andrew Morton
4 siblings, 0 replies; 25+ messages in thread
From: Nitin Gupta @ 2010-05-07 7:25 UTC (permalink / raw)
To: Greg KH
Cc: Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp, Minchan Kim,
Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen, Andrew Morton,
driverdev, linux-kernel
Install handler for swap_slot_free_notify callback which is called
when a swap slot is no longer used. This handler immediately frees
memory allocated corresponding to the given swap slot.
Signed-off-by: Nitin Gupta
---
drivers/staging/ramzswap/TODO | 5 -----
drivers/staging/ramzswap/ramzswap_drv.c | 22 +++++++++++++---------
2 files changed, 13 insertions(+), 14 deletions(-)
delete mode 100644 drivers/staging/ramzswap/TODO
diff --git a/drivers/staging/ramzswap/TODO b/drivers/staging/ramzswap/TODO
deleted file mode 100644
index 8d64e28..0000000
--- a/drivers/staging/ramzswap/TODO
+++ /dev/null
@@ -1,5 +0,0 @@
-TODO:
- - Add support for swap notifiers
-
-Please send patches to Greg Kroah-Hartman <greg@kroah.com> and
-Nitin Gupta <ngupta@vflare.org>
diff --git a/drivers/staging/ramzswap/ramzswap_drv.c b/drivers/staging/ramzswap/ramzswap_drv.c
index ee5eb12..ab15276 100644
--- a/drivers/staging/ramzswap/ramzswap_drv.c
+++ b/drivers/staging/ramzswap/ramzswap_drv.c
@@ -795,14 +795,6 @@ static int ramzswap_write(struct ramzswap *rzs, struct bio *bio)
src = rzs->compress_buffer;
- /*
- * System swaps to same sector again when the stored page
- * is no longer referenced by any process. So, its now safe
- * to free the memory that was allocated for this page.
- */
- if (rzs->table[index].page || rzs_test_flag(rzs, index, RZS_ZERO))
- ramzswap_free_page(rzs, index);
-
mutex_lock(&rzs->lock);
user_mem = kmap_atomic(page, KM_USER0);
@@ -1295,9 +1287,21 @@ out:
return ret;
}
+void ramzswap_slot_free_notify(struct block_device *bdev, unsigned long index)
+{
+ struct ramzswap *rzs;
+
+ rzs = bdev->bd_disk->private_data;
+ ramzswap_free_page(rzs, index);
+ rzs_stat64_inc(rzs, &rzs->stats.notify_free);
+
+ return;
+}
+
static struct block_device_operations ramzswap_devops = {
.ioctl = ramzswap_ioctl,
- .owner = THIS_MODULE,
+ .swap_slot_free_notify = ramzswap_slot_free_notify,
+ .owner = THIS_MODULE
};
static int create_device(struct ramzswap *rzs, int device_id)
--
1.6.6.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-07 7:25 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Nitin Gupta
` (2 preceding siblings ...)
2010-05-07 7:25 ` [PATCH 3/3] ramzswap: Handler for swap slot free callback Nitin Gupta
@ 2010-05-07 7:44 ` Pekka Enberg
2010-05-07 14:51 ` Linus Torvalds
2010-05-07 19:55 ` Andrew Morton
4 siblings, 1 reply; 25+ messages in thread
From: Pekka Enberg @ 2010-05-07 7:44 UTC (permalink / raw)
To: Nitin Gupta
Cc: Greg KH, Linus Torvalds, Hugh Dickins, Cyp, Minchan Kim, Al Viro,
Christoph Hellwig, Jens Axboe, Andi Kleen, Andrew Morton,
driverdev, linux-kernel
Nitin Gupta kirjoitti:
> (tested on mainline but should apply to linux-next cleanly)
>
> * Changelog: v2 vs initial patches
> - directly add swap free callback to block_device_operations
> instead of using 'notifiers' for various swap events.
>
> ramzswap driver creates RAM based block devices which can be
> used (only) as swap disks. Pages swapped to these disks are
> compressed and stored in memory itself.
>
> However, these devices do not get any notification when a swap
> slot is freed (swap_map[i] reaches 0). So, we cannot free memory
> allocated corresponding to this swap slot. Such stale data can
> quickly accumulate in (compressed) memory defeating the whole
> purpose of such devices.
>
> To overcome this problem, we now add a callback in struct
> block_device_operations which is called as soon as a swap
> slot is freed.
>
> Nitin Gupta (3):
> Add flag to identify block swap devices
> Add swap slot free callback to block_device_operations
> ramzswap: Handler for swap slot free callback
>
> drivers/staging/ramzswap/TODO | 5 -----
> drivers/staging/ramzswap/ramzswap_drv.c | 22 +++++++++++++---------
> include/linux/blkdev.h | 2 ++
> include/linux/swap.h | 1 +
> mm/swapfile.c | 5 +++++
> 5 files changed, 21 insertions(+), 14 deletions(-)
> delete mode 100644 drivers/staging/ramzswap/TODO
The series looks good to me:
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/3] Add flag to identify block swap devices
2010-05-07 7:25 ` [PATCH 1/3] Add flag to identify block swap devices Nitin Gupta
@ 2010-05-07 8:03 ` jassi brar
2010-05-07 8:16 ` Nitin Gupta
2010-05-07 9:32 ` Nigel Cunningham
1 sibling, 1 reply; 25+ messages in thread
From: jassi brar @ 2010-05-07 8:03 UTC (permalink / raw)
To: Nitin Gupta
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, linux-kernel, Al Viro
On Fri, May 7, 2010 at 4:25 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> Added SWP_BLKDEV flag to distinguish block and regular file backed
> swap devices. We could also check if a swap is entire block device,
> rather than a file, by:
> S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
> but, I think, simply checking this flag is more convenient.
This might make it convenient for now but is likely to increase complexity and
redundancy. Why not define a macro/inline to figure that out?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/3] Add flag to identify block swap devices
2010-05-07 8:03 ` jassi brar
@ 2010-05-07 8:16 ` Nitin Gupta
2010-05-07 8:56 ` jassi brar
0 siblings, 1 reply; 25+ messages in thread
From: Nitin Gupta @ 2010-05-07 8:16 UTC (permalink / raw)
To: jassi brar
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, linux-kernel, Al Viro
On 05/07/2010 01:33 PM, jassi brar wrote:
> On Fri, May 7, 2010 at 4:25 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> Added SWP_BLKDEV flag to distinguish block and regular file backed
>> swap devices. We could also check if a swap is entire block device,
>> rather than a file, by:
>> S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
>> but, I think, simply checking this flag is more convenient.
> This might make it convenient for now but is likely to increase complexity and
> redundancy. Why not define a macro/inline to figure that out?
>
Accessing such long pointer chain is maybe not good thing to do for
every swap_entry_free() call? Simple checking a flag is perhaps slightly
faster? I also can't see how this flag can later increase complexity
compared to creating new macro for this check.
Thanks,
Nitin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/3] Add flag to identify block swap devices
2010-05-07 8:16 ` Nitin Gupta
@ 2010-05-07 8:56 ` jassi brar
2010-05-07 9:24 ` Pekka Enberg
0 siblings, 1 reply; 25+ messages in thread
From: jassi brar @ 2010-05-07 8:56 UTC (permalink / raw)
To: ngupta
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, linux-kernel, Al Viro
On Fri, May 7, 2010 at 5:16 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> On 05/07/2010 01:33 PM, jassi brar wrote:
>> On Fri, May 7, 2010 at 4:25 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>>> Added SWP_BLKDEV flag to distinguish block and regular file backed
>>> swap devices. We could also check if a swap is entire block device,
>>> rather than a file, by:
>>> S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
>>> but, I think, simply checking this flag is more convenient.
>> This might make it convenient for now but is likely to increase complexity and
>> redundancy. Why not define a macro/inline to figure that out?
>>
>
> Accessing such long pointer chain is maybe not good thing to do for
> every swap_entry_free() call? Simple checking a flag is perhaps slightly
> faster? I also can't see how this flag can later increase complexity
> compared to creating new macro for this check.
I am not sure about effects on the speed, that needs to be seen.
But here are points against using a new flag....
a) Defining a new flag hogs up precious real-estate of remaining just 2 bits
(apparently the new flags are to go before SWP_SCANNING)
b) Over time, some dependent code might evolve to depend upon this flag, while
others might use the indirection to deduct if it is block device
or not. That
will make it complicated to make any relevant change in future as we'll have
to keep track of more than one way to check the status. Creating a new
macro doesn't create a new path to reach the status, but a FLAG does.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] Add swap slot free callback to block_device_operations
2010-05-07 7:25 ` [PATCH 2/3] Add swap slot free callback to block_device_operations Nitin Gupta
@ 2010-05-07 9:22 ` Nigel Cunningham
2010-05-07 9:48 ` Nitin Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Nigel Cunningham @ 2010-05-07 9:22 UTC (permalink / raw)
To: Nitin Gupta
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
Andrew Morton, driverdev, linux-kernel
Hi.
On 07/05/10 17:25, Nitin Gupta wrote:
> This callback is required when RAM based devices are used as swap disks.
> One such device is ramzswap which is used as compressed in-memory swap
> disk. For such devices, we need a callback as soon as a swap slot is no
> longer used to allow freeing memory allocated for this slot. Without this
> callback, stale data can quickly accumulate in memory defeating the whole
> purpose of such devices.
>
> Signed-off-by: Nitin Gupta<ngupta@vflare.org>
> ---
> include/linux/blkdev.h | 2 ++
> mm/swapfile.c | 4 ++++
> 2 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 6690e8b..413284a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1287,6 +1287,8 @@ struct block_device_operations {
> unsigned long long);
> int (*revalidate_disk) (struct gendisk *);
> int (*getgeo)(struct block_device *, struct hd_geometry *);
> + /* this callback is with swap_lock and sometimes page table lock held */
> + void (*swap_slot_free_notify) (struct block_device *, unsigned long);
> struct module *owner;
> };
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index ecb069e..f5ccc47 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -574,6 +574,7 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
>
> /* free if no reference */
> if (!usage) {
> + struct gendisk *disk = p->bdev->bd_disk;
> if (offset< p->lowest_bit)
> p->lowest_bit = offset;
> if (offset> p->highest_bit)
> @@ -583,6 +584,9 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
> swap_list.next = p->type;
> nr_swap_pages++;
> p->inuse_pages--;
> + if ((p->flags& SWP_BLKDEV)&&
> + disk->fops->swap_slot_free_notify)
> + disk->fops->swap_slot_free_notify(p->bdev, offset);
Is this p->flags & SWP_BLKDEV logic reversed? (Don't you want the
notifier called for devices that aren't backed by a block device?)
I also wonder whether leaving the p->flags & SWP_BLKDEV part out might
be a good idea. Other potential notifier users?
Regards,
Nigel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/3] Add flag to identify block swap devices
2010-05-07 8:56 ` jassi brar
@ 2010-05-07 9:24 ` Pekka Enberg
0 siblings, 0 replies; 25+ messages in thread
From: Pekka Enberg @ 2010-05-07 9:24 UTC (permalink / raw)
To: jassi brar
Cc: ngupta, Greg KH, Linus Torvalds, Hugh Dickins, Cyp, Minchan Kim,
linux-kernel, Al Viro
On Fri, May 7, 2010 at 5:16 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> On 05/07/2010 01:33 PM, jassi brar wrote:
>>> On Fri, May 7, 2010 at 4:25 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>>>> Added SWP_BLKDEV flag to distinguish block and regular file backed
>>>> swap devices. We could also check if a swap is entire block device,
>>>> rather than a file, by:
>>>> S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
>>>> but, I think, simply checking this flag is more convenient.
>>> This might make it convenient for now but is likely to increase complexity and
>>> redundancy. Why not define a macro/inline to figure that out?
>>>
>>
>> Accessing such long pointer chain is maybe not good thing to do for
>> every swap_entry_free() call? Simple checking a flag is perhaps slightly
>> faster? I also can't see how this flag can later increase complexity
>> compared to creating new macro for this check.
>
> I am not sure about effects on the speed, that needs to be seen.
> But here are points against using a new flag....
On Fri, May 7, 2010 at 11:56 AM, jassi brar <jassisinghbrar@gmail.com> wrote:
> a) Defining a new flag hogs up precious real-estate of remaining just 2 bits
> (apparently the new flags are to go before SWP_SCANNING)
>
> b) Over time, some dependent code might evolve to depend upon this flag, while
> others might use the indirection to deduct if it is block device
> or not. That
> will make it complicated to make any relevant change in future as we'll have
> to keep track of more than one way to check the status. Creating a new
> macro doesn't create a new path to reach the status, but a FLAG does.
I think the flag is cleaner and I'm not convinced by the above
speculative arguments that we should switch to a static inline
function.
Hugh, Linus, would you mind voicing your opinion which way to go here?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/3] Add flag to identify block swap devices
2010-05-07 7:25 ` [PATCH 1/3] Add flag to identify block swap devices Nitin Gupta
2010-05-07 8:03 ` jassi brar
@ 2010-05-07 9:32 ` Nigel Cunningham
1 sibling, 0 replies; 25+ messages in thread
From: Nigel Cunningham @ 2010-05-07 9:32 UTC (permalink / raw)
To: Nitin Gupta
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
Andrew Morton, driverdev, linux-kernel
Hi again.
On 07/05/10 17:25, Nitin Gupta wrote:
> Added SWP_BLKDEV flag to distinguish block and regular file backed
> swap devices. We could also check if a swap is entire block device,
> rather than a file, by:
> S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
> but, I think, simply checking this flag is more convenient.
>
> Signed-off-by: Nitin Gupta<ngupta@vflare.org>
> ---
> include/linux/swap.h | 1 +
> mm/swapfile.c | 1 +
> 2 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 1f59d93..ec2b7a4 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -146,6 +146,7 @@ enum {
> SWP_DISCARDING = (1<< 3), /* now discarding a free cluster */
> SWP_SOLIDSTATE = (1<< 4), /* blkdev seeks are cheap */
> SWP_CONTINUED = (1<< 5), /* swap_map has count continuation */
> + SWP_BLKDEV = (1<< 6), /* its a block device */
> /* add others here before... */
> SWP_SCANNING = (1<< 8), /* refcount in scan_swap_map */
> };
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 6cd0a8f..ecb069e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1884,6 +1884,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> if (error< 0)
> goto bad_swap;
> p->bdev = bdev;
> + p->flags |= SWP_BLKDEV;
> } else if (S_ISREG(inode->i_mode)) {
> p->bdev = inode->i_sb->s_bdev;
> mutex_lock(&inode->i_mutex);
The more I read your patches, the more I think either I'm seriously
confused (entirely possible!) or you are.
Don't you want to distinguish RAM backed swap from swap that's either a
partition or a file? If that's the case, you should also be setting
SWP_BLKDEV in the S_ISREG part that follows iff the p->bdev is a regular
file.
Regards,
Nigel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] Add swap slot free callback to block_device_operations
2010-05-07 9:22 ` Nigel Cunningham
@ 2010-05-07 9:48 ` Nitin Gupta
2010-05-07 10:40 ` Nigel Cunningham
0 siblings, 1 reply; 25+ messages in thread
From: Nitin Gupta @ 2010-05-07 9:48 UTC (permalink / raw)
To: nigel
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
Andrew Morton, driverdev, linux-kernel
On 05/07/2010 02:52 PM, Nigel Cunningham wrote:
> Hi.
>
> On 07/05/10 17:25, Nitin Gupta wrote:
>> This callback is required when RAM based devices are used as swap disks.
>> One such device is ramzswap which is used as compressed in-memory swap
>> disk. For such devices, we need a callback as soon as a swap slot is no
>> longer used to allow freeing memory allocated for this slot. Without
>> this
>> callback, stale data can quickly accumulate in memory defeating the whole
>> purpose of such devices.
>>
>> Signed-off-by: Nitin Gupta<ngupta@vflare.org>
>> ---
>> include/linux/blkdev.h | 2 ++
>> mm/swapfile.c | 4 ++++
>> 2 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 6690e8b..413284a 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -1287,6 +1287,8 @@ struct block_device_operations {
>> unsigned long long);
>> int (*revalidate_disk) (struct gendisk *);
>> int (*getgeo)(struct block_device *, struct hd_geometry *);
>> + /* this callback is with swap_lock and sometimes page table lock
>> held */
>> + void (*swap_slot_free_notify) (struct block_device *, unsigned
>> long);
>> struct module *owner;
>> };
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index ecb069e..f5ccc47 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -574,6 +574,7 @@ static unsigned char swap_entry_free(struct
>> swap_info_struct *p,
>>
>> /* free if no reference */
>> if (!usage) {
>> + struct gendisk *disk = p->bdev->bd_disk;
>> if (offset< p->lowest_bit)
>> p->lowest_bit = offset;
>> if (offset> p->highest_bit)
>> @@ -583,6 +584,9 @@ static unsigned char swap_entry_free(struct
>> swap_info_struct *p,
>> swap_list.next = p->type;
>> nr_swap_pages++;
>> p->inuse_pages--;
>> + if ((p->flags& SWP_BLKDEV)&&
>> + disk->fops->swap_slot_free_notify)
>> + disk->fops->swap_slot_free_notify(p->bdev, offset);
>
> Is this p->flags & SWP_BLKDEV logic reversed? (Don't you want the
> notifier called for devices that aren't backed by a block device?)
>
No, the logic here is correct: ramzswap is a block device for which
we want this callback. Though its a RAM backed, it is still a block
device.
(I hope it answers your question in the other mail also).
> I also wonder whether leaving the p->flags & SWP_BLKDEV part out might
> be a good idea. Other potential notifier users?
>
For regular files, 'offset' used here makes little sense. For block devices,
its simply offset in real device. Also, I doubt if *files* would ever
like to have such a callback.
Thanks,
Nitin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] Add swap slot free callback to block_device_operations
2010-05-07 9:48 ` Nitin Gupta
@ 2010-05-07 10:40 ` Nigel Cunningham
0 siblings, 0 replies; 25+ messages in thread
From: Nigel Cunningham @ 2010-05-07 10:40 UTC (permalink / raw)
To: ngupta
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
Andrew Morton, driverdev, linux-kernel
Hi.
On 07/05/10 19:48, Nitin Gupta wrote:
>>> + if ((p->flags& SWP_BLKDEV)&&
>>> + disk->fops->swap_slot_free_notify)
>>> + disk->fops->swap_slot_free_notify(p->bdev, offset);
>>
>> Is this p->flags& SWP_BLKDEV logic reversed? (Don't you want the
>> notifier called for devices that aren't backed by a block device?)
>>
>
> No, the logic here is correct: ramzswap is a block device for which
> we want this callback. Though its a RAM backed, it is still a block
> device.
>
> (I hope it answers your question in the other mail also).
It does; thanks.
>> I also wonder whether leaving the p->flags& SWP_BLKDEV part out might
>> be a good idea. Other potential notifier users?
>>
>
> For regular files, 'offset' used here makes little sense. For block devices,
> its simply offset in real device. Also, I doubt if *files* would ever
> like to have such a callback.
Okay.
Entire series Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Thanks.
Nigel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-07 7:44 ` [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Pekka Enberg
@ 2010-05-07 14:51 ` Linus Torvalds
0 siblings, 0 replies; 25+ messages in thread
From: Linus Torvalds @ 2010-05-07 14:51 UTC (permalink / raw)
To: Pekka Enberg
Cc: Nitin Gupta, Greg KH, Hugh Dickins, Cyp, Minchan Kim, Al Viro,
Christoph Hellwig, Jens Axboe, Andi Kleen, Andrew Morton,
driverdev, linux-kernel
On Fri, 7 May 2010, Pekka Enberg wrote:
>
> The series looks good to me:
>
> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Yeah, I think I'm finally ok with it too.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Thanks,
Linus
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-07 7:25 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Nitin Gupta
` (3 preceding siblings ...)
2010-05-07 7:44 ` [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Pekka Enberg
@ 2010-05-07 19:55 ` Andrew Morton
2010-05-08 4:05 ` Nitin Gupta
2010-05-08 6:29 ` Pekka Enberg
4 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2010-05-07 19:55 UTC (permalink / raw)
To: Nitin Gupta
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
On Fri, 7 May 2010 12:55:04 +0530
Nitin Gupta <ngupta@vflare.org> wrote:
> (tested on mainline but should apply to linux-next cleanly)
>
> * Changelog: v2 vs initial patches
> - directly add swap free callback to block_device_operations
> instead of using 'notifiers' for various swap events.
>
> ramzswap driver creates RAM based block devices which can be
> used (only) as swap disks. Pages swapped to these disks are
> compressed and stored in memory itself.
>
> However, these devices do not get any notification when a swap
> slot is freed (swap_map[i] reaches 0). So, we cannot free memory
> allocated corresponding to this swap slot. Such stale data can
> quickly accumulate in (compressed) memory defeating the whole
> purpose of such devices.
>
> To overcome this problem, we now add a callback in struct
> block_device_operations which is called as soon as a swap
> slot is freed.
>
> Nitin Gupta (3):
> Add flag to identify block swap devices
> Add swap slot free callback to block_device_operations
> ramzswap: Handler for swap slot free callback
>
> drivers/staging/ramzswap/TODO | 5 -----
> drivers/staging/ramzswap/ramzswap_drv.c | 22 +++++++++++++---------
> include/linux/blkdev.h | 2 ++
> include/linux/swap.h | 1 +
> mm/swapfile.c | 5 +++++
I didn't even realise that ramzswap had snuck in via the staging
backdoor.
<hasty review>
Looking at the changelogs I'm seeing no information about the
effectiveness of ramzswap - how much memory it saves. As that's the
entire point of the driver, that would be a rather important thing to
have included in the commit comments. We cannot make the decision to
merge ramzswap without this info.
The various lengthy pr_info() messages could do with some attention
from a native English speaker (sorry).
setup_swap_header() should use kmap_atomic().
The driver appears to be controlled by some nasty-looking ioctl against
some fd. None of it is documented anywhere. It should be. You're
proposing here a permanent extension to the kernel ABI which we will
need to maintain for ever. That's a big deal and it is the very first
thing reviewers will look at, before even considering the code.
The compat implications of the ioctl args will need to be examined.
RZSIO_GET_STATS looks to be hopeless from a long-term maintainability
POV. It's debug code and it would be better to move it into a debugfs
file, where we can then add and remove things at will.
The code would benefit from a lot more comments. For example, take
add_backing_swap_extent(). What is its role in the overall operation
of the driver? What are its locking preconditions? The meaning of its
return value? Generally, aim to be telling the reader the things whcih
he needs to know and which aren't obvious from the code itself.
add_backing_swap_extent() does an alloc_page(__GFP_ZERO). That means
it's GFP_ATOMIC and not __GFP_HIGH. But afacit it could have used the
much stronger GFP_KERNEL. Was the lakc of GFP_KERNEL deliberate?
There's no way for me to tell due to the lack of comments. If it _was_
deliberate then there should be a comment telling me why. If it wasn't
deliberate then, well, bug.
ramzswap_drv.h could use the __aligned() macro rather than that
__attribute__(()) mouthful.
The comment says that ramzswap_backing_extent "must fit exactly in a
page", but doesn't tell us why. It seems an odd requirement.
I wasn't able to determine the locking for
ramzswap.backing_swap_extent_list. Is there any? The definition site
would be an appropriate place to document this.
The CONFIG_RAMZSWAP_STATS=n definitions of rzs_stat_inc() and friends
should be inlined C functions, not macros. This can help to prevent
unused-variable warnings, it looks nicer and provides better
typechecking.
How does this driver actually *work*? I'm not seeing a description in
the code or the commit logs. setup_backing_swap_extents() appears to
be setting up some sort of in-memory layout on an existing regular file
in response to an ioctl? What are the requirements on that file?
Perahps its size must be greater-than-or-equal-to some argument which
was passed to that ioctl? Dunno. Can that file be sparse? Judging
from the use of bmap() I'd say "no".
The driver appears to create a /dev node called "ramzswap". If so, I
should be able to run mke2fs on it and cheerily use it as a regular
filesystem, should I not? If so then the entire driver is not
swap-specific at all and there should be no mention of "swap" anywhere!
ramzblock would be more appropriate.
Whoa. We appear to pass a filename into the driver via the ioctl and
then the driver does an internal filp_open(). Seems odd - it would be
more idiomatic to have userspace open the file and pass in the fd.
This sort of fundamental ABI design issue would be much better
discussed within the context of an English-language design description,
rather than by grovelling through undocumented C code.
The driver is overloading page->mapping for its own use. Beware that
many parts of the VM heavily overload existing page fields in many
weird and wonderful ways, because the pageframe is so space-critical.
Having a driver also overloading page fields needs to be considered in
the context of core kernel's activities, current and potentially in the
future.
page_address() returns a void*. Hence it is unnecessary and
undesirable that its return be typecast when assigned to another
pointer.
The driver only supports a single block device, kernel-wide?
ramzswap.table is a vmalloced array of `struct table' objects. The
code looks up the address on one `struct table' via rzs->table[pagenum]
and then passes the address of that struct into vmalloc_to_page(), thus
extracting a pointer to the pageframe which repsresents the page which
contains the indexed `struct table'. This is weird. Why should the
driver care about the outer `struct page' which happens to comtain the
indexed `struct table'? Oh. To fiddle with that page's ->mapping.
Which is presumably shared between all the `struct table's which are
contained within that page. Really, I shouldn't have to
reverse-engineer all of this to understand it. And if I don't
understand it, I cannot really review it, nor suggest alternatives.
map_backing_swap_page() contains c99-style local definitions right in
the middle of the code. That's a thing which we've been avoiding in
the kernel. The compiler should have warned - perhaps someone broke
the makefiles.
I've completely forgotten why we need this xvmalloc thing and I don't
recall whether we decided it would be a good thing to have as a generic
facility and of course it's all unexplained and undocumented. I won't
be looking at it today, for this reason.
handle_zero_page() should use clear_page(). Actually zero_user() will fit.
handle_uncompressed_page() can probably use copy_user_highpage().
Local variable `pagenum' in handle_ramzswap_fault() is unused.
Lack of comments over ramzswap_read() is irritating.
ramzswap_read(): if /* should NEVER happen */ _does_ happen then we
should return an error.
Am unable to determine what ramzswap.lock locks.
This decision:
if (rzs->backing_swap)
ramzswap_set_memlimit(rzs, totalram_pages << PAGE_SHIFT);
else
ramzswap_set_disksize(rzs, totalram_pages << PAGE_SHIFT);
looks pretty important. But I don't know what it's all doing.
Magic constants go in magic.h. It is unusual to use a string for a
magic constant. If it _has_ to be a string then don't put it in magic.h.
I'll give up there.
The overall idea and utility appear to be good and desirable, IMO. But
the code isn't productively reviewable in this state.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-07 19:55 ` Andrew Morton
@ 2010-05-08 4:05 ` Nitin Gupta
2010-05-08 6:29 ` Pekka Enberg
1 sibling, 0 replies; 25+ messages in thread
From: Nitin Gupta @ 2010-05-08 4:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Greg KH, Linus Torvalds, Pekka Enberg, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
On Sat, May 8, 2010 at 1:25 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 7 May 2010 12:55:04 +0530
> Nitin Gupta <ngupta@vflare.org> wrote:
<snip>
>
> <hasty review>
>
> Looking at the changelogs I'm seeing no information about the
> effectiveness of ramzswap - how much memory it saves. As that's the
> entire point of the driver, that would be a rather important thing to
> have included in the commit comments. We cannot make the decision to
> merge ramzswap without this info.
>
Documentation (drivers/staging/ramzswap/ramzswap.txt) points to the project
home page which has lots of performance numbers -- both positive and
negative cases.
Lot of your points are regarding lack of documentation and code comments.
Instead of replying to all such points individually, let me first send
patches with all the code commentary.
> The driver appears to be controlled by some nasty-looking ioctl against
> some fd. None of it is documented anywhere. It should be. You're
> proposing here a permanent extension to the kernel ABI which we will
> need to maintain for ever. That's a big deal and it is the very first
> thing reviewers will look at, before even considering the code.
>
ramzswap.txt points to 'rzscontrol' manpage where the effect of each of
these ioctls is documented:
http://compcache.googlecode.com/hg/sub-projects/rzscontrol/man/rzscontrol.html
I will add appropriate comments in code too.
> The compat implications of the ioctl args will need to be examined.
>
Ok, I will look into this.
> RZSIO_GET_STATS looks to be hopeless from a long-term maintainability
> POV. It's debug code and it would be better to move it into a debugfs
> file, where we can then add and remove things at will.
>
RZSIO_GET_STATS is how we send stats to userspace. Its not some debug code.
> The driver appears to create a /dev node called "ramzswap". If so, I
> should be able to run mke2fs on it and cheerily use it as a regular
> filesystem, should I not? If so then the entire driver is not
> swap-specific at all and there should be no mention of "swap" anywhere!
> ramzblock would be more appropriate.
>
You cannot use /dev/ramzswap{0,1,2...} devices for anything other than
swap devices since they can only handle page-aligned I/O requests.
This restriction highly simplifies the code as handling compression/
decompression for sub-page requests is hard and wasteful.
If someone needs a generic in-memory compressed block device, they
are welcome to extend ramzswap (or create a new driver).
>
> I've completely forgotten why we need this xvmalloc thing and I don't
> recall whether we decided it would be a good thing to have as a generic
> facility and of course it's all unexplained and undocumented. I won't
> be looking at it today, for this reason.
>
Justification for xvmalloc is present in initial commit message (644bf7)
>
> I'll give up there.
>
> The overall idea and utility appear to be good and desirable, IMO. But
> the code isn't productively reviewable in this state.
>
I will soon send patches for adding all these code comments. I hope that
will make code more reviewable.
In the meantime, would it be possible to commit these swap free notify
patches?
Thanks for your detailed comments.
Nitin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-07 19:55 ` Andrew Morton
2010-05-08 4:05 ` Nitin Gupta
@ 2010-05-08 6:29 ` Pekka Enberg
2010-05-08 6:54 ` Nitin Gupta
2010-05-08 6:57 ` Nitin Gupta
1 sibling, 2 replies; 25+ messages in thread
From: Pekka Enberg @ 2010-05-08 6:29 UTC (permalink / raw)
To: Andrew Morton
Cc: Nitin Gupta, Greg KH, Linus Torvalds, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
Hi Andrew,
Andrew Morton wrote:
> Looking at the changelogs I'm seeing no information about the
> effectiveness of ramzswap - how much memory it saves. As that's the
> entire point of the driver, that would be a rather important thing to
> have included in the commit comments. We cannot make the decision to
> merge ramzswap without this info.
There's some benchmarks at ramzswap pages:
http://code.google.com/p/compcache/wiki/Performance
http://code.google.com/p/compcache/wiki/SwapDiskVsRamz
[ snip bunch of comments from Andrew that need to be addressed,
hopefully we'll get some help from the staging people ]
> The driver appears to be controlled by some nasty-looking ioctl against
> some fd. None of it is documented anywhere. It should be. You're
> proposing here a permanent extension to the kernel ABI which we will
> need to maintain for ever. That's a big deal and it is the very first
> thing reviewers will look at, before even considering the code.
I thought we got rid of it? Nitin?
> RZSIO_GET_STATS looks to be hopeless from a long-term maintainability
> POV. It's debug code and it would be better to move it into a debugfs
> file, where we can then add and remove things at will.
Yup.
> I've completely forgotten why we need this xvmalloc thing and I don't
> recall whether we decided it would be a good thing to have as a generic
> facility and of course it's all unexplained and undocumented. I won't
> be looking at it today, for this reason.
We need it because the slab allocator is not a good fit for this special
purpose driver due to fragmentation. Nitin, you had a nice web page
showing all the relevant numbers but I can't find it anymore.
Andrew, FWIW, I'm ok with xvmalloc() for this particular driver. There
was some discussion on making it more generic but I don't see it as a
merge-stopper for the driver.
> The overall idea and utility appear to be good and desirable, IMO. But
> the code isn't productively reviewable in this state.
I agree that the whole graduation step from staging to kernel proper is
not well-defined. Any suggestions? That said, I hope that doesn't stop
us from merging this patch series because the lack of notifiers cripples
the current ramzswap performance.
Pekka
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-08 6:29 ` Pekka Enberg
@ 2010-05-08 6:54 ` Nitin Gupta
2010-05-08 7:05 ` Pekka Enberg
2010-05-08 6:57 ` Nitin Gupta
1 sibling, 1 reply; 25+ messages in thread
From: Nitin Gupta @ 2010-05-08 6:54 UTC (permalink / raw)
To: Pekka Enberg
Cc: Andrew Morton, Greg KH, Linus Torvalds, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
Hi,
On 05/08/2010 11:59 AM, Pekka Enberg wrote:
>
> Andrew Morton wrote:
>> The driver appears to be controlled by some nasty-looking ioctl against
>> some fd. None of it is documented anywhere. It should be. You're
>> proposing here a permanent extension to the kernel ABI which we will
>> need to maintain for ever. That's a big deal and it is the very first
>> thing reviewers will look at, before even considering the code.
>
> I thought we got rid of it? Nitin?
>
No, we didn't get rid of any ioctls. In fact, issuing ioctls is how we
configure various ramzswap devices (using rzscontrol utility). The
functionality of all these ioctls is well documented in rzscontrol
manpage. Now, I will also add relevant comments in the code.
>> RZSIO_GET_STATS looks to be hopeless from a long-term maintainability
>> POV. It's debug code and it would be better to move it into a debugfs
>> file, where we can then add and remove things at will.
>
> Yup.
>
Its not debug code. This ioctl is how we send stats about given ramzswap
device to userspace: rzscontrol /dev/ramzswap --stats, invokes this ioctl
and prints the stats. I will add comments to this part to make it clear.
Thanks,
Nitin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-08 6:29 ` Pekka Enberg
2010-05-08 6:54 ` Nitin Gupta
@ 2010-05-08 6:57 ` Nitin Gupta
2010-05-08 7:26 ` Pekka Enberg
1 sibling, 1 reply; 25+ messages in thread
From: Nitin Gupta @ 2010-05-08 6:57 UTC (permalink / raw)
To: Pekka Enberg
Cc: Andrew Morton, Greg KH, Linus Torvalds, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
Oops, missed this part:
On 05/08/2010 11:59 AM, Pekka Enberg wrote:
> Andrew Morton wrote:
>> I've completely forgotten why we need this xvmalloc thing and I don't
>> recall whether we decided it would be a good thing to have as a generic
>> facility and of course it's all unexplained and undocumented. I won't
>> be looking at it today, for this reason.
>
> We need it because the slab allocator is not a good fit for this special
> purpose driver due to fragmentation. Nitin, you had a nice web page
> showing all the relevant numbers but I can't find it anymore.
>
xvmalloc performance numbers:
http://code.google.com/p/compcache/wiki/xvMalloc
http://code.google.com/p/compcache/wiki/xvMallocPerformance
Thanks,
Nitin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-08 6:54 ` Nitin Gupta
@ 2010-05-08 7:05 ` Pekka Enberg
0 siblings, 0 replies; 25+ messages in thread
From: Pekka Enberg @ 2010-05-08 7:05 UTC (permalink / raw)
To: ngupta
Cc: Andrew Morton, Greg KH, Linus Torvalds, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
Nitin Gupta wrote:
> Its not debug code. This ioctl is how we send stats about given ramzswap
> device to userspace: rzscontrol /dev/ramzswap --stats, invokes this ioctl
> and prints the stats. I will add comments to this part to make it clear.
Another option would be sysfs if you want the stats to always be available.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-08 6:57 ` Nitin Gupta
@ 2010-05-08 7:26 ` Pekka Enberg
2010-05-08 7:32 ` Nitin Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Pekka Enberg @ 2010-05-08 7:26 UTC (permalink / raw)
To: ngupta
Cc: Andrew Morton, Greg KH, Linus Torvalds, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
Nitin Gupta wrote:
> Oops, missed this part:
>
> On 05/08/2010 11:59 AM, Pekka Enberg wrote:
>> Andrew Morton wrote:
>>> I've completely forgotten why we need this xvmalloc thing and I don't
>>> recall whether we decided it would be a good thing to have as a generic
>>> facility and of course it's all unexplained and undocumented. I won't
>>> be looking at it today, for this reason.
>> We need it because the slab allocator is not a good fit for this special
>> purpose driver due to fragmentation. Nitin, you had a nice web page
>> showing all the relevant numbers but I can't find it anymore.
>>
>
> xvmalloc performance numbers:
> http://code.google.com/p/compcache/wiki/xvMalloc
> http://code.google.com/p/compcache/wiki/xvMallocPerformance
I don't see the xvmalloc vs. kmalloc fragmentation numbers there. I
thought you had some?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2)
2010-05-08 7:26 ` Pekka Enberg
@ 2010-05-08 7:32 ` Nitin Gupta
0 siblings, 0 replies; 25+ messages in thread
From: Nitin Gupta @ 2010-05-08 7:32 UTC (permalink / raw)
To: Pekka Enberg
Cc: Andrew Morton, Greg KH, Linus Torvalds, Hugh Dickins, Cyp,
Minchan Kim, Al Viro, Christoph Hellwig, Jens Axboe, Andi Kleen,
driverdev, linux-kernel
On 05/08/2010 12:56 PM, Pekka Enberg wrote:
> Nitin Gupta wrote:
>> Oops, missed this part:
>>
>> On 05/08/2010 11:59 AM, Pekka Enberg wrote:
>>> Andrew Morton wrote:
>>>> I've completely forgotten why we need this xvmalloc thing and I don't
>>>> recall whether we decided it would be a good thing to have as a generic
>>>> facility and of course it's all unexplained and undocumented. I won't
>>>> be looking at it today, for this reason.
>>> We need it because the slab allocator is not a good fit for this special
>>> purpose driver due to fragmentation. Nitin, you had a nice web page
>>> showing all the relevant numbers but I can't find it anymore.
>>>
>>
>> xvmalloc performance numbers:
>> http://code.google.com/p/compcache/wiki/xvMalloc
>> http://code.google.com/p/compcache/wiki/xvMallocPerformance
>
> I don't see the xvmalloc vs. kmalloc fragmentation numbers there. I
> thought you had some?
>
TLSF vs kmalloc (SLUB):
http://code.google.com/p/compcache/wiki/AllocatorsComparison
By design, xvmalloc is very similar to TLSF. In fact, xvmalloc has less
metadata overhead than TLSF. So, we will surely get similar results for
xvmalloc vs kmalloc comparison too.
Thanks,
Nitin
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/3] Add swap slot free callback to block_device_operations
2010-05-17 5:32 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2 resend) Nitin Gupta
@ 2010-05-17 5:32 ` Nitin Gupta
2010-05-17 12:01 ` Minchan Kim
0 siblings, 1 reply; 25+ messages in thread
From: Nitin Gupta @ 2010-05-17 5:32 UTC (permalink / raw)
To: Greg KH
Cc: Pekka Enberg, Linus Torvalds, Nigel Cunningham, Andrew Morton,
Minchan Kim, Hugh Dickins, Cyp, driverdev, linux-kernel
This callback is required when RAM based devices are used as swap disks.
One such device is ramzswap which is used as compressed in-memory swap
disk. For such devices, we need a callback as soon as a swap slot is no
longer used to allow freeing memory allocated for this slot. Without this
callback, stale data can quickly accumulate in memory defeating the whole
purpose of such devices.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
include/linux/blkdev.h | 2 ++
mm/swapfile.c | 4 ++++
2 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6690e8b..413284a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1287,6 +1287,8 @@ struct block_device_operations {
unsigned long long);
int (*revalidate_disk) (struct gendisk *);
int (*getgeo)(struct block_device *, struct hd_geometry *);
+ /* this callback is with swap_lock and sometimes page table lock held */
+ void (*swap_slot_free_notify) (struct block_device *, unsigned long);
struct module *owner;
};
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ecb069e..f5ccc47 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -574,6 +574,7 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
/* free if no reference */
if (!usage) {
+ struct gendisk *disk = p->bdev->bd_disk;
if (offset < p->lowest_bit)
p->lowest_bit = offset;
if (offset > p->highest_bit)
@@ -583,6 +584,9 @@ static unsigned char swap_entry_free(struct swap_info_struct *p,
swap_list.next = p->type;
nr_swap_pages++;
p->inuse_pages--;
+ if ((p->flags & SWP_BLKDEV) &&
+ disk->fops->swap_slot_free_notify)
+ disk->fops->swap_slot_free_notify(p->bdev, offset);
}
return usage;
--
1.6.6.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] Add swap slot free callback to block_device_operations
2010-05-17 5:32 ` [PATCH 2/3] Add swap slot free callback to block_device_operations Nitin Gupta
@ 2010-05-17 12:01 ` Minchan Kim
2010-05-18 3:31 ` Nitin Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Minchan Kim @ 2010-05-17 12:01 UTC (permalink / raw)
To: Nitin Gupta
Cc: Greg KH, Pekka Enberg, Linus Torvalds, Nigel Cunningham,
Andrew Morton, Hugh Dickins, Cyp, driverdev, linux-kernel
On Mon, May 17, 2010 at 2:32 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> This callback is required when RAM based devices are used as swap disks.
> One such device is ramzswap which is used as compressed in-memory swap
> disk. For such devices, we need a callback as soon as a swap slot is no
> longer used to allow freeing memory allocated for this slot. Without this
> callback, stale data can quickly accumulate in memory defeating the whole
> purpose of such devices.
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Looks good to me about code. so I added my review sign.
But I have some comments.
last time I said, I don't like there is a swap specific function in
block_device_operations.
It doesn't need many memory but it's not good about design sine
block_device_operations have common functions about block device.
But I don't have any good idea now where I put swap specific function.
And Linus already acked this idea. Hmm.
If there isn't any objection, I don't insist on my thought.
Nitpick :
AFAIR, Nitin introduced SWP_BLKDEV since he think access of long
pointers isn't good. ex)
S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
But now, we have to access p->bdev->bd_disk->fops->swap_slot_free_notify.
Isn't it all right?
--
Kind regards,
Minchan Kim
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/3] Add swap slot free callback to block_device_operations
2010-05-17 12:01 ` Minchan Kim
@ 2010-05-18 3:31 ` Nitin Gupta
0 siblings, 0 replies; 25+ messages in thread
From: Nitin Gupta @ 2010-05-18 3:31 UTC (permalink / raw)
To: Minchan Kim
Cc: Greg KH, Pekka Enberg, Linus Torvalds, Nigel Cunningham,
Andrew Morton, Hugh Dickins, Cyp, driverdev, linux-kernel
On 05/17/2010 05:31 PM, Minchan Kim wrote:
> On Mon, May 17, 2010 at 2:32 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> This callback is required when RAM based devices are used as swap disks.
>> One such device is ramzswap which is used as compressed in-memory swap
>> disk. For such devices, we need a callback as soon as a swap slot is no
>> longer used to allow freeing memory allocated for this slot. Without this
>> callback, stale data can quickly accumulate in memory defeating the whole
>> purpose of such devices.
>>
>> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
>
> Looks good to me about code. so I added my review sign.
> But I have some comments.
>
> last time I said, I don't like there is a swap specific function in
> block_device_operations.
> It doesn't need many memory but it's not good about design sine
> block_device_operations have common functions about block device.
>
> But I don't have any good idea now where I put swap specific function.
> And Linus already acked this idea. Hmm.
>
> If there isn't any objection, I don't insist on my thought.
>
> Nitpick :
> AFAIR, Nitin introduced SWP_BLKDEV since he think access of long
> pointers isn't good. ex)
> S_ISBLK(swap_info_struct->swap_file->f_mapping->host->i_mode)
>
> But now, we have to access p->bdev->bd_disk->fops->swap_slot_free_notify.
> Isn't it all right?
>
I'm also not sure about this point but accessing yet another very long
pointer chain just to check if its a block device seems weird.
Anyways, its trivial to remove this swap flag if its later decided that
its not really needed.
Thanks,
Nitin
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2010-05-18 3:35 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-07 7:25 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Nitin Gupta
2010-05-07 7:25 ` [PATCH 1/3] Add flag to identify block swap devices Nitin Gupta
2010-05-07 8:03 ` jassi brar
2010-05-07 8:16 ` Nitin Gupta
2010-05-07 8:56 ` jassi brar
2010-05-07 9:24 ` Pekka Enberg
2010-05-07 9:32 ` Nigel Cunningham
2010-05-07 7:25 ` [PATCH 2/3] Add swap slot free callback to block_device_operations Nitin Gupta
2010-05-07 9:22 ` Nigel Cunningham
2010-05-07 9:48 ` Nitin Gupta
2010-05-07 10:40 ` Nigel Cunningham
2010-05-07 7:25 ` [PATCH 3/3] ramzswap: Handler for swap slot free callback Nitin Gupta
2010-05-07 7:44 ` [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2) Pekka Enberg
2010-05-07 14:51 ` Linus Torvalds
2010-05-07 19:55 ` Andrew Morton
2010-05-08 4:05 ` Nitin Gupta
2010-05-08 6:29 ` Pekka Enberg
2010-05-08 6:54 ` Nitin Gupta
2010-05-08 7:05 ` Pekka Enberg
2010-05-08 6:57 ` Nitin Gupta
2010-05-08 7:26 ` Pekka Enberg
2010-05-08 7:32 ` Nitin Gupta
-- strict thread matches above, loose matches on Subject: below --
2010-05-17 5:32 [PATCH 0/3] ramzswap: Eliminate stale data from compressed memory (v2 resend) Nitin Gupta
2010-05-17 5:32 ` [PATCH 2/3] Add swap slot free callback to block_device_operations Nitin Gupta
2010-05-17 12:01 ` Minchan Kim
2010-05-18 3:31 ` Nitin Gupta
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox