* [PATCH] swap: Fix swap size in case of block devices
@ 2009-08-30 16:19 Nitin Gupta
2009-08-31 11:27 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Nitin Gupta @ 2009-08-30 16:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, linux-mm
During swapon, swap size is set to number of usable pages in the given
swap file/block device minus 1 (for header page). In case of block devices,
this size is incorrectly set as one page less than the actual due to an
off-by-one error. For regular files, this size is set correctly.
Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
mm/swapfile.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8ffdc0d..3d37b97 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1951,9 +1951,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
if (error)
goto bad_swap;
+ /* excluding header page */
nr_good_pages = swap_header->info.last_page -
- swap_header->info.nr_badpages -
- 1 /* header page */;
+ swap_header->info.nr_badpages;
if (nr_good_pages) {
swap_map[0] = SWAP_MAP_BAD;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices
2009-08-30 16:19 [PATCH] swap: Fix swap size in case of block devices Nitin Gupta
@ 2009-08-31 11:27 ` Hugh Dickins
2009-08-31 17:21 ` Nitin Gupta
0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2009-08-31 11:27 UTC (permalink / raw)
To: Nitin Gupta
Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm
On Sun, 30 Aug 2009, Nitin Gupta wrote:
> During swapon, swap size is set to number of usable pages in the given
> swap file/block device minus 1 (for header page). In case of block devices,
> this size is incorrectly set as one page less than the actual due to an
> off-by-one error. For regular files, this size is set correctly.
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
I agree that there's an off-by-one disagreement between swapon and mkswap
regarding last_page. The kernel seems to interpret it as the index of
what I'd call the end page, the first page beyond the swap area.
I'd never noticed that until the beginning of this year, and out of
caution I've done nothing about it. I believe that the kernel has
been wrong since Linux 2.2.? or 2.3.?, ever since swap header version 1
was first introduced; and that mkswap has set it the same way all along.
But I've not spent time on the research to establish that for sure.
What if there used to be a version of mkswap which set last_page one
greater as the kernel expects? Neither Karel nor I think that's the
case, but we're not absolutely certain. And what if (I suppose I'm
getting even more wildly cautious here!) someone has learnt that
that page remains untouched and is now putting it to other use?
or has compensated for the off-by-one and is setting it one greater,
beyond the end of the partition, not using mkswap?
Since nobody has been hurt by it in all these years, I felt safer
to go on leaving that discrepancy as is. Call me over cautious.
Regarding your patch comment: I'm puzzled by the remark "For regular
files, this size is set correctly". Do you mean that mkswap is
setting last_page one higher when dealing with a regular file rather
than a block device (I was unaware of that, but never looked to see)?
But your patch appears to be to code shared equally between block
devices and regular files, so then you'd be introducing a bug on
regular files? And shouldn't mkswap be fixed to be consistent
with itself? Hopefully I've misunderstood: please explain further.
And regarding the patch itself: my understanding is that the problem
is with the interpretation of last_page, so I don't think one change
to nr_good_pages would be enough to fix it - you'd need to change the
other places where last_page is referred to too.
I'm still disinclined to make any change here myself (beyond
a comment noting the discrepancy); but tell me I'm a fool.
Hugh
> ---
>
> mm/swapfile.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 8ffdc0d..3d37b97 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1951,9 +1951,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
> if (error)
> goto bad_swap;
>
> + /* excluding header page */
> nr_good_pages = swap_header->info.last_page -
> - swap_header->info.nr_badpages -
> - 1 /* header page */;
> + swap_header->info.nr_badpages;
>
> if (nr_good_pages) {
> swap_map[0] = SWAP_MAP_BAD;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices
2009-08-31 11:27 ` Hugh Dickins
@ 2009-08-31 17:21 ` Nitin Gupta
2009-08-31 19:26 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Nitin Gupta @ 2009-08-31 17:21 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm
On 08/31/2009 04:57 PM, Hugh Dickins wrote:
> On Sun, 30 Aug 2009, Nitin Gupta wrote:
>
>> During swapon, swap size is set to number of usable pages in the given
>> swap file/block device minus 1 (for header page). In case of block devices,
>> this size is incorrectly set as one page less than the actual due to an
>> off-by-one error. For regular files, this size is set correctly.
>>
>> Signed-off-by: Nitin Gupta<ngupta@vflare.org>
>
> I agree that there's an off-by-one disagreement between swapon and mkswap
> regarding last_page. The kernel seems to interpret it as the index of
> what I'd call the end page, the first page beyond the swap area.
>
> I'd never noticed that until the beginning of this year, and out of
> caution I've done nothing about it. I believe that the kernel has
> been wrong since Linux 2.2.? or 2.3.?, ever since swap header version 1
> was first introduced; and that mkswap has set it the same way all along.
>
> But I've not spent time on the research to establish that for sure.
>
> What if there used to be a version of mkswap which set last_page one
> greater as the kernel expects? Neither Karel nor I think that's the
> case, but we're not absolutely certain. And what if (I suppose I'm
> getting even more wildly cautious here!) someone has learnt that
> that page remains untouched and is now putting it to other use?
> or has compensated for the off-by-one and is setting it one greater,
> beyond the end of the partition, not using mkswap?
>
All this regarding mkswap is even more unlikely considering that for
regular files, kernel uses last page too (details below). Only for block
devices, we leave last page unused due to this bug.
> Since nobody has been hurt by it in all these years, I felt safer
> to go on leaving that discrepancy as is. Call me over cautious.
>
> Regarding your patch comment: I'm puzzled by the remark "For regular
> files, this size is set correctly". Do you mean that mkswap is
> setting last_page one higher when dealing with a regular file rather
> than a block device (I was unaware of that, but never looked to see)?
> But your patch appears to be to code shared equally between block
> devices and regular files, so then you'd be introducing a bug on
> regular files? And shouldn't mkswap be fixed to be consistent
> with itself? Hopefully I've misunderstood: please explain further.
>
mkswap sets last_page correctly: 0-based index of last usable
swap page. To explain why this bug affects only block swap devices,
some code walkthrough is done below:
(BTW, I only checked mkswap which is part of util-linux-ng 2.14.2).
swapon()
{
...
nr_good_pages = swap_header->info.last_page -
swap_header->info.nr_badpages -
1 /* header page */;
====
off-by-one error: for both regular and block device case, but...
====
if (nr_good_pages) {
swap_map[0] = SWAP_MAP_BAD;
p->max = maxpages;
p->pages = nr_good_pages;
nr_extents = setup_swap_extents(p, &span);
====
For block devices, setup_swap_extents() leaves p->pages untouched.
For regular files, it sets p->pages
== total usable swap pages (including header page) - 1;
====
if (nr_extents < 0) {
error = nr_extents;
goto bad_swap;
}
nr_good_pages = p->pages;
====
So, for block device, nr_good_pages == last_page - nr_badpages - 1
== (total pages - 1) - nr_badpages - 1 (error)
For regular files, nr_good_pages == total pages - 1 (correct)
====
}
...
}
With this fix, block device case is corrected to last_page - nr_badpages - 1
while regular file case remain correct since setup_swap_extents() still gives
same correct value in p->pages (== total pages - 1).
> And regarding the patch itself: my understanding is that the problem
> is with the interpretation of last_page, so I don't think one change
> to nr_good_pages would be enough to fix it - you'd need to change the
> other places where last_page is referred to too.
>
I looked at other instances of last_page in swapon() -- all these other
instances looked correct to me.
> I'm still disinclined to make any change here myself (beyond
> a comment noting the discrepancy); but tell me I'm a fool.
>
I agree that nobody would bother losing 1 swap slot, so it might
not be desirable to have this fix. But IMHO, I don't see any reason
to leave this discrepancy between regular files and swap devices -- its
just so odd.
Thanks for your detailed comments.
Regards,
Nitin
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices
2009-08-31 17:21 ` Nitin Gupta
@ 2009-08-31 19:26 ` Hugh Dickins
2009-09-01 7:11 ` Nitin Gupta
0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2009-08-31 19:26 UTC (permalink / raw)
To: Nitin Gupta
Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm
On Mon, 31 Aug 2009, Nitin Gupta wrote:
>
> mkswap sets last_page correctly: 0-based index of last usable
> swap page. To explain why this bug affects only block swap devices,
> some code walkthrough is done below:
> (BTW, I only checked mkswap which is part of util-linux-ng 2.14.2).
>
> swapon()
> {
> ...
> nr_good_pages = swap_header->info.last_page -
> swap_header->info.nr_badpages -
> 1 /* header page */;
>
> ====
> off-by-one error: for both regular and block device case, but...
> ====
>
> if (nr_good_pages) {
> swap_map[0] = SWAP_MAP_BAD;
> p->max = maxpages;
> p->pages = nr_good_pages;
> nr_extents = setup_swap_extents(p, &span);
> ====
> For block devices, setup_swap_extents() leaves p->pages untouched.
> For regular files, it sets p->pages
> == total usable swap pages (including header page) - 1;
I think you're overlooking the "page < sis->max" condition
in setup_swap_extents()'s loop. So at the end of the loop,
if no pages were lost to fragmentation, we have
sis->max = page_no; /* no change */
sis->pages = page_no - 1; /* no change */
> ====
> if (nr_extents < 0) {
> error = nr_extents;
> goto bad_swap;
> }
> nr_good_pages = p->pages;
>
> ====
> So, for block device, nr_good_pages == last_page - nr_badpages - 1
> == (total pages - 1) - nr_badpages - 1 (error)
> For regular files, nr_good_pages == total pages - 1
> (correct)
> ====
>
> }
> ...
> }
>
>
> With this fix, block device case is corrected to last_page - nr_badpages - 1
> while regular file case remain correct since setup_swap_extents() still gives
> same correct value in p->pages (== total pages - 1).
>
>
> > And regarding the patch itself: my understanding is that the problem
> > is with the interpretation of last_page, so I don't think one change
> > to nr_good_pages would be enough to fix it - you'd need to change the
> > other places where last_page is referred to too.
> >
>
> I looked at other instances of last_page in swapon() -- all these other
> instances looked correct to me.
I believe they're all consistent with the off-by-oneness of nr_good_pages.
p->max, for example, is consistently one more than p->pages, so long as
there are no bad pages and no overflowing the swp_entry_t.
Perhaps you're placing too much faith in your interpretation of "max"?
I dislike several conventions in swapfile.c, it does lend itself to
off-by-oneness.
>
> > I'm still disinclined to make any change here myself (beyond
> > a comment noting the discrepancy); but tell me I'm a fool.
> >
>
> I agree that nobody would bother losing 1 swap slot, so it might
> not be desirable to have this fix. But IMHO, I don't see any reason
> to leave this discrepancy between regular files and swap devices -- its
> just so odd.
Yes, I'd dislike that discrepancy between regular files and block
devices, if I could see it. Though I'd probably still be cautious
about the disk partitions.
dd if=/dev/zero of=/swap bs=200k # says 204800 bytes (205kB)
mkswap /swap # says size = 196 KiB
swapon /swap # dmesg says Adding 192k swap
which is what I've come to expect from the off-by-one,
even on regular files.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices
2009-08-31 19:26 ` Hugh Dickins
@ 2009-09-01 7:11 ` Nitin Gupta
2009-09-01 9:23 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Nitin Gupta @ 2009-09-01 7:11 UTC (permalink / raw)
To: Hugh Dickins
Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm
On Tue, Sep 1, 2009 at 12:56 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote:
> On Mon, 31 Aug 2009, Nitin Gupta wrote:
>> For block devices, setup_swap_extents() leaves p->pages untouched.
>> For regular files, it sets p->pages
>> == total usable swap pages (including header page) - 1;
>
> I think you're overlooking the "page < sis->max" condition
> in setup_swap_extents()'s loop. So at the end of the loop,
> if no pages were lost to fragmentation, we have
>
> sis->max = page_no; /* no change */
> sis->pages = page_no - 1; /* no change */
>
Oh, I missed this loop condition. The variable naming is so bad, I
find it very hard to follow this part of code.
Still, if there is even a single page in swap file that is not usable
(i.e. non-contiguous on disk) -- which is what usually happens for swap
files of any practical size -- setup_swap_extents() gives correct value
in sis->pages == total usable pages (including header) - 1;
However, if all the file pages are usable, it gives off-by-one error, as
you noted.
> Yes, I'd dislike that discrepancy between regular files and block
> devices, if I could see it. Though I'd probably still be cautious
> about the disk partitions.
> dd if=/dev/zero of=/swap bs=200k # says 204800 bytes (205kB)
> mkswap /swap # says size = 196 KiB
> swapon /swap # dmesg says Adding 192k swap
> which is what I've come to expect from the off-by-one,
> even on regular files.
In general, its not correct to compare size repored by mkswap and
swapon like this. The size reported by mkswap includes pages which
are not contiguous on disk. While, kernel considers only
PAGE_SIZE-length, PAGE_SIZE-aligned contiguous run of blocks. So, size
reported by mkswap and swapon can vary wildly. For e.g.:
(on mtdram with ext2 fs)
dd if=/dev/zero of=swap.dd bs=1M count=10
mkswap swap.dd # says size = 10236 KiB
swapon swap.dd # says Adding 10112k swap
====
So, to summarize:
1. mkswap always behaves correctly: It sets number of pages in swap file
minus one as 'last_page' in swap header (since this is a 0-based index).
This same value (total pages - 1) is printed out as size since it knows
that first page is swap header.
2. swapon() for block devices: off-by-one error causing last swap page
to remain unused.
3. swapon() for regular files:
3.1 off-by-one error if every swap page in this file is usable i.e.
every PAGE_SIZE-length, PAGE_SIZE-aligned chunk is contiguous on
disk.
3.2 correct size value if there is at least one swap page which is
unusable -- which is expected from swap file of any practical
size.
I will go through swap code again to find other possible off-by-one
errors. The revised patch will fix these inconsistencies.
Thanks,
Nitin
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices
2009-09-01 7:11 ` Nitin Gupta
@ 2009-09-01 9:23 ` Hugh Dickins
0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2009-09-01 9:23 UTC (permalink / raw)
To: Nitin Gupta
Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm
[-- Attachment #1: Type: TEXT/PLAIN, Size: 3791 bytes --]
On Tue, 1 Sep 2009, Nitin Gupta wrote:
> On Tue, Sep 1, 2009 at 12:56 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote:
> > On Mon, 31 Aug 2009, Nitin Gupta wrote:
> >> For block devices, setup_swap_extents() leaves p->pages untouched.
> >> For regular files, it sets p->pages
> >> == total usable swap pages (including header page) - 1;
> >
> > I think you're overlooking the "page < sis->max" condition
> > in setup_swap_extents()'s loop. So at the end of the loop,
> > if no pages were lost to fragmentation, we have
> >
> > sis->max = page_no; /* no change */
> > sis->pages = page_no - 1; /* no change */
> >
>
> Oh, I missed this loop condition. The variable naming is so bad, I
> find it very hard to follow this part of code.
>
> Still, if there is even a single page in swap file that is not usable
> (i.e. non-contiguous on disk) -- which is what usually happens for swap
> files of any practical size -- setup_swap_extents() gives correct value
> in sis->pages == total usable pages (including header) - 1;
>
> However, if all the file pages are usable, it gives off-by-one error, as
> you noted.
Right, I see your point now: when the regular file is fragmented thus,
setup_swap_extents() would allow it to use the final page of the file,
which would otherwise be (erroneously) disallowed.
But I would reword your "what usually happens" to "what happens in
the general case": perhaps I'm wrong, but I think that usually these
days people are creating swap files on filesystems with 4kB block
size, where there's no issue of intra-page fragmentation lowering
that page count (but there may still be inter-page fragmentation
to make swapping to the file less efficient than to a partition).
>
> > Yes, I'd dislike that discrepancy between regular files and block
> > devices, if I could see it. Though I'd probably still be cautious
> > about the disk partitions.
>
> > dd if=/dev/zero of=/swap bs=200k # says 204800 bytes (205kB)
> > mkswap /swap # says size = 196 KiB
> > swapon /swap # dmesg says Adding 192k swap
>
> > which is what I've come to expect from the off-by-one,
> > even on regular files.
>
> In general, its not correct to compare size repored by mkswap and
> swapon like this. The size reported by mkswap includes pages which
> are not contiguous on disk. While, kernel considers only
> PAGE_SIZE-length, PAGE_SIZE-aligned contiguous run of blocks. So, size
> reported by mkswap and swapon can vary wildly. For e.g.:
>
> (on mtdram with ext2 fs)
> dd if=/dev/zero of=swap.dd bs=1M count=10
> mkswap swap.dd # says size = 10236 KiB
> swapon swap.dd # says Adding 10112k swap
If the filesystem has block size 1kB or 2kB, yes.
>
> ====
>
> So, to summarize:
>
> 1. mkswap always behaves correctly: It sets number of pages in swap file
> minus one as 'last_page' in swap header (since this is a 0-based index).
> This same value (total pages - 1) is printed out as size since it knows
> that first page is swap header.
>
> 2. swapon() for block devices: off-by-one error causing last swap page
> to remain unused.
>
> 3. swapon() for regular files:
> 3.1 off-by-one error if every swap page in this file is usable i.e.
> every PAGE_SIZE-length, PAGE_SIZE-aligned chunk is contiguous on
> disk.
> 3.2 correct size value if there is at least one swap page which is
> unusable -- which is expected from swap file of any practical
> size.
>
>
> I will go through swap code again to find other possible off-by-one
> errors. The revised patch will fix these inconsistencies.
Thanks.
Hugh
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-09-01 9:24 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-30 16:19 [PATCH] swap: Fix swap size in case of block devices Nitin Gupta
2009-08-31 11:27 ` Hugh Dickins
2009-08-31 17:21 ` Nitin Gupta
2009-08-31 19:26 ` Hugh Dickins
2009-09-01 7:11 ` Nitin Gupta
2009-09-01 9:23 ` Hugh Dickins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).