* [PATCH] swap: Fix swap size in case of block devices @ 2009-08-30 16:19 Nitin Gupta 2009-08-31 11:27 ` Hugh Dickins 0 siblings, 1 reply; 6+ messages in thread From: Nitin Gupta @ 2009-08-30 16:19 UTC (permalink / raw) To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, linux-mm During swapon, swap size is set to number of usable pages in the given swap file/block device minus 1 (for header page). In case of block devices, this size is incorrectly set as one page less than the actual due to an off-by-one error. For regular files, this size is set correctly. Signed-off-by: Nitin Gupta <ngupta@vflare.org> --- mm/swapfile.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 8ffdc0d..3d37b97 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1951,9 +1951,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) if (error) goto bad_swap; + /* excluding header page */ nr_good_pages = swap_header->info.last_page - - swap_header->info.nr_badpages - - 1 /* header page */; + swap_header->info.nr_badpages; if (nr_good_pages) { swap_map[0] = SWAP_MAP_BAD; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices 2009-08-30 16:19 [PATCH] swap: Fix swap size in case of block devices Nitin Gupta @ 2009-08-31 11:27 ` Hugh Dickins 2009-08-31 17:21 ` Nitin Gupta 0 siblings, 1 reply; 6+ messages in thread From: Hugh Dickins @ 2009-08-31 11:27 UTC (permalink / raw) To: Nitin Gupta Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm On Sun, 30 Aug 2009, Nitin Gupta wrote: > During swapon, swap size is set to number of usable pages in the given > swap file/block device minus 1 (for header page). In case of block devices, > this size is incorrectly set as one page less than the actual due to an > off-by-one error. For regular files, this size is set correctly. > > Signed-off-by: Nitin Gupta <ngupta@vflare.org> I agree that there's an off-by-one disagreement between swapon and mkswap regarding last_page. The kernel seems to interpret it as the index of what I'd call the end page, the first page beyond the swap area. I'd never noticed that until the beginning of this year, and out of caution I've done nothing about it. I believe that the kernel has been wrong since Linux 2.2.? or 2.3.?, ever since swap header version 1 was first introduced; and that mkswap has set it the same way all along. But I've not spent time on the research to establish that for sure. What if there used to be a version of mkswap which set last_page one greater as the kernel expects? Neither Karel nor I think that's the case, but we're not absolutely certain. And what if (I suppose I'm getting even more wildly cautious here!) someone has learnt that that page remains untouched and is now putting it to other use? or has compensated for the off-by-one and is setting it one greater, beyond the end of the partition, not using mkswap? Since nobody has been hurt by it in all these years, I felt safer to go on leaving that discrepancy as is. Call me over cautious. Regarding your patch comment: I'm puzzled by the remark "For regular files, this size is set correctly". Do you mean that mkswap is setting last_page one higher when dealing with a regular file rather than a block device (I was unaware of that, but never looked to see)? But your patch appears to be to code shared equally between block devices and regular files, so then you'd be introducing a bug on regular files? And shouldn't mkswap be fixed to be consistent with itself? Hopefully I've misunderstood: please explain further. And regarding the patch itself: my understanding is that the problem is with the interpretation of last_page, so I don't think one change to nr_good_pages would be enough to fix it - you'd need to change the other places where last_page is referred to too. I'm still disinclined to make any change here myself (beyond a comment noting the discrepancy); but tell me I'm a fool. Hugh > --- > > mm/swapfile.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 8ffdc0d..3d37b97 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -1951,9 +1951,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) > if (error) > goto bad_swap; > > + /* excluding header page */ > nr_good_pages = swap_header->info.last_page - > - swap_header->info.nr_badpages - > - 1 /* header page */; > + swap_header->info.nr_badpages; > > if (nr_good_pages) { > swap_map[0] = SWAP_MAP_BAD; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices 2009-08-31 11:27 ` Hugh Dickins @ 2009-08-31 17:21 ` Nitin Gupta 2009-08-31 19:26 ` Hugh Dickins 0 siblings, 1 reply; 6+ messages in thread From: Nitin Gupta @ 2009-08-31 17:21 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm On 08/31/2009 04:57 PM, Hugh Dickins wrote: > On Sun, 30 Aug 2009, Nitin Gupta wrote: > >> During swapon, swap size is set to number of usable pages in the given >> swap file/block device minus 1 (for header page). In case of block devices, >> this size is incorrectly set as one page less than the actual due to an >> off-by-one error. For regular files, this size is set correctly. >> >> Signed-off-by: Nitin Gupta<ngupta@vflare.org> > > I agree that there's an off-by-one disagreement between swapon and mkswap > regarding last_page. The kernel seems to interpret it as the index of > what I'd call the end page, the first page beyond the swap area. > > I'd never noticed that until the beginning of this year, and out of > caution I've done nothing about it. I believe that the kernel has > been wrong since Linux 2.2.? or 2.3.?, ever since swap header version 1 > was first introduced; and that mkswap has set it the same way all along. > > But I've not spent time on the research to establish that for sure. > > What if there used to be a version of mkswap which set last_page one > greater as the kernel expects? Neither Karel nor I think that's the > case, but we're not absolutely certain. And what if (I suppose I'm > getting even more wildly cautious here!) someone has learnt that > that page remains untouched and is now putting it to other use? > or has compensated for the off-by-one and is setting it one greater, > beyond the end of the partition, not using mkswap? > All this regarding mkswap is even more unlikely considering that for regular files, kernel uses last page too (details below). Only for block devices, we leave last page unused due to this bug. > Since nobody has been hurt by it in all these years, I felt safer > to go on leaving that discrepancy as is. Call me over cautious. > > Regarding your patch comment: I'm puzzled by the remark "For regular > files, this size is set correctly". Do you mean that mkswap is > setting last_page one higher when dealing with a regular file rather > than a block device (I was unaware of that, but never looked to see)? > But your patch appears to be to code shared equally between block > devices and regular files, so then you'd be introducing a bug on > regular files? And shouldn't mkswap be fixed to be consistent > with itself? Hopefully I've misunderstood: please explain further. > mkswap sets last_page correctly: 0-based index of last usable swap page. To explain why this bug affects only block swap devices, some code walkthrough is done below: (BTW, I only checked mkswap which is part of util-linux-ng 2.14.2). swapon() { ... nr_good_pages = swap_header->info.last_page - swap_header->info.nr_badpages - 1 /* header page */; ==== off-by-one error: for both regular and block device case, but... ==== if (nr_good_pages) { swap_map[0] = SWAP_MAP_BAD; p->max = maxpages; p->pages = nr_good_pages; nr_extents = setup_swap_extents(p, &span); ==== For block devices, setup_swap_extents() leaves p->pages untouched. For regular files, it sets p->pages == total usable swap pages (including header page) - 1; ==== if (nr_extents < 0) { error = nr_extents; goto bad_swap; } nr_good_pages = p->pages; ==== So, for block device, nr_good_pages == last_page - nr_badpages - 1 == (total pages - 1) - nr_badpages - 1 (error) For regular files, nr_good_pages == total pages - 1 (correct) ==== } ... } With this fix, block device case is corrected to last_page - nr_badpages - 1 while regular file case remain correct since setup_swap_extents() still gives same correct value in p->pages (== total pages - 1). > And regarding the patch itself: my understanding is that the problem > is with the interpretation of last_page, so I don't think one change > to nr_good_pages would be enough to fix it - you'd need to change the > other places where last_page is referred to too. > I looked at other instances of last_page in swapon() -- all these other instances looked correct to me. > I'm still disinclined to make any change here myself (beyond > a comment noting the discrepancy); but tell me I'm a fool. > I agree that nobody would bother losing 1 swap slot, so it might not be desirable to have this fix. But IMHO, I don't see any reason to leave this discrepancy between regular files and swap devices -- its just so odd. Thanks for your detailed comments. Regards, Nitin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices 2009-08-31 17:21 ` Nitin Gupta @ 2009-08-31 19:26 ` Hugh Dickins 2009-09-01 7:11 ` Nitin Gupta 0 siblings, 1 reply; 6+ messages in thread From: Hugh Dickins @ 2009-08-31 19:26 UTC (permalink / raw) To: Nitin Gupta Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm On Mon, 31 Aug 2009, Nitin Gupta wrote: > > mkswap sets last_page correctly: 0-based index of last usable > swap page. To explain why this bug affects only block swap devices, > some code walkthrough is done below: > (BTW, I only checked mkswap which is part of util-linux-ng 2.14.2). > > swapon() > { > ... > nr_good_pages = swap_header->info.last_page - > swap_header->info.nr_badpages - > 1 /* header page */; > > ==== > off-by-one error: for both regular and block device case, but... > ==== > > if (nr_good_pages) { > swap_map[0] = SWAP_MAP_BAD; > p->max = maxpages; > p->pages = nr_good_pages; > nr_extents = setup_swap_extents(p, &span); > ==== > For block devices, setup_swap_extents() leaves p->pages untouched. > For regular files, it sets p->pages > == total usable swap pages (including header page) - 1; I think you're overlooking the "page < sis->max" condition in setup_swap_extents()'s loop. So at the end of the loop, if no pages were lost to fragmentation, we have sis->max = page_no; /* no change */ sis->pages = page_no - 1; /* no change */ > ==== > if (nr_extents < 0) { > error = nr_extents; > goto bad_swap; > } > nr_good_pages = p->pages; > > ==== > So, for block device, nr_good_pages == last_page - nr_badpages - 1 > == (total pages - 1) - nr_badpages - 1 (error) > For regular files, nr_good_pages == total pages - 1 > (correct) > ==== > > } > ... > } > > > With this fix, block device case is corrected to last_page - nr_badpages - 1 > while regular file case remain correct since setup_swap_extents() still gives > same correct value in p->pages (== total pages - 1). > > > > And regarding the patch itself: my understanding is that the problem > > is with the interpretation of last_page, so I don't think one change > > to nr_good_pages would be enough to fix it - you'd need to change the > > other places where last_page is referred to too. > > > > I looked at other instances of last_page in swapon() -- all these other > instances looked correct to me. I believe they're all consistent with the off-by-oneness of nr_good_pages. p->max, for example, is consistently one more than p->pages, so long as there are no bad pages and no overflowing the swp_entry_t. Perhaps you're placing too much faith in your interpretation of "max"? I dislike several conventions in swapfile.c, it does lend itself to off-by-oneness. > > > I'm still disinclined to make any change here myself (beyond > > a comment noting the discrepancy); but tell me I'm a fool. > > > > I agree that nobody would bother losing 1 swap slot, so it might > not be desirable to have this fix. But IMHO, I don't see any reason > to leave this discrepancy between regular files and swap devices -- its > just so odd. Yes, I'd dislike that discrepancy between regular files and block devices, if I could see it. Though I'd probably still be cautious about the disk partitions. dd if=/dev/zero of=/swap bs=200k # says 204800 bytes (205kB) mkswap /swap # says size = 196 KiB swapon /swap # dmesg says Adding 192k swap which is what I've come to expect from the off-by-one, even on regular files. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices 2009-08-31 19:26 ` Hugh Dickins @ 2009-09-01 7:11 ` Nitin Gupta 2009-09-01 9:23 ` Hugh Dickins 0 siblings, 1 reply; 6+ messages in thread From: Nitin Gupta @ 2009-09-01 7:11 UTC (permalink / raw) To: Hugh Dickins Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm On Tue, Sep 1, 2009 at 12:56 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote: > On Mon, 31 Aug 2009, Nitin Gupta wrote: >> For block devices, setup_swap_extents() leaves p->pages untouched. >> For regular files, it sets p->pages >> == total usable swap pages (including header page) - 1; > > I think you're overlooking the "page < sis->max" condition > in setup_swap_extents()'s loop. So at the end of the loop, > if no pages were lost to fragmentation, we have > > sis->max = page_no; /* no change */ > sis->pages = page_no - 1; /* no change */ > Oh, I missed this loop condition. The variable naming is so bad, I find it very hard to follow this part of code. Still, if there is even a single page in swap file that is not usable (i.e. non-contiguous on disk) -- which is what usually happens for swap files of any practical size -- setup_swap_extents() gives correct value in sis->pages == total usable pages (including header) - 1; However, if all the file pages are usable, it gives off-by-one error, as you noted. > Yes, I'd dislike that discrepancy between regular files and block > devices, if I could see it. Though I'd probably still be cautious > about the disk partitions. > dd if=/dev/zero of=/swap bs=200k # says 204800 bytes (205kB) > mkswap /swap # says size = 196 KiB > swapon /swap # dmesg says Adding 192k swap > which is what I've come to expect from the off-by-one, > even on regular files. In general, its not correct to compare size repored by mkswap and swapon like this. The size reported by mkswap includes pages which are not contiguous on disk. While, kernel considers only PAGE_SIZE-length, PAGE_SIZE-aligned contiguous run of blocks. So, size reported by mkswap and swapon can vary wildly. For e.g.: (on mtdram with ext2 fs) dd if=/dev/zero of=swap.dd bs=1M count=10 mkswap swap.dd # says size = 10236 KiB swapon swap.dd # says Adding 10112k swap ==== So, to summarize: 1. mkswap always behaves correctly: It sets number of pages in swap file minus one as 'last_page' in swap header (since this is a 0-based index). This same value (total pages - 1) is printed out as size since it knows that first page is swap header. 2. swapon() for block devices: off-by-one error causing last swap page to remain unused. 3. swapon() for regular files: 3.1 off-by-one error if every swap page in this file is usable i.e. every PAGE_SIZE-length, PAGE_SIZE-aligned chunk is contiguous on disk. 3.2 correct size value if there is at least one swap page which is unusable -- which is expected from swap file of any practical size. I will go through swap code again to find other possible off-by-one errors. The revised patch will fix these inconsistencies. Thanks, Nitin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] swap: Fix swap size in case of block devices 2009-09-01 7:11 ` Nitin Gupta @ 2009-09-01 9:23 ` Hugh Dickins 0 siblings, 0 replies; 6+ messages in thread From: Hugh Dickins @ 2009-09-01 9:23 UTC (permalink / raw) To: Nitin Gupta Cc: Andrew Morton, Rik van Riel, Karel Zak, linux-kernel, linux-mm [-- Attachment #1: Type: TEXT/PLAIN, Size: 3791 bytes --] On Tue, 1 Sep 2009, Nitin Gupta wrote: > On Tue, Sep 1, 2009 at 12:56 AM, Hugh Dickins<hugh.dickins@tiscali.co.uk> wrote: > > On Mon, 31 Aug 2009, Nitin Gupta wrote: > >> For block devices, setup_swap_extents() leaves p->pages untouched. > >> For regular files, it sets p->pages > >> == total usable swap pages (including header page) - 1; > > > > I think you're overlooking the "page < sis->max" condition > > in setup_swap_extents()'s loop. So at the end of the loop, > > if no pages were lost to fragmentation, we have > > > > sis->max = page_no; /* no change */ > > sis->pages = page_no - 1; /* no change */ > > > > Oh, I missed this loop condition. The variable naming is so bad, I > find it very hard to follow this part of code. > > Still, if there is even a single page in swap file that is not usable > (i.e. non-contiguous on disk) -- which is what usually happens for swap > files of any practical size -- setup_swap_extents() gives correct value > in sis->pages == total usable pages (including header) - 1; > > However, if all the file pages are usable, it gives off-by-one error, as > you noted. Right, I see your point now: when the regular file is fragmented thus, setup_swap_extents() would allow it to use the final page of the file, which would otherwise be (erroneously) disallowed. But I would reword your "what usually happens" to "what happens in the general case": perhaps I'm wrong, but I think that usually these days people are creating swap files on filesystems with 4kB block size, where there's no issue of intra-page fragmentation lowering that page count (but there may still be inter-page fragmentation to make swapping to the file less efficient than to a partition). > > > Yes, I'd dislike that discrepancy between regular files and block > > devices, if I could see it. Though I'd probably still be cautious > > about the disk partitions. > > > dd if=/dev/zero of=/swap bs=200k # says 204800 bytes (205kB) > > mkswap /swap # says size = 196 KiB > > swapon /swap # dmesg says Adding 192k swap > > > which is what I've come to expect from the off-by-one, > > even on regular files. > > In general, its not correct to compare size repored by mkswap and > swapon like this. The size reported by mkswap includes pages which > are not contiguous on disk. While, kernel considers only > PAGE_SIZE-length, PAGE_SIZE-aligned contiguous run of blocks. So, size > reported by mkswap and swapon can vary wildly. For e.g.: > > (on mtdram with ext2 fs) > dd if=/dev/zero of=swap.dd bs=1M count=10 > mkswap swap.dd # says size = 10236 KiB > swapon swap.dd # says Adding 10112k swap If the filesystem has block size 1kB or 2kB, yes. > > ==== > > So, to summarize: > > 1. mkswap always behaves correctly: It sets number of pages in swap file > minus one as 'last_page' in swap header (since this is a 0-based index). > This same value (total pages - 1) is printed out as size since it knows > that first page is swap header. > > 2. swapon() for block devices: off-by-one error causing last swap page > to remain unused. > > 3. swapon() for regular files: > 3.1 off-by-one error if every swap page in this file is usable i.e. > every PAGE_SIZE-length, PAGE_SIZE-aligned chunk is contiguous on > disk. > 3.2 correct size value if there is at least one swap page which is > unusable -- which is expected from swap file of any practical > size. > > > I will go through swap code again to find other possible off-by-one > errors. The revised patch will fix these inconsistencies. Thanks. Hugh ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2009-09-01 9:24 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-08-30 16:19 [PATCH] swap: Fix swap size in case of block devices Nitin Gupta 2009-08-31 11:27 ` Hugh Dickins 2009-08-31 17:21 ` Nitin Gupta 2009-08-31 19:26 ` Hugh Dickins 2009-09-01 7:11 ` Nitin Gupta 2009-09-01 9:23 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).