* [PATCH] swp_entry_t vs. swap pte.
@ 2004-03-24 18:09 Martin Schwidefsky
2004-03-24 22:04 ` David S. Miller
From: Martin Schwidefsky @ 2004-03-24 18:09 UTC
To: linux-arch
Hi,
I got a bug report for s390 about an oops in mk_swap_pte. Its cause
is a mismatch between the arch-independent swap entries (swp_entry_t)
and the swap entries encoded in ptes. The swp_entry_t uses 27 bits for
the offset and 5 bits for the type (59 and 5 on 64 bit). In sys_swapon
this definition is used to find out how many pages there can be at
maximum:
---
p->lowest_bit = 1;
maxpages = swp_offset(swp_entry(0,~0UL)) - 1;
if (maxpages > swap_header->info.last_page)
maxpages = swap_header->info.last_page;
---
swp_offset(swp_entry(0,~0UL)) always is 0x7ffffff (27 bits) for 32 bit
and 0x7ffffffffffffff (59 bits) for 64 bit, so maxpages only reflects
the generic swp_entry_t layout. This is a problem because the
architecture may be more restrictive on the number of bits in the offset.
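To make the numbers above concrete, here is a small userspace sketch of the generic swp_entry_t on a 32-bit machine. It loosely follows the 2.6 include/linux/swapops.h layout (type in the top MAX_SWAPFILES_SHIFT = 5 bits, offset below), but the model_* names and the fixed uint32_t word are assumptions made for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the generic swp_entry_t on a 32-bit machine:
 * the top 5 bits hold the type, the remaining 27 bits the offset. */
#define MODEL_MAX_SWAPFILES_SHIFT 5
#define MODEL_TYPE_SHIFT  (32 - MODEL_MAX_SWAPFILES_SHIFT)        /* 27 */
#define MODEL_OFFSET_MASK ((UINT32_C(1) << MODEL_TYPE_SHIFT) - 1) /* 0x7ffffff */

typedef struct { uint32_t val; } model_swp_entry_t;

static inline model_swp_entry_t model_swp_entry(uint32_t type, uint32_t offset)
{
	model_swp_entry_t e;

	/* type bits above bit 4 fall off the top of the 32-bit word */
	e.val = (type << MODEL_TYPE_SHIFT) | (offset & MODEL_OFFSET_MASK);
	return e;
}

static inline uint32_t model_swp_type(model_swp_entry_t e)
{
	return e.val >> MODEL_TYPE_SHIFT;
}

static inline uint32_t model_swp_offset(model_swp_entry_t e)
{
	return e.val & MODEL_OFFSET_MASK;
}
```

With this model, swp_offset(swp_entry(0,~0UL)) evaluates to 0x7ffffff, so maxpages ends up as 0x7fffffe no matter how few offset bits the swap pte format can actually hold.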
The current situation is:

            offset  type  max swap
arch        bits    bits  size
alpha       24      5     64 GB
arm         23      7     32 GB
cris        20      7     4 GB
h8300       ??      ?     ?
i386        24      5     64 GB
ia64        54      7     big
m68k        20      8     4 GB
mips-32     20      7     4 GB
mips-64     24      8     64 GB
parisc      24      5     64 GB
ppc         24      5     64 GB
ppc64       48?     6     big
s390-32     19      6     2 GB
s390-64     52      6     big
sh          22      8     16 GB
sparc-32    19      7     2 GB
sparc-64    43      8     big
v850        ??      ?     ?
x86_64      40?     6     big
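The max swap size column follows from 2^offset_bits pages times the page size, and the figures match 4 KB pages. A sketch of that arithmetic, with the 4 KB page size as an explicit assumption (on architectures with larger pages the limit scales accordingly); max_swap_bytes is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages (page shift 12), which is what the table uses. */
#define MODEL_PAGE_SHIFT 12

/* Largest swap area in bytes addressable with `offset_bits` bits of
 * swap offset: 2^offset_bits pages of 4 KB each. */
static inline uint64_t max_swap_bytes(unsigned int offset_bits)
{
	assert(offset_bits + MODEL_PAGE_SHIFT < 64); /* "big" rows overflow */
	return UINT64_C(1) << (offset_bits + MODEL_PAGE_SHIFT);
}
```

For example, 24 offset bits (i386) give 2^36 bytes = 64 GB and 19 bits (s390-32) give 2^31 bytes = 2 GB, matching the table.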
In my case the swap device was 2.5 GB and mkswap happily created a
swap area of that size. sys_swapon didn't object either, but the
first attempt to create a swap entry for a page with an offset > 0x7ffff
crashed the machine. The same will happen on i386 with a swap
device > 64 GB.
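A quick check of the numbers, again assuming 4 KB pages: a 2.5 GB device has 655360 pages, so valid offsets run up to 655359, while the 19-bit s390-32 swap pte offset stops at 0x7ffff (524287). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages. */
#define MODEL_PAGE_SHIFT 12

/* Number of pages in a swap area of `bytes` bytes. */
static inline uint64_t swap_pages(uint64_t bytes)
{
	return bytes >> MODEL_PAGE_SHIFT;
}

/* Non-zero if page `offset` can be stored in a swap pte offset field
 * that is `offset_bits` wide. */
static inline int offset_fits(uint64_t offset, unsigned int offset_bits)
{
	assert(offset_bits < 64);
	return offset < (UINT64_C(1) << offset_bits);
}
```

Every offset from 0x80000 up to 655359 fails offset_fits(offset, 19), and those are exactly the pages that can trigger the oops.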
I created a patch that should fix the problem. It uses
swp_type(pte_to_swp_entry(swp_entry_to_pte(swp_entry(~0UL,0))))
to find the highest possible swap type number and
swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0,~0UL))))
to find the highest possible swap offset. The round trip through the
swap pte masks off every bit that the architecture cannot encode.
This should work with the existing __swp_entry/__swp_type/__swp_offset
definitions on all architectures except s390, where I had added a
BUG_ON to __swp_entry that fires if the created swap pte is dubious.
Oh, well.
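The round trip can be illustrated with a userspace sketch. It pairs the generic 27/5 swp_entry_t with a hypothetical swap pte format that has 19 offset bits and 6 type bits and masks out-of-range bits instead of hitting a BUG_ON. The conversion helpers mirror the shape of pte_to_swp_entry/swp_entry_to_pte, but every name here is illustrative, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Generic swp_entry_t model (32 bit): 5 type bits on top, 27 offset bits. */
typedef struct { uint32_t val; } swp_entry_model;
typedef struct { uint32_t val; } pte_model;

#define GEN_TYPE_SHIFT  27
#define GEN_OFFSET_MASK ((UINT32_C(1) << GEN_TYPE_SHIFT) - 1)

static inline swp_entry_model gen_entry(uint32_t type, uint32_t offset)
{
	swp_entry_model e = { (type << GEN_TYPE_SHIFT) | (offset & GEN_OFFSET_MASK) };
	return e;
}
static inline uint32_t gen_type(swp_entry_model e) { return e.val >> GEN_TYPE_SHIFT; }
static inline uint32_t gen_offset(swp_entry_model e) { return e.val & GEN_OFFSET_MASK; }

/* Hypothetical arch swap pte: 19 offset bits at 12..30, 6 type bits at 2..7.
 * Out-of-range bits are masked off rather than triggering a BUG_ON. */
static inline pte_model arch_mk_swap_pte(uint32_t type, uint32_t offset)
{
	pte_model p = { ((offset & 0x7ffff) << 12) | ((type & 0x3f) << 2) };
	return p;
}
static inline uint32_t arch_type(pte_model p) { return (p.val >> 2) & 0x3f; }
static inline uint32_t arch_offset(pte_model p) { return p.val >> 12; }

/* Generic entry -> swap pte -> generic entry. */
static inline pte_model entry_to_pte(swp_entry_model e)
{
	return arch_mk_swap_pte(gen_type(e), gen_offset(e));
}
static inline swp_entry_model pte_to_entry(pte_model p)
{
	return gen_entry(arch_type(p), arch_offset(p));
}

/* Only bits that survive both layouts remain after the round trip. */
static inline uint32_t max_type(void)
{
	return gen_type(pte_to_entry(entry_to_pte(gen_entry(UINT32_MAX, 0))));
}
static inline uint32_t max_offset(void)
{
	return gen_offset(pte_to_entry(entry_to_pte(gen_entry(0, UINT32_MAX))));
}
```

Here max_type() comes out as 31 (the generic 5-bit limit is narrower) and max_offset() as 0x7ffff (the pte's 19-bit limit is narrower), so each result is bounded by whichever layout is more restrictive.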
By the way, some architectures could increase the maximum size of a
single swap device by using some of the bits currently used for the
type. The architecture-independent swp_entry_t limits the number of
swap files to 32 anyway, so there is no use for more than 5 type bits.
While I was at it, I made this change for s390.
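The type-bit argument is simple arithmetic: type bits beyond the 5 that swp_entry_t can express are dead weight, and each one handed to the offset doubles the maximum device size. A sketch with hypothetical helper names, assuming 4 KB pages:

```c
#include <assert.h>
#include <stdint.h>

#define GENERIC_TYPE_BITS 5 /* swp_entry_t: at most 1 << 5 == 32 swap types */

/* Type bits in a swap pte beyond GENERIC_TYPE_BITS can never be used. */
static inline unsigned int wasted_type_bits(unsigned int pte_type_bits)
{
	return pte_type_bits > GENERIC_TYPE_BITS ?
	       pte_type_bits - GENERIC_TYPE_BITS : 0;
}

/* Offset width after handing the wasted type bits to the offset field. */
static inline unsigned int rebalanced_offset_bits(unsigned int offset_bits,
						  unsigned int pte_type_bits)
{
	return offset_bits + wasted_type_bits(pte_type_bits);
}

/* Maximum swap area in MB for a given offset width, assuming 4 KB pages. */
static inline uint64_t max_swap_mb(unsigned int offset_bits)
{
	assert(offset_bits < 62); /* keep the multiplication in range */
	return (UINT64_C(1) << offset_bits) * 4 / 1024;
}
```

For s390-32 (19 offset bits, 6 type bits) this gives 20 offset bits, raising the limit from 2048 MB to 4096 MB; for i386 (24/5) nothing changes.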
See patch against 2.6.5-rc2-mm2 @EOM.
blue skies,
Martin.
diff -urN linux-2.6/include/asm-s390/pgtable.h linux-2.6-swp/include/asm-s390/pgtable.h
--- linux-2.6/include/asm-s390/pgtable.h Wed Mar 24 18:26:50 2004
+++ linux-2.6-swp/include/asm-s390/pgtable.h Wed Mar 24 18:27:27 2004
@@ -744,11 +744,11 @@
* Bit 30 and 31 are used to distinguish the different page types. For
* a swapped page these bits need to be zero.
* This leaves the bits 1-19 and bits 24-29 to store type and offset.
- * We use the 6 bits from 24-29 for the type and the 19 bits from 1-19
- * for the offset.
- * 0| offset |0110| type |00|
- * 0 0000000001111111111 2222 222222 33
- * 0 1234567890123456789 0123 456789 01
+ * We use the 5 bits from 25-29 for the type and the 20 bits from 1-19
+ * plus 24 for the offset.
+ * 0| offset |0110|o|type |00|
+ * 0 0000000001111111111 2222 2 22222 33
+ * 0 1234567890123456789 0123 4 56789 01
*
* 64 bit swap entry format:
* A page-table entry has some bits we have to treat in a special way.
@@ -761,26 +761,22 @@
* Bit 62 and 63 are used to distinguish the different page types. For
* a swapped page these bits need to be zero.
* This leaves the bits 0-51 and bits 56-61 to store type and offset.
- * We use the 6 bits from 56-61 for the type and the 52 bits from 0-51
- * for the offset.
- * | offset |0110| type |00|
- * 0000000000111111111122222222223333333333444444444455 5555 555566 66
- * 0123456789012345678901234567890123456789012345678901 2345 678901 23
+ * We use the 5 bits from 57-61 for the type and the 53 bits from 0-51
+ * plus 56 for the offset.
+ * | offset |0110|o|type |00|
+ * 0000000000111111111122222222223333333333444444444455 5555 5 55566 66
+ * 0123456789012345678901234567890123456789012345678901 2345 6 78901 23
*/
extern inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
{
pte_t pte;
- pte_val(pte) = (type << 2) | (offset << 12) | _PAGE_INVALID_SWAP;
-#ifndef __s390x__
- BUG_ON((pte_val(pte) & 0x80000901) != 0);
-#else /* __s390x__ */
- BUG_ON((pte_val(pte) & 0x901) != 0);
-#endif /* __s390x__ */
+ pte_val(pte) = _PAGE_INVALID_SWAP | ((type & 0x1f) << 2) |
+ ((offset & 1) << 7) | ((offset & 0xffffe) << 11);
return pte;
}
-#define __swp_type(entry) (((entry).val >> 2) & 0x3f)
-#define __swp_offset(entry) ((entry).val >> 12)
+#define __swp_type(entry) (((entry).val >> 2) & 0x1f)
+#define __swp_offset(entry) (((entry).val >> 11) | (((entry).val >> 7) & 1))
#define __swp_entry(type,offset) ((swp_entry_t) { pte_val(mk_swap_pte((type),(offset))) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
diff -urN linux-2.6/mm/swapfile.c linux-2.6-swp/mm/swapfile.c
--- linux-2.6/mm/swapfile.c Wed Mar 24 18:26:50 2004
+++ linux-2.6-swp/mm/swapfile.c Wed Mar 24 18:29:37 2004
@@ -1302,7 +1302,7 @@
if (!(p->flags & SWP_USED))
break;
error = -EPERM;
- if (type >= MAX_SWAPFILES) {
+ if (type > swp_type(pte_to_swp_entry(swp_entry_to_pte(swp_entry(~0UL,0))))) {
swap_list_unlock();
goto out;
}
@@ -1424,7 +1424,7 @@
}
p->lowest_bit = 1;
- maxpages = swp_offset(swp_entry(0,~0UL)) - 1;
+ maxpages = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0,~0UL)))) - 1;
if (maxpages > swap_header->info.last_page)
maxpages = swap_header->info.last_page;
p->highest_bit = maxpages - 1;
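The new 31-bit s390 layout can be checked in userspace. The sketch below copies the bit manipulation from the patched mk_swap_pte, __swp_type and __swp_offset onto a uint32_t. The value 0x600 for _PAGE_INVALID_SWAP is an assumption, derived from the 0110 pattern in IBM bits 20-23 shown in the comment. The sketch models only the 31-bit layout: as posted, the 0xffffe mask applies to the 64-bit build as well, where it would keep 20 offset bits rather than the 53 that the 64-bit comment describes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: _PAGE_INVALID_SWAP is 0x600, i.e. the "0110" pattern in
 * IBM bits 20-23 (value bits 11..8) from the comment in the patch. */
#define MODEL_PAGE_INVALID_SWAP UINT32_C(0x600)

/* 31-bit s390 swap pte after the patch. IBM bit n is value bit 31 - n:
 *   type (5 bits)  -> IBM 25-29 = value bits 2-6
 *   offset bit 0   -> IBM 24    = value bit 7
 *   offset 1..19   -> IBM 1-19  = value bits 12-30 */
static inline uint32_t s390_mk_swap_pte(uint32_t type, uint32_t offset)
{
	return MODEL_PAGE_INVALID_SWAP | ((type & 0x1f) << 2) |
	       ((offset & 1) << 7) | ((offset & 0xffffe) << 11);
}

static inline uint32_t s390_swp_type(uint32_t val)
{
	return (val >> 2) & 0x1f;
}

static inline uint32_t s390_swp_offset(uint32_t val)
{
	/* bit 0 of (val >> 11) is value bit 11, which is always 0 here */
	return (val >> 11) | ((val >> 7) & 1);
}
```

Feeding ~0 for either argument yields type 31 and offset 0xfffff, and value bits 0, 1 and 31 stay clear, so the masking replaces the old BUG_ON without producing an invalid pte.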
* Re: [PATCH] swp_entry_t vs. swap pte.
2004-03-24 18:09 [PATCH] swp_entry_t vs. swap pte Martin Schwidefsky
@ 2004-03-24 22:04 ` David S. Miller
From: David S. Miller @ 2004-03-24 22:04 UTC
To: Martin Schwidefsky; +Cc: linux-arch
On Wed, 24 Mar 2004 19:09:39 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> I created a patch that should fix the problem. It uses
>
> swp_type(pte_to_swp_entry(swp_entry_to_pte(swp_entry(~0UL,0))))
>
> to find the highest possible swap type number and
>
> swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0,~0UL))))
>
> to find the highest possible swap offset. This should work with the
> existing __swp_entry/__swp_type/__swp_offset definitions for all
> architectures except for s390 because I've added a BUG_ON in
> __swp_entry if the created swap pte is dubious. Oh, well.
This looks fine to me, Martin.
I'm not going to fiddle with the swp bit fields on sparc64, the
offset range is huge enough :-)
* Re: [PATCH] swp_entry_t vs. swap pte.
@ 2004-03-25 13:02 Martin Schwidefsky
From: Martin Schwidefsky @ 2004-03-25 13:02 UTC
To: akpm; +Cc: linux-arch
Hi Andrew,
as requested, here is the swp_entry_t vs. swap pte patch against
BitKeeper, with comments on what the heck we are doing there.
blue skies,
Martin.
ChangeLog:
[PATCH] swp_entry_t vs. swap pte.
From: Martin Schwidefsky <schwidefsky@de.ibm.com>
This fixes a problem in sys_swapon that can cause the creation of
invalid swap ptes. The cause is a mismatch between the arch-independent
swap entries and the swap entries encoded in ptes. The swp_entry_t uses
27 bits for the offset and 5 bits for the type (on 32 bit). In
sys_swapon this definition is used to find out how many swap devices
and how many pages on each device there can be. But the swap entries
encoded in a pte can be subject to additional hardware restrictions
beyond the 27/5 split of the bits in the swp_entry_t type.
This is solved by adding pte_to_swp_entry and swp_entry_to_pte
calls to the calculations for the maximum type and offset.
In addition, the s390 swap pte split for offset/type is changed
from 19/6 bits to 20/5 bits.
diff -urN linux-2.6/include/asm-s390/pgtable.h linux-2.6-swp/include/asm-s390/pgtable.h
--- linux-2.6/include/asm-s390/pgtable.h Thu Mar 11 03:55:42 2004
+++ linux-2.6-swp/include/asm-s390/pgtable.h Thu Mar 25 10:00:12 2004
@@ -719,14 +719,14 @@
* information in the lowcore.
* Bit 21 and bit 22 are the page invalid bit and the page protection
* bit. We set both to indicate a swapped page.
- * Bit 31 is used as the software page present bit. If a page is
- * swapped this obviously has to be zero.
- * This leaves the bits 1-19 and bits 24-30 to store type and offset.
- * We use the 7 bits from 24-30 for the type and the 19 bits from 1-19
- * for the offset.
- * 0| offset |0110|type |0
- * 00000000001111111111222222222233
- * 01234567890123456789012345678901
+ * Bit 30 and 31 are used to distinguish the different page types. For
+ * a swapped page these bits need to be zero.
+ * This leaves the bits 1-19 and bits 24-29 to store type and offset.
+ * We use the 5 bits from 25-29 for the type and the 20 bits from 1-19
+ * plus 24 for the offset.
+ * 0| offset |0110|o|type |00|
+ * 0 0000000001111111111 2222 2 22222 33
+ * 0 1234567890123456789 0123 4 56789 01
*
* 64 bit swap entry format:
* A page-table entry has some bits we have to treat in a special way.
@@ -736,29 +736,25 @@
* information in the lowcore.
* Bit 53 and bit 54 are the page invalid bit and the page protection
* bit. We set both to indicate a swapped page.
- * Bit 63 is used as the software page present bit. If a page is
- * swapped this obviously has to be zero.
- * This leaves the bits 0-51 and bits 56-62 to store type and offset.
- * We use the 7 bits from 56-62 for the type and the 52 bits from 0-51
- * for the offset.
- * | offset |0110|type |0
- * 0000000000111111111122222222223333333333444444444455555555556666
- * 0123456789012345678901234567890123456789012345678901234567890123
+ * Bit 62 and 63 are used to distinguish the different page types. For
+ * a swapped page these bits need to be zero.
+ * This leaves the bits 0-51 and bits 56-61 to store type and offset.
+ * We use the 5 bits from 57-61 for the type and the 53 bits from 0-51
+ * plus 56 for the offset.
+ * | offset |0110|o|type |00|
+ * 0000000000111111111122222222223333333333444444444455 5555 5 55566 66
+ * 0123456789012345678901234567890123456789012345678901 2345 6 78901 23
*/
extern inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
{
pte_t pte;
- pte_val(pte) = (type << 1) | (offset << 12) | _PAGE_INVALID_SWAP;
-#ifndef __s390x__
- BUG_ON((pte_val(pte) & 0x80000901) != 0);
-#else /* __s390x__ */
- BUG_ON((pte_val(pte) & 0x901) != 0);
-#endif /* __s390x__ */
+ pte_val(pte) = _PAGE_INVALID_SWAP | ((type & 0x1f) << 2) |
+ ((offset & 1) << 7) | ((offset & 0xffffe) << 11);
return pte;
}
-#define __swp_type(entry) (((entry).val >> 1) & 0x3f)
-#define __swp_offset(entry) ((entry).val >> 12)
+#define __swp_type(entry) (((entry).val >> 2) & 0x1f)
+#define __swp_offset(entry) (((entry).val >> 11) | (((entry).val >> 7) & 1))
#define __swp_entry(type,offset) ((swp_entry_t) { pte_val(mk_swap_pte((type),(offset))) })
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
diff -urN linux-2.6/mm/swapfile.c linux-2.6-swp/mm/swapfile.c
--- linux-2.6/mm/swapfile.c Thu Mar 25 09:58:15 2004
+++ linux-2.6-swp/mm/swapfile.c Thu Mar 25 13:00:47 2004
@@ -1242,7 +1242,19 @@
if (!(p->flags & SWP_USED))
break;
error = -EPERM;
- if (type >= MAX_SWAPFILES) {
+ /*
+ * Test if adding another swap device is possible. There are
+ * two limiting factors: 1) the number of bits for the swap
+ * type in the swp_entry_t definition and 2) the number of bits for
+ * the swap type in the swap ptes as defined by the different
+ * architectures. To honor both limitations a swap entry
+ * with swap offset 0 and swap type ~0UL is created, encoded
+ * to a swap pte, decoded to a swp_entry_t again and finally
+ * the swap type part is extracted. This will mask all bits
+ * from the initial ~0UL that can't be encoded in either the
+ * swp_entry_t or the architecture definition of a swap pte.
+ */
+ if (type > swp_type(pte_to_swp_entry(swp_entry_to_pte(swp_entry(~0UL,0))))) {
swap_list_unlock();
goto out;
}
@@ -1364,7 +1376,21 @@
}
p->lowest_bit = 1;
- maxpages = swp_offset(swp_entry(0,~0UL)) - 1;
+ /*
+ * Find out how many pages are allowed for a single swap
+ * device. There are two limiting factors: 1) the number of
+ * bits for the swap offset in the swp_entry_t type and
+ * 2) the number of bits in a swap pte as defined by
+ * the different architectures. In order to find the
+ * largest possible bit mask a swap entry with swap type 0
+ * and swap offset ~0UL is created, encoded to a swap pte,
+ * decoded to a swp_entry_t again and finally the swap
+ * offset is extracted. This will mask all the bits from
+ * the initial ~0UL mask that can't be encoded in either
+ * the swp_entry_t or the architecture definition of a
+ * swap pte.
+ */
+ maxpages = swp_offset(pte_to_swp_entry(swp_entry_to_pte(swp_entry(0,~0UL)))) - 1;
if (maxpages > swap_header->info.last_page)
maxpages = swap_header->info.last_page;
p->highest_bit = maxpages - 1;
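Putting it together, the sizing step in sys_swapon can be modeled as below. This is a sketch only: struct swap_bounds and size_swap_area are hypothetical names, `limit` stands for the value of the round-tripped swp_offset(...) expression, and `last_page` for swap_header->info.last_page.

```c
#include <assert.h>

/* Hypothetical model of the sizing step in sys_swapon. */
struct swap_bounds {
	unsigned long lowest_bit;
	unsigned long highest_bit;
	unsigned long maxpages;
};

static inline struct swap_bounds size_swap_area(unsigned long limit,
						 unsigned long last_page)
{
	struct swap_bounds b;

	assert(limit >= 2 && last_page >= 1); /* avoid unsigned underflow */
	b.lowest_bit = 1;
	b.maxpages = limit - 1;            /* encodable offsets, minus one */
	if (b.maxpages > last_page)        /* device smaller than the limit */
		b.maxpages = last_page;
	b.highest_bit = b.maxpages - 1;
	return b;
}
```

With a 2.5 GB device (last_page 655359, assuming 4 KB pages) and the old generic limit 0x7ffffff, maxpages stays at 655359, past what a 19-bit swap pte can address; with a 19-bit round-trip limit of 0x7ffff it is capped at 0x7fffe; with the patched s390-32 limit of 0xfffff it is 655359 again, which now fits in the 20 offset bits.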