* clear_bit_unlock_is_negative_byte
@ 2023-07-20 19:27 Matthew Wilcox
2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz
2023-07-21 6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab
0 siblings, 2 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-20 19:27 UTC (permalink / raw)
To: linux-m68k
I'm looking to implement clear_bit_unlock_is_negative_byte() on every
architecture so we can delete the ifdeffery around maybe-we-have-it
and remove the simple implementation from filemap.c. Here's what I've
come up with for m68k:
+static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
+ volatile unsigned long *p)
+{
+ char result;
+ char mask = 1 << nr; /* nr guaranteed to be < 7 */
+
+ __asm__ __volatile__ ("eori %1, %2; smi %0"
+ : "=d" (result)
+ : "i" (mask), "o" (*p)
+ : "memory");
+ return result;
+}
It compiles, so I feel Very Pleased With Myself, since I haven't written
m68k assmbly in 25 years. But I have questions.
First, m68k is big-endian, so I suspect I'm accessing the wrong byte.
Should something in there be adding 3 to 'p'? Better to do it in the
asm, or in the constraints so the compiler can see it?
Second, have I properly communicated to the assembler that this is
a byte-size operation, and it needs to check bit 7 and not bits 15 or 31
to set the negative flag?
Third, can this be done better? x86 has __GCC_ASM_FLAG_OUTPUTS__
so it doesn't need the equivalent of the SMI instruction to move the
condition to an output variable; it can just tell the compiler that
the N flag communicates the result that it's looking for. Does m68k
have __GCC_ASM_FLAG_OUTPUTS__ or did nobody do that work yet?
Fourth, we could do this is with ANDI instead of EORI. It's mildly
safer, but we really shouldn't have two threads clearing the lock bit
that race with each other. We can't do it with BCLR because that
doesn't set the N flag. If we do that, we'd need to invert the mask.
Appreciate your time looking at this.
^ permalink raw reply [flat|nested] 20+ messages in thread* Re: clear_bit_unlock_is_negative_byte 2023-07-20 19:27 clear_bit_unlock_is_negative_byte Matthew Wilcox @ 2023-07-20 22:37 ` Michael Schmitz 2023-07-21 1:12 ` clear_bit_unlock_is_negative_byte Michael Schmitz 2023-07-21 6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab 1 sibling, 1 reply; 20+ messages in thread From: Michael Schmitz @ 2023-07-20 22:37 UTC (permalink / raw) To: Matthew Wilcox, linux-m68k Hi Matthew, Am 21.07.2023 um 07:27 schrieb Matthew Wilcox: > I'm looking to implement clear_bit_unlock_is_negative_byte() on every > architecture so we can delete the ifdeffery around maybe-we-have-it > and remove the simple implementation from filemap.c. Here's what I've > come up with for m68k: > > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > + volatile unsigned long *p) > +{ > + char result; > + char mask = 1 << nr; /* nr guaranteed to be < 7 */ > + > + __asm__ __volatile__ ("eori %1, %2; smi %0" > + : "=d" (result) > + : "i" (mask), "o" (*p) > + : "memory"); > + return result; > +} > > It compiles, so I feel Very Pleased With Myself, since I haven't written > m68k assmbly in 25 years. But I have questions. Unfortunately, after plugging this in, the boot hangs at the first block device encountered: calling proc_hardware_init+0x0/0x20 @ 1 initcall proc_hardware_init+0x0/0x20 returned 0 after 40 usecs calling rtc_init+0x0/0x5e @ 1 initcall rtc_init+0x0/0x5e returned 0 after 40 usecs calling atari_nvram_init+0x0/0x42 @ 1 initcall atari_nvram_init+0x0/0x42 returned 0 after 40 usecs calling nfhd_init+0x0/0x206 @ 1 nfhd8: found device with 20971440 blocks (512 bytes) Doesn't eat up memory, doesn't sit in a tight spin, just never proceeds past that point. Running old and new code side by side (well, actually sequentially) in clear_bit_unlock_is_negative_byte() and comparing results, the return values generated by both options match but the actual contents of the pointer passed to the function (after the mods) does differ: nfhd8: found device with 20971440 blocks (512 bytes) cbu_inb memval mismatch: 2004 12005 cbu_inb memval mismatch: 2004 12005 nfhd8: AHDI p1 p2 cbu_inb memval mismatch: 26 10027 cbu_inb memval mismatch: 26 10027 (first value is from the old code, second value from the new code). It looks to me that the eori operates on a 16 bit word here? (Disclaimer: I have written m68k assembly occasionally in the past years, but I struggle with inline asm...) Cheers, Michael > > First, m68k is big-endian, so I suspect I'm accessing the wrong byte. > Should something in there be adding 3 to 'p'? Better to do it in the > asm, or in the constraints so the compiler can see it? > > Second, have I properly communicated to the assembler that this is > a byte-size operation, and it needs to check bit 7 and not bits 15 or 31 > to set the negative flag? > > Third, can this be done better? x86 has __GCC_ASM_FLAG_OUTPUTS__ > so it doesn't need the equivalent of the SMI instruction to move the > condition to an output variable; it can just tell the compiler that > the N flag communicates the result that it's looking for. Does m68k > have __GCC_ASM_FLAG_OUTPUTS__ or did nobody do that work yet? > > Fourth, we could do this is with ANDI instead of EORI. It's mildly > safer, but we really shouldn't have two threads clearing the lock bit > that race with each other. We can't do it with BCLR because that > doesn't set the N flag. If we do that, we'd need to invert the mask. > > Appreciate your time looking at this. > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz @ 2023-07-21 1:12 ` Michael Schmitz 2023-07-21 1:32 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 0 siblings, 1 reply; 20+ messages in thread From: Michael Schmitz @ 2023-07-21 1:12 UTC (permalink / raw) To: Matthew Wilcox, linux-m68k Hi Matthew, Am 21.07.2023 um 10:37 schrieb Michael Schmitz: > Hi Matthew, > > Am 21.07.2023 um 07:27 schrieb Matthew Wilcox: >> I'm looking to implement clear_bit_unlock_is_negative_byte() on every >> architecture so we can delete the ifdeffery around maybe-we-have-it >> and remove the simple implementation from filemap.c. Here's what I've >> come up with for m68k: >> >> +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, >> + volatile unsigned long *p) >> +{ >> + char result; >> + char mask = 1 << nr; /* nr guaranteed to be < 7 */ >> + >> + __asm__ __volatile__ ("eori %1, %2; smi %0" >> + : "=d" (result) >> + : "i" (mask), "o" (*p) >> + : "memory"); >> + return result; >> +} >> >> It compiles, so I feel Very Pleased With Myself, since I haven't written >> m68k assmbly in 25 years. But I have questions. > > Unfortunately, after plugging this in, the boot hangs at the first block > device encountered: > > calling proc_hardware_init+0x0/0x20 @ 1 > initcall proc_hardware_init+0x0/0x20 returned 0 after 40 usecs > calling rtc_init+0x0/0x5e @ 1 > initcall rtc_init+0x0/0x5e returned 0 after 40 usecs > calling atari_nvram_init+0x0/0x42 @ 1 > initcall atari_nvram_init+0x0/0x42 returned 0 after 40 usecs > calling nfhd_init+0x0/0x206 @ 1 > nfhd8: found device with 20971440 blocks (512 bytes) > > Doesn't eat up memory, doesn't sit in a tight spin, just never proceeds > past that point. > > Running old and new code side by side (well, actually sequentially) in > clear_bit_unlock_is_negative_byte() and comparing results, the return > values generated by both options match but the actual contents of the > pointer passed to the function (after the mods) does differ: > > nfhd8: found device with 20971440 blocks (512 bytes) > cbu_inb memval mismatch: 2004 12005 > cbu_inb memval mismatch: 2004 12005 > nfhd8: AHDI p1 p2 > cbu_inb memval mismatch: 26 10027 > cbu_inb memval mismatch: 26 10027 > > (first value is from the old code, second value from the new code). Logging the bit nr. and value passed in: nfhd8: found device with 20971440 blocks (512 bytes) cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 nfhd8: AHDI p1 p2 cbu_inb memval mismatch: 36 10037 37 0 cbu_inb memval mismatch: 36 10037 37 0 Plenty more later in the boot: calling brd_init+0x0/0xca @ 1 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 36 10037 37 0 cbu_inb memval mismatch: 36 10037 37 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 36 10037 37 0 cbu_inb memval mismatch: 36 10037 37 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 36 10037 37 0 cbu_inb memval mismatch: 36 10037 37 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 26 10027 27 0 cbu_inb memval mismatch: 2004 12005 2005 0 cbu_inb memval mismatch: 2004 12005 2005 0 > > It looks to me that the eori operates on a 16 bit word here? And it's bits 32-17 of the longword. The instruction you need is eori.b, and you'll have to increase the mem pointer by 3 bytes. With that change, I see no further mismatches until the return values begin to differ once disk access begins: sd 0:0:1:0: [sdb] Preferred minimum I/O size 512 bytes cbu_inb retval mismatch: 1 ff 2084 2084 2085 0 rtc-generic rtc-generic: registered as rtc0 cbu_inb retval mismatch: 1 ff 2094 2094 2095 0 ... sd 0:0:0:0: [sda] Attached SCSI disk probe of 0:0:0:0 returned 0 after 58395182 usecs cbu_inb retval mismatch: 1 ff 2094 2094 2095 0 sdb: RDSK (512) sdb1 (DOS^G)(res 2 spb 2) sdb2 (SFS^B)(res 2 spb 1) sdb3 (SFS^B)(res 2 spb 2) sdb4 ((res 2 spb 1) sdb: p4 size 18446744071971831216 extends beyond EOD, enabling native capacity cbu_inb retval mismatch: 1 ff 2084 2084 2085 0 (return value from old and new code, value of mem from old and new code, original value, bit nr). Bit 7 was already set before xor, and wasn't cleared. I suspect that's why the return value is no longer 1? Time to dig out that 68k programmer's manual... Cheers, Michael > > (Disclaimer: I have written m68k assembly occasionally in the past > years, but I struggle with inline asm...) > > Cheers, > > Michael > >> >> First, m68k is big-endian, so I suspect I'm accessing the wrong byte. >> Should something in there be adding 3 to 'p'? Better to do it in the >> asm, or in the constraints so the compiler can see it? >> >> Second, have I properly communicated to the assembler that this is >> a byte-size operation, and it needs to check bit 7 and not bits 15 or 31 >> to set the negative flag? >> >> Third, can this be done better? x86 has __GCC_ASM_FLAG_OUTPUTS__ >> so it doesn't need the equivalent of the SMI instruction to move the >> condition to an output variable; it can just tell the compiler that >> the N flag communicates the result that it's looking for. Does m68k >> have __GCC_ASM_FLAG_OUTPUTS__ or did nobody do that work yet? >> >> Fourth, we could do this is with ANDI instead of EORI. It's mildly >> safer, but we really shouldn't have two threads clearing the lock bit >> that race with each other. We can't do it with BCLR because that >> doesn't set the N flag. If we do that, we'd need to invert the mask. >> >> Appreciate your time looking at this. >> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 1:12 ` clear_bit_unlock_is_negative_byte Michael Schmitz @ 2023-07-21 1:32 ` Matthew Wilcox 2023-07-21 1:43 ` clear_bit_unlock_is_negative_byte Michael Schmitz 0 siblings, 1 reply; 20+ messages in thread From: Matthew Wilcox @ 2023-07-21 1:32 UTC (permalink / raw) To: Michael Schmitz; +Cc: linux-m68k On Fri, Jul 21, 2023 at 01:12:48PM +1200, Michael Schmitz wrote: > Logging the bit nr. and value passed in: > > nfhd8: found device with 20971440 blocks (512 bytes) > cbu_inb memval mismatch: 2004 12005 2005 0 > cbu_inb memval mismatch: 2004 12005 2005 0 > nfhd8: AHDI p1 p2 > cbu_inb memval mismatch: 36 10037 37 0 > cbu_inb memval mismatch: 36 10037 37 0 I'm not quite sure what values you're printing here? And whether they're hex or decimal. Could you show me the printk string? > The instruction you need is eori.b, and you'll have to increase the mem > pointer by 3 bytes. With that change, I see no further mismatches until the > return values begin to differ once disk access begins: Ah thanks. The perils of working from Motorola official docs and then trying to use the GNU assembler ... > sd 0:0:1:0: [sdb] Preferred minimum I/O size 512 bytes > cbu_inb retval mismatch: 1 ff 2084 2084 2085 0 > rtc-generic rtc-generic: registered as rtc0 > cbu_inb retval mismatch: 1 ff 2094 2094 2095 0 > ... > sd 0:0:0:0: [sda] Attached SCSI disk > probe of 0:0:0:0 returned 0 after 58395182 usecs > cbu_inb retval mismatch: 1 ff 2094 2094 2095 0 > sdb: RDSK (512) sdb1 (DOS^G)(res 2 spb 2) sdb2 (SFS^B)(res 2 spb 1) sdb3 > (SFS^B)(res 2 spb 2) sdb4 ((res 2 spb 1) > sdb: p4 size 18446744071971831216 extends beyond EOD, enabling native > capacity > cbu_inb retval mismatch: 1 ff 2084 2084 2085 0 > > (return value from old and new code, value of mem from old and new code, > original value, bit nr). OK, so the new code would set the byte to 0xff (that's how Sxx works). But it's returning a bool, so that shouldn't matter. 2084/2084/2085 would make sense; one of the two functions has cleared the bottom bit. > Bit 7 was already set before xor, and wasn't cleared. I suspect that's why > the return value is no longer 1? Ah, it's not supposed to be cleared. The way this works is that bit 0 is the lock bit; if someone's waiting on the folio, they set bit 7. If bit 7 is set when we clear bit 0, we look on the wait queue. If there's nobody on the wait queue, we clear bit 7. So ... I think you've fixed it! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 1:32 ` clear_bit_unlock_is_negative_byte Matthew Wilcox @ 2023-07-21 1:43 ` Michael Schmitz 2023-07-21 17:03 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-22 6:24 ` clear_bit_unlock_is_negative_byte Andreas Schwab 0 siblings, 2 replies; 20+ messages in thread From: Michael Schmitz @ 2023-07-21 1:43 UTC (permalink / raw) To: Matthew Wilcox; +Cc: linux-m68k Hi Matthew, Am 21.07.2023 um 13:32 schrieb Matthew Wilcox: > On Fri, Jul 21, 2023 at 01:12:48PM +1200, Michael Schmitz wrote: >> Logging the bit nr. and value passed in: >> >> nfhd8: found device with 20971440 blocks (512 bytes) >> cbu_inb memval mismatch: 2004 12005 2005 0 >> cbu_inb memval mismatch: 2004 12005 2005 0 >> nfhd8: AHDI p1 p2 >> cbu_inb memval mismatch: 36 10037 37 0 >> cbu_inb memval mismatch: 36 10037 37 0 > > I'm not quite sure what values you're printing here? And whether > they're hex or decimal. Could you show me the printk string? All hex, except the last (nr). Values are mem (as modified by the original code), copy of mem (as modified by inline asm), the original value of mem, and nr. > >> The instruction you need is eori.b, and you'll have to increase the mem >> pointer by 3 bytes. With that change, I see no further mismatches until the >> return values begin to differ once disk access begins: > > Ah thanks. The perils of working from Motorola official docs and then > trying to use the GNU assembler ... > >> sd 0:0:1:0: [sdb] Preferred minimum I/O size 512 bytes >> cbu_inb retval mismatch: 1 ff 2084 2084 2085 0 >> rtc-generic rtc-generic: registered as rtc0 >> cbu_inb retval mismatch: 1 ff 2094 2094 2095 0 >> ... >> sd 0:0:0:0: [sda] Attached SCSI disk >> probe of 0:0:0:0 returned 0 after 58395182 usecs >> cbu_inb retval mismatch: 1 ff 2094 2094 2095 0 >> sdb: RDSK (512) sdb1 (DOS^G)(res 2 spb 2) sdb2 (SFS^B)(res 2 spb 1) sdb3 >> (SFS^B)(res 2 spb 2) sdb4 ((res 2 spb 1) >> sdb: p4 size 18446744071971831216 extends beyond EOD, enabling native >> capacity >> cbu_inb retval mismatch: 1 ff 2084 2084 2085 0 >> >> (return value from old and new code, value of mem from old and new code, >> original value, bit nr). > > OK, so the new code would set the byte to 0xff (that's how Sxx works). > But it's returning a bool, so that shouldn't matter. 2084/2084/2085 > would make sense; one of the two functions has cleared the bottom bit. > >> Bit 7 was already set before xor, and wasn't cleared. I suspect that's why >> the return value is no longer 1? > > Ah, it's not supposed to be cleared. The way this works is that bit 0 > is the lock bit; if someone's waiting on the folio, they set bit 7. If > bit 7 is set when we clear bit 0, we look on the wait queue. If there's > nobody on the wait queue, we clear bit 7. Right, that's what I meant to say. I'd only seen cases where bit 0 had been set and was cleared. This isn't an actual production system of sorts, just an ARAnyM instance I can fire up quickly to see patched kernels crash horribly. > > So ... I think you've fixed it! This is what I have tests running on right now: static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, volatile unsigned long *p) { unsigned char *cp = (unsigned char *) p; char result; char mask = 1 << nr; /* nr guaranteed to be < 7 */ __asm__ __volatile__ ("eori.b %1, %2; smi %0" : "=d" (result) : "i" (mask), "o" (*(cp+3)) : "memory"); return result; } I'm sure you can do all the casting to char and increment by 3 in the asm argument... If there's a simple way to exercise this code path using standard Unix tools (or stress-ng which I ought to have somewhere), drop me a hint. Cheers, Michael ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 1:43 ` clear_bit_unlock_is_negative_byte Michael Schmitz @ 2023-07-21 17:03 ` Matthew Wilcox 2023-07-21 22:07 ` clear_bit_unlock_is_negative_byte Michael Schmitz 2023-07-22 6:24 ` clear_bit_unlock_is_negative_byte Andreas Schwab 1 sibling, 1 reply; 20+ messages in thread From: Matthew Wilcox @ 2023-07-21 17:03 UTC (permalink / raw) To: Michael Schmitz; +Cc: linux-m68k On Fri, Jul 21, 2023 at 01:43:06PM +1200, Michael Schmitz wrote: > > Ah, it's not supposed to be cleared. The way this works is that bit 0 > > is the lock bit; if someone's waiting on the folio, they set bit 7. If > > bit 7 is set when we clear bit 0, we look on the wait queue. If there's > > nobody on the wait queue, we clear bit 7. > > Right, that's what I meant to say. I'd only seen cases where bit 0 had been > set and was cleared. This isn't an actual production system of sorts, just > an ARAnyM instance I can fire up quickly to see patched kernels crash > horribly. Well, I appreciate the testing! > This is what I have tests running on right now: > > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > volatile unsigned long *p) > { > unsigned char *cp = (unsigned char *) p; > char result; > char mask = 1 << nr; /* nr guaranteed to be < 7 */ > > __asm__ __volatile__ ("eori.b %1, %2; smi %0" > : "=d" (result) > : "i" (mask), "o" (*(cp+3)) > : "memory"); > return result; > } I thought it a little odd to use an unsigned char when we're testing to see if it's negative, so I went with this: static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, volatile unsigned long *p) { char result; char mask = 1 << nr; /* nr guaranteed to be < 7 */ char *cp = (char *)p + 3; /* m68k is big-endian */ __asm__ __volatile__ ("eori.b %1, %2; smi %0" : "=d" (result) : "i" (mask), "o" (*cp) : "memory"); return result; } > I'm sure you can do all the casting to char and increment by 3 in the asm > argument... I'd rather not. I looked at doing the offset by three inside the asm, but it seems like gcc is smart enough to do that without help: 000006e0 <folio_unlock>: 6e0: 206f 0004 moveal %sp@(4),%a0 6e4: 0a28 0001 0003 eorib #1,%a0@(3) 6ea: 5bc0 smi %d0 6ec: 4a00 tstb %d0 6ee: 670a beqs 6fa <folio_unlock+0x1a> 6f0: 42a7 clrl %sp@- 6f2: 2f08 movel %a0,%sp@- 6f4: 4eba fcec jsr %pc@(3e2 <folio_wake_bit>) 6f8: 508f addql #8,%sp 6fa: 4e75 rts You'll note the smi/tstb pair are unnecessary. It could simply BPL to the RTS instruction, but we can't tell GCC that because we don't have the __GCC_ASM_FLAG_OUTPUTS__ feature. By the way, before this optimisation, it was this: 000006fc <folio_unlock>: 6fc: 206f 0004 moveal %sp@(4),%a0 700: 08a8 0000 0003 bclr #0,%a0@(3) 706: 2010 movel %a0@,%d0 708: 4a00 tstb %d0 70a: 6c0a bges 716 <folio_unlock+0x1a> 70c: 42a7 clrl %sp@- 70e: 2f08 movel %a0,%sp@- 710: 4eba fcd0 jsr %pc@(3e2 <folio_wake_bit>) 714: 508f addql #8,%sp 716: 4e75 rts which is the same number of instructions, but one more memory reference. It's a read-after-write hazard, but I don't know if that affects any m68k implementation; my impression is that even on an '060 there aren't any real performance implications. Kudos to gcc for figuring out that testing bit 7 can be done with the tstb instruction. > If there's a simple way to exercise this code path using standard Unix tools > (or stress-ng which I ought to have somewhere), drop me a hint. Oh, it's so common to have a waiter on a folio unlock that just making it to the login prompt is enough to declare comfidently that this works. CPU implementations with memory barriers and such fanciness are a little harder to be confident in, but this looks good to me. I generally run xfstests, but that's just because I have it all set up and ready to go. I'll drop your Tested-by on this if that's OK? If you want a Co-developed-by credit, that's fine with me too! ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 17:03 ` clear_bit_unlock_is_negative_byte Matthew Wilcox @ 2023-07-21 22:07 ` Michael Schmitz 0 siblings, 0 replies; 20+ messages in thread From: Michael Schmitz @ 2023-07-21 22:07 UTC (permalink / raw) To: Matthew Wilcox; +Cc: linux-m68k Hi Matthew, Am 22.07.2023 um 05:03 schrieb Matthew Wilcox: > I thought it a little odd to use an unsigned char when we're testing > to see if it's negative, so I went with this: > > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > volatile unsigned long *p) > { > char result; > char mask = 1 << nr; /* nr guaranteed to be < 7 */ > char *cp = (char *)p + 3; /* m68k is big-endian */ > > __asm__ __volatile__ ("eori.b %1, %2; smi %0" > : "=d" (result) > : "i" (mask), "o" (*cp) > : "memory"); > return result; > } I don't think signedness make a difference here, as we don't leave anything about the test to the compiler. >> I'm sure you can do all the casting to char and increment by 3 in the asm >> argument... > > I'd rather not. I looked at doing the offset by three inside the asm, > but it seems like gcc is smart enough to do that without help: > > 000006e0 <folio_unlock>: > 6e0: 206f 0004 moveal %sp@(4),%a0 > 6e4: 0a28 0001 0003 eorib #1,%a0@(3) > 6ea: 5bc0 smi %d0 > 6ec: 4a00 tstb %d0 > 6ee: 670a beqs 6fa <folio_unlock+0x1a> > 6f0: 42a7 clrl %sp@- > 6f2: 2f08 movel %a0,%sp@- > 6f4: 4eba fcec jsr %pc@(3e2 <folio_wake_bit>) > 6f8: 508f addql #8,%sp > 6fa: 4e75 rts > > You'll note the smi/tstb pair are unnecessary. It could simply BPL to > the RTS instruction, but we can't tell GCC that because we don't have > the __GCC_ASM_FLAG_OUTPUTS__ feature. I see. __GCC_ASM_FLAG_OUTPUTS__ appears to be rare enough to assume m68k won't get to use it this decade. > > By the way, before this optimisation, it was this: > > 000006fc <folio_unlock>: > 6fc: 206f 0004 moveal %sp@(4),%a0 > 700: 08a8 0000 0003 bclr #0,%a0@(3) > 706: 2010 movel %a0@,%d0 > 708: 4a00 tstb %d0 > 70a: 6c0a bges 716 <folio_unlock+0x1a> > 70c: 42a7 clrl %sp@- > 70e: 2f08 movel %a0,%sp@- > 710: 4eba fcd0 jsr %pc@(3e2 <folio_wake_bit>) > 714: 508f addql #8,%sp > 716: 4e75 rts > > which is the same number of instructions, but one more memory reference. Seeing as a0 was just loaded from memory above, and should be in the cache, I doubt there's much performance difference from the additional memory access here. Haven't looked at how many clock cycles each of these instructions take. > It's a read-after-write hazard, but I don't know if that affects any > m68k implementation; my impression is that even on an '060 there aren't Unless something modifies the memory location pointed to by a0 (during preemption of any kind), I can't see a race here (a0 is saved/restored during interrupts, syscalls and such). We haven't seen any races here in the past with this code AFAICS. > any real performance implications. Kudos to gcc for figuring out that > testing bit 7 can be done with the tstb instruction. > >> If there's a simple way to exercise this code path using standard Unix tools >> (or stress-ng which I ought to have somewhere), drop me a hint. > > Oh, it's so common to have a waiter on a folio unlock that just making > it to the login prompt is enough to declare comfidently that this works. > CPU implementations with memory barriers and such fanciness are a little > harder to be confident in, but this looks good to me. I generally run > xfstests, but that's just because I have it all set up and ready to go. Should be all good then. > I'll drop your Tested-by on this if that's OK? If you want a > Co-developed-by credit, that's fine with me too! Tested-by is fine, thanks. I wouldn't want anyone to assume I'm familiar with folios and the like... Cheers, Michael ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 1:43 ` clear_bit_unlock_is_negative_byte Michael Schmitz 2023-07-21 17:03 ` clear_bit_unlock_is_negative_byte Matthew Wilcox @ 2023-07-22 6:24 ` Andreas Schwab 2023-07-22 14:45 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 1 sibling, 1 reply; 20+ messages in thread From: Andreas Schwab @ 2023-07-22 6:24 UTC (permalink / raw) To: Michael Schmitz; +Cc: Matthew Wilcox, linux-m68k On Jul 21 2023, Michael Schmitz wrote: > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > volatile unsigned long *p) > { > unsigned char *cp = (unsigned char *) p; > char result; > char mask = 1 << nr; /* nr guaranteed to be < 7 */ > > __asm__ __volatile__ ("eori.b %1, %2; smi %0" > : "=d" (result) > : "i" (mask), "o" (*(cp+3)) That should use "id" as constraint, so that the compiler can share the constant with other insns. Also, the third operand is modified, so it needs to be marked as in/out. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-22 6:24 ` clear_bit_unlock_is_negative_byte Andreas Schwab @ 2023-07-22 14:45 ` Matthew Wilcox 2023-07-22 15:26 ` clear_bit_unlock_is_negative_byte Andreas Schwab 0 siblings, 1 reply; 20+ messages in thread From: Matthew Wilcox @ 2023-07-22 14:45 UTC (permalink / raw) To: Andreas Schwab; +Cc: Michael Schmitz, linux-m68k On Sat, Jul 22, 2023 at 08:24:34AM +0200, Andreas Schwab wrote: > On Jul 21 2023, Michael Schmitz wrote: > > > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > > volatile unsigned long *p) > > { > > unsigned char *cp = (unsigned char *) p; > > char result; > > char mask = 1 << nr; /* nr guaranteed to be < 7 */ > > > > __asm__ __volatile__ ("eori.b %1, %2; smi %0" > > : "=d" (result) > > : "i" (mask), "o" (*(cp+3)) > > That should use "id" as constraint, so that the compiler can share the > constant with other insns. Also, the third operand is modified, so it > needs to be marked as in/out. The "o" constraint is shared with bfset_mem_set_bit, bfclr_mem_test_and_clear_bit and a few other functions. Should they all be changed? Some use "+m". ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-22 14:45 ` clear_bit_unlock_is_negative_byte Matthew Wilcox @ 2023-07-22 15:26 ` Andreas Schwab 2023-07-22 15:38 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 0 siblings, 1 reply; 20+ messages in thread From: Andreas Schwab @ 2023-07-22 15:26 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Michael Schmitz, linux-m68k On Jul 22 2023, Matthew Wilcox wrote: > On Sat, Jul 22, 2023 at 08:24:34AM +0200, Andreas Schwab wrote: >> On Jul 21 2023, Michael Schmitz wrote: >> >> > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, >> > volatile unsigned long *p) >> > { >> > unsigned char *cp = (unsigned char *) p; >> > char result; >> > char mask = 1 << nr; /* nr guaranteed to be < 7 */ >> > >> > __asm__ __volatile__ ("eori.b %1, %2; smi %0" >> > : "=d" (result) >> > : "i" (mask), "o" (*(cp+3)) >> >> That should use "id" as constraint, so that the compiler can share the >> constant with other insns. Also, the third operand is modified, so it >> needs to be marked as in/out. > > The "o" constraint is shared with bfset_mem_set_bit, > bfclr_mem_test_and_clear_bit and a few other functions. Should they > all be changed? They use a memory clobber, because the "o" constraint cannot accurately describe the situation there. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-22 15:26 ` clear_bit_unlock_is_negative_byte Andreas Schwab @ 2023-07-22 15:38 ` Matthew Wilcox 0 siblings, 0 replies; 20+ messages in thread From: Matthew Wilcox @ 2023-07-22 15:38 UTC (permalink / raw) To: Andreas Schwab; +Cc: Michael Schmitz, linux-m68k On Sat, Jul 22, 2023 at 05:26:07PM +0200, Andreas Schwab wrote: > On Jul 22 2023, Matthew Wilcox wrote: > > > On Sat, Jul 22, 2023 at 08:24:34AM +0200, Andreas Schwab wrote: > >> On Jul 21 2023, Michael Schmitz wrote: > >> > >> > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > >> > volatile unsigned long *p) > >> > { > >> > unsigned char *cp = (unsigned char *) p; > >> > char result; > >> > char mask = 1 << nr; /* nr guaranteed to be < 7 */ > >> > > >> > __asm__ __volatile__ ("eori.b %1, %2; smi %0" > >> > : "=d" (result) > >> > : "i" (mask), "o" (*(cp+3)) > >> > >> That should use "id" as constraint, so that the compiler can share the > >> constant with other insns. Also, the third operand is modified, so it > >> needs to be marked as in/out. > > > > The "o" constraint is shared with bfset_mem_set_bit, > > bfclr_mem_test_and_clear_bit and a few other functions. Should they > > all be changed? > > They use a memory clobber, because the "o" constraint cannot accurately > describe the situation there. My understanding is that clear_bit_unlock_is_negative_byte() also needs the "memory" clobber because it has "release" semantics (ie all previous stores must be visible before this store is visible). static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, volatile unsigned long *p) { char result; char mask = 1 << nr; /* nr guaranteed to be < 7 */ char *cp = (char *)p + 3; /* m68k is big-endian */ __asm__ __volatile__ ("eori.b %1, %2; smi %0" : "=d" (result) : "di" (mask), "o" (*cp) : "memory"); return result; } is what I currently have. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-20 19:27 clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz @ 2023-07-21 6:34 ` Andreas Schwab 2023-07-21 8:57 ` clear_bit_unlock_is_negative_byte Brad Boyer 2023-07-21 11:59 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 1 sibling, 2 replies; 20+ messages in thread From: Andreas Schwab @ 2023-07-21 6:34 UTC (permalink / raw) To: Matthew Wilcox; +Cc: linux-m68k On Jul 20 2023, Matthew Wilcox wrote: > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > + volatile unsigned long *p) > +{ > + char result; > + char mask = 1 << nr; /* nr guaranteed to be < 7 */ > + > + __asm__ __volatile__ ("eori %1, %2; smi %0" Why are you using XOR if you want to clear a bit? If it operates on a byte, why does it receive a pointer to long? -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab @ 2023-07-21 8:57 ` Brad Boyer 2023-07-21 9:18 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-21 11:59 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 1 sibling, 1 reply; 20+ messages in thread From: Brad Boyer @ 2023-07-21 8:57 UTC (permalink / raw) To: Andreas Schwab; +Cc: Matthew Wilcox, linux-m68k On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote: > On Jul 20 2023, Matthew Wilcox wrote: > > > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > > + volatile unsigned long *p) > > +{ > > + char result; > > + char mask = 1 << nr; /* nr guaranteed to be < 7 */ > > + > > + __asm__ __volatile__ ("eori %1, %2; smi %0" > > Why are you using XOR if you want to clear a bit? If it operates on a > byte, why does it receive a pointer to long? I presume the thing with byte is to try to save an extension word. Using a long immediate argument makes the instruction take 6 bytes instead of 4. Since we (apparently?) know we don't want to modify the other three bytes, we can safely take shortcuts like that. However, I would agree that using XOR here seems wrong. It fundamentally presumes the bit was previously set. The other architectures I checked with asm versions of this look like they do the equivalent of andi with the inverted mask. We are also presuming that nr is always a compile time constant, but it looks like that should be true? If it can be immediate, the inversion would be done by the compiler. The powerpc version doesn't appear to make either assumption. It looks like it's loading and storing a "word" (32-bit in that case) to do this. However, the instructions it's using don't even have byte equivalents. Brad Boyer flar@allandria.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 8:57 ` clear_bit_unlock_is_negative_byte Brad Boyer @ 2023-07-21 9:18 ` Andreas Schwab 0 siblings, 0 replies; 20+ messages in thread From: Andreas Schwab @ 2023-07-21 9:18 UTC (permalink / raw) To: Brad Boyer; +Cc: Matthew Wilcox, linux-m68k On Jul 21 2023, Brad Boyer wrote: > On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote: >> On Jul 20 2023, Matthew Wilcox wrote: >> >> > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, >> > + volatile unsigned long *p) >> > +{ >> > + char result; >> > + char mask = 1 << nr; /* nr guaranteed to be < 7 */ >> > + >> > + __asm__ __volatile__ ("eori %1, %2; smi %0" >> >> Why are you using XOR if you want to clear a bit? If it operates on a >> byte, why does it receive a pointer to long? > > I presume the thing with byte is to try to save an extension word. Why is this hardcoded in the function name? -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-21 8:57 ` clear_bit_unlock_is_negative_byte Brad Boyer @ 2023-07-21 11:59 ` Matthew Wilcox 2023-07-21 12:52 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-21 20:29 ` clear_bit_unlock_is_negative_byte Brad Boyer 1 sibling, 2 replies; 20+ messages in thread From: Matthew Wilcox @ 2023-07-21 11:59 UTC (permalink / raw) To: Andreas Schwab; +Cc: linux-m68k On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote: > On Jul 20 2023, Matthew Wilcox wrote: > > > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr, > > + volatile unsigned long *p) > > +{ > > + char result; > > + char mask = 1 << nr; /* nr guaranteed to be < 7 */ > > + > > + __asm__ __volatile__ ("eori %1, %2; smi %0" > > Why are you using XOR if you want to clear a bit? If it operates on a > byte, why does it receive a pointer to long? It's a clever hack. This function has exactly one user, and it's an important one -- folio_unlock(). Bit 7 is set if there are other threads waiting for this folio to be unlocked. There are two reasonable implementations, depending what kind of CPU you have; you can either load-locked; clear the bottom bit, store-conditional, test bit 7. Or x86 and m68k have the perfect instruction to clear a bit and set the Negative flag if bit 7 is set. As I said in the earlier email, BCLR doesn't affect the N flag, but EORI and ANDI do. We are guaranteed that the bit we're clearing is set, so EORI will work. ANDI would also work. Do you happen to know if __GCC_ASM_FLAG_OUTPUTS__ is implemented for m68k so we can save the SMI instruction? ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 11:59 ` clear_bit_unlock_is_negative_byte Matthew Wilcox @ 2023-07-21 12:52 ` Andreas Schwab 2023-07-21 20:29 ` clear_bit_unlock_is_negative_byte Brad Boyer 1 sibling, 0 replies; 20+ messages in thread From: Andreas Schwab @ 2023-07-21 12:52 UTC (permalink / raw) To: Matthew Wilcox; +Cc: linux-m68k On Jul 21 2023, Matthew Wilcox wrote: > Do you happen to know if __GCC_ASM_FLAG_OUTPUTS__ is implemented for > m68k so we can save the SMI instruction? $ git grep __GCC_ASM_FLAG_OUTPUTS__ -- gcc/config gcc/config/aarch64/aarch64-c.cc: builtin_define ("__GCC_ASM_FLAG_OUTPUTS__"); gcc/config/arm/arm-c.cc: def_or_undef_macro (pfile, "__GCC_ASM_FLAG_OUTPUTS__", !TARGET_THUMB1); gcc/config/i386/i386-c.cc: cpp_define (parse_in, "__GCC_ASM_FLAG_OUTPUTS__"); -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 11:59 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-21 12:52 ` clear_bit_unlock_is_negative_byte Andreas Schwab @ 2023-07-21 20:29 ` Brad Boyer 2023-07-22 3:42 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 1 sibling, 1 reply; 20+ messages in thread From: Brad Boyer @ 2023-07-21 20:29 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Andreas Schwab, linux-m68k On Fri, Jul 21, 2023 at 12:59:48PM +0100, Matthew Wilcox wrote: > On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote: > > Why are you using XOR if you want to clear a bit? If it operates on a > > byte, why does it receive a pointer to long? > > It's a clever hack. This function has exactly one user, and it's an > important one -- folio_unlock(). Bit 7 is set if there are other > threads waiting for this folio to be unlocked. There are two reasonable > implementations, depending what kind of CPU you have; you can either > load-locked; clear the bottom bit, store-conditional, test bit 7. Or > x86 and m68k have the perfect instruction to clear a bit and set the > Negative flag if bit 7 is set. > > As I said in the earlier email, BCLR doesn't affect the N flag, but > EORI and ANDI do. We are guaranteed that the bit we're clearing is set, > so EORI will work. ANDI would also work. There shouldn't be any performance difference between using eori.b and andi.b, since the compiler would fully generate the immediate mask value either way. Wouldn't readability suggest that we should use the AND operation to make it more obvious what the code is doing and make it more consistent with the other implementations of this function? Brad Boyer flar@allandria.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-21 20:29 ` clear_bit_unlock_is_negative_byte Brad Boyer @ 2023-07-22 3:42 ` Matthew Wilcox 2023-07-22 23:49 ` clear_bit_unlock_is_negative_byte Brad Boyer 0 siblings, 1 reply; 20+ messages in thread From: Matthew Wilcox @ 2023-07-22 3:42 UTC (permalink / raw) To: Brad Boyer; +Cc: Andreas Schwab, linux-m68k On Fri, Jul 21, 2023 at 01:29:04PM -0700, Brad Boyer wrote: > On Fri, Jul 21, 2023 at 12:59:48PM +0100, Matthew Wilcox wrote: > > On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote: > > > Why are you using XOR if you want to clear a bit? If it operates on a > > > byte, why does it receive a pointer to long? > > > > It's a clever hack. This function has exactly one user, and it's an > > important one -- folio_unlock(). Bit 7 is set if there are other > > threads waiting for this folio to be unlocked. There are two reasonable > > implementations, depending what kind of CPU you have; you can either > > load-locked; clear the bottom bit, store-conditional, test bit 7. Or > > x86 and m68k have the perfect instruction to clear a bit and set the > > Negative flag if bit 7 is set. > > > > As I said in the earlier email, BCLR doesn't affect the N flag, but > > EORI and ANDI do. We are guaranteed that the bit we're clearing is set, > > so EORI will work. ANDI would also work. > > There shouldn't be any performance difference between using eori.b and > andi.b, since the compiler would fully generate the immediate mask > value either way. Wouldn't readability suggest that we should use the > AND operation to make it more obvious what the code is doing and make > it more consistent with the other implementations of this function? That brings me to the next phase of this which is not yet fully baked, so I wasn't proposing it for discussion at this time. https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio-flags What I want to do is allow us to set/clear multiple bits in the same instruction that unlocks the folio. The obvious one is folio_mark_uptodate(); this is commonly paired with folio_unlock(), and it's a crucial part of page cache reads. There are some less obvious ones like folio_start_writeback() and folio_unlock(), which isn't included in this patch series. If we're going to set one bit and clear another bit, we have to use the xor/eor instruction, and that's what we do. On some architectures, such as MIPS, there's actually a separate function to implement all of this and so passing it a (constant) mask (instead of calculating that mask from what is now a variable bit) makes sense, and we can then use that function to implement both clear_bit_unlock_is_negative_byte() and change_and_unlock_is_negative_byte(). I'm still going around on this. I might change the API to always pass in a mask from folio_set_unlock(). And maybe we actualy get rid of clear_bit_unlock_is_negative_byte() since it's essentially a subset of change_and_unlock_is_negative_byte(). But I can change m68k to use andi.b for now if you feel strongly. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-22 3:42 ` clear_bit_unlock_is_negative_byte Matthew Wilcox @ 2023-07-22 23:49 ` Brad Boyer 2023-07-23 1:08 ` clear_bit_unlock_is_negative_byte Michael Schmitz 0 siblings, 1 reply; 20+ messages in thread From: Brad Boyer @ 2023-07-22 23:49 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Andreas Schwab, linux-m68k On Sat, Jul 22, 2023 at 04:42:33AM +0100, Matthew Wilcox wrote: > What I want to do is allow us to set/clear multiple bits in the same > instruction that unlocks the folio. The obvious one is > folio_mark_uptodate(); this is commonly paired with folio_unlock(), > and it's a crucial part of page cache reads. There are some less > obvious ones like folio_start_writeback() and folio_unlock(), > which isn't included in this patch series. If we're going to set > one bit and clear another bit, we have to use the xor/eor instruction, > and that's what we do. > > On some architectures, such as MIPS, there's actually a > separate function to implement all of this and so passing it a > (constant) mask (instead of calculating that mask from what > is now a variable bit) makes sense, and we can then use that > function to implement both clear_bit_unlock_is_negative_byte() and > change_and_unlock_is_negative_byte(). I'm still going around on this. > I might change the API to always pass in a mask from folio_set_unlock(). > > And maybe we actualy get rid of clear_bit_unlock_is_negative_byte() > since it's essentially a subset of change_and_unlock_is_negative_byte(). > But I can change m68k to use andi.b for now if you feel strongly. I do prefer to be consistent. We already have strange inconsistencies for bad reasons or no reason at all. However, if it's temporary, I suppose it doesn't matter that much. With the new code in the diff in your git tree, the use of eori is now consistent if you get rid of the old clear bit version as separate code. If there's a decent chance it will be a while before that next stage, I would personally recommend using the andi. However, I would defer to the others on the list if people think it's fine. Brad Boyer flar@allandria.com ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: clear_bit_unlock_is_negative_byte 2023-07-22 23:49 ` clear_bit_unlock_is_negative_byte Brad Boyer @ 2023-07-23 1:08 ` Michael Schmitz 0 siblings, 0 replies; 20+ messages in thread From: Michael Schmitz @ 2023-07-23 1:08 UTC (permalink / raw) To: Brad Boyer, Matthew Wilcox; +Cc: Andreas Schwab, linux-m68k Hi Brad, Am 23.07.2023 um 11:49 schrieb Brad Boyer: > On Sat, Jul 22, 2023 at 04:42:33AM +0100, Matthew Wilcox wrote: >> What I want to do is allow us to set/clear multiple bits in the same >> instruction that unlocks the folio. The obvious one is >> folio_mark_uptodate(); this is commonly paired with folio_unlock(), >> and it's a crucial part of page cache reads. There are some less >> obvious ones like folio_start_writeback() and folio_unlock(), >> which isn't included in this patch series. If we're going to set >> one bit and clear another bit, we have to use the xor/eor instruction, >> and that's what we do. >> >> On some architectures, such as MIPS, there's actually a >> separate function to implement all of this and so passing it a >> (constant) mask (instead of calculating that mask from what >> is now a variable bit) makes sense, and we can then use that >> function to implement both clear_bit_unlock_is_negative_byte() and >> change_and_unlock_is_negative_byte(). I'm still going around on this. >> I might change the API to always pass in a mask from folio_set_unlock(). >> >> And maybe we actualy get rid of clear_bit_unlock_is_negative_byte() >> since it's essentially a subset of change_and_unlock_is_negative_byte(). >> But I can change m68k to use andi.b for now if you feel strongly. > > I do prefer to be consistent. We already have strange inconsistencies > for bad reasons or no reason at all. However, if it's temporary, I > suppose it doesn't matter that much. With the new code in the diff in > your git tree, the use of eori is now consistent if you get rid of the > old clear bit version as separate code. If there's a decent chance it > will be a while before that next stage, I would personally recommend > using the andi. However, I would defer to the others on the list if > people think it's fine. First off - I know nothing about the code calling this function, so my argument may be mooted by design, or proof showing use of xor is safe ... The only thing that has me worried about the xor scheme is Matthew's assertion that the bit meant to be cleared is known to be set before clear_bit_unlock_is_negative_byte() is called. Should that be incorrect in corner cases, I'd expect things to fall apart with the proposed patch. Personally, I'd prefer we catch such cases sooner rather than later, should they exist. But others may take the view, with rather more justification, that the old code using andi is safer and ought to be used until other changes force use of xor. Just my $.02 worth... Cheers, Michael > > Brad Boyer > flar@allandria.com > ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2023-07-23 1:08 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-20 19:27 clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz 2023-07-21 1:12 ` clear_bit_unlock_is_negative_byte Michael Schmitz 2023-07-21 1:32 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-21 1:43 ` clear_bit_unlock_is_negative_byte Michael Schmitz 2023-07-21 17:03 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-21 22:07 ` clear_bit_unlock_is_negative_byte Michael Schmitz 2023-07-22 6:24 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-22 14:45 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-22 15:26 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-22 15:38 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-21 6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-21 8:57 ` clear_bit_unlock_is_negative_byte Brad Boyer 2023-07-21 9:18 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-21 11:59 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-21 12:52 ` clear_bit_unlock_is_negative_byte Andreas Schwab 2023-07-21 20:29 ` clear_bit_unlock_is_negative_byte Brad Boyer 2023-07-22 3:42 ` clear_bit_unlock_is_negative_byte Matthew Wilcox 2023-07-22 23:49 ` clear_bit_unlock_is_negative_byte Brad Boyer 2023-07-23 1:08 ` clear_bit_unlock_is_negative_byte Michael Schmitz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox