clear_bit_unlock_is_negative

public inbox for linux-m68k@lists.linux-m68k.org
 help / color / mirror / Atom feed

* clear_bit_unlock_is_negative_byte
@ 2023-07-20 19:27 Matthew Wilcox
  2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz
  2023-07-21  6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab
  0 siblings, 2 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-20 19:27 UTC (permalink / raw)
  To: linux-m68k

I'm looking to implement clear_bit_unlock_is_negative_byte() on every
architecture so we can delete the ifdeffery around maybe-we-have-it
and remove the simple implementation from filemap.c.  Here's what I've
come up with for m68k:

+static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
+               volatile unsigned long *p)
+{
+       char result;
+       char mask = 1 << nr;    /* nr guaranteed to be < 7 */
+
+       __asm__ __volatile__ ("eori %1, %2; smi %0"
+               : "=d" (result)
+               : "i" (mask), "o" (*p)
+               : "memory");
+       return result;
+}

It compiles, so I feel Very Pleased With Myself, since I haven't written
m68k assmbly in 25 years.  But I have questions.

First, m68k is big-endian, so I suspect I'm accessing the wrong byte.
Should something in there be adding 3 to 'p'?  Better to do it in the
asm, or in the constraints so the compiler can see it?

Second, have I properly communicated to the assembler that this is
a byte-size operation, and it needs to check bit 7 and not bits 15 or 31
to set the negative flag?

Third, can this be done better?  x86 has __GCC_ASM_FLAG_OUTPUTS__
so it doesn't need the equivalent of the SMI instruction to move the
condition to an output variable; it can just tell the compiler that
the N flag communicates the result that it's looking for.  Does m68k
have __GCC_ASM_FLAG_OUTPUTS__ or did nobody do that work yet?

Fourth, we could do this is with ANDI instead of EORI.  It's mildly
safer, but we really shouldn't have two threads clearing the lock bit
that race with each other.  We can't do it with BCLR because that
doesn't set the N flag.  If we do that, we'd need to invert the mask.

Appreciate your time looking at this.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-20 19:27 clear_bit_unlock_is_negative_byte Matthew Wilcox
@ 2023-07-20 22:37 ` Michael Schmitz
  2023-07-21  1:12   ` clear_bit_unlock_is_negative_byte Michael Schmitz
  2023-07-21  6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab
  1 sibling, 1 reply; 20+ messages in thread
From: Michael Schmitz @ 2023-07-20 22:37 UTC (permalink / raw)
  To: Matthew Wilcox, linux-m68k

Hi Matthew,

Am 21.07.2023 um 07:27 schrieb Matthew Wilcox:
> I'm looking to implement clear_bit_unlock_is_negative_byte() on every
> architecture so we can delete the ifdeffery around maybe-we-have-it
> and remove the simple implementation from filemap.c.  Here's what I've
> come up with for m68k:
>
> +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
> +               volatile unsigned long *p)
> +{
> +       char result;
> +       char mask = 1 << nr;    /* nr guaranteed to be < 7 */
> +
> +       __asm__ __volatile__ ("eori %1, %2; smi %0"
> +               : "=d" (result)
> +               : "i" (mask), "o" (*p)
> +               : "memory");
> +       return result;
> +}
>
> It compiles, so I feel Very Pleased With Myself, since I haven't written
> m68k assmbly in 25 years.  But I have questions.

Unfortunately, after plugging this in, the boot hangs at the first block 
device encountered:

calling  proc_hardware_init+0x0/0x20 @ 1
initcall proc_hardware_init+0x0/0x20 returned 0 after 40 usecs
calling  rtc_init+0x0/0x5e @ 1
initcall rtc_init+0x0/0x5e returned 0 after 40 usecs
calling  atari_nvram_init+0x0/0x42 @ 1
initcall atari_nvram_init+0x0/0x42 returned 0 after 40 usecs
calling  nfhd_init+0x0/0x206 @ 1
nfhd8: found device with 20971440 blocks (512 bytes)

Doesn't eat up memory, doesn't sit in a tight spin, just never proceeds 
past that point.

Running old and new code side by side (well, actually sequentially) in 
clear_bit_unlock_is_negative_byte()  and comparing results, the return 
values generated by both options match but the actual contents of the 
pointer passed to the function (after the mods) does differ:

nfhd8: found device with 20971440 blocks (512 bytes)
cbu_inb memval mismatch: 2004 12005
cbu_inb memval mismatch: 2004 12005
  nfhd8: AHDI p1 p2
cbu_inb memval mismatch: 26 10027
cbu_inb memval mismatch: 26 10027

(first value is from the old code, second value from the new code).

It looks to me that the eori operates on a 16 bit word here?

(Disclaimer: I have written m68k assembly occasionally in the past 
years, but I struggle with inline asm...)

Cheers,

	Michael

>
> First, m68k is big-endian, so I suspect I'm accessing the wrong byte.
> Should something in there be adding 3 to 'p'?  Better to do it in the
> asm, or in the constraints so the compiler can see it?
>
> Second, have I properly communicated to the assembler that this is
> a byte-size operation, and it needs to check bit 7 and not bits 15 or 31
> to set the negative flag?
>
> Third, can this be done better?  x86 has __GCC_ASM_FLAG_OUTPUTS__
> so it doesn't need the equivalent of the SMI instruction to move the
> condition to an output variable; it can just tell the compiler that
> the N flag communicates the result that it's looking for.  Does m68k
> have __GCC_ASM_FLAG_OUTPUTS__ or did nobody do that work yet?
>
> Fourth, we could do this is with ANDI instead of EORI.  It's mildly
> safer, but we really shouldn't have two threads clearing the lock bit
> that race with each other.  We can't do it with BCLR because that
> doesn't set the N flag.  If we do that, we'd need to invert the mask.
>
> Appreciate your time looking at this.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz
@ 2023-07-21  1:12   ` Michael Schmitz
  2023-07-21  1:32     ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Schmitz @ 2023-07-21  1:12 UTC (permalink / raw)
  To: Matthew Wilcox, linux-m68k

Hi Matthew,

Am 21.07.2023 um 10:37 schrieb Michael Schmitz:
> Hi Matthew,
>
> Am 21.07.2023 um 07:27 schrieb Matthew Wilcox:
>> I'm looking to implement clear_bit_unlock_is_negative_byte() on every
>> architecture so we can delete the ifdeffery around maybe-we-have-it
>> and remove the simple implementation from filemap.c.  Here's what I've
>> come up with for m68k:
>>
>> +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
>> +               volatile unsigned long *p)
>> +{
>> +       char result;
>> +       char mask = 1 << nr;    /* nr guaranteed to be < 7 */
>> +
>> +       __asm__ __volatile__ ("eori %1, %2; smi %0"
>> +               : "=d" (result)
>> +               : "i" (mask), "o" (*p)
>> +               : "memory");
>> +       return result;
>> +}
>>
>> It compiles, so I feel Very Pleased With Myself, since I haven't written
>> m68k assmbly in 25 years.  But I have questions.
>
> Unfortunately, after plugging this in, the boot hangs at the first block
> device encountered:
>
> calling  proc_hardware_init+0x0/0x20 @ 1
> initcall proc_hardware_init+0x0/0x20 returned 0 after 40 usecs
> calling  rtc_init+0x0/0x5e @ 1
> initcall rtc_init+0x0/0x5e returned 0 after 40 usecs
> calling  atari_nvram_init+0x0/0x42 @ 1
> initcall atari_nvram_init+0x0/0x42 returned 0 after 40 usecs
> calling  nfhd_init+0x0/0x206 @ 1
> nfhd8: found device with 20971440 blocks (512 bytes)
>
> Doesn't eat up memory, doesn't sit in a tight spin, just never proceeds
> past that point.
>
> Running old and new code side by side (well, actually sequentially) in
> clear_bit_unlock_is_negative_byte()  and comparing results, the return
> values generated by both options match but the actual contents of the
> pointer passed to the function (after the mods) does differ:
>
> nfhd8: found device with 20971440 blocks (512 bytes)
> cbu_inb memval mismatch: 2004 12005
> cbu_inb memval mismatch: 2004 12005
>  nfhd8: AHDI p1 p2
> cbu_inb memval mismatch: 26 10027
> cbu_inb memval mismatch: 26 10027
>
> (first value is from the old code, second value from the new code).

Logging the bit nr. and value passed in:

nfhd8: found device with 20971440 blocks (512 bytes)
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
  nfhd8: AHDI p1 p2
cbu_inb memval mismatch: 36 10037 37 0
cbu_inb memval mismatch: 36 10037 37 0

Plenty more later in the boot:
calling  brd_init+0x0/0xca @ 1
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 36 10037 37 0
cbu_inb memval mismatch: 36 10037 37 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 36 10037 37 0
cbu_inb memval mismatch: 36 10037 37 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 36 10037 37 0
cbu_inb memval mismatch: 36 10037 37 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 26 10027 27 0
cbu_inb memval mismatch: 2004 12005 2005 0
cbu_inb memval mismatch: 2004 12005 2005 0

>
> It looks to me that the eori operates on a 16 bit word here?

And it's bits 32-17 of the longword.

The instruction you need is eori.b, and you'll have to increase the mem 
pointer by 3 bytes. With that change, I see no further mismatches until 
the return values begin to differ once disk access begins:

sd 0:0:1:0: [sdb] Preferred minimum I/O size 512 bytes
cbu_inb retval mismatch: 1 ff 2084 2084 2085 0
rtc-generic rtc-generic: registered as rtc0
cbu_inb retval mismatch: 1 ff 2094 2094 2095 0
...
sd 0:0:0:0: [sda] Attached SCSI disk
probe of 0:0:0:0 returned 0 after 58395182 usecs
cbu_inb retval mismatch: 1 ff 2094 2094 2095 0
  sdb: RDSK (512) sdb1 (DOS^G)(res 2 spb 2) sdb2 (SFS^B)(res 2 spb 1) 
sdb3 (SFS^B)(res 2 spb 2) sdb4 ((res 2 spb 1)
sdb: p4 size 18446744071971831216 extends beyond EOD, enabling native 
capacity
cbu_inb retval mismatch: 1 ff 2084 2084 2085 0

(return value from old and new code, value of mem from old and new code, 
original value, bit nr).

Bit 7 was already set before xor, and wasn't cleared. I suspect that's 
why the return value is no longer 1?

Time to dig out that 68k programmer's manual...

Cheers,

	Michael


>
> (Disclaimer: I have written m68k assembly occasionally in the past
> years, but I struggle with inline asm...)
>
> Cheers,
>
>     Michael
>
>>
>> First, m68k is big-endian, so I suspect I'm accessing the wrong byte.
>> Should something in there be adding 3 to 'p'?  Better to do it in the
>> asm, or in the constraints so the compiler can see it?
>>
>> Second, have I properly communicated to the assembler that this is
>> a byte-size operation, and it needs to check bit 7 and not bits 15 or 31
>> to set the negative flag?
>>
>> Third, can this be done better?  x86 has __GCC_ASM_FLAG_OUTPUTS__
>> so it doesn't need the equivalent of the SMI instruction to move the
>> condition to an output variable; it can just tell the compiler that
>> the N flag communicates the result that it's looking for.  Does m68k
>> have __GCC_ASM_FLAG_OUTPUTS__ or did nobody do that work yet?
>>
>> Fourth, we could do this is with ANDI instead of EORI.  It's mildly
>> safer, but we really shouldn't have two threads clearing the lock bit
>> that race with each other.  We can't do it with BCLR because that
>> doesn't set the N flag.  If we do that, we'd need to invert the mask.
>>
>> Appreciate your time looking at this.
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21  1:12   ` clear_bit_unlock_is_negative_byte Michael Schmitz
@ 2023-07-21  1:32     ` Matthew Wilcox
  2023-07-21  1:43       ` clear_bit_unlock_is_negative_byte Michael Schmitz
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-21  1:32 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Fri, Jul 21, 2023 at 01:12:48PM +1200, Michael Schmitz wrote:
> Logging the bit nr. and value passed in:
> 
> nfhd8: found device with 20971440 blocks (512 bytes)
> cbu_inb memval mismatch: 2004 12005 2005 0
> cbu_inb memval mismatch: 2004 12005 2005 0
>  nfhd8: AHDI p1 p2
> cbu_inb memval mismatch: 36 10037 37 0
> cbu_inb memval mismatch: 36 10037 37 0

I'm not quite sure what values you're printing here?  And whether
they're hex or decimal.  Could you show me the printk string?

> The instruction you need is eori.b, and you'll have to increase the mem
> pointer by 3 bytes. With that change, I see no further mismatches until the
> return values begin to differ once disk access begins:

Ah thanks.  The perils of working from Motorola official docs and then
trying to use the GNU assembler ...

> sd 0:0:1:0: [sdb] Preferred minimum I/O size 512 bytes
> cbu_inb retval mismatch: 1 ff 2084 2084 2085 0
> rtc-generic rtc-generic: registered as rtc0
> cbu_inb retval mismatch: 1 ff 2094 2094 2095 0
> ...
> sd 0:0:0:0: [sda] Attached SCSI disk
> probe of 0:0:0:0 returned 0 after 58395182 usecs
> cbu_inb retval mismatch: 1 ff 2094 2094 2095 0
>  sdb: RDSK (512) sdb1 (DOS^G)(res 2 spb 2) sdb2 (SFS^B)(res 2 spb 1) sdb3
> (SFS^B)(res 2 spb 2) sdb4 ((res 2 spb 1)
> sdb: p4 size 18446744071971831216 extends beyond EOD, enabling native
> capacity
> cbu_inb retval mismatch: 1 ff 2084 2084 2085 0
> 
> (return value from old and new code, value of mem from old and new code,
> original value, bit nr).

OK, so the new code would set the byte to 0xff (that's how Sxx works).
But it's returning a bool, so that shouldn't matter.  2084/2084/2085
would make sense; one of the two functions has cleared the bottom bit.

> Bit 7 was already set before xor, and wasn't cleared. I suspect that's why
> the return value is no longer 1?

Ah, it's not supposed to be cleared.  The way this works is that bit 0
is the lock bit; if someone's waiting on the folio, they set bit 7.  If
bit 7 is set when we clear bit 0, we look on the wait queue.  If there's
nobody on the wait queue, we clear bit 7.

So ... I think you've fixed it!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21  1:32     ` clear_bit_unlock_is_negative_byte Matthew Wilcox
@ 2023-07-21  1:43       ` Michael Schmitz
  2023-07-21 17:03         ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  2023-07-22  6:24         ` clear_bit_unlock_is_negative_byte Andreas Schwab
  0 siblings, 2 replies; 20+ messages in thread
From: Michael Schmitz @ 2023-07-21  1:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-m68k

Hi Matthew,

Am 21.07.2023 um 13:32 schrieb Matthew Wilcox:
> On Fri, Jul 21, 2023 at 01:12:48PM +1200, Michael Schmitz wrote:
>> Logging the bit nr. and value passed in:
>>
>> nfhd8: found device with 20971440 blocks (512 bytes)
>> cbu_inb memval mismatch: 2004 12005 2005 0
>> cbu_inb memval mismatch: 2004 12005 2005 0
>>  nfhd8: AHDI p1 p2
>> cbu_inb memval mismatch: 36 10037 37 0
>> cbu_inb memval mismatch: 36 10037 37 0
>
> I'm not quite sure what values you're printing here?  And whether
> they're hex or decimal.  Could you show me the printk string?

All hex, except the last (nr). Values are mem (as modified by the 
original code), copy of mem (as modified by inline asm), the original 
value of mem, and nr.

>
>> The instruction you need is eori.b, and you'll have to increase the mem
>> pointer by 3 bytes. With that change, I see no further mismatches until the
>> return values begin to differ once disk access begins:
>
> Ah thanks.  The perils of working from Motorola official docs and then
> trying to use the GNU assembler ...
>
>> sd 0:0:1:0: [sdb] Preferred minimum I/O size 512 bytes
>> cbu_inb retval mismatch: 1 ff 2084 2084 2085 0
>> rtc-generic rtc-generic: registered as rtc0
>> cbu_inb retval mismatch: 1 ff 2094 2094 2095 0
>> ...
>> sd 0:0:0:0: [sda] Attached SCSI disk
>> probe of 0:0:0:0 returned 0 after 58395182 usecs
>> cbu_inb retval mismatch: 1 ff 2094 2094 2095 0
>>  sdb: RDSK (512) sdb1 (DOS^G)(res 2 spb 2) sdb2 (SFS^B)(res 2 spb 1) sdb3
>> (SFS^B)(res 2 spb 2) sdb4 ((res 2 spb 1)
>> sdb: p4 size 18446744071971831216 extends beyond EOD, enabling native
>> capacity
>> cbu_inb retval mismatch: 1 ff 2084 2084 2085 0
>>
>> (return value from old and new code, value of mem from old and new code,
>> original value, bit nr).
>
> OK, so the new code would set the byte to 0xff (that's how Sxx works).
> But it's returning a bool, so that shouldn't matter.  2084/2084/2085
> would make sense; one of the two functions has cleared the bottom bit.
>
>> Bit 7 was already set before xor, and wasn't cleared. I suspect that's why
>> the return value is no longer 1?
>
> Ah, it's not supposed to be cleared.  The way this works is that bit 0
> is the lock bit; if someone's waiting on the folio, they set bit 7.  If
> bit 7 is set when we clear bit 0, we look on the wait queue.  If there's
> nobody on the wait queue, we clear bit 7.

Right, that's what I meant to say. I'd only seen cases where bit 0 had 
been set and was cleared. This isn't an actual production system of 
sorts, just an ARAnyM instance I can fire up quickly to see patched 
kernels crash horribly.

>
> So ... I think you've fixed it!

This is what I have tests running on right now:

static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
                 volatile unsigned long *p)
{
         unsigned char *cp = (unsigned char *) p;
         char result;
         char mask = 1 << nr;    /* nr guaranteed to be < 7 */

         __asm__ __volatile__ ("eori.b %1, %2; smi %0"
                 : "=d" (result)
                 : "i" (mask), "o" (*(cp+3))
                 : "memory");
         return result;
}

I'm sure you can do all the casting to char and increment by 3 in the 
asm argument...

If there's a simple way to exercise this code path using standard Unix 
tools (or stress-ng which I ought to have somewhere), drop me a hint.

Cheers,

	Michael

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21  1:43       ` clear_bit_unlock_is_negative_byte Michael Schmitz
@ 2023-07-21 17:03         ` Matthew Wilcox
  2023-07-21 22:07           ` clear_bit_unlock_is_negative_byte Michael Schmitz
  2023-07-22  6:24         ` clear_bit_unlock_is_negative_byte Andreas Schwab
  1 sibling, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-21 17:03 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: linux-m68k

On Fri, Jul 21, 2023 at 01:43:06PM +1200, Michael Schmitz wrote:
> > Ah, it's not supposed to be cleared.  The way this works is that bit 0
> > is the lock bit; if someone's waiting on the folio, they set bit 7.  If
> > bit 7 is set when we clear bit 0, we look on the wait queue.  If there's
> > nobody on the wait queue, we clear bit 7.
> 
> Right, that's what I meant to say. I'd only seen cases where bit 0 had been
> set and was cleared. This isn't an actual production system of sorts, just
> an ARAnyM instance I can fire up quickly to see patched kernels crash
> horribly.

Well, I appreciate the testing!

> This is what I have tests running on right now:
> 
> static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
>                 volatile unsigned long *p)
> {
>         unsigned char *cp = (unsigned char *) p;
>         char result;
>         char mask = 1 << nr;    /* nr guaranteed to be < 7 */
> 
>         __asm__ __volatile__ ("eori.b %1, %2; smi %0"
>                 : "=d" (result)
>                 : "i" (mask), "o" (*(cp+3))
>                 : "memory");
>         return result;
> }

I thought it a little odd to use an unsigned char when we're testing
to see if it's negative, so I went with this:

static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
                volatile unsigned long *p)
{
        char result;
        char mask = 1 << nr;            /* nr guaranteed to be < 7 */
        char *cp = (char *)p + 3;       /* m68k is big-endian */

        __asm__ __volatile__ ("eori.b %1, %2; smi %0"
                : "=d" (result)
                : "i" (mask), "o" (*cp)
                : "memory");
        return result;
}

> I'm sure you can do all the casting to char and increment by 3 in the asm
> argument...

I'd rather not.  I looked at doing the offset by three inside the asm,
but it seems like gcc is smart enough to do that without help:

000006e0 <folio_unlock>:
     6e0:       206f 0004       moveal %sp@(4),%a0
     6e4:       0a28 0001 0003  eorib #1,%a0@(3)
     6ea:       5bc0            smi %d0
     6ec:       4a00            tstb %d0
     6ee:       670a            beqs 6fa <folio_unlock+0x1a>
     6f0:       42a7            clrl %sp@-
     6f2:       2f08            movel %a0,%sp@-
     6f4:       4eba fcec       jsr %pc@(3e2 <folio_wake_bit>)
     6f8:       508f            addql #8,%sp
     6fa:       4e75            rts

You'll note the smi/tstb pair are unnecessary.  It could simply BPL to
the RTS instruction, but we can't tell GCC that because we don't have
the __GCC_ASM_FLAG_OUTPUTS__ feature.

By the way, before this optimisation, it was this:

000006fc <folio_unlock>:
     6fc:       206f 0004       moveal %sp@(4),%a0
     700:       08a8 0000 0003  bclr #0,%a0@(3)
     706:       2010            movel %a0@,%d0
     708:       4a00            tstb %d0
     70a:       6c0a            bges 716 <folio_unlock+0x1a>
     70c:       42a7            clrl %sp@-
     70e:       2f08            movel %a0,%sp@-
     710:       4eba fcd0       jsr %pc@(3e2 <folio_wake_bit>)
     714:       508f            addql #8,%sp
     716:       4e75            rts

which is the same number of instructions, but one more memory reference.
It's a read-after-write hazard, but I don't know if that affects any
m68k implementation; my impression is that even on an '060 there aren't
any real performance implications.  Kudos to gcc for figuring out that
testing bit 7 can be done with the tstb instruction.

> If there's a simple way to exercise this code path using standard Unix tools
> (or stress-ng which I ought to have somewhere), drop me a hint.

Oh, it's so common to have a waiter on a folio unlock that just making
it to the login prompt is enough to declare comfidently that this works.
CPU implementations with memory barriers and such fanciness are a little
harder to be confident in, but this looks good to me.  I generally run
xfstests, but that's just because I have it all set up and ready to go.

I'll drop your Tested-by on this if that's OK?  If you want a
Co-developed-by credit, that's fine with me too!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21 17:03         ` clear_bit_unlock_is_negative_byte Matthew Wilcox
@ 2023-07-21 22:07           ` Michael Schmitz
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Schmitz @ 2023-07-21 22:07 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-m68k

Hi Matthew,

Am 22.07.2023 um 05:03 schrieb Matthew Wilcox:
> I thought it a little odd to use an unsigned char when we're testing
> to see if it's negative, so I went with this:
>
> static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
>                 volatile unsigned long *p)
> {
>         char result;
>         char mask = 1 << nr;            /* nr guaranteed to be < 7 */
>         char *cp = (char *)p + 3;       /* m68k is big-endian */
>
>         __asm__ __volatile__ ("eori.b %1, %2; smi %0"
>                 : "=d" (result)
>                 : "i" (mask), "o" (*cp)
>                 : "memory");
>         return result;
> }

I don't think signedness make a difference here, as we don't leave 
anything about the test to the compiler.

>> I'm sure you can do all the casting to char and increment by 3 in the asm
>> argument...
>
> I'd rather not.  I looked at doing the offset by three inside the asm,
> but it seems like gcc is smart enough to do that without help:
>
> 000006e0 <folio_unlock>:
>      6e0:       206f 0004       moveal %sp@(4),%a0
>      6e4:       0a28 0001 0003  eorib #1,%a0@(3)
>      6ea:       5bc0            smi %d0
>      6ec:       4a00            tstb %d0
>      6ee:       670a            beqs 6fa <folio_unlock+0x1a>
>      6f0:       42a7            clrl %sp@-
>      6f2:       2f08            movel %a0,%sp@-
>      6f4:       4eba fcec       jsr %pc@(3e2 <folio_wake_bit>)
>      6f8:       508f            addql #8,%sp
>      6fa:       4e75            rts
>
> You'll note the smi/tstb pair are unnecessary.  It could simply BPL to
> the RTS instruction, but we can't tell GCC that because we don't have
> the __GCC_ASM_FLAG_OUTPUTS__ feature.

I see. __GCC_ASM_FLAG_OUTPUTS__ appears to be rare enough to assume m68k 
won't get to use it this decade.

>
> By the way, before this optimisation, it was this:
>
> 000006fc <folio_unlock>:
>      6fc:       206f 0004       moveal %sp@(4),%a0
>      700:       08a8 0000 0003  bclr #0,%a0@(3)
>      706:       2010            movel %a0@,%d0
>      708:       4a00            tstb %d0
>      70a:       6c0a            bges 716 <folio_unlock+0x1a>
>      70c:       42a7            clrl %sp@-
>      70e:       2f08            movel %a0,%sp@-
>      710:       4eba fcd0       jsr %pc@(3e2 <folio_wake_bit>)
>      714:       508f            addql #8,%sp
>      716:       4e75            rts
>
> which is the same number of instructions, but one more memory reference.

Seeing as a0 was just loaded from memory above, and should be in the 
cache, I doubt there's much performance difference from the additional 
memory access here. Haven't looked at how many clock cycles each of 
these instructions take.

> It's a read-after-write hazard, but I don't know if that affects any
> m68k implementation; my impression is that even on an '060 there aren't

Unless something modifies the memory location pointed to by a0 (during 
preemption of any kind), I can't see a race here (a0 is saved/restored 
during interrupts, syscalls and such).
We haven't seen any races here in the past with this code AFAICS.

> any real performance implications.  Kudos to gcc for figuring out that
> testing bit 7 can be done with the tstb instruction.
>
>> If there's a simple way to exercise this code path using standard Unix tools
>> (or stress-ng which I ought to have somewhere), drop me a hint.
>
> Oh, it's so common to have a waiter on a folio unlock that just making
> it to the login prompt is enough to declare comfidently that this works.
> CPU implementations with memory barriers and such fanciness are a little
> harder to be confident in, but this looks good to me.  I generally run
> xfstests, but that's just because I have it all set up and ready to go.

Should be all good then.

> I'll drop your Tested-by on this if that's OK?  If you want a
> Co-developed-by credit, that's fine with me too!

Tested-by is fine, thanks. I wouldn't want anyone to assume I'm familiar 
with folios and the like...

Cheers,

	Michael



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21  1:43       ` clear_bit_unlock_is_negative_byte Michael Schmitz
  2023-07-21 17:03         ` clear_bit_unlock_is_negative_byte Matthew Wilcox
@ 2023-07-22  6:24         ` Andreas Schwab
  2023-07-22 14:45           ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  1 sibling, 1 reply; 20+ messages in thread
From: Andreas Schwab @ 2023-07-22  6:24 UTC (permalink / raw)
  To: Michael Schmitz; +Cc: Matthew Wilcox, linux-m68k

On Jul 21 2023, Michael Schmitz wrote:

> static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
>                 volatile unsigned long *p)
> {
>         unsigned char *cp = (unsigned char *) p;
>         char result;
>         char mask = 1 << nr;    /* nr guaranteed to be < 7 */
>
>         __asm__ __volatile__ ("eori.b %1, %2; smi %0"
>                 : "=d" (result)
>                 : "i" (mask), "o" (*(cp+3))

That should use "id" as constraint, so that the compiler can share the
constant with other insns.  Also, the third operand is modified, so it
needs to be marked as in/out.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-22  6:24         ` clear_bit_unlock_is_negative_byte Andreas Schwab
@ 2023-07-22 14:45           ` Matthew Wilcox
  2023-07-22 15:26             ` clear_bit_unlock_is_negative_byte Andreas Schwab
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-22 14:45 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Michael Schmitz, linux-m68k

On Sat, Jul 22, 2023 at 08:24:34AM +0200, Andreas Schwab wrote:
> On Jul 21 2023, Michael Schmitz wrote:
> 
> > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
> >                 volatile unsigned long *p)
> > {
> >         unsigned char *cp = (unsigned char *) p;
> >         char result;
> >         char mask = 1 << nr;    /* nr guaranteed to be < 7 */
> >
> >         __asm__ __volatile__ ("eori.b %1, %2; smi %0"
> >                 : "=d" (result)
> >                 : "i" (mask), "o" (*(cp+3))
> 
> That should use "id" as constraint, so that the compiler can share the
> constant with other insns.  Also, the third operand is modified, so it
> needs to be marked as in/out.

The "o" constraint is shared with bfset_mem_set_bit,
bfclr_mem_test_and_clear_bit and a few other functions.  Should they
all be changed?  Some use "+m".

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-22 14:45           ` clear_bit_unlock_is_negative_byte Matthew Wilcox
@ 2023-07-22 15:26             ` Andreas Schwab
  2023-07-22 15:38               ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  0 siblings, 1 reply; 20+ messages in thread
From: Andreas Schwab @ 2023-07-22 15:26 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Michael Schmitz, linux-m68k

On Jul 22 2023, Matthew Wilcox wrote:

> On Sat, Jul 22, 2023 at 08:24:34AM +0200, Andreas Schwab wrote:
>> On Jul 21 2023, Michael Schmitz wrote:
>> 
>> > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
>> >                 volatile unsigned long *p)
>> > {
>> >         unsigned char *cp = (unsigned char *) p;
>> >         char result;
>> >         char mask = 1 << nr;    /* nr guaranteed to be < 7 */
>> >
>> >         __asm__ __volatile__ ("eori.b %1, %2; smi %0"
>> >                 : "=d" (result)
>> >                 : "i" (mask), "o" (*(cp+3))
>> 
>> That should use "id" as constraint, so that the compiler can share the
>> constant with other insns.  Also, the third operand is modified, so it
>> needs to be marked as in/out.
>
> The "o" constraint is shared with bfset_mem_set_bit,
> bfclr_mem_test_and_clear_bit and a few other functions.  Should they
> all be changed?

They use a memory clobber, because the "o" constraint cannot accurately
describe the situation there.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-22 15:26             ` clear_bit_unlock_is_negative_byte Andreas Schwab
@ 2023-07-22 15:38               ` Matthew Wilcox
  0 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-22 15:38 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Michael Schmitz, linux-m68k

On Sat, Jul 22, 2023 at 05:26:07PM +0200, Andreas Schwab wrote:
> On Jul 22 2023, Matthew Wilcox wrote:
> 
> > On Sat, Jul 22, 2023 at 08:24:34AM +0200, Andreas Schwab wrote:
> >> On Jul 21 2023, Michael Schmitz wrote:
> >> 
> >> > static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
> >> >                 volatile unsigned long *p)
> >> > {
> >> >         unsigned char *cp = (unsigned char *) p;
> >> >         char result;
> >> >         char mask = 1 << nr;    /* nr guaranteed to be < 7 */
> >> >
> >> >         __asm__ __volatile__ ("eori.b %1, %2; smi %0"
> >> >                 : "=d" (result)
> >> >                 : "i" (mask), "o" (*(cp+3))
> >> 
> >> That should use "id" as constraint, so that the compiler can share the
> >> constant with other insns.  Also, the third operand is modified, so it
> >> needs to be marked as in/out.
> >
> > The "o" constraint is shared with bfset_mem_set_bit,
> > bfclr_mem_test_and_clear_bit and a few other functions.  Should they
> > all be changed?
> 
> They use a memory clobber, because the "o" constraint cannot accurately
> describe the situation there.

My understanding is that clear_bit_unlock_is_negative_byte() also
needs the "memory" clobber because it has "release" semantics (ie
all previous stores must be visible before this store is visible).

static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
                volatile unsigned long *p)
{
        char result;
        char mask = 1 << nr;            /* nr guaranteed to be < 7 */
        char *cp = (char *)p + 3;       /* m68k is big-endian */

        __asm__ __volatile__ ("eori.b %1, %2; smi %0"
                : "=d" (result)
                : "di" (mask), "o" (*cp)
                : "memory");
        return result;
}

is what I currently have.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-20 19:27 clear_bit_unlock_is_negative_byte Matthew Wilcox
  2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz
@ 2023-07-21  6:34 ` Andreas Schwab
  2023-07-21  8:57   ` clear_bit_unlock_is_negative_byte Brad Boyer
  2023-07-21 11:59   ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  1 sibling, 2 replies; 20+ messages in thread
From: Andreas Schwab @ 2023-07-21  6:34 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-m68k

On Jul 20 2023, Matthew Wilcox wrote:

> +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
> +               volatile unsigned long *p)
> +{
> +       char result;
> +       char mask = 1 << nr;    /* nr guaranteed to be < 7 */
> +
> +       __asm__ __volatile__ ("eori %1, %2; smi %0"

Why are you using XOR if you want to clear a bit?  If it operates on a
byte, why does it receive a pointer to long?

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21  6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab
@ 2023-07-21  8:57   ` Brad Boyer
  2023-07-21  9:18     ` clear_bit_unlock_is_negative_byte Andreas Schwab
  2023-07-21 11:59   ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  1 sibling, 1 reply; 20+ messages in thread
From: Brad Boyer @ 2023-07-21  8:57 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Matthew Wilcox, linux-m68k

On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote:
> On Jul 20 2023, Matthew Wilcox wrote:
> 
> > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
> > +               volatile unsigned long *p)
> > +{
> > +       char result;
> > +       char mask = 1 << nr;    /* nr guaranteed to be < 7 */
> > +
> > +       __asm__ __volatile__ ("eori %1, %2; smi %0"
> 
> Why are you using XOR if you want to clear a bit?  If it operates on a
> byte, why does it receive a pointer to long?

I presume the thing with byte is to try to save an extension word.
Using a long immediate argument makes the instruction take 6 bytes
instead of 4. Since we (apparently?) know we don't want to modify
the other three bytes, we can safely take shortcuts like that.
However, I would agree that using XOR here seems wrong. It
fundamentally presumes the bit was previously set. The other
architectures I checked with asm versions of this look like they
do the equivalent of andi with the inverted mask. We are also
presuming that nr is always a compile time constant, but it looks
like that should be true? If it can be immediate, the inversion
would be done by the compiler. The powerpc version doesn't appear to
make either assumption. It looks like it's loading and storing a
"word" (32-bit in that case) to do this. However, the instructions
it's using don't even have byte equivalents.

	Brad Boyer
	flar@allandria.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21  8:57   ` clear_bit_unlock_is_negative_byte Brad Boyer
@ 2023-07-21  9:18     ` Andreas Schwab
  0 siblings, 0 replies; 20+ messages in thread
From: Andreas Schwab @ 2023-07-21  9:18 UTC (permalink / raw)
  To: Brad Boyer; +Cc: Matthew Wilcox, linux-m68k

On Jul 21 2023, Brad Boyer wrote:

> On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote:
>> On Jul 20 2023, Matthew Wilcox wrote:
>> 
>> > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
>> > +               volatile unsigned long *p)
>> > +{
>> > +       char result;
>> > +       char mask = 1 << nr;    /* nr guaranteed to be < 7 */
>> > +
>> > +       __asm__ __volatile__ ("eori %1, %2; smi %0"
>> 
>> Why are you using XOR if you want to clear a bit?  If it operates on a
>> byte, why does it receive a pointer to long?
>
> I presume the thing with byte is to try to save an extension word.

Why is this hardcoded in the function name?

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21  6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab
  2023-07-21  8:57   ` clear_bit_unlock_is_negative_byte Brad Boyer
@ 2023-07-21 11:59   ` Matthew Wilcox
  2023-07-21 12:52     ` clear_bit_unlock_is_negative_byte Andreas Schwab
  2023-07-21 20:29     ` clear_bit_unlock_is_negative_byte Brad Boyer
  1 sibling, 2 replies; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-21 11:59 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: linux-m68k

On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote:
> On Jul 20 2023, Matthew Wilcox wrote:
> 
> > +static inline bool clear_bit_unlock_is_negative_byte(unsigned int nr,
> > +               volatile unsigned long *p)
> > +{
> > +       char result;
> > +       char mask = 1 << nr;    /* nr guaranteed to be < 7 */
> > +
> > +       __asm__ __volatile__ ("eori %1, %2; smi %0"
> 
> Why are you using XOR if you want to clear a bit?  If it operates on a
> byte, why does it receive a pointer to long?

It's a clever hack.  This function has exactly one user, and it's an
important one -- folio_unlock().  Bit 7 is set if there are other
threads waiting for this folio to be unlocked.  There are two reasonable
implementations, depending what kind of CPU you have; you can either
load-locked; clear the bottom bit, store-conditional, test bit 7.  Or
x86 and m68k have the perfect instruction to clear a bit and set the
Negative flag if bit 7 is set.

As I said in the earlier email, BCLR doesn't affect the N flag, but
EORI and ANDI do.  We are guaranteed that the bit we're clearing is set,
so EORI will work.  ANDI would also work.

Do you happen to know if __GCC_ASM_FLAG_OUTPUTS__ is implemented for
m68k so we can save the SMI instruction?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21 11:59   ` clear_bit_unlock_is_negative_byte Matthew Wilcox
@ 2023-07-21 12:52     ` Andreas Schwab
  2023-07-21 20:29     ` clear_bit_unlock_is_negative_byte Brad Boyer
  1 sibling, 0 replies; 20+ messages in thread
From: Andreas Schwab @ 2023-07-21 12:52 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-m68k

On Jul 21 2023, Matthew Wilcox wrote:

> Do you happen to know if __GCC_ASM_FLAG_OUTPUTS__ is implemented for
> m68k so we can save the SMI instruction?

$ git grep __GCC_ASM_FLAG_OUTPUTS__ -- gcc/config
gcc/config/aarch64/aarch64-c.cc:  builtin_define ("__GCC_ASM_FLAG_OUTPUTS__");
gcc/config/arm/arm-c.cc:  def_or_undef_macro (pfile, "__GCC_ASM_FLAG_OUTPUTS__", !TARGET_THUMB1);
gcc/config/i386/i386-c.cc:  cpp_define (parse_in, "__GCC_ASM_FLAG_OUTPUTS__");

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21 11:59   ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  2023-07-21 12:52     ` clear_bit_unlock_is_negative_byte Andreas Schwab
@ 2023-07-21 20:29     ` Brad Boyer
  2023-07-22  3:42       ` clear_bit_unlock_is_negative_byte Matthew Wilcox
  1 sibling, 1 reply; 20+ messages in thread
From: Brad Boyer @ 2023-07-21 20:29 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andreas Schwab, linux-m68k

On Fri, Jul 21, 2023 at 12:59:48PM +0100, Matthew Wilcox wrote:
> On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote:
> > Why are you using XOR if you want to clear a bit?  If it operates on a
> > byte, why does it receive a pointer to long?
> 
> It's a clever hack.  This function has exactly one user, and it's an
> important one -- folio_unlock().  Bit 7 is set if there are other
> threads waiting for this folio to be unlocked.  There are two reasonable
> implementations, depending what kind of CPU you have; you can either
> load-locked; clear the bottom bit, store-conditional, test bit 7.  Or
> x86 and m68k have the perfect instruction to clear a bit and set the
> Negative flag if bit 7 is set.
> 
> As I said in the earlier email, BCLR doesn't affect the N flag, but
> EORI and ANDI do.  We are guaranteed that the bit we're clearing is set,
> so EORI will work.  ANDI would also work.

There shouldn't be any performance difference between using eori.b and
andi.b, since the compiler would fully generate the immediate mask
value either way. Wouldn't readability suggest that we should use the
AND operation to make it more obvious what the code is doing and make
it more consistent with the other implementations of this function?

	Brad Boyer
	flar@allandria.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-21 20:29     ` clear_bit_unlock_is_negative_byte Brad Boyer
@ 2023-07-22  3:42       ` Matthew Wilcox
  2023-07-22 23:49         ` clear_bit_unlock_is_negative_byte Brad Boyer
  0 siblings, 1 reply; 20+ messages in thread
From: Matthew Wilcox @ 2023-07-22  3:42 UTC (permalink / raw)
  To: Brad Boyer; +Cc: Andreas Schwab, linux-m68k

On Fri, Jul 21, 2023 at 01:29:04PM -0700, Brad Boyer wrote:
> On Fri, Jul 21, 2023 at 12:59:48PM +0100, Matthew Wilcox wrote:
> > On Fri, Jul 21, 2023 at 08:34:55AM +0200, Andreas Schwab wrote:
> > > Why are you using XOR if you want to clear a bit?  If it operates on a
> > > byte, why does it receive a pointer to long?
> > 
> > It's a clever hack.  This function has exactly one user, and it's an
> > important one -- folio_unlock().  Bit 7 is set if there are other
> > threads waiting for this folio to be unlocked.  There are two reasonable
> > implementations, depending what kind of CPU you have; you can either
> > load-locked; clear the bottom bit, store-conditional, test bit 7.  Or
> > x86 and m68k have the perfect instruction to clear a bit and set the
> > Negative flag if bit 7 is set.
> > 
> > As I said in the earlier email, BCLR doesn't affect the N flag, but
> > EORI and ANDI do.  We are guaranteed that the bit we're clearing is set,
> > so EORI will work.  ANDI would also work.
> 
> There shouldn't be any performance difference between using eori.b and
> andi.b, since the compiler would fully generate the immediate mask
> value either way. Wouldn't readability suggest that we should use the
> AND operation to make it more obvious what the code is doing and make
> it more consistent with the other implementations of this function?

That brings me to the next phase of this which is not yet fully baked,
so I wasn't proposing it for discussion at this time.

https://git.infradead.org/users/willy/pagecache.git/shortlog/refs/heads/folio-flags

What I want to do is allow us to set/clear multiple bits in the same
instruction that unlocks the folio.  The obvious one is
folio_mark_uptodate(); this is commonly paired with folio_unlock(),
and it's a crucial part of page cache reads.  There are some less
obvious ones like folio_start_writeback() and folio_unlock(),
which isn't included in this patch series.  If we're going to set
one bit and clear another bit, we have to use the xor/eor instruction,
and that's what we do.

On some architectures, such as MIPS, there's actually a
separate function to implement all of this and so passing it a
(constant) mask (instead of calculating that mask from what
is now a variable bit) makes sense, and we can then use that
function to implement both clear_bit_unlock_is_negative_byte() and
change_and_unlock_is_negative_byte().  I'm still going around on this.
I might change the API to always pass in a mask from folio_set_unlock().

And maybe we actualy get rid of clear_bit_unlock_is_negative_byte()
since it's essentially a subset of change_and_unlock_is_negative_byte().
But I can change m68k to use andi.b for now if you feel strongly.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-22  3:42       ` clear_bit_unlock_is_negative_byte Matthew Wilcox
@ 2023-07-22 23:49         ` Brad Boyer
  2023-07-23  1:08           ` clear_bit_unlock_is_negative_byte Michael Schmitz
  0 siblings, 1 reply; 20+ messages in thread
From: Brad Boyer @ 2023-07-22 23:49 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andreas Schwab, linux-m68k

On Sat, Jul 22, 2023 at 04:42:33AM +0100, Matthew Wilcox wrote:
> What I want to do is allow us to set/clear multiple bits in the same
> instruction that unlocks the folio.  The obvious one is
> folio_mark_uptodate(); this is commonly paired with folio_unlock(),
> and it's a crucial part of page cache reads.  There are some less
> obvious ones like folio_start_writeback() and folio_unlock(),
> which isn't included in this patch series.  If we're going to set
> one bit and clear another bit, we have to use the xor/eor instruction,
> and that's what we do.
> 
> On some architectures, such as MIPS, there's actually a
> separate function to implement all of this and so passing it a
> (constant) mask (instead of calculating that mask from what
> is now a variable bit) makes sense, and we can then use that
> function to implement both clear_bit_unlock_is_negative_byte() and
> change_and_unlock_is_negative_byte().  I'm still going around on this.
> I might change the API to always pass in a mask from folio_set_unlock().
> 
> And maybe we actualy get rid of clear_bit_unlock_is_negative_byte()
> since it's essentially a subset of change_and_unlock_is_negative_byte().
> But I can change m68k to use andi.b for now if you feel strongly.

I do prefer to be consistent. We already have strange inconsistencies
for bad reasons or no reason at all. However, if it's temporary, I
suppose it doesn't matter that much. With the new code in the diff in
your git tree, the use of eori is now consistent if you get rid of the
old clear bit version as separate code. If there's a decent chance it
will be a while before that next stage, I would personally recommend
using the andi. However, I would defer to the others on the list if
people think it's fine.

	Brad Boyer
	flar@allandria.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: clear_bit_unlock_is_negative_byte
  2023-07-22 23:49         ` clear_bit_unlock_is_negative_byte Brad Boyer
@ 2023-07-23  1:08           ` Michael Schmitz
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Schmitz @ 2023-07-23  1:08 UTC (permalink / raw)
  To: Brad Boyer, Matthew Wilcox; +Cc: Andreas Schwab, linux-m68k

Hi Brad,

Am 23.07.2023 um 11:49 schrieb Brad Boyer:
> On Sat, Jul 22, 2023 at 04:42:33AM +0100, Matthew Wilcox wrote:
>> What I want to do is allow us to set/clear multiple bits in the same
>> instruction that unlocks the folio.  The obvious one is
>> folio_mark_uptodate(); this is commonly paired with folio_unlock(),
>> and it's a crucial part of page cache reads.  There are some less
>> obvious ones like folio_start_writeback() and folio_unlock(),
>> which isn't included in this patch series.  If we're going to set
>> one bit and clear another bit, we have to use the xor/eor instruction,
>> and that's what we do.
>>
>> On some architectures, such as MIPS, there's actually a
>> separate function to implement all of this and so passing it a
>> (constant) mask (instead of calculating that mask from what
>> is now a variable bit) makes sense, and we can then use that
>> function to implement both clear_bit_unlock_is_negative_byte() and
>> change_and_unlock_is_negative_byte().  I'm still going around on this.
>> I might change the API to always pass in a mask from folio_set_unlock().
>>
>> And maybe we actualy get rid of clear_bit_unlock_is_negative_byte()
>> since it's essentially a subset of change_and_unlock_is_negative_byte().
>> But I can change m68k to use andi.b for now if you feel strongly.
>
> I do prefer to be consistent. We already have strange inconsistencies
> for bad reasons or no reason at all. However, if it's temporary, I
> suppose it doesn't matter that much. With the new code in the diff in
> your git tree, the use of eori is now consistent if you get rid of the
> old clear bit version as separate code. If there's a decent chance it
> will be a while before that next stage, I would personally recommend
> using the andi. However, I would defer to the others on the list if
> people think it's fine.

First off - I know nothing about the code calling this function, so my 
argument may be mooted by design, or proof showing use of xor is safe ...

The only thing that has me worried about the xor scheme is Matthew's 
assertion that the bit meant to be cleared is known to be set before 
clear_bit_unlock_is_negative_byte() is called. Should that be incorrect 
in corner cases, I'd expect things to fall apart with the proposed patch.

Personally, I'd prefer we catch such cases sooner rather than later, 
should they exist. But others may take the view, with rather more 
justification, that the old code using andi is safer and ought to be 
used until other changes force use of xor.

Just my $.02 worth...

Cheers,

	Michael
>
> 	Brad Boyer
> 	flar@allandria.com
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-07-23  1:08 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-20 19:27 clear_bit_unlock_is_negative_byte Matthew Wilcox
2023-07-20 22:37 ` clear_bit_unlock_is_negative_byte Michael Schmitz
2023-07-21  1:12   ` clear_bit_unlock_is_negative_byte Michael Schmitz
2023-07-21  1:32     ` clear_bit_unlock_is_negative_byte Matthew Wilcox
2023-07-21  1:43       ` clear_bit_unlock_is_negative_byte Michael Schmitz
2023-07-21 17:03         ` clear_bit_unlock_is_negative_byte Matthew Wilcox
2023-07-21 22:07           ` clear_bit_unlock_is_negative_byte Michael Schmitz
2023-07-22  6:24         ` clear_bit_unlock_is_negative_byte Andreas Schwab
2023-07-22 14:45           ` clear_bit_unlock_is_negative_byte Matthew Wilcox
2023-07-22 15:26             ` clear_bit_unlock_is_negative_byte Andreas Schwab
2023-07-22 15:38               ` clear_bit_unlock_is_negative_byte Matthew Wilcox
2023-07-21  6:34 ` clear_bit_unlock_is_negative_byte Andreas Schwab
2023-07-21  8:57   ` clear_bit_unlock_is_negative_byte Brad Boyer
2023-07-21  9:18     ` clear_bit_unlock_is_negative_byte Andreas Schwab
2023-07-21 11:59   ` clear_bit_unlock_is_negative_byte Matthew Wilcox
2023-07-21 12:52     ` clear_bit_unlock_is_negative_byte Andreas Schwab
2023-07-21 20:29     ` clear_bit_unlock_is_negative_byte Brad Boyer
2023-07-22  3:42       ` clear_bit_unlock_is_negative_byte Matthew Wilcox
2023-07-22 23:49         ` clear_bit_unlock_is_negative_byte Brad Boyer
2023-07-23  1:08           ` clear_bit_unlock_is_negative_byte Michael Schmitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox