From: NeilBrown <neilb@suse.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Thomas Graf <tgraf@suug.ch>,
Herbert Xu <herbert@gondor.apana.org.au>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.
Date: Thu, 11 Apr 2019 10:48:42 +1000 [thread overview]
Message-ID: <87r2a9xt79.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20190410193418.GA32402@roeck-us.net>
[-- Attachment #1: Type: text/plain, Size: 6550 bytes --]
On Wed, Apr 10 2019, Guenter Roeck wrote:
> Hi,
>
> On Tue, Apr 02, 2019 at 10:07:45AM +1100, NeilBrown wrote:
>> This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
>> bucket pointer to lock the hash chain for that bucket.
>>
>> The benefits of a bit spin_lock are:
>> - no need to allocate a separate array of locks.
>> - no need to have a configuration option to guide the
>> choice of the size of this array
>> - locking cost is often a single test-and-set in a cache line
>> that will have to be loaded anyway. When inserting at, or removing
>> from, the head of the chain, the unlock is free - writing the new
>> address in the bucket head implicitly clears the lock bit.
>> For __rhashtable_insert_fast() we ensure this always happens
>> when adding a new key.
>> - even when lockings costs 2 updates (lock and unlock), they are
>> in a cacheline that needs to be read anyway.
>>
>> The cost of using a bit spin_lock is a little bit of code complexity,
>> which I think is quite manageable.
>>
>> Bit spin_locks are sometimes inappropriate because they are not fair -
>> if multiple CPUs repeatedly contend of the same lock, one CPU can
>> easily be starved. This is not a credible situation with rhashtable.
>> Multiple CPUs may want to repeatedly add or remove objects, but they
>> will typically do so at different buckets, so they will attempt to
>> acquire different locks.
>>
>> As we have more bit-locks than we previously had spinlocks (by at
>> least a factor of two) we can expect slightly less contention to
>> go with the slightly better cache behavior and reduced memory
>> consumption.
>>
>> To enhance type checking, a new struct is introduced to represent the
>> pointer plus lock-bit
>> that is stored in the bucket-table. This is "struct rhash_lock_head"
>> and is empty. A pointer to this needs to be cast to either an
>> unsigned lock, or a "struct rhash_head *" to be useful.
>> Variables of this type are most often called "bkt".
>>
>> Previously "pprev" would sometimes point to a bucket, and sometimes a
>> ->next pointer in an rhash_head. As these are now different types,
>> pprev is NULL when it would have pointed to the bucket. In that case,
>> 'blk' is used, together with correct locking protocol.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>
> This patch causes my qemu q800 boot test to crash reliably.
>
> Starting network: Unable to handle kernel access at virtual address (ptrval)
> Oops: 00000000
> Modules linked in:
> PC: [<00248b90>] sk_filter_trim_cap+0x56/0x158
> SR: 2000 SP: (ptrval) a2: 07a30aa0
> d0: 07836300 d1: 0783666c d2: 00000001 d3: 0025c192
> d4: 0025a636 d5: 00248b3a a0: 078363fe a1: 60000000
> Process ip (pid: 66, task=(ptrval))
> Frame format=7 eff addr=6000000c ssw=0505 faddr=6000000c
> wb 1 stat/addr/data: 0000 00000000 00000000
> wb 2 stat/addr/data: 0000 00000000 00000000
> wb 3 stat/addr/data: 0000 6000000c 00000000
> push data: 00000000 00000000 00000000 00000000
> Stack from 07a5bdec:
> 07a5be5c 0025c192 0025a636 00248b3a 00000000 ef9febc8 078363fe 0787a2d0
> 0783645c 07a5be5c 0025c192 0025a636 00248b3a 0787a2d0 07a2c000 0025c470
> 078363fe 0787a2d0 00000001 00000000 00000000 07a5be98 07a17500 00000000
> 0787a2d0 07a2c000 07a5bef8 07a5beac 7fffffff 0025c7e2 07a2c000 0787a2d0
> 00000000 00000000 00000000 0000000c ef9feb14 ef9feb14 00000000 0031a52e
> 07a5bef8 07a5bf28 0000000c 0781eba0 00000000 00000042 00000000 00000000
> Call Trace: [<0025c192>] netlink_attachskb+0x0/0x138
> [<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
> [<00248b3a>] sk_filter_trim_cap+0x0/0x158
> [<0025c192>] netlink_attachskb+0x0/0x138
> [<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
> [<00248b3a>] sk_filter_trim_cap+0x0/0x158
> [<0025c470>] netlink_unicast+0x170/0x1be
> [<0025c7e2>] netlink_sendmsg+0x288/0x2b2
> [<0021a5be>] sock_sendmsg+0x1c/0x44
> [<0021b8a6>] __sys_sendto+0xac/0xd2
> [<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
> [<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
> [<00219906>] sock_alloc_file+0x50/0x80
> [<000b7a1c>] fd_install+0x12/0x18
> [<0021b1cc>] __sys_socket+0x7e/0x9c
> [<0021b8ea>] sys_sendto+0x1e/0x24
> [<00002a40>] syscall+0x8/0xc
> [<0000c004>] ATANTBL+0x618/0x800
> Code: 4a89 6604 4280 60ea 2c2b 000c 2748 000c <2869> 000c 082c 0003 0002 6728 4878 0014 7620 4873 3800 486e ffec 4eb9 002e 5b88
Thanks for testing and for the report.
The above code disassembles to:
0: 4a89 tstl %a1
2: 6604 bnes 0x8
4: 4280 clrl %d0
6: 60ea bras 0xfffffff2
8: 2c2b 000c movel %a3@(12),%d6
c: 2748 000c movel %a0,%a3@(12)
10:* 2869 000c moveal %a1@(12),%a4 <-- trapping instruction
14: 082c 0003 0002 btst #3,%a4@(2)
1a: 6728 beqs 0x44
1c: 4878 0014 pea 0x14
20: 7620 moveq #32,%d3
22: 4873 3800 pea %a3@(0000000000000000,%d3:l)
26: 486e ffec pea %fp@(-20)
2a: 4eb9 002e 5b88 jsr 0x2e5b88
And as %a1 is 60000000, it crashes.
I'm not familiar with m68k assembler, but a bit of hunting suggests that
moveal %a1@(12),%a4
means "add 12 to %1, load an address from there, then dereference that
address again.
I think that both skb->sk and filter->prog are at offsets of 12, so I
guess
8: 2c2b 000c movel %a3@(12),%d6
is
struct sock *save_sk = skb->sk;
c: 2748 000c movel %a0,%a3@(12)
is
skb->sk = sk;
and
10:* 2869 000c moveal %a1@(12),%a4 <-- trapping instruction
14: 082c 0003 0002 btst #3,%a4@(2)
is
bpf_prog_run_save_cb(filter->prog, skb);
if (unlikely(prog->cb_access)) {
cb_access might be bit 3 of the byte 2 along from *prog, which is 12
along from a1.
That makes a1 'filter', loaded from sk->sk_filter, where 'sk' is %a0,
078363fe
That address is 2-byte aligned, which is probably wrong.... If addresses
of structs aren't always 4-byte aligned, then using BIT(1) for a lock
bit isn't going to work.
.... and after googling a bit I see that 68000 require 2-byte alignment,
but not 4-byte. Oh..
That means there aren't two spare bits in an address, so I cannot use
one for the NULLS and one for a lock bit. Bother.
I might be able to find a different way forward, but for now I think we
need to drop this series.
Thanks,
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2019-04-11 0:48 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-01 23:07 [PATCH 0/4 v2] Convert rhashtable to use bitlocks NeilBrown
2019-04-01 23:07 ` [PATCH 4/4] rhashtable: add lockdep tracking to bucket bit-spin-locks NeilBrown
2019-04-01 23:07 ` [PATCH 1/4] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
2019-04-08 2:34 ` Herbert Xu
2019-04-01 23:07 ` [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket NeilBrown
2019-04-02 10:11 ` David Laight
2019-04-02 21:10 ` NeilBrown
2019-04-03 9:26 ` David Laight
2019-04-04 0:13 ` NeilBrown
2019-04-08 2:34 ` Herbert Xu
2019-04-10 19:34 ` Guenter Roeck
2019-04-11 0:48 ` NeilBrown [this message]
2019-04-11 2:15 ` David Miller
2019-04-11 6:13 ` NeilBrown
2019-04-11 6:40 ` NeilBrown
2019-04-11 12:44 ` Guenter Roeck
2019-04-01 23:07 ` [PATCH 2/4] rhashtable: allow rht_bucket_var to return NULL NeilBrown
2019-04-08 2:12 ` [PATCH 0/4 v2] Convert rhashtable to use bitlocks David Miller
2019-04-08 2:35 ` Herbert Xu
-- strict thread matches above, loose matches on Subject: below --
2019-03-25 5:05 [PATCH 0/4] " NeilBrown
2019-03-25 5:05 ` [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket NeilBrown
2019-03-26 5:03 ` Herbert Xu
2019-03-26 15:36 ` Paul E. McKenney
2019-03-27 3:45 ` Herbert Xu
2019-03-26 22:35 ` NeilBrown
2019-03-27 3:45 ` Herbert Xu
2019-03-27 15:04 ` Paul E. McKenney
2019-03-26 5:27 ` Herbert Xu
2019-03-26 22:40 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r2a9xt79.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=herbert@gondor.apana.org.au \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@roeck-us.net \
--cc=netdev@vger.kernel.org \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.