netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Thomas Graf <tgraf@suug.ch>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket.
Date: Thu, 11 Apr 2019 10:48:42 +1000	[thread overview]
Message-ID: <87r2a9xt79.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20190410193418.GA32402@roeck-us.net>

[-- Attachment #1: Type: text/plain, Size: 6550 bytes --]

On Wed, Apr 10 2019, Guenter Roeck wrote:

> Hi,
>
> On Tue, Apr 02, 2019 at 10:07:45AM +1100, NeilBrown wrote:
>> This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
>> bucket pointer to lock the hash chain for that bucket.
>> 
>> The benefits of a bit spin_lock are:
>>  - no need to allocate a separate array of locks.
>>  - no need to have a configuration option to guide the
>>    choice of the size of this array
>>  - locking cost is often a single test-and-set in a cache line
>>    that will have to be loaded anyway.  When inserting at, or removing
>>    from, the head of the chain, the unlock is free - writing the new
>>    address in the bucket head implicitly clears the lock bit.
>>    For __rhashtable_insert_fast() we ensure this always happens
>>    when adding a new key.
>>  - even when lockings costs 2 updates (lock and unlock), they are
>>    in a cacheline that needs to be read anyway.
>> 
>> The cost of using a bit spin_lock is a little bit of code complexity,
>> which I think is quite manageable.
>> 
>> Bit spin_locks are sometimes inappropriate because they are not fair -
>> if multiple CPUs repeatedly contend of the same lock, one CPU can
>> easily be starved.  This is not a credible situation with rhashtable.
>> Multiple CPUs may want to repeatedly add or remove objects, but they
>> will typically do so at different buckets, so they will attempt to
>> acquire different locks.
>> 
>> As we have more bit-locks than we previously had spinlocks (by at
>> least a factor of two) we can expect slightly less contention to
>> go with the slightly better cache behavior and reduced memory
>> consumption.
>> 
>> To enhance type checking, a new struct is introduced to represent the
>>   pointer plus lock-bit
>> that is stored in the bucket-table.  This is "struct rhash_lock_head"
>> and is empty.  A pointer to this needs to be cast to either an
>> unsigned lock, or a "struct rhash_head *" to be useful.
>> Variables of this type are most often called "bkt".
>> 
>> Previously "pprev" would sometimes point to a bucket, and sometimes a
>> ->next pointer in an rhash_head.  As these are now different types,
>> pprev is NULL when it would have pointed to the bucket. In that case,
>> 'blk' is used, together with correct locking protocol.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>
> This patch causes my qemu q800 boot test to crash reliably.
>
> Starting network: Unable to handle kernel access at virtual address (ptrval)
> Oops: 00000000
> Modules linked in:
> PC: [<00248b90>] sk_filter_trim_cap+0x56/0x158
> SR: 2000  SP: (ptrval)  a2: 07a30aa0
> d0: 07836300    d1: 0783666c    d2: 00000001    d3: 0025c192
> d4: 0025a636    d5: 00248b3a    a0: 078363fe    a1: 60000000
> Process ip (pid: 66, task=(ptrval))
> Frame format=7 eff addr=6000000c ssw=0505 faddr=6000000c
> wb 1 stat/addr/data: 0000 00000000 00000000
> wb 2 stat/addr/data: 0000 00000000 00000000
> wb 3 stat/addr/data: 0000 6000000c 00000000
> push data: 00000000 00000000 00000000 00000000
> Stack from 07a5bdec:
>         07a5be5c 0025c192 0025a636 00248b3a 00000000 ef9febc8 078363fe 0787a2d0
>         0783645c 07a5be5c 0025c192 0025a636 00248b3a 0787a2d0 07a2c000 0025c470
>         078363fe 0787a2d0 00000001 00000000 00000000 07a5be98 07a17500 00000000
>         0787a2d0 07a2c000 07a5bef8 07a5beac 7fffffff 0025c7e2 07a2c000 0787a2d0
>         00000000 00000000 00000000 0000000c ef9feb14 ef9feb14 00000000 0031a52e
>         07a5bef8 07a5bf28 0000000c 0781eba0 00000000 00000042 00000000 00000000
> Call Trace: [<0025c192>] netlink_attachskb+0x0/0x138
>  [<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
>  [<00248b3a>] sk_filter_trim_cap+0x0/0x158
>  [<0025c192>] netlink_attachskb+0x0/0x138
>  [<0025a636>] __netlink_lookup.isra.3+0x0/0xbc
>  [<00248b3a>] sk_filter_trim_cap+0x0/0x158
>  [<0025c470>] netlink_unicast+0x170/0x1be
>  [<0025c7e2>] netlink_sendmsg+0x288/0x2b2
>  [<0021a5be>] sock_sendmsg+0x1c/0x44
>  [<0021b8a6>] __sys_sendto+0xac/0xd2
>  [<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
>  [<00100000>] ext4_read_block_bitmap_nowait+0x4d4/0x4ec
>  [<00219906>] sock_alloc_file+0x50/0x80
>  [<000b7a1c>] fd_install+0x12/0x18
>  [<0021b1cc>] __sys_socket+0x7e/0x9c
>  [<0021b8ea>] sys_sendto+0x1e/0x24
>  [<00002a40>] syscall+0x8/0xc
>  [<0000c004>] ATANTBL+0x618/0x800
> Code: 4a89 6604 4280 60ea 2c2b 000c 2748 000c <2869> 000c 082c 0003 0002 6728 4878 0014 7620 4873 3800 486e ffec 4eb9 002e 5b88

Thanks for testing and for the report.
The above code disassembles to:

   0:	4a89           	tstl %a1
   2:	6604           	bnes 0x8
   4:	4280           	clrl %d0
   6:	60ea           	bras 0xfffffff2
   8:	2c2b 000c      	movel %a3@(12),%d6
   c:	2748 000c      	movel %a0,%a3@(12)
  10:*	2869 000c      	moveal %a1@(12),%a4		<-- trapping instruction
  14:	082c 0003 0002 	btst #3,%a4@(2)
  1a:	6728           	beqs 0x44
  1c:	4878 0014      	pea 0x14
  20:	7620           	moveq #32,%d3
  22:	4873 3800      	pea %a3@(0000000000000000,%d3:l)
  26:	486e ffec      	pea %fp@(-20)
  2a:	4eb9 002e 5b88 	jsr 0x2e5b88

And as %a1 is 60000000, it crashes.
I'm not familiar with m68k assembler, but a bit of hunting suggests that
  moveal %a1@(12),%a4

means "add 12 to %1, load an address from there, then dereference that
address again.
I think that both skb->sk and filter->prog are at offsets of 12, so I
guess
   8:	2c2b 000c      	movel %a3@(12),%d6
is
   struct sock *save_sk = skb->sk;

   c:	2748 000c      	movel %a0,%a3@(12)
is
  		skb->sk = sk;
and
  10:*	2869 000c      	moveal %a1@(12),%a4		<-- trapping instruction
  14:	082c 0003 0002 	btst #3,%a4@(2)
is
     bpf_prog_run_save_cb(filter->prog, skb);
                if (unlikely(prog->cb_access)) {

cb_access might be bit 3 of the byte 2 along from *prog, which is 12
along from a1.

That makes a1 'filter', loaded from sk->sk_filter, where 'sk' is %a0,
  078363fe

That address is 2-byte aligned, which is probably wrong.... If addresses
of structs aren't always 4-byte aligned, then using BIT(1) for a lock
bit isn't going to work.

.... and after googling a bit I see that 68000 require 2-byte alignment,
but not 4-byte.  Oh..

That means there aren't two spare bits in an address, so I cannot use
one for the NULLS and one for a lock bit.  Bother.

I might be able to find a different way forward, but for now I think we
need to drop this series.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  reply	other threads:[~2019-04-11  0:48 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-01 23:07 [PATCH 0/4 v2] Convert rhashtable to use bitlocks NeilBrown
2019-04-01 23:07 ` [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket NeilBrown
2019-04-02 10:11   ` David Laight
2019-04-02 21:10     ` NeilBrown
2019-04-03  9:26       ` David Laight
2019-04-04  0:13         ` NeilBrown
2019-04-08  2:34   ` Herbert Xu
2019-04-10 19:34   ` Guenter Roeck
2019-04-11  0:48     ` NeilBrown [this message]
2019-04-11  2:15       ` David Miller
2019-04-11  6:13       ` NeilBrown
2019-04-11  6:40         ` NeilBrown
2019-04-11 12:44           ` Guenter Roeck
2019-04-01 23:07 ` [PATCH 4/4] rhashtable: add lockdep tracking to bucket bit-spin-locks NeilBrown
2019-04-01 23:07 ` [PATCH 2/4] rhashtable: allow rht_bucket_var to return NULL NeilBrown
2019-04-01 23:07 ` [PATCH 1/4] rhashtable: use cmpxchg() in nested_table_alloc() NeilBrown
2019-04-08  2:34   ` Herbert Xu
2019-04-08  2:12 ` [PATCH 0/4 v2] Convert rhashtable to use bitlocks David Miller
2019-04-08  2:35   ` Herbert Xu
  -- strict thread matches above, loose matches on Subject: below --
2019-03-25  5:05 [PATCH 0/4] " NeilBrown
2019-03-25  5:05 ` [PATCH 3/4] rhashtable: use bit_spin_locks to protect hash bucket NeilBrown
2019-03-26  5:03   ` Herbert Xu
2019-03-26 15:36     ` Paul E. McKenney
2019-03-27  3:45       ` Herbert Xu
2019-03-26 22:35     ` NeilBrown
2019-03-27  3:45       ` Herbert Xu
2019-03-27 15:04       ` Paul E. McKenney
2019-03-26  5:27   ` Herbert Xu
2019-03-26 22:40     ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r2a9xt79.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=netdev@vger.kernel.org \
    --cc=tgraf@suug.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).