From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46103)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <imammedo@redhat.com>) id 1bXTh0-00039m-8H
	for qemu-devel@nongnu.org; Wed, 10 Aug 2016 09:37:12 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <imammedo@redhat.com>) id 1bXTgv-0001co-OU
	for qemu-devel@nongnu.org; Wed, 10 Aug 2016 09:37:09 -0400
Received: from mx1.redhat.com ([209.132.183.28]:40930)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <imammedo@redhat.com>) id 1bXTgv-0001cQ-Fb
	for qemu-devel@nongnu.org; Wed, 10 Aug 2016 09:37:05 -0400
Date: Wed, 10 Aug 2016 15:36:59 +0200
From: Igor Mammedov <imammedo@redhat.com>
Message-ID: <20160810153659.4b0172bd@nial.brq.redhat.com>
In-Reply-To: <1465568813-19771-15-git-send-email-rth@twiddle.net>
References: <1465568813-19771-1-git-send-email-rth@twiddle.net>
	<1465568813-19771-15-git-send-email-rth@twiddle.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PULL 14/15] tb hash: track translated blocks with
 qht
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org, peter.maydell@linaro.org, "Emilio G. Cota" <cota@braap.org>, Paolo Bonzini <pbonzini@redhat.com>, Peter Crosthwaite <crosthwaite.peter@gmail.com>, MTTCG Devel <mttcg@greensocs.com>

On Fri, 10 Jun 2016 07:26:52 -0700
Richard Henderson <rth@twiddle.net> wrote:

This patch make SIGSEGVs QEMU when debugging KVM guest with gdb

Steps to reproduce:

Seabios:
  clone and build current master with
     defconfig plus CONFIG_RELOCATE_INIT=3Dn on top of it

QEMU:
  ./configure --enable-debug --target-list=3Dx86_64-softmmu
  ./x86_64-softmmu/qemu-system-x86_64 -enable-kvm  -bios ../seabios/out/bio=
s.bin -s -S

GDB:
  gdb ../seabios/out/rom.o
  (gdb) b smp_setup
  (gdb) target remote localhost:1234
  (gdb) c
  Continuing.
  Remote connection closed


QEMU's backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555c19c6f in qht_reset_size (ht=3D0x555556240058 <tcg_ctx+216>, n_=
elems=3D0x8000) at util/qht.c:422
422	    if (n_buckets !=3D map->n_buckets) {
(gdb) bt
#0  0x0000555555c19c6f in qht_reset_size (ht=3D0x555556240058 <tcg_ctx+216>=
, n_elems=3D0x8000) at util/qht.c:422
#1  0x0000555555754bdd in tb_flush (cpu=3D0x0) at /home/imammedo/builds/qem=
u/translate-all.c:855
#2  0x000055555579026e in gdb_vm_state_change (opaque=3D0x0, running=3D0x0,=
 state=3DRUN_STATE_DEBUG) at /home/imammedo/builds/qemu/gdbstub.c:1276
#3  0x00005555558be59f in vm_state_notify (running=3D0x0, state=3DRUN_STATE=
_DEBUG) at vl.c:1585
#4  0x000055555578137e in do_vm_stop (state=3DRUN_STATE_DEBUG) at /home/ima=
mmedo/builds/qemu/cpus.c:743
#5  0x0000555555782b5e in vm_stop (state=3DRUN_STATE_DEBUG) at /home/imamme=
do/builds/qemu/cpus.c:1476
#6  0x00005555558bebdf in main_loop_should_exit () at vl.c:1856
#7  0x00005555558bed57 in main_loop () at vl.c:1912
#8  0x00005555558c678a in main (argc=3D0x6, argv=3D0x7fffffffe0c8, envp=3D0=
x7fffffffe100) at vl.c:4603


> From: "Emilio G. Cota" <cota@braap.org>
>=20
> Having a fixed-size hash table for keeping track of all translation blocks
> is suboptimal: some workloads are just too big or too small to get maximum
> performance from the hash table. The MRU promotion policy helps improve
> performance when the hash table is a little undersized, but it cannot
> make up for severely undersized hash tables.
>=20
> Furthermore, frequent MRU promotions result in writes that are a scalabil=
ity
> bottleneck. For scalability, lookups should only perform reads, not write=
s.
> This is not a big deal for now, but it will become one once MTTCG matures.
>=20
> The appended fixes these issues by using qht as the implementation of
> the TB hash table. This solution is superior to other alternatives consid=
ered,
> namely:
>=20
> - master: implementation in QEMU before this patchset
> - xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
> - xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
>               MRU is implemented here by adding an intermediate struct
>               that contains the u32 hash and a pointer to the TB; this
>               allows us, on an MRU promotion, to copy said struct (that i=
s not
>               at the head), and put this new copy at the head. After a gr=
ace
>               period, the original non-head struct can be eliminated, and
>               after another grace period, freed.
> - qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
>                    no MRU for lookups; MRU for inserts.
> The appended solution is the following:
> - qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
>                  no MRU for lookups; MRU for inserts.
>=20
> The plots below compare the considered solutions. The Y axis shows the
> boot time (in seconds) of a debian jessie image with arm-softmmu; the X a=
xis
> sweeps the number of buckets (or initial number of buckets for qht-autore=
size).
> The plots in PNG format (and with errorbars) can be seen here:
>   http://imgur.com/a/Awgnq
>=20
> Each test runs 5 times, and the entire QEMU process is pinned to a
> single core for repeatability of results.
>=20
>                             Host: Intel Xeon E5-2690
>=20
>   28 ++------------+-------------+-------------+-------------+-----------=
-++
>      A*****        +             +             +             master **A**=
* +
>   27 ++    *                                                 xxhash ##B##=
#++
>      |      A******A******                               xxhash-rcu $$C$$=
$ |
>   26 C$$                  A******A******            qht-fixed-nomru*%%D%%=
%++
>      D%%$$                              A******A******A*qht-dyn-mru A*E**=
**A
>   25 ++ %%$$                                          qht-dyn-nomru &&F&&=
&++
>      B#####%                                                             =
  |
>   24 ++    #C$$$$$                                                       =
 ++
>      |      B###  $                                                      =
  |
>      |          ## C$$$$$$                                               =
  |
>   23 ++           #       C$$$$$$                                        =
 ++
>      |             B######       C$$$$$$                                %=
%%D
>   22 ++                  %B######       C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$=
$$C
>      |                    D%%%%%%B######      @E@@@@@@    %%%D%%%@@@E@@@@=
@@E
>   21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B####=
##B
>      +             E@@@   F&&&   +      E@     +      F&&&   +           =
  +
>   20 ++------------+-------------+-------------+-------------+-----------=
-++
>      14            16            18            20            22          =
  24
>                              log2 number of buckets
>=20
>                                  Host: Intel i7-4790K
>=20
>   14.5 ++------------+------------+-------------+------------+-----------=
-++
>        A**           +            +             +            master **A**=
* +
>     14 ++ **                                                 xxhash ##B##=
#++
>   13.5 ++   **                                           xxhash-rcu $$C$$=
$++
>        |                                            qht-fixed-nomru %%D%%=
% |
>     13 ++     A******                                   qht-dyn-mru @@E@@=
@++
>        |             A*****A******A******             qht-dyn-nomru &&F&&=
& |
>   12.5 C$$                               A******A******A*****A******    *=
**A
>     12 ++ $$                                                        A*** =
 ++
>        D%%% $$                                                           =
  |
>   11.5 ++  %%                                                            =
 ++
>        B###  %C$$$$$$                                                    =
  |
>     11 ++  ## D%%%%% C$$$$$                                              =
 ++
>        |     #      %      C$$$$$$                                       =
  |
>   10.5 F&&&&&&B######D%%%%%       C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$    $=
$$C
>     10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B####=
##B
>        +             F&&          D%%%%%%B######B######B#####B###@@@D%%% =
  +
>    9.5 ++------------+------------+-------------+------------+-----------=
-++
>        14            16           18            20           22          =
  24
>                               log2 number of buckets
>=20
> Note that the original point before this patch series is X=3D15 for "mast=
er";
> the little sensitivity to the increased number of buckets is due to the
> poor hashing function in master.
>=20
> xxhash-rcu has significant overhead due to the constant churn of allocati=
ng
> and deallocating intermediate structs for implementing MRU. An alternative
> would be do consider failed lookups as "maybe not there", and then
> acquire the external lock (tb_lock in this case) to really confirm that
> there was indeed a failed lookup. This, however, would not be enough
> to implement dynamic resizing--this is more complex: see
> "Resizable, Scalable, Concurrent Hash Tables via Relativistic
> Programming" by Triplett, McKenney and Walpole. This solution was
> discarded due to the very coarse RCU read critical sections that we have
> in MTTCG; resizing requires waiting for readers after every pointer updat=
e,
> and resizes require many pointer updates, so this would quickly become
> prohibitive.
>=20
> qht-fixed-nomru shows that MRU promotion is advisable for undersized
> hash tables.
>=20
> However, qht-dyn-mru shows that MRU promotion is not important if the
> hash table is properly sized: there is virtually no difference in
> performance between qht-dyn-nomru and qht-dyn-mru.
>=20
> Before this patch, we're at X=3D15 on "xxhash"; after this patch, we're at
> X=3D15 @ qht-dyn-nomru. This patch thus matches the best performance that=
 we
> can achieve with optimum sizing of the hash table, while keeping the hash
> table scalable for readers.
>=20
> The improvement we get before and after this patch for booting debian jes=
sie
> with arm-softmmu is:
>=20
> - Intel Xeon E5-2690: 10.5% less time
> - Intel i7-4790K: 5.2% less time
>=20
> We could get this same improvement _for this particular workload_ by
> statically increasing the size of the hash table. But this would hurt
> workloads that do not need a large hash table. The dynamic (upward)
> resizing allows us to start small and enlarge the hash table as needed.
>=20
> A quick note on downsizing: the table is resized back to 2**15 buckets
> on every tb_flush; this makes sense because it is not guaranteed that the
> table will reach the same number of TBs later on (e.g. most bootup code is
> thrown away after boot); it makes sense to grow the hash table as
> more code blocks are translated. This also avoids the complication of
> having to build downsizing hysteresis logic into qht.
>=20
> Reviewed-by: Sergey Fedorov <serge.fedorov@linaro.org>
> Reviewed-by: Alex Benn=C3=A9e <alex.bennee@linaro.org>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Message-Id: <1465412133-3029-15-git-send-email-cota@braap.org>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  cpu-exec.c                | 86 +++++++++++++++++++++++------------------=
------
>  include/exec/exec-all.h   |  2 --
>  include/exec/tb-context.h |  7 ++--
>  include/exec/tb-hash.h    |  3 +-
>  translate-all.c           | 85 +++++++++++++++++++++--------------------=
-----
>  5 files changed, 86 insertions(+), 97 deletions(-)
>=20
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b9e294c..b840e1d 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -225,57 +225,57 @@ static void cpu_exec_nocache(CPUState *cpu, int max=
_cycles,
>  }
>  #endif
> =20
> +struct tb_desc {
> +    target_ulong pc;
> +    target_ulong cs_base;
> +    CPUArchState *env;
> +    tb_page_addr_t phys_page1;
> +    uint32_t flags;
> +};
> +
> +static bool tb_cmp(const void *p, const void *d)
> +{
> +    const TranslationBlock *tb =3D p;
> +    const struct tb_desc *desc =3D d;
> +
> +    if (tb->pc =3D=3D desc->pc &&
> +        tb->page_addr[0] =3D=3D desc->phys_page1 &&
> +        tb->cs_base =3D=3D desc->cs_base &&
> +        tb->flags =3D=3D desc->flags) {
> +        /* check next page if needed */
> +        if (tb->page_addr[1] =3D=3D -1) {
> +            return true;
> +        } else {
> +            tb_page_addr_t phys_page2;
> +            target_ulong virt_page2;
> +
> +            virt_page2 =3D (desc->pc & TARGET_PAGE_MASK) + TARGET_PAGE_S=
IZE;
> +            phys_page2 =3D get_page_addr_code(desc->env, virt_page2);
> +            if (tb->page_addr[1] =3D=3D phys_page2) {
> +                return true;
> +            }
> +        }
> +    }
> +    return false;
> +}
> +
>  static TranslationBlock *tb_find_physical(CPUState *cpu,
>                                            target_ulong pc,
>                                            target_ulong cs_base,
>                                            uint32_t flags)
>  {
> -    CPUArchState *env =3D (CPUArchState *)cpu->env_ptr;
> -    TranslationBlock *tb, **tb_hash_head, **ptb1;
> +    tb_page_addr_t phys_pc;
> +    struct tb_desc desc;
>      uint32_t h;
> -    tb_page_addr_t phys_pc, phys_page1;
> =20
> -    /* find translated block using physical mappings */
> -    phys_pc =3D get_page_addr_code(env, pc);
> -    phys_page1 =3D phys_pc & TARGET_PAGE_MASK;
> +    desc.env =3D (CPUArchState *)cpu->env_ptr;
> +    desc.cs_base =3D cs_base;
> +    desc.flags =3D flags;
> +    desc.pc =3D pc;
> +    phys_pc =3D get_page_addr_code(desc.env, pc);
> +    desc.phys_page1 =3D phys_pc & TARGET_PAGE_MASK;
>      h =3D tb_hash_func(phys_pc, pc, flags);
> -
> -    /* Start at head of the hash entry */
> -    ptb1 =3D tb_hash_head =3D &tcg_ctx.tb_ctx.tb_phys_hash[h];
> -    tb =3D *ptb1;
> -
> -    while (tb) {
> -        if (tb->pc =3D=3D pc &&
> -            tb->page_addr[0] =3D=3D phys_page1 &&
> -            tb->cs_base =3D=3D cs_base &&
> -            tb->flags =3D=3D flags) {
> -
> -            if (tb->page_addr[1] =3D=3D -1) {
> -                /* done, we have a match */
> -                break;
> -            } else {
> -                /* check next page if needed */
> -                target_ulong virt_page2 =3D (pc & TARGET_PAGE_MASK) +
> -                                          TARGET_PAGE_SIZE;
> -                tb_page_addr_t phys_page2 =3D get_page_addr_code(env, vi=
rt_page2);
> -
> -                if (tb->page_addr[1] =3D=3D phys_page2) {
> -                    break;
> -                }
> -            }
> -        }
> -
> -        ptb1 =3D &tb->phys_hash_next;
> -        tb =3D *ptb1;
> -    }
> -
> -    if (tb) {
> -        /* Move the TB to the head of the list */
> -        *ptb1 =3D tb->phys_hash_next;
> -        tb->phys_hash_next =3D *tb_hash_head;
> -        *tb_hash_head =3D tb;
> -    }
> -    return tb;
> +    return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
>  }
> =20
>  static TranslationBlock *tb_find_slow(CPUState *cpu,
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e076397..c1f59fa 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -215,8 +215,6 @@ struct TranslationBlock {
> =20
>      void *tc_ptr;    /* pointer to the translated code */
>      uint8_t *tc_search;  /* pointer to search data */
> -    /* next matching tb for physical address. */
> -    struct TranslationBlock *phys_hash_next;
>      /* original tb when cflags has CF_NOCACHE */
>      struct TranslationBlock *orig_tb;
>      /* first and second physical page containing code. The lower bit
> diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
> index 5efe3d9..e209c1c 100644
> --- a/include/exec/tb-context.h
> +++ b/include/exec/tb-context.h
> @@ -21,9 +21,10 @@
>  #define QEMU_TB_CONTEXT_H_
> =20
>  #include "qemu/thread.h"
> +#include "qemu/qht.h"
> =20
> -#define CODE_GEN_PHYS_HASH_BITS     15
> -#define CODE_GEN_PHYS_HASH_SIZE     (1 << CODE_GEN_PHYS_HASH_BITS)
> +#define CODE_GEN_HTABLE_BITS     15
> +#define CODE_GEN_HTABLE_SIZE     (1 << CODE_GEN_HTABLE_BITS)
> =20
>  typedef struct TranslationBlock TranslationBlock;
>  typedef struct TBContext TBContext;
> @@ -31,7 +32,7 @@ typedef struct TBContext TBContext;
>  struct TBContext {
> =20
>      TranslationBlock *tbs;
> -    TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
> +    struct qht htable;
>      int nb_tbs;
>      /* any access to the tbs or the page table must use this lock */
>      QemuMutex tb_lock;
> diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
> index 88ccfd1..1d0200b 100644
> --- a/include/exec/tb-hash.h
> +++ b/include/exec/tb-hash.h
> @@ -20,7 +20,6 @@
>  #ifndef EXEC_TB_HASH
>  #define EXEC_TB_HASH
> =20
> -#include "exec/exec-all.h"
>  #include "exec/tb-hash-xx.h"
> =20
>  /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
> @@ -49,7 +48,7 @@ static inline unsigned int tb_jmp_cache_hash_func(targe=
t_ulong pc)
>  static inline
>  uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t =
flags)
>  {
> -    return tb_hash_func5(phys_pc, pc, flags) & (CODE_GEN_PHYS_HASH_SIZE =
- 1);
> +    return tb_hash_func5(phys_pc, pc, flags);
>  }
> =20
>  #endif
> diff --git a/translate-all.c b/translate-all.c
> index d75737c..b620fcc 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -735,6 +735,13 @@ static inline void code_gen_alloc(size_t tb_size)
>      qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>  }
> =20
> +static void tb_htable_init(void)
> +{
> +    unsigned int mode =3D QHT_MODE_AUTO_RESIZE;
> +
> +    qht_init(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE, mode);
> +}
> +
>  /* Must be called before using the QEMU cpus. 'tb_size' is the size
>     (in bytes) allocated to the translation buffer. Zero means default
>     size. */
> @@ -742,6 +749,7 @@ void tcg_exec_init(unsigned long tb_size)
>  {
>      cpu_gen_init();
>      page_init();
> +    tb_htable_init();
>      code_gen_alloc(tb_size);
>  #if defined(CONFIG_SOFTMMU)
>      /* There's no guest base to take into account, so go ahead and
> @@ -846,7 +854,7 @@ void tb_flush(CPUState *cpu)
>          cpu->tb_flushed =3D true;
>      }
> =20
> -    memset(tcg_ctx.tb_ctx.tb_phys_hash, 0, sizeof(tcg_ctx.tb_ctx.tb_phys=
_hash));
> +    qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
>      page_flush_tb();
> =20
>      tcg_ctx.code_gen_ptr =3D tcg_ctx.code_gen_buffer;
> @@ -857,60 +865,46 @@ void tb_flush(CPUState *cpu)
> =20
>  #ifdef DEBUG_TB_CHECK
> =20
> -static void tb_invalidate_check(target_ulong address)
> +static void
> +do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *use=
rp)
>  {
> -    TranslationBlock *tb;
> -    int i;
> +    TranslationBlock *tb =3D p;
> +    target_ulong addr =3D *(target_ulong *)userp;
> =20
> -    address &=3D TARGET_PAGE_MASK;
> -    for (i =3D 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
> -        for (tb =3D tcg_ctx.tb_ctx.tb_phys_hash[i]; tb !=3D NULL;
> -             tb =3D tb->phys_hash_next) {
> -            if (!(address + TARGET_PAGE_SIZE <=3D tb->pc ||
> -                  address >=3D tb->pc + tb->size)) {
> -                printf("ERROR invalidate: address=3D" TARGET_FMT_lx
> -                       " PC=3D%08lx size=3D%04x\n",
> -                       address, (long)tb->pc, tb->size);
> -            }
> -        }
> +    if (!(addr + TARGET_PAGE_SIZE <=3D tb->pc || addr >=3D tb->pc + tb->=
size)) {
> +        printf("ERROR invalidate: address=3D" TARGET_FMT_lx
> +               " PC=3D%08lx size=3D%04x\n", addr, (long)tb->pc, tb->size=
);
>      }
>  }
> =20
> -/* verify that all the pages have correct rights for code */
> -static void tb_page_check(void)
> +static void tb_invalidate_check(target_ulong address)
>  {
> -    TranslationBlock *tb;
> -    int i, flags1, flags2;
> -
> -    for (i =3D 0; i < CODE_GEN_PHYS_HASH_SIZE; i++) {
> -        for (tb =3D tcg_ctx.tb_ctx.tb_phys_hash[i]; tb !=3D NULL;
> -                tb =3D tb->phys_hash_next) {
> -            flags1 =3D page_get_flags(tb->pc);
> -            flags2 =3D page_get_flags(tb->pc + tb->size - 1);
> -            if ((flags1 & PAGE_WRITE) || (flags2 & PAGE_WRITE)) {
> -                printf("ERROR page flags: PC=3D%08lx size=3D%04x f1=3D%x=
 f2=3D%x\n",
> -                       (long)tb->pc, tb->size, flags1, flags2);
> -            }
> -        }
> -    }
> +    address &=3D TARGET_PAGE_MASK;
> +    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_invalidate_check, &address);
>  }
> =20
> -#endif
> -
> -static inline void tb_hash_remove(TranslationBlock **ptb, TranslationBlo=
ck *tb)
> +static void
> +do_tb_page_check(struct qht *ht, void *p, uint32_t hash, void *userp)
>  {
> -    TranslationBlock *tb1;
> +    TranslationBlock *tb =3D p;
> +    int flags1, flags2;
> =20
> -    for (;;) {
> -        tb1 =3D *ptb;
> -        if (tb1 =3D=3D tb) {
> -            *ptb =3D tb1->phys_hash_next;
> -            break;
> -        }
> -        ptb =3D &tb1->phys_hash_next;
> +    flags1 =3D page_get_flags(tb->pc);
> +    flags2 =3D page_get_flags(tb->pc + tb->size - 1);
> +    if ((flags1 & PAGE_WRITE) || (flags2 & PAGE_WRITE)) {
> +        printf("ERROR page flags: PC=3D%08lx size=3D%04x f1=3D%x f2=3D%x=
\n",
> +               (long)tb->pc, tb->size, flags1, flags2);
>      }
>  }
> =20
> +/* verify that all the pages have correct rights for code */
> +static void tb_page_check(void)
> +{
> +    qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_page_check, NULL);
> +}
> +
> +#endif
> +
>  static inline void tb_page_remove(TranslationBlock **ptb, TranslationBlo=
ck *tb)
>  {
>      TranslationBlock *tb1;
> @@ -998,7 +992,7 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page=
_addr_t page_addr)
>      /* remove the TB from the hash list */
>      phys_pc =3D tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
>      h =3D tb_hash_func(phys_pc, tb->pc, tb->flags);
> -    tb_hash_remove(&tcg_ctx.tb_ctx.tb_phys_hash[h], tb);
> +    qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);
> =20
>      /* remove the TB from the page list */
>      if (tb->page_addr[0] !=3D page_addr) {
> @@ -1128,13 +1122,10 @@ static void tb_link_page(TranslationBlock *tb, tb=
_page_addr_t phys_pc,
>                           tb_page_addr_t phys_page2)
>  {
>      uint32_t h;
> -    TranslationBlock **ptb;
> =20
>      /* add in the hash table */
>      h =3D tb_hash_func(phys_pc, tb->pc, tb->flags);
> -    ptb =3D &tcg_ctx.tb_ctx.tb_phys_hash[h];
> -    tb->phys_hash_next =3D *ptb;
> -    *ptb =3D tb;
> +    qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
> =20
>      /* add in the page list */
>      tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);