From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org
Subject: [Bug 106132] New: bar.sync encoding incorrect for GM107
Date: Thu, 19 Apr 2018 06:39:11 +0000
Message-ID:
Bug ID: 106132
Summary: bar.sync encoding incorrect for GM107
Product: Mesa
Version: git
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/DRI/nouveau
Assignee: nouveau@lists.freedesktop.org
Reporter: vriestj@gmail.com
QA Contact: nouveau@lists.freedesktop.org
[ Quoted text copied from
https://lists.freedesktop.org/archives/mesa-dev/2016-March/108926.html ]
I've been experimenting with bar.sync in PTX, JIT-compiling it for GM107 (my
Quadro M1200 card), and disassembling the result with cuobjdump -sass.
I looked at nv50_ir_emit_gm107.cpp to understand the instruction encoding.
> [Mesa-dev] [PATCH] gm107/ir: add emission for BAR
> Samuel Pitoiset samuel.pitoiset at gmail.com
> Tue Mar 1 17:44:42 UTC 2016
>
> + // barrier id
> + if (insn->src(0).getFile() == FILE_GPR) {
> + emitGPR(0x08, insn->src(0));
> + } else {
> + ImmediateValue *imm = insn->getSrc(0)->asImm();
> + assert(imm);
> + emitField(0x08, 8, imm->reg.data.u32);
> + emitField(0x2b, 1, 1);
> + }
> +
> + // thread count
> + if (insn->src(1).getFile() == FILE_GPR) {
> + emitGPR(0x14, insn->src(1));
> + } else {
> + ImmediateValue *imm = insn->getSrc(0)->asImm();
This should probably be using getSrc(1), since this branch handles the thread
count (source 1)?
> + assert(imm);
> + emitField(0x14, 12, imm->reg.data.u32);
> + emitField(0x2c, 1, 1);
> + }
From these examples:
...
x x
BAR.SYNC 0x0; /* 0xf0a81b8000070000 */
BAR.SYNC 0xf; /* 0xf0a81b8000070f00 */
BAR.SYNC R0; /* 0xf0a80b8000070000 */
BAR.SYNC R2; /* 0xf0a80b8000070200 */
...
I derive these offsets:
- barrier id : 0x08
- barrier id imm vs reg: 0x2c
and from these examples:
...
x x
BAR.SYNC 0x0; /* 0xf0a81b8000070000 */
BAR.SYNC 0x0, R2; /* 0xf0a8138000270000 */
...
I derive these offsets:
- thread count : 0x14
- thread count imm vs reg: 0x2b
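The offsets in both lists can be cross-checked mechanically by XOR-ing pairs of the disassembled encodings and reading off which bits changed. The following is only a minimal sketch of that check: the helper names differing_bits and extract_field are hypothetical and are not part of Mesa or cuobjdump, and the field widths passed to extract_field (8 and 12 bits) are taken from the quoted patch rather than confirmed independently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the positions (0 = least significant bit)
// of every bit that differs between two 64-bit instruction words.
std::vector<int> differing_bits(std::uint64_t a, std::uint64_t b) {
    std::vector<int> bits;
    const std::uint64_t diff = a ^ b;
    for (int pos = 0; pos < 64; ++pos) {
        if ((diff >> pos) & 1u)
            bits.push_back(pos);
    }
    return bits;
}

// Hypothetical helper: extract `width` bits starting at bit `offset`.
std::uint64_t extract_field(std::uint64_t word, int offset, int width) {
    assert(offset >= 0 && width > 0 && offset + width <= 64);
    const std::uint64_t mask =
        (width == 64) ? ~0ull : ((1ull << width) - 1);
    return (word >> offset) & mask;
}
```

Comparing "BAR.SYNC 0x0" with "BAR.SYNC R0" changes only bit 44 (0x2c), and comparing "BAR.SYNC 0x0" with "BAR.SYNC 0x0, R2" changes bit 43 (0x2b) plus bit 21, the latter being the register number 2 placed at offset 0x14.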
But when looking at the code snippet above, the roles seem reversed: 0x2b is
used for barrier id, and 0x2c is used for the thread count.
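As a sanity check on the derived layout, here is a small standalone encoder sketch that uses 0x2c as the barrier id immediate flag and 0x2b as the thread count immediate flag. It is not the Mesa emitter: BarOperand, encode_bar_sync and kBarSyncBase are hypothetical names, the base word is an assumption obtained by clearing both flag bits and both operand fields in the observed encodings, and it assumes that a BAR.SYNC without an explicit thread count is encoded as an immediate count of 0.

```cpp
#include <cassert>
#include <cstdint>

// Assumed opcode bits with both immediate flags and both operand fields
// cleared (derived by masking the cuobjdump encodings, not from any spec).
constexpr std::uint64_t kBarSyncBase = 0xf0a8038000070000ull;

// Hypothetical operand description: either an immediate or a register index.
struct BarOperand {
    bool is_imm;
    std::uint32_t value;
};

std::uint64_t encode_bar_sync(BarOperand id, BarOperand count) {
    // Widths follow the quoted patch: 8 bits for the barrier id and for
    // register numbers, 12 bits for an immediate thread count.
    assert(id.value <= 0xffu);
    assert(count.value <= (count.is_imm ? 0xfffu : 0xffu));

    std::uint64_t word = kBarSyncBase;
    word |= static_cast<std::uint64_t>(id.value) << 0x08;     // barrier id
    if (id.is_imm)
        word |= 1ull << 0x2c;                                 // id is immediate
    word |= static_cast<std::uint64_t>(count.value) << 0x14;  // thread count
    if (count.is_imm)
        word |= 1ull << 0x2b;                                 // count is immediate
    return word;
}
```

With this flag assignment the sketch reproduces all five encodings listed in the examples; swapping 0x2b and 0x2c, as the patch does, would not.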