[PATCH 0/3] RISC-V: Implement "generic subsets" for disassembler

Tsukasa OI research_trasio@irq.a4lg.com
Sat Jul 30 04:20:38 GMT 2022
This patch is postponed because I will merge this into a larger patchset
with general core disassembler improvements.

Thanks,
Tsukasa

On 2022/06/03 21:05, Tsukasa OI wrote:
> Hello,
> 
> I noticed another disassembler issue and this is a fix.
> 
> 
> As I submitted in my previous patchset:
> <https://sourceware.org/pipermail/binutils/2022-June/121159.html>,
> there is a problem while disassembling zext.h instruction on
> RV{32,64}_Zbb_Zbkb configuration.
> 
> On RV32, zext.h on Zbb is a specialized form of pack (Zbkb).
> On RV64, zext.h on Zbb is a specialized form of packw (Zbkb).
> 
> Note that, if Zbkb extension is disabled, zext.h is a single
> instruction with no "generalized" forms.
> 
> 
> When **both** Zbb and Zbkb extensions are enabled and a zext.h
> instruction is disassembled with `-M no-aliases', it should print
> a non-alias instruction (that is, either pack or packw).
> 
>     # Input source file
>     _start:
>         zext.h a0, a1   # Specialized form on Zbb
> 
>     # Expected (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
>     80000028 <_start>:
>     80000028:	0805c533          	pack	a0,a1,zero
> 
> However, it prints an alias.
> 
>     # Actual (RV32_Zbb_Zbkb, objdump -d -M no-aliases)
>     80000028 <_start>:
>     80000028:	0805c533          	zext.h	a0,a1
> 
> Note that, depending on -march, an "alias" is no longer an alias
> (as I noted earlier).
> 
>     # Expected/Actual (RV32_Zbb, objdump -d -M no-aliases)
>     80000028 <_start>:
>     80000028:	0805c533          	zext.h	a0,a1
> 
> 
> In general, this kind of issue occurs when:
> 
> 1.  There are two instructions
>     (one of them is a specialized form of another).
> 2.  Those requirements are different (separate) but can co-exist.
> 
> Because of 2., both instructions cannot be simple aliases (INSN_ALIAS
> cannot be used).  However on non-alias instructions, if a match is
> found, riscv_disassemble_insn thinks this is it and quits too early.
> 
> Because zext.h is declared before pack and packw, generalized forms are
> not matched for zext.h.
> 
> 
> 
> 
> As a solution, I propose a new concept: generic subsets.
> 
> For instructions with INSN_GENERICS, opcode matching cannot early-quit.
> Instead it searches an instruction with:
> 
> -   Longest mask (by default; when -M no-aliases is NOT specified)
> -   Shortest mask (when -M no-aliases is specified)
> 
> "Length" of the mask equals population count.  More one bits on the mask
> means more specialized instruction form.
> 
> It fixes disassembler on following instructions and configurations:
> 
> -   zext.h  (Zbb)       <--> pack       (Zbkb; RV32)
> -   zext.h  (Zbb)       <--> packw      (Zbkb; RV64)
> 
> Note that INSN_GENERICS (new flag in PATCH 2) must be set **both** on
> specialized and generic forms.  In the example above, those instructions
> require INSN_GENERICS:
> 
> -   zext.h (two XLEN-specific forms)
> -   pack
> -   packw
> 
> This concept can be used to following instruction pairs where the same
> issue can occur once non-ratified instruction is ratified.
> 
> -   orc.b   (Zbb)       <--> gorci      (NOT RATIFIED)
> -   brev8   (Zbkb)      <--> grevi      (NOT RATIFIED)
> -   zip     (Zbkb)      <--> shfli      (NOT RATIFIED)
> -   unzip   (Zbkb)      <--> unshfli    (NOT RATIFIED)
> -   rev8    (Zbb/Zbkb)  <--> grevi      (NOT RATIFIED)
> 
> 
> 
> 
> PATCH 1: Reorganize riscv_disassemble_insn function
> 
> riscv_disassemble_insn function (before PATCH 1) works like this:
> 
>     FOREACH (opcode)
>     {
>         if (!opcode.MATCH(insn))
>             continue;
>         //
>         // Print insn with matched opcode.
>         //
>         return insnlen;
>     }
>     //
>     // opcode not matched.
>     //
> 
> Because instruction printing code is in the opcode loop, this is not
> good to fix the problem.  This patch reorganizes the function like this:
> 
>     matched_opcode = NULL;
>     // Step 1: opcode matching
>     FOREACH (opcode)
>     {
>         if (!opcode.MATCH(insn))
>             continue;
>         matched_opcode = opcode;
>         break;
>     }
>     // Step 2: opcode printing
>     if (matched_opcode != NULL)
>     {
>         //
>         // Print insn with matched_opcode.
>         //
>         return insnlen;
>     }
>     //
>     // opcode not matched.
>     //
> 
> PATCH 1 splits opcode loop to two separate steps: opcode matching and
> printing.  With this, we can modify opcode matching step to implement
> "generic subsets".
> 
> 
> 
> 
> PATCH 2: Implement "generic subsets"
> 
> With this commit, riscv_disassemble_insn function works like this:
> 
>     int masklen;
>     matched_opcode = NULL;
>     // Step 1: opcode matching
>     FOREACH (opcode)
>     {
>         if (!opcode.MATCH(insn))
>             continue;
>         if (!(opcode.pinfo & INSN_GENERICS))
>         {
>             matched_opcode = opcode;
>             break;  // Early-quit as before (for non generic subsets)
>         }
>         // Handling of generic subsets
>         if (matched_opcode == NULL)
>         {
>             matched_opcode = opcode;
>             // Set corresponding masklen.
>         }
>         else if (no_aliases)
>         {
>             // Prefer opcode with shortest mask.
>             // Update match_opcode and masklen where necessary.
>         }
>         else
>         {
>             // Prefer opcode with longest mask.
>             // Update match_opcode and masklen where necessary.
>         }
>     }
>     // Step 2: opcode printing (continues...)
> 
> PATCH 2 also adds INSN_GENERICS flag to two zext.h forms and pack/packw
> instructions.
> 
> 
> 
> 
> PATCH 3 contains additional testcases.
> 
> 
> 
> 
> Thanks,
> Tsukasa
> 
> 
> 
> 
> Tsukasa OI (3):
>   RISC-V: Split disasm opcode matching and printing
>   RISC-V: Implement "generic subsets" for disasm
>   RISC-V: Add testcases disassembling zext.h
> 
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s   |  2 +
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d   | 11 +++
>  gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s   |  4 +
>  include/opcode/riscv.h                       |  4 +
>  opcodes/riscv-dis.c                          | 93 ++++++++++++--------
>  opcodes/riscv-opc.c                          |  8 +-
>  9 files changed, 114 insertions(+), 41 deletions(-)
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-1.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-2.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1-3.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-1.s
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.d
>  create mode 100644 gas/testsuite/gas/riscv/zbb-zext.h-dis-2.s
> 
> 
> base-commit: 625b6eae091709b95471eae92d42dc6bc71e6553


More information about the Binutils mailing list