This artifact contains the Tool InstrSem as well as code and detailed explanation to reproduce results in the paper which do not rely on special hardware.
Code and information for results relying on special hardware is also included but might require some additional tinkering depending on the setup.
To setup the artifact, you need to have docker installed.
Further, your user must be part of the docker group or all scripts must be run as root (not recommended).
To setup the docker container required for evaluation, you can use the setup_docker.sh script:
bash setup_docker.sh
The docker container relies on QEMU to run InstrSem on different architectures. No special hardware is necessary for the artifact evaluation.
If you want to evaluate case-studies of the paper that rely on specific hardware and you have this hardware available refer to the Other Case Studies section.
This artifact includes everything required to reverse-engineer GhostWrite, the undocumented Apple Mul53 extension, and undocumented P550 instructions if you have the required hardware.
We will start with a basic demonstration of InstrSem on different instructions of different ISAs.
For this, you can run the script basic_demonstration.py after setting up the docker container:
python3 basic_demonstration.py
The expected output is contained in results/basic_demonstration.txt.
Note that the actual output might slightly differ (e.g., order of registers for the encoding may be different).
The script uses InstrSem to find the semantics of Aarch64, RISC-V, Loongarch64, and x86-64 instructions. For each instruction, the mnemonic will be printed, followed by the command used to reverse it with InstrSem. Then, once InstrSem has terminated, the recovered semantics and information how registers and immediates are encoded are printed.
For instance, the following output is given for the Aarch64 add x1, x2, x3 instruction:
=== add x1, x2, x3 ===
docker run --rm -i -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py aarch64-socket 0x8b030041 0 32 2 500
x2 -> gpr_a: x2: 5|6|7|8|9
x1 -> gpr_b: x1: 0|1|2|3|4
x3 -> gpr_c: x3: 16|17|18|19|20
variable bits: [5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 16, 17, 18, 19, 20]
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0
reg_a 4 3 2 1 0
reg_b 4 3 2 1 0
reg_c 4 3 2 1 0
pc = (pc + 0x4)
pstate = 0x0
gpr_b = (gpr_a + gpr_c)
We can see that InstrSem figured out that register x2 is encoded using bits 5-9 of the instruction.
Similar, registers x1 and x3 are encoded using instruction bits 0-4 and 16-20 respectively.
All these instruction bits can be changed and the encoding semantics are still known.
Thus, these bits are thus listed as variable bits.
Below, InstrSem visualizes the encoding of the instruction by first displaying fixed bits (bits 10-15 and 21-31). The fixed bits are followed by bits used to encode registers (reg_a using bits 5-9, reg_b using bits 0-5, and reg_c using bits 16-20).
Then, the generalized semantics follows.
This instruction increases the pc by 4 (pc = (pc + 0x4)).
pstate = 0x0 is an artifact of our Aarch64 runner and can be ignored.
Lastly gpr_b = (gpr_a + gpr_c) describes that the general purpose register that is encoded using the bits of reg_b is obtained by adding two other general purpose registers.
This basic demonstration should verify the following claims of the paper:
- InstrSem is generic and can recover and generalize instruction semantics across ISAs
- InstrSem can uncover the semantics of undocumented Loongarch64 vector instructions (the last Loongarch64 instruction is a SIMD vector addition)
To replicate the results of our evaluation on the RISC-V integer instructions (Table 3 in the paper), you can run the eval_riscv64.py script.
The script will output runtime relevant information (semantics of currently reversed instruction etc.) in stderr and create the table in markdown format in stdout.
We recommend running it the following way:
python3 -u eval_riscv64.py > table_rv64.md
It should output the semantics of instructions to stderr.
Once it is done executing, the table in table_rv64.md should roughly match Table 3 in the paper (also in results/eval_riscv64.md).
The column indicating success should be equivalent.
The column indicating runtime should be similar.
Because of some optimizations that were missing during our evaluation in the paper and differences in hardware, slightly different runtimes are possible.
However, we do not expect the runtimes to be significantly higher than the table in the paper unless you are runnning a weak CPU (the table in the paper was created using a Ryzen 7 5700U).
The script produces a table which contains one row for each tested instruction.
For all 64-bit RISC-V Integer instructions (except jalr, more on that later), we supply exactly one encoding to InstrSem and expect that it can recover and generalize the full semantics.
For jalr two candidates (one with odd and one with even immediate) are required for InstrSem to cover the whole semantics (for each, one bit of the immediate is not generalized).
The table contains the following columns:
Encoding: The encoding that was supplied to InstrSemWorks: Whether correcct semantics were obtained and generalizedTime: Runtime of InstrSemBits: Amount of bits in the instruction encoding that are variable, i.e., encode a register or immediateDepth: Parameter passed to InstrSem to limit the maximum size of syntax trees for semantics recoverySamples: Parameter passed to InstrSem to decide how many input-output samples are collected
- InstrSem can recover 34 of 38 64-bit RISC-V integer instructions
In the paper, we show that InstrSem is also applicable to complex ISAs like x86 by building a very basic x86-64 backend.
We use it on the 20 most common mnemonics we encounter in the libc.
To run the same evaluation, you can use the eval_x86-64.py script:
python3 -u eval_x86-64.py > table_x86-64.md
We expect the recovered table to match the table in results/eval_x86-64.md apart from minor runtime differences.
The table is created like the one for eval_riscv64.py and contains the same information.
However, the Bits column of the table is not correct for failing instructions.
- InstrSem is also applicable to complex ISAs like x86-64
- More concretly: InstrSem correctly recovers the semantics of 13 of 20 x86-64 instructions most commonly found in libc with a minimal x86-64 backend
In the paper, we use InstrSem on the full loongarch64 encoding space on QEMU and on hardware (Loongson 3A5000 CPU).
This requires first creating a bitmap of all instructions that exist.
As this bitmap is slow to create and compresses well, it is included with the artifact.
Then, a random instruction from the encoding space is chosen and supplied to InstrSem.
If InstrSem is able to reverse-engineer and generalize it, all bits covered by the instruction encoding are cleared in the bitmap and the result is written to instr_reverser/out in a file named after the instruction encoding.
You can run this tool using:
python3 eval_loongarch64.py
We encourage you to run this evaluation for 10h but shorter or longer runs are also possible. When you are done, you can use CTRL+C to terminate the evaluation. This might still lead to an active docker container which you can kill. You can also interrupt the evaluation using CTRL+C (only shortly press it to raise a single KeyboardInterrupt). Then when you later start the skript again, it will continue instead of starting from scratch.
We expect the tool to create a multitude of files in the out directory.
Each file is named after the encoding that was supplied to InstrSem and contains the recovered instruction semantics.
We further expect that multiple instruction semantics are reported in stdout.
Additionally, whenever an instruction is found, the script prints the runtime (in seconds) and total number of recovered encodings to stdout.
If you run the evaluation for 10 hours, we expect at least 600 million instruction encodings to be covert.
The script runs InstrSem on random instructions of the Loongarch64 encoding space.
If semantics are found, they are printed to stdout and stored in a file in the out directory.
- InstrSem can be run on a full RISC instruction set
- The amount of covered encodings drastically increases due too InstrSem's generalization step
In the paper we claim that InstrSem finds QEMU crashes. We added an additional script that triggers a QEMU crash:
You can run InstrSem on the crashing instruction using:
python3 eval_qemu_crash.py
We expect multiple QEMU crashes:
(...)
**
ERROR:../target/loongarch/tcg/insn_trans/trans_vec.c.inc:3574:vldi_get_value: code should not be reached
Bail out! ERROR:../target/loongarch/tcg/insn_trans/trans_vec.c.inc:3574:vldi_get_value: code should not be reached
(...)
Finally, InstrSem gives up on reversing because sample collection failed too many times:
(...)
Exception: too many sample collections failed!
The instruction encoding implemented in QEMU is too lenient and allows illegal variants of an instruction to decode.
Then, when the instruction is emulated, the illegally decoded instruction cannot be handled and an assertion is triggered.
The issues were reported to QEMU maintainers and are fixed on the master branch.
- InstrSem triggers QEMU crashes
Below is information to run case studies that require specific hardware. As this relies on specific hardware and some additional modifications might be required depending on the software stack, we do not expect artifact reviewers to run all of them and reproduce all of our results. The above artifacts should already verify the main claims of our paper. If you have access to the required hardware, feel free to run a subset of the case studies below. As we cannot control or provide the full environment for the hardware, it might not be as straight-forward as the architecture independent case studies.
In out paper, we reverse-engineer the GhostWrite instruction encodings.
GhostWrite is an undocumented unprivileged instruction on T-Head C910 CPUs that writes directly to physical memory.
For this, we patch the riscv-64 runner to map parts /dev/mem to memory as an identity mapping (thus simulating physical memory).
This patch can be found in ghostwrite-runner.patch.
It has to be applied to instr_reverser/backend/runner_socket.c.
Then you can re-compile the riscv-64 runner using make riscv64-socket in instr_reverser/backend.
Note that the runner should now no longer be used for other instructions!
To not write to memory regions not contained in the identity mapping, we further added constraints to all riscv64 general purpose registers.
This can be done by changing line 19 of instr_reverser/backend/riscv64.py from:
instr_rev.Register(REG_NAMES[i], 64, [], [(5, i)], encoding_group="gpr") for i in range(32) # general purpose registers
to
instr_rev.Register(REG_NAMES[i], 64, [solver.create_less_constraint(REG_NAMES[i], 64, MAPPING_END - 0x4000), solver.create_greatereq_constraint(REG_NAMES[i], 64, MAPPING_START + 0x4000)], [(5, i)], encoding_group="gpr") for i in range(32) # general purpose registers
Where MAPPING_START and MAPPING_END should be replaced by start and end addresses of the physical memory mapping.
(See runner_socket.c after patching. You may need to adjust those ranges depending on your hardware)
With these patches, you can start the reverser using the remote runner (riscv64-socket-remote):
docker run --rm -it -p 13372:13372 -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py riscv64-socket-remote 0b00010000000000001000000100100111 13372 32 2 300
(Encoding for GhostWrite: 0b00010000000000001000000100100111)
This will listen on port 13372 for a runner to connect.
Then, you start the instr_reverser/backend/runner_socket_riscv64 on the affected riscv machine by supplying the ip and port of the remote runner as integers (i.e., 192.168.178.2 should be supplied as 0xc0a8b202).
This should connect the runner to the reverser and the reverser should start working.
In some cases, the runner might crash.
Thus, it can be helpful to start the runner in an infinite loop on the affected device.
If you also want InstrSem to output the set of bits that can be changed without modifying the instruction semantics, you have to uncomment line 616 in instr_reverser/clusterer.py:
flippy_bits = self._cluster_alias(instruction, semantics, initial_state)
Note that, depending on the kernel version, the signal handling structs might differ. If this is the case, deeper modifications to the runner code are required.
The Apple M1 and M2 CPUs contain an undocumented extension for 53-bit simd multiplication.
This can be reversed if a Apple M1 device running Asahi Linux is available.
For this, no patches to the runner are necessary.
You can use a similar remote runner setup as for GhostWrite.
Instead of riscv64-socket-remote, aarch64-vector-socket-remote should be used:
docker run --rm -it -p 13372:13372 -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py aarch64-vector-socket-remote 0b00000000001000000000010000100010 13372 32 3 300
(Encoding for mul53hi.2d: 0b00000000001000000000010000100010, encoding for mul53lo.2d: 0b00000000001000000000000000100010)
The instr_reverser/backend/runner_socket_aarch64-vector binary should be executed on the affected machine the same as instr_reverser/backend/runner_socket_riscv64 for reversing GhostWrite.
For mul53hi.2d we limit the vector registers to 53 bits.
This can be done by modifying line 21 of instr_reverser/backend/aarch64.py from:
instr_rev.Register(f"v{i}", 16*8, [], [(5, i)], encoding_group="vec") for i in range(32)
to:
instr_rev.Register(f"v{i}", 16*8, [solver.create_less_constraint(f"v{i}", 16*8, 1 << 53)], [(5, i)], encoding_group="vec") for i in range(32)
Note that, depending on the kernel version, the signal handling structs might differ. If this is the case, deeper modifications to the runner code are required.
The SiFive P550 CPU contains undocumented instructions including minimum or maximum instructions. These can be reverse-engineered using InstrSem with the default riscv64 runner.
For this, the remote setup has to be used. The reverser is executed using the following command:
docker run --rm -it -p 13372:13372 -v ./instr_reverser:/instr_reverser artifact_instrsem python3 main.py riscv64-socket-remote 0b01011100000100010001000110001011 13372 32 2 300
(encoding for signed maximum: 0b01011100000100010001000110001011)
It will then listen on port 13372 for the runner which has to be started on the affected device and passed the ip and port of the reverser as integers (see Reversing GhostWrite for more information).
Note that, depending on the kernel version, the signal handling structs might differ. If this is the case, deeper modifications to the runner code are required.