# What is a VM (Virtual Machine)

A **virtual machine (VM)**, in the sense used here (a process VM, not a full system VM like VirtualBox), is a program that works like an **interpreter**. But instead of reading and running source code written by programmers, it reads and runs something called **bytecode**.

## What is Bytecode?

**Bytecode** is a compact, machine-readable instruction set designed for efficient execution by software interpreters. Unlike source code, it uses numeric codes and references to represent the structure and logic of a program after compilation.

For example:

* In **Java**, the code you write gets turned into **bytecode** first. Then, the **Java Virtual Machine (JVM)** reads and runs that **bytecode**.
* In **Python** (CPython), the `.py` source you write is compiled into **bytecode** (cached as `.pyc` files for imported modules), which the interpreter's VM then runs.
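To see real bytecode, CPython's standard-library `dis` module can disassemble any function. This is a small sketch; the exact opcode names and byte values vary between Python versions (for example, `BINARY_ADD` was replaced by `BINARY_OP` in Python 3.11), so your output will differ from machine to machine.

```python
import dis

def add(a, b):
    return a + b

# The raw bytecode CPython compiled for add() (a bytes object)
raw = add.__code__.co_code
print(f"{len(raw)} bytes of bytecode: {raw.hex()}")

# Human-readable disassembly: one line per instruction (offset, opname, argument)
dis.dis(add)

# The same instructions as structured objects, handy for scripting
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```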

## How Does a VM work?

A **VM** reads **bytecode**, figures out what it’s supposed to do, and then performs those actions step by step. Here’s how it usually works:

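* **Reads Bytecode (fetch)**: The **VM** reads the **bytecode** one instruction at a time, tracking its position with a program counter.
* **Understands It (decode)**: The **VM** works out which operation the instruction represents (like adding numbers, printing text, or moving data around) and what its operands are.
* **Runs It (execute)**: The **VM** performs that operation on its own state (stack, registers, memory) using the real computer it’s running on.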

The **VM** is like a bridge between **bytecode** and the actual computer.
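The three steps above map directly onto a loop. Here is a minimal sketch using a made-up four-instruction set (`PUSH`, `ADD`, `PRINT`, `HALT` and their opcode values are hypothetical, not taken from any real VM):

```python
# Hypothetical toy instruction set: opcode -> (mnemonic, number of operand bytes)
OPCODES = {
    0x01: ("PUSH", 1),   # push the next byte
    0x02: ("ADD", 0),    # pop two values, push their sum
    0x03: ("PRINT", 0),  # pop a value and print it
    0xFF: ("HALT", 0),   # stop execution
}

def run(bytecode):
    stack, pc, output = [], 0, []
    while pc < len(bytecode):
        # 1. Fetch: read the opcode at the program counter
        opcode = bytecode[pc]
        # 2. Decode: identify the instruction (KeyError if unknown) and grab its operands
        name, n_operands = OPCODES[opcode]
        operands = list(bytecode[pc + 1 : pc + 1 + n_operands])
        pc += 1 + n_operands
        # 3. Execute: perform the action on the VM's state
        if name == "PUSH":
            stack.append(operands[0])
        elif name == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif name == "PRINT":
            output.append(stack.pop())
            print(output[-1])
        elif name == "HALT":
            break
    return output

# 2 + 3, then print the result
program = bytes([0x01, 2, 0x01, 3, 0x02, 0x03, 0xFF])
result = run(program)
```

Running it prints `5`.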

## Why use a VM?

* **Works Everywhere**: **Bytecode** can run on any computer that has the right **VM**. This is why Java is called "write once, run anywhere."
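* **Safe to Run**: The **VM** runs the program in a controlled environment and can restrict what it is allowed to do (the JVM, for example, verifies bytecode before running it), which limits the harm a buggy or malicious program can cause.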
* **Adds Layers of Complexity**: Reverse engineers must analyze both the **VM** and the **bytecode** it executes, significantly increasing the effort and time required to understand the code.

## Examples of VMs

* **Java Virtual Machine (JVM)**:
  * Reads Java **bytecode** and runs it on any device with the **JVM** installed.
* **Python Interpreter**:
  * Reads Python **bytecode** (`.pyc` files) and executes it step by step.
* **JavaScript Engines** (like V8 in Chrome):
  * Parse JavaScript source, compile it to **bytecode** (run by V8's Ignition interpreter), and JIT-compile frequently executed code to machine code.

## Virtual Machines (VMs) in CTF Challenges

In **CTF** challenges, custom virtual machines are used to hide the program's logic and test your reverse engineering skills. You'll need to dig into the custom **VM** and its **bytecode** to recover the flag.

## Why?

* Obfuscation: The program's real logic is buried under a mess of custom **bytecode** and a **VM**, making it tough to crack.
* Complexity: You gotta reverse-engineer the whole **VM**, not just the program.
* Customization: Creators can invent unusual instruction sets and strange control flow to throw you off.

*For making your life harder* 😅

## How?

#### Get the Problem

Start by examining the binary or script you were given. Look for clues that it contains a **VM**:

* Does it have custom **bytecode**?
* Are there functions that handle the instructions?

#### **Disassemble and Analyze the VM**

Use tools like **IDA**, **Ghidra**, or **Binary Ninja** to reverse-engineer the **VM**:

* Find the opcode handlers or the instruction dispatcher (often a loop around a big `switch` or jump table).
* Map each opcode to what it does (arithmetic, memory access, jumps).
* Look at the **VM**’s internal state: registers, stack, memory, and program counter.
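Once you've mapped a few opcodes, it pays to turn that map into a small disassembler so you can read the bytecode as a listing instead of raw bytes. The sketch below assumes a hypothetical opcode table and an encoding of one opcode byte optionally followed by a 16-bit big-endian operand; swap in whatever you actually recover from the dispatcher.

```python
# Hypothetical opcode map recovered from the dispatcher: opcode -> (mnemonic, has_operand)
OPCODE_MAP = {
    0x01: ("PUSH", True),
    0x02: ("LOAD", True),
    0x04: ("STORE", True),
    0x09: ("ADD", False),
    0x0E: ("JMP", True),
    0x11: ("CMP_EQ", False),
    0x18: ("DONE", False),
}

def disassemble(bytecode):
    """Turn raw bytecode into a list of 'offset: MNEMONIC operand' lines."""
    lines, pc = [], 0
    while pc < len(bytecode):
        offset, opcode = pc, bytecode[pc]
        pc += 1
        if opcode not in OPCODE_MAP:
            lines.append(f"{offset:04x}: db 0x{opcode:02x}  ; unknown opcode")
            continue
        name, has_operand = OPCODE_MAP[opcode]
        if has_operand:
            if pc + 2 > len(bytecode):
                lines.append(f"{offset:04x}: {name} <truncated>")
                break
            # Assumed encoding: 16-bit big-endian immediate right after the opcode
            operand = (bytecode[pc] << 8) | bytecode[pc + 1]
            pc += 2
            lines.append(f"{offset:04x}: {name} 0x{operand:04x}")
        else:
            lines.append(f"{offset:04x}: {name}")
    return lines

listing = disassemble(bytes([0x01, 0x00, 0x41, 0x01, 0x00, 0x42, 0x09, 0x18]))
print("\n".join(listing))
```

For the sample bytes it produces `0000: PUSH 0x0041`, `0003: PUSH 0x0042`, `0006: ADD`, `0007: DONE`. Emitting unknown bytes as `db` instead of crashing makes it obvious which opcodes you still need to map.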

#### **Simulate the VM**

Instead of stepping through it manually, code an emulator in Python or C. It’s easier than you think.

* Implement the **VM**'s instruction set.
* Feed the **bytecode** to your emulator and watch what happens.
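A habit that helps when emulating: log every step and cap the number of steps, so a wrong handler or a deliberately endless loop shows up as an error instead of a hang. A minimal sketch with a hypothetical instruction set (`PUSH`, `DEC`, `JNZ`, `HALT` and their opcode values are made up for illustration):

```python
def trace_vm(bytecode, max_steps=10_000):
    """Tiny hypothetical VM that logs every step so you can watch what happens."""
    stack, pc, trace = [], 0, []
    for _ in range(max_steps):  # step cap: a runaway loop raises instead of hanging
        if pc >= len(bytecode):
            break
        opcode = bytecode[pc]
        trace.append((pc, opcode, list(stack)))  # snapshot BEFORE executing
        if opcode == 0x01:      # PUSH imm8
            stack.append(bytecode[pc + 1])
            pc += 2
        elif opcode == 0x02:    # DEC: decrement top of stack
            stack[-1] -= 1
            pc += 1
        elif opcode == 0x03:    # JNZ addr8: jump if top of stack != 0 (does not pop)
            pc = bytecode[pc + 1] if stack[-1] != 0 else pc + 2
        elif opcode == 0xFF:    # HALT
            break
        else:
            raise ValueError(f"unknown opcode 0x{opcode:02x} at pc={pc}")
    else:
        raise RuntimeError("step limit reached, possible infinite loop")
    return stack, trace

# Count down from 3 to 0: PUSH 3; loop: DEC; JNZ loop; HALT
stack, trace = trace_vm(bytes([0x01, 3, 0x02, 0x03, 2, 0xFF]))
for pc, opcode, snapshot in trace:
    print(f"pc={pc:02x} op=0x{opcode:02x} stack={snapshot}")
```

The trace shows the loop body (`pc=02`, `pc=03`) repeating until the counter hits zero, which is exactly the kind of control flow you want to spot in a challenge VM.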

#### **Debug the Execution**

If coding an emulator is too much, use a debugger like **GDB** or **IDA's debugger**:

* Set breakpoints in the opcode handlers.
* Watch the registers, memory, or program counter to see how it runs.

## VM Implementation with Simple If Statements

Writing a simple **VM** isn’t that hard. At the end of the day, a **VM** is just a loop that reads **bytecode** and runs it, using a chain of if statements (or a `switch`/jump table) to pick the code for each opcode. Here’s a basic example:

* Instruction set: Each **bytecode** (**opcode**) triggers a specific operation.
* Execution: The **VM** uses if-else statements to run the right function for each opcode.

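```python
def run_vm(bytecode):
    stack = []  # Operand stack
    pc = 0      # Program counter
    while pc < len(bytecode):
        opcode = bytecode[pc]
        pc += 1
        if opcode == 0x01:    # ADD: pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == 0x02:  # SUB: pop two values, push a - b
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif opcode == 0x03:  # PUSH <imm>: push the next byte as a value
            stack.append(bytecode[pc])
            pc += 1
        # More opcodes...
        else:
            raise ValueError(f"Unknown opcode 0x{opcode:02x}")
    return stack
```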

The **VM** is pretty much just a bunch of if statements that link opcodes to specific tasks. That makes a custom **VM** easy to build for a challenge, and it also helps when reversing one: each branch of the dispatcher is one opcode handler you can analyze on its own.
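An if/elif chain isn't the only way to dispatch. Many interpreters use a table mapping each opcode to a handler function instead; in a compiled binary, a dense `switch` often turns into a jump table, which is why a dispatcher can look like a single indirect jump. A sketch of the table approach in Python (the opcode numbering is hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class VMState:
    code: bytes
    pc: int = 0
    stack: list = field(default_factory=list)

def op_add(vm):    # pop two values, push their sum
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a + b)

def op_sub(vm):    # pop two values, push a - b
    b, a = vm.stack.pop(), vm.stack.pop()
    vm.stack.append(a - b)

def op_push(vm):   # push the next byte as an immediate value
    vm.stack.append(vm.code[vm.pc])
    vm.pc += 1

# Dispatch table: opcode -> handler (hypothetical numbering)
HANDLERS = {0x01: op_add, 0x02: op_sub, 0x03: op_push}

def run_dispatch(code):
    vm = VMState(code)
    while vm.pc < len(vm.code):
        opcode = vm.code[vm.pc]
        vm.pc += 1
        handler = HANDLERS.get(opcode)
        if handler is None:
            raise ValueError(f"Unknown opcode 0x{opcode:02x}")
        handler(vm)  # one lookup instead of walking an if/elif chain
    return vm.stack

# 10 - 4 + 1
result = run_dispatch(bytes([0x03, 10, 0x03, 4, 0x02, 0x03, 1, 0x01]))
print(result)
```

Adding an opcode then means writing one handler and one table entry. From the reversing side, recovering that table is often the quickest way to enumerate every opcode the VM supports.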

## Stack-based VM example

```c
#include <stdio.h>
#include <stdlib.h>

// Define the opcodes
#define PUSH 0x01  // Opcode to push data to the stack
#define STOP 0x02  // Opcode to stop the VM

// Define a stack with a fixed size
#define STACK_SIZE 10
unsigned char stack[STACK_SIZE];  // Stack to hold the values
int sp = -1;  // Stack pointer

// Function to push a value onto the stack
void push(unsigned char value) {
    if (sp < STACK_SIZE - 1) {
        stack[++sp] = value;
        printf("Pushed value 0x%X onto the stack.\n", value);
    } else {
        printf("Stack overflow!\n");
    }
}
// Function to stop the VM
void stop() {
    printf("VM stopped.\n");
}
// Function to execute the bytecode
void execute(unsigned char* bytecode, int bytecode_size) {
    int pc = 0;  // Program counter
    while (pc < bytecode_size) {
        unsigned char opcode = bytecode[pc];  // Get the current opcode
        pc++;

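        if (opcode == PUSH) {
            if (pc >= bytecode_size) {  // PUSH needs a 1-byte operand
                printf("Truncated PUSH at end of bytecode\n");
                break;
            }
            unsigned char value = bytecode[pc];  // Get the immediate value
            pc++;
            push(value);  // Push the immediate value onto the stack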
        } else if (opcode == STOP) {
            stop();  // Stop the VM
            break;
        } else {
            printf("Unknown opcode 0x%X\n", opcode);
            break;
        }
    }
}

// Function to print the final stack contents
void print_stack() {
    printf("Final stack contents: ");
    for (int i = 0; i <= sp; i++) {
        printf("'%c' ", stack[i]);
    }
    printf("\n");
}

int main() {
    // Bytecode for the VM: Push the ASCII values of "password"
    unsigned char bytecode[] = {
        PUSH, 0x70,  // 'p'
        PUSH, 0x61,  // 'a'
        PUSH, 0x73,  // 's'
        PUSH, 0x73,  // 's'
        PUSH, 0x77,  // 'w'
        PUSH, 0x6f,  // 'o'
        PUSH, 0x72,  // 'r'
        PUSH, 0x64,  // 'd'
        STOP         // Stop the VM
    };
    int bytecode_size = sizeof(bytecode) / sizeof(bytecode[0]);

    // Execute the bytecode on the VM, then show what it left on the stack
    execute(bytecode, bytecode_size);
    print_stack();
    return 0;
}
```

The example above only shows how a basic **VM** works with simple **bytecode** and stack operations. In tougher **CTFs**, you'll deal with memory tricks and complex opcodes, so understanding how the **VM** manipulates memory and how its opcodes interact with it is key to cracking those challenges.

## Challenge 10: CATBERT Ransomware

This challenge was one of my favorites in Flare-On 11. It had a cool **UEFI** firmware image with a sick **VM**, and that's what made me want to help you get the hang of VMs.

Let's run the image in **QEMU**:

```bash
qemu-system-x86_64 -drive format=raw,file=disk.img -bios bios.bin
```

<figure><img src="/files/kD4UcItfLuiLPTAU5cSr" alt=""><figcaption><p>Fig 1: SICK CAT!</p></figcaption></figure>

Just had to show you this sick cat!

Anyway, let’s move on and dig into the **VM** analysis.

<figure><img src="/files/RLmH0ay7PTbaqT9TNYM2" alt=""><figcaption><p>Fig 2: Sidekick is the GOAT</p></figcaption></figure>

I used **Binary Ninja's Sidekick** plugin to help me wrap my head around the VM ops. The VM had a stack, memory, and a ton of other stuff going on, which was way too much for me to track by hand!

<pre class="language-python"><code class="lang-python">def leftRotate(n, d, bits):
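    # Rotate the low `bits` bits of n left by d, keeping the result within `bits` bits
    mask = (1 &#x3C;&#x3C; bits) - 1
    n &#x26;= mask
    d %= bits
    return ((n &#x3C;&#x3C; d) | (n >> (bits - d))) &#x26; mask

def rightRotate(n, d, bits):
    # Rotate the low `bits` bits of n right by d, keeping the result within `bits` bits
    mask = (1 &#x3C;&#x3C; bits) - 1
    n &#x26;= mask
    d %= bits
    return ((n >> d) | (n &#x3C;&#x3C; (bits - d))) &#x26; mask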

class SimpleVM:
    def __init__(self, bytecode, stack_size=1024, memory_size=1024):
        self.ip = 0
        self.bytecode = bytecode
        self.stack = [0] * stack_size
        self.stack_top = 0
        self.memory = [0] * memory_size
        self.result = 0

    def next_op(self):
        op = self.bytecode[self.ip]
        self.ip += 1
        return op

    def next_arg(self):
        arg = (self.bytecode[self.ip] &#x3C;&#x3C; 8) + self.bytecode[self.ip + 1]
        self.ip += 2
        return arg

    def pop(self):
        self.stack_top -= 1
        return self.stack[self.stack_top]

    def push(self, val):
        self.stack[self.stack_top] = (val &#x26; 0xffffffffffffffff)
        self.stack_top += 1

    def peek(self):
        return self.stack[self.stack_top - 1]

    def tos_ptr(self):
        return self.stack_top - 1

    def interpret(self):
        while self.ip &#x3C; len(self.bytecode):
            instruction = self.next_op()

            if instruction == 0x01:  # OP_PUSH #
                arg = self.next_arg()
                self.push(arg)
                print(f"0x{self.ip:04x}: OP_PUSH: Pushed 0x{arg:08x} onto the stack")

            elif instruction == 0x02:  # OP_LOADI #
                addr = self.next_arg()
                val = self.memory[addr]
                self.push(val)
                print(f"0x{self.ip:04x}: OP_LOADI: Pushed value 0x{val:08x} from memory address 0x{addr:08x} onto the stack")

            elif instruction == 0x03:  # OP_LOADADDI #
                addr = self.next_arg()
                val = self.memory[addr]
                val2 = self.pop()
                self.push(val + val2)
                print(f"0x{self.ip:04x}: OP_LOADADDI: Pushed the sum of 0x{val2:08x} from STACK and 0x{val:08x} from memory address 0x{addr:08x} onto the stack")

            elif instruction == 0x04:  # OP_STOREI #
                addr = self.next_arg()
                val = self.pop()
                self.memory[addr] = val
                print(f"0x{self.ip:04x}: OP_STOREI: Stored 0x{val:08x} into memory address 0x{addr:08x}")

            elif instruction == 0x05:  # OP_LOAD #
                addr = self.pop()
                val = self.memory[addr]
                self.push(val)
                print(f"0x{self.ip:04x}: OP_LOAD: LOAD value 0x{val:08x} from memory address 0x{addr:08x} onto the stack")

            elif instruction == 0x06:  # OP_STORE #
                val = self.pop()
                addr = self.pop()
                self.memory[addr] = val
                print(f"0x{self.ip:04x}: OP_STORE: Stored 0x{val:08x} into memory address 0x{addr:08x}")

            elif instruction == 0x07:  # OP_DUP #
                val = self.peek()
                self.push(val)
                print(f"0x{self.ip:04x}: OP_DUP: Duplicated top stack value 0x{val:08x}")

            elif instruction == 0x08:  # OP_DISCARD #
                val = self.pop()
                val = self.pop()
                print(f"0x{self.ip:04x}: OP_POP: Popped top stack value 0x{val:08x}")

            elif instruction == 0x09:  # OP_ADD #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 + val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_ADD: Added 0x{val2:08x} and 0x{val1:08x}, result 0x{result:08x}")
                
            elif instruction == 0x0A:  # OP_ADDI #
                arg = self.next_arg()
                self.stack[self.tos_ptr()] += arg
                print(f"0x{self.ip:04x}: OP_ADDI: Added 0x{arg:08x} to top stack value")

            elif instruction == 0x0B:  # OP_SUB #
                val1 = self.pop()
                val2 = self.pop()
                result = (val2 - val1) &#x26; 0xffffffffffffffff  # second-from-top minus top, wrapped to 64 bits
                self.push(result)
                print(f"0x{self.ip:04x}: OP_SUB: Subtracted 0x{val1:08x} from 0x{val2:08x}, result 0x{result:08x}")

            elif instruction == 0x0C:  # OP_DIV #
                val1 = self.pop()
                val2 = self.pop()
                if val1 == 0:  # val1 (top of stack) is the divisor
                    print("OP_DIV: Division by zero error")
                    return -1
                result = val2 // val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_DIV: Divided 0x{val2:08x} by 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x0D:  # OP_MUL #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 * val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_MUL: Multiplied 0x{val2:08x} by 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x0E:  # OP_JUMP #
                addr = self.next_arg()
                print(f"0x{self.ip:04x}: OP_JUMP: Jumped to address 0x{addr:08x}")
                self.ip = addr

            elif instruction == 0x0F:  # OP_JUMP_IF_TRUE #
                addr = self.next_arg()
                if self.pop():
                    print(f"0x{self.ip:04x}: OP_JUMP_IF_TRUE: Jumped to address 0x{addr:08x} because top stack value was true")
                    self.ip = addr

            elif instruction == 0x10:  # OP_JUMP_IF_FALSE #
                addr = self.next_arg()
                if not self.pop():
                    print(f"0x{self.ip:04x}: OP_JUMP_IF_FALSE: Jumped to address 0x{addr:08x} because top stack value was false")
                    self.ip = addr

            elif instruction == 0x11:  # OP_CMP_EQ #
                val1 = self.pop()
                val2 = self.pop()
                result = int(val2 == val1)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_CMP_EQ: Compared 0x{val2:08x} == 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x12:  # OP_CMP_LT #
                val1 = self.pop()
                val2 = self.pop()
                result = int(val2 &#x3C; val1)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_CMP_LT: Compared 0x{val2:08x} &#x3C; 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x13:  # OP_CMP_LE #
                val1 = self.pop()
                val2 = self.pop()
                result = int(val2 &#x3C;= val1)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_CMP_LE: Compared 0x{val2:08x} &#x3C;= 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x14:  # OP_CMP_GT #
                val1 = self.pop()
                val2 = self.pop()
                result = int(val2 > val1)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_CMP_GT: Compared 0x{val2:08x} > 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x15:  # OP_CMP_GE #
                val1 = self.pop()
                val2 = self.pop()
                result = int(val2 >= val1)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_CMP_GE: Compared 0x{val2:08x} >= 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x16:  # OP_CMP_GEI #
                val1 = self.next_arg()
                val2 = self.pop()
                result = int(val2 >= val1)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_CMP_GEI: Compared 0x{val2:08x} >= 0x{val1:08x}, result 0x{result:08x}")

    
            elif instruction == 0x18:  # OP_DONE #
                print(f"0x{self.ip:04x}: OP_DONE")
                return 0

            elif instruction == 0x19:  # OP_SET_RES #
                val = self.pop()
                self.result = val
                print(f"0x{self.ip:04x}: OP_SET_RES: Stored value 0x{val:08x}")


            elif instruction == 0x1A:  # OP_XOR #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 ^ val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_XOR: XORed 0x{val2:08x} with 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x1B:  # OP_OR #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 | val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_OR: ORed 0x{val2:08x} with 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x1C:  # OP_AND #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 &#x26; val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_AND: ANDed 0x{val2:08x} with 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x1D:  # OP_MOD #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 % val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_MOD: 0x{val2:08x} modulo 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x1E:  # OP_SHL #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 &#x3C;&#x3C; val1
                self.push(result)
                print(f"0x{self.ip:04x}: OP_SHL: Left shifted 0x{val2:08x} by 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x1F:  # OP_SHR #
                val1 = self.pop()
                val2 = self.pop()
                result = val2 >> (val1 % 64)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_SHR: Right shifted 0x{val2:08x} by 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x20:  # OP_ROL32 #
                val1 = self.pop()
                val2 = self.pop()
                result = leftRotate(val2, val1, 32)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_ROL32: Rotated left 0x{val2:08x} by 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x21:  # OP_ROR32 #
                val1 = self.pop()
                val2 = self.pop()
                result = rightRotate(val2, val1, 32)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_ROR32: Rotated right 0x{val2:08x} by 0x{val1:08x}, result 0x{result:08x}")

            elif instruction == 0x22:  # OP_ROL_16 #
                val1 = self.pop()
                val2 = self.pop()
                result = leftRotate(val2, val1, 16)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_ROL_16: Rotated 16-bit value 0x{val2:04x} left by {val1}, result 0x{result:04x}")

            elif instruction == 0x23:  # OP_ROR_16 #
                val1 = self.pop()
                val2 = self.pop()
                result = rightRotate(val2, val1, 16)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_ROR_16: Rotated 16-bit value 0x{val2:04x} right by {val1}, result 0x{result:04x}")

            elif instruction == 0x24:  # OP_ROL_8 #
                val1 = self.pop()
                val2 = self.pop()
                result = leftRotate(val2, val1, 8)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_ROL_8: Rotated 8-bit value 0x{val2:02x} left by {val1}, result 0x{result:02x}")

            elif instruction == 0x25:  # OP_ROR_8 #
                val1 = self.pop()
                val2 = self.pop()
                result = rightRotate(val2, val1, 8)
                self.push(result)
                print(f"0x{self.ip:04x}: OP_ROR_8: Rotated 8-bit value 0x{val2:02x} right by {val1}, result 0x{result:02x}")


bytecode = [...]
vm = SimpleVM(bytecode)
vm.interpret()

</code></pre>

This is the final result of the **VM** I wrote in pure **Python**, which really helped me understand how a **VM** works and how to simulate it. This **VM** was actually based on [**PigletVM**](https://github.com/vkazanov/bytecode-interpreters-post/blob/master/pigletvm.c).
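To test an emulator like this, it helps to hand-assemble small programs instead of only running the challenge bytecode. The sketch below is a hypothetical helper, not part of the challenge: it assumes the encoding `SimpleVM` uses (one opcode byte, plus a 16-bit big-endian argument for the opcodes that call `next_arg()`), with opcode numbers copied from the emulator's comments.

```python
# Opcode numbers taken from the SimpleVM comments; these take a 16-bit argument
OPS_WITH_ARG = {"PUSH": 0x01, "LOADI": 0x02, "LOADADDI": 0x03, "STOREI": 0x04,
                "ADDI": 0x0A, "JUMP": 0x0E, "JUMP_IF_TRUE": 0x0F,
                "JUMP_IF_FALSE": 0x10, "CMP_GEI": 0x16}
# ...and these are a single opcode byte
OPS_NO_ARG = {"LOAD": 0x05, "STORE": 0x06, "DUP": 0x07, "DISCARD": 0x08,
              "ADD": 0x09, "SUB": 0x0B, "DIV": 0x0C, "MUL": 0x0D,
              "CMP_EQ": 0x11, "CMP_LT": 0x12, "CMP_LE": 0x13, "CMP_GT": 0x14,
              "CMP_GE": 0x15, "DONE": 0x18, "SET_RES": 0x19, "XOR": 0x1A,
              "OR": 0x1B, "AND": 0x1C, "MOD": 0x1D, "SHL": 0x1E, "SHR": 0x1F,
              "ROL32": 0x20, "ROR32": 0x21, "ROL_16": 0x22, "ROR_16": 0x23,
              "ROL_8": 0x24, "ROR_8": 0x25}

def assemble(program):
    """Assemble [(mnemonic, arg?), ...] into bytecode for the SimpleVM encoding."""
    out = []
    for name, *args in program:
        if name in OPS_WITH_ARG:
            (arg,) = args  # exactly one argument required
            if not 0 <= arg <= 0xFFFF:
                raise ValueError(f"{name}: argument 0x{arg:x} does not fit in 16 bits")
            out += [OPS_WITH_ARG[name], arg >> 8, arg & 0xFF]  # big-endian, like next_arg()
        elif name in OPS_NO_ARG:
            if args:
                raise ValueError(f"{name} takes no argument")
            out.append(OPS_NO_ARG[name])
        else:
            raise ValueError(f"unknown mnemonic {name!r}")
    return out

# Compute 0x41 ^ 0x20, hand it to OP_SET_RES, then stop
bytecode = assemble([("PUSH", 0x41), ("PUSH", 0x20), ("XOR",), ("SET_RES",), ("DONE",)])
print([hex(b) for b in bytecode])
```

Passing that list to `SimpleVM(bytecode).interpret()` should end with `0x61` (`0x41 ^ 0x20`) in `vm.result`, a quick sanity check that the handlers behave the way you think before you trust them on the real bytecode.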

Building a **VM** is a solid project that gives you a real look at how computers tick on a low level. Start simple and add complexity, and you’ll end up with a beast of a **VM** that can run some pretty complex stuff. Whether you're making a language interpreter or playing with new architectures, the basics of **VMs** are always the same.

**Hope you enjoyed reading and keep cracking!**


