Instruction set. The set includes:
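Arithmetic instructions. ARM supports add, subtract and multiply instructions. Integer divide instructions are not universal: whether a core implements them depends on the architecture version and profile it is based on. For example, they are always present in ARMv7-M but optional in ARMv7-A.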
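On a core without a divide instruction, division has to be carried out in software, typically by a runtime helper routine. The following is a minimal Python sketch of one common approach, unsigned shift-and-subtract division on 32-bit values. The function name udiv32 is made up for this illustration, and the sketch does not reproduce any particular runtime library.

```python
def udiv32(dividend, divisor):
    """Unsigned 32-bit shift-and-subtract division.

    Returns (quotient, remainder). Illustrative only: udiv32 is a
    hypothetical name, not a real ARM runtime routine.
    """
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        raise ZeroDivisionError("division by zero")

    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit to its least.
    for bit in range(31, -1, -1):
        # Shift the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # If the divisor fits, subtract it and set this quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```

The loop uses only shifts, compares and subtracts, which are operations every ARM core provides.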
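Registers. Registers R0 to R7 are unbanked, that is, the same physical registers are used in every processor mode. R8 to R12 are shared by all modes except FIQ, which has its own copies. R13 (stack pointer) and R14 (link register) are banked, with a separate copy for each privileged mode except system mode, and R15 is the program counter. The current program status register (CPSR) holds the APSR condition flags, the current processor mode and the interrupt disable flags.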
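As an illustration of what the CPSR holds, the sketch below unpacks a 32-bit CPSR value into its condition flags, interrupt disable bits and mode field, following the classic ARM bit layout (N, Z, C, V in bits 31 to 28; I, F, T in bits 7 to 5; mode in bits 4 to 0). The function and dictionary key names are chosen for this example only.

```python
# Mode field encodings (CPSR bits 4..0) for the classic ARM modes.
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a CPSR value into flags, interrupt masks and mode name."""
    return {
        "N": (cpsr >> 31) & 1,                  # negative
        "Z": (cpsr >> 30) & 1,                  # zero
        "C": (cpsr >> 29) & 1,                  # carry
        "V": (cpsr >> 28) & 1,                  # overflow
        "irq_disabled": bool((cpsr >> 7) & 1),  # I bit
        "fiq_disabled": bool((cpsr >> 6) & 1),  # F bit
        "thumb": bool((cpsr >> 5) & 1),         # T bit
        "mode": MODE_NAMES.get(cpsr & 0x1F, "unknown"),
    }
```

For instance, decode_cpsr(0x600000D3) reports Z and C set, both interrupt types disabled, ARM state and supervisor mode.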
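Conditional execution. Almost every ARM instruction can be executed conditionally, a feature called predication. It is implemented with a 4-bit condition code selector, the predicate, in each instruction: the instruction takes effect only if the condition flags satisfy the selected condition.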
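The way the 4-bit predicate is checked against the N, Z, C and V flags can be sketched as a lookup table. The mnemonics and flag tests follow the standard ARM condition codes; the Python names (CONDITIONS, condition_passed) are illustrative, and the value 0b1111, which is not an ordinary condition, is simply rejected here.

```python
# 4-bit condition field -> (mnemonic, test on the N, Z, C, V flags).
CONDITIONS = {
    0b0000: ("EQ", lambda n, z, c, v: z),
    0b0001: ("NE", lambda n, z, c, v: not z),
    0b0010: ("CS", lambda n, z, c, v: c),
    0b0011: ("CC", lambda n, z, c, v: not c),
    0b0100: ("MI", lambda n, z, c, v: n),
    0b0101: ("PL", lambda n, z, c, v: not n),
    0b0110: ("VS", lambda n, z, c, v: v),
    0b0111: ("VC", lambda n, z, c, v: not v),
    0b1000: ("HI", lambda n, z, c, v: c and not z),
    0b1001: ("LS", lambda n, z, c, v: (not c) or z),
    0b1010: ("GE", lambda n, z, c, v: n == v),
    0b1011: ("LT", lambda n, z, c, v: n != v),
    0b1100: ("GT", lambda n, z, c, v: (not z) and n == v),
    0b1101: ("LE", lambda n, z, c, v: z or n != v),
    0b1110: ("AL", lambda n, z, c, v: True),
}


def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with this predicate would execute."""
    if cond not in CONDITIONS:
        raise ValueError("0b1111 is not an ordinary condition code")
    _, test = CONDITIONS[cond]
    return bool(test(bool(n), bool(z), bool(c), bool(v)))
```

An instruction whose predicate evaluates to False behaves as a no-op, which lets short if/else sequences be written without branches.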
Pipelines. ARM7 has a three-stage pipeline, the stages being fetch, decode and execute. Cortex-A8 has a 13-stage pipeline.
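To make the overlap between stages concrete, the sketch below traces which instruction occupies each stage of a three-stage pipeline on every cycle. It assumes an idealised pipeline with no stalls, branches or multi-cycle instructions, which real cores do not guarantee; pipeline_trace is a name invented for this example.

```python
def pipeline_trace(instructions):
    """Return one dict per cycle showing the occupant of each stage.

    Idealised model: one instruction enters per cycle and nothing stalls.
    """
    stages = ("fetch", "decode", "execute")
    count = len(instructions)
    trace = []
    # The last instruction leaves execute (len(stages) - 1) cycles
    # after it was fetched.
    for cycle in range(count + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction currently in this stage
            row[stage] = instructions[index] if 0 <= index < count else None
        trace.append(row)
    return trace
```

With four instructions the trace has six cycles. From the third cycle onwards one instruction completes per cycle, which is the throughput benefit of pipelining.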
Co-processors. Co-processors are used to extend the instruction set. The co-processor space is divided logically into 16 co-processors with numbers from zero to 15.
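Sixteen co-processors correspond to a 4-bit co-processor number carried in each co-processor instruction. As a rough sketch, the function below pulls that field, along with the other operands, out of a 32-bit MCR or MRC register-transfer encoding in the ARM instruction set. The function name and dictionary keys are illustrative, and only this one instruction form is handled.

```python
def decode_mrc_mcr(word):
    """Decode the fields of an ARM MCR/MRC co-processor instruction.

    Only the register-transfer form (bits 27..24 == 0b1110, bit 4 == 1)
    is recognised; anything else raises ValueError.
    """
    if ((word >> 24) & 0xF) != 0b1110 or ((word >> 4) & 1) != 1:
        raise ValueError("not an MCR/MRC encoding")
    return {
        # Bit 20 selects direction: 1 reads from the co-processor (MRC).
        "op": "MRC" if (word >> 20) & 1 else "MCR",
        "coproc": (word >> 8) & 0xF,   # co-processor number, 0..15
        "opc1": (word >> 21) & 0x7,
        "crn": (word >> 16) & 0xF,
        "rt": (word >> 12) & 0xF,      # ARM core register
        "opc2": (word >> 5) & 0x7,
        "crm": word & 0xF,
    }
```

For example, the word 0xEE110F10 decodes as MRC p15, 0, r0, c1, c0, 0, a read from co-processor 15, which is used for system control.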
Debugging. All modern ARM processors include hardware debugging facilities, allowing software debuggers to perform operations such as halting, stepping and breakpointing of code starting from reset. These facilities are built using JTAG support, though some newer cores optionally support ARM’s own two-wire SWD protocol. ARMv7 architecture defines basic debugging facilities at an architectural level.
Thumb and Thumb-2. To improve compiled code density, processors since the ARM7TDMI have featured the Thumb instruction set, which has its own execution state. In Thumb state, the processor executes the Thumb instruction set, a compact 16-bit encoding for a subset of the ARM instruction set. The space saving comes from making some of the instruction operands implicit and limiting the number of possibilities compared to the instructions executed in ARM state.
Thumb-2 extends the limited 16-bit instruction set of Thumb with additional 32-bit instructions to give the instruction set more breadth, thus producing a variable-length instruction set. A stated aim for Thumb-2 was to achieve code density similar to Thumb with performance similar to the ARM instruction set on 32-bit memory. In ARMv7, this goal can be said to have been met.
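Because Thumb-2 mixes 16-bit and 32-bit instructions, a decoder has to tell from the first halfword how long each instruction is. In the Thumb-2 encoding, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction, and anything else is a complete 16-bit instruction. The sketch below applies that rule; the function names are made up for this illustration.

```python
def thumb_instruction_length(first_halfword):
    """Return the instruction length in bytes (2 or 4)."""
    top5 = (first_halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_thumb_stream(halfwords):
    """Group a list of 16-bit halfwords into whole instructions."""
    instructions = []
    i = 0
    while i < len(halfwords):
        size = thumb_instruction_length(halfwords[i]) // 2  # in halfwords
        if i + size > len(halfwords):
            raise ValueError("truncated 32-bit instruction")
        instructions.append(tuple(halfwords[i:i + size]))
        i += size
    return instructions
```

Given the halfwords 0x4770, 0xF000, 0xF800 and 0xE7FE, the stream splits into a 16-bit instruction, one 32-bit instruction and another 16-bit instruction.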
Jazelle. Jazelle direct byte-code execution (DBX) is a technique that allows Java byte-code to be executed directly on the ARM architecture as a third execution state, alongside the existing ARM and Thumb states.
NEON. The advanced SIMD extension (NEON) is a combined 64-bit and 128-bit SIMD instruction set that provides standardised acceleration for media and signal-processing applications. NEON is included in all Cortex-A8 devices but is optional in Cortex-A9 devices. It features a comprehensive instruction set, separate register files and independent execution hardware.
TrustZone. TrustZone provides a low-cost alternative to adding a dedicated security core to an SoC by providing two virtual processors backed by hardware-based access control. This lets the application core switch between two states, referred to as worlds, in order to prevent information from leaking from the more-trusted world to the less-trusted world.
ARMv8
ARMv8-A represents a fundamental change to the ARM architecture. It adds a 64-bit architecture called AArch64 and a new A64 instruction set. AArch64 provides user-space compatibility with ARMv7-A, the 32-bit architecture, which is now referred to as AArch32, and with the old 32-bit instruction set, now named A32. The Thumb instruction set, now referred to as T32, has no 64-bit counterpart. ARMv8-A allows 32-bit applications to be executed in a 64-bit operating system (OS), and a 32-bit OS to be under the control of a 64-bit hypervisor.
ARMv8 architecture has the following features:
1. New instruction set: A64
2. Thirty-one general-purpose 64-bit registers
3. Dedicated stack pointer
4. Program counter is no longer accessible as a register
5. Most instructions can take 32-bit or 64-bit arguments
6. Addresses assumed to be 64-bit
7. Advanced SIMD (NEON) is enhanced
8. New exception system
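The combination of thirty-one 64-bit registers and instructions that take either 32-bit or 64-bit arguments can be pictured as two views of the same register file: in A64, the 32-bit name Wn refers to the lower half of the 64-bit register Xn, and a 32-bit write clears the upper half. The class below is a minimal model of that behaviour only; its name and methods are invented for this sketch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class A64Registers:
    """Toy model of the 31 general-purpose registers X0..X30."""

    def __init__(self):
        self._x = [0] * 31

    def _check(self, n):
        if not 0 <= n <= 30:
            raise IndexError("general-purpose registers are numbered 0..30")

    def write_x(self, n, value):
        """64-bit write to Xn."""
        self._check(n)
        self._x[n] = value & MASK64

    def read_x(self, n):
        """64-bit read of Xn."""
        self._check(n)
        return self._x[n]

    def write_w(self, n, value):
        """32-bit write to Wn: the upper 32 bits of Xn become zero."""
        self.write_x(n, value & MASK32)

    def read_w(self, n):
        """32-bit read of Wn: the lower 32 bits of Xn."""
        return self.read_x(n) & MASK32
```

Writing 0x1122334455667788 to X0 and then reading W0 gives 0x55667788. A later write of 0xFFFFFFFF to W0 leaves X0 equal to 0x00000000FFFFFFFF.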
Advantages of ARMv8 over ARMv7
ARMv8 uses program-counter relative addressing. The program counter and stack pointer are no longer general-purpose registers, and the number of general-purpose registers has increased from 14 to 31. More registers reduce the need for register-to-stack copies. The peculiar interrupt modes and banked registers are mostly gone. The load/store multiple instructions have been replaced with load/store pair instructions. NEON has also improved, with double-precision vectors and IEEE support added. For the smartphone world, the 64-bit address space of ARMv8 offers support for more than 4GB of RAM.