Discovering new blind Rowhammer attack approaches on Raspberry Pi 4B
Introduction
Rowhammer is a hardware-based attack that exploits a vulnerability in DRAM modules. I explored Rowhammer attacks on the Raspberry Pi 4B during my Master's degree at Carnegie Mellon University's Information Networking Institute. By repeatedly accessing a memory location, an attacker can induce voltage fluctuations and flip bits in nearby memory locations. This can lead to privilege escalation, data corruption, or data theft. Rowhammer attacks have been demonstrated on x86 systems using various techniques, such as CLFLUSH, non-temporal instructions, and cache eviction strategies. However, Rowhammer attacks are not limited to x86 systems. I compared the results of my investigation with previous work on Rowhammer attacks against the Raspberry Pi 3B+, a single-board computer (SBC) based on the ARMv8 architecture.
The Raspberry Pi 4B is a popular SBC that offers a quad-core 64-bit ARM Cortex-A72 CPU, up to 8 GB of LPDDR4 RAM, and various peripherals. It runs a Linux-based operating system and supports user-level access to physical memory via the /dev/mem device.
As a first step toward a Rowhammer attack on the Raspberry Pi 4B, I verified that the previous attacks against the Raspberry Pi 3B+ still worked against the Raspberry Pi 4B. One way to bypass the cache and access DRAM directly is to use an ARM cache maintenance instruction such as DC CIVAC (Data Cache Clean and Invalidate by Virtual Address to Point of Coherency), which can be executed in user mode and flushes the cache line containing the specified address. I implemented the methods in C (code yet to be open-sourced) using inline assembly and tested them on a Raspberry Pi 4B with 8 GB of RAM. I tested the implementation both on the physical system and on an emulated virtual machine on the same host. This work was done with the help of Professor Patrick Tague!
We used a simple Rowhammer test program that allocates a large array of bytes in memory and randomly selects two addresses within the array that are likely to be in different rows of DRAM. We then repeatedly access these two addresses using DC CIVAC followed by DC ZVA (Data Cache Zero by Virtual Address), while periodically checking the rest of the array for bit flips. We ran this program for several hours with different parameters, such as the number of accesses per iteration, the interval between iterations, and the size of the array.
Our results indicated that Rowhammer attacks are possible on the Raspberry Pi 4B using the DC CIVAC and DC ZVA instructions. It took very few iterations, on the order of hundreds of rounds, to trigger bit flips when more than 20 candidate pages were aligned for double-sided Rowhammer.
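The periodic check for bit flips can be sketched in portable C. This is a minimal illustration rather than the test program itself (which is not yet published): the helper names fill_pattern and count_bit_flips, and the idea of filling the array with a single known byte pattern before hammering, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill the test array with a known byte pattern
 * before hammering, so later corruption can be detected. */
static void fill_pattern(uint8_t *buf, size_t len, uint8_t pattern)
{
    assert(buf != NULL);
    memset(buf, pattern, len);
}

/* Hypothetical helper: count the bits in buf that differ from the expected
 * pattern. If first_offset is non-NULL, it receives the byte offset of the
 * first corrupted byte (left untouched when nothing flipped). */
static size_t count_bit_flips(const uint8_t *buf, size_t len,
                              uint8_t pattern, size_t *first_offset)
{
    size_t flips = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t diff = (uint8_t)(buf[i] ^ pattern);
        if (diff == 0)
            continue;
        if (flips == 0 && first_offset != NULL)
            *first_offset = i;
        /* Clear the lowest set bit until none remain; one pass per flip. */
        while (diff) {
            diff &= (uint8_t)(diff - 1);
            flips++;
        }
    }
    return flips;
}
```

In a real run the scan would skip the two aggressor pages, since DC ZVA overwrites them with zeros by design. Using 0xFF as the pattern exposes 1-to-0 flips and 0x00 exposes 0-to-1 flips.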
By repeatedly accessing a specific row of memory cells, an attacker can induce bit flips in adjacent rows, corrupting data and compromising security. Hammering the two rows on either side of a victim row speeds up the attack and is known as double-sided hammering (see Fig. 1). Our attacks used double-sided hammering.
Background
The Rowhammer attack is executed using the Data Cache Clean and Invalidate by Virtual Address to PoC (DC CIVAC) and Data Cache Zero by Virtual Address (DC ZVA) instructions, without a Data Synchronization Barrier (DSB). Past work has identified that DSB instructions are not needed and decrease the efficacy of Rowhammer attacks. The ability to preload the data cache with zero values using the DC ZVA instruction is new in ARMv8-A. Cache line zeroing behaves much like a prefetch, hinting to the processor that certain addresses are likely to be used in the future. However, a zeroing operation can be much quicker because there is no need to wait for external memory accesses to complete. What has gone largely unnoticed is that the energy requirements of multiple ZVA operations are high, and that triggering many ZVA operations in quick succession on correctly aligned physical pages of DRAM can speed up the bit flips.
A sample of the code used to execute the rowhammer attack is given below:
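/* Flush both aggressor pages from the data cache once, before hammering. */
asm volatile(
    "dc civac, %0\n\t"
    "dc civac, %1\n\t"
    :: "r" (page1), "r" (page2)
    : "memory"
);
/* Hammer: zero a block in each aggressor page every round, with no DSB. */
for (int j = 0; j < ROUNDS; j++) {
    asm volatile(
        "dc zva, %0\n\t"
        "dc zva, %1\n\t"
        :: "r" (page1), "r" (page2)
        : "memory"   /* DC ZVA writes memory, so tell the compiler */
    );
}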
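Each DC ZVA in the loop above zeroes one naturally aligned block around the given address, not a whole page. The block size is reported by the DCZID_EL0 system register, which is readable from user mode. The sketch below decodes that register; the function names are hypothetical, and the register read is guarded so the decoder also compiles on non-ARM hosts.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a DCZID_EL0 value. Bits [3:0] (BS) hold log2 of the block size in
 * 4-byte words; bit 4 (DZP) is set when DC ZVA is prohibited.
 * Returns the block size in bytes, or 0 if DC ZVA may not be used. */
static unsigned dczva_block_bytes(uint64_t dczid)
{
    unsigned bs = (unsigned)(dczid & 0xF);

    assert(bs <= 9);            /* the architecture caps the block at 2 KB */
    if (dczid & (1u << 4))
        return 0;
    return 4u << bs;
}

#if defined(__aarch64__)
/* Read DCZID_EL0 on the running core (AArch64 only). */
static uint64_t read_dczid(void)
{
    uint64_t value;

    __asm__ __volatile__("mrs %0, dczid_el0" : "=r"(value));
    return value;
}
#endif
```

A BS value of 4 corresponds to a 64-byte block, in which case a 4096-byte page holds 64 such blocks and the loop above touches only the first block of each aggressor page per round.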
Results and Discussion
In the prior work on the Raspberry Pi 3B+, researchers found that DC ZVA instructions triggered bit flips after 60 iterations. We found that bit flips appeared much sooner on the Raspberry Pi 4B: on average, the first bit flip occurred after the 8th round of hammering. The ZVA instruction induced the most bit flips, although the other cache maintenance instructions were also effective at flipping bits.
Heuristic Analysis of Aligned Pages
We allocated 32768 pages and wrote a single byte to each page after allocation to speed up the page allocation process. After accumulating the 4096-byte pages, we determined the physical page number of each virtual page. From experimental data, we noticed that, for a successful Rowhammer attack, the two rows being hammered must be 0x10000 apart in physical address space. This aligns the two physical pages for double-sided Rowhammer. The victim page is the 8th page from the first page in this setup.
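One way to obtain physical page numbers on Linux is the /proc/self/pagemap interface; whether the original program used it is an assumption, and the function names below are hypothetical. Each virtual page has one 64-bit entry in that file, with bit 63 marking the page as present and bits 0-54 holding the page frame number (PFN). On recent kernels the PFN field reads as zero unless the process has CAP_SYS_ADMIN, so this sketch needs to run as root to return useful values.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PRESENT  (1ULL << 63)        /* page is currently in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54 hold the PFN */

/* Extract the PFN from one pagemap entry; returns 0 if the page is absent. */
static uint64_t pagemap_entry_to_pfn(uint64_t entry)
{
    if (!(entry & PAGEMAP_PRESENT))
        return 0;
    return entry & PAGEMAP_PFN_MASK;
}

/* Look up the PFN backing vaddr in the calling process.
 * Returns 0 on any failure, including a PFN hidden by the kernel. */
static uint64_t virt_to_pfn(const void *vaddr)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;
    off_t offset;
    ssize_t n;
    int fd;

    assert(page_size > 0);
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size)
             * (off_t)sizeof(entry);
    n = pread(fd, &entry, sizeof(entry), offset);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return 0;
    return pagemap_entry_to_pfn(entry);
}
```

Calling virt_to_pfn on the first byte of each allocated page, after that page has been written to, yields the table of physical page numbers that the alignment analysis works from.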
We performed the above allocation test 500 times and observed the following results:
We identified that, on average, 92% of the double-sided Rowhammer target pages in a process were exactly 16 pages apart in virtual memory. In addition, up to 5% of the total allocations in the 32768 pages were both 16 pages apart in virtual memory and correct target pages for double-sided Rowhammer.
This means that an attacker can blindly allocate a large amount of memory and have up to a 5% chance of picking two pages that are 16 pages apart in virtual memory and correctly aligned for a double-sided, ZVA-based Rowhammer attack. With enough compute, an attacker can cycle through all the candidate pages in a short time, because we demonstrated that the DC ZVA approach works in under 10 cycles when the physical page numbers are known.
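The measurement behind these percentages can be sketched as a scan over a table of physical page numbers. The code assumes that the 0x10000 distance is a byte offset, which with 4096-byte pages is 16 page frames, consistent with the victim sitting 8 pages from the first aggressor. The function name and the convention that a PFN of 0 means "unknown" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u
#define AGGRESSOR_DIST  0x10000u                           /* bytes (assumed) */
#define PFN_STRIDE      (AGGRESSOR_DIST / PAGE_SIZE_BYTES) /* 16 page frames */
#define VIRT_STRIDE     16u                                /* virtual pages */

/* pfn[i] is the physical page number backing the i-th page of one
 * contiguous virtual allocation (0 = unknown). Counts the pairs that are
 * 16 pages apart virtually and also 0x10000 bytes apart physically,
 * i.e. the pairs a blind attacker would hit by stride alone. */
static size_t count_blind_hits(const uint64_t *pfn, size_t n_pages)
{
    size_t hits = 0;

    assert(pfn != NULL);
    for (size_t i = 0; i + VIRT_STRIDE < n_pages; i++) {
        uint64_t a = pfn[i];
        uint64_t b = pfn[i + VIRT_STRIDE];

        if (a == 0 || b == 0)
            continue;
        /* Accept either physical ordering of the two aggressor pages. */
        if (b - a == PFN_STRIDE || a - b == PFN_STRIDE)
            hits++;
    }
    return hits;
}
```

Dividing the returned count by the number of pairs examined gives a hit rate that can be compared against the figure of up to 5% reported above.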
This information is crucial for attackers to be able to blindly accrue rows in an isolated process while a victim process is running.
To demonstrate the viability of this attack across processes, we would have to collect more data on page allocations in two processes to identify the page offset information needed to trigger bit flips in a separate process. We were able to demonstrate single-process bit flips using blind heuristic attacks.
Garbage Collection and Isolated Process Attacks
We also looked into garbage collection by the kernel when performing blind isolated-process attacks. If we can fork a new process and continuously hammer rows that are correctly aligned with roughly 5% probability, we have a higher chance of triggering bit flips in an isolated victim process. We were unable to trigger an actual bit flip with this approach, but with more heuristic data we could improve our chances by leveraging kernel-based garbage collection. The Linux kernel binds page allocations lazily: a page is not backed by physical memory until a process actually writes to it. Similarly, pages are not physically mapped into a new process until that process writes to them. This copy-on-write (CoW) behavior is common for memory management in newer kernels.
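The lazy binding described above can be observed from user space with mincore(2), which reports whether each page of a mapping is resident in RAM. This is a small sketch with hypothetical function names, not part of the original experiments. On a stock Linux kernel, untouched anonymous pages typically report as non-resident, and they become resident once a byte has been written to each of them.

```c
#define _DEFAULT_SOURCE 1   /* exposes mincore() and MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many of the n_pages pages starting at base are resident in RAM.
 * Returns -1 if mincore() fails. Supports up to 256 pages per call. */
static long count_resident(void *base, size_t n_pages, size_t page_size)
{
    unsigned char vec[256];
    long resident = 0;

    assert(n_pages <= sizeof vec);
    if (mincore(base, n_pages * page_size, vec) != 0)
        return -1;
    for (size_t i = 0; i < n_pages; i++)
        resident += vec[i] & 1;     /* bit 0 set means the page is in core */
    return resident;
}

/* Map n_pages anonymous pages and write one byte to each, forcing the
 * kernel to back every page with a physical frame. Returns NULL on error. */
static void *alloc_and_touch(size_t n_pages, size_t page_size)
{
    unsigned char *p = mmap(NULL, n_pages * page_size,
                            PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    for (size_t i = 0; i < n_pages; i++)
        p[i * page_size] = 1;
    return p;
}
```

This is why the allocation test writes a byte to every page before looking up physical page numbers: until that write, there is no physical frame to look up.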
Defenses against Heuristic driven DC ZVA Attacks
Based on my investigation and reading, zero-by-virtual-address instructions such as DC ZVA cannot be moved to a trap-based implementation in the kernel for performance reasons. Many popular tools, like gcc, leverage this instruction for improved performance. An additional delay when allocating pages across processes might slow down heuristic attacks. An additional logic layer in page allocation could also deny the allocation of double-sided Rowhammer-aligned pages to a single process.