Optimizing Single-Core CPU Software

Fully exploiting the capabilities of modern hardware is hard. Typical software leaves much of the hardware's optimization potential unused, and better utilization often requires specialized code. This post explores some of the most common optimization techniques for single-core binaries running on modern hardware. We cover concepts such as instruction-level parallelism and cache analysis for specific computations, and briefly touch on vectorization. We conclude with a short example showing how these techniques are applied in practice, followed by an analysis of the results. We report a speedup of up to 10x over the standard naïve matrix-matrix multiplication code.

October 21, 2025 · Reading Time: 16 minutes · By Xuanqiang Angelo Huang

Hello World

First blog post

November 30, 2022 · Reading Time: 2 minutes · By Xuanqiang Angelo Huang