74 Benchmarking in C++

1. Benchmarking

Benchmarking here isn't about a particular tool; it's about how to correctly write C++ code whose performance you want to measure. There are many ways to measure the performance of C++ code, and these notes cover only the method used by Cherno.

First, let's write some code that we want to test:

#include <iostream>

int main()
{
    int value = 0;
    for (int i = 0; i < 1000000; i++)
        value += 2;

    std::cout << value << std::endl;   // 2000000

    __debugbreak(); // MSVC-specific intrinsic that triggers a breakpoint
}

Now, to analyze how fast our code is, we can create a simple scoped timer (refer to 63 Timing in C++):

#include <chrono>
#include <iostream>

// Scoped timer: starts on construction, prints the elapsed time on destruction.
class Timer
{
private:
    std::chrono::time_point<std::chrono::high_resolution_clock> m_StartTimepoint;
public:
    Timer()
    {
        m_StartTimepoint = std::chrono::high_resolution_clock::now();
    }

    ~Timer()
    {
        Stop();
    }

    void Stop()
    {
        auto endTimepoint = std::chrono::high_resolution_clock::now();

        // Convert both time points to microseconds since the clock's epoch
        auto start = std::chrono::time_point_cast<std::chrono::microseconds>(m_StartTimepoint).time_since_epoch().count();
        auto end = std::chrono::time_point_cast<std::chrono::microseconds>(endTimepoint).time_since_epoch().count();

        auto duration = end - start;   // elapsed microseconds
        double ms = duration * 0.001;  // same value in milliseconds

        std::cout << duration << "us (" << ms << "ms)\n";
    }
};

int main()
{
    int value = 0; // Moved outside the scope to ensure it can be printed
    {
        Timer timer;
        for (int i = 0; i < 1000000; i++)
            value += 2;
    }
    std::cout << value << std::endl;
    __debugbreak();
}
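
If you would rather get the elapsed time back as a number than have the destructor print it, a small helper can do the timing instead. The following is only a sketch and not Cherno's code: the name MeasureMicroseconds is hypothetical, and it uses std::chrono::steady_clock, which is guaranteed to be monotonic, whereas high_resolution_clock may be an alias for system_clock on some implementations.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not from the original notes): runs `work` once and
// returns the elapsed wall-clock time in whole microseconds.
template <typename Func>
long long MeasureMicroseconds(Func&& work)
{
    // steady_clock never jumps backwards, so the difference is never negative.
    const auto start = std::chrono::steady_clock::now();
    std::forward<Func>(work)();
    const auto end = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

The loop from main() could then be passed in as a lambda that captures value by reference, and the returned number printed or stored for later comparison.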

Output of the timer:

It's important to make sure that what you're measuring is actually present in the compiled code, because in Release mode the compiler optimizes the generated assembly. In this example the loop is evaluated at compile time, and the program simply prints the constant 1E8480h (two million). Since the print sits outside the timed scope, the timer ends up measuring nothing at all, instead of the one million additions you wanted to time.
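
One common workaround for a loop that gets optimized away is to make the accumulator volatile, which forces the compiler to perform every read and write. This is only a sketch of that idea. The function name SumWithVolatile is made up for illustration, and volatile changes what is being measured (each iteration now goes through memory), so treat the resulting numbers as a rough guide.

```cpp
// Hypothetical example: the same loop as in main(), but the compiler has to
// perform every addition because accesses to a volatile object count as
// observable behavior and cannot be folded into a constant.
int SumWithVolatile(int iterations)
{
    volatile int value = 0;
    for (int i = 0; i < iterations; i++)
        value = value + 2; // explicit read, then write

    return value;
}
```

Calling SumWithVolatile(1000000) inside a scope that holds a Timer gives the timer real work to measure, even in Release mode.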

2. Performance Comparison of Smart Pointers

#include <array>
#include <cstddef>
#include <iostream>
#include <memory>

// Uses the Timer class defined above.
int main()
{
    struct Vector2
    {
        float x, y;
    };

    std::cout << "Make Shared\n";
    {
        std::array<std::shared_ptr<Vector2>, 1000> sharedPtrs;
        Timer timer;  // Started after the array is created, so its construction isn't timed
        for (std::size_t i = 0; i < sharedPtrs.size(); i++)
            sharedPtrs[i] = std::make_shared<Vector2>();
    }
    std::cout << "New Shared\n";
    {
        std::array<std::shared_ptr<Vector2>, 1000> sharedPtrs;
        Timer timer;
        for (std::size_t i = 0; i < sharedPtrs.size(); i++)
            sharedPtrs[i] = std::shared_ptr<Vector2>(new Vector2());
    }
    std::cout << "Make Unique\n";
    {
        std::array<std::unique_ptr<Vector2>, 1000> uniquePtrs;
        Timer timer;
        for (std::size_t i = 0; i < uniquePtrs.size(); i++)
            uniquePtrs[i] = std::make_unique<Vector2>();
    }
    __debugbreak();
}
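
Timings from a single pass can vary noticeably from run to run, so it helps to repeat each measurement. A minimal sketch of that, assuming a hypothetical helper BestOfMicroseconds (not part of the original code) built on std::chrono::steady_clock:

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `runs` times and returns the shortest
// elapsed time in microseconds. The minimum is less sensitive to one-off
// noise (cold caches, scheduler hiccups) than a single sample.
template <typename Func>
long long BestOfMicroseconds(int runs, Func work)
{
    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < runs; r++)
    {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();

        const long long us =
            std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
        best = std::min(best, us);
    }
    return best; // stays at max() if runs <= 0
}
```

Each of the three loops above could be wrapped in a lambda and passed to this helper. Whether to report the minimum, the mean, or the median is a design choice, and the minimum is used here only because it is the simplest.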

Results of running the shared_ptr and unique_ptr comparison twice:

As expected, unique_ptr takes less time than shared_ptr, but the difference between make_shared and new is minimal. Note, however, that these numbers come from Debug mode, which adds many safety checks that cost time, so it is not a good basis for measuring performance.
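
Since it is easy to benchmark a Debug build by accident, the benchmark can report which kind of build it is. The sketch below assumes the common convention that Release configurations define NDEBUG (the default Visual Studio and CMake Release settings do). BuildKind is a hypothetical helper name, and the check says nothing about which optimization flags were actually used.

```cpp
// Hypothetical helper: reports the build type based on the NDEBUG macro.
// Assumption: the Release configuration defines NDEBUG, the Debug one does not.
constexpr const char* BuildKind()
{
#ifdef NDEBUG
    return "Release (NDEBUG defined)";
#else
    return "Debug (NDEBUG not defined): timings are not representative";
#endif
}
```

Printing BuildKind() once at the start of main() makes it obvious from the output whether the timings are worth comparing.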

Switching to Release mode, we can see that make_shared is significantly faster than new:

So always make sure that what you're profiling is meaningful in Release mode, since you won't be shipping a Debug build.
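
A likely reason make_shared beats new in Release mode is the number of heap allocations: make_shared can place the object and the shared_ptr control block in a single allocation, while shared_ptr<Vector2>(new Vector2()) needs one allocation for the object and another for the control block. The sketch below counts allocations by replacing the global operator new. The counter and the helper names are hypothetical instrumentation, not part of the original benchmark, and the exact counts depend on the standard library and on optimizer settings.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical instrumentation: counts every call to the global operator new.
static std::size_t g_AllocationCount = 0;

void* operator new(std::size_t size)
{
    ++g_AllocationCount;
    if (void* memory = std::malloc(size == 0 ? 1 : size))
        return memory;
    throw std::bad_alloc();
}

void operator delete(void* memory) noexcept { std::free(memory); }
void operator delete(void* memory, std::size_t) noexcept { std::free(memory); }

struct Vector2
{
    float x, y;
};

// Returns how many heap allocations happened while `create` ran.
template <typename Func>
std::size_t CountAllocations(Func create)
{
    const std::size_t before = g_AllocationCount;
    auto pointer = create(); // keep the result alive until the count is taken
    (void)pointer;
    return g_AllocationCount - before;
}

std::size_t AllocationsForMakeShared()
{
    return CountAllocations([] { return std::make_shared<Vector2>(); });
}

std::size_t AllocationsForNewShared()
{
    return CountAllocations([] { return std::shared_ptr<Vector2>(new Vector2()); });
}
```

On mainstream standard libraries an unoptimized build typically reports one allocation for make_shared and two for new, which is consistent with the timing gap described above.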