In most cases, the built-in benchmarking utility should be used. It can be accessed by passing the --benchmark
option to the Factorio executable.
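As a rough illustration, the command line for a benchmark run could be assembled like this. Only --benchmark comes from the text above. The companion options --benchmark-ticks, --benchmark-runs and --disable-audio are assumed from the game's command-line help and should be checked against `factorio --help` for your version. The executable path, map name and default values are placeholders.

```python
from typing import List


def build_benchmark_command(
    executable: str,
    map_path: str,
    ticks: int = 1000,
    runs: int = 5,
) -> List[str]:
    """Return an argument list suitable for subprocess.run().

    Assumption: --benchmark-ticks, --benchmark-runs and --disable-audio
    are accepted by your Factorio version; verify with `factorio --help`.
    """
    if ticks <= 0 or runs <= 0:
        raise ValueError("ticks and runs must be positive")
    return [
        executable,
        "--benchmark", map_path,          # map (save file) to benchmark
        "--benchmark-ticks", str(ticks),  # ticks simulated per run
        "--benchmark-runs", str(runs),    # number of repeated runs
        "--disable-audio",                # avoid audio setup overhead
    ]


if __name__ == "__main__":
    # Hypothetical paths; print the command instead of executing it.
    print(" ".join(build_benchmark_command("./factorio", "my_test_map.zip")))
```

The list can be handed to subprocess.run(cmd, capture_output=True, text=True) to capture the benchmark output for later parsing.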
Multiple runs should be conducted for the same map to reduce the effect of run-to-run variance on the result.
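One way to see how much the runs disagree is to summarise the per-run averages. This is a minimal sketch which assumes you have already collected one average update time (in milliseconds per tick) for each run. The function name and the choice of statistics are illustrative, not part of the benchmarking utility.

```python
import statistics
from typing import Dict, Sequence


def summarise_runs(avg_ms_per_tick: Sequence[float]) -> Dict[str, float]:
    """Summarise repeated benchmark runs of the same map.

    avg_ms_per_tick holds one average update time per run, in milliseconds.
    """
    if not avg_ms_per_tick:
        raise ValueError("at least one run is required")
    mean = statistics.mean(avg_ms_per_tick)
    # Sample standard deviation needs two or more data points.
    stdev = statistics.stdev(avg_ms_per_tick) if len(avg_ms_per_tick) > 1 else 0.0
    return {
        "runs": float(len(avg_ms_per_tick)),
        "mean_ms": mean,
        "min_ms": min(avg_ms_per_tick),
        "stdev_ms": stdev,
        # Coefficient of variation: spread relative to the mean, in percent.
        "cv_percent": 100.0 * stdev / mean if mean else 0.0,
    }


if __name__ == "__main__":
    # Made-up example values, not real measurements.
    print(summarise_runs([10.0, 12.0, 14.0]))
```

If the coefficient of variation is large compared with the difference between two designs, more runs (or a quieter system) are probably needed before drawing a conclusion.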
Where applicable, locking system fans to 100% speed may be beneficial. This reduces the chance that thermal throttling affects some maps or runs more than others.
Designs should be scaled up until the map runs at around 60 UPS, since non-linear scaling means that results measured at low production levels may not hold at full scale.
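The 60 UPS mark corresponds to an average update time of 1000 / 60, or about 16.67 ms per tick. The sketch below converts a measured update time to UPS and gives a first guess at how many copies of a design to place. The guess deliberately assumes linear scaling, which is exactly what cannot be relied on, so it is only a starting point: re-run the benchmark and adjust. Both function names are hypothetical.

```python
def ups_from_ms(avg_ms_per_tick: float) -> float:
    """Convert an average update time (ms per tick) to updates per second."""
    if avg_ms_per_tick <= 0:
        raise ValueError("update time must be positive")
    return 1000.0 / avg_ms_per_tick


def estimate_copies_for_target(
    avg_ms_per_tick: float,
    current_copies: int,
    target_ups: float = 60.0,
) -> int:
    """First guess at the number of copies needed to land near target_ups.

    Assumption: update time grows linearly with the number of copies.
    Real maps scale non-linearly, so benchmark again and iterate.
    """
    target_ms = 1000.0 / target_ups
    scale = target_ms / avg_ms_per_tick
    # Round down so the first attempt stays at or above the target UPS.
    return max(1, int(current_copies * scale))


if __name__ == "__main__":
    # Made-up example: 10 copies measured at 4.0 ms per tick.
    print(ups_from_ms(4.0))
    print(estimate_copies_for_target(4.0, 10))
```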
An MTU (minimum tilable unit) should be chosen that accurately describes how a design would be built in a typical game. A good MTU would be the production necessary for 10k SPM worth of a resource. An MTU of that size gives the most representative view of a typical in-game design. For example, it allows a realistic level of beacon sharing to be achieved within the MTU.