criterion performance measurements
overview
Countdown Bench/by-hand
Statistic | lower bound | estimate | upper bound |
---|---|---|---|
OLS regression | xxx | xxx | xxx |
R² goodness-of-fit | xxx | xxx | xxx |
Mean execution time (s) | 7.237494831455597e-6 | 7.363731809861872e-6 | 7.558046801999589e-6 |
Standard deviation (s) | 4.080842187884806e-7 | 5.185310681781179e-7 | 6.967894150660691e-7 |
Outlying measurements have a severe (76.1%) effect on the estimated standard deviation.
Countdown Bench/transformers
Statistic | lower bound | estimate | upper bound |
---|---|---|---|
OLS regression | xxx | xxx | xxx |
R² goodness-of-fit | xxx | xxx | xxx |
Mean execution time (s) | 7.320434991306961e-6 | 7.490122370204433e-6 | 7.660580686151803e-6 |
Standard deviation (s) | 4.942779303445531e-7 | 5.695653573836083e-7 | 6.877230924139728e-7 |
Outlying measurements have a severe (79.0%) effect on the estimated standard deviation.
Countdown Bench/mtl
Statistic | lower bound | estimate | upper bound |
---|---|---|---|
OLS regression | xxx | xxx | xxx |
R² goodness-of-fit | xxx | xxx | xxx |
Mean execution time (s) | 7.119459237710864e-6 | 7.274165790907763e-6 | 7.483650506412111e-6 |
Standard deviation (s) | 4.510179846235928e-7 | 5.922871942810139e-7 | 8.202856152800278e-7 |
Outlying measurements have a severe (80.9%) effect on the estimated standard deviation.
Countdown Bench/freer-simple
Statistic | lower bound | estimate | upper bound |
---|---|---|---|
OLS regression | xxx | xxx | xxx |
R² goodness-of-fit | xxx | xxx | xxx |
Mean execution time (s) | 8.763250679937886e-4 | 8.957840405828633e-4 | 9.146825801738329e-4 |
Standard deviation (s) | 5.2679607338642893e-5 | 6.307756032928624e-5 | 7.618033525113808e-5 |
Outlying measurements have a severe (57.4%) effect on the estimated standard deviation.
Countdown Bench/fused-effects
Statistic | lower bound | estimate | upper bound |
---|---|---|---|
OLS regression | xxx | xxx | xxx |
R² goodness-of-fit | xxx | xxx | xxx |
Mean execution time (s) | 7.298162123624447e-6 | 7.467936802066649e-6 | 7.731719788057568e-6 |
Standard deviation (s) | 5.153549249469233e-7 | 6.806914076081651e-7 | 1.06654543879595e-6 |
Outlying measurements have a severe (84.5%) effect on the estimated standard deviation.
Countdown Bench/polysemy
Statistic | lower bound | estimate | upper bound |
---|---|---|---|
OLS regression | xxx | xxx | xxx |
R² goodness-of-fit | xxx | xxx | xxx |
Mean execution time (s) | 4.0457838061555935e-3 | 4.1251221030614e-3 | 4.201218675015713e-3 |
Standard deviation (s) | 2.0496847488316676e-4 | 2.511550886321523e-4 | 3.367285041553945e-4 |
Outlying measurements have a moderate (37.2%) effect on the estimated standard deviation.
understanding this report
In this report, each function benchmarked by criterion is assigned a section of its own. The charts in each section are active; if you hover your mouse over data points and annotations, you will see more details.
- The chart on the left is a kernel density estimate (also known as a KDE) of the time measurements. It plots the estimated probability density of measured times. A peak indicates that measurements occurred at or near a particular time; its height indicates how often measurements fell near that time.
- The chart on the right is the raw data from which the kernel density estimate is built. The x axis indicates the number of loop iterations, while the y axis shows measured execution time for the given number of loop iterations. The line behind the values is the linear regression prediction of execution time for a given number of iterations. Ideally, all measurements will be on (or very near) this line.
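To make the left-hand chart more concrete, here is a minimal sketch of a Gaussian kernel density estimate in Python. It is not criterion's implementation: the kernel choice, the fixed `bandwidth`, and the `timings` values are all assumptions made for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the probability density at x.

    Each sample contributes one Gaussian bump of width `bandwidth`;
    the estimate is the average of those bumps.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical per-iteration timings in seconds (not taken from this report).
timings = [7.1e-6, 7.2e-6, 7.2e-6, 7.3e-6, 9.0e-6]
kde = gaussian_kde(timings, bandwidth=1.0e-7)

# The density is high where measurements cluster and low in the gap
# between the cluster and the single slow outlier.
print(kde(7.2e-6) > kde(8.0e-6))
```

A smaller bandwidth gives a spikier curve that follows individual measurements; a larger one smooths them together.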
Under the charts is a small table. The first two rows are the results of a linear regression run on the measurements displayed in the right-hand chart.
- OLS regression indicates the time estimated for a single loop iteration using an ordinary least-squares regression model. This number is more accurate than the mean estimate below it, as it more effectively eliminates measurement overhead and other constant factors.
- R² goodness-of-fit is a measure of how accurately the linear regression model fits the observed measurements. If the measurements are not too noisy, R² should lie between 0.99 and 1, indicating an excellent fit. If the number is below 0.99, something is confounding the accuracy of the linear model.
- Mean execution time and standard deviation are statistics calculated from execution time divided by number of iterations.
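The difference between the regression estimate and the mean estimate can be sketched in a few lines of Python. This is an illustration under stated assumptions, not criterion's code: the function name `ols_fit` and the synthetic data (7 µs per iteration plus a constant 2 µs overhead) are hypothetical.

```python
def ols_fit(iters, times):
    """Fit times = intercept + slope * iters by ordinary least squares.

    Returns (slope, intercept, r_squared). The slope is the estimated
    time per loop iteration; the intercept absorbs constant overhead.
    """
    n = len(iters)
    mean_x = sum(iters) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in iters)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(iters, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R² = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(iters, times))
    ss_tot = sum((y - mean_y) ** 2 for y in times)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical measurements: 7 µs per iteration plus 2 µs fixed overhead.
iters = [1, 2, 4, 8, 16, 32]
times = [2.0e-6 + 7.0e-6 * n for n in iters]

slope, intercept, r_squared = ols_fit(iters, times)

# The naive mean divides each total time by its iteration count, so the
# fixed overhead leaks into the per-iteration figure.
naive_mean = sum(t / n for n, t in zip(iters, times)) / len(iters)

print(slope, intercept, r_squared, naive_mean)
```

On this synthetic data the slope recovers the per-iteration cost while the naive mean is biased upward by the overhead, which is the reason the regression figure is described as more accurate.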
We use a statistical technique called the bootstrap to provide confidence intervals on our estimates. The bootstrap-derived upper and lower bounds on estimates let you see how accurate we believe those estimates to be. (Hover the mouse over the table headers to see the confidence levels.)
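As a rough illustration of how a bootstrap confidence interval is obtained, the sketch below uses the simple percentile bootstrap in Python. The percentile method is swapped in here for brevity and may differ from the variant criterion applies; the function name `bootstrap_ci`, the resample count, the 95% level, and the `timings` data are assumptions.

```python
import random
import statistics


def bootstrap_ci(samples, statistic, resamples=1000, confidence=0.95, seed=0):
    """Percentile bootstrap: return (lower, estimate, upper) for `statistic`.

    Resample the data with replacement many times, recompute the
    statistic on each resample, and read the bounds off the sorted results.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(samples)
    estimates = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower = estimates[int(alpha * (resamples - 1))]
    upper = estimates[int((1.0 - alpha) * (resamples - 1))]
    return lower, statistic(samples), upper


# Hypothetical per-iteration timings in seconds (not taken from this report).
timings = [7.0e-6 + 1.0e-7 * (i % 5) for i in range(20)]

lower, estimate, upper = bootstrap_ci(timings, statistics.fmean)
print(lower, estimate, upper)
```

A narrow interval between the lower and upper bounds suggests the estimate is stable under resampling; a wide one suggests the measurements are too noisy to pin the value down.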
A noisy benchmarking environment can cause some or many measurements to fall far from the mean. These outlying measurements can have a significant inflationary effect on the estimate of the standard deviation. We calculate and display an estimate of the extent to which the standard deviation has been inflated by outliers.
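The outlier estimate is a fraction between 0 and 1, so a stored value of 0.76 corresponds to about 76%. The Python sketch below maps such a fraction to a descriptive label. The cut-off values (1%, 10%, 50%) are an assumption that is consistent with the labels in this report, and the function name `describe_outlier_effect` is hypothetical.

```python
def describe_outlier_effect(fraction):
    """Describe how much of the variance is attributed to outliers.

    `fraction` is a value in [0, 1]. The thresholds below are assumed
    for illustration, not read from criterion's source.
    """
    if fraction < 0.01:
        label = "unaffected"
    elif fraction < 0.1:
        label = "slight"
    elif fraction < 0.5:
        label = "moderate"
    else:
        label = "severe"
    return f"{label} ({fraction * 100:.1f}%)"


# Fractions as they appear in the overview sections of this report.
print(describe_outlier_effect(0.7609492284411736))
print(describe_outlier_effect(0.37206319333955157))
```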