As systems get more complex, the need for machine-assisted performance analysis grows. Kernighan and Pike note, “Measurement is a crucial component of performance improvement since reasoning and intuition are fallible guides and must be supplemented with tools like timing commands and profilers.” [tpop].
Linux is generally well-served with development tools, and a wide selection of profiling packages is available. However, several of these tools are not well-publicised, and many are under-documented. To date, there has been no comprehensive survey of the choices available: this document aims to fill that gap.
It is worth mentioning some guidelines that should be followed when doing performance analysis. Probably the number one rule is: analyse the results. Think about what the results could be implying; don't take them at face value. Consider whether the profiling technique itself could be harming the accuracy of the profiling data.
Pay close attention to your profiling environment. Are you running realistic tests? Have you avoided homing in on a particular workload at the expense of the common case? Amdahl's law indicates that the analyst should avoid focussing on a small part of the system until it is ascertained that optimising it will benefit the common case.
Are you profiling production code? Performance analysis of unoptimised code peppered with debug statements carries the risk of mis-optimisation. Make sure your optimisation decisions are governed by realistic data, not intuition.
Everyone knows Knuth's famous maxim, “premature optimisation is the root of all evil”, but it is still ignored all too frequently (the maxim is similar to the Extreme Programming rule “you aren't going to need it”). Too much developer time is spent optimising code that does not need optimisation. This leaves open the question of when the right time to do performance analysis is. Commonly it is done during the alpha or beta phase of a release's lifecycle, and often in parallel with unstable development for far-reaching changes. This can prove a problem with a development tree in high flux, as profiling data can quickly become outdated; this is, of course, a development management issue, and need not concern us here.
When you identify a bottleneck in your program, there are two principal ways to view it. First, it can be considered on the procedural level: this is the sort of analysis that leads to, for example, inner-loop optimisation, inlining decisions, and other such transformations. Second, an architectural point of view can be taken: here the underlying algorithms are considered; why does the particular algorithm used not perform efficiently enough for the important cases, and how can the system be reworked to fix this?
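The distinction can be made concrete with a toy example of our own devising: hoisting a strlen() call out of a loop condition is a procedural-level fix, while asking whether the scan needs to happen at all would be an architectural one.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Procedural-level optimisation (hypothetical example): computing
   strlen() once instead of on every loop iteration turns an O(n^2)
   scan into O(n).  An architectural analysis would instead ask why
   the string is being scanned repeatedly in the first place. */
size_t count_upper(const char *s)
{
    size_t n = strlen(s);   /* hoisted out of the loop condition */
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (isupper((unsigned char)s[i]))
            count++;
    return count;
}
```

The hoisted call is a pure transformation of the existing procedure; no data structure or algorithm has changed, which is precisely what marks it as a procedural rather than architectural improvement.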
Both points of view are useful, though it is probably fair to say that the architectural considerations are more important. Reworkings at this level more often than not lead to more significant gains than procedural analysis, although they carry higher development costs. Procedural analyses are most useful when tweaking the performance of a system approaching the end of a release cycle, and are generally cheap to implement. The majority of premature optimisation is the result of procedural changes guided by intuition. Procedural changes often make code harder to read; this accretion of junk code can easily become a significant maintenance burden, especially in large projects. In general, the developer should avoid micro-optimisations that could affect code readability until they have proven their worth in extensive analysis work.