Chapter 1. Introduction to the Performance Analyzer

You can use the Performance Analyzer to check your program for performance problems. If performance can be improved in some areas, the Performance Analyzer helps you find those areas and make the changes.

This chapter provides a brief introduction to the Performance Analyzer tools and describes how to use them to solve performance problems.

Chapter 2, “Features in the Performance Analyzer”, describes the features of the Performance Analyzer in more detail.

Performance Analysis Overview

To conduct performance analysis, you first run an experiment to collect performance data. Specify the objective of your experiment through a task menu or with the SpeedShop command ssrun(1). The Performance Analyzer reads the required data and provides charts, tables, and annotated code to help you analyze the results.

There are three general techniques for collecting performance data:

  • Counting. This involves counting the exact number of times each function or basic block has been executed. This requires instrumenting the program; that is, inserting code into the executable to collect counts.

  • Profiling. The program's program counter (PC), call stack, and/or resource consumption are periodically examined and recorded. For a list of resources, see “Resource Usage Graphs” in Chapter 2.

  • Tracing. Events that impact performance, such as reads and writes, system calls, floating-point exceptions, and memory allocations, reallocations, and frees, can be traced.
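The counting technique can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not how the Performance Analyzer or SpeedShop instruments an executable; the names call_counts and counted are hypothetical. The decorator plays the role of the inserted instrumentation code: each call to a wrapped function increments an exact counter before the original function runs.

```python
import functools
from collections import Counter

# Hypothetical table of exact call counts, keyed by function name.
call_counts = Counter()


def counted(func):
    """Instrument func so that every call increments its entry in call_counts."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1  # the inserted "instrumentation"
        return func(*args, **kwargs)
    return wrapper


@counted
def square(x):
    return x * x


@counted
def sum_of_squares(n):
    # Calls square() once for each value in range(n).
    return sum(square(i) for i in range(n))


result = sum_of_squares(5)

if __name__ == "__main__":
    for name, count in sorted(call_counts.items()):
        print(f"{name}: {count} call(s)")
```

Unlike profiling, which samples periodically and therefore yields estimates, this approach records every call, at the cost of extra work on each one.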

Sources of Performance Problems

To tune a program's performance, you must first determine where machine resources are being used. At any point in a process's execution, one limiting resource controls the speed of execution. Processes can be slowed down by:

  • CPU speed and availability: a CPU-bound process spends its time executing in the CPU and is limited by CPU speed and availability. To improve the performance of CPU-bound processes, you may need to streamline your code. This can entail modifying algorithms, reordering code to avoid interlocks, removing nonessential steps, blocking to keep data in cache and registers, or using alternative algorithms.

  • I/O processing: an I/O-bound process has to wait for input/output (I/O) to complete. I/O may be limited by disk access speeds or memory caching. To improve the performance of I/O-bound processes, you can try one of the following techniques:

    • Improve overlap of I/O with computation

    • Optimize data usage to minimize disk access

    • Use data compression

  • Memory size and availability: a program that continuously needs to swap out pages of memory is called memory-bound. Page thrashing is often caused by accessing virtual memory in a haphazard rather than a planned pattern, which also leads to cache misses. Insufficient memory bandwidth could also be the problem.

    To fix a memory-bound process, you can try to improve the memory reference patterns or, if possible, decrease the memory used by the program.

  • Bugs: you may find that a bug is causing the performance problem. For example, you may find that you are reading in the same file twice in different parts of the program, that floating-point exceptions are slowing down your program, that old code has not been completely removed, or that you are leaking memory (making malloc calls without the corresponding calls to free).

  • Performance phases: because programs exhibit different behavior during different phases of operation, you need to identify the limiting resource during each phase. A program can be I/O-bound while it reads in data, CPU-bound while it performs computation, and I/O-bound again in its final stage while it writes out data. Once you have identified the limiting resource in a phase, you can perform an in-depth analysis to find the problem. After you have solved that problem, you can check for other problems within the phase. Performance analysis is an iterative process.
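As a rough sketch of how a limiting resource might be identified for each phase, the following Python example compares the elapsed (wall-clock) time of a phase with the CPU time the process consumed during it. This is an illustration only and is not part of the Performance Analyzer. The function names, the 0.5 threshold, and the use of time.sleep as a stand-in for blocking I/O are all assumptions made for the example. A phase whose CPU time is close to its elapsed time is likely CPU-bound; a phase that spends most of its elapsed time off the CPU is waiting on something else, often I/O.

```python
import time


def measure_phase(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def classify(wall, cpu, threshold=0.5):
    """Label a phase by the fraction of elapsed time spent on the CPU.

    The threshold is an arbitrary illustrative cutoff, not a standard value.
    """
    if wall <= 0:
        return "unknown"
    if cpu / wall >= threshold:
        return "CPU-bound"
    return "waiting (possibly I/O-bound)"


def compute_phase(n):
    # Pure computation: keeps the CPU busy.
    return sum(i * i for i in range(n))


def waiting_phase(seconds):
    # Stand-in for blocking I/O: the process sleeps and uses almost no CPU.
    time.sleep(seconds)


if __name__ == "__main__":
    for name, func, arg in [("compute", compute_phase, 2_000_000),
                            ("wait", waiting_phase, 0.2)]:
        _, wall, cpu = measure_phase(func, arg)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s -> {classify(wall, cpu)}")
```

A classification like this only narrows the search. Within a phase flagged as waiting, you would still need tracing or resource usage data to tell whether disk access, paging, or something else is responsible.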