The analytical process may be viewed as a system with three major components: sampling, analysis, and data interpretation. Each component can act as a filter that eliminates a large number of organic compounds from further consideration, so optimizing the system requires maximum discrimination at each filter (component). The mass distribution and Shannon information content of low-resolution, binary-encoded mass spectra have been calculated for a set of 78 volatile organic compounds that are routinely sought in ambient air samples. This set of compounds and their binary-encoded mass spectra will be used as an example of preselecting variables for SIMCA pattern recognition on the basis of Shannon information content. The 153 different masses in the spectra of the 78 compounds are compressed into a set of 16 key masses. Neglecting correlation between masses, this set of 16 masses contains a maximum of 13 bits of information. In theory, these 13 bits are sufficient to distinguish among roughly 8200 compounds (2^13 = 8192). The Shannon information content of different analytical methods will also be discussed.
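As an illustration of the calculation summarized above, the following minimal sketch computes the Shannon information of each mass in a set of binary-encoded spectra and sums it over a subset of key masses, neglecting correlation between masses. It assumes each mass is a binary variable (peak present or absent) whose information is the binary entropy of its occurrence frequency. Ranking masses by that entropy is an assumed selection rule chosen for illustration, not necessarily the procedure used in the work. The function names and the four toy spectra are hypothetical and are not data from the 78-compound set.

```python
import math


def binary_entropy(p):
    """Shannon information (bits) of a binary variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a mass present in all or none of the spectra carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def mass_information(spectra):
    """Map each mass to its information content across the compound set.

    spectra: list of sets, each holding the masses at which one compound
    shows a peak (binary encoding: present or absent).
    """
    n = len(spectra)
    counts = {}
    for peaks in spectra:
        for mass in peaks:
            counts[mass] = counts.get(mass, 0) + 1
    return {mass: binary_entropy(count / n) for mass, count in counts.items()}


def select_key_masses(spectra, k):
    """Pick the k masses with the highest information (assumed selection rule)."""
    info = mass_information(spectra)
    ranked = sorted(info, key=lambda mass: (-info[mass], mass))
    return ranked[:k]


def total_information(spectra, masses):
    """Upper bound on information in the chosen masses, neglecting correlation."""
    info = mass_information(spectra)
    return sum(info.get(mass, 0.0) for mass in masses)


# Hypothetical toy spectra, for illustration only.
toy_spectra = [{43, 57}, {43, 91}, {57, 91}, {43}]
key_masses = select_key_masses(toy_spectra, 2)
bits = total_information(toy_spectra, key_masses)
```

Under these assumptions a mass that appears in exactly half of the spectra contributes the maximum of 1 bit, so 16 key masses can carry at most 16 bits. A total of 13 bits corresponds to 2**13 = 8192 distinguishable binary patterns.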