The visual system processes natural scenes in a split second. Part of this process is the extraction of "gist," a global first
impression. It is unclear, however, how the human visual system computes this information. Here, we show that, when human
observers categorize global information in real-world scenes, the brain exhibits strong sensitivity to low-level summary statistics.
Subjects rated a specific instance of a global scene property, naturalness, for a large set of natural scenes while EEG was
recorded. For each individual scene, we derived two physiologically plausible summary statistics by spatially pooling local
contrast filter outputs: contrast energy (CE), indexing contrast strength, and spatial coherence (SC), indexing scene fragmentation.
We show that behavioral performance is directly related to these statistics, with naturalness ratings influenced in particular
by SC. At the neural level, both statistics parametrically modulated single-trial event-related potential amplitudes during
an early, transient window (100–150 ms), but SC continued to influence activity levels later in time (up to 250 ms). In addition,
the magnitude of neural activity that discriminated between man-made and natural ratings of individual trials was related
to SC, but not CE. These results suggest that global scene information may be computed by spatial pooling of responses from
early visual areas (e.g., LGN or V1). The increased sensitivity over time to SC in particular, which reflects scene fragmentation,
suggests that this statistic is actively exploited to estimate scene naturalness.
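
As a rough illustration of what "spatially pooling local contrast filter outputs" can mean in practice, the sketch below reduces a grayscale image to two numbers. It is a minimal example under stated assumptions, not the procedure used in this study: the gradient magnitude stands in for the contrast filters, the mean of the filter outputs stands in for CE, and their coefficient of variation stands in for SC. The function names `local_contrast` and `summary_statistics` are hypothetical.

```python
import numpy as np


def local_contrast(image):
    """Gradient magnitude at each pixel.

    Assumption: a simple finite-difference gradient is used here as a
    stand-in for physiologically plausible local contrast filters.
    """
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    return np.hypot(gx, gy)


def summary_statistics(image):
    """Pool local contrast over the whole image into two summary numbers.

    Assumption: the mean is an illustrative proxy for contrast energy (CE)
    and the coefficient of variation (std / mean) is an illustrative proxy
    for spatial coherence (SC). The study's exact definitions may differ.
    """
    contrast = local_contrast(image)
    mean = contrast.mean()
    std = contrast.std()
    ce = float(mean)
    # Guard against division by zero for images with no contrast at all.
    sc = float(std / mean) if mean > 0 else 0.0
    return ce, sc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene = rng.random((64, 64))  # placeholder for a grayscale scene
    ce, sc = summary_statistics(scene)
    print(f"CE proxy: {ce:.3f}, SC proxy: {sc:.3f}")
```

With these stand-ins, rescaling the overall image contrast changes the CE proxy proportionally but leaves the SC proxy unchanged, so the two numbers capture separable aspects of the contrast distribution.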