In many fields, including perception, cognition, and information processing, researchers distinguish "bottom-up" from "top-down" processes. The terms describe the direction in which information flows during processing.
Bottom-up processing refers to an approach that starts with the most basic, low-level information and builds up to more complex high-level representations. It is driven by the data itself, with no influence from higher-level knowledge.
In vision, bottom-up processing starts with raw sensory input from receptors in the eyes. The system builds this data into increasingly complex representations: detecting simple features such as edges, combining these into shapes and objects, and incrementally constructing a full perceptual interpretation. Bottom-up vision is data-driven and does not rely on top-down guidance.
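As a rough sketch of this data-driven pipeline, consider the toy example below (plain Python with NumPy; the "scene", thresholds, and grouping step are invented for illustration, not taken from any model of human vision):

```python
import numpy as np

def detect_edges(image):
    """Bottom-up step 1: extract simple features (horizontal and vertical
    gradients) directly from pixel data, with no object knowledge."""
    gx = image[:, 1:] - image[:, :-1]          # horizontal differences
    gy = image[1:, :] - image[:-1, :]          # vertical differences
    # Pad back to the original shape and combine into an edge-strength map.
    gx = np.pad(gx, ((0, 0), (0, 1)))
    gy = np.pad(gy, ((0, 1), (0, 0)))
    return np.hypot(gx, gy)

# A toy "scene": a bright square on a dark background.
scene = np.zeros((8, 8))
scene[2:6, 2:6] = 1.0

edges = detect_edges(scene)

# Bottom-up step 2: group strong edge responses into a candidate shape
# (here, simply the bounding box of above-threshold pixels).
ys, xs = np.nonzero(edges > 0.5)
print("candidate shape spans rows", ys.min(), "to", ys.max())
```

Nothing in this pipeline knows what a "square" is; each stage consumes only the output of the stage below it, which is the defining property of bottom-up processing.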
In contrast, top-down processing refers to an approach that starts with higher-level knowledge, expectations, or context and uses this to guide lower-level processing. Top-down information flows from complex to simple representations.
In vision, top-down processing utilises prior knowledge, memories, and schemas to interpret sensory data. For example, recognizing that a particular constellation of features represents a dog involves matching the visual input to a stored mental representation of what a dog looks like. Top-down vision is concept-driven and shaped by pre-existing knowledge.
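The "matching input to a stored representation" idea can be loosely sketched as template matching (a deliberately crude stand-in for recognition; the templates and inputs below are invented for the example):

```python
import numpy as np

def match_template(patch, template):
    """Top-down step: score how well sensory data (patch) fits a stored
    mental representation (template), via normalised correlation."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(t)
    return float((p * t).sum() / denom) if denom else 0.0

# A stored "schema" for a simple shape, and two candidate inputs.
dog_template = np.array([[0, 1, 0],
                         [1, 1, 1],
                         [0, 1, 0]], dtype=float)
brightened_dog = dog_template + 0.1   # same form, different lighting
random_dots = np.array([[1, 0, 1],
                        [0, 0, 0],
                        [1, 0, 1]], dtype=float)

# Prior knowledge favours the input consistent with the stored form.
assert match_template(brightened_dog, dog_template) > \
       match_template(random_dots, dog_template)
```

Here the flow is reversed relative to the bottom-up case: the stored high-level representation drives the interpretation of the low-level data.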
In summary, bottom-up processing builds up complexity from basic building blocks while top-down processing contextualises lower levels using higher-level knowledge and guidance. Real-world perception and cognition typically involve an interplay between both types of processing.
In vision science and historiography alike, distorted sources or limited perspectives lead to mistaken views; correct understanding requires increasing the resolution or integrating patchy details with contextual knowledge. Both bottom-up and top-down information processing are important for understanding.
Consider the classic image of a Dalmatian dog effectively hidden in the background of a high-contrast photo. It has sometimes been used as an example of how people create meaning even where there is none: wishful thinking projected onto a picture of random dots. In fact, it comes from a genuine photograph, and with some further manipulation of the image it can be shown that important bottom-up details are encoded in the picture, giving our visual systems clues about the original scene.
This paper argues for the importance of bottom-up cues when processing the meaning of an image.
Bottom–Up Clues in Target Finding: Why a Dalmatian May Be Mistaken for an Elephant
The paper found that manipulating the image by rotating texture elements reduced subjects' ability to locate a body, indicating that bottom-up surface-interpolation features are important. In a survey, most naive subjects could quickly locate a bulging shape overlapping the dog's body, suggesting that bottom-up processing guides attention. However, as shown below, they then assigned incorrect heads and limbs, indicating that top-down identification failed.
The authors computed two bottom-up features that overlap with the dog's body: texture compression and affine distortion of texture elements (the rotated blobs in Fig 1b). Small distortions in the image lead to mistaken interpretations when bottom-up clues are used alone: people construct an incorrect bigger picture, such as seeing a lion cub instead of a dog.
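To get an intuition for why affine distortion of texture elements carries shape information, here is a minimal numerical sketch (the foreshortening matrix is invented for the example and is not taken from the paper):

```python
import numpy as np

def orientation_after_affine(theta, A):
    """Orientation (radians) of a unit texture element at angle theta
    after applying the 2x2 affine map A — an illustrative stand-in for
    the surface-induced distortion of blob shapes."""
    v = A @ np.array([np.cos(theta), np.sin(theta)])
    return np.arctan2(v[1], v[0])

# Foreshortening along y, as on a surface slanted away from the viewer.
foreshorten = np.array([[1.0, 0.0],
                        [0.0, 0.4]])

theta = np.deg2rad(45)
distorted = orientation_after_affine(theta, foreshorten)

# The element's orientation is systematically pulled toward horizontal,
# a consistent bottom-up cue to surface slant. Independently rotating
# each element (as in the paper's manipulation) destroys this
# systematic pattern and with it the cue.
print(np.rad2deg(distorted))
```

The point of the sketch is only that a single global affine map distorts all elements in a lawful, recoverable way, whereas per-element random rotation leaves nothing for surface interpolation to latch onto.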
The results suggest bottom-up processing plays a bigger role than traditionally thought in locating targets like the Dalmatian, guiding top-down identification mechanisms to target regions. The paper argues the role of top-down processing in target detection is overstated in classical examples like the Dalmatian image.
However, the paper’s ‘correct’ interpretation (above) can, interestingly, be used as a counterexample. We can demonstrate the importance of top-down processing by going the other way: removing as much distortion as possible and getting as close to the original image as we can. As far as I can tell, the image first appeared in Life Magazine in 1965:
We get a much clearer top-down understanding of the scene, and on looking back at the bottom-up paper’s ‘correct’ image we can clearly see where it makes some incorrect inferences, particularly with the Dalmatian’s back and the placement of its left hind leg. Other interesting details also emerge: the dappled background was caused by melting spots in the snow, the dog was called Woody, and the park was in East Lansing, Michigan!
We have just demonstrated that top-down features are still important! Or, as another paper puts it, “Prior object-knowledge sharpens properties of early visual feature-detectors”.
This visual perception study found what we have just demonstrated here: that prior knowledge of an object's form helps people detect features consistent with that object more accurately. Top-down knowledge guides low-level perception.
Summary of the key points from "Prior object-knowledge" paper
The paper investigates whether high-level knowledge about objects interacts with and influences basic visual feature processing in the human brain. It utilises two-tone images, which initially look like random black and white patches. However, after a person gains prior knowledge about what object is hidden in the image, they suddenly perceive it as a coherent, meaningful object.
The researchers embedded faint line elements in these two-tone images. They then measured people's ability to detect the contrast and orientation of these lines before and after giving them object knowledge about what the image represents.
In two experiments, they found that people's sensitivity to the embedded lines improved specifically for lines aligned with invisible object contours, after the researchers provided object knowledge about what was in the image. This suggests that high-level knowledge about object form serves to sharpen and enhance early visual feature detectors that are tuned to detect features consistent with that object.
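The standard way to quantify the kind of detection sensitivity measured here is signal detection theory's d′ index. As a hedged sketch of what a knowledge-driven sharpening effect would look like in those terms (the hit and false-alarm rates below are invented for illustration, not the paper's data):

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Detection sensitivity d' = z(hits) - z(false alarms),
    the standard signal-detection index for tasks like this."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Illustrative rates for detecting lines aligned with object contours,
# before vs after observers are given knowledge of the hidden object.
before = d_prime(hit_rate=0.60, false_alarm_rate=0.30)
after = d_prime(hit_rate=0.75, false_alarm_rate=0.30)

# Knowledge-driven sharpening would show up as an increase in d',
# i.e. better discrimination at an unchanged false-alarm rate.
assert after > before
```

Using d′ rather than raw hit rate matters because it separates genuine sensitivity gains from observers simply saying "yes" more often once they know what to look for.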
Importantly, this top-down influence of learned object representations on early vision occurred independently of any effects of visual attention. The results provide clear behavioural evidence that early visual processing is shaped by dynamic interactions with high-level object knowledge stored in the brain. This context-dependent top-down tuning optimises low-level vision to suit the current perceptual interpretation.
In summary, the study demonstrates that prior conceptual knowledge about objects interacts with and adaptively adjusts basic visual feature processing to support perception of that object.
Integrating Visual Cognition and Historical Analysis
Paradigm Shift or ?
