One of the most fundamental processes in vision is figure–ground perception, segregating the objects in our environment from their backgrounds. Gestalt psychologists grappled with the issues involved by examining the phenomenology of figure–ground perception along with stimulus factors influencing a region to be seen as figure or ground, such as symmetry, area, and orientation. Examples of more recently discovered stimulus factors include relative spatial (e.g., see Brown & Weisstein, 1988; Klymenko & Weisstein, 1986; Klymenko, Weisstein, Topolski, & Hsieh, 1989) and temporal frequency information (Klymenko & Weisstein, 1989a, 1989b; Klymenko et al., 1989; Wong & Weisstein, 1984, 1985, 1987), lower region (Vecera, Vogel, & Woodman, 2002), top-down polarity (Hulleman & Humphreys, 2004), extremal edges (Palmer & Ghose, 2008), and edge–region grouping (Palmer & Brooks, 2008). Current computational and physiological models have also focused on how the processing of edges/borders/boundaries and surfaces contributes (e.g., Craft, Schutze, Niebur, & von der Heydt, 2007; Grossberg, 2016; Kogo & van Ee, 2015), as well as on delineating how bottom-up and top-down contributions play a role (e.g., Peterson, 1994, 2015, 2018).

The starting point for the current study was Naomi Weisstein’s last published work, in which she proposed a model of figure–ground perception based on antagonistic M and P stream interactions in the visual system (Brown & Greene, 2018; Weisstein, Maguire, & Brannan, 1992; see Footnote 1). The primary hypothesis underlying the model is that activity in the P stream encodes figure/foreground, and activity in the M stream encodes background (see Fig. 1). Prior research leading to and supporting the model came from studies showing that stimulus variables expected to create relatively greater P stream activity (e.g., higher spatial and lower temporal frequencies) biased a region to be seen as figure, whereas those expected to create relatively greater M stream activity (e.g., lower spatial and higher temporal frequencies) biased a region to be seen as ground (Brown & Weisstein, 1988; Klymenko & Weisstein, 1986, 1989a, 1989b; Klymenko et al., 1989; Weisstein & Brannan, 1991; Weisstein & Wong, 1986, 1987; Wong & Weisstein, 1984, 1985, 1987). This link between the perception of figure and ground and the spatiotemporal frequency sensitivities of the P and M streams is further supported by the finding that sharp targets (with high spatial frequencies present) are better detected in regions of ambiguous pictures seen as figure, whereas blurred targets (with energy predominantly in the lower spatial frequencies) are better detected in the same regions perceived as ground (Wong & Weisstein, 1983). Thus, the P stream is brought to bear on regions perceived as figure, whereas the M stream is engaged on regions perceived as ground.
Since her model was published, psychophysical (Donner, Sagi, Bonneh, & Heeger, 2008; Wokke, Scholte, & Lamme, 2014) and physiological (Appelbaum, Wade, Pettet, Vildavski, & Norcia, 2008; Appelbaum, Wade, Vildavski, Pettet, & Norcia, 2006; Doniger, Foxe, Murray, Higgins, & Javitt, 2002) evidence has been found supporting dorsal/ventral (i.e., M/P) neural dynamics related to figure–ground perception.

Fig. 1.

a Example of gray-on-green (Gy/Gn) condition. b–d Figure–ground perception based on antagonistic M & P stream interactions in the visual system. The perception of a region as figure involves top-down (not shown) and bottom-up influences (circles). Antagonism across the boundary (black double arrows) separating two regions (vertical dashed line) occurs with P-dominant figure and M-dominant ground signals generated from antagonistic P and M stream interactions within those regions (gray double arrows). See text for details

Before discussing the model further, consider the proposal that figure “emerges” from the background through a dynamic interaction of activity along and between the P and M streams. One example of evidence of such neural dynamics comes from a study by Appelbaum et al. (2006) using an electrophysiological paradigm allowing them to monitor cortical responses to figure and background separately. They used a frequency-tagging procedure, where a circular figure was temporally modulated at one frequency relative to a background modulating at a slightly different frequency. They found that “the figure region, but not the background, was routed preferentially to lateral cortex. A separate network extending from first tier through more dorsal areas responded preferentially to the background region” (p. 11695).

Another example comes from Wokke et al.’s (2014) study, which used TMS to disrupt processing in the lateral occipital cortex (LOC) of the P stream and in V5/human MT (HMT) along the M stream, both before and after stimuli were presented, while observers performed a motion-defined figure-discrimination task. Performance decreased with TMS over HMT, whereas it increased with TMS over LOC, compared with performance without TMS and with motion without a perceived figure. Their results suggested a “push–pull interaction in which dorsal and ventral extrastriate areas are being recruited or inhibited depending on stimulus category and task demands” (p. 365) that can influence the segregation of moving stimuli into figure and ground.

A final example of antagonistic M/P neural dynamics directly related to the conscious perception of a region as figure or ground is an fMRI study of motion-induced blindness (MIB) by Donner et al. (2008). They presented a target figure surrounded by a 2° blank region against a background of moving dots that appeared as a rotating sphere, which they called a mask. Under steady fixation, MIB occurred, with the target figure periodically disappearing and reappearing over time, while observers indicated when it did so. Similar responses were made in a control condition, when the target was periodically physically removed and then presented again. By retinotopically locating the target figure relative to the moving background mask in both ventral/P (V4) and dorsal/M (MT+, V3AB, pIPS) cortical regions, they could monitor responses specific to the target and background mask in those regions when the target was visible versus invisible and compare them with when the target was physically removed. They found target figure region disappearance responses (i.e., when the figure was invisible) in V4 were significantly below baseline during MIB, but not when the figure was physically removed (see Footnote 2). Background mask-specific responses were not found in V4, but were found in areas MT+, V3AB, and pIPS. Opposite to the target-specific responses in V4, mask-specific responses in these dorsal areas increased above baseline during target disappearance and decreased during target reappearance. By contrast, when the target was physically removed, mask-specific responses in these areas tended to decrease. “The striking dissociation between mask-specific responses during spontaneous disappearance and physical removal of the target is consistent with the hypothesis that cortical representations of the mask in dorsal visual areas (i.e., V3AB and pIPS) play a crucial role in the spontaneous suppression of target representations in ventral visual areas (i.e., V4) during MIB” (p. 10305).
If we consider their results from the perspective of Weisstein’s model, the P-dominant figure signal from the target is in an antagonistic relationship with the M-dominant ground signal from the background surrounding it. During MIB, the still physically present target’s figure signal is overwhelmed by the surrounding M-dorsal ground signal, contributing to the target’s disappearance. The associated decrease in target-specific response in V4, as well as the resulting increase in background mask-specific responses in the M-dorsal areas, would be interpreted as being due to the decreased antagonism from what was the perceived figure. M-dorsal-area responses were different when the target was physically removed because there was no longer a figure signal associated with the target in V4 to antagonize against the ground signal. These three recent physiological studies are examples of evidence consistent with Weisstein’s original notion of antagonistic M/P neural dynamics involved in the perception and processing of a region as figure or ground.

Fig. 2

Time to fade for a target figure as a function of color combination and whether the background was static (0 frames per second [fps]) or flickering (14 frames per second). Error bars = +SEM

The present study further explores and tests Weisstein’s model using an artificial scotoma paradigm. The perceptual fading experienced with an artificial scotoma (e.g., Troxler fading; Troxler, 1804) can be viewed as a failure of figure–ground segregation (De Weerd, Desimone, & Ungerleider, 1998; De Weerd, Ungerleider, & Desimone, 1998; Spillmann, 2011; Spillmann & De Weerd, 2003), making it a useful tool for investigating possible mechanisms and processes involved in figure–ground perception. When a small target figure is viewed peripherally under steady fixation, it eventually “fades into the background.” The mechanisms and processes underlying the perception of figure–ground normally (e.g., those involved with boundary and surface information) fail to continue distinguishing the figure from the ground when the target figure is phenomenally replaced by the ground. According to Weisstein’s theory, where a boundary separates two regions, the region that is perceived as figure or ground is determined by the outcome of antagonism between M and P activity both within each region (represented by gray double arrows in Figs. 1b–d and 3a–e) and across the boundary between them (represented by the black double arrows in Figs. 1b–d and 3a–e). The region with a relatively stronger P-biased “figure signal” is perceived as figure. The region with a relatively stronger M-biased “ground signal” is perceived as ground.

Fig. 3

M/P antagonistic relationships within (gray double arrows) and between (black double arrows) figure and ground regions. The relative heights of the P and M bars within figure (left side of a–e) and within ground (right side of a–e) represent the P-biased figure and M-biased ground signals, respectively. a Antagonistic relations for Gy/Gy condition (see Fig. 4a) with relative figure and ground signals illustrative of when fade times are short. Red and blue light within figure (left side of b & d; see Fig. 4b, d) reduces M/P antagonism within figure, leading to a stronger P-biased figure signal (indicated by relative M/P bar heights) and resulting in longer fade times. Red and blue light in ground (right side of c & e; see Fig. 4c, e) reduces M activity within ground, weakening the M-biased ground signal, reducing its antagonism with the figure signal, leading to greater P activity in figure, and resulting in longer fade times

Fig. 4

Illustrative examples of stimuli referred to in Fig. 3a–e (examples are for illustration purposes. See Stimuli and apparatus section for actual luminance values)

For our display, the Gestalt factors of area and surroundedness bias the target region to be seen as figure against the surrounding region as ground. These factors themselves would cause a bias toward P activity for the target figure (e.g., represented by the relatively larger P vs. M circles within figure in Fig. 1b, and the relatively taller P vs. M bar within figure in Fig. 3a) and toward M activity for the background (e.g., represented by the relatively larger M vs. P circle within ground in Fig. 1b, and the relatively taller M vs. P bar within ground in Fig. 3a). From this perspective, when the target begins to fade, there is a change in the antagonism within the figure (represented by the changing sizes of the P and M circles on the left of Fig. 1c) and across the boundary between figure and ground (represented by the fading vertical dashed line) such that when the figure signal is overwhelmed by the ground signal, the target region fades into and is seen as ground (see Fig. 1d; see also Footnotes 3 and 4). It follows that the amount of time it takes the target to fade should increase by either strengthening the figure signal, making it more resistant to the ground signal, or weakening the ground signal, making it less able to overwhelm the figure signal. This could be accomplished in at least two ways. One way would be to strengthen the figure signal by reducing M activity in the figure, thereby reducing its antagonism with P activity, which should result in an increased P bias there. Another way would be to weaken the ground signal by reducing M activity in the ground, which should decrease the M bias there.

Our general strategy was to reduce M activity using long wavelength/red light known to suppress M activity (Awasthi, Williams, & Friedman, 2016; Bedwell, Brown, & Orem, 2008; Bedwell, Miller, Brown, & Yanasak, 2006; Breitmeyer & Breier, 1994; Breitmeyer & Williams, 1990; Chapman, Hoag, & Giaschi, 2004; de Monasterio, 1978; Dreher, Fukada, & Rodieck, 1976; Wiesel & Hubel, 1966; see Experiments 1 & 2), and by using short wavelength/blue light because short wavelength sensitive S-cones (blue) provide minimal if any input to both M and P ganglion (Dacey, 2000; Sun, Smithson, Zaidi, & Lee, 2006a, 2006b) and LGN cells (Chatterjee & Callaway, 2002; Eiber, Pietersen, Zeater, Solomon, & Martin, 2018; Martin & Lee, 2014; Roy et al., 2009; Tailby, Szmajda, Buzas, Lee, & Martin, 2008; see Experiment 2). S-cone signals are predominantly passed from the retina to LGN and into V1 via the koniocellular (K) pathway (Chatterjee & Callaway, 2003; Hendry & Reid, 2000; Xiao, 2014). Despite the lack of S-cone innervation of both the M and P retino-geniculo-striate pathways, once S-cone signals arrive in V1, they would activate the P stream via their inputs to cytochrome oxidase blobs in V1 (Hendry & Reid, 2000; Xiao, Casti, Xiao, & Kaplan, 2007) and thin stripes in V2 (Nasr & Tootell, 2018; Xiao, Wang, & Felleman, 2003) and V3 (Nasr & Tootell, 2018; Xiao, 2014) considered parts of the P stream (DeYoe & Van Essen, 1988; Tootell & Nasr, 2017). Thus, compared with gray and green light not expected to reduce M activity, red or blue light appearing in figure or ground should alter the antagonistic balance of M/P activity within that region, producing a shift in the balance of activity across the border between figure and ground (e.g., Fig. 3; Brown & Greene, 2018).

Experiment 1

In the first experiment, in addition to using long wavelength red light, we also employed a temporal manipulation expected to affect M activity. Red light would be expected to reduce M activity whether it was in the figure or the ground. The reduction of M activity with red light in the figure should decrease its antagonism with P activity there, resulting in a stronger P-biased figure signal, making it more resistant to fading and leading to longer fade times. The reduction in M activity with red light in the ground should weaken the M-biased ground signal, making it less able to overwhelm the figure signal, resulting in longer fade times. No reduction in M activity was expected in figure or ground in the two conditions combining gray and green light (Gy/Gn, Gn/Gy). These conditions served as a baseline because they also had a color difference between figure and ground, but no red light to influence M activity. Fade times in these conditions were expected to be shorter compared with conditions containing red light, whether in figure (Rd/Gy, Rd/Gn) or in ground (Gy/Rd, Gn/Rd). The temporal manipulation consisted of presenting the background as static or flickering, with greater M activity expected when the background was flickering compared with when it was static (Breitmeyer & Ganz, 1976; Derrington & Lennie, 1984; Robson, 1966). Flicker could therefore strengthen the M-biased ground signal, making it easier to overwhelm the figure signal and thereby making the time to fade shorter for the flickering compared with the static condition.

Method

Participants

Fifteen participants from the University of Georgia were recruited and trained for this study. This total included the two authors, two graduate students, and 11 undergraduate students who received course credit and were naïve to the purpose of the experiment. Participants had normal or corrected-to-normal vision and normal color vision (confirmed by pseudoisochromatic plates). Participants had no history of epilepsy or attention-deficit disorder. All research was approved by the University of Georgia Institutional Review Board and conducted under its ethical guidelines for research involving human participants.

Stimuli and apparatus

Stimuli for all conditions had the same basic configuration. A 2° × 2° homogeneous square-shaped target figure was presented against a random-dot background. The target figure appeared equally often and randomly in either the top or bottom right of the display, 10° from a .22° × .22° black fixation cross. All target figures were physically equiluminant with the random-dot backgrounds. A .66° black dot was positioned just inside the nasal side of each participant’s mapped blind spot. This blind spot stimulus served as a control for large eye movements, which can disrupt the fading process. Trials where participants indicated they saw the blind spot stimulus were excluded from analysis. The random-dot backgrounds subtended 26.7° (w) × 17.2° (h) and were composed of equal amounts of two different luminance levels of the same color, with an overall mean luminance of 25 cd/m². Flickering backgrounds were created by generating 10 different random-dot images that cycled at 14 frames per second (fps). Five different static random-dot backgrounds (0 fps) were also generated. Measured with a SpectrILight ILT950 Spectroradiometer, the coordinates for the colors used in this experiment, based on the CIE XYZ color system, were gray (.3584, .3448, .2966), green (.2247, .6616, .1442), and red (.6079, .3318, .0603). The RGB values for the two different levels of the same color were gray (120, 120, 120) and (91, 91, 91); green (0, 156, 0) and (0, 91, 0); and red (255, 0, 0) and (181, 0, 0).
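The construction of the random-dot backgrounds can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the grid dimensions, the seeding, and the function names (`random_dot_frame`, `make_backgrounds`) are hypothetical, and dot size is not reported in the text. Only the RGB levels, the 50/50 split between the two levels, and the frame counts (10 flicker images, 5 static images) come from the description above.

```python
import random

# RGB levels reported for Experiment 1 (two luminance levels per color).
LEVELS = {
    "gray": ((120, 120, 120), (91, 91, 91)),
    "green": ((0, 156, 0), (0, 91, 0)),
    "red": ((255, 0, 0), (181, 0, 0)),
}


def random_dot_frame(color, n_rows, n_cols, rng):
    """Return an n_rows x n_cols grid with equal counts of the two levels."""
    n_cells = n_rows * n_cols
    if n_cells % 2:
        raise ValueError("grid needs an even number of cells for a 50/50 split")
    high, low = LEVELS[color]
    cells = [high] * (n_cells // 2) + [low] * (n_cells // 2)
    rng.shuffle(cells)  # random spatial arrangement, fixed proportions
    return [cells[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def make_backgrounds(color, n_frames, n_rows=8, n_cols=12, seed=0):
    """Generate n_frames independent random-dot images (grid size is assumed)."""
    rng = random.Random(seed)
    return [random_dot_frame(color, n_rows, n_cols, rng) for _ in range(n_frames)]


flicker_frames = make_backgrounds("red", n_frames=10)         # cycled at 14 fps
static_frames = make_backgrounds("red", n_frames=5, seed=1)   # shown at 0 fps
```

Shuffling a list with fixed counts, rather than drawing each dot independently, guarantees the equal proportions of the two levels in every frame, so mean luminance does not vary from image to image.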

Stimulus presentation and data collection were conducted using PsychoPy software (Version 1.83.01; Peirce, 2007) running on a PC equipped with a color LCD monitor operating at 60 Hz. Participants viewed the monitor monocularly with their right eye (left eye patched), in a dimly lit room, with their head positioned in a chin rest 100 cm from the monitor.
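For readers who wish to relate the reported visual angles to physical sizes on the screen, the standard viewing-geometry conversion at the 100 cm viewing distance can be sketched as follows. The function names and the `pixels_per_cm` parameter are hypothetical; the monitor's dimensions and resolution are not reported, so no pixel values are implied. The eccentricity formula assumes the fixation cross lies straight ahead of the viewing eye.

```python
import math

VIEWING_DISTANCE_CM = 100.0  # chin rest to monitor, as reported


def angle_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Size on screen of an object subtending angle_deg, centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_to_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen offset from fixation for a point angle_deg away from fixation."""
    return distance_cm * math.tan(math.radians(angle_deg))


def cm_to_pixels(size_cm, pixels_per_cm):
    """Convert to pixels; pixels_per_cm depends on the (unreported) monitor."""
    return round(size_cm * pixels_per_cm)


target_cm = angle_to_cm(2.0)            # 2 deg square target
fixation_cm = angle_to_cm(0.22)         # fixation cross
offset_cm = eccentricity_to_cm(10.0)    # target distance from fixation
background_cm = (angle_to_cm(26.7), angle_to_cm(17.2))
```

At this distance the 2° target is roughly 3.5 cm wide and sits roughly 17.6 cm from the fixation cross.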

Design and procedure

Every possible figure/background combination of gray, green, and red was tested, resulting in three same (Gy/Gy, Gn/Gn, Rd/Rd) and six different color combinations (Gy/Gn, Gy/Rd, Gn/Gy, Gn/Rd, Rd/Gy, Rd/Gn), for a total of nine color combinations. A random order was generated for each participant to complete the nine different color combinations. Participants completed five practice trials, followed by 60 experimental trials for each color combination. The 60 randomly presented experimental trials consisted of 30 flickering and 30 static background trials, with the five different static backgrounds appearing randomly six times each. The dependent variable was the time it took the target figure to fade, in seconds.
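The design described above can be enumerated programmatically, which makes the counts easy to verify. The following sketch is illustrative only: the function names, the dictionary layout of a trial, and the random seed are assumptions, not details from the study.

```python
import itertools
import random

COLORS = ("Gy", "Gn", "Rd")  # gray, green, red


def color_combinations(colors=COLORS):
    """All figure/ground pairings, labeled 'figure/ground'."""
    return [f"{fig}/{gnd}" for fig, gnd in itertools.product(colors, repeat=2)]


def build_trials(rng, n_flicker=30, n_static_images=5, repeats_per_image=6):
    """60 trials per combination: 30 flickering plus 5 static images x 6 repeats."""
    trials = [{"background": "flicker", "image": None} for _ in range(n_flicker)]
    trials += [
        {"background": "static", "image": image}
        for image in range(n_static_images)
        for _ in range(repeats_per_image)
    ]
    rng.shuffle(trials)  # random presentation order within a session
    return trials


rng = random.Random(42)
combos = color_combinations()
same_color = [c for c in combos if c.split("/")[0] == c.split("/")[1]]
different_color = [c for c in combos if c not in same_color]
session_order = rng.sample(combos, k=len(combos))  # one random order per participant
trials = build_trials(rng)
```

With three colors this yields nine combinations, three same-color and six different-color, and 60 experimental trials per combination.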

Prior to the experiment, all participants underwent a training session where they were familiarized with the stimuli and instructed how to do the task. For this purpose, participants were trained using the Gy/Gy color combination. No data from these sessions were used in the final analyses. After completing the training session, participants began their first color combination at a later time.

During the experiment, an instruction screen was presented to participants at the beginning of each color combination, reminding them of their task and how to perform it. Participants fixated on the fixation cross, initiating each trial by pressing the space bar with their left hand. Using their right hand, participants pressed the “left” arrow key when/if the target figure faded completely from visibility. When/if the target area reappeared, even slightly, the participant pressed the “right” arrow key, ending the trial. Trials lasted up to 30 seconds and self-terminated if there was no response by then. After each trial, text on a gray screen (30.45 cd/m²) asked if the blind spot stimulus was seen during the trial. If so, the “down” arrow key was pressed. Participants were encouraged to take short breaks between trials as needed to prevent fatigue. Each color combination took approximately 30 minutes to complete. Participants ran no more than two color combinations per day, with at least a 10-minute break between them when they did run two sessions. Participants completed all sessions over a 3-week period once they started the experiment.

Results and discussion

The results of a 9 (color combination) × 2 (temporal condition: static vs. flickering) repeated-measures analysis of variance (ANOVA) on time to fade revealed a significant main effect of color combination, F(8, 112) = 10.78, η² = 0.435, p < .001, and a significant interaction with temporal condition, F(8, 112) = 2.69, η² = 0.161, p < .009. The lack of a main effect of temporal condition, F(1, 14) = 0.42, η² = 0.029, p = .53, indicates that the flickering background did not create an overall increase in M activity leading to a decrease in time to fade as predicted. We speculate that this was because the target was equiluminant with the background, and each frame of the background flicker was equiluminant with the next, creating insufficient luminance contrast over time between target and background and making the flicker ineffective at increasing M activity compared with the static condition. As evident from Fig. 2, the pattern of time to fade across color combinations is very similar for both static and flickering conditions. The significant interaction appears to have occurred because there were (1) no differences between static and flickering fade times for the three same-color combinations (Gy/Gy, Gn/Gn, Rd/Rd) and the two combining red and gray (Gy/Rd, Rd/Gy), (2) slightly greater flickering fade times for three combinations (Gy/Gn, Gn/Gy, Rd/Gn), and (3) slightly greater static fade times for one (Gn/Rd). The lack of any kind of pattern over the nine color combinations makes any meaningful interpretation of the interaction difficult at best.
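The degrees of freedom and effect sizes reported above can be cross-checked from the design and the F ratios alone. The sketch below assumes that the reported η² values are partial eta squared, computed as F·df_effect / (F·df_effect + df_error); this is an assumption made here for illustration, not something stated in the text, although the reported values are numerically consistent with it. The function names are hypothetical.

```python
def rm_anova_dfs(n_participants, levels_a, levels_b):
    """(df_effect, df_error) for a fully within-subject A x B design."""
    n = n_participants - 1
    a = levels_a - 1
    b = levels_b - 1
    return {
        "A": (a, a * n),
        "B": (b, b * n),
        "AxB": (a * b, a * b * n),
    }


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# 15 participants, 9 color combinations (A), 2 temporal conditions (B).
dfs = rm_anova_dfs(n_participants=15, levels_a=9, levels_b=2)

color = partial_eta_squared(10.78, *dfs["A"])        # reported as 0.435
temporal = partial_eta_squared(0.42, *dfs["B"])      # reported as 0.029
interaction = partial_eta_squared(2.69, *dfs["AxB"])  # reported as 0.161
```

Rounded to three decimals, the three computed values agree with those reported, and the degrees of freedom (8, 112) and (1, 14) follow directly from 15 participants in a 9 × 2 within-subject design.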

To probe the main effect of color, the time to fade for each color combination was averaged over temporal condition and submitted to the Benjamini–Hochberg procedure for multiple comparisons (e.g., see Thissen, Steinberg, & Kuang, 2002), because we had specific predictions about how color would influence time to fade. The comparisons confirm what inspection of Fig. 2 suggests: color combinations that appear not to differ from each other do not, and those that appear to differ do (see Table 1). Looking at Fig. 2 from left to right, the three conditions with the same color in figure and ground (Gy/Gy, Gn/Gn, Rd/Rd) produced the shortest fade times and were not different from each other. Other than when the background flickered, the only difference between figure and ground in the same-color conditions was a texture difference. Because fade times for the static and flickering background conditions were the same in these conditions, their short fade times can be attributed to the lack of a color difference. Fade times were significantly longer for Gy/Gn and Gn/Gy than for the same-color conditions, indicating that a color difference itself was enough to make the figure more resistant to fading. Most importantly, when the color difference involved red, whether red was figure (Rd/Gy, Rd/Gn) or ground (Gy/Rd, Gn/Rd), fade times were no different from each other and significantly greater than in the Gy/Gn and Gn/Gy conditions.
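For readers unfamiliar with the Benjamini–Hochberg procedure, a minimal sketch of its step-up rule follows. The p-values in the example are invented purely for illustration and are not the values behind Table 1; the function name and the false discovery rate of q = .05 are likewise assumptions. The procedure sorts the m p-values, finds the largest rank k for which p(k) ≤ (k/m)·q, and rejects the hypotheses ranked 1 through k.

```python
import itertools


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Number of pairwise comparisons if all nine color combinations are compared.
conditions = ["Gy/Gy", "Gn/Gn", "Rd/Rd", "Gy/Gn", "Gn/Gy",
              "Gy/Rd", "Rd/Gy", "Gn/Rd", "Rd/Gn"]
n_pairs = len(list(itertools.combinations(conditions, 2)))

# Invented p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(example)
```

Because the rule is step-up, a p-value that fails its own threshold is still rejected when a larger p-value passes a later one; for example, `benjamini_hochberg([0.02, 0.03, 0.035, 0.04])` rejects all four hypotheses even though 0.02 exceeds the first threshold of .0125.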

Table 1 Results of the Benjamini–Hochberg procedure for multiple comparisons to assess the differences across color combinations. *p < .05
Table 2 Results of the Benjamini–Hochberg procedure for multiple comparisons to assess the differences across color combinations. *p < .05

This finding supports the hypothesis that using red light to decrease M activity in figure strengthens the figure signal, making it more resistant to being overwhelmed by the ground signal, and using red light to decrease M activity in the ground weakens the ground signal, making it less able to overwhelm the figure signal. To illustrate this graphically, note the relative heights of the P and M bars within the gray figure and within the gray ground in Fig. 3a, which represent the initial P-biased figure and M-biased ground signals, respectively, for the Gy/Gy condition (similar to the relatively different-sized circles in Fig. 1b). This initial representation of the relative P-biased figure and M-biased ground signals for these stimuli is due to the Gestalt factors of area and surroundedness, and would be similar for the Gn/Gn and Rd/Rd conditions because they, too, only had a lack of texture defining the target figure. As the figure fades, the ground signal eventually overwhelms the weakening figure signal (illustrated in Fig. 1b–d), and the texture fills in the target region. The initial relative balance of antagonism between M and P activity within figure, within ground, and across their boundary depicted in Fig. 3a also represents the susceptibility of the figure to fading for these stimuli and viewing conditions, as indicated by their short fade times. If the M/P balance within figure shifts to relatively greater P activity, the figure signal would be strengthened relative to the ground signal, resulting in longer fade times. Conversely, if the M/P balance within ground shifts to relatively less M activity, this would weaken the ground signal relative to the figure signal, resulting in longer fade times.

Compared with the baseline conditions with only a texture difference described above, now consider how red light in the figure (see Fig. 3b, left side) might influence the initial balance of M/P antagonism within the figure, and ultimately across the boundary between figure and ground, and how this would increase the time to fade. With red light in the figure, M activity is reduced, as illustrated by the shorter M bar within the red figure in Fig. 3b, compared with within the gray figure in Fig. 3a. Decreasing M activity within the red figure reduces its antagonism with P activity, shifting the balance further toward relatively greater P activity indicated by the longer P bar within a red figure (see Fig. 3b) compared with a gray figure (Fig. 3a). This change in antagonism within figure would also shift the balance across the border in Fig. 3b (i.e., represented by black double arrow) to a relatively stronger P-biased figure signal compared with the baseline conditions illustrated in Fig. 3a, leading to increased fade times.

Now, consider how red light in the ground might influence the balance of M/P antagonism within ground and ultimately across the boundary between figure and ground, and how this would increase the time to fade compared with the baseline conditions with only a texture difference. With red light in ground (see Fig. 3c, right side), M activity is reduced as illustrated by the shorter M bar within ground in Fig. 3c compared with within a gray ground (Fig. 3a). The decreased M activity in a red ground reduces its antagonism with P activity there, shifting the balance within the red ground toward increased P activity indicated by a longer P bar within a red ground (see Fig. 3c) compared with within a gray ground (Fig. 3a). This reduction in M activity and its antagonism with P activity within ground weakens the M-biased ground signal. The weakened M-biased ground signal would reduce its antagonism with the P-biased figure signal across the boundary between figure and ground (i.e., the black double arrow). The reduced antagonism with the P-biased figure signal from the weakened M-biased ground signal across the boundary would lead to increased P activity within figure. The resultant increased P activity within figure would increase its antagonism with M activity within figure reducing M activity there, indicated by the longer P and shorter M bars within the gray figure in Fig. 3c (left side), compared with within the gray figure in Fig. 3a (left side), resulting in a relatively stronger figure signal compared with the baseline conditions and increased fade times. Thus, whether red light is in the figure or the ground, it changes M/P antagonism within those regions, shifting the P/M antagonistic balance across the boundary between figure and ground, resulting in longer fade times for Rd/Gy and Gy/Rd versus Gy/Gy (or vs. Gy/Gn and Gn/Gy, not shown).

Experiment 2

The results of Experiment 1, using red light to suppress M activity, appeared to support Weisstein’s M/P antagonistic model of figure–ground perception. Experiment 2 was designed to test whether the results of Experiment 1 would replicate by again using red light conditions while further testing the model by introducing blue light conditions. As noted previously, short wavelength sensitive S-cones provide minimal if any input to both M and P ganglion (Dacey, 2000; Sun et al., 2006a, 2006b) and LGN cells (Chatterjee & Callaway, 2002; Eiber et al., 2018; Martin & Lee, 2014; Roy et al., 2009; Tailby et al., 2008). S-cone signals are predominantly passed from the retina to LGN and into V1 via the koniocellular (K) pathway (Chatterjee & Callaway, 2003; Hendry & Reid, 2000; Xiao, 2014). While there is a lack of S-cone innervation of both the M and P retino-geniculo-striate pathways, once S-cone signals arrive in V1, they would innervate the P stream via their inputs to cytochrome oxidase blobs in V1 (Hendry & Reid, 2000; Xiao et al., 2007) and thin stripes in V2 (Nasr & Tootell, 2018; Xiao et al., 2003) and V3 (Nasr & Tootell, 2018; Xiao, 2014), all considered parts of the P stream (DeYoe & Van Essen, 1988; Tootell & Nasr, 2017). Thus, again, compared with the two conditions combining gray and green light (Gy/Gn, Gn/Gy) with no expected reductions in M activity, blue light was predicted to have similar effects on fade times as red light because it would also reduce M activity by providing little if any activation of the M stream.

Based on the lack of any systematic influence of a flickering background in Experiment 1, only static conditions were tested in Experiment 2. The addition of blue created seven additional color combinations, one more same-color (Bl/Bl) and six more different-color combinations (Gy/Bl, Bl/Gy, Gn/Bl, Bl/Gn, Rd/Bl, Bl/Rd). Conditions with gray and green light again served as baselines for comparison, with those containing red or blue light expected to reduce M activity. Another difference for Experiment 2 was a lower mean luminance (3 cd/m²) because all color combinations were matched to the reduced luminance range for blue.

Method

Participants

Sixteen participants took part in Experiment 2: the two authors and 14 undergraduates, recruited using the same criteria and receiving course credit as in Experiment 1.

Stimuli and apparatus

Stimuli for Experiment 2 were created and presented the same way as in Experiment 1, with two exceptions. As noted above, only static conditions were tested, and due to the limited luminance range for blue, all backgrounds were composed of equal amounts of two different luminance levels of the same color, with an overall mean luminance of 3 cd/m². All target figures were again physically equiluminant with the random-dot background. The CIE XYZ coordinates for the colors used in this experiment were gray (.3530, .3278, .3209), green (.2485, .5837, .1588), red (.6188, .3237, .0570), and blue (.1560, .0723, .7717). The RGB values for the two different levels of the same color were gray (38, 38, 38) and (26, 26, 26); green (0, 49, 0) and (0, 35, 0); red (70, 0, 0) and (48, 0, 0); and blue (0, 0, 175) and (0, 0, 129).

Design and procedure

The design and procedure for Experiment 2 were the same as in Experiment 1, except for the number of trials and the addition of blue. Participants completed five practice trials, followed by 40 experimental trials for each color combination. The 40 randomly presented experimental trials consisted of the five different static backgrounds appearing randomly eight times each. The addition of blue resulted in a fourth same-color (Bl/Bl) and six new different-color combinations (Gy/Bl, Gn/Bl, Rd/Bl, Bl/Gy, Bl/Gn, Bl/Rd), for a total of 16 color combinations.
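The condition counts follow from crossing every figure color with every ground color. The short Python sketch below is only an illustration: it assumes the condition labels are read as figure/ground (as in the groupings used in the Results), and the variable and function names are hypothetical, not taken from the experiment software.

```python
from itertools import product

# Color abbreviations as used in the condition labels (figure/ground).
# Names below are illustrative assumptions, not the authors' code.
EXP1_COLORS = ["Gy", "Gn", "Rd"]
EXP2_COLORS = EXP1_COLORS + ["Bl"]


def color_combinations(colors):
    """Cross every figure color with every ground color."""
    return [f"{fig}/{gnd}" for fig, gnd in product(colors, repeat=2)]


exp1 = color_combinations(EXP1_COLORS)
exp2 = color_combinations(EXP2_COLORS)

# Same-color conditions have identical figure and ground colors.
same_color = [c for c in exp2 if c.split("/")[0] == c.split("/")[1]]
# Conditions that exist only because blue was added.
added_by_blue = [c for c in exp2 if c not in exp1]

# 5 static backgrounds x 8 repetitions = 40 experimental trials per combination.
trials_per_combination = 5 * 8
total_experimental_trials = trials_per_combination * len(exp2)

print(len(exp2), len(same_color), len(added_by_blue), total_experimental_trials)
```

Run as written, this gives 16 combinations in total, 4 of them same-color, 7 added by blue, and 640 experimental trials across all combinations.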

Results and discussion

A one-way ANOVA on the time to fade for the 16 color combinations was significant, F(15, 225) = 21.62, η² = 0.59, p < .001. The Benjamini–Hochberg procedure for multiple comparisons was used to assess the differences across color combinations (see Table 2). With only a few exceptions, the results involving gray, green, and red replicated those from Experiment 1. Looking at Fig. 5 from left to right, again, the shortest fade times were for the same-color conditions (Gy/Gy, Gn/Gn, Rd/Rd, Bl/Bl), which had no differences between them, except for shorter fade times for Bl/Bl than for Rd/Rd. Again, Gy/Gn and Gn/Gy were not different from each other, and Gy/Gn had longer fade times than did Gy/Gy, but, here, Gn/Gy and Gn/Gn were not significantly different. Again, all conditions with a color difference involving red (i.e., excluding blue), whether in figure (Rd/Gy, Rd/Gn) or ground (Gy/Rd, Gn/Rd), had longer fade times relative to Gy/Gn and Gn/Gy, except that Gn/Rd was not different from Gy/Gn. Thus, the influence of red on fade times was very similar to that in Experiment 1, with average fade times again hovering near 15 seconds even at a lower mean luminance (3 cd/m² compared with 25 cd/m²). Blue increased fade times as predicted, but the magnitude of the increase was somewhat surprising. Whether in figure (Bl/Gy, Bl/Gn, Bl/Rd) or ground (Gy/Bl, Gn/Bl, Rd/Bl), fade times involving blue were no different from each other and much longer than those for almost all other conditions, except for Gn/Bl not being different from Rd/Gy and Rd/Gn. During debriefing, many participants noted it was most difficult to experience fading for blue conditions, as is evident from the average fade times hovering near 20 seconds.
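For readers who want to check the statistics, the following Python sketch shows two generic calculations under stated assumptions. First, if the reported effect size is partial eta squared (an assumption; the text does not say which variant was used), it can be recovered from F and its degrees of freedom. Second, a textbook step-up implementation of the Benjamini–Hochberg procedure is given; the function names and the example p values are hypothetical and are not the values behind Table 2.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the null is rejected at FDR level q.

    Step-up rule: find the largest rank k (1-based, p values sorted ascending)
    with p_(k) <= (k / m) * q, then reject the k smallest p values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Effect size check using the reported F(15, 225) = 21.62.
print(round(partial_eta_squared(21.62, 15, 225), 2))

# Hypothetical p values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example_p))
```

Under the partial eta squared assumption, the first calculation reproduces the reported value of 0.59. In the hypothetical example, only the two smallest p values survive the correction.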

Fig. 5

Time to fade for a target figure as a function of color combination. Error bars = +SEM

The results of this experiment replicated the general pattern of fade times for gray, green, and red from Experiment 1 at an overall lower luminance while also extending those findings to blue light. The longer fade times with blue and red cannot simply be due to a color difference, because the Gn/Gy and Gy/Gn conditions had a color difference but produced significantly shorter fade times. It is important to note that the effects of red and blue were similar whether in figure or ground. This indicates their effects were not simply due to the amount of red or blue light, because the effect on M activity should have been greater for ground due to its greater surface area. Finally, the results cannot be due to optical factors (e.g., chromatic aberration), because chromatic aberration increases from red to green to blue wavelengths, yet red and blue, at opposite ends of that ordering, both lengthened fade times relative to the gray and green baselines.

The results support the hypothesis that using blue light to decrease M activity in figure strengthens the figure signal, making it more resistant to being overwhelmed by the ground signal, and using blue light to decrease M activity in the ground weakens the ground signal, making it less able to overwhelm the figure signal. To illustrate this graphically, we start by again referring to the relative heights of the P and M bars within figure and within ground in Fig. 3a as representing the initial P-biased figure and M-biased ground signals for the Gy/Gy condition (as well as the other same-color conditions). As noted earlier, this antagonistic balance within figure, within ground, and across the border between them represents the susceptibility of the figure to fading for these stimuli and viewing conditions as indicated by their short fade times. If the M/P balance within figure shifts to relatively greater P activity, the figure signal would be strengthened relative to the ground signal, resulting in longer fade times. Conversely, if the M/P balance within ground shifts to relatively less M activity, the ground signal would be weakened relative to the figure signal, resulting in longer fade times. Relative to the baseline conditions with only a texture difference, how blue light might influence the initial M/P antagonistic balance within the target figure, within the ground, and ultimately across the border between them (i.e., the black double arrows) is illustrated qualitatively by the P and M bar heights within figure and within ground in Fig. 3d and e compared with Fig. 3a.

M activity is reduced when blue light is in figure, as illustrated by the shorter M bar within a blue compared with a gray figure (i.e., left side of Fig. 3d compared with Fig. 3a). This reduction in M activity within figure is even greater for blue (Fig. 3d) compared with red light (Fig. 3b), as represented by an even shorter M bar, which is based on the minimal if any S-cone input to the M pathway (Chatterjee & Callaway, 2002; Roy et al., 2009; Sun et al., 2006a, 2006b; Tailby et al., 2008) and the longer fade times for blue. The greater reduction in M activity with blue light reduces its antagonism with P activity within figure even more than red light does, as represented by the even longer P bar. This shifts the balance within figure further toward relatively greater P activity compared with Fig. 3a and b, resulting in a stronger P-biased figure signal and longer fade times. The relative differences in P and M bar heights within figure under blue (Fig. 3d) compared with red light (Fig. 3b) illustrate the stronger P-biased figure signal with blue light, indicative of the longer fade times for blue compared with red.

When blue light is in ground (Fig. 3e, right side), M activity is reduced, as illustrated by the shorter M bar within a blue ground compared with a gray ground (Fig. 3a, right side). The decreased M activity in a blue ground reduces its antagonism with P activity there, shifting the balance within the blue ground toward increased P activity, indicated by the longer P bar within a blue ground in Fig. 3e compared with a gray ground (Fig. 3a). This reduction in M activity and its antagonism with P activity within ground weakens the M-biased ground signal, contributing to longer fade times. The shorter M and longer P bars within a blue ground in Fig. 3e compared with those within a red ground in Fig. 3c represent an even greater shift in antagonism away from M (and toward P) activity with blue compared with red light, resulting in a weaker ground signal compared with red (Fig. 3c), contributing to even longer fade times.

The weakened M-biased ground signal would reduce its antagonism with the P-biased figure signal across the boundary between figure and ground (i.e., the black double arrow). The reduced antagonism from the weakened ground signal across the boundary would lead to increased P activity within figure. The resultant increased P activity within figure would in turn increase its antagonism with M activity within figure, reducing M activity there, as indicated by the longer P and shorter M bars within figure in Fig. 3e (left side) compared with Fig. 3a (left side), resulting in a relatively stronger figure signal and increased fade times.
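The qualitative logic of Fig. 3 can be made concrete with a toy calculation. The Python sketch below is not the authors' model and is not fitted to any data: the drive values, the antagonism weight, the M scaling factors standing in for red and blue light, and the fade-resistance index are all arbitrary assumptions chosen for illustration. It also covers only within-region antagonism, not the cross-border interaction described above. Only the ordering of the outputs is meaningful.

```python
def region_balance(p_drive, m_drive, weight=0.5):
    """Toy within-region antagonism (hypothetical): each stream's activity is
    its own drive minus a fraction of the other stream's drive, floored at 0."""
    p_activity = max(p_drive - weight * m_drive, 0.0)
    m_activity = max(m_drive - weight * p_drive, 0.0)
    return p_activity, m_activity


def fade_resistance(fig_m_scale=1.0, gnd_m_scale=1.0):
    """Hypothetical index: P-biased figure signal minus M-biased ground signal.

    fig_m_scale and gnd_m_scale (< 1) stand in for light that reduces M drive
    in figure or ground; the baseline drives are arbitrary illustrative numbers.
    """
    p_fig, m_fig = region_balance(1.0, 0.6 * fig_m_scale)  # figure: P-biased
    p_gnd, m_gnd = region_balance(0.6, 1.0 * gnd_m_scale)  # ground: M-biased
    figure_signal = p_fig - m_fig
    ground_signal = m_gnd - p_gnd
    return figure_signal - ground_signal


# Arbitrary scale factors: red reduces M drive somewhat, blue reduces it more.
RED, BLUE = 0.8, 0.6

baseline = fade_resistance()
red_in_figure = fade_resistance(fig_m_scale=RED)
blue_in_figure = fade_resistance(fig_m_scale=BLUE)
red_in_ground = fade_resistance(gnd_m_scale=RED)
blue_in_ground = fade_resistance(gnd_m_scale=BLUE)

print(baseline, red_in_figure, blue_in_figure, red_in_ground, blue_in_ground)
```

In this toy version, reducing M drive in either region raises the index above baseline, and the larger reduction assigned to blue raises it more than the one assigned to red, which mirrors the ordering of fade times described in the text.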

General discussion

The current study tested Weisstein’s model of figure–ground perception based on antagonistic M and P stream interactions in the visual system by using an artificial scotoma paradigm where a peripherally viewed target figure fades into the background, resulting in a failure of figure–ground segregation. According to the model, activity in the P stream encodes figure/foreground (the figure signal), and activity in the M stream encodes background (the ground signal). When a boundary separates two regions, the region that is perceived as figure or ground is the result of antagonism between M and P activity, both within each region and across the boundary between them. Our small target figure would therefore have a relatively stronger P-biased figure signal compared with the surrounding background’s relatively stronger M-biased ground signal. From this perspective, when the target figure fades, its figure signal is overwhelmed by the surrounding ground signal. We measured the time to fade for every possible figure–ground color combination of physically equiluminant gray, green, red (Experiments 1 and 2), and blue (Experiment 2) light. Compared with gray and green light, red and blue light were expected to reduce M activity, shifting its antagonistic balance with P activity whether it appeared in figure or ground. Decreasing M activity in the target figure would strengthen its P-biased figure signal, making it more resistant to being overwhelmed by the ground signal. Decreasing M activity in the background would weaken its M-biased ground signal, making it less able to overwhelm the target figure. These shifts in M/P antagonism in figure and ground were expected to increase the time to fade for color combinations involving red and blue compared with gray and green. Overall, the results support this hypothesis and Weisstein’s model (as described above, using Fig. 3 as a pictorial representation of this proposed dynamic M/P antagonism).

Some additional observations are noteworthy and bear repeating. First, the results with red and blue cannot be due to red and blue simply not filling in as well as gray and green, because the Rd/Rd and Bl/Bl conditions filled in as easily as the Gy/Gy and Gn/Gn conditions. Second, as noted above, the effects of red and blue were similar whether in figure or ground, indicating their influence was not simply due to the amount of red or blue light: the effect on dorsal-M activity should have been greater when they were in the ground, given its greater surface area. Finally, the results cannot be due to optical factors (e.g., chromatic aberration), because chromatic aberration increases from red to green to blue, yet red and blue had very similar influences on fade times.

The current results might also offer some insight into why fade times were longest for blue compared with the other colors tested in the only other artificial scotoma study we could find examining the influence of color (Sakaguchi, 2001). In that study, every possible target figure/background combination of white, red, green, and blue was tested. Fade times were longer when the background was blue and the target figure was white, red, or green compared with the other way around (e.g., blue target figure and white background; see Sakaguchi’s Fig. 7, effect of exchange). An examination of Sakaguchi’s fade times (see Sakaguchi’s Fig. 7, reaction times) also shows that fade times were generally longer for combinations involving red and/or blue compared with white/green or green/white. Thus, not only are our results consistent with Sakaguchi’s, but together they support Weisstein’s figure–ground model.

The longer fade times for blue in our experiment and Sakaguchi’s (2001) warrant further speculation about why this occurs. We suggest it could be due to the difference between not stimulating/activating the M stream with blue light compared with suppressing M activity with red. While there is some debate about using red light to reduce M activity (e.g., Hugrass, Verhellen, Morrall-Earney, Mallon, & Crewther, 2018; Skottun, 2004) and about the extent to which S cones provide input to M cells in the retina (Sun et al., 2006a, 2006b) and LGN (Chatterjee & Callaway, 2002; Dacey, 2000; Martin & Lee, 2014), it may be that blue light is more effective at reducing dorsal-M activity compared with red light because it creates minimal stimulation of the M stream. Earlier, we used Fig. 3 to graphically represent and discuss our speculation about the possible M/P antagonistic relationships within and between figure and ground regions when the target is visible. Though suppressed with red light, M activity would still be greater with red compared with blue light because L-cones provide input to the magnocellular layers of the LGN involved in the response to luminance (Lennie, Pokorny, & Smith, 1993), which would contribute to M activity.

Could the tendency for blue not to fill-in in our experiments be due to the sparsity of S-cones across the retina? The sparse arrangement of S-cones across the retina means their signals are of low spatial resolution, which might result in S-cone signals being less able to extrapolate across space, reducing the tendency to fill-in. This explanation seems unlikely for two reasons. First is how easily blue filled-in for the Bl/Bl condition, similar to the other colors. Clearly, filling-in can occur with blue. Second, when the target figure is blue, the sparse S-cone signals from that region should make it easier for red, green, and gray to fill-in because they are processed by the denser distributions of L-cones and M-cones, but this was not the case.

It is unlikely that attention played a role in the pattern of results for the different-color combinations because participants were explicitly focusing their attention on the target while judging whether it was still visible or not. From a salience perspective, there was a difference in how salient the target figure was relative to the background for different-color versus same-color conditions (e.g., Gy/Gy) because in the latter the only thing distinguishing the figure from the background was its lack of texture, making it relatively less salient. This difference between same-color and different-color combinations was evident phenomenally and also through computation of saliency maps for all color combinations (Itti & Koch, 2000; Itti, Koch, & Niebur, 1998), which showed equally poor differentiation of the target from the background for same-color combinations and similarly clear differentiation of the target from the background in the different-color combinations.

The purpose of our study was to test Weisstein’s model of figure–ground perception using an artificial scotoma paradigm as a psychophysical tool. It was not specifically designed to explore the nature of color filling-in or perceptual filling-in in general, or to test theories of how it happens, particularly the delayed filling-in found with an artificial scotoma. There is much debate about the nature of filling-in and the neural mechanisms underlying it (e.g., Devinck & Knoblauch, 2019; Komatsu, 2006). Before considering the present results from some current perspectives on fading and filling-in, we first consider the initial segmentation of our figure from the background. There is growing evidence that feedforward and recurrent activity along ventral striate and extrastriate areas (Friedman, Zhou, & von der Heydt, 2003; Johnson, Hawken, & Shapley, 2008; Shapley & Hawken, 2011; Shapley, Nunez, & Gordon, 2019; Zweig, Zurawel, Shapley, & Slovin, 2015) as well as across dorsal and ventral areas (Devinck & Knoblauch, 2019; Gerardin et al., 2018) is involved in extracting and integrating edge/boundary and surface-color signals. For our stimuli, then, other than the flickering background in Experiment 1 that did not influence the results, the two stimulus attributes contributing to figure–ground segregation were the texture edges created by the homogeneous figure against the textured background and the color edges when figure and background were different colors. While the mean luminance of the textured background was physically equiluminant with the target figure, there would still have been some luminance contrast along the texture edges because the luminance of the individual texture elements randomly varied from maximum to minimum along these edges relative to the constant luminance of the figure.
These texture/luminance and color edges distinguishing the figure from ground would likely be processed by orientation-tuned double-opponent color-luminance cells in V1 sensitive to luminance and color contrast (for reviews, see Shapley, 2019; Shapley & Hawken, 2011). Their high-pass spatial frequency sensitivity makes them responsive to both luminance and chromatic edges, in contrast to the low-pass sensitivity of single-opponent color-specific cells in V1, which respond to large uniform regions of color (Zweig et al., 2015). Thus, adaptation of such edge signals due to steady fixation plays an important role in the fading of an artificial scotoma, as evidenced by the lack of fading or immediate reappearance after fading due to eye movements. The shortest fade times found for our same-color conditions indicate greater resistance to adaptation when the target’s edges were also defined by color. However, we have found nothing in either the physiological or psychophysical literature on color processing or, as noted above, on the fading/filling-in of artificial scotomas, that would indicate why color edges involving red or blue light would be more resistant to adaptation under steady fixation. Based on Weisstein’s model, however, once such underlying processes have contributed to segregating figure from ground, P and M stream processing are brought to bear differently on these perceived regions.

Gestalt psychologists were some of the first to note how the boundary separating figure from ground belongs to the figure. This is particularly evident in figure–ground reversible pictures where border ownership changes over time. The results of our experiments have implications for research and theory pertaining to border ownership. Unlike figure–ground reversible pictures, the border owned by the figure phenomenally ceases to exist when the target figure fades into the background with artificial scotoma stimuli. Based on how much shorter fade times were for the same-color combinations for our displays, it appears the mechanisms involved in the figure owning its boundary provide a stronger border ownership signal when the figure is defined by both a color and texture difference compared with only a texture difference. Thus, edge/boundary adaptation (e.g., adaptation of border ownership signals) is considered a primary factor involved in the filling-in of artificial scotomas, regardless of the feature(s) distinguishing the target figure from its background (i.e., luminance, color, orientation, texture, etc.; e.g., De Weerd, Desimone, et al., 1998; Sakaguchi, 2001; Weil & Rees, 2011). Two perspectives on these issues as they relate to the fading and filling-in of artificial scotomas are considered next.

According to Grossberg’s FACADE (form and color and depth) and 3-D LAMINART models (Grossberg, 1994, 1997, 1999, 2016), figure–ground separation (i.e., the resolution of border ownership) is achieved through the interaction of parallel cortical processing streams involved in boundary and surface computations (e.g., processing involving boundary and feature contours). According to these models “boundaries are completed in the cortical stream from V1 interblobs to V2 interstripes and on to V4, whereas surfaces are filled-in in the cortical stream from V1 blobs to V2 thin stripes and on to V4” (Grossberg, 2016, p. 3). Boundary contours act as both filling-in generators and barriers even though their computation and completion occur outside of awareness, such that “all boundaries are invisible” (Grossberg, 1994, p. 59). What we phenomenally perceive emerges from filling-in of surface color and brightness where boundary and feature contours align. Boundary contours act as barriers within which brightness and color are filled in and ultimately experienced. From this perspective, the boundary contour defining our target figure from the background allows feature contours from the color and/or texture difference between target and background to be contained within the figure. We might suppose the converse is also true. While the boundary contour defining the figure is owned by the figure, there would also be a boundary contour at the edge of the monitor defining the background, which would be containing the background’s color and texture within it. When the boundary contour activity defining the figure is no longer sufficient to signal the figure’s boundary, the color and/or texture contained by the background’s boundary contour is able to spread from the background into the figure region.

This description of the target figure being replaced by the background is also consistent with von der Heydt’s symbolic filling-in theory of color (von der Heydt, Friedman, & Zhou, 2003). According to the symbolic filling-in theory, irrespective of the colors involved, when a target figure like ours fades during steady fixation, what is perceived “is the color [Footnote 5] of the ‘underlying’ object whose representation is based on signals of more distant borders,” and where “small eye movements will revive the adapted border signals regenerating the object representation” (von der Heydt et al., 2003, p. 110; i.e., when the target reappears). Responses of cells in monkey extrastriate cortex (V2 and V3) to color (von der Heydt et al., 2003) and texture (De Weerd, Gattass, Desimone, & Ungerleider, 1995) boundaries related to filling-in likely play a role in processing border ownership, too (e.g., see von der Heydt, 2015). The present results show the figure resists fading when red or blue is one of the colors defining a boundary, regardless of border ownership. It is unclear why blue-edged and red-edged border ownership signals would be so resistant to fading from Grossberg’s or von der Heydt’s perspectives. Weisstein’s antagonistic model provides a reasonable account for the present results.

Weisstein’s model, as well as the effects of red and blue light we found, also seem to find support from two recent high-field (7T) high-resolution fMRI studies on the columnar organization of human striate and extrastriate cortex (Nasr & Tootell, 2018; Tootell & Nasr, 2017). Using stimuli expected to preferentially activate M and P streams based on their sensitivity to color versus luminance, binocular disparity, luminance contrast sensitivity, peak spatial frequency, and color/spatial interactions, Tootell and Nasr (2017) found evidence for segregated cortical columns in V2, V3, and V4, with thin and thick stripes in these regions being dominated by the P and M streams, respectively. They suggest “an interesting generality: distinct M and P streams (segregated between or within cortical areas) exist through classically retinotopic extrastriate areas” and that “cortex may tile (replicate) the visual field into duplicated M and P representations for each retinotopic location” (p. 8029). Such an organization along with recurrent connections with V1 might provide the neural underpinnings for the proposed M/P antagonism within figure and ground regions, as well as across the boundary between them that Weisstein envisioned.

In their study of hue preferences in striate and extrastriate cortex, Nasr and Tootell (2018) used equiluminant, low spatial frequency hue-varying gratings and compared the activation they evoked with that evoked by achromatic gratings of the same mean luminance. They found striking differences in activation in V1 and across thin and thick stripes in V2 and V3 for end-spectral (red and blue) versus midspectral (green and yellow) hues. There was overall greater activation for end-spectral versus midspectral hues in V1. In V2 and V3, this hue preference reversed, with greater activity evoked by midspectral versus end-spectral hues, and with this activity having the greatest overlap with the M-dominant thick stripes. Compared with the M-dominant thick stripes, end-spectral hues showed the greatest overlap with the P-dominant thin stripes. Most relevant to our findings, while color-selective fMRI responses to all hues were overall greater in the P-dominant thin relative to M-dominant thick stripes in V2 and V3 (see Fig. 5; Nasr & Tootell, 2018), as would be expected by the greater color selectivity of the P stream, there was no response to end-spectral hues in the thick stripes. We might speculate the lack of response to red and blue light in the M-dominant thick stripes could be due to the suppressive effect of red light on, and lack of S-cone input to, the M stream. We might further speculate that this lack of response to red and blue light in these extrastriate M stream areas is consistent with our prediction of reduced M stream activity due to red and blue light within ground contributing to a weakened M-biased ground signal. Conversely, the stronger response to end-spectral hues in the P-dominant thin stripes suggests a relatively stronger P stream response to red and blue light. This would also be consistent with our predictions of a stronger P-biased figure signal within figure, resulting from red and blue light in figure reducing M stream antagonism.
While we predicted a stronger figure signal from reduced antagonism from M stream activity there, it is unknown if the stronger thin stripe response to red and blue light Nasr and Tootell (2018) found is related to M/P antagonism (i.e., greater P stream thin-stripe activity due to reduced M stream antagonism from the relatively reduced thick-stripe responses to these wavelengths).

Finally, there are at least two major differences between the delayed filling-in found with artificial scotoma stimuli like those tested here and the instantaneous filling-in of blind spots and permanent scotomas. First, there is no phenomenal figure–ground experience with blind spots and permanent scotomas. When one is made aware of their blind spot or a permanent scotoma, there is a hole in, or the absence of, information from that part of the visual field. They are not figure–ground reversible stimuli like those used here. Second, because there are no retinal signals coming from permanent scotomas and the blind spot, there can be no M/P antagonism within those regions of the retina or across the boundary between them and functioning retina. Thus, Weisstein’s antagonistic figure–ground model is not applicable to the instantaneous filling-in associated with permanent scotoma and the blind spot.

In conclusion, using an artificial scotoma paradigm where figure–ground segregation fails, we investigated Weisstein’s antagonistic M/P figure–ground model by manipulating stimulus variables expected to reduce M activity. Our findings are consistent with prior studies leading to the model’s creation and are best explained by her antagonistic model.