I recently read Natsu Onoda Power's God of Comics: Osamu Tezuka and the Creation of Post-World War II Manga (2009), a good introduction to Tezuka's career and achievements. The back cover text promises an in-depth study of Tezuka's oeuvre, which is misleading. God of Comics reads more like a generalist overview, and as such is ideal for a manga gaijin like me.
At one point in chapter three (titled "Movie in a Book"), Power's arguments and observations intersect with my own interests in film studies. Specifically, Power claims that Tezuka was heavily influenced by the visual aesthetics of Hollywood films of the 1930s and '40s and that, beginning with Metropolis (1948), he incorporated "a more solid vocabulary of 'cinematic techniques'" into his manga (God 54). One of these techniques is the use of moment-to-moment transitions between panels that mimic the infinitesimal passage of time between frames on a strip of motion picture film (Metropolis 19):
Power doesn't mention this, but I find it fascinating that the above page isn't designed to be read right-to-left in stacked tiers like traditional manga (and like the rest of Metropolis). Rather, the page makes sequential sense only if we read it up-and-down, as if Tezuka is deliberately emulating the verticality of a film strip as it feeds into the projector.
Power traces other examples of Tezuka's cinematic techniques. She draws similarities, for instance, between the crowd scenes in the Babylon sequence of D.W. Griffith's Intolerance (1916) and the large-panel depictions of panicking citizens and scientists that open Metropolis. (She also mentions other inspirations for Tezuka's crowds, notably George McManus' intricately-rendered splashes of Jiggs' old neighborhood in Bringing Up Father Sunday pages.) Most interesting to me, though, was Power's claim that "images inspired by deep-focus cinematography are particularly characteristic" of Tezuka's cartooning in Metropolis (God 56). I'm more than a little obsessed with deep focus, and in this post I want to explore and expand on Power's claim. I'll begin by defining deep focus and summarizing Andre Bazin's perceptual and philosophical arguments for its importance; then I'll look closely at Power's examples of deep focus in Metropolis. Finally, I'll question if it's accurate to talk about a comic (by Tezuka or any other cartoonist) having depth of field in the same way that a film does.
Andre Bazin and Deep Focus
It may seem counter-intuitive, but let's begin by looking at an example of a shot not in deep focus. (Readers already acquainted with deep focus can skip the next few paragraphs.) Below is a still from Inglourious Basterds (Quentin Tarantino, 2009), swiped from film scholar David Bordwell's blog:
This is an example of shallow focus, where only one aspect of the picture--in this case a reflection of Shosanna (Melanie Laurent)--is in crisp focus. Shosanna stands close to us (and the camera) in the left side of the frame, but her back is blotchy, indistinct, out of focus. Our eyes gravitate to the sharper image of her in the mirror. Most contemporary Hollywood films are shot in shallow focus like this, and the director and cinematographer use focus to guide the spectator's vision to important parts of the frame (the frontal view of Shosanna's face) and marginalize less important elements inside the frame, such as Shosanna's inexpressive hair and back.
In the 1930s and '40s, however, deep focus was more common in Hollywood movies. Director Orson Welles and cinematographer Gregg Toland, the men responsible for the visual style of Citizen Kane (1941), would've shot the frame of Shosanna so that everything was in focus, including the back of her head and her hands at dead center. In God of Comics, Power reproduces a deep focus still from a famous scene in The Best Years of Our Lives (William Wyler, 1946):
This deep focus shot reflects tensions among characters in Best's plot. Best chronicles the interlocking stories of three men--Al Stephenson (Fredric March), Fred Derry (Dana Andrews) and Homer Parrish (Harold Russell)--as they return home from military service in World War II and face challenges in re-adapting to civilian life. In the still above, the closest character to the camera, Homer, is playing "Chopsticks" on the piano in a duet with a barkeep (Hoagy Carmichael), and this gets our attention, since Homer has hooks where his hands used to be. (Like his character, actor Russell lost his hands in a wartime accident.) In the middle distance, leaning on the piano, is Al. In the left background, in a phone booth, is Fred. Earlier in the scene, Al and Fred quarreled about Al's daughter Peggy (Teresa Wright), because Fred, an unhappily married man, has fallen in love with Peggy, and Al insists that Fred call Peggy and end their burgeoning, potentially adulterous relationship. As the scene at the piano unfolds, director Wyler occasionally inserts shots of Al looking off to his right, watching Fred making the fateful call, and one of these insert shots is included in God of Comics (59):
In contrast to the Basterds still, every distance in the Best frame--the foreground, the middle ground and the background--is in focus. But why is that important? According to film theorist Andre Bazin, the deep focus aesthetic offers several advantages. First, deep focus provides a density of visual details comparable (and analogous) to the real world. A devout Christian, Bazin believed that nothing could improve on God's creation and that the film medium's highest purpose is to replicate His creation as accurately as possible. This religious aspect of Bazin's thought is addressed in Richard Linklater's movie Waking Life (2001), where Caveh Zahedi describes Bazinian deep focus as the representation of the world as an endless succession of "holy moments":
The irony here, of course, is that Zahedi celebrates Bazin's notions of deep focus photography while starring in an animated film.
Bazin's second point is that deep focus affords audiences a freedom of perceptual choice unavailable in shallow focus shots. In the still from Basterds, the selective focus forces us to look at Shosanna's reflection rather than at the actual character standing in front of the mirror. In the Best sequence, however, individual spectators decide where their gazes will move and land: on Homer, on Al, on Fred in the phone booth, on any element of the mise-en-scene they choose. (As I watch a film in deep focus, I tend to pingpong among characters, my gaze following whoever's speaking.) According to Bazin, this freedom of visual choice implies "both a more active mental attitude on the part of the spectator and a more positive contribution on his part to the action in progress" (What is Cinema I, 35-36). Because the spectator chooses to zero in on some aspects of the frame ("I'm looking at Al's sour expression...") and not on others ("...but what was Homer doing while I was looking at Al?"), s/he constructs, at least in small measure, an aesthetic experience distinct from that of other viewers who choose to watch different elements ("I see Fred making the phone call...") in the frame.
Some criticize Bazin on this point, especially Gérard Gozlan, who argues that the idea of perceptual freedom is a reductio ad absurdum that threatens to empty the image of any meaning, spectator-made or otherwise. Gozlan writes that viewers could focus solely on "the flies buzzing round the room or at the colour of the walls" (The New Wave, 54) when they watch a movie, but do they really "get" the movie that way? Whenever I think of Gozlan's critique, I remember a little comic strip by Dan Clowes, first printed in Eightball #5 (1989) and reprinted in Lout Rampage (1991):
If Grandma just stares at the lawn, does she understand golf? And if Grandma watches the Best scene, exercising her freedom by staring blankly at the ashtray on the piano, does she truly understand what's going on in the movie? Gozlan believes that "the spectator's job is not simply to watch what he wants to, and say afterwards that he has the right to cuddle up to his girl friend" (The New Wave, 55-6). Rather, the viewer should try to understand how the director arranged the mise-en-scene to transmit story information, create a mood, or otherwise artfully influence an audience.
Bazin's third point about deep focus concerns what he calls "ambiguity of expression" (What is Cinema I, 36). In a shallow focus shot, the emotional identification we pour into the image is limited to the character(s) occupying the plane in focus, while deep focus allows us to extend our empathy to various characters in various planes. Visual depth, in other words, creates emotional depth. Because we clearly see all the men in the Best scene, we connect with all of them, and we comprehend their individual points of view. We understand why Homer plays the piano in a flagrant denial of his disability, why Al wants to protect his daughter, and why Fred is in love with Peggy. And it's significant that Al and Fred are on the two opposing sides of a single argument; since we identify with both sides, we come to realize the ambiguous and complex nature of the situation. According to Bazin, the quintessential example of deep-focus ambiguity is Citizen Kane, where the visual depth is a perfect complement to the depths of Charles Foster Kane himself. One of the original ads for Kane emphasized the contradictory aspects of Charlie Kane--"He's a scoundrel!" "He's a saint!" "He's a genius!"--and one of the film's most famous shots represents Kane as a fluid, ultimately unknowable proliferation of selves:
This ambiguity is, for Bazin, another way in which deep focus is faithful to God's creation. We remain a mystery to ourselves and others, and only God knows All.
Power, Tezuka and Deep Focus
In God of Comics, Power's comments on the origins of layered imagery in Tezuka's manga don't invoke Bazin's Christian metaphysics or any other epistemological theory. Rather, Power simply points out that the publication of Metropolis was roughly contemporaneous with--and possibly inspired by--such Gregg Toland-lensed movies as Kane and Ball of Fire (1941). Due to embargoes against American products, Japanese theaters didn't show Hollywood films from the early 1940s until after the war, but the deep focus aesthetic nevertheless proved to be popular with Japanese audiences. As Power writes,
In the July 1948 issue of CINE-AMERICA, a new magazine specializing in American films, six leading Japanese film critics discuss The Best Years of Our Lives in a "roundtable" format. In the article, Oka Toshio describes two scenes in the film that make innovative use of deep focus, discussing them in relation to Greg [sic] Toland's other works (29). Numerous other film magazines discussed this film, familiarizing the viewers with its cinematographic innovation long before its release in Japan. (59-60)
It's this cultural interest in filmic deep focus that encouraged Tezuka to compose panels that stage action in depth. Power further notes that Tezuka's multi-plane Metropolis panels tend to crop up in the story when a character is secretly watching someone else. One of the examples reproduced in God of Comics, for instance, features the detective Higeoyaji spying on the evil Duke Red and his robots:
One thing Power is careful not to do, however, is assume that Bazinian ideas about perception and metaphysics apply to Tezuka's multi-layered panels. She quotes Bazin only to the effect that deep focus allows directors and cinematographers to convert "the screen into a dramatic checkerboard, planned down to the last detail" (What is Cinema I, 51). But her mention of Bazin in conjunction with Tezuka made me wonder: how does deep focus function in comics? Is there any substantial similarity between cinematic deep focus and the illusion of depth that a cartoonist can create in a panel? If Bazin's theories apply to the cinematic image (and I realize that's a big if), might they also apply to cartoon renditions of depth? My answer is no, and here's why.
Comics: Timeless and Sparse
One reason why filmic deep focus is drastically different from cartoon deep focus involves the modulation of time. One painfully obvious fact about watching a movie in the theater is that the spectator has no control over the pace of the film. A traditional projector has a default setting for the speed at which a strip of film passes through its gate--24 frames a second--which is fast enough to wring a convincing illusion of motion out of still pictures. Filmmakers can fiddle around with the frame rates on their cameras to shoot in slow or fast motion, and they can use special effects to craft scenes in Matrix-style "bullet time" or give passages a jerky, slo-mo, pixilated look, as director Wong Kar-wai and cinematographer Christopher Doyle do at the beginning of Chungking Express (1994):
Somebody watching Chungking Express or The Matrix at their local theater, however, has no choice but to watch what appears on the screen, at the pace predetermined by the projector and the filmmakers, and this "predetermination" seems to me essential to Bazin's notion of ambiguity. Bazin's spectator is confronted with an unrelenting flow of visual information, and responds by making sacrifices: s/he can't take in all the information, so s/he chooses (consciously or unconsciously) to give specific elements in the frame (a particular character or prop) his or her closest attention. This means that the spectator inevitably devotes less attention to other information, and risks missing significant changes and movement out on the margins of his or her perception. Cognitive scientists call this phenomenon change blindness, defined by Natalie Angier as "the frequent inability of our visual system to detect alterations to something staring us straight in the face," especially when we're not giving our full attention to that particular something. On the Interwebs, it's easy to find mindblowing tests that illustrate change blindness, including this collection of examples that uses flicker like cinema does. (Can you spot the difference between the alternating pictures?) Bazinian ambiguity is dependent on change blindness, on the fact that viewers will make different choices about what to focus on while watching a shot, and that ambiguity arises because nobody ever sees the complete picture, either literally or metaphorically.
Bazin died in 1958, at the far-too-young age of 40, and sometimes I wonder how he would've reacted to the Video Revolution. At the very least, home video technologies give the power to manipulate a film's pace to the audience. By pressing "slow" or "pause" on a remote control, a spectator can study the image on the screen for as long as s/he wants. My students typically use the freeze frame to ferret out weird bloopers (do those clouds really spell out "Sex" in The Lion King [1994]?), but the pause function has a more profound application: it lets us take the time to pay attention to all the planes of a deep focus image.
In other words, the pause button turns a film frame into a comic book panel. Comics readers have always been able to "pause," "rewind," and read a comic at their own chosen speed, though formal aspects of the comic itself--particularly panel sizes and layouts--can affect the reader's pace. (Think, for instance, of the slivers of time represented by the wordless panels on the final page of Feldstein and Krigstein's "Master Race," or the metronome-norm of the 9-panel grid in Watchmen.) When I first read a comic, I buzz through it as quickly as possible, reading exclusively for the plot, but as soon as I finish, I immediately re-read more slowly, to catch nuances I missed the first time and to wallow in the images in an aesthetic-picture-plane manner. There are lots of other ways to read comics, though, and I bet there's some grandmother somewhere who reads Dan Clowes just to see how beautifully he draws lawns. Comics readers don't have to pick which plane to pay attention to while reading the stacked-in-depth panels of Metropolis. They have the option to pause and scrutinize all the information in the panels. Comics readers have time to digest the whole picture, and--following Bazinian logic--are perhaps more aware of character and narrative points of view than their cinematic counterparts.
It's also an important difference that there's less visual information in a comic. Both still and motion picture cameras are designed to automate, as much as possible, the creation of images; we've all taken a picture without looking through the viewfinder, and been surprised at how good the results still were. As filmmaker Maya Deren writes, photography--at least pre-digital chemical photography--is "a process by which an object creates its own image by the action of its light on light-sensitive material. It thus presents a closed circuit precisely at the point where, in the traditional art forms, the creative process takes place as reality passes through the artist" (Braudy and Cohen, Film Theory and Criticism, 189-190). This "closed circuit" of photography--built to capture the almost impossibly detailed visual world so celebrated in Bazin's writings--results in photos with bizarre and striking details utterly unplanned by the folks behind the cameras. Historian Ian Jeffrey describes how pioneering photographer and inventor William Henry Fox Talbot was beguiled by the aleatory quirks and "fascinating irrelevancy" that would sneak into even his most meticulously planned pictures:
"Sometimes inscriptions and dates are found upon buildings, or printed placards most irrelevant, are discovered upon their walls: sometimes a distant sundial is seen, and upon it--unconsciously recorded--the hour of the day at which the view was taken." To judge from his commentaries, Fox Talbot enjoyed such incidentals. At the same time, though, they were troublesome, for they meant that the instrument was only partially under control, recording disinterestedly in despite of its operator's intentions. (Photography: A Concise History, 12-13)
Comparatively speaking, nothing in a comics panel is accidental. Because cartooning is an open circuit--because reality does "pass through the artist" before s/he makes marks on the page--comics are intentional approximations of the world that omit or minimize the fascinating details of the photograph. This is especially true of early Tezuka manga like Metropolis, where Disney's influence is everywhere: the characters are rounded and animation-ready, the rooms are simple line drawings in single-point perspective, and Tezuka only draws what's necessary to keep the story bouncing along. In later work, of course, Tezuka added a lot more to his mise-en-scene, and created a realistic, habitable world in multiple planes of deep focus, but his characters remained streamlined abstractions, partially to keep them easier to draw, and partially (if we accept Scott McCloud's masking effect theory) to forge tight psychic connections between the characters and the readers. Tezuka never tried to capture the complexity of objective reality in photographic detail--most cartoonists don't--so ideas about "fascinating irrelevancy" and the visual density of deep focus don't apply.
When I watch Citizen Kane, I scramble to perceive as much as I can of the 24-frames-a-second flow of visual information; I'm overwhelmed by Welles and Toland's creative generosity. When I read Phoenix or Buddha, I pause, I linger, and I marvel at Tezuka's ability to draw supple abstractions based on the forms of the real world. And then I pause and linger some more.
Thanks to Natsu Onoda Power, whose comments in God of Comics inspired this post, and to Robert Ray, author of The Avant-Garde Finds Andy Hardy (1995), from which I pilfered Ian Jeffrey's quote about Fox Talbot.
Awesome post, Craig, a veritable basketful of links, suggestions, resources, observations. I've read it through twice now, first without your links, just to get the flow of argument, then again with plenty of link-clicking, just so I can follow the lateral connections you're making.
I like your emphasis on the way Bazinian staging in deep focus (with its corollaries: density of visual detail, freedom of perceptual choice, and ambiguity of expression) has a temporal as well as spatial dimension. You point out that deep focus not only is a way of thinking about pictorial space but also is a technique dependent on time, that is, on the (barring video freeze-frame) ceaseless flow of time in a film and how this affects the spectator. So often I've heard the debate between Bazinian depth and Eisensteinian montage explained as a dynamic between "space" and "time" (with Bazin linked to notions of pictorial space, Eisenstein et al. to notions of film as a "language" of statements unfolding in time). But you show here, most usefully, that the full apprehension of Bazinian deep focus also depends on time, that indeed Bazin's prized ambiguity is dependent on the spectator's lack of control over timing, over the speed with which the film/viewing apparatus depicts time.
Your point about the simplicity of Tezuka's early staging (if indeed staging is the right word for comics) is also intriguing. The filmstrip-like sequence from Metropolis, above, is almost schematic in its simplicity, its use of one-point perspective. The later work shows a much greater richness. (I find the gestures toward cinema in early Tezuka so self-conscious as to be almost distracting, in the same way that I'm distracted by the use in comics of fragmented sound bites, sound cuts, soundbridges, etc.)
BTW, I don't buy McCloud's "masking effect" theory, or at least the larger psychological underpinnings of same. I don't think simplicity in drawing results in the sort of absolute psychological identification McCloud claims, or at least I wouldn't put the argument in such stark terms.
Posted by: CharlesWHatfield | November 05, 2009 at 01:31 PM