Ways of seeing

By Mylene Zozaya Tinoco


Vision is one of the most fundamental ways in which human beings relate to the world around us. That sensory experience becomes “gaze”—a way of seeing that carries affective, interpretative, and aesthetic dimensions. From our perception of physical space emerge complex models that intertwine cognition, psychology, and even spirituality. But how do we see in today’s world? And more importantly, why does it matter how we see? Are we looking through our own being—our essence—or do we now see through formats imposed by others? Who are those others? Why have they shaped how we see? What codes are embedded in their systems? And most crucially: are we still capable of reclaiming the nature of our own vision? What would it take to liberate our gaze from the frameworks that have obscured it?

Throughout history, the ways of seeing, representing, and preserving the world have evolved alongside technological developments. The invention of linear perspective and, later, of photography brought us closer to replicating human visual perception. These techniques enabled the creation of images aligned with the natural logic of human sight—especially through lenses in the 40–50mm range (in 35mm-format terms), whose angle of view closely approximates how we perceive space and depth. Such images were deliberate constructions: the image-maker chose to generate records grounded in human visual logic.
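The "human-scale" quality of the 40–50mm range can be made concrete with the standard angle-of-view formula for a rectilinear lens. A short sketch (the 36 mm sensor width of 35mm film and the 26 mm phone-camera equivalent are illustrative assumptions, not figures from this essay) shows how much wider a typical phone lens sees than a 50mm "normal" lens:

```python
import math

def horizontal_fov_deg(focal_length_mm, sensor_width_mm=36.0):
    """Horizontal angle of view of a rectilinear lens on a 36 mm-wide (full-frame) sensor."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# A "normal" 50 mm lens: a relatively narrow view, close to attentive human vision.
print(round(horizontal_fov_deg(50), 1))  # ≈ 39.6 degrees

# A typical phone main camera (~26 mm equivalent): a much wider, distinctly non-human view.
print(round(horizontal_fov_deg(26), 1))  # ≈ 69.4 degrees
```

The wider the angle of view, the more the rendering of space and depth diverges from what an observer standing at the camera's position would perceive.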

Today, we are undergoing a new transformation: the mass production of images through mobile phone photography. Far from being the product of neutral, individual choices, these images are generated by software and refined by invisible algorithms that operate according to the visual logic of technological systems. Although often perceived as objective or spontaneous, these images do not correspond to human vision—in optics, perspective, or spatial construction. Instead, they become encoded representations that gradually shape both personal and collective memory, without our conscious awareness.

Simultaneously, these same images—circulating through social media, screenshots, and digital archives—serve as foundational material for the training of artificial intelligence systems, particularly Large Language Models (LLMs). While LLMs are commonly understood as text-based systems, they are increasingly multimodal, capable of interpreting not only language but also images. These systems rely on vast datasets composed of real-world media—much of it sourced from the digital traces of everyday life, including photographs taken with mobile devices. As such, algorithmically constructed images not only represent the world, but also shape how machines learn to interpret it. This creates a feedback loop in which machines learn to see through images built by other machines, reinforcing a non-human visual logic shaped by technological hegemony.

This investigation explores that transformation. It reveals how LLMs interpret digital photographs and exposes the disconnect between machine-generated vision and human-scale perception. Using a custom LLM-based algorithm, the project analyzes images “from machine to machine,” identifies the artificial perspective embedded in them, and reconstructs versions aligned with natural human sight. In doing so, it unveils the hidden systems that mediate our relationship with images and challenges the assumed neutrality of contemporary visual culture.
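The essay does not publish the project's algorithm, but the geometric core of such a reconstruction can be sketched. Assuming (hypothetically) that the LLM stage has identified a photograph's 35mm-equivalent focal length, one simple way to align it with natural human sight is to centre-crop it until its angle of view matches a 50mm "normal" lens; the function name and the 50mm target are illustrative assumptions, not the project's actual method:

```python
import math

HUMAN_NORMAL_MM = 50.0  # assumed "human-scale" target focal length (35mm-equivalent)

def half_fov(focal_mm, sensor_mm=36.0):
    """Half the horizontal angle of view, in radians, on a full-frame sensor."""
    return math.atan(sensor_mm / (2 * focal_mm))

def crop_fraction(source_equiv_mm, target_equiv_mm=HUMAN_NORMAL_MM):
    """Fraction of the frame width to keep so a wide image matches the target angle of view."""
    return math.tan(half_fov(target_equiv_mm)) / math.tan(half_fov(source_equiv_mm))

# A 26 mm-equivalent phone photo: keeping the central 52% of the frame
# approximates the angle of view of a 50 mm lens.
print(round(crop_fraction(26.0), 2))  # ≈ 0.52
```

A crop of this kind recovers the framing of a normal lens but not its perspective: that also depends on where the photographer stood, which no algorithm can change after the fact.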

This project is an act of revealing the layered systems that shape and condition our gaze—systems that influence not only how we look, but also how we construct meaning, memory, and reality itself through images. Its aim is to invite viewers to recognize the complexity of seeing in contemporary life, and to reclaim the awareness needed to create their own ways of seeing—their own gaze.