
How AI Is Expanding the Boundaries of Human Intelligence

Dec 04, 2017

What if the most important question in computing right now isn't how smart we can make machines, but how much smarter machines can make us? That's the premise behind a growing body of research that sits at the crossroads of artificial intelligence and human-computer interaction — a synthesis its proponents are calling artificial intelligence augmentation, or AIA.

Two competing visions of what computers are actually for

The history of computing is, in many ways, a history of competing philosophies. ENIAC, the first general-purpose electronic computer, was built to calculate artillery firing tables for the U.S. Army. The machines that followed were similarly task-oriented — simulating nuclear blasts, modeling weather systems, plotting rocket trajectories. Computers were, in this framing, fast calculators: batch-processing engines that compressed weeks of human arithmetic into hours.

That framing started to crack in the 1950s, and by 1962 Douglas Engelbart had articulated a fundamentally different idea. In his report "Augmenting Human Intellect: A Conceptual Framework," Engelbart cast computers not as number-crunchers but as interactive partners in human thought: real-time systems with rich inputs and outputs, designed not to replace human reasoning but to extend it. That vision of intelligence augmentation (IA) went on to shape the work of Alan Kay at Xerox PARC, Steve Jobs at Apple, and entire disciplines including data visualization, interaction design, and computational creativity.

For decades, IA and artificial intelligence (AI) have competed — for funding, for talent, for philosophical dominance. AI has typically framed its goals in terms of matching or exceeding human performance: beating grandmasters at chess, recognizing speech, translating language. IA has focused on something quieter but arguably more ambitious: building systems where humans and machines think together.

Generative models as a new kind of interface material

The emerging field of AIA proposes using AI systems not as replacements for human cognition, but as raw material for building new tools that expand it. One of the clearest demonstrations of what this looks like in practice comes from an unlikely domain: type design.

Designing a font involves navigating an enormous space of visual decisions. A single glyph rendered at 64×64 pixels requires 4,096 parameters to describe fully. Traditionally, tools for exploring variations (adjusting weight, width, or italicization) had to be built by hand, one axis at a time. But a generative model trained on existing fonts can compress that 4,096-dimensional space down to just 40 latent variables that together capture much of a font's visual character.
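The arithmetic behind that compression, assuming one parameter per pixel:

```latex
\underbrace{64 \times 64}_{\text{pixels per glyph}} = 4096 \ \text{parameters}
\quad\longrightarrow\quad 40 \ \text{latent variables},
\qquad \frac{4096}{40} \approx 102.4
```

That is a compression factor of roughly 100.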

The model used in this research is a variational autoencoder (VAE), trained by James Wexler. By manipulating those 40 latent variables, a designer can move fluidly through font-space — generating bold, italic, or condensed variations from any starting design. More interestingly, users can define entirely new axes of variation themselves, simply by selecting examples. Drag a set of serif fonts to one side and sans-serif fonts to the other, and the model infers the interpolation axis on the fly. The tool builds itself from examples rather than from explicit programming.
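The following minimal sketch shows one thing "moving through font-space" can mean. It assumes two hypothetical 40-dimensional latent codes. In the real tool, the codes would come from the VAE's encoder and each intermediate code would be rendered by its decoder; neither step is shown here. Plain linear interpolation is one common choice among several.

```python
from typing import List

Latent = List[float]

def lerp(a: Latent, b: Latent, t: float) -> Latent:
    """Linearly interpolate between latent codes a (t=0) and b (t=1)."""
    if len(a) != len(b):
        raise ValueError("latent codes must have the same dimension")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def interpolation_path(a: Latent, b: Latent, steps: int) -> List[Latent]:
    """Return `steps` evenly spaced codes from a to b, endpoints included."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [lerp(a, b, i / (steps - 1)) for i in range(steps)]

# Hypothetical codes for two fonts (40 latent variables each).
font_a = [0.0] * 40
font_b = [1.0] * 40

path = interpolation_path(font_a, font_b, steps=5)
print([round(code[0], 2) for code in path])  # → [0.0, 0.25, 0.5, 0.75, 1.0]
```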

That 40 numbers can meaningfully encode what previously required over 4,000 is not just a compression trick — it's evidence that generative models have learned something real about the structure of fonts. The latent space isn't arbitrary; it reflects genuine relationships between visual forms.

Why this matters beyond typography

The font example is deliberately modest in scope, but the questions it raises are not. If a generative model can give a type designer a new kind of exploratory instrument — one that didn't exist before the model existed — then what other instruments become possible across other creative and intellectual domains? The researchers behind AIA frame this as a question about interface primitives: the fundamental building blocks of how humans interact with information and ideas.

Every major shift in computing has introduced new primitives. The command line gave us text as instruction. The graphical interface gave us spatial metaphor. Touch gave us direct manipulation. Generative interfaces, the argument goes, could give us something harder to name but potentially more powerful: the ability to navigate high-dimensional spaces of meaning, to explore what a model "knows" and incorporate it into human creative work.

The open questions are serious ones. Can these tools generate ideas that are genuinely novel, or do they mostly recombine existing patterns in ways that feel new but aren't? Does interacting with a generative model expand how a person thinks, or does it subtly constrain thinking to what the model already represents? These aren't rhetorical questions — they're the foundational research problems of a field that is only just beginning to define itself.

The synthesis of AI and IA has been a long time coming, and the tools emerging from it are still crude by any honest measure. But the underlying idea — that the representations inside machine learning models can become the material for new kinds of human reasoning — points toward a version of computing that neither field could have built alone.

Neural networks trained on tens of thousands of fonts aren't just memorizing typefaces — they're quietly building a compressed theory of what typography actually is. That distinction turns out to matter quite a lot for how we think about AI-assisted design tools.

How a generative model learns the rules of type design

The model at the center of this work was trained on more than 50,000 fonts scraped from the open web. Through that process, the network adjusted its internal weights until it could reproduce any font from the training set by selecting the right combination of latent variables — compact numerical representations that encode each font's visual identity.

What makes this more than a sophisticated lookup table is generalization. Forced to compress thousands of fonts into a much smaller representational space, the network had to learn something deeper: an abstract model of what a font fundamentally is. That abstraction allows it to produce fonts it has never seen before, not just replay ones it has. The analogy to scientific theory is apt here. Thermodynamics doesn't describe every molecule in a gas; it finds the few variables, such as temperature and pressure, that capture the system's behavior. The same compression logic applies, and theories built this way can run ahead of observation. In the mid-1920s, statistical reasoning by Bose and Einstein predicted Bose-Einstein condensation decades before it was first observed. Generative models may carry a similar predictive potential.

The model isn't perfect. One recurring failure is the capital "Q" — many generated fonts drop its tail entirely. But the framework for what an ideal generative model would do remains a useful benchmark: given any conceivable font, whether existing or not yet imagined, the right latent variables should exist to represent it.

Attribute vectors: turning examples into design operations

The practical interface built on top of this model uses a technique called attribute vectors, introduced by Larsen et al. To create a bolding tool, the system takes a set of user-provided bold fonts and a set of non-bold fonts. It computes the average latent vector for each group, then subtracts the non-bold average from the bold average. The result, a bolding vector, points in the direction of "more bold" through the latent space.

Applying that vector to any font's latent representation shifts it toward bolder territory. The amount added controls the degree of change. The same method produces an italicizing vector, a condensing vector, and a serif vector. Each one is derived from examples, not hand-coded rules.
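The recipe above can be sketched in a few lines, assuming latent codes are plain lists of floats. The 3-dimensional codes and the function names are hypothetical stand-ins for the model's 40-dimensional latent space. They are not the tool's actual code.

```python
from typing import List, Sequence

Latent = List[float]

def mean_vector(codes: Sequence[Latent]) -> Latent:
    """Element-wise mean of a non-empty set of latent codes."""
    if not codes:
        raise ValueError("need at least one example")
    n = len(codes)
    return [sum(values) / n for values in zip(*codes)]

def attribute_vector(with_attr: Sequence[Latent], without_attr: Sequence[Latent]) -> Latent:
    """Mean of the examples with the attribute minus the mean of those without it."""
    pos = mean_vector(with_attr)
    neg = mean_vector(without_attr)
    return [p - q for p, q in zip(pos, neg)]

def apply_attribute(code: Latent, direction: Latent, amount: float) -> Latent:
    """Shift a latent code along an attribute direction; `amount` sets the strength."""
    return [c + amount * d for c, d in zip(code, direction)]

# Hypothetical 3-D latent codes standing in for the model's 40 dimensions.
bold_fonts = [[2.0, 0.5, 1.0], [2.4, 0.3, 1.2]]
regular_fonts = [[1.0, 0.5, 1.0], [1.2, 0.3, 1.4]]

bolding = attribute_vector(bold_fonts, regular_fonts)
print([round(x, 2) for x in bolding])  # → [1.1, 0.0, -0.1]

new_font = [1.1, 0.4, 1.3]
print([round(x, 2) for x in apply_attribute(new_font, bolding, 0.5)])  # → [1.65, 0.4, 1.25]
```

`amount` plays the role of the slider. A value of 0 leaves the font unchanged, and larger values push it further toward bold. Negative values push it the other way.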

There are real artifacts at the extremes. Push the bolding vector too far in either direction and edges get rough and serifs start to dissolve. A better generative model would reduce those degradation effects; that's an open research problem. But even with current limitations, the approach does something a naive pixel-thickening algorithm simply cannot.

Why learned heuristics outperform manual rules

The gap between naive bolding and professional bolding is significant. A simple approach adds pixels around a glyph's edges uniformly. Expert type designers don't do that. In Georgia, the left stroke changes only slightly during bolding while the right stroke expands asymmetrically. In both Georgia and Helvetica, the overall height of the font stays constant — something a thickening algorithm would violate.

These aren't arbitrary stylistic choices. They're heuristics accumulated over decades of professional practice, refined through historical study and experimentation. Encoding all of them explicitly in a conventional program would be an enormous undertaking. The generative model absorbs them automatically from the training data.

One clear example: when bolding the letter "A", a naive tool would quickly fill in the enclosed triangular space at the top. The generative model preserves it — moving the crossbar down, expanding interior strokes more slowly than exterior ones. That heuristic improves legibility, and the model inferred it without being told. Font height preservation is another. These principles emerge from the data, not from explicit programming.
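The naive approach can be made concrete with a toy sketch. It assumes a hypothetical coarse bitmap "A" and uses one-pixel morphological dilation as the uniform-thickening rule. A real tool would work on outlines or much finer rasters. Running it shows both failures described above: the glyph grows taller, and its enclosed counter fills in.

```python
from collections import deque
from typing import List, Set, Tuple

Pixel = Tuple[int, int]

# A hypothetical, very coarse bitmap "A" on a padded 6x7 grid; '#' is ink.
# The single '.' inside the triangle (row 2, column 3) is the enclosed counter.
A_GLYPH = [
    ".......",
    "...#...",
    "..#.#..",
    "..###..",
    ".#...#.",
    ".......",
]

def parse(rows: List[str]) -> Set[Pixel]:
    return {(r, c) for r, line in enumerate(rows) for c, ch in enumerate(line) if ch == "#"}

def naive_bold(ink: Set[Pixel], height: int, width: int) -> Set[Pixel]:
    """Uniform thickening: ink every pixel in the 3x3 neighbourhood of existing ink."""
    grown: Set[Pixel] = set()
    for r, c in ink:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    grown.add((rr, cc))
    return grown

def ink_height(ink: Set[Pixel]) -> int:
    rows = [r for r, _ in ink]
    return max(rows) - min(rows) + 1

def enclosed_pixels(ink: Set[Pixel], height: int, width: int) -> int:
    """Count background pixels that cannot reach the grid border (4-connected flood fill)."""
    background = {(r, c) for r in range(height) for c in range(width)} - ink
    queue = deque(p for p in background if p[0] in (0, height - 1) or p[1] in (0, width - 1))
    reachable = set(queue)
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in background and nb not in reachable:
                reachable.add(nb)
                queue.append(nb)
    return len(background - reachable)

H, W = len(A_GLYPH), len(A_GLYPH[0])
ink = parse(A_GLYPH)
bold = naive_bold(ink, H, W)
print(ink_height(ink), enclosed_pixels(ink, H, W))    # → 4 1
print(ink_height(bold), enclosed_pixels(bold, H, W))  # → 6 0
```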

What this means for human-AI collaboration in creative tools

The font tool represents something broader than a typography utility. It functions as a cognitive technology — a set of interface primitives that users can internalize as new ways of thinking about design. Programs like Photoshop or 3D graphics software have always done this: they don't just automate tasks, they introduce new conceptual operations that reshape how practitioners approach their work. A bolding slider backed by a generative model is a fundamentally different mental object than a stroke-width slider.

The same architecture extends well beyond fonts. Generative interfaces built on the same principles can manipulate images of human faces along axes like expression or hair color, reshape sentences by tone or length, or adjust molecular structures by chemical properties. In each case, the interface provides a kind of map — a way for humans to navigate a high-dimensional space of possibilities that would otherwise be inaccessible without deep domain expertise.

The real promise here isn't automation. It's expansion — giving people without years of specialized training the ability to explore design spaces that were previously gated behind expert knowledge, while giving experts faster access to the heuristics they already know.

Machine learning models don't just process data; they absorb the assumptions baked into it. When researchers discovered that adding a "smile vector" to a face model also made faces appear more feminine, it wasn't a bug in the traditional sense. The model was faithfully reflecting a statistical pattern in its training data, most likely that more of the smiling faces belonged to women than to men. That kind of embedded bias is subtle and hard to detect. It also raises a genuine question about how thoroughly we can ever audit these systems.
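The following toy illustration shows how that can happen with attribute vectors. It assumes a hypothetical two-dimensional latent space whose axes happen to line up with "smiling" and "femininity". Real latent dimensions are rarely this interpretable. Because the smiling examples skew feminine, the mean-difference smile vector picks up a femininity component nobody asked for.

```python
from typing import List, Sequence

Latent = List[float]

def mean_vector(codes: Sequence[Latent]) -> Latent:
    """Element-wise mean of a non-empty set of latent codes."""
    n = len(codes)
    return [sum(values) / n for values in zip(*codes)]

# Hypothetical codes: dimension 0 ~ "smiling", dimension 1 ~ "femininity".
# Skewed on purpose: 3 of 4 smiling examples lean feminine,
# while 3 of 4 non-smiling examples lean masculine.
smiling = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
not_smiling = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

smile_vector = [p - q for p, q in zip(mean_vector(smiling), mean_vector(not_smiling))]
print(smile_vector)  # → [1.0, 0.5]
```

Adding this vector to a neutral face code raises "femininity" by 0.5 along with the smile. The bias lives in the chosen examples, not in the arithmetic.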

Why attribute vectors are a temporary fix

The attribute vector approach — where you add a fixed directional offset in latent space to transform one version of something into another, say a regular font into a bold one — works well enough as a proof of concept. But it rests on a shaky assumption: that the same vector displacement applies equally to every starting point. In practice, bolding a serif font and bolding a sans-serif font involve meaningfully different visual heuristics, which suggests the underlying transformations should differ too.

A more robust approach would train a separate model to take an unbolded font's latent vector as input and predict the bolded version's latent vector — learning the transformation rather than approximating it with a constant offset. With enough training data on font weights, such a model could generate fonts across an arbitrary weight spectrum. Attribute vectors are essentially a shortcut, and like most shortcuts, they have limits. The expectation is that over the next few years, more sophisticated methods will replace them. What will likely persist, though, is the interface pattern itself: giving users access to high-level, semantically meaningful controls over generative outputs, whether or not attribute vectors are doing the work underneath.
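The contrast can be made concrete with a toy sketch, under three stated assumptions. First, the latent space is two-dimensional. Second, the hypothetical "true" bolding rule is a per-dimension affine map, so the displacement depends on the starting point. Third, the separate model proposed above is stood in for by per-dimension least-squares regression rather than a neural network. The constant offset, which is what an attribute vector amounts to, fits the training pairs on average but drifts on a font outside their range. The learned map does not drift, but only because here it happens to have the right functional form.

```python
from typing import List, Sequence, Tuple

Latent = List[float]

def fit_constant_offset(src: Sequence[Latent], dst: Sequence[Latent]) -> Latent:
    """Attribute-vector baseline: a single displacement, the mean of (dst - src)."""
    n, dim = len(src), len(src[0])
    return [sum(d[i] - s[i] for s, d in zip(src, dst)) / n for i in range(dim)]

def fit_linear_map(src: Sequence[Latent], dst: Sequence[Latent]) -> List[Tuple[float, float]]:
    """Least-squares fit of dst[i] ~ a * src[i] + b, separately for each dimension i.
    Assumes every dimension varies across the training examples (nonzero variance)."""
    n, dim = len(src), len(src[0])
    params = []
    for i in range(dim):
        xs = [s[i] for s in src]
        ys = [d[i] for d in dst]
        mx, my = sum(xs) / n, sum(ys) / n
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        a = cov / var
        params.append((a, my - a * mx))
    return params

def apply_offset(code: Latent, offset: Latent) -> Latent:
    return [c + o for c, o in zip(code, offset)]

def apply_linear(code: Latent, params: List[Tuple[float, float]]) -> Latent:
    return [a * c + b for c, (a, b) in zip(code, params)]

def true_bold(code: Latent) -> Latent:
    """Hypothetical ground truth: the change depends on where you start."""
    return [1.5 * c + 0.2 for c in code]

# Hypothetical training pairs (regular font code -> bold font code).
regular = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
bold = [true_bold(c) for c in regular]

offset = fit_constant_offset(regular, bold)
linear = fit_linear_map(regular, bold)

unseen = [4.0, 4.0]  # a font code outside the training range
print([round(v, 2) for v in true_bold(unseen)])             # → [6.2, 6.2]
print([round(v, 2) for v in apply_offset(unseen, offset)])  # → [4.7, 5.2]
print([round(v, 2) for v in apply_linear(unseen, linear)])  # → [6.2, 6.2]
```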

iGANs and the geometry of creative constraints

Interactive Generative Adversarial Networks, or iGANs, introduced by Zhu et al. in 2016, take a different but related approach to human-machine creative collaboration. Rather than variational autoencoders, they use GANs — but the core idea is the same: find a low-dimensional latent space that compactly represents a class of images, whether shoes downloaded from Zappos or landscape photographs, and let users navigate that space through direct manipulation.

The mechanics are worth understanding. When a user sketches a stroke — say, the outline of a mountain — the interface treats that stroke as a constraint on the latent space, narrowing down the set of valid images to those whose generated output matches the sketch. The system then finds the nearest point in latent space that satisfies the constraint without straying too far from the current image. It's an optimization problem: minimize the distance moved while maximizing constraint satisfaction.
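Here is a toy sketch of that optimization, under stated assumptions:

- The "generator" is a hypothetical fixed linear map from a 2-D latent code to three output "pixels". A real iGAN uses a trained GAN generator.
- The user's stroke is reduced to target values at specific pixels.
- The objective is solved by plain gradient descent.

The weight `lam` trades off constraint satisfaction against distance moved from the starting code. This is a simplified stand-in for the kind of objective iGAN optimizes, not its implementation.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy "generator": a fixed linear map from a 2-D latent code
# to a 3-"pixel" output. A real iGAN generator is a deep network.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

def generate(z: Vector) -> Vector:
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]

def edit(z0: Vector, constraints: Dict[int, float], lam: float = 0.1,
         lr: float = 0.1, steps: int = 500) -> Vector:
    """Gradient descent on sum_k (G(z)_k - target_k)^2 + lam * ||z - z0||^2,
    where k ranges only over the constrained output pixels."""
    z = list(z0)
    for _ in range(steps):
        out = generate(z)
        # Gradient of the "don't move too far" term.
        grad = [2.0 * lam * (zj - z0j) for zj, z0j in zip(z, z0)]
        # Gradient of the constraint-mismatch term.
        for k, target in constraints.items():
            residual = out[k] - target
            for j in range(len(z)):
                grad[j] += 2.0 * residual * W[k][j]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z

z_start = [0.0, 0.0]
z_new = edit(z_start, constraints={2: 2.0})  # the user "draws" a value of 2.0 at pixel 2
print([round(v, 3) for v in z_new])            # → [0.952, 0.952]
print([round(v, 3) for v in generate(z_new)])  # → [0.952, 0.952, 1.905]
```

The constrained pixel lands near, not exactly on, its target, because `lam` penalizes moving far. The two unconstrained pixels change as well, since moving the latent code moves the whole output. That is the toy analogue of implicit adjustments like a shoe's silhouette shifting.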

What makes this more than a novelty is what happens implicitly. When a user fills in the sole of a shoe, the overall silhouette narrows and becomes sleeker. Small details — piping, coloring, proportions — adjust automatically, because the generative model has internalized what shoes actually look like from 50,000 training examples. The user doesn't specify any of that. They just draw a sole.

Cognitive tools, not just productivity software

Both the font tool and iGANs point toward something larger than interface design. They represent a shift in what computers are actually for — away from the "cognitive outsourcing" model, where a machine solves a well-defined problem and returns an answer, and toward something more like cognitive expansion.

The outsourcing model has deep roots. Early computers were number-crunchers solving ballistics problems. Modern AI systems predict weather, classify images, and play Go at superhuman levels. In each case, the human poses a question and the machine answers it. The human's own thinking doesn't change.

The augmentation model works differently. When a designer using iGANs learns to think in terms of "add a higher top" or "adjust the heel profile," they're developing a new vocabulary for thinking about shoes — one that didn't exist for non-experts before. They're not outsourcing a decision; they're expanding the range of decisions they're capable of making. The same logic applies to Photoshop users who think fluently in layers, masks, and clone stamps. These are thoughts that were literally unthinkable before the tool existed.

Descartes' "La Géométrie," published in 1637 as an appendix to his "Discourse on Method," is a useful historical parallel. By showing how geometric ideas could be expressed algebraically and vice versa, it became a cognitive technology that permanently expanded what mathematicians could think. Historically, tools like that appeared rarely. What's different now is that computers are a meta-medium: a platform for rapidly generating new cognitive technologies, each one potentially reshaping how its users think. The iGANs prototype, rough visual quality and all, is an early sketch of what that looks like in practice.

The real measure of these tools won't be their output resolution or benchmark scores — it'll be whether they give people genuinely new ways to think, and whether those ways of thinking stick.

The history of human thought is, in many ways, a history of tools that changed what thinking itself could do. Writing didn't just record ideas — it restructured them. Feynman diagrams didn't just illustrate physics — they made certain kinds of reasoning possible that weren't before. The question now is whether artificial intelligence can play that same role: not as a shortcut around cognition, but as a means of expanding it.

Two ways AI can relate to human thinking

Most public conversation about AI centers on what it can do for you — generate text, answer questions, produce images on demand. This is cognitive outsourcing, and it's genuinely useful. But there's a deeper model worth taking seriously: cognitive transformation. Rather than delegating a mental task to a machine, the idea is that interacting with AI systems can change the representations and operations a person uses to think. The substrate of thought itself shifts.

This isn't a new concept in the history of technology. Writing emerged independently in Sumeria and Mesoamerica and fundamentally altered how humans organized knowledge. Millennia later, interface pioneers like Douglas Engelbart and Alan Kay built systems not just to automate tasks but to augment the range of what a person could conceive and manipulate. What's new is the possibility that AI systems, particularly those built on machine learning, can now participate in that same tradition of inventing cognitive tools.

A font exploration tool built on a generative model, for instance, isn't simply an oracle that produces typefaces on request. Used well, it becomes a space for discovery — a way of navigating relationships between visual forms that a designer might internalize over time, developing new intuitions about what makes letterforms work. The tool doesn't think for the designer. It gives the designer new things to think with.

The broader ecosystem of AI augmentation tools

Font design is one narrow example among many. The sketch-rnn system supports neural-network-assisted drawing. The Wekinator lets users rapidly prototype new musical instruments and interactive art systems. TopoSketch enables animation through latent space exploration. There are generative models for typographic layout and for interpolating between musical phrases. Each of these systems uses machine learning to surface new primitives (new units of creative and cognitive action) that users can absorb into their own practice.

The common thread is that these tools don't replace human judgment; they extend the vocabulary available to it. Fields like computational creativity and interactive machine learning are converging toward this goal, building systems where the human and the model are genuinely collaborating on the shape of thought, not just the output of a task.

Why the best interfaces feel strange at first

There's a recurring pattern in the history of powerful representations: they tend to look wrong before they look right. The premiere of Stravinsky and Nijinsky's "Rite of Spring" nearly caused a riot. Early cubist paintings prompted The New York Times to ask, in 1911, whether those responsible had "taken leave of their senses." Richard Feynman's diagrammatic approach to quantum electrodynamics left physicists baffled at a 1948 workshop, while Julian Schwinger's more conventional algebraic method was immediately acclaimed.

The outcome, of course, is well known. Feynman's diagrams turned out to be mathematically equivalent to the Schwinger-Tomonaga approach but far more powerful in practice — easier to reason with, more generative of new insights. As James Gleick described in his biography of Feynman, physicists who initially struggled with the diagrams eventually found them indispensable. The strangeness wasn't a flaw. It was a signal that something genuinely new was being introduced.

This has direct implications for interface design. The conventional wisdom — that interfaces should be immediately intuitive, "user friendly" in the shallow sense — tends to produce tools built from familiar elements combined in predictable ways. Such interfaces are easy to pick up, but they don't reveal anything surprising about their subject. They don't change how you think. For routine tasks, that's acceptable. For deeper work, it's a ceiling.

The more ambitious goal is an interface that surfaces the deep principles underlying a domain — one that, once internalized, gives the user more powerful ways of reasoning about that domain. What initially seems strange becomes, over time, a natural part of how the user thinks. The strangeness was never the point; it was the cost of encountering something genuinely new.

What this demands from AI models — and where the gaps remain

For AI systems to serve this deeper augmentation role, they need to do more than fit patterns in data. They need to discover meaningful principles, recognize those principles as significant, and surface them through an interface in ways a user can actually engage with. That's a considerably harder problem than generating plausible outputs.

Current models occasionally stumble onto something like this — a generative font model that implicitly preserves enclosed negative space when simulating bold weight, for example, is capturing a real structural principle of typography. But it's capturing it implicitly. The model doesn't know it knows this, and the interface doesn't make it visible. Work on InfoGANs, which use information-theoretic methods to find structure in latent space, represents early progress toward making such principles explicit and manipulable. But the gap between where things stand and where they'd need to be for genuine cognitive transformation remains wide.

There's also the question of creativity. Generative interfaces constrain exploration to the space of patterns the model has learned — natural images, existing fonts, familiar musical structures. For craft-level creative work, that constraint is manageable and may even be useful, providing a structured space within which skilled recombination can happen. For the kind of creativity that breaks existing principles entirely — the Picasso or Monet mode, where new ways of seeing are invented rather than refined — the picture is less clear. Whether AI-assisted interfaces can support that kind of rupture, or whether they inevitably pull toward the center of what already exists, is a question the field hasn't yet answered.

What's already evident is that framing AI purely as a productivity tool misses something important. The more interesting question isn't what AI can do instead of you — it's what kinds of thinking become possible when AI is woven into the tools you use to think.

What separates a tool from a creative partner? That question sits at the heart of how researchers and artists are now thinking about generative models — not as autonomous creators, but as systems whose limitations and quirks can be exploited to produce something genuinely unexpected.

Why "imperfect" models may be the most useful ones

A perfectly trained GAN presents a paradox for creative work. By design, it learns to reproduce the statistical distribution of its training data — which means it excels at generating plausible recombinations of what it has already seen, but struggles to produce images grounded in entirely new visual principles. Those simply wouldn't resemble anything in its training set, so the model has no pathway to reach them.

Artists like Mario Klingemann and Mike Tyka have turned this constraint on its head. Both work with deliberately imperfect GAN models, and the results suggest that the noise, artifacts, and unpredictability of an undertrained system can open up territory that a polished model would never visit. The implication is counterintuitive but worth sitting with: for creative exploration, a model that fails in interesting ways may outperform one that succeeds reliably.

Beyond model quality, there's also the question of how interfaces are designed. Latent space exploration — navigating the compressed internal representation a model builds of its training data — is the dominant paradigm right now. But there's no fundamental reason interfaces have to stay within that space. Operations that deliberately push toward low-probability regions, or that step outside the latent space entirely, could surface images that feel genuinely surprising rather than merely novel.
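One simple way to make "low-probability regions" concrete assumes a model whose latent codes are drawn from a standard normal prior, as in a standard VAE or many GANs. Scale a sample away from the origin and watch its prior log-density fall. The function names are illustrative. Whether a decoder produces surprising images or merely broken ones out there is exactly the open question. The sketch only measures how far outside the familiar region a code has moved.

```python
import math
import random
from typing import List

def log_prior(z: List[float]) -> float:
    """Log-density of z under a standard normal prior N(0, I)."""
    return -0.5 * sum(x * x for x in z) - 0.5 * len(z) * math.log(2 * math.pi)

def push_outward(z: List[float], factor: float) -> List[float]:
    """Scale a latent code away from the origin; factor > 1 lowers its prior density."""
    return [factor * x for x in z]

random.seed(0)
z = [random.gauss(0.0, 1.0) for _ in range(40)]  # a hypothetical 40-D latent sample

for factor in (1.0, 2.0, 3.0):
    # Each step outward lands where the prior places less and less probability mass.
    print(factor, round(log_prior(push_outward(z, factor)), 1))
```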

The deeper possibility: models that outpace human discovery

GANs are one class of generative model, but the broader category raises a more ambitious question. If a model is powerful enough, the abstractions it builds from training data might encode patterns that human experts haven't consciously articulated yet. Exploring that model's latent space could then become a form of discovery rather than recombination.

The analogy offered here is striking: imagine a generative model trained on paintings from just before the cubist movement emerged. Could systematic exploration of that model's internal representations have surfaced something resembling cubism before any human artist arrived there? It's speculative, but it parallels documented cases in science — like the theoretical prediction of Bose-Einstein condensation before it was experimentally observed — where formal reasoning got ahead of empirical reality.

Current generative models aren't capable of that kind of leap. But framing it as an aspiration shapes how researchers think about what these systems should eventually be able to do.

pix2pix and the role of human constraint

Not every illuminating example involves navigating a latent space. The pix2pix system, developed by Isola et al., is a conditional GAN trained on paired images, say the edge outline of a cat alongside the full photograph, and it learns to translate one into the other. Given a new set of edges, it generates a plausible corresponding image.

Where it gets interesting is under unusual constraints. When users feed pix2pix inputs that deviate significantly from anything in its training distribution, the outputs can be visually striking in ways that feel genuinely unfamiliar — not Picasso-level invention, but images that most people have simply never encountered before. The system isn't doing this alone. The human user is choosing the constraints, deciding which edge patterns to supply, and interpreting the results. The creativity emerges from that back-and-forth rather than from either party independently.

What this means for how we think about machine creativity

The through-line across these examples — imperfect GANs, latent space navigation, pix2pix under unusual inputs — is that the most interesting creative outcomes tend to involve friction rather than fluency. A model that behaves exactly as intended produces expected results. A model that can be pushed, misused, or constrained in unexpected ways becomes something closer to a collaborator.

That reframes the design problem. Building better generative models matters, but so does building interfaces that let users productively exploit the gaps between what a model was trained to do and what it actually produces when pushed. The artists already working in this space seem to understand that intuitively — the question is whether the research community catches up to what they're already doing in practice.

Source: https://distill.pub/2017/aia
