
From bare metal to bare glass: AI's final hardware


Picture a cube of glass small enough to hold in one hand. Sculpted inside it, in three dimensions at nanometre scale, are billions upon billions of optical nodes — one for every connection in a large language model. Light carrying your question enters one face and starts to walk. At every node it bends, splits, fades or reinforces, hopping junction to junction through the depth of the glass exactly the way a signal traverses a neural network. The beam that exits the far side has been shaped by every node it touched: it carries the answer. No GPU, no rack, no cooling tower, no electricity beyond the beam. The model is not running on hardware. The model is the hardware — and the inference is the path of the light through it.

This is a thought experiment, not a product announcement. Nothing like it exists, and parts of it may never exist. But it is assembled from pieces that are surprisingly real, and following it to its endpoint says something uncomfortable about the hundreds of billions of dollars currently being poured into AI datacenters.

The parts that already exist

The cube is not built from science fiction. It is built from three research lines that each work today, in isolation.

Passive optical inference. In 2018, researchers at UCLA demonstrated a diffractive deep neural network: a stack of printed, completely passive layers that classifies images as light physically passes through them. No processor, no power except the illumination itself. The computation happens at the speed of light through matter, because the computation is the matter. Research since then has extended the idea to natural, incoherent light. For small, fixed, feed-forward networks, the cube already exists.
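To make "the computation is the matter" concrete, here is a minimal Python sketch of a passive optical classifier. It is an illustration under loud assumptions, not the UCLA group's method: the phase values in `LAYERS` are made up, a unitary discrete Fourier transform stands in for real diffraction between layers, and every function name is hypothetical.

```python
import cmath
import math

def phase_mask(field, phases):
    """One passive layer: each pixel only delays the light passing through it."""
    return [a * cmath.exp(1j * p) for a, p in zip(field, phases)]

def propagate(field):
    """Stand-in for free-space diffraction: a unitary discrete Fourier transform."""
    n = len(field)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(field[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def passive_inference(field, layers):
    """Send the input field through every layer, then read intensity at the detectors."""
    for phases in layers:
        field = propagate(phase_mask(field, phases))
    return [abs(a) ** 2 for a in field]  # detectors see intensity, not phase

# Hypothetical "fabricated" phases (radians). In a real device these would be
# trained once and then fixed in the material; here they are arbitrary.
LAYERS = [
    [0.0, 1.2, 2.5, 0.7],
    [3.1, 0.4, 1.9, 2.2],
]

input_field = [1.0, 0.5, 0.0, 0.25]
intensities = passive_inference(input_field, LAYERS)

# The brightest detector is read out as the predicted class.
predicted_class = max(range(len(intensities)), key=intensities.__getitem__)
input_energy = sum(abs(a) ** 2 for a in input_field)
```

Because every step is either a pure phase delay or a unitary transform, the total intensity at the detectors equals the energy of the input light. The structure redistributes light but never adds any, which is what "passive" means here.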

Writing into glass with femtosecond lasers. Microsoft’s Project Silica uses femtosecond laser pulses to write data as nanoscale voxels inside plain glass — reportedly several terabytes in a single platter, with an estimated lifetime of around ten thousand years. The parameters of a frontier model would fit. Today this is storage, not computation, and writing is slow: filling one platter takes days. But the tool the cube needs — sculpting optical structure deep inside a solid block of glass, voxel by voxel — is the same tool.
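The storage claim is simple arithmetic, sketched below. The capacity figure is an assumption: `ASSUMED_CAPACITY_TB = 5` is a placeholder for "several terabytes", not a Project Silica specification, and the bit widths are common quantisation levels chosen for illustration.

```python
def params_that_fit(capacity_bytes, bits_per_param):
    """How many parameters of a given bit width fit in a given number of bytes."""
    return capacity_bytes * 8 // bits_per_param

TB = 10**12                 # decimal terabyte
ASSUMED_CAPACITY_TB = 5     # hypothetical stand-in for "several terabytes"
capacity = ASSUMED_CAPACITY_TB * TB

for bits in (16, 8, 4):
    trillions = params_that_fit(capacity, bits) / 1e12
    print(f"{bits:>2}-bit weights: {trillions:.1f} trillion parameters")
```

Under these assumptions a 5 TB platter holds 2.5 trillion 16-bit weights, or 10 trillion 4-bit weights.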

Optical matrix multiplication as an industry. Companies like Lightmatter ship silicon-photonics hardware that performs the linear algebra at the heart of neural networks using light, and have demonstrated photonic processors running real networks such as ResNet and BERT. This is no longer a lab curiosity; it is a funded commercial sector attacking the energy cost of AI compute directly.

Passive optical inference exists. Dense, permanent, laser-written glass exists. Optical matrix multiplication is a business. The cube is these three lines extended until they meet.

Where the cube breaks

Honesty requires naming the places where the extension snaps, because they are not small.

  • Generation is not a single forward pass. Each token is produced by a feed-forward pass, but generation as a whole is a loop: a token comes out, feeds back in, thousands of times, with a growing memory of the conversation. A passive block of glass gives you exactly one forward pass. Recirculating light, holding optical state, and sampling the next token are all unsolved, and every conversion back to electronics hands the job back to a chip.
  • Attention is computed from the input. Passive optics excels at fixed transformations — frozen weights are precisely what you can write into glass. But the attention mechanism multiplies activations by activations; the matrix changes with every prompt. That demands active, controllable optics, which is no longer a passive cube.
  • Light does not like to interact with light. Every nonlinear activation between layers needs either exotic materials or an optical-electronic-optical detour. This has been the bottleneck of optical neural networks for decades.
  • Precision. Analog optics realistically delivers a handful of effective bits. Quantisation research suggests language models tolerate that surprisingly well — but holding optical phase accuracy across billions of voxels in a centimetre-scale volume is far beyond current fabrication.
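The attention problem from the list above can be seen in a few lines of plain Python. This is a deliberately stripped-down sketch: the weights and prompts are invented, and `attention_scores` omits the learned projections and the softmax of real attention, keeping only the part that matters here, a product of activations with activations.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Frozen weights: the same matrix for every prompt, so it could be written once.
W = [[1.0, 0.0], [0.5, -1.0]]

def fixed_layer(x):
    """Activations times fixed weights: the case passive optics handles well."""
    return matmul(x, W)

def attention_scores(x):
    """Activations times activations (simplified: no projections, no softmax)."""
    return matmul(x, transpose(x))

# Two hypothetical prompts, each a pair of two-dimensional token vectors.
prompt_a = [[1.0, 0.0], [0.0, 1.0]]
prompt_b = [[1.0, 1.0], [2.0, 0.0]]

scores_a = attention_scores(prompt_a)  # [[1.0, 0.0], [0.0, 1.0]]
scores_b = attention_scores(prompt_b)  # [[2.0, 2.0], [2.0, 4.0]]
```

`W` never changes, so it is a candidate for glass. The score matrix is different for each prompt, so there is nothing fixed to write.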

One more correction to the dream: the speed of light is the wrong selling point. A GPU’s latency is dominated by moving data in and out of memory, not by how fast signals travel. What optics genuinely offers is different and better — energy per operation approaching zero for passive structures, and massive parallelism, because many computations can share the same glass on different wavelengths at once.

The plausible path

The cube does not arrive by porting a transformer into glass. It arrives, if it arrives, in stages.

The first stage is already shipping: hybrid systems where photonics does the linear algebra and electronics keeps the control, memory, and nonlinearity. The second stage is passive optical inference for small, fixed models (preprocessing, classification, routing), where a written-once optical structure replaces an always-on accelerator. The third stage is the interesting one: model architectures designed for the medium. These would be attention-free, feed-forward-heavy networks co-designed with the optics that will embody them, the way today’s models were co-designed with the GPU. Each stage is more speculative than the last, and we would put decades, not years, on the final one, if it happens at all.
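The division of labour in the first stage can be sketched as follows. This is a toy model, not any vendor's design: the "photonic" step is ordinary arithmetic with weights rounded to a few bits to mimic analog precision, the nonlinearity is left to the "electronic" step, and all names and values are hypothetical.

```python
def quantize(value, bits, max_abs=1.0):
    """Round to one of 2**bits - 1 evenly spaced levels in [-max_abs, max_abs]."""
    levels = 2 ** (bits - 1) - 1
    clipped = max(-max_abs, min(max_abs, value))
    return round(clipped / max_abs * levels) / levels * max_abs

def photonic_matvec(weights, x, bits):
    """Stand-in for the optical stage: a matrix-vector product with low-precision weights."""
    return [sum(quantize(w, bits) * xi for w, xi in zip(row, x)) for row in weights]

def electronic_relu(v):
    """The nonlinearity stays in electronics in this hybrid picture."""
    return [max(0.0, a) for a in v]

def hybrid_layer(weights, x, bits=4):
    return electronic_relu(photonic_matvec(weights, x, bits))

def exact_layer(weights, x):
    """Full-precision reference for comparison."""
    return electronic_relu([sum(w * xi for w, xi in zip(row, x)) for row in weights])

# Invented weights and input, all within [-1, 1].
WEIGHTS = [[0.30, -0.72, 0.11], [-0.55, 0.90, 0.42]]
x = [1.0, -0.5, 0.25]

approx = hybrid_layer(WEIGHTS, x, bits=4)
exact = exact_layer(WEIGHTS, x)
worst_error = max(abs(a - e) for a, e in zip(approx, exact))
```

With 4-bit weights each weight is off by at most half a quantisation step (1/14), so the output error is bounded by that amount times the sum of the absolute input values. Whether an error of that size is acceptable depends on the model, which is the open question behind the precision point above.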

The model becomes an artifact

Now assume some version of it works, even partially. Something deeper than a performance improvement happens: the model turns from a service back into an object.

A model written into glass is a printed book, where today’s hosted model is a printing press you are only allowed to rent. An artifact can be bought once, owned outright, shipped like a record, locked in a drawer. It draws no power at rest, sends no telemetry, needs no API key, and cannot be re-priced, deprecated, or switched off by anyone. For a business, intelligence stops being a metered utility with someone else’s terms of service attached and becomes capital equipment — closer to a machine on the shop floor than to a subscription.

That single shift — from metered to owned — is what makes the cube more than a physics curiosity. It rewrites who holds the power in the AI economy.

How the bubble pops

Which brings us to the uncomfortable part. Analyst forecasts put combined hyperscaler capital expenditure above 600 billion dollars for 2026, with roughly three quarters of it tied to AI infrastructure, increasingly financed with debt, spent on accelerators that depreciate in a handful of years. Every dollar of that buildout is priced on one assumption: that inference stays scarce, centralised, and metered — that intelligence keeps flowing through someone else’s datacenter, billed by the token.

The history of computing is unkind to that assumption. Mainframe time-sharing was a magnificent rent-extraction machine until the microcomputer turned compute into an object on a desk. Grid electricity looked unassailable until panels started appearing on rooftops. In each case the incumbent economy did not pop because the technology failed — it popped because the technology succeeded so well that it stopped needing the incumbent.

The glass cube is the extreme endpoint of that pattern, but the pattern starts biting long before any cube exists. Photonic accelerators that cut inference energy by an order of magnitude already erode the moat. Small models running on owned hardware erode it further. Every step that moves inference from rented scarcity toward owned abundance deflates the same asset: the assumption baked into 600 billion dollars a year of spending. If the AI bubble pops, the trigger may not be disappointment in AI. It may be a chip — or a piece of glass — that delivers AI too cheaply for the rent model to survive.

The endgame is the disappearance of the hardware

The single takeaway: the endpoint of AI hardware is that it stops being a service and dissolves into matter, and today’s AI economy is priced on the bet that this never happens. The cube as described will likely never run a frontier transformer; physics is genuinely in the way of the pure version. But the direction it points in, with inference migrating from metered datacenters toward owned, near-zero-marginal-cost artifacts, is consistent with each of the three research lines described above.

For a business, the practical conclusion is quieter than the vision: do not anchor a long-term strategy to the assumption that intelligence stays expensive and rented. Build your processes so the model behind them is replaceable — because the model behind them will be replaced, possibly by something you own. If you want to think through what that means for your own systems, talk to us.