Compute, power and network resources — once treated as cheap commodities — are now critical bottlenecks in AI pipelines.
While latency and high throughput are paramount, they’re constrained; systems complexity is challenging and compute is extremely expensive. Enterprises face a sharp, unsustainable spike in token usage as AI workloads scale, with infrastructure bills showing no sign of dropping.
To balance this out, experts tell VentureBeat that enterprises should architect systems that are open, observable, adaptable and reversible; balance performance and cost; contextualize AI; and optimize for what matters most for their business.
“The requirements of these workloads are changing everything,” Chen Goldberg, SVP of engineering at cloud-based GPU provider CoreWeave, said at a recent VB Impact Tour event.
Building open, observable, reversible systems
One of the biggest system-level mistakes enterprises make: “The idea that you can just retrofit an AI system to your existing infrastructure,” said Goldberg.
Especially with reinforcement learning and gen AI workloads, the entire pipeline must change.; throughputs require a different type of network that can keep up with a space that is moving so quickly.
Inference, for instance, is becoming increasingly nuanced. As Goldberg noted, some inference is more sensitive to latency, some to availability or reliability; others less so on all counts. And, the process is much more iterative and multi-step than in the past.
“You take the models, you do your pre-training, fine tuning, then you run, you get results,” she noted. “So it's like training-inference, training-inference, training-inference.”
As an advocate for open source — she was part of the founding team of Kubernetes — Goldberg emphasized the importance of keeping systems open, observable and reversible. White box systems, as opposed to black box systems, can provide extensibility and flexibility and drive innovation because there is no “one-way door decision.”
Calculated risks are important, she noted, but if enterprise leaders don't know anything about the system they’re running on, they’re disempowered to innovate, make decisions or take risks. Enterprises need to consider the cost of change and whether they can make more “two-way door decisions” that are easily reversible and replaceable.
“Things are changing so quickly that people are worried about making decisions: Which vendor will I go to? Which kind of tools will I use? What kind of storage solution will I use?,” she said. “People are worried because those are big investments.”
Optimizing for the critical stuff; you can't have it all
When making architecture decisions, it’s important to remember that “not all GPUs are born the same,” Goldberg noted. There are many nuances between different platforms, and enterprises should choose based on how they access GPUs, the ease of that access, architecture observability and latency and performance drags. Also, how much time a system is actually running AI tasks?
“There's a lot of trade offs, a lot of decisions that we need to make every day,” said Goldberg. In the end, enterprises must optimize around what matters most to their business, “because you can’t get it all.”
One important question: What are you optimizing for? Just as critically, what’s the worst that could happen when making a strategic decision?
Access to power itself is another limitation, but Goldberg pointed out that there are many emerging technologies. CoreWeave, for its part, recently incorporated liquid cooling techniques. Power is “one of the most fascinating spaces right now in the industry,” she said. “There's so much innovation happening.”
Ultimately, Goldberg urged enterprise leaders to accept being uncomfortable and challenge the status quo. “I think that sometimes we are holding ourselves back, thinking through all those worst case scenarios,” she said. Instead: “Get that courage and move forward.”
How Wells Fargo contextualizes success
Success is no longer about proving AI works — it’s proven itself to be very powerful — but contextualizing it, Swarup Pogalur, managing director and CTO for digital and AI engineering at Wells Fargo, noted in a chat with VentureBeat CEO and editor-in-chief Matt Marshall.
“So it's more proof of value as opposed to proof of concept,” he said.
Wells Fargo has seen early success in consumer banking and contact center spaces, equipping employees with AI assistants to help them be more productive and spend more quality time with customers.
“If they're saying, ‘Hold on, let me go look at that,’ it's a swivel chair movement,” he said. “We are trying to reduce the amount of systems that have to go and scrape stuff.”
Previously, they had a simple retrieval-augmented generation (RAG)-based system that indexed and vectorized content from a multitude of different sources, then human agents had to sift through that and communicate next steps to the customer. Now, Wells Fargo has transitioned to a “complete self-serve tool” that walks agents and customers through transactions step-by-step with “in the moment messaging,” Pogalur explained.
“When we talk about human-in-the-loop, it's not just a risk deflection, it's an empowered knowledge worker,” he said. “They're making mindful decisions.”
A poly-cloud strategy
It’s important enterprises be flexible with their architectures; this is why Wells Fargo has built poly-cloud, poly-model frameworks with guardrails.
Wells Fargo has a strategic partnership with both GCP and Microsoft Azure, and is modernizing its apps by bursting into different infrastructure based on volume. GPUs have been added to the mix to make that poly cloud infrastructure more robust.
“It's not just one framework through which we build agents,” said Pogalur. “We want to give a full facade of more frameworks.” That could be LangGraph, semantic kernel, or any modalities natively available in cloud providers.
In recent years to prepare for AI, the financial giant has done a “huge recast” of its data centers, not just a ‘lift and shift’ but a modernization that is simpler, leaner and can run on a smaller footprint. “Our cost to run today versus cost to run in the future will look very different,” he said. “That’s a 5-to-10-year journey, not a one-year ROI.”
Another important consideration is keeping up with rapid-pace model releases. Unlike API versions where there's usually backward compatibility for 6 to 12 months, model providers are “not that patient” and are retiring older models much more quickly.
“So how do I protect my investments and the business continuity of apps that are consuming these?”
An atmosphere of experimentation
Wells Fargo supports “innovation at scale” but cherry picks the ideas that can add the most value. As Pogalur noted: “Not all ideas have a potential to be the next billion dollar idea, not all ideas are good ideas.”
This process is supported by an open-source “co-contribution model,” where small groups of internal users are given tools and asked for feedback, ‘thumbs up, thumbs down.’ The company also has a lab with a “completely walled infrastructure” where researchers can experiment with synthetic data generators that mimic Wells Fargo’s application surfaces.
“So they're able to test that, prove it out and say, ‘Hey, this works,’” Pogalur explained. “And then we figure out a way to source it into our ecosystem. It just gives you a power of scale and allows people to focus on building apps, as opposed to everybody learning a new framework as it gets dropped with a lack of consistency.”
Wells Fargo has also accepted OpenAI’s APIs as a standard to help establish consistency. This makes agent reskilling and platform rewriting “much, much faster, and cheaper for us.”
Continuous evaluation throughout these processes is critical; teams must test for bias and hallucinations, and analyze security risks and controls to help prevent breaches and prompt injection attacks. Pogalur noted that, while extremely important to AI development, open source frameworks can release vulnerable code into the wild.
Financial services, in particular, has to perform additional checks and balances, and every line of business must understand AI risks and establish mitigating controls.
Ultimately, Wells Fargo is taking a deliberate, pragmatic approach. “Getting to the latest and greatest on the day one of an announcement is not going to prove value,” said Pogalur. “We are trying to look at a steady state of adoption and scale and production.”