For more than a decade, cloud architectures have been built around a deliberate separation of storage and compute. Under this model, storage became a place to simply hold data while intelligence lived entirely in the compute tier.
This design worked well for traditional analytics jobs operating on structured, table-based data. These workloads are predictable, often run on a set schedule, and involve a smaller number of compute engines operating over the datasets. But as AI reshapes enterprise infrastructure and workload demands, shifting data processing toward massive volumes of unstructured data, this model is breaking down.
What was once an efficiency advantage is increasingly becoming a structural cost.
Why AI exposes the cost of separation
AI introduces fundamentally different demands than the analytics workloads businesses have grown accustomed to. Instead of tables and rows processed in batch jobs by an engine, modern AI pipelines now process large amounts of unstructured and multimodal data, while also generating large volumes of embeddings, vectors, and metadata. At the same time, processing is increasingly continuous, with many compute engines touching the same data repeatedly—each pulling the data out of storage and reshaping it for its own needs.
The result isn’t just more data movement between storage and compute, but more redundant work. The same dataset might be read from storage, transformed for model training, then read again and reshaped for inference, and again for testing and validation—each time incurring the full cost of data transfer and transformation. Given this, it’s no surprise that data scientists spend up to 80% of their time just on data preparation and wrangling, rather than building models or improving performance.
While these inefficiencies can be easy to overlook at a smaller scale, they quickly become a primary economic constraint as AI workloads grow, translating not only into wasted hours but real infrastructure cost. For example, 93% of organizations today say their GPUs are underutilized. With top-shelf GPUs costing several dollars per hour across major cloud platforms, this underutilization can quickly compound into tens of millions of dollars of paid-for compute going to waste. As GPUs increasingly dominate infrastructure budgets, architectures that leave them waiting on I/O become increasingly difficult to justify.
From passive storage to smart storage
The inefficiencies exposed by AI workloads point to a fundamental shift in how storage and compute must interact. Storage can no longer exist solely as a passive system of record. To support modern AI workloads efficiently and get the most value out of the data that companies have at their disposal, compute must move closer to where data already lives.
Industry economics make this clear. A terabyte of data sitting in traditional storage is largely a cost center. When that same data is moved into a platform with an integrated compute layer, its economic value increases by multiples. The data itself hasn’t changed; the only difference is the presence of compute that can transform that data and serve it in useful forms.
Rather than continuing to move data to capture that value, the answer is to bring compute to the data. Data preparation should happen once, where the data lives, and be reused across pipelines. Under this model, storage becomes an active layer where data is transformed, organized, and served in forms optimized for downstream systems.
This shift changes both performance and economics. Pipelines move faster because data is pre-prepared. Hardware stays more productive because GPUs spend less time waiting on redundant I/O. The costs of repeated data preparation begin to disappear.
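The prepare-once idea is easy to sketch. The snippet below is a minimal, hypothetical illustration in Python — not any vendor's implementation — of caching a prepared dataset keyed by its contents, so that training, inference, and validation consumers all reuse a single transformation instead of each re-reading and reshaping the raw data:

```python
import hashlib
import json
from pathlib import Path

STATS = {"prepare_calls": 0}  # instrumentation to show the work happens once

def prepare(records):
    """The expensive, shared preparation step (cleaning/normalizing)."""
    STATS["prepare_calls"] += 1
    return [{"text": r["text"].strip().lower()} for r in records]

def prepared_view(dataset_path, cache_dir=Path("prepared")):
    """Return the prepared form of a dataset, computing it at most once.

    Later consumers (training, inference, validation) reuse the cached
    result instead of re-reading and re-transforming the raw data.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache on the dataset's contents, so a changed file
    # triggers fresh preparation while unchanged data is reused.
    key = hashlib.sha256(Path(dataset_path).read_bytes()).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():                      # reuse: no redundant work
        return json.loads(cache_file.read_text())
    raw = json.loads(Path(dataset_path).read_text())
    result = prepare(raw)
    cache_file.write_text(json.dumps(result))    # persist for later pipelines
    return result
```

In a smart-storage platform, the equivalent of `prepared_view` runs inside the data layer itself, so the persisted result is what every downstream engine reads.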
Under this new model, “smart storage” changes data from something that is merely stored to a resource that is continuously understood, enriched, and made ready for use across AI systems. Rather than leaving raw data locked in passive repositories and relying on external pipelines to interpret it, smart storage applies compute directly within the data layer to generate persistent transformations, metadata, and optimized representations as data arrives.
By preparing data once and reusing it across workflows, organizations allow storage to become an active platform instead of a bottleneck. Without this shift, organizations remain trapped in cycles of redundant data processing, constant reshaping, and compounding infrastructure cost.
Preparing for AI-era infrastructure
The cloud’s separation of storage and compute was the right architectural decision for its time. But AI workloads have fundamentally changed the economics of data and exposed the limits of this approach—a constraint I’ve watched kill numerous enterprise AI initiatives, and a core reason I founded DataPelago.
While the industry has begun focusing on accelerating individual steps in the data pipeline, efficiency is no longer determined by squeezing marginal gains from existing architectures. It is now determined by building new architectures that make data usable without repeated preparation, excessive movement, or wasted compute. As AI’s demands continue to crystallize, it is becoming increasingly clear that the next generation of infrastructure will be defined by how intelligently storage and compute are brought together.
The companies that succeed will be the ones that make smart storage a foundation of their AI strategy.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
.NET’s annual cadence has given the project a solid basis for rolling out new features, as well as a path for improving its foundations. No longer tied to Windows updates, the project can provide regular previews alongside its bug and security fixes, allowing us to get a glimpse of what is coming next and to experiment with new features. At the same time, we can see how the upcoming platform release will affect our code.
The next major release, .NET 11, should arrive in November 2026, and the project recently unveiled its first public preview. Like earlier first looks, it’s nowhere near feature complete, with several interesting developments marked as “foundational work not yet ready for general use.” Unfortunately, that means we don’t get to play with them in Preview 1. Work is continuing and can be tracked on GitHub.
What’s most interesting about this first preview isn’t new language features (we’ll learn more about those later in the year), but rather the underlying infrastructure of .NET: the compiler and the runtime. Changes here reveal the intentions behind this year’s release and point to where the team thinks we’ll be running .NET in the future.
Changes for Android and WebAssembly
One big change for 2026 is a move away from the Mono runtime for Android .NET applications to CoreCLR. The modern .NET platform evolved from the open source Mono project, and even though it now has its own runtime in CoreCLR, it has used the older runtime as part of its WebAssembly (Wasm) and Android implementations.
Switching to CoreCLR for Android allows developers to get the same features on all platforms and makes it easier to ensure that MAUI behaves consistently wherever it runs. The CLR team notes that, beyond improved compatibility, the switch brings performance gains, especially in startup times.
For Wasm, the switch should again make it easier to ensure common Blazor support for server-side and for WebAssembly code, simplifying the overall application development process. The project to make this move is still in its early stages, with an initial SDK and interoperability work complete. There’s still a lot to do before it’ll be possible to run more than “Hello World” using CoreCLR on Wasm and WebAssembly System Interface (WASI). The project aims to have support for RyuJIT by the end of the .NET 11 development cycle.
Full support won’t arrive until .NET 12, but having a .NET runtime that’s code-compatible with the rest of .NET for WebAssembly is a big win for both platforms. You should treat the .NET 11 Wasm CoreCLR capabilities as a preview, one that lets you experiment with various scenarios and use those experiments to help guide future development.
Native support for distributed computing
One of the more interesting new features appears to be a response to changes in the ways we build and deliver code. Much of what we build still runs on one machine, especially desktop applications, although more and more code needs to interact with external APIs. That code must run asynchronously so that an API call doesn’t become a blocker and hold up a user’s PC or device while an application waits for a response from a remote server. Operating this way is even more important for cloud-native applications, which are often loosely connected sets of microservices managed by platforms like Kubernetes or serverless Functions on Azure or another cloud platform.
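The pattern the text describes — awaiting remote calls rather than blocking on them — is sketched below in Python, chosen only for compactness; .NET's runtime async work applies the same idea at the CLR level for C# code. The service names and delays here are invented stand-ins for real API calls:

```python
import asyncio

async def call_api(name, delay):
    """Stand-in for a remote API call; awaiting it yields control
    instead of blocking the thread while the server responds."""
    await asyncio.sleep(delay)
    return f"{name}: ok"

async def main():
    # Three "remote calls" run concurrently, so the total wait is
    # roughly the slowest single call, not the sum of all three.
    return await asyncio.gather(
        call_api("billing", 0.02),
        call_api("inventory", 0.03),
        call_api("shipping", 0.01),
    )

results = asyncio.run(main())
```

Moving this kind of scheduling into the runtime itself, as .NET 11 is doing, is aimed at making such code cheaper to execute rather than changing how it is written.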
.NET 11’s CoreCLR is being re-engineered to improve support for this increasingly important set of design patterns. Earlier releases needed explicit permission to use runtime asynchronous support in the CLR. In Preview 1, runtime async on CoreCLR is enabled by default; you don’t need to do anything to test how your code works with this feature, apart from installing the preview bits and using them with your applications.
For now, this new tool is limited to your own code, as core libraries are still compiled without runtime async support. That will change in the next few months as libraries are recompiled and added to future previews. Third-party code will most likely wait until Microsoft releases a preview with a “go live” license.
You can get a feel for how this feature is progressing by reading what the documentation describes as an “epic issue.” This lists the current state of the feature and the steps that still need to be completed. Work began during the .NET 10 timeframe, so much of the foundational work is done, although several key pieces are still listed as open issues, including just-in-time support on multicore systems and optimization techniques such as profile-guided optimization, which recompiles code on the fly in response to actual workloads.
It’s important to note that issues like these are a small part of what needs to be delivered to land runtime async support in .NET 11. With several months between Preview 1’s arrival and the final general availability release, the .NET team has plenty of time to deliver these pieces.
With the feature in development, you’ll still need to set project file flags to enable support in ahead-of-time (AOT) compiled applications. This entails adding a couple of lines to the project file and then recompiling the application. For now, it’s a good idea to build and test AOT applications without runtime async, recompiling with the flags set when you’re ready to try out the new feature.
Changes to hardware support
One issue to note is that the updated .NET runtime in .NET 11 has new hardware requirements, and older hardware may not be compatible. It needs modern instruction sets. Arm64 now requires armv8.0-a with LSE (armv8.2-a with RCPC on Windows and M1 on macOS), and x64 on Windows and Linux needs x86-64-v3.
This is where you might find some breaking changes, as older hardware will now give an error message and code will not run. This shouldn’t be an issue for most modern PCs, devices, and servers, as these requirements align with .NET’s OS support, rather than supporting older hardware that’s becoming increasingly rare. However, if you’re running .NET on hardware that’s losing support, you will need to upgrade or stick with older code for another year or two.
There are other hardware platforms that get .NET support, with runtimes delivered outside of the environment. This includes support for RISC-V hardware and IBM mainframes. For now, both are minority interests: one to support migrations and updates to older enterprise software, and one to deliver code on the next generation of open hardware. It’ll be interesting to see if RISC-V support becomes mainstream, as silicon performance is improving rapidly and RISC-V is already available in common Internet of Things development boards and processors from organizations and companies like Raspberry Pi, where it is part of the RP2350 microcontroller system on a chip.
Things like this make it interesting to read the runtime documentation at the start of a new cycle of .NET development. By reading the GitHub issues and notes, we can see some of the thinking that goes into .NET and can take advantage of the project’s open design philosophy to plan our own software development around code that won’t be generally available until the end of the year.
It’s still important to understand the underpinnings of a platform like .NET. The more we know, the more we can see that a lot of moving parts come together to support our code. It’s useful to understand where we can take advantage of compilers and runtimes to improve performance, reliability, and reach.
After all, that’s what the teams are doing as they build the languages we will use to write our applications as .NET moves on to another preview, another step on the road to .NET 11’s eventual release.
I’ve watched cloud careers rise and fall with each new wave of tools, from the early “lift-and-shift everything” days to today’s platform engineering, AI-ready data estates, and security-by-default mandates. Through all of it, the role that stays stubbornly in demand is the cloud architect because the hardest part of cloud has never been spinning up resources. The hard part is making hundreds of decisions that won’t quietly compound into outages, cost blowouts, security gaps, or organizational gridlock.
That’s why, even when organizations are moving from cloud to cloud or swapping one set of managed services for another, they still need deep planning capabilities. The platform names change, the service catalogs get refreshed, and vendors repackage features, but the enterprise constraints remain: regulatory obligations, latency and resiliency requirements, identity and access realities, data gravity, contractual risk, and the simple fact that large companies rarely move in a straight line. Cloud architecture is the discipline that prevents transformation programs from becoming expensive improvisation.
Easy to adopt, hard to industrialize
Most companies can get to cloud quickly. A few motivated teams, a credit card, and some well-meaning enthusiasm can produce working workloads in weeks. What you can’t do quickly is scale that success safely across dozens or hundreds of teams while preserving governance, predictable costs, and operational integrity. Industrializing cloud means standardizing patterns without crushing innovation, creating guardrails without blocking delivery, and giving engineers paved roads that are truly easier than off-roading.
This is where architects become force multipliers. In many enterprises, you’ll find dozens of cloud architects assigned across portfolios, projects, and solution development efforts, with a mix of junior and senior levels. Junior architects often focus on implementing reference patterns, helping teams conform to landing zones, and translating standards into deployable templates. Senior architects spend more time shaping the operating model, defining the target architecture, arbitrating trade-offs, and coaching leaders through decisions that ripple across the business.
Compensation follows leverage. In major markets, it’s common to see total annual compensation for experienced cloud architects exceed $200,000, particularly when the role includes broad platform scope, security accountability, and cross-domain influence. One good architect can keep a large organization out of trouble in ways that save far more than the cost of the role.
Daily life of a cloud architect
The best architects don’t “draw diagrams” as an end in itself. They create clarity. On a daily basis, they translate business intent into technical constraints and then into designs that teams can execute. They review solution approaches, challenge hidden assumptions, and ensure that the architecture aligns with the enterprise’s risk posture, delivery maturity, and budget reality.
A typical day includes a steady cadence of conversations and artifacts. There are design reviews where an architect examines network topology, identity flows, encryption boundaries, data classification, and resiliency patterns to verify that a workload won’t fail compliance audits or operational expectations. There are platform decisions about landing zones, shared services, segmentation strategies, private connectivity, and the balance between central control and team autonomy. There is constant attention to cost behavior because architectures don’t just “run.” They consume, and consumption becomes a strategic issue at scale.
Architects also mediate between competing truths. Security wants least privilege and tight controls, product teams want speed, finance wants predictability, and operations wants standardization. The architect’s job is to create a design that meets the business goal with an operationally supportable system. That means documenting nonfunctional requirements, setting service-level objectives, designing for failure, planning disaster recovery, choosing managed services wisely, and preventing accidental complexity.
Another major function is modernization planning. Even when the company is not migrating, it is still evolving: moving from VMs to containers, from containers to serverless, from bespoke data pipelines to managed analytics platforms, or from one identity approach to a unified zero-trust posture. Cloud architects provide the sequencing and the guardrails so that change doesn’t break everything that currently works.
Why demand stays high
Cloud-to-cloud migrations and moves from technology to technology within the cloud are often driven by economics, risk, mergers and acquisitions, data residency, or strategic leverage against a vendor. These moves are rarely clean. They involve interoperability, phased cutovers, temporary duplication, and years of coexistence. In that environment, teams can’t just chase feature parity; they need an architectural blueprint that defines what “done” means and how to get there without creating a brittle, duplicated mess.
Architects are also the antidote to the myth that cloud decisions are reversible. In theory, everything is abstracted. In reality, organizations build around specific services, identity and access management, logging pipelines, networking constructs, and operational habits. Those become sticky. An architect anticipates stickiness and designs for it, using patterns that preserve options where it matters and committing deliberately where the payoff is worth it.
This is also why advancement opportunities are so strong. As architectures grow, the role naturally expands into platform leadership, cloud center of excellence direction, principal architect positions, and enterprise architecture. The most valuable architects become trusted advisors because they can connect strategy to execution without hand-waving.
How to become a cloud architect
Start by building depth in fundamentals and breadth in systems thinking. You can’t architect what you don’t understand, so get hands-on with networking, identity, security, and observability, not just compute and storage. Learn how systems fail, how incidents are managed, and how costs emerge from architecture, because those realities shape every good design.
Next, accumulate “pattern experience.” Build and operate a few real systems end to end, then document what you learned. What would you standardize? What would you avoid? Which trade-offs surprised you? Architecture is applied judgment, and judgment comes from seeing consequences over time. Pair that with structured learning, including cloud provider certifications if they help you organize your knowledge, but don’t confuse badges with mastery. The goal is to be fluent in a cloud’s primitives while remaining capable of designing across clouds and across organizational boundaries.
Finally, develop the communication skills that turn architecture into outcomes. Learn to write clear decision records, present trade-offs without drama, and negotiate constraints with empathy. The strongest architects are credible because they can meet teams where they are, raise the maturity level pragmatically, and keep the enterprise moving forward without creating bureaucracy.
Cloud architects remain in such high demand because they reduce risk, prevent costly missteps, and make cloud adoption scalable and repeatable. Their daily work blends technical design, governance, cost, security, and cross-team alignment. If you want the role, build strong fundamentals, collect real-world pattern experience, and master the communication skills that turn diagrams into dependable systems.
Rust developers are mostly satisfied with the current pace of evolution of the programming language, but many worry that Rust does not get enough usage in the tech industry, that Rust may become too complex, and that the developers and maintainers of Rust are not properly supported.
These findings are featured in the Rust Survey Team’s 2025 State of Rust Survey report, which was announced March 2. The survey ran from November 17, 2025, to December 17, 2025, and tallied 7,156 responses, with different numbers of responses for different questions.
Asked their opinion of the pace at which the Rust language is evolving, 57.6% of the developers surveyed reported being satisfied with the current pace, compared to 57.9% in the 2024 report. Asked about their biggest worries for the future of Rust, 42.1% cited not enough usage in the tech industry, compared to 45.5% in 2024. The other biggest worries were that Rust may become too complex (41.6% in 2025 versus 45.2% in 2024) and that the developers and maintainers of Rust are not properly supported (38.4% in 2025 versus 35.4% in 2024).
The survey also asked developers which aspects of Rust present non-trivial problems to their programming productivity. Here slow compilation led the way, with 27.9% of developers saying slow compilation was a big problem and 54.68% saying that compilation could be improved but did not limit them. High disk space usage and a subpar debugging experience were also top complaints, with 22.24% and 19.90% of developers citing them as big problems.
In other findings in the 2025 State of Rust Survey report:
91.7% of respondents reported using Rust in 2025, down from 92.5% in 2024. But 55.1% said they used the language daily or nearly daily last year, up from 53.4% in 2024.
56.8% said they were productive using Rust in 2025, compared to 53.5% in 2024.
When it comes to operating systems in 2025, 75.2% of respondents used Linux regularly, 34.1% used macOS, and 27.3% used Windows. Linux also was the most common target of Rust software development, with 88.4% developing for Linux.
84.8% of respondents who used Rust at work said that using Rust has helped them achieve their goals.
Generic const expressions was the leading unimplemented or nightly-only feature that respondents in 2025 were looking to see stabilized, with 18.35% saying the feature would unblock their use case and 41.53% saying it would improve their code.
Visual Studio Code was the IDE most commonly used to code with Rust on a regular basis in 2025, with 51.6% of developers favoring it.
89.2% reported using the most current version of Rust in 2025.
3 takeaways from the top 5 cloud data platforms
Architectural convergence: Warehouses and data lakes are merging into unified lakehouses to eliminate data silos and simplify global governance.
AI-native integration: Platforms now embed LLMs and agent-builders directly, bringing AI to the data rather than moving the data.
Zero-ETL connectivity: New virtualization and shortcuts allow querying across different clouds without the cost or complexity of moving data.
Choosing the right data platform is critical for the modern enterprise. These platforms not only store and protect enterprise data, but also serve as analytics engines that source insights for pivotal decision-making.
There are many offerings on the market, and they continue to evolve with the advent of AI. However, five prominent players — Databricks, Snowflake, Amazon Redshift, Google BigQuery, and Microsoft Fabric — stand out as the leading options for your enterprise.
Databricks
Founded in 2013 by the creators of the open-source analytics platform Apache Spark, Databricks has established itself as one of the dominant players in the data market. Notably, the company coined the term and developed the concept of a data lakehouse, which combines the capabilities of data lakes and data warehouses to give enterprises a better handle on their data estates.
Data lakehouses create a single platform incorporating both data lakes (where large amounts of raw data are stored) and data warehouses (which contain categories of structured data) that typically operate as separate architectures. This unified system allows enterprises to query all data sources together and govern the workloads that use that data.
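As an illustration of the unified-query idea, the hypothetical sketch below uses Python's built-in SQLite engine as a stand-in: raw CSV "lake" data is landed next to a curated "warehouse" table, and a single SQL query joins across both. Real lakehouses do this at scale over open table formats rather than an in-memory database, but the principle is the same:

```python
import csv
import io
import sqlite3

# Stand-ins for the two tiers: raw "lake" data arriving as CSV text,
# and a curated "warehouse" dimension table.
raw_events_csv = "user_id,action\n1,login\n2,purchase\n1,purchase\n"
users = [(1, "alice"), (2, "bob")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", users)

# Land the raw data alongside the curated table so one engine sees both.
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
reader = csv.DictReader(io.StringIO(raw_events_csv))
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(int(r["user_id"]), r["action"]) for r in reader],
)

def purchases_by_name(conn):
    """A single query spans both tiers — the essence of the lakehouse idea."""
    return conn.execute(
        """SELECT u.name, COUNT(*) FROM events e
           JOIN users u ON u.id = e.user_id
           WHERE e.action = 'purchase'
           GROUP BY u.name ORDER BY u.name"""
    ).fetchall()
```

Because both datasets live under one engine and one catalog, governance and access control can be applied in one place rather than per silo.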
The lakehouse has become its own category and is now widely used and incorporated into many IT stacks.
Databricks presents itself as a “data+AI” company, and calls itself the only platform in the industry featuring a unified governance layer across data and AI, as well as a single unified query engine across ML, BI, SQL, and ETL.
Databricks’ Data Intelligence Platform has a strong focus on ML/AI workloads and is deeply tied to the Apache Spark ecosystem. Its open, flexible environment supports almost any data type and workload.
Further, to support the agentic AI era, Databricks has rolled out a Mosaic-powered Agent Bricks offering, which gives users tools to deploy customized AI agents and systems based on their unique data and needs. Enterprises can use retrieval-augmented generation (RAG) to build agents on their custom data and use Databricks’ vector database as a memory function.
Core platform: Databricks’ core offering is its Data Intelligence Platform, which is cloud-native — meaning it was designed from the get-go for cloud computing — and built to understand the semantics of enterprise data (thus the “intelligence” part).
The platform sits on a lakehouse foundation and open-format software interfaces (Delta Lake and Apache Iceberg) that support standardized interactions and interoperability. It also incorporates Databricks’ Unity Catalog, which centralizes access control, quality monitoring, data discovery, auditing, lineage, and security.
DatabricksIQ, Databricks’ Data Intelligence Engine, fuels the platform. It uses generative AI to understand semantics, and is based on innovations from MosaicML, which Databricks acquired in 2023.
Deployment method: Databricks is a built-on cloud platform that has established partnerships with the top cloud providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
Pricing: A pay-as-you-go model with no upfront costs. Customers pay only for the products they use, at “per second granularity.” There are different pricing-per-unit options for data engineering, data warehousing, interactive workloads, AI, and operational databases (ranging from $0.07 to $0.40 per unit). Databricks also offers committed-use contracts that provide discounts when customers commit to certain levels of usage.
Challenges/trade-offs: Operation can be more complex and less “plug and play”: Users are essentially running an Apache Spark-based platform, so there’s more to manage than in serverless environments that are easier to operate and tune. Pricing models also tend to be more complex.
Additional considerations for Databricks
A unified stack provides data pipelines, feature engineering, BI, ML training, and other complex tasks on the same storage layer.
Support for open formats and engines — including Delta and Iceberg — doesn’t lock users into a storage engine.
Unity Catalog provides a common governance layer, and data descriptions and tags can help the platform learn an enterprise’s unique semantics.
Agent Bricks and MLflow offer a strong AI and ML toolkit.
Snowflake
Snowflake, founded in 2013, is considered a pioneer in cloud data warehousing, serving as a centralized repository for both structured and semi-structured data that enterprises can easily access for analysis and business intelligence (BI).
The company is considered a direct competitor to Databricks. In fact, as a challenge to the data lakehouse pioneer, Snowflake claims it has always been a hybrid of data warehouses and data lakes.
Core platform: Snowflake positions itself as an ‘AI Data Cloud’ that can manage all data-driven enterprise activities. Like Databricks, its platform is cloud-native and it unifies storage, elastic compute, and cloud services.
Snowflake can support AI model development (notably through its agent-builder platform Cortex AI), advanced analytics, and other data-heavy tasks. Its Snowgrid cross-cloud layer supports global connectivity across different regions and clouds (thus allowing for consistency in performance) while a Snowflake Horizon governance layer manages access, security, privacy, compliance, and interoperability.
Integrated Snowpipe and Openflow capabilities allow for real-time ingestion, integration, and streaming, while Snowpark Connect supports migration and interoperability with Apache Spark codebases. Further, Cortex AI allows users to securely run large language models (LLMs) and build generative AI and agentic apps.
Deployment method: Like Databricks, Snowflake has partnerships with major players, running as software-as-a-service (SaaS) on AWS, Azure, GCP, and other cloud providers. Notably, a key strategic partnership with Microsoft allows customers to buy and run Snowflake directly through Azure and integrate it with other Azure services.
Pricing: A consumption-based pricing model. Customers are charged for compute in credits costing $2 and up, based on subscription edition (Standard, Enterprise, Business Critical, or Virtual Private Snowflake) and cloud region. A monthly fee for data stored in Snowflake is calculated based on average use.
Snowflake strengths: Snowflake positions itself as a turnkey, managed SQL platform for data‑intensive applications with strong governance and minimal tuning required.
Further, the company continues to innovate in the agentic AI era. For instance, Snowflake Intelligence allows users to ask questions, and get answers, about their data in natural language. Cortex AI provides secure access to leading LLMs: Teams can call models, perform text-to-SQL commands, and run RAG inside Snowflake without exposing their data.
Snowflake challenges/trade-offs
Snowflake’s proprietary storage and compute engine are less open and controllable than a lakehouse environment.
Cost can be difficult to visualize and manage due to credit-based pricing and serverless add-ons.
Users have reported weaker support for unstructured data and data streaming.
Additional considerations for Snowflake
Elastic compute provides strong performance for numerous users, data volumes, and workloads in a single, scalable engine.
There’s little infrastructure to manage: Snowflake abstracts away most capabilities, such as optimization, planning, and authentication.
Storage is interoperable and users get un-siloed access.
Snowgrid capabilities work across regions and clouds — whether AWS, Azure, GCP, or others — to allow for data sharing, portable workloads, and consistent global policies.
These five platforms are the dominant leaders in the cloud data ecosystem. While they all handle large-scale analytics, they differ significantly in their architecture (e.g., warehouse vs. lakehouse), ecosystem ties, and target users.
Amazon Redshift
Amazon Web Services (AWS) Redshift is Amazon’s fully managed, petabyte-scale cloud data warehouse designed to replace more complex, expensive on-premises legacy infrastructure.
Core platform: Amazon Redshift is a queryable data warehouse optimized for large-scale analytics on massive datasets. It is built on two core architectural pillars: columnar storage and massively parallel processing (MPP). Data is stored by column rather than by row, and MPP distributes query execution across multiple nodes so large datasets can be processed in parallel.
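The two pillars can be pictured with a toy sketch — not Redshift internals, just an illustration of why a column-oriented layout suits analytics: an aggregate over one field reads a single contiguous column instead of every full record, and MPP can split that column scan across nodes.

```python
# Toy illustration (not Redshift internals): the same records stored
# row-wise vs. column-wise. Summing one field touches every row object
# in the row layout, but only one contiguous list in the columnar
# layout -- the property an MPP engine exploits on each node.

rows = [
    {"order_id": 1, "region": "us-east", "amount": 120.0},
    {"order_id": 2, "region": "eu-west", "amount": 75.5},
    {"order_id": 3, "region": "us-east", "amount": 210.25},
]

# Columnar layout: one list per column.
columns = {
    "order_id": [1, 2, 3],
    "region": ["us-east", "eu-west", "us-east"],
    "amount": [120.0, 75.5, 210.25],
}

row_total = sum(r["amount"] for r in rows)   # reads whole records
col_total = sum(columns["amount"])           # reads one column only

assert row_total == col_total
```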
Redshift uses standard SQL to interact with data in relational databases and integrates with extract, transform, load (ETL) tools — like AWS Glue — that manage and prepare data. Through its Amazon Redshift Spectrum feature, users can directly query data from files on Amazon Simple Storage Service (Amazon S3) without having to load data into tables.
Additionally, with Amazon Redshift ML, developers can use simple SQL to build and train Amazon SageMaker machine learning (ML) models based on their Redshift data.
Redshift is deeply integrated in the AWS ecosystem, allowing for easy interoperability with numerous other AWS services.
Deployment method: Amazon Redshift is fully managed by AWS and is offered in both provisioned (a flat, predetermined rate for a set amount of resources, whether used or not) and serverless (pay-per-use) options.
Pricing: Offers two deployment options, provisioned and serverless. Provisioned starts at $0.543 per hour, while serverless begins at $1.50 per hour. Both options scale to petabytes of data and support thousands of concurrent users.
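A back-of-envelope comparison of the two options, using only the entry prices quoted above. Real Redshift Serverless bills per RPU-second with minimums, so treat the $1.50/hour figure — and the 730-hour month — as a sizing sketch, not a quote.

```python
# Rough break-even sketch: provisioned bills 24/7 at a flat rate,
# serverless bills only while queries run. Prices are the entry
# figures from the article; actual billing granularity differs.

PROVISIONED_PER_HOUR = 0.543   # flat, billed whether used or not
SERVERLESS_PER_HOUR = 1.50     # billed only for active hours
HOURS_PER_MONTH = 730

def monthly_cost(active_hours: float) -> dict:
    return {
        "provisioned": PROVISIONED_PER_HOUR * HOURS_PER_MONTH,
        "serverless": SERVERLESS_PER_HOUR * active_hours,
    }

spiky = monthly_cost(50)    # 50 active hours: serverless is cheaper
steady = monthly_cost(500)  # 500 active hours: provisioned is cheaper

# Active hours per month at which the two options cost the same.
break_even_hours = PROVISIONED_PER_HOUR * HOURS_PER_MONTH / SERVERLESS_PER_HOUR
```

Under these assumptions the crossover sits around 264 active hours a month, which matches AWS's own framing: provisioned for predictable workloads, serverless for spikier ones.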
Amazon Redshift strengths: AWS Redshift’s main differentiator is its strong integration in the broader AWS ecosystem: It can easily be connected with S3, Glue, SageMaker, Kinesis data streaming, and other AWS services. Naturally, this makes it a good fit for enterprises already leaning heavily into AWS. They can securely access, combine, and share data with minimal movement or copying.
Further, AWS has introduced Amazon Q, a generative AI assistant with specialized capabilities for software developers, BI analysts, and others building on AWS. Users can ask Amazon Q about their data to make decisions, speed up tasks and, ideally, increase productivity.
Amazon Redshift challenges/trade-offs
Ecosystem lock-in: While it fits quickly and easily into the AWS environment, Redshift might not be a good fit for enterprises with multi-cloud or cloud-agnostic strategies.
Even though it is managed by AWS, users say Redshift is not as hands-off as other options: Some maintenance tasks (such as VACUUM compaction) must be run manually, ETL processes must be checked regularly, and unusual queries can degrade service performance if they are not continuously monitored.
Additional considerations for Redshift
Devs find Redshift easy to use because of its SQL backbone.
The platform is highly performant and scalable thanks to its columnar architecture, decoupled compute and storage, and MPP.
AWS offers flexible deployment options: provisioned clusters for more predictable workloads, serverless for spikier ones.
Zero-ETL capabilities simplify data ingestion without complex pipelines, thus supporting near real-time analytics.
Google BigQuery
Google BigQuery started out as a fully managed cloud data warehouse that Google now sells as an autonomous data and AI platform that automates the entire data lifecycle.
Core platform: Google BigQuery is a serverless, distributed, columnar data warehouse optimized for petabyte-scale workloads and SQL-based analytics. It is built on Google’s Dremel execution engine, which allocates resources to queries on an as-needed basis, allowing it to quickly analyze terabytes of data with fewer resources.
BigQuery decouples compute (Dremel) and storage, housing data in columns in Google’s distributed file system Colossus. Data can be ingested from operational systems, logs, SaaS tools, and other sources, typically via extract, transform, load (ETL) tools.
BigQuery uses familiar SQL commands, allowing developers to easily train, evaluate, and run ML models for capabilities like linear regression and time-series forecasting for prediction, and k-means clustering for analytics. Combined with Vertex AI, the platform can perform predictive analytics and run AI workflows on top of warehouse data.
Further, BigQuery can integrate agentic AI, such as pre-built data engineering, data science, analytics, and conversational analytics agents, or devs can use APIs and agent development kit (ADK) integrations to create customized agents.
Deployment method: BigQuery is fully managed by Google and serverless by default, meaning users do not need to provision or manage individual servers or clusters.
Pricing: Offers three pricing tiers. Free users get up to 1 tebibyte (TiB) of queries per month. On-demand pricing (per-TiB) charges customers based on the number of bytes processed by each query. Capacity pricing (per slot-hour) charges customers based on compute capacity used to run queries, measured in slots (virtual CPUs) over time.
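On-demand billing is easy to model: the first TiB scanned each month is free, and everything beyond is billed per TiB of bytes processed. The free tier comes from the pricing above; the $6.25/TiB rate is an assumption for illustration — check the current regional rate.

```python
# On-demand cost sketch for BigQuery. FREE tier is from the article;
# the per-TiB price is an assumed US multi-region rate, used here
# only to make the arithmetic concrete.

FREE_TIB_PER_MONTH = 1.0
PRICE_PER_TIB = 6.25  # assumption -- verify against current pricing

def on_demand_cost(tib_scanned: float) -> float:
    billable = max(0.0, tib_scanned - FREE_TIB_PER_MONTH)
    return round(billable * PRICE_PER_TIB, 2)

# Partition pruning pays off linearly here: a query that scans 0.3 TiB
# instead of 12 TiB cuts the bill by the same ratio.
light = on_demand_cost(0.5)   # inside the free tier
heavy = on_demand_cost(12.0)  # 11 billable TiB
```

This linear bytes-scanned model is exactly why the "challenges" section below stresses discipline around partitioning and clustering.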
Google BigQuery strengths: BigQuery is deeply coupled with the GCP ecosystem, making it an easy choice for enterprises already heavily using Google products. It is scalable, fast, and truly serverless, meaning customers don’t have to manage or provision infrastructure.
GCP also continues to innovate around AI: BigQuery ML (BQML) helps analysts build, train, and launch ML models with simple SQL commands directly in the interface, and Vertex AI can be leveraged for more advanced MLOps and agentic AI workflows.
Google BigQuery challenges/trade-offs
Costs for heavy workloads can be unpredictable, requiring discipline around partitioning and clustering.
Users report difficulties around testing and schema mismatches during ETL processes.
Other considerations for BigQuery
BigQuery can analyze petabytes of data in seconds because its architecture decouples storage (Colossus) and compute (Dremel engine).
Google automatically handles resource allocation, maintenance, and scaling, so teams do not have to focus on operations.
Flexible payment models cover both predictable and more sporadic workflows.
Standard SQL support means analysts can use their existing skills to query data without retraining.
Microsoft Fabric
Microsoft Fabric is a SaaS data analytics platform that integrates data warehousing, real-time analytics, and business intelligence (BI). It is built on OneLake, Microsoft’s “logical” data lake that uses virtualization to provide users a single view of data across systems.
Core platform: Fabric is delivered via SaaS and all workloads run on OneLake, Microsoft’s data lake built on Azure Data Lake Storage (ADLS). Fabric’s catalog provides centralized data lineage, discovery, and governance of analytics artifacts (tables, lakehouses and warehouses, reports, ML tools).
Several workloads run on top of OneLake so that they can be chained without moving data across services. These include a data factory (with pipelines, dataflows, connectors, and ETL/ELT to ingest and process data); a lakehouse with Spark notebooks and pipelines for data engineering on a Delta format; and a data warehouse with SQL endpoints, T‑SQL compatibility, clustering and identity columns, and migration tooling.
Further, real-time intelligence, based on Microsoft’s Eventstream and Activator tools, ingests telemetry and other Fabric events without the need for coding; this allows teams to monitor data and automate actions. Microsoft’s Power BI sits natively on OneLake, and a DirectLake feature can query lakehouse data without importing or dual storage.
Fabric also integrates with Azure Machine Learning and Foundry so users can develop and deploy models and perform inferencing on top of Fabric datasets. Further, the platform features integrated Microsoft Copilot agents. These can help users write SQL queries, notebooks, and pipelines; generate summaries and insights; and populate code and documentation.
Microsoft recommends a “medallion” lakehouse architecture in Fabric. The goal of this type of format is to incrementally improve data structure and quality. The company refers to it as a “three-stage” cleaning and organizing process that makes data “more reliable and easier to use.”
The three stages are: Bronze (raw data that is stored exactly as it arrives); Silver (cleaned, with errors fixed, formats standardized, and duplicates removed); and Gold (curated data, organized and ready for reports and dashboards).
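A minimal sketch of the three stages, with plain Python dictionaries standing in for the Delta-format lakehouse tables Fabric actually uses:

```python
# Medallion sketch: the same records promoted bronze -> silver -> gold.
# Plain dicts stand in for Delta tables; the cleaning rules are
# illustrative.

bronze = [  # raw, exactly as ingested
    {"customer": " Acme ", "amount": "120.50", "currency": "usd"},
    {"customer": "Acme",   "amount": "120.50", "currency": "USD"},  # dupe
    {"customer": "Globex", "amount": "80.00",  "currency": "USD"},
]

def to_silver(records):
    """Clean: trim fields, standardize formats, cast types, drop dupes."""
    seen, silver = set(), []
    for r in records:
        key = (r["customer"].strip(), float(r["amount"]), r["currency"].upper())
        if key not in seen:
            seen.add(key)
            silver.append({"customer": key[0], "amount": key[1], "currency": key[2]})
    return silver

def to_gold(records):
    """Curate: aggregate into a report-ready shape."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)   # 2 clean rows: the messy duplicate collapses
gold = to_gold(silver)       # per-customer totals, ready for a dashboard
```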
Deployment method: Fabric is offered as a SaaS fully managed by Microsoft and hosted in its Azure cloud computing platform.
Pricing: A capacity-based licensing model (F SKUs) with two billing options: flexible pay-as-you-go that is billed per second and can be scaled up or paused; and reserved capacity, prepaid one- to three-year plans that can offer roughly 40% to 50% savings for predictable workloads. Data storage in OneLake is typically priced separately.
Microsoft Fabric strengths
Explicitly designed as an all‑in‑one SaaS, meaning one platform for ingestion, lakehouse, warehouse, and real‑time ML and BI.
Built-in Copilot can help accelerate common tasks (such as documentation or SQL), which users report as an advantage over competitors whose AI tools aren’t as tightly-integrated.
Microsoft recommends and documents medallion architecture, with lake views that automate evolutions from bronze to silver to gold.
Microsoft Fabric challenges/trade-offs
Fabric is newer (released in GA in 2023); users complain that some features feel early-stage, and documentation and best practices aren’t as evolved.
Can lead to lock-in to the Microsoft stack, which makes it less appealing to enterprises looking for more open, multi-cloud tools like Databricks or Snowflake.
Because pricing is capacity/consumption‑based, careful FinOps may be necessary to avoid surprises.
Other considerations for Microsoft Fabric
DirectLake mode allows Power BI to analyze massive datasets by loading data from OneLake directly into memory, without the “import/refresh” cycles required by other platforms.
Zero-ETL shortcuts allow Fabric to virtualize data from Snowflake, Databricks, or Amazon S3: You can see and query your Snowflake tables inside Fabric without moving a single byte of data.
Copilot Integration: Native AI assistants help users write Spark code, build data factory pipelines, and even generate entire Power BI reports from natural language prompts.
Bottom line
Choosing the right cloud data platform is a strategic decision extending beyond simple storage and access. Leading providers now blend data stores, governance layers, and advanced AI capabilities, but they differ when it comes to operational complexity, ecosystem integration, and pricing.
Ultimately, the right choice depends on an organization’s individual cloud strategy, operational maturity, workload mix, AI ambitions, and ecosystem preference — lock-in versus architectural flexibility.
The first time my team shipped an agent into a real SaaS workflow, the product demo looked perfect. The production bill did not. A small percentage of sessions hit messy edge cases, and our agent responded the way most agents do: it tried harder. It re-planned, re-queried, re-summarized and retried tool calls. Users saw a slightly slower response, and finance saw a step-change in variable spend.
That week changed how we think about agent design. In agentic SaaS, cost is a reliability metric. Loop limits and tool-call caps protect your margin.
I call this discipline FinOps for Agents: a practical way to govern loops, tools and model spend so your gross margin survives contact with real customers. I have found progress comes from putting product, engineering and finance in the same room, replaying agent traces and agreeing on guardrails that define the user experience.
Why does FinOps look different for agentic SaaS?
Measuring the Cost of Goods Sold (COGS) for classic SaaS is well known: compute, storage, third‑party services and support. Agentic SaaS adds a new axis: cognition. Every plan, reflection step, retrieval pass and tool call burns tokens and ambiguity often pushes agents to do more work to resolve it.
FinOps practitioners are increasingly treating AI as its own cost domain. The FinOps Foundation highlights token-based pricing, cost-per-token and cost-per-API-call tracking and anomaly detection as core practices for managing AI spend.
Seat count still matters, yet I have watched two customers with the same licenses generate a 10X difference in inference and tool costs because one had standardized workflows and the other lived in exceptions. If you ship agents without a cost model, your cloud invoice quickly becomes the lesson plan.
The agentic COGS stack
As head of AI R&D, I spend a lot of time with architects and CTOs, and the conversation almost always lands on a COGS breakdown that mirrors the agent’s architecture:
Model inference: Tokens across planner/executor/verifier calls, usually the largest contributor to COGS of agentic software.
Tools and side effects: Paid APIs (e.g., web search), per-record automation fees, retries and idempotent write safeguards.
Orchestration runtime: Workers, queues, state storage and sandboxed execution for code and documents.
Memory and retrieval: Embeddings, vector storage, index refresh and context-building or summarization checkpoints.
Governance and observability: Tracing, evaluation suites, safety filters and audit retention.
Humans in the loop: Review time, escalations and support load created by agent mistakes.
How does FinOps help standardize unit economics when outcomes span actions, workflows and tasks?
Gartner has cautioned that cost pressure can derail agentic programs, which makes unit economics a delivery requirement.
When it comes to most SaaS products, customers don’t buy raw tokens; instead, they buy progress toward completing their work, e.g., cases resolved, pipelines updated, reports produced or exceptions handled. Unit economics becomes actionable when we measure at the boundary where that value is delivered, and that boundary expands as your agentic SaaS matures: from answers in the UI, to a single approved operation, to a multi-step process and eventually to a recurring responsibility the agent runs end-to-end. In the following table, we lay out this structure and the corresponding unit metric and outcome to meter at each level of scope.
Where to meter: Actions, workflows and tasks
| Scope of integration | What it means | Example | Unit economics | What outcomes to meter |
| --- | --- | --- | --- | --- |
| Assistance | The user asks, AI answers. No integration. | “Brief me on Acme: last touchpoints, open opp status and the next best step.” | Cost per query | Seats |
| Wrap an action | AI proposes one operation. Users generally approve or decline. | “Update this opportunity to Proposal, set the close date to Feb 15 and create a follow-up task.” | Cost per approved action | Actions executed |
| Wrap a workflow | AI assists across a multi-step process. | “When a new inbound lead arrives, enrich it, score fit, route to the right rep and start the first-touch sequence.” | Cost per workflow | Workflows completed |
| Wrap a task | AI owns a recurring responsibility. | “Run weekly pipeline hygiene end-to-end: fix missing fields, merge duplicates, advance stale stages and only ask me about exceptions.” | Cost per run | Tasks × frequency, hours saved |
The FinOps metric product and finance agree on: CAPO, the cost-per-accepted-outcome
In early pilots, teams obsess over token counts. However, for a scaled agentic SaaS running in production, we need one number that maps directly to value: Cost-per-Accepted-Outcome (CAPO). CAPO is the fully loaded cost to deliver one accepted outcome for a specific workflow.
The phrase “accepted outcome” matters. A run that completes quickly and produces the wrong answer still consumes tokens, retrieval and tool calls. I define acceptance as a concrete quality gate: automated validation, a user “Apply” click or a downstream success signal such as “case not reopened in 7 days.”
Forrester’s FinOps research highlights the importance of operating-model maturity and step-by-step practice building for cost optimization for agentic software.
We calculate CAPO per workflow and per segment, then watch the distribution, not just the average. Median tells us where the product feels efficient. P95 and P99 tell us where loops, retries and tool storms are hiding.
Note that failed runs are included in CAPO automatically: We treat the numerator as the total fully loaded spend for that workflow (accepted + failed + abandoned + retried) and the denominator as accepted outcomes only, so every failure is “paid for” by the successes.
Tagging each run with an outcome state (accepted, rejected, abandoned, timeout, tool-error) and attributing its cost to a failure bucket allows us to track Failure Cost Share (failed-cost ÷ total-cost) alongside CAPO and see whether the problem is acceptance rate, expensive failures or retry storms.
These metrics naturally translate to measurable targets that inference engineering teams can rally behind.
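A small sketch of how CAPO and Failure Cost Share fall out of outcome-tagged runs. The run costs are illustrative; the formulas follow the definitions above exactly.

```python
# CAPO = total fully loaded spend / accepted outcomes only.
# Failure Cost Share = failed-cost / total-cost.
# Each run is tagged with an outcome state, as the text describes.

runs = [
    {"state": "accepted",   "cost": 0.12},
    {"state": "accepted",   "cost": 0.15},
    {"state": "rejected",   "cost": 0.40},  # an expensive failure
    {"state": "accepted",   "cost": 0.11},
    {"state": "tool-error", "cost": 0.22},
]

total_cost = sum(r["cost"] for r in runs)
accepted = [r for r in runs if r["state"] == "accepted"]
failed_cost = sum(r["cost"] for r in runs if r["state"] != "accepted")

capo = total_cost / len(accepted)              # every failure is paid for
failure_cost_share = failed_cost / total_cost  # where the waste hides
```

Run the same calculation per workflow and per segment, and watch the P95/P99 of per-run cost, not just the mean: here two failed runs out of five account for well over half the spend.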
Which budget guardrails keep FinOps off your back?
A well-designed agent has a budget contract the way a well-run service has an SLO. I encode that contract in five guardrails, enforced at the gateway where every model and tool call flows:
Loop/step limit: Cap planning, reflection and verification cycles. Escalate or ask a clarifying question when hit.
Tool-call cap: Cap total paid actions per run, with stricter sub‑caps for expensive tools like search and long-running automations.
Token budget: Enforce a per‑run token ceiling across calls and summarize history instead of re-sending transcripts.
Wall‑clock timeout: Keep interactive flows snappy and push long work into explicit background jobs with status updates.
Tenant budgets and concurrency: Limit blast radius with per-tenant caps and FinOps anomaly alerts. CSPs like AWS have announced vastly improved Cost Anomaly Detection for inference services at re:Invent in December 2025.
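A minimal sketch of such a budget contract, enforced before each model or tool call. Class, method, and limit names are hypothetical — this is not any vendor's gateway API:

```python
# Per-run budget contract: loop/step limit, tool-call cap, token
# ceiling, and wall-clock timeout, checked on every charge. On breach,
# the agent should escalate or ask a clarifying question rather than
# keep spending.
import time

class BudgetExceeded(Exception):
    pass

class RunBudget:
    def __init__(self, max_steps=8, max_tool_calls=5,
                 max_tokens=40_000, max_seconds=30.0):
        self.max_steps, self.max_tool_calls = max_steps, max_tool_calls
        self.max_tokens, self.max_seconds = max_tokens, max_seconds
        self.steps = self.tool_calls = self.tokens = 0
        self.started = time.monotonic()

    def charge(self, *, steps=0, tool_calls=0, tokens=0):
        self.steps += steps
        self.tool_calls += tool_calls
        self.tokens += tokens
        if (self.steps > self.max_steps
                or self.tool_calls > self.max_tool_calls
                or self.tokens > self.max_tokens
                or time.monotonic() - self.started > self.max_seconds):
            raise BudgetExceeded("run exceeded its budget contract")

budget = RunBudget(max_steps=3)
budget.charge(steps=1, tokens=1_200)  # within contract
budget.charge(steps=1, tokens=900)    # still within contract
```

Per-tenant budgets are the same idea one level up: a shared ledger keyed by tenant ID instead of a per-run object.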
How can interaction design and user experience drive FinOps savings?
Most FinOps savings come from architecture and interaction design, not from arguing over pennies per million tokens.
“Having comprehensive evals allows you to compare your product performance across LLMs and guide what LLMs you can use. The biggest cost saver is defaulting to the smallest possible model for data analysis while maintaining performance and accuracy, while still allowing customers to override and select the model of their choice,” says Geoffrey Hendrey, CEO of AlertD.
Three patterns consistently flatten the cost curve for us:
Separate planning from execution. A planner can be context‑heavy and cheap, whereas an executor can be tool‑constrained and action‑oriented. This reduces “thinking while acting” loops and makes retries easier to reason about.
Route work to the smallest capable model. Extraction, validation and routing succeed with smaller models when you use structured outputs. Reserve larger models for synthesis and edge cases that fail validation.
Make tools idempotent and cacheable. Add idempotency keys to every write. Cache repeated reads inside a run. Tool-call caps become practical when retries stay safe.
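The routing pattern can be sketched with stub models standing in for real API calls. The function names, the keyword trigger, and the validation rule are all illustrative:

```python
# Route work to the smallest capable model; escalate to the expensive
# model only when structured-output validation fails. Stubs stand in
# for real model calls behind the same interface.

def small_model(task: str) -> dict:
    # Cheap extractor: handles the common case with structured output.
    if "invoice" in task:
        return {"type": "invoice", "total": 120.0}
    return {}  # fails validation below -> escalate

def large_model(task: str) -> dict:
    # Expensive generalist, reserved for edge cases.
    return {"type": "unknown-document", "total": None}

def valid(output: dict) -> bool:
    # The quality gate: required fields present in the structured output.
    return "type" in output and "total" in output

def route(task: str) -> tuple[str, dict]:
    out = small_model(task)
    if valid(out):
        return "small", out
    return "large", large_model(task)

tier, _ = route("extract totals from this invoice")  # common case: small
```

The same gate doubles as the acceptance check feeding CAPO, which is why routing and validation are cheaper to build together than separately.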
Premium lane: Pricing that keeps your agent profitable
I expect many teams to keep seat-based pricing because procurement teams understand it. Predictable margin comes from attaching explicit entitlements to those seats and creating a controlled premium lane for expensive behavior.
Seats plus allowances: Bundle a monthly budget of agent runs or action credits. Throttle or upsell when exceeded.
Usage add‑ons: Sell metered AI as a separate SKU so power users fund their own tail behavior. Tread with caution here as you don’t want to add friction to adoption.
Premium lane policy: Reserve premium models for high‑stakes tasks or failed validation paths, backed by a paid tier. Make sure deployments used for demos are on the paid tier.
How does FinOps mature from cost visibility to ROI?
As you mature, pricing shifts from bundled access to outcomes that map directly to customer value.
FinOps focus shifts in parallel from adoption-driven cost volatility to unit economics, acceptance integrity and forecastable margin.
| Maturity level | What you sell to customers | What FinOps cares about | What can go wrong |
| --- | --- | --- | --- |
| Seat-bundled | “Agents are included with the license.” | Gross margin volatility by adoption, cohort and workflow mix. | A few heavy workflows or tenants quietly dominate spend, and there’s no clean lever to price, throttle or forecast it. |
| Credits-based | “You get X credits/month to spend on agent work and you can buy more as needed.” | Whether credit price covers costs, how many credits go unused and how often customers buy overages. | Credits fail as a budgeting tool if different workflows consume them unpredictably and surprise customers. |
| Workflow metering | “You pay per workflow type (research, triage, enrichment, etc.).” | What each workflow costs per accepted outcome (CAPO), how often it succeeds and where the expensive outliers come from. | You ship a great meter and a weak value narrative, so procurement treats it as arbitrary fees and pushes for discounts. |
| Outcome-linked | “You pay when the outcome is accepted and delivered.” | Acceptance integrity: whether quality gates hold and each accepted outcome is delivered at a sustainable margin. | Incentives shift to “passing the gate,” and borderline outcomes create disputes, churn risk and perverse product behavior. |
| Value-based contracts | “We guarantee a business result with predictable unit economics.” | Whether contracted outcomes can be delivered at the target margin, with reliable forecasts. | You sign outcome promises without enforcement and operational controls, then deliver more work than you can profitably price. |
A practical 30-60-90 day FinOps plan for agentic SaaS
0-30 days: Choose 3-5 high-volume workflows, define explicit acceptance gates and log every run with a unique ID tied to the tenant and workflow so you can trace cost and quality end-to-end.
31-60 days: Add routing and validation cascades, cache retrieval and tool outputs and harden tools with schemas, timeouts and idempotency keys.
61-90 days: Align pricing with entitlements, set anomaly alerts with an on‑call playbook and review CAPO and tail spend every month.
This article is published as part of the Foundry Expert Contributor Network.
For years, one of the cloud’s biggest gifts was that vendors like AWS could take care of the “undifferentiated heavy lifting” of managing infrastructure for you. Need compute? Click a button. Need storage? Click another. Need a database? Let someone else worry about the details. The whole point of managed infrastructure was to save most enterprises from spending their days swimming in low-level systems engineering.
AI is making that abstraction leak.
As I’ve argued, the real enterprise AI challenge is no longer training. It’s inference: applying models continuously to governed enterprise data, under real-world latency, security, and cost constraints. That shift matters because once inference becomes the steady-state workload of the enterprise, infrastructure that once seemed necessary but dull suddenly becomes strategic.
That’s especially true of the network.
The network is… cool?
For decades, networking was prized precisely because it was stable and uneventful. That was the point: No one wants exciting networking. Standards bodies moved slowly and kernel releases moved carefully because predictability was paramount. That conservatism made sense in a world where most enterprise workloads were relatively forgiving and the network’s job was mostly to stay out of the way.
Interestingly, the times when networking became sexy(ish) were times of significant technology upheaval. Think 1999 to 2001, when we had the dot-com bubble/internet infrastructure boom. Then in 2007, we saw broadband and mobile expansion. Later we saw cloud networking consolidation from 2015 to 2022. We’re about to see another big upward shift in networking interest because of AI.
Although observers posting on X still obsess over training runs, model sizes, and huge capital expenditures for data center build-outs to support it all, the real action is arguably elsewhere. For most enterprises, training a model occasionally isn’t the hard part. The harder part is running inference all day, every day, across sensitive data, inside shared environments, with serious performance expectations. Network engineers might prefer to toil away in relative obscurity, but AI makes that impossible. In the AI era, network performance becomes a first-order bottleneck because the application is no longer just waiting on CPU or storage. It’s waiting on the movement of context, tokens, embeddings, model calls, and state across distributed systems.
In other words, AI doesn’t simply increase traffic volume; it changes the nature of what the network does.
A different view of the network
This isn’t the first time we’ve seen a network paradigm shift. As Thomas Graf, CTO at Cisco Security, cofounder of Isovalent, and the creator of Cilium, said in an interview, “The rise of Kubernetes and microservices was the first wave of east-west traffic acceleration. Instead of a single monolith, we broke applications up, and that immediately required security not just at the firewall but east-west inside the infrastructure.”
AI compounds that shift. These workloads aren’t just a few more services talking to one another. They involve synchronized GPU clusters, retrieval pipelines, vector lookups, inference gateways, and, increasingly, agents that continuously exchange state across systems. That’s a different operational world from the one most enterprise networks were built to support. “With AI workloads,” Graf continues, “that’s a hundred times more [data moving around]. Not because things are more broken up, but because AI runs at a scale that is bigger and needs an insane amount of data.”
That “insane amount of data” is why the network matters again and why developers need to think about it again.
In AI environments, the fabric increasingly becomes part of the compute system itself. GPUs exchange gradients, activations, and model state in real time. Packet loss isn’t just an annoyance, as it can stall collective operations and leave expensive hardware idle. Traditional north-south visibility isn’t enough because much of the important traffic never crosses a classic perimeter (e.g., user request to a server). Hence, security policy can’t live only at the edge because the valuable flows are often east-west inside the cluster. And because enterprises are still discovering what their AI demand curves will look like, elasticity matters, too. Networks have to scale incrementally, adapt to mixed workloads, and support evolving architectures without forcing a full redesign every time the AI road map changes.
In other words, AI is making the network less like plumbing and more like part of the application runtime.
Getting serious about Cilium
That’s why eBPF matters. The official eBPF project documentation describes eBPF as a way to safely run sandboxed programs in the kernel, extending kernel capabilities without changing kernel source or loading modules. The technical details are important, but the broader point is simple: eBPF moves observability and enforcement closer to where packets and system calls actually happen. In a world of east-west traffic, ephemeral services, and machine-speed inference, that’s a big deal.
Cilium is one important expression of that shift. It builds on eBPF to provide Kubernetes-native networking, observability, and policy enforcement as fast as the network link itself can carry traffic, without becoming a meaningful bottleneck. This is critical to network performance. Unsurprisingly, Cilium has become table stakes for hyperscalers’ networking stacks. (Google’s GKE Dataplane V2, Microsoft’s Azure CNI Powered by Cilium, and AWS’s EKS Hybrid Nodes all depend on or support Cilium.) Indeed, across the Kubernetes user base, as the 2025 State of Kubernetes Networking Report indicates, a majority use Cilium-based networking.
As important as Cilium is, however, the bigger story is that AI is forcing enterprises to care again about infrastructure details they had happily abstracted away. That doesn’t mean every company should hand-roll its network stack, but it does mean that platform teams can no longer treat networking as an untouchable utility layer. If inference is where enterprise AI becomes real, then latency, telemetry, segmentation, and internal traffic policy are no longer secondary concerns. They’re an essential part of product quality, operational reliability, and developer experience.
More than the network
Nor is this isolated to Cilium, specifically, or networking, generally. AI keeps forcing us to care about things we’d hoped to forget. As I’ve written, it’s fun to fixate on fancy AI demos, but the real work is to make these systems work reliably, securely, and economically in production. Just as important, in our rush to make AI dependable at enterprise scale, we can’t overlook the need to make the whole stack easier to use for developers, easier to govern by IT/ops, and faster under real-world load.
“If an AI-backed service responds faster and behaves more reactively, it will perform better in the market. And the foundation for that is a highly performant, low-latency network without bottlenecks,” notes Graf. “To me, this is very similar to high-frequency trading. Once computers replaced humans, network latency and throughput suddenly became a competitive differentiator.”
That feels right. The winners in enterprise AI won’t simply be the companies with the biggest models. Success comes from making inference reliable, governed, and economical on real data under real load. Some of that battle will be won in models. More of it than many enterprises realize will be won in the supposedly boring layers underneath, like networking.
Stateless AI, in which a model offers one-off answers without context from previous sessions, can be helpful in the short term but falls short in more complex, multi-step scenarios. To overcome these limitations, OpenAI is introducing what it is calling, naturally, “stateful AI.”
The company has announced that it will soon offer a stateful runtime environment in partnership with Amazon, built to simplify the process of getting AI agents into production. It will run natively on Amazon Bedrock, be tailored for agentic workflows, and optimized for AWS infrastructure.
Interestingly, OpenAI also felt the need to make another announcement today, underscoring the fact that nothing about other collaborations “in any way” changes the terms of its partnership with Microsoft. Azure will remain the exclusive cloud provider of stateless OpenAI APIs.
“It’s a clever structural move,” said Wyatt Mayham of Northwest AI Consulting. “Everyone can claim a win, but the subtext is clear: OpenAI is becoming a multi-cloud company, and the era of exclusive AI partnerships is ending.”
What differentiates ‘stateful’
The stateful runtime environment on Amazon Bedrock was built to execute complex steps that factor in context, OpenAI said. Models can carry forward memory and history, tool and workflow state, environment use, and identity and permission boundaries.
This represents a new paradigm, according to analysts.
Notably, stateless API calls are a “blank slate,” Mayham explained. “The model doesn’t remember what it just did, what tools it called, or where it is in a multi-step workflow.”
While that’s fine for a chatbot answering one-off questions, it’s “completely inadequate” for real operational work, such as processing a customer claim that moves across five different systems, requires approvals, and takes hours or days to complete, he said.
New stateful capabilities give AI agents a persistent working memory so they can carry context across steps, maintain permissions, and interact with real enterprise tools without developers having to “duct-tape stateless API calls together,” said Mayham.
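The difference Mayham describes can be sketched in a few lines of Python. This is a purely illustrative toy, not the OpenAI or Bedrock API: `stateless_call` and `StatefulSession` are hypothetical names, and the “model” is a placeholder string. The point is only that a stateful session accumulates memory and workflow position across calls, while a stateless call starts from a blank slate every time.

```python
# Hypothetical sketch (not a real OpenAI/Bedrock API): a stateless call that
# forgets everything between invocations vs. a stateful session that carries
# memory and workflow position forward across steps.
from dataclasses import dataclass, field


def stateless_call(prompt: str) -> str:
    """Each call is a blank slate: no memory of prior steps or tool calls."""
    return f"answer({prompt})"


@dataclass
class StatefulSession:
    """Persistent working memory for a multi-step agent workflow."""
    memory: list = field(default_factory=list)  # prompt/tool history so far
    step: int = 0                               # position in the workflow

    def call(self, prompt: str) -> str:
        self.step += 1
        # The "model" sees everything that happened so far, not just the prompt.
        context = " | ".join(self.memory)
        result = f"answer(step={self.step}, context=[{context}], prompt={prompt})"
        self.memory.append(prompt)
        return result


session = StatefulSession()
session.call("open claim #123")
out = session.call("check approval status")
# The second call carries the first step's context along with it.
```

In the stateless version, a developer would have to re-send all prior context on every call by hand, which is the “duct-tape” pattern Mayham describes.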
Further, the Bedrock foundation matters because it’s where many enterprise workloads already live, he noted. OpenAI and Amazon are meeting companies where they are, not asking them to rearchitect their security, governance, and compliance posture.
This makes sophisticated AI automation accessible to mid-market companies; they will no longer need a team of engineers to “build the plumbing from scratch,” he said.
Sanchit Vir Gogia, chief analyst at Greyhound Research, called stateful runtime environments “a control plane shift.” Stateless can be “elegant” for single interactions such as summarization, code assistance, drafting, or isolated tool invocation. But stateful environments give enterprises a “managed orchestration substrate,” he noted.
This supports real enterprise workflows involving chained tool calls, long-running processes, human approvals, system identity propagation, retries, exception handling, and audit trails, said Gogia, while Bedrock enforces existing identity and access management (IAM) policies, virtual private cloud (VPC) boundaries, security tooling, logging standards, and compliance frameworks.
“Most pilot failures happen because context resets across calls, permissions are misaligned, tokens expire mid-workflow, or an agent cannot resume safely after interruption,” he said. These issues can be avoided in stateful environments.
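The resume-after-interruption failure Gogia mentions can be illustrated with a small checkpointing sketch. This is an assumed design, not a Bedrock feature: `run_workflow` and the step names are hypothetical. Each completed step’s result is recorded in a checkpoint, so a rerun after an interruption skips finished work instead of restarting from scratch.

```python
# Illustrative sketch (assumed design, not a real Bedrock feature): persisting
# per-step results so an interrupted workflow can resume deterministically.
import json


def run_workflow(steps, checkpoint: dict) -> dict:
    """Execute steps in order, skipping any already recorded in the checkpoint."""
    for name, fn in steps:
        if name in checkpoint:       # finished before the interruption; skip it
            continue
        checkpoint[name] = fn()      # record the result durably after each step
    return checkpoint


steps = [
    ("validate", lambda: "ok"),
    ("approve", lambda: "granted"),
    ("notify", lambda: "sent"),
]

# Simulate a first run interrupted after "validate": only its result survived.
saved = {"validate": "ok"}
# Round-trip through JSON as if the checkpoint were reloaded from storage.
state = run_workflow(steps, json.loads(json.dumps(saved)))
```

Without some persisted checkpoint like this, a crash mid-workflow forces the agent to redo (or worse, re-approve) steps it already completed.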
Factors IT decision-makers should consider
However, there are second-order considerations for enterprises, Gogia emphasized. Notably, state persistence increases the attack surface. This means persistent memory must be encrypted, governed, and auditable, and tool invocation boundaries should be “tightly controlled.” Further, workflow replay mechanisms must be deterministic, and observability must be granular enough to satisfy regulators.
There is also a “subtle lock-in dimension,” said Gogia. Portability can decrease when orchestration moves inside a hyperscaler-native runtime. CIOs need to consider whether their future agent architecture remains cloud-portable or becomes anchored in AWS’ environment.
Ultimately, this new offering represents a market pivot, he said: The intelligence layer is being commoditized.
“We are moving from a model race to a control plane race,” said Gogia. The strategic question now isn’t about which model is smartest. It is: “Which runtime stack guarantees continuity, auditability, and operational resilience at scale?”
Partnership with Microsoft still ‘strong and central’
Today’s joint announcement from Microsoft and OpenAI about their partnership echoes OpenAI’s similar reaffirmation of the collaboration in October 2025. The partnership remains “strong and central,” and the two companies went so far as to call it “one of the most consequential collaborations in technology,” focused on research, engineering, and product development.
The companies emphasized that:
Microsoft maintains an exclusive license and access to intellectual property (IP) across OpenAI models and products.
OpenAI’s Frontier and other first-party products will continue to be hosted on Azure.
The contractual definition of artificial general intelligence (AGI) and the “process for determining if it has been achieved” is unchanged.
An ongoing revenue share arrangement will stay the same; this agreement has always included revenue-sharing from partnerships between OpenAI and other cloud providers.
OpenAI has the flexibility to commit to compute elsewhere, including through infrastructure initiatives like the Stargate project.
Both companies can independently pursue new opportunities.
“That joint statement reads like it was drafted by three law firms simultaneously, and that’s the point,” said Mayham.
The anchor of the agreement is that Azure remains the exclusive cloud provider of stateless OpenAI APIs. This allows OpenAI to establish a new category on AWS that falls outside of Microsoft’s reach, he said.
OpenAI is ultimately “walking a tightrope,” because it needs to expand distribution beyond Azure to reach AWS customers, who comprise a massive portion of the enterprise market, he noted. At the same time, it has to ensure Microsoft doesn’t feel like its $135 billion investment “just got diluted in strategic value.”
Gogia called the statement “structural reassurance.” OpenAI must grow distribution across clouds because enterprise buyers are demanding multi-cloud flexibility. “They don’t want to be confined to a single cloud; they want architectural optionality.”
Also, he noted, “CIOs and boards do not want vendor instability. Hyperscaler conflict risk is now a board level concern.”
New infusion of funding (again)
Meanwhile, a new $110 billion infusion of funding from Nvidia, SoftBank, and Amazon will allow OpenAI to expand its global reach and “deepen” its infrastructure, the company says. Importantly, the funding includes the use of 3 GW of dedicated inference capacity and 2 GW of training capacity on Nvidia’s Vera Rubin systems. This builds on the Hopper and Blackwell systems already in operation across Microsoft, Oracle Cloud Infrastructure (OCI), and CoreWeave.
Mayham called this “the headline within the headline.”
“Cash doesn’t build AI products; compute does,” he said. Right now, access to next-generation Nvidia hardware is the “true bottleneck for every AI company on the planet.”
OpenAI is essentially locking in a “guaranteed supply line” for the chips that power everything it does. The money from all three companies funds operations and infrastructure, but the Nvidia capacity and training allows OpenAI to use infrastructure at the frontier, said Mayham. “If you can’t get the processors, the cash is just sitting in a bank account.”
Inference is now one of the biggest cost drivers in AI, and Gogia noted that frontier AI systems are constrained by physical infrastructure: GPUs, high-bandwidth memory (HBM), high-speed interconnects, and other hardware, as well as grid-level power capacity, are all finite resources.
The current moves embed OpenAI deeper into the infrastructure stack, but the risk is concentration. When compute control centralizes among a small cluster of hyperscalers and chip vendors, the system can become fragile. To protect themselves, Gogia advised enterprises to monitor supply chain concentration.
“In strategic terms, however, this move strengthens OpenAI’s durability,” he said. “It secures the physical substrate required to sustain frontier model scaling and enterprise inference growth.”