editors desk


AWS Transform now supports agentic modernization of custom code

Does AI-generated code add to, or reduce, technical debt? Amazon Web Services is aiming to reduce it with new capabilities for AWS Transform, its AI-driven service for modernizing legacy code, applications, and infrastructure.

“Modernization is no longer optional for enterprises these days,” said Akshat Tyagi, associate practice leader at HFS Research. Enterprises need cleaner code and updated SDKs to run AI workloads, tighten security, and meet new regulations, he said, but their inability to modernize custom code quickly and with little manual effort is one of the major drivers of technical debt.

AWS Transform was introduced in May to accelerate the modernization of VMware systems and Windows .NET and mainframe applications using agentic AI. Now, at AWS re:Invent, it’s getting additional capabilities in those areas, plus new custom code modernization features.

New mainframe modernization agents add functions including activity analysis to help decide whether to modernize or retire code; blueprints to identify the business functions and flows hidden in legacy code; and automated test plan generation. AWS Transform for VMware gains new functionality including an on-premises discovery tool; support for configuration migration of network security tools from Cisco ACI, FortiGate, and Palo Alto Networks; and a migration planning agent that draws business context from unstructured documents, files, chats, and business rules.

The company is also inviting partners to integrate their proprietary migration tools and agents with its platform through a new AWS Transform composability initiative. Accenture, Capgemini, and Pegasystems are the first on board.

Customized modernization for custom code

On top of that, there’s a whole new agent, AWS Transform custom, designed to reduce the manual effort involved in custom code modernization by learning a custom pattern and operationalizing it throughout the target codebase or SDK.
To feed the agent the unique pattern, enterprise teams can use natural-language instructions, internal documentation, or example code snippets that illustrate how specific upgrades should be performed. AWS Transform custom then applies these patterns consistently across large, multi-repository codebases, automatically identifying similar structures and making the required changes at scale. Developers can then review and fine-tune the output, which the agent adapts and operationalizes, allowing it to continually refine its accuracy, the company said.

Generic is no longer good enough

Tyagi said that the custom code modernization approach taken by AWS is better than most generic modernization tools, which rely solely on pre-packaged rules. “Generic modernization tools no longer cut it. Every day we come across enterprises complaining that the legacy systems are now so intertwined that pre-built transformation rules are now bound to fail,” he said.

Pareekh Jain, principal analyst at Pareekh Consulting, said Transform custom’s ability to support custom SDK modernization will also act as a value driver for many enterprises. “SDK mismatch is a major but often hidden source of tech debt. Large enterprises run hundreds of microservices on mismatched SDK versions, creating security, compliance, and stability risks,” Jain said. “Even small SDK changes can break pipelines, permissions, or runtime behavior, and keeping everything updated is one of the most time-consuming engineering tasks,” he said.

Similarly, enterprises will find support for modernization of custom infrastructure-as-code (IaC) particularly valuable, Tyagi said, because it tends to fall out of date quickly as cloud services and security rules evolve. Large organizations, the analyst noted, often delay touching IaC until something breaks, since these files are scattered across teams and full of outdated patterns, making manual cleanup difficult and error-prone.
For many enterprises, 20–40% of modernization work is actually refactoring IaC, Jain said.

Not a magic button

However, enterprises shouldn’t see AWS Transform’s new capabilities as a magic button that solves their custom code modernization issues. Its reliability will depend on codebase consistency, the quality of examples, and the complexity of the underlying frameworks, Jain said.

But, said Tyagi, real-world code is rarely consistent. “Each individual writes it with their own methods and perceptions or habits. So the tool might get some parts right and struggle with others. That’s why you still need developers to review the changes, and this is where human intervention becomes significant,” Tyagi said. There is also upfront work, Jain said: Senior engineers must craft examples and review output to ground the code modernization agent and reduce hallucinations.

The new features are available now and can be accessed via AWS Transform’s conversational interface on the web and the command-line interface (CLI).

AWS unveils Frontier AI agents for software development

Amazon Web Services has unveiled a new class of AI agents, called frontier agents, which the company said can work for hours or days without intervention. The first three agents are focused on software development tasks.

The three agents announced December 2 are the Kiro autonomous agent, AWS Security Agent, and AWS DevOps Agent, each focused on a different aspect of the software development life cycle. AWS said these agents represent a step-function change in what can be done with agents, moving from assisting with individual tasks to completing complex projects autonomously, like a member of the user’s team.

The Kiro autonomous agent is a virtual developer that maintains context and learns over time while working independently, so users can focus on their biggest priorities. The AWS Security Agent serves as a virtual security engineer that helps build secure applications by acting as a security consultant for app design, code reviews, and penetration testing. And the AWS DevOps Agent is a virtual operations team member that helps resolve and proactively prevent incidents while continuously improving an application’s reliability and performance, AWS said. All three agents are available in preview.

The Kiro agent is a shared resource working alongside the entire team, building a collective understanding of the user’s codebase, products, and standards. It connects to a team’s repos, pipelines, and tools such as Jira and GitHub to maintain context as work progresses. Kiro previously was positioned as an agentic AI-driven IDE.

The AWS Security Agent, meanwhile, helps build applications that are secure from the start across AWS, multi-cloud, and hybrid environments. The AWS DevOps Agent is on call when incidents happen, instantly responding to issues and using its knowledge of an application and the relationships between its components to find the root cause when an application goes down, according to AWS.
AWS said the frontier agents were the result of examining its own development teams building services at Amazon scale and uncovering three critical insights. First, by learning what agents were and were not good at, teams could switch from babysitting every small task to directing agents toward broad, goal-driven outcomes. Second, team velocity was tied to how many agentic tasks could be run at the same time. Third, the longer agents could operate on their own, the better. The AWS team realized it needed the same capabilities across every aspect of the software development life cycle, such as security and operations, or risk creating new bottlenecks.

Why data contracts need Apache Kafka and Apache Flink

Imagine it’s 3 a.m. and your pager goes off. A downstream service is failing, and after an hour of debugging you trace the issue to a tiny, undocumented schema change made by an upstream team. The fix is simple, but it comes with a high cost in lost sleep and operational downtime.

This is the nature of many modern data pipelines. We’ve mastered the art of building distributed systems, but we’ve neglected a critical part of the system: the agreement on the data itself. This is where data contracts come in, and why they fail without the right tools to enforce them.

The importance of data contracts

Data pipelines are a popular way to share data from different producers (databases, applications, logs, microservices, etc.) with consumers to drive event-driven applications or enable further processing and analytics. These pipelines have often been developed in an ad hoc manner, without a formal specification for the data being produced and without direct input from the consumers on what data they expect. As a result, it’s not uncommon for upstream producers to introduce ad hoc changes consumers don’t expect and can’t process. The result? Operational downtime and expensive, time-consuming debugging to find the root cause.

Data contracts were developed to prevent this. Data contracts are an agreement between data producers and consumers that defines the schemas, data types, and data quality constraints for data shared between them. Data contract design requires producers and consumers to collaborate early in the software design life cycle to define and refine requirements. Explicitly defining and documenting requirements early on simplifies pipeline design and reduces or removes errors in consumers caused by data changes not defined in the contract. Data pipelines leverage distributed software to map the flow of data and its transformation from producers to consumers, and data contracts are foundational to properly designed and well-behaved data pipelines.
Why we need data contracts

Why should data contracts matter to developers and the business? First, data contracts reduce operational costs by eliminating unexpected upstream data changes that cause operational downtime. Second, they reduce developer time spent on debugging and break-fixing errors caused downstream by changes a developer introduced without understanding their effects on consumers. Data contracts provide this understanding. Third, formal data contracts aid the development of well-defined, reusable data products that multiple consumers can leverage for analytics and applications.

The consumer and producer can leverage the data contract to define schema and other changes before the producer implements them. The data contract should specify a cutover process, so consumers can migrate to the new schema and its associated contract without disruption.

Three important data contract requirements

Data contracts have garnered much interest recently, as enterprises realize the benefits of shifting their focus upstream to where data is produced when building data-driven operational products. This process is often called “shift left.” In a shift-left data pipeline design, downstream consumers can share their data product requirements with upstream data producers. These requirements can then be distilled and codified into the data contract.

Data contract adoption requires three key capabilities:

Specification: define the data contract
Implementation: implement the data contract in the data pipeline
Enforcement: enforce the data contract in real time

A variety of technologies can support these capabilities, but Apache Kafka and Apache Flink are among the best suited to the purpose.

Apache Kafka and Apache Flink for data contracts

Apache Kafka and Apache Flink are popular technologies for building data pipelines and data contracts due to their scalability, wide availability, and low latency.
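The three capabilities can be made concrete with a small, framework-free sketch. Everything below (the contract fields, quality rules, and function names) is a hypothetical illustration in plain Python, not an actual Kafka or Schema Registry API:

```python
# Specification: the contract defines schema, types, and quality rules.
# (Contract contents are invented for illustration.)
ORDER_CONTRACT = {
    "version": "1.0.0",
    "fields": {
        "order_id": {"type": str, "required": True},
        "amount":   {"type": float, "required": True, "min": 0.0},
        "currency": {"type": str, "required": False},
    },
}

def enforce(contract, record):
    """Enforcement: return a list of violations (empty list = valid)."""
    violations = []
    for name, rule in contract["fields"].items():
        if name not in record:
            if rule["required"]:
                violations.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            violations.append(f"{name}: expected {rule['type'].__name__}")
        elif "min" in rule and value < rule["min"]:
            violations.append(f"{name}: below minimum {rule['min']}")
    return violations

# Implementation: producers check records before publishing them.
good = {"order_id": "A-17", "amount": 12.5}
bad = {"amount": -3.0}

print(enforce(ORDER_CONTRACT, good))  # []
print(enforce(ORDER_CONTRACT, bad))   # missing order_id, amount below min
```

In a real pipeline, the specification would live in a schema registry and enforcement would run in the producer’s serializer or a Flink job rather than in application code, but the division of responsibilities is the same.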
They provide shared storage infrastructure between producers and consumers. In addition, Kafka allows producers to communicate the schemas, data types, and (implicitly) the serialization format to consumers. This shared information also allows Flink to transform data as it travels between producer and consumer.

Apache Kafka is a distributed event streaming platform that provides high throughput, fault tolerance, and scalability for shared data pipelines. It functions as a distributed log, enabling producers to publish data to topics that consumers can asynchronously subscribe to. In Kafka, topics have schemas, defined data types, and data quality rules, and Kafka can store and process streams of records (events) in a reliable and distributed manner. Kafka is widely used for building data pipelines, streaming analytics, and event-driven architectures.

Apache Flink is a distributed stream processing framework designed for high-performance, scalable, and fault-tolerant processing of real-time and batch data. Flink excels at handling large-scale data streams with low latency and high throughput, making it a popular choice for real-time analytics, event-driven applications, and data processing pipelines. Flink often integrates with Kafka, using Kafka as a source or sink for streaming data: Kafka handles the ingestion and storage of event streams, while Flink processes those streams for analytics or transformations. For example, a Flink job might read events from a Kafka topic, perform aggregations, and write results back to another Kafka topic or a database.

Kafka supports schema versioning and can serve multiple versions of the same data contract as it evolves over time. It can keep an old schema version running alongside a new one, so new clients can leverage the new schema while existing clients continue using the old one. Mechanisms like Flink’s support for materialized views help accomplish this.
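The versioning idea can be illustrated with one simple, conservative compatibility policy: an evolved schema may add optional fields but must not remove fields or tighten an optional field into a required one. This is a hypothetical sketch of that policy, not Schema Registry’s actual compatibility algorithm:

```python
def is_compatible(old_schema, new_schema):
    """Schemas map field name -> {'required': bool}. True if old consumers survive."""
    # Existing fields must survive, and their requiredness must not tighten.
    for name, rule in old_schema.items():
        new_rule = new_schema.get(name)
        if new_rule is None:
            return False  # field removed: old consumers break
        if new_rule["required"] and not rule["required"]:
            return False  # optional made required: old producers break
    # Brand-new fields must be optional so existing producers stay valid.
    for name, rule in new_schema.items():
        if name not in old_schema and rule["required"]:
            return False
    return True

v1 = {"order_id": {"required": True}, "amount": {"required": True}}
v1_1 = dict(v1, currency={"required": False})  # additive, optional: OK
v2_bad = {"order_id": {"required": True}}      # drops "amount": breaks consumers

print(is_compatible(v1, v1_1))   # True
print(is_compatible(v1, v2_bad)) # False
```

Real registries offer several compatibility modes (backward, forward, full); the point is that the check runs centrally, once, instead of in every producer and consumer.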
How Kafka and Flink help implement data contracts

Kafka and Flink are a great way to build data contracts that meet the three requirements outlined earlier: specification, implementation, and enforcement. As open source technologies, they play well with other data pipeline components that are often built using open source software or standards. This creates a common language and infrastructure around which data contracts can be specified, implemented, and enforced. Flink can help enforce data contracts and evolve them as needed by producers and consumers, in some cases without modifying producer code. Kafka provides a common, ubiquitous language that supports specification while making implementation practical.

Kafka and Flink encourage reuse of the carefully crafted data products specified by data contracts. Kafka is a data storage and sharing technology that makes it easy for additional consumers and their pipelines to use the same data product. This is a powerful form of software reuse. Kafka and Flink can transform and shape data from one contract into a form that meets the requirements of another contract, all within the same shared infrastructure. You can deploy and manage Kafka yourself, or leverage a Kafka cloud service and let others manage it for you. Any data producer or consumer can be supported by Kafka, unlike strictly commercial products that limit the supported producers and consumers.

You could get enforcement via a single database if all the data managed by your contracts sat in that database. But applications today are often built using data from many sources. For example, data streaming applications often have multiple data producers streaming data to multiple consumers. Data contracts must be enforced across these different databases, APIs, and applications. You can specify a data contract at the producer end, collaborating with the producer to get the data in the form you need.
But enforcement at the producer end is intrusive and complex. Each data producer has its own authentication and security mechanisms, so the data contract architecture would need to be adapted to each producer, and every new producer added to the architecture would have to be accommodated. In addition, small changes to schema, metadata, and security happen continuously. With Kafka, these changes can be managed in one place.

Kafka sits between producers and consumers. With Kafka Schema Registry, producers and consumers have a way of communicating what is expected by their data contract. Because topics are reusable, the data contract may be reusable directly, or it could be incrementally modified and then reused.

Data contract enforcement in Kafka. (Image: Confluent)

Kafka also provides shared, standardized security and data infrastructure for all data producers. Schemas can be designed, managed, and enforced at Kafka’s edge, in cooperation with the data producer. Disruptive changes to the data contract can be detected and enforced there. Data contract implementation needs to be simple and built into existing tools, including continuous integration and continuous delivery (CI/CD). Kafka’s ubiquity, open source nature, scalability, and data reusability make it the de facto standard for providing reusable data products with data contracts.

Best practices for developers building data contracts

As a data engineer or developer, data contracts can help you deliver better software and user experiences at a lower cost. Here are a few best-practice guidelines as you start leveraging data contracts for your pipelines and data products.

Standardize schema formats: Use Avro or Protobuf for Kafka due to their strong typing and compatibility features. JSON Schema is a suitable alternative but less efficient.

Automate validation: Use CI/CD pipelines to validate schema changes against compatibility rules before deployment.
Make sure your code for configuring, initializing, and changing Kafka topic schemas is part of your CI/CD workflows and check-ins.

Version incrementally: Use semantic versioning (e.g., v1.0.0, v1.1.0) for schemas and document changes. This should be part of your CI/CD workflows and runtime compatibility checks.

Monitor and alert: Set up alerts for schema and type violations or data quality issues in Kafka topics or Flink jobs.

Collaborate across teams: Ensure producers and consumers (e.g., different teams’ Flink jobs) agree on the contract up front to avoid mismatches. Leverage collaboration tools (preferably graphical) that allow developers, business analysts, and data engineers to jointly define, refine, and evolve the contract specifications.

Test schema evolution: Simulate schema changes in a staging environment to verify compatibility with Kafka topics and Flink jobs.

Key capabilities for data contracts

Kafka and Flink provide a common language to define schemas, data types, and data quality rules. This common language is shared and understood by developers, and it can be independent of the particular data producer or consumer. Kafka and Flink have critical capabilities that make data contracts practical and widespread in your organization:

Broad support for potential data producers and consumers
Widespread adoption, usage, and understanding, partly due to their open source origins
Many implementations available, including on-prem, cloud-native, and BYOC (bring your own cloud)
The ability to operate at both small and large scales
Mechanisms to modify data contracts and their schemas as they evolve
Sophisticated mechanisms for evolving schemas and reusing data contracts when joining multiple streams, each with its own data contract

Data contracts require a new culture and mindset that encourage data producers to collaborate with data consumers.
Consumers need to design and describe their schemas and other data pipeline requirements in collaboration with producers, guided by developers and data architects. Kafka and Flink make it much easier to specify, implement, and enforce the data contracts your collaborative producers and consumers develop. Use them to get your data pipelines up and running faster and operating more efficiently, without downtime, while delivering more value to the business.

— New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

How to ensure your enterprise data is ‘AI ready’

Many organizations are experimenting with AI agents to determine which job roles to focus on, when to automate actions, and what steps require a human in the middle. AI agents connect the power of large language models with APIs, enabling them to take action and integrate seamlessly into employee workflows and customer experiences in a variety of domains:

Field operations AI agents can help outline the steps to address a service call.
HR agents partner with job recruiters to schedule interviews for top applicants.
Finance AI agents help respond to daily challenges in managing supply chain, procurement, and accounts receivable.
Coding agents are integrated into AI-assisted development platforms that facilitate vibe coding and accelerate application development.
Workplace AI agents participate in meetings, summarize discussions, create follow-up tasks, and schedule the next meetings.

World-class IT organizations are adapting their strategies and practices to develop AI agents while mitigating the risks associated with rapid deployments. “Building a world-class IT team means leading the conversation on risk,” says Rani Johnson, CIO of Workday. “We work closely with our legal, privacy, and security teams to set a clear adoption risk tolerance that aligns with our overall strategy.”

A key question for every technology, data, and business leader is whether the underlying data that AI agents tap into is “AI-ready.” According to Ocient’s Beyond Big Data report, 97% of leaders report notable increases in data processing due to AI, but only 33% have fully prepared for the escalating scale and complexity of the AI-driven workplace. Establishing data’s AI readiness is critical, as most AI agents leverage enterprise data to provide business-, industry-, and role-specific responses and recommendations.
I asked business and technology leaders how they were evaluating AI agents for data readiness in domains such as sales, HR, finance, and IT operations. Seven critical practices emerged.

Centralize data and intelligence

IT departments have invested significantly in centralizing data into data warehouses and data lakes, and in connecting resources with data fabrics. However, data is not equivalent to intelligence, as much of the data science and computational work occurs downstream in a sprawl of SaaS tools, data analytics platforms, and other citizen data science tools. Worse, numerous spreadsheets, presentations, and other unstructured documents are often poorly categorized and lack unified search capabilities.

“Instead of endlessly moving and transforming data, we need to bring intelligence directly to where the data lives, creating a journey to enterprise-ready data with context, trust, and quality built in at the source,” says Sushant Tripathi, VP and North America transformation lead at TCS. “This connected organizational intelligence weaves into the fabric of an enterprise, transforming fragmented information into trusted and unified assets so that AI agents can act with the speed and context of your best people, at enterprise scale.”

Even as IT looks to centralize data and intelligence, a backlog of data debt creates risks when using it in AI agents. “AI-ready data must go beyond volume and accuracy and be unified, trusted, and governed to foster reliable AI,” says Dan Yu, CMO of SAP data and analytics. “With the right business data fabric architecture, organizations can preserve context, mitigate bias, and embed accountability into every layer of AI. This foundation ensures accurate, auditable decisions and enables AI to scale and adapt on semantically rich, governed data products, delivering durable business value.”

Recommendation: Most organizations will have a continuous backlog of dataops and data debt to address.
Product-based IT organizations should manage data resources as products and develop roadmaps aligned with their AI priorities.

Ensure compliance with regulations and security standards

When it comes to data security, Jack Berkowitz, chief data officer at Securiti, advises starting by answering who should have access to any given piece of information flowing in or out of the genAI application, whether sensitive information is included in the content, and how this data and information are being processed or queried. He says, “As we move to agentic AI, which is actively able to do processing and take decisions, putting static or flat guardrails in place will fail.”

Guardrails are needed to help prevent rogue AI agents and to avoid using data in areas where the risks outweigh the benefits. “Most enterprises have a respectable security base with a secure SDLC, encryption at rest and in transit, role-based access control, data loss prevention, and adherence to regulations such as GDPR, HIPAA, and CCPA,” says Joanne Friedman, CEO of ReilAI. “That’s sufficient for traditional IT, but insufficient for AI, where data mutates quickly, usage patterns are emergent, and model behavior must be governed—not guessed.”

Recommendation: Friedman recommends establishing the following four pillars of AI risk-ready data:

Define an AI bill of materials.
Use a risk management framework such as NIST AI RMF or ISO 42001.
Treat genAI prompts as data and protect against prompt injection, data leakage, and related abuses.
Document AI with model cards and datasheets for datasets, including intended use, limitations, and other qualifications.

Define contextual metadata and annotations

AI language models can be fed multiple documents and data sources with conflicting information. When an employee’s prompt results in an erroneous response or hallucinations, they can respond with clarifications to close the gap.
However, with AI agents integrated into employee workflows and customer journeys, the stakes of poor recommendations and incorrect actions are significantly higher. An AI agent’s accuracy improves when documents and data sources include rich metadata and annotations signaling how to use the underlying information responsibly.

“The AI needs to be able to understand the meaning behind the data by adding a semantic layer, which is like a universal dictionary for your data,” says Andreas Blumauer, SVP growth and marketing at Graphwise. “This layer uses consistent labels, metadata, and annotations to tell the AI what each piece of data represents, linking it directly to your business concepts and questions. This is also where you include specific industry knowledge, or domain knowledge models, so the AI understands the context of your business.”

Recommendation: Leverage industry-specific taxonomies and categorization standards, then apply a metadata standard such as Dublin Core, Schema.org, PROV-O, or XMP.

Review the statistical significance of unbiased data

Surveys are a primary tool of market research. Researchers design the survey’s questions and answers according to best practices that minimize the exposure of biases to the respondent. For example, asking employees who use the service desk, “How satisfied are you with our excellent help desk team’s quick response times?” is biased because the words excellent and quick imply a subjective standard. Another challenge for researchers is ensuring a significant sample size for all respondent segments. For example, it would be misleading to report on executive response to the service desk survey if only a handful of people in this segment responded to it. When reviewing data for use in AI, it is even more important to consider statistical significance and data biases, especially when the data in question underpins an AI agent’s decision-making.
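One simple way to quantify the segment-level bias described above is demographic parity: compare the favorable-outcome rate across groups, and refuse to report on segments with too few responses. The sketch below uses hypothetical survey data and a hypothetical minimum sample size:

```python
def demographic_parity_gap(outcomes_by_group):
    """outcomes_by_group: {group: list of 0/1 outcomes}. Returns (gap, rates)."""
    rates = {g: sum(v) / len(v) for g, v in outcomes_by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

survey = {
    "staff":      [1, 1, 0, 1, 1, 0, 1, 1],  # 6 of 8 satisfied
    "executives": [1, 0],                    # only 2 responses
}

gap, rates = demographic_parity_gap(survey)
print(rates)          # {'staff': 0.75, 'executives': 0.5}
print(round(gap, 2))  # 0.25

# Flag segments below a minimum sample size before reporting their rate;
# with two respondents, the executive rate is noise, not signal.
MIN_N = 5
too_small = [g for g, v in survey.items() if len(v) < MIN_N]
print(too_small)      # ['executives']
```

Equalized odds extends the same idea by comparing error rates (false positives and false negatives) across groups rather than raw outcome rates.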
“AI-ready data requires more than conventional quality frameworks, demanding statistical rigor that encompasses comprehensive bias audits with equalized odds, distributional stability testing, and causal identifiability frameworks that enable counterfactual reasoning,” says Shanti Greene, head of data science at AnswerRocket and adjunct professor at Washington University. “Organizations pursuing transformational outcomes through sophisticated generative models paradoxically remain constrained by data infrastructures exhibiting insufficient volume for edge-case coverage. AI systems remain bounded by statistical foundations, proving that models trained on deficient data can generate confident hallucinations that masquerade as authoritative intelligence.”

Recommendation: Understanding and documenting data biases should be a data governance non-negotiable. Applicable common fairness metrics include demographic parity and equalized odds, while p-value testing is used for statistical significance.

Benchmark and review data quality metrics

Data quality metrics focus on a dataset’s accuracy, completeness, consistency, timeliness, uniqueness, and validity. JG Chirapurath, president of DataPelago, recommends tracking the following:

Data completeness: Fewer than 5% of entries for any critical field may be blank or missing for the data to be considered complete.
Statistical drift: If any key statistic changes by more than 2% compared to expected values, the data is flagged for human review.
Bias ratios: If a group or segment experiences outcomes that are more than 20% different from those of another group or segment, the data is flagged for human review.
Golden data sets: AI outputs must achieve greater than 90% agreement with human-verified ground truth on sample subsets.
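Thresholds like these can be wired into automated checks. The following sketch uses hypothetical data and field names; real pipelines would compute the same ratios from warehouse tables rather than inline lists:

```python
def completeness_ok(rows, field, max_missing=0.05):
    # Completeness: under 5% of entries for a critical field may be missing.
    missing = sum(1 for r in rows if r.get(field) in (None, ""))
    return missing / len(rows) <= max_missing

def drift_ok(expected, observed, max_drift=0.02):
    # Statistical drift: flag if a key statistic moves more than 2%.
    return abs(observed - expected) / abs(expected) <= max_drift

def golden_set_ok(ai_outputs, ground_truth, min_agreement=0.90):
    # Golden data set: require >90% agreement with human-verified labels.
    agree = sum(a == g for a, g in zip(ai_outputs, ground_truth))
    return agree / len(ground_truth) > min_agreement

rows = [{"email": "a@x.com"}, {"email": ""}, {"email": "b@x.com"},
        {"email": "c@x.com"}]                    # 1 of 4 missing = 25%
print(completeness_ok(rows, "email"))            # False: fails 5% threshold
print(drift_ok(expected=100.0, observed=101.5))  # True: 1.5% drift
print(golden_set_ok([1, 1, 0, 1], [1, 1, 0, 0])) # False: 75% agreement
```

Checks like these belong in the same CI/CD or scheduled-job machinery that gates schema changes, so a failing dataset is flagged for human review before any agent consumes it.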
Rajeev Butani, chairman and CEO of MediaMint, adds, “Organizations can measure readiness with metrics like null and duplicate rates, schema and taxonomy consistency, freshness against SLAs, and reconciliation variance between booked, delivered, and invoiced records. Bias and risk can be tested through consent coverage, PII exposure scores, and retention or deletion checks.”

Recommendation: Selecting data quality metrics and calculating a composite data health score is a common feature of data catalogs that helps build trust in using datasets for AI and decision-making. Data governance leaders should communicate target benchmarks and establish a review process for datasets that fall below data quality standards.

Establish data classification, lineage, and provenance

Looking beyond data quality, key data governance practices include classifying data for IP and privacy, and establishing data’s lineage and provenance. “The future is about governing AI agents as non-human identities that are registered, accountable, and subject to the same discipline as people in an identity system,” says Matt Carroll, founder and CEO of Immuta. “This requires classifying information into risk tiers, building in checkpoints for when human oversight is essential, and allowing low-risk interactions to flow freely.”

Geoff Webb, VP of product and portfolio marketing at Conga, shares two key factors that must be carefully evaluated before trusting the results of any agentic workflows:

Data provenance refers to the origin of the data. Can the source be trusted, and how did that data become part of the dataset you are using?
Data chronology refers to how old the data is. Avoid training models on data that is no longer relevant to the objectives, or that may reflect outdated working practices, non-compliant processes, or simply poor business practices from the past.

Recommendation: Regulated industries have a long history of maturing data governance practices.
For companies lagging in these disciplines, data classification is an essential starting point.

Create human-in-the-middle feedback loops

As organizations use more datasets in AI, it is essential to have ongoing validation of the accuracy of AI language models and agents by subject matter experts and other end-users. Dataops should extend feedback on AI to the underlying data sources to help prioritize improvements and identify areas to be enriched with new datasets.

“In our call centers, we’re not just listening to customer interactions, we’re also feeding that qualitative data back into engineering teams to reshape how experiences are designed,” says Ryan Downing, VP and CIO of enterprise business solutions at Principal Financial Group. “We measure how people interact with AI-infused solutions and how those interactions correlate with downstream behaviors, for example, whether someone still needed to call us after using the mobile app.”

Recommendation: Unstructured datasets and those capturing people’s opinions and sentiments are most prone to variance that statistical methods may not easily validate. When people report odd responses from AI models built on this data, it’s essential to trace back to the root causes in the data, especially since many AI models are not fully explainable.

Automate a data readiness checklist

Guy Adams, CTO of DataOps.live, says, “AI-ready data isn’t just good data; it’s good data that’s been productized, governed, and delivered with the correct context so it can be trusted by AI systems today—and reused for the AI use cases we haven’t even imagined yet.”

Organizations that invest heavily in AI agents and other AI capabilities will first ensure their data is ready and then automate a checklist for ongoing validation. The bar should be raised for any dataset’s AI readiness when that data is used for more mission-critical workflows and revenue-impacting customer experiences at greater scales.
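One common way to automate such a checklist is to roll individual quality metrics into the kind of composite data health score mentioned earlier. The metric names, weights, and passing bar below are illustrative examples, not a standard.

```python
# Illustrative composite "data health score": a weighted average of
# per-metric scores in [0, 1]. Metrics, weights, and the 0.90 readiness
# bar are hypothetical examples, not an industry standard.

def health_score(metrics, weights):
    """metrics: name -> score in [0, 1]; weights: name -> relative weight."""
    total = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total

metrics = {
    "completeness": 0.97,        # share of non-null critical fields
    "uniqueness": 0.99,          # 1 - duplicate rate
    "freshness": 0.90,           # share of records meeting the freshness SLA
    "schema_consistency": 1.0,   # share of records matching the schema
}
weights = {"completeness": 3, "uniqueness": 2, "freshness": 2, "schema_consistency": 1}

score = health_score(metrics, weights)   # 0.96125 for these example inputs
ready = score >= 0.90                    # dataset passes the example AI-readiness bar
```

A score below the bar would trigger the review process the recommendation above describes, with the bar raised for mission-critical workloads.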
Read More

The ripple effects of a VPN ban

Michigan and Wisconsin are considering proposals that would ban the use of virtual private networks (VPNs) by requiring internet providers to block these encrypted connections. The stated rationale is to control how users access certain online materials, but such a ban would upend the technical foundation of modern work, learning, and communication far beyond any single issue.

VPNs are not simply niche tools or workarounds. They’re the invisible infrastructure that underpins the security, productivity, and connectivity of countless institutions and individuals worldwide. If states implemented a broad VPN ban, the day-to-day operations of businesses, schools, and residents would be severely affected.

The wide reliance on VPNs

Nearly every organization, from large multinational tech companies to small accounting firms, relies on VPNs to protect sensitive operations. In a world of distributed teams, cloud-based applications, and bring-your-own-device workplaces, the only way to keep sensitive company data secure as it moves across public networks is through encrypted VPN connections.

Cloud computing forms the foundation of most business activities. Whether employees are accessing files, databases, or proprietary applications, they often do so through the cloud. Remote workers, traveling employees, or anyone logging in from outside the office requires a VPN to establish a secure connection and protect their activity and the company’s sensitive assets from cyberthreats. Removing VPNs cuts the essential link between remote users and their digital workspace.

The consequences would be immediate and serious: Companies would need to recall staff to physical offices, risking the loss of talent and drops in productivity, or shift entire operations to more tech-friendly locations. For smaller businesses without the resources to handle these sudden challenges, the impact could be existential.

VPNs are as essential to educational institutions as they are to businesses.
Universities, colleges, and even K-12 districts use VPNs to allow students and faculty to access research databases, library archives, and administrative systems from anywhere in the world. The University of Michigan’s own VPN is a crucial tool that enables students and staff to connect securely even when using non-university internet providers.

A ban would prevent students from doing coursework remotely, block faculty from accessing grading portals or academic data anywhere off campus, and make it extremely difficult for school IT teams to maintain security. Academic collaboration—both with colleagues at other institutions within the state and with international peers—would be hindered, isolating campuses at a time when global connectivity has never been more important.

Losing critical privacy and access

For regular internet users, VPNs are a fundamental privacy and security tool, as ordinary as having a phone number and as sensible as locking your mailbox. They prevent third parties from tracking your activity, profiling your location, or creating a detailed record of your browsing history. Public Wi-Fi at coffee shops, airports, or hotels remains a top target for attackers. VPNs mitigate many of these risks, providing users with an important layer of protection.

Users traveling across states or countries rely on VPNs to securely access their home services, bank accounts, and private communications. Freelancers, consultants, medical professionals, and legal experts—anyone who frequently moves between client sites—would be unable to securely connect to their own files or confidential portals.

From a purely technical perspective, attempts to restrict VPNs create problems that are much bigger than the ones they claim to fix. Websites cannot reliably tell whether a VPN connection is coming from a particular state or even another country. If just a few states ban VPNs, sites that face legal risks are likely to block all VPN access globally to avoid accidental violations.
This means VPN users everywhere could lose access to vital sites and services simply because of a law in one state. Such broad effects show how a technical policy, made without understanding operational realities, can cause widespread disruption across the internet.

Productivity and security at risk

The unintended consequences of a VPN ban reach well beyond state borders and far beyond the original lawmaking intentions. Without VPNs:

- Businesses lose the option of remote work—and with it, the flexibility and efficiency today’s economy requires.
- Educational institutions and students are cut off from essential resources and collaboration tools.
- Everyday users are exposed to cyberthreats, tracking, and data breaches when using public networks.
- Vulnerable populations, such as journalists, advocates, and individuals relying on privacy for their safety, are deprived of vital digital protections.

Additionally, VPNs are the foundation of many compliance systems, including those overseeing financial data, health records, and legal documents. A ban could lead to legal and regulatory issues for companies trying to stay in good standing.

Informed policy and practical solutions

The debates in Michigan and Wisconsin over VPN access aren’t just about a single technology. They grapple with how societies balance security, productivity, privacy, and economic competitiveness in the digital age. Instead of limiting key security tools, states should focus on promoting cybersecurity education, strengthening tech infrastructure, and implementing smart digital policies that acknowledge the vital role VPNs play in modern life.

The digital world requires thoughtful legislation that helps people and organizations thrive online rather than broad bans that make the internet less useful, secure, and productive for everyone.
If Wisconsin and Michigan truly aim to attract business, research, and innovation, maintaining secure, private, and open access to essential technologies like VPNs is a key step.
Read More

Qdrant vector database adds tiered multitenancy

Qdrant has released Qdrant 1.16, an update of the Qdrant open source vector database that introduces tiered multitenancy, a capability intended to help isolate heavy-traffic tenants, boost performance, and scale search workloads more efficiently.

Announced November 19, Qdrant 1.16 also offers ACORN, a search algorithm that improves the quality of filtered vector search in cases of multiple filters with weak selectivity, Qdrant said. To upgrade, users can go to Qdrant Cloud, open the Cluster Details screen, and select Qdrant 1.16 from the dropdown menu.

Tiered multitenancy enables the combining of small and large tenants in a single collection, with the ability to promote growing tenants to dedicated shards, Qdrant said. Multitenancy is a common requirement for SaaS applications, where multiple customers, or tenants, share a database instance. When an instance is shared between multiple users in Qdrant, vectors may need to be partitioned by user.

The main principles behind tiered multitenancy are user-defined sharding, fallback shards, and tenant promotion, Qdrant said. User-defined sharding enables users to create named shards within a collection, allowing large tenants to be isolated in their own shards. Fallback shards are a routing mechanism that allows Qdrant to route a request to a dedicated shard or a shared fallback shard. Tenant promotion is a mechanism that moves a tenant from a shared fallback shard to its own dedicated shard once it has grown large enough.

ACORN stands for ANN (Approximate Nearest Neighbor) Constraint-Optimized Retrieval Network. This capability offers improved vector search in cases of multiple filters with weak selectivity, according to Qdrant.
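The routing principle behind fallback shards and tenant promotion can be sketched in plain Python. This models the idea described above only; it is not Qdrant’s actual API or internals, and all names are hypothetical.

```python
# Plain-Python model of tiered-multitenancy routing: small tenants share a
# fallback shard; tenants that outgrow it are promoted to dedicated shards.
# This illustrates the principle only, not Qdrant's real implementation.

class TenantRouter:
    def __init__(self, fallback_shard="shared_fallback"):
        self.fallback = fallback_shard
        self.dedicated = {}  # tenant_id -> dedicated shard name

    def shard_for(self, tenant_id):
        """Route to a tenant's dedicated shard if one exists, else the shared fallback."""
        return self.dedicated.get(tenant_id, self.fallback)

    def promote(self, tenant_id):
        """Give a tenant that has grown large enough its own dedicated shard."""
        self.dedicated[tenant_id] = f"shard_{tenant_id}"

router = TenantRouter()
router.shard_for("acme")   # -> "shared_fallback" while the tenant is small
router.promote("acme")     # tenant has grown; later requests route to "shard_acme"
```

The appeal of the tiered design is that promotion changes only the routing table, so small tenants keep sharing resources while heavy tenants are isolated.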
With ACORN enabled, Qdrant not only traverses direct neighbors in its HNSW (Hierarchical Navigable Small World) graph-based index but also examines neighbors of neighbors if direct neighbors have been filtered out. This improves search accuracy but at the expense of performance, especially when multiple low-selectivity filters are applied. Because ACORN is slower (approximately 2x to 10x slower in typical scenarios) but improves recall (i.e., accuracy) for restrictive filters, tuning this parameter is about deciding when the accuracy improvement justifies the performance cost, the company said. Qdrant has published a decision matrix on when to use ACORN.

Qdrant 1.16 also features a revamped UI intended to offer a fresh new look and an improved user experience. The new design includes a welcome page that offers quick access to tutorials and reference documentation, as well as redesigned Point, Visualize, and Graph views in the Collections Manager. The redesign makes it easier to work with data by presenting it in a more compact format. In the tutorials, code snippets are now executed inline, freeing up screen space for better usability, Qdrant said.

Also featured in Qdrant 1.16 are a new HNSW index storage mode, which enables more efficient disk-based vector search; a conditional update API, which facilitates easier migration of embedding models to a new version; and improved full-text search capabilities, with a new text_any condition and ASCII folding support.
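The neighbors-of-neighbors expansion that ACORN applies when filters knock out direct neighbors can be illustrated with a toy graph. This sketch models the traversal principle only, not Qdrant’s HNSW implementation; the graph and filter are invented for the example.

```python
# Toy sketch of the ACORN idea on a plain adjacency-list graph: when a
# direct neighbor is filtered out, also consider that neighbor's neighbors
# instead of dead-ending. Not Qdrant's actual HNSW code.

def candidate_neighbors(graph, node, passes_filter):
    """Direct neighbors that pass the filter, plus neighbors-of-neighbors
    reached through filtered-out direct neighbors."""
    candidates = set()
    for n in graph[node]:
        if passes_filter(n):
            candidates.add(n)
        else:
            # ACORN-style expansion: look one hop further through the
            # filtered-out neighbor.
            candidates.update(m for m in graph[n] if passes_filter(m) and m != node)
    return candidates

graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
allowed = {3, 4}.__contains__  # a restrictive filter: only nodes 3 and 4 pass
```

With this filter, plain traversal from node 0 finds no passing direct neighbors at all, while the ACORN-style expansion still reaches nodes 3 and 4 through the filtered-out ones, which is exactly the recall-versus-cost trade-off described above.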
Read More

Contagious Interview attackers go ‘full stack’ to fool you

Researchers at Socket have uncovered more details of a sophisticated software supply-chain operation linked to the Contagious Interview campaign attacking developers who rely on packages from NPM. They report finding a “full stack” operation behind the attacks, where code hosting, package distribution, staging servers, and command-and-control (C2) infrastructure are orchestrated much like a legitimate software development and delivery pipeline — and offer honest developers fresh advice on protecting themselves against the attacks.

In the latest wave, threat actors uploaded almost 200 new malicious NPM packages, with more than 31,000 recorded downloads. The campaign lures victims with fake job interviews and coding assignments related to Web3 and blockchain projects, asking them to pull dependencies for a “test project.” But the NPM packages they install are Trojan horses. The latest packages identified by Socket ultimately deliver a new payload with upgraded credential theft, system monitoring, and remote access capabilities, enabling attackers to take over developers’ accounts and machines.

Point defense

Based on its latest analysis, Socket advised developers to focus on the weak points this campaign exploits: treat every “npm install” as potential remote code execution, restrict what continuous-integration runners can access, enforce network egress controls, and review the code of any new templates or utilities pulled from GitHub. Teams should also scrutinize unfamiliar helper packages, pin known-good versions, and use lockfiles instead of auto-updating dependencies, it advised.

Automated package analysis can further reduce risk, with real-time scans catching threats including import-time loaders, network probing, and bulk data exfiltration before they hit developer machines or CI systems. With these checks in place, dependency onboarding and code review become effective filters for blocking Contagious Interview-style attacks early, Socket said.
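A minimal hardening baseline along these lines is to disable install-time lifecycle scripts by default, which is how the post-install payloads in this campaign execute. This is an illustrative project configuration using standard npm settings, not Socket’s prescribed setup:

```ini
# .npmrc — disable npm lifecycle scripts by default, so a malicious
# postinstall hook in a Trojanized dependency cannot run automatically
ignore-scripts=true
```

With that in place, installing via `npm ci` pulls exactly the versions pinned in `package-lock.json` rather than auto-updating, and scripts for packages you have vetted can still be run explicitly when genuinely needed.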
Coding tasks lead to malware delivery

These defensive measures are effective because Contagious Interview’s entry vector relies heavily on social engineering, using fake interview tasks to trick developers into installing compromised dependencies. The campaign exploits NPM, a widely used package registry for JavaScript and Node.js, by publishing packages that appear benign but carry hidden payloads. The malicious packages, including one named “tailwind-magic,” mimic legitimate libraries (in this case, a typosquatted version of the genuine “tailwind-merge” utility) to avoid suspicion.

When an unsuspecting developer installs such a package, a post-install script triggers and reaches out to a staging endpoint hosted on Vercel. That endpoint in turn delivers a live payload fetched from a threat-actor-controlled GitHub account named “stardev0914”. From there the payload, a variant of OtterCookie that also folds in capabilities from the campaign’s other signature payload, BeaverTail, executes and establishes a remote connection to the attackers’ control server. The malware then silently harvests credentials, crypto-wallet data, browser profiles, and more.

“Tracing the malicious npm package tailwind-magic led us to a Vercel-hosted staging endpoint, tetrismic[.]vercel[.]app, and from there to the threat-actor-controlled GitHub account, which contained 18 repositories,” Socket’s senior threat intelligence analyst Kirill Boychenko said in a blog post, crediting related research by Kieran Miyamoto that helped confirm the malicious GitHub account stardev0914.

A ‘full stack’ adversary: GitHub, Vercel, and NPM

What makes this campaign stand out is the layered infrastructure behind it. Socket’s analysis traced not just the NPM packages but also how the attackers built a complete delivery pipeline: malware-serving repositories on GitHub, staging servers on Vercel, and separate C2 servers for exfiltration and remote command execution.
Through this setup, attackers can rotate payloads, update malware unobtrusively, and tailor deployments per target—all while blending deeply into the legitimate developer ecosystem, according to Boychenko. Once installed, OtterCookie doesn’t just run and vanish: It remains persistent, capable of logging keystrokes, hijacking the clipboard, scanning the filesystem, capturing screenshots, and grabbing browser and wallet credentials across Windows, macOS and Linux. The campaign actors’ intensified NPM activity arrives at a worrying moment for the JavaScript and open-source ecosystem. In recent months, the community has seen a flurry of NPM-based attacks — including worm-style campaigns that transformed popular packages into Trojan horses, automated credential theft, and widespread supply chain compromise across both development and CI environments.
Read More

How to succeed as an independent software developer

Success as an independent software developer requires a lot of preparation and hard work, as well as some luck. But as baseball executive Branch Rickey once said, luck is the residue of design.

Income for freelance developers varies depending on factors such as location, experience, skills, and project type. Average pay for a contractor is about $111,800 annually, according to ZipRecruiter, with top earners potentially making more than $151,000. That’s in line with what developers in general can expect to make, based on U.S. Bureau of Labor Statistics figures for median pay in 2024, the most recent figure available as of this writing.

So, what does it take to succeed as a freelancer in the tech industry? I asked five successful independent developers how they did it.

1. Become a business

Creating a formal business can be a good way to attract new clients and retain existing ones.

“One of the most important ways to succeed as an independent developer is to treat yourself like a business,” says Darian Shimy, CEO of FutureFund, a fundraising platform built for K-12 schools, and a software engineer by trade.

“That means setting up an LLC or sole proprietorship, separating your personal and business finances, and using invoicing and tax tools that make it easier to stay compliant,” Shimy says. “For some people, it might feel like overkill or unnecessary overhead at first. But that type of structure will help give your clients confidence and save you a few headaches down the road.”

Independent developers often underestimate the value of structure, says Sonu Kapoor, who has worked as an independent software engineer for more than two decades, architecting front ends for Citigroup’s global trading platform, leading RFID integration at American Apparel, and modernizing enterprise stacks for Sony Music Publishing and Cisco.

“For individual developers, the difference between staying small and landing enterprise-scale work often comes down to perception,” Kapoor says.
“Early on, I treated my freelance work like a company, registering a limited entity, keeping separate finances, and using professional tools like QuickBooks and HubSpot. But what really moved the needle was building relationships with senior leaders inside companies like Citigroup and Sony Music Publishing. Enterprises rarely hire individuals directly; contracts usually flow through vendors.”

Kapoor focused on networking with decision-makers, showcasing credibility through his previous work and thought leadership. “That combination of structure and relationships opened doors that pure technical skill alone never could,” he says. “Running my freelance career as a structured business with processes, relationships, and professional credibility turned those introductions into sustained partnerships. It’s not about pretending to be a big company; it’s about operating with the same reliability as one.”

2. Find your niche

Being a jack of all trades in the development world can be helpful for working on broad projects. But for some, success comes with specialization.

“The biggest leap in my independent career came when I stopped spreading myself thin across frameworks and committed fully to Angular,” Kapoor says. “That focus reshaped my professional identity, leading to an invitation to join a private group of 11 global Angular collaborators who work directly with Google’s core team.”

Soon after, Kapoor was recognized as a Google Developer Expert, which opened doors to speaking, consulting, and global visibility. This included being featured on a Topmate billboard in Times Square, New York City, highlighting his work on Angular and AI.

“That depth also brought new opportunities organically,” Kapoor says. After seeing his work as a technical editor and contributor in the developer publishing space, Apress approached him to author a book on Angular Signals.
“It was a full-circle moment, recognition not just for coding expertise, but for shaping how developers learn emerging technologies,” Kapoor says. “Specialization builds identity. Once your expertise becomes synonymous with progress in a field, opportunities—whether projects, media, or publishing—start coming to you.”

Shimy of FutureFund followed a similar arc. At first, “I really tried to be everything to everyone,” he says. “It’s a similar outlook a lot of agencies have—do we want to specialize in one or two areas or be ‘decent’ at five or six things? A niche helps you stand out, build a reputation, and get referrals more easily.”

3. Build authority through visible contributions

Publishing open source work and becoming known for thought leadership creates leverage and new opportunities for independent developers, Kapoor says.

“Early in my career, I launched DotNetSlackers, a technical community that reached over 33 million views and became one of the top destinations for .NET content,” he says. “I didn’t realize it at the time, but that reach was more powerful than any marketing budget.”

CTOs and engineering managers started discovering Kapoor’s work organically, he says. “One of my first major enterprise contracts came from a client who had been reading my posts for months before reaching out,” he says.

That same principle carried forward when Kapoor shifted to Angular. “Using open source, I created over 100 code changes within one year in the Angular repository,” he says. “Contributing to Angular’s Typed Forms, which became the most upvoted feature request in Angular history, put my work in front of the global developer community and directly led to my Microsoft MVP and later Google Developer Expert recognitions.”

Each visible contribution, whether it’s an open source library, conference talk, or published article in CODE Magazine, helps create credibility for independent developers, Kapoor says.
“Developers often underestimate how far a single well-documented idea can travel,” he says. “One blog post can bring in client leads years later. In my case, it’s created a steady loop of media visibility, consulting opportunities, and technical recognition that continues to grow long after the initial effort.”

4. Prioritize communication to build relationships

Freelancers in any field need to know how to communicate well, whether it’s through the written word or conversations with clients and colleagues. If a developer communicates poorly, even great talent might not make the difference in landing gigs.

“My main tip, having been an indie developer for several years and now the CEO of a development agency, is to always communicate clearly and thoroughly,” says Lisa Freeman, CEO at 18a, a web design, development, and hosting services provider.

“We work with the same clients for years—some over a decade—and that’s because of how we communicate,” Freeman says. “It’s easier to keep clients you’ve got than constantly [needing] to win new ones, as the competition is fierce nowadays.”

A relationship with a client is as important as, if not more important than, the code produced, Freeman says. “Don’t bamboozle them with complicated things they don’t need, but explain why you’ve done things the way you have,” she says.

One area where Freeman often sees developers falling short is in communicating to clients where they’ve added value. “If a client asks for something and the developer does it in a way that makes things quicker another day, or helps solve another issue…that all needs to be highlighted,” she says. “It often doesn’t seem worth mentioning, but honestly, these little extras just help build a better impression in the mind of the client and keep them coming back to you.”

A key to good communication is to practice translating technical jargon into something more approachable, says Mia Kotalik, who became a full-time freelance developer in 2022.
“You can’t win trust by drowning non-technical clients in tech jargon,” Kotalik says. “It makes people feel talked down to and reluctant to engage with you. Explain concepts in non-technical language first, then introduce key terms with one-line definitions so clients feel informed, not overwhelmed. This skill is a differentiator, and arguably the most important one: Clients understand the plan, feel respected, and still see that you’re technically rigorous.”

5. Create a portfolio of your work

A portfolio of work tells the story of what you bring to the table. It’s the main way to showcase your software development skills and experience, and is a key tool in attracting clients and projects. A good portfolio supplements a resume and other materials needed to demonstrate what you are capable of doing.

“You will need customers who are willing to take a risk on an independent developer,” says Brad Weber, founder and president of InspiringApps, a company that designs and builds custom digital products, and previously an independent developer for 12 years.

“Minimize their risk by having similar work you can point to for reference,” Weber says. “If that sounds like a Catch-22 when you are starting out, it is. I found it effective to do work for free or at a greatly reduced price for friends, family, and not-for-profit organizations.”

Independent software developers who are just starting out don’t even need to wait for a client to build a portfolio, Kotalik says. “Make apps and websites in your free time,” she says. “I built my first sites on my own time for free; by the second hobby project, paid clients started reaching out.”
Read More

How much will openness matter to AI?

Sometimes in tech we misunderstand our history. For example, because Linux eventually commoditized the Unix wars, and because Apache and Kubernetes became the standard plumbing of the web, we assume that “openness” is an inevitable force of nature. The narrative is comforting; it’s also mostly wrong. At least, it’s not completely correct in the ways advocates sometimes suppose.

When open source wins, it’s not because it’s morally superior or because “many eyes make all bugs shallow” (Linus’s Law). It dominates when a technology becomes infrastructure that everyone needs but no one wants to compete on. Look at the server operating system market. Linux won because the operating system became a commodity. There was no competitive advantage in building a better proprietary kernel than your neighbor; the value moved up the stack to the applications. So, companies like Google, Facebook, and Amazon poured resources into Linux, effectively sharing the maintenance cost of the boring stuff so they could compete on the interesting stuff where data and scale matter most (search, social graphs, cloud services).

This brings us to artificial intelligence. Open source advocates point to the explosion of “open weights” models like Meta’s Llama or the impressive efficiency of DeepSeek’s open source movement, and they declare that the closed era of OpenAI and Google is already over. But if you look at the actual money changing hands, the data tells a different, much more interesting story, one with a continued interplay between open and closed source.

Losing $25 billion

A recent, fascinating report by Frank Nagle (Harvard/Linux Foundation) titled “The Latent Role of Open Models in the AI Economy” attempts to quantify this disconnect. Nagle’s team analyzed data from OpenRouter and found a staggering inefficiency in the market. Today’s open models routinely achieve 90% (or more) of the performance of closed models while costing about one-sixth as much to run.
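Taken at face value, those two ratios make for a simple back-of-the-envelope calculation. The spend figure below is a hypothetical input chosen for illustration; only the 90% and one-sixth ratios come from the report as quoted above.

```python
# Back-of-the-envelope: what the quoted ratios imply for a hypothetical
# annual inference budget. The $1M spend is invented for illustration;
# the 1/6 cost and 90% performance ratios are the ones quoted above.

closed_spend = 1_000_000        # hypothetical annual closed-model spend ($)
open_cost_ratio = 1 / 6         # open models cost ~1/6 as much to run
open_performance = 0.90         # ...at ~90% of the performance

open_spend = closed_spend * open_cost_ratio
savings = closed_spend - open_spend          # ~$833K left on the table per $1M

# cost per "unit of performance" for each option
closed_cost_per_perf = closed_spend / 1.0
open_cost_per_perf = open_spend / open_performance
```

Even after discounting open models for their lower performance, the cost per unit of performance is still several times lower, which is the inefficiency Nagle's report scales up to roughly $24.8 billion market-wide.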
In a purely rational economic environment, enterprises should be abandoning GPT-4 for Llama 3 en masse. Nagle estimates that by sticking with expensive closed models, the global market is leaving roughly $24.8 billion on the table annually. The academic conclusion is that this is a temporary market failure, a result of “information asymmetry” or “brand trust.” The implication is that once CIOs realize they are overpaying, they will switch to open source, and the proprietary giants will topple.

Don’t bet on it. To understand why companies are happily “wasting” $24 billion, and why AI will likely remain a hybrid of open code and closed services, we have to stop looking at AI through the lens of 1990s software development. As I’ve written, open source isn’t going to save AI because the physics of AI are fundamentally different from the physics of traditional software.

The convenience premium

In the early 2010s, we saw a similar “inefficiency” with the rise of cloud computing. You could download the exact same open source software that AWS was selling—MySQL, Linux, Apache—and run it yourself for free. Yet, as I noted, developers and enterprises flocked to the cloud, paying a massive premium for the privilege of not managing the software themselves. Convenience trumps code freedom. Every single time.

The $24 billion “loss” Nagle identifies isn’t wasted money; it is the price of convenience, indemnification, and reliability. When an enterprise pays OpenAI or Anthropic, they aren’t just buying token generation. They are buying a service-level agreement (SLA). They are buying safety filters. They are buying the ability to sue someone if the model hallucinates something libelous. You cannot sue a GitHub repository.

This is where the “openness wins” argument runs into reality. In the AI stack, the model weights are becoming “undifferentiated heavy lifting,” the boring infrastructure that everyone needs but no one wants to manage.
The service layer (the reasoning loops, the integration, the legal air cover) is where the value lives. That layer will likely remain closed.

The ‘community’ that wasn’t

There is a deeper structural problem with the “Linux of AI” analogy. Linux won because it harnessed a large, decentralized community of contributors. The barrier to entry for contributing to a large language model (LLM) is much higher. You can fix a bug in the Linux kernel on a laptop. You cannot fix a hallucination in a 70-billion-parameter model without access to the original training data and a compute cluster that costs more than any individual developer can afford, unless you’re Elon Musk or Bill Gates.

There is also a talent inversion at play. In the Linux era, the best developers were scattered, making open source the best way to collaborate. In the AI era, the scarce talent—the researchers who understand the math behind the magic—is being hoarded inside the walled gardens of Google and OpenAI.

This changes the definition of “open.” When Meta releases Llama, the license is almost immaterial because of the barriers to running and testing that code at scale. They are not inviting you to co-create the next version. This is “source available” distribution, not open source development, regardless of the license. The contribution loop for AI models is broken. If the “community” (we invoke that nebulous word far too casually) cannot effectively patch, train, or fork the model without millions of dollars in hardware, then the model is not truly open in the way that matters for long-term sustainability.

So why are Meta, Mistral, and DeepSeek releasing these powerful models for free? As I’ve written for years, open source is selfish. Companies contribute to open source not out of charity, but because it commoditizes a competitor’s product while freeing up resources to charge more for their proprietary products.
If the intelligence layer becomes free, the value shifts to the proprietary platforms that use that intelligence (conveniently, Meta owns a few of these, such as Facebook, Instagram, and WhatsApp).

Splitting the market into open and closed

We are heading toward a messy, hybrid future. The binary distinction between open and proprietary is dissolving into a spectrum of open weights, open data (rare), and fully closed services. Here is how I see the stack shaking out.

Base models will be open. The difference between GPT-4 and Llama 3 is already negligible for most business tasks. As Nagle’s data shows, the catch-up speed is accelerating. Just as you don’t pay for a TCP/IP stack, you soon won’t pay for raw token generation. This area will be dominated by players like Meta and DeepSeek that benefit from the ecosystem chaos.

The real money will shift to the data layer, which will continue to be closed. You might have the model, but if you don’t have the proprietary data to fine-tune it for medical diagnostics, legal discovery, or supply chain logistics, the model is a toy. Companies will guard their data sets with far more ferocity than they ever guarded their source code.

The reasoning and agentic layer will also stay closed, and that’s where the high-margin revenue will hide. It’s not about generating text; it’s about doing things. The agents that can autonomously navigate your Salesforce instance, negotiate a contract, or update your ERP system will be proprietary because they require complex, tightly coupled integrations and liability shields. Enterprises will also pay for the tools that ensure they aren’t accidentally leaking intellectual property or generating hate speech—stuff like observability, safety, and governance. The model might be free, but the guardrails will cost you.

Following the money

Frank Nagle’s report correctly identifies that open models are technically competitive and economically superior in a vacuum. But business doesn’t happen in a vacuum.
It happens in a boardroom where risk, convenience, and speed dictate decisions. The history of open source is not a straight line toward total openness. It is a jagged line where code becomes free and services become expensive. AI will be no different.

The future is the same as it ever was: open components powering closed services. The winners won’t be the ideological purists. The winners will be the pragmatists who take the free, open models, wrap them in proprietary data and safety protocols, and sell them back to the enterprise at a premium. That $24 billion gap is just going to be reallocated to the companies that solve the “last mile” problem of AI, a problem that open source, for all its many virtues, has never been particularly good at solving.

AWS launches Flexible Training Plans for inference endpoints in SageMaker AI

AWS has launched Flexible Training Plans (FTPs) for inference endpoints in Amazon SageMaker AI, its AI and machine learning service, giving customers guaranteed GPU capacity for planned evaluations and production peaks.

Enterprises typically use SageMaker AI inference endpoints, which are managed systems, to deploy trained machine learning models in the cloud and run predictions at scale on new data. For instance, a global retail enterprise can use SageMaker inference endpoints to power its personalized-recommendation engine: As millions of customers browse products across different regions, the endpoints automatically scale compute and storage to handle traffic spikes without the company needing to manage servers or capacity planning.

However, the auto-scaling nature of these inference endpoints may fall short in several situations enterprises encounter: workloads that require low latency and consistently high performance, critical testing and pre-production environments where resource availability must be guaranteed, and any scenario where a slow scale-up is unacceptable and could harm the application or the business. Because automatic scaling doesn’t guarantee instant GPU availability amid high demand and limited supply, AWS says FTPs for inference workloads address this by enabling enterprises to reserve specific instance types and the GPUs they require.

FTP support for SageMaker AI inference is available in US East (N. Virginia), US West (Oregon), and US East (Ohio), AWS said.

Reducing operational load and costs

The guarantee of GPU availability, according to analysts, solves major challenges that enterprises face when scaling AI and machine learning workloads. “The biggest change is reliability,” said Akshat Tyagi, associate practice leader at HFS Research. “Before this update, enterprises had to deploy inference endpoints and hope the required GPU instances were available.
When GPUs were scarce, deployments failed or got delayed. Now they can reserve the exact GPU capacity weeks or months in advance. This can be huge for teams running LLMs, vision models, or batch inference jobs where downtime isn’t an option.”

Forrester principal analyst Charlie Dai called the new capability a “meaningful step” toward cost governance that reduces cost unpredictability in AI operationalization: “Customers can align spend with usage patterns and avoid overprovisioning, which will lower idle costs,” Dai said.

Tyagi pointed out that by reserving capacity in advance, AWS customers can pay a lower committed rate than on-demand pricing, lock in pricing for a set period, avoid expensive last-minute scrambles or scaling up to costlier instance types, and plan budgets more accurately because the expenditure is fixed upfront. The ability to reserve instances, Tyagi added, might also end the practice of enterprises running inference endpoints 24/7 for fear of not being able to secure capacity when needed, a practice that itself worsens GPU scarcity.

AWS isn’t the only hyperscaler offering reserved capacity for inference workloads: Microsoft Azure offers reserved capacity for inference via Azure Machine Learning, while Google Cloud provides committed use discounts for Vertex AI.
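For readers unfamiliar with the workflow the article describes, here is a minimal sketch of calling a deployed SageMaker real-time inference endpoint from Python. The endpoint name (`recs-prod-endpoint`) and the JSON payload shape are hypothetical; the actual request format depends on the model’s inference container.

```python
import json

# Hypothetical endpoint name; a real one comes from your SageMaker deployment.
ENDPOINT_NAME = "recs-prod-endpoint"

def build_request(records):
    """Build the keyword arguments for sagemaker-runtime's invoke_endpoint.

    The {"instances": [...]} payload shape is an illustrative assumption;
    the format your model expects depends on its serving container.
    """
    return {
        "EndpointName": ENDPOINT_NAME,
        "ContentType": "application/json",
        "Body": json.dumps({"instances": records}),
    }

def predict(records):
    """Send a real-time prediction request to the endpoint.

    Requires AWS credentials and a deployed endpoint, so it is only
    sketched here, not executed.
    """
    import boto3  # assumes boto3 is installed and configured

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(**build_request(records))
    return json.loads(response["Body"].read())

# Example: serialize two shoppers' feature vectors into a request body.
req = build_request([[0.1, 0.9], [0.4, 0.2]])
```

With FTPs, the capacity backing such an endpoint can be reserved in advance, so a call like `predict(...)` is less likely to hit a scale-up stall when GPU supply is tight.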