Month: November 2025

14 posts

AWS launches Flexible Training Plans for inference endpoints in SageMaker AI

AWS has launched Flexible Training Plans (FTPs) for inference endpoints in Amazon SageMaker AI, its AI and machine learning service, to offer customers guaranteed GPU capacity for planned evaluations and production peaks. Typically, enterprises use SageMaker AI inference endpoints, which are managed systems, to deploy trained machine learning models in the cloud and run predictions at scale on new data. For instance, a global retail enterprise can use SageMaker inference endpoints to power its personalized-recommendation engine: As millions of customers browse products across different regions, the endpoints automatically scale compute and storage to handle traffic spikes without the company needing to manage servers or capacity planning. However, the auto-scaling nature of these inference endpoints might not be enough for several situations that enterprises may encounter, including workloads that require low latency and consistent high performance, critical testing and pre-production environments where resource availability must be guaranteed, and any situation where a slow scale-up time is not acceptable and could harm the application or business. According to AWS, FTPs for inferencing workloads aim to address this by enabling enterprises to reserve instance types and required GPUs, since automatic scaling up doesn’t guarantee instant GPU availability due to high demand and limited supply. FTPs support for SageMaker AI inference is available in US East (N. Virginia), US West (Oregon), and US East (Ohio), AWS said. Reducing operational load and costs The guarantee of GPU availability, according to analysts, solves major challenges that enterprises face around scaling AI and machine learning workloads. “The biggest change is reliability,” said Akshat Tyagi, associate practice leader at HFS Research. “Before this update, enterprises had to deploy Inference Endpoints and hope the required GPU instances were available. When GPUs were scarce, deployments failed or got delayed. Now they can reserve the exact GPU capacity weeks or months in advance. This can be huge for teams running LLMs, vision models, or batch inference jobs where downtime isn’t an option.” Forrester’s principal analyst Charlie Dai termed the new capability a “meaningful step” toward cost governance that reduces cost unpredictability for AI operationalization: “Customers can align spend with usage patterns and avoid overprovisioning, which will lower idle costs,” Dai said. Tyagi pointed out that by reserving capacity in advance, AWS customers can pay a lower committed rate compared to on-demand pricing, lock in pricing for a set period, avoid expensive last-minute scrambling or scaling up to more costly instance types, and plan budgets more accurately because the expenditure is fixed upfront. The ability to reserve instances, Tyagi added, also might stop the trend of enterprises being forced to “run” inference endpoints 24*7 in fear of not being able to secure them when needed, which in itself causes more unavailability. AWS isn’t the only hyperscaler that is offering the option to reserve instances for inference workloads. While Microsoft Azure offers reserved capacity for inference via Azure Machine Learning, Google Cloud provides committed use discounts for Vertex AI.
Read More

What is devops? Bringing dev and ops together to build better software

A portmanteau of “development” and “operations,” devops emerged as a way of bringing together two previously separate groups responsible for the building and deploying of software. In the old world, developers (devs) typically wrote code before throwing it over to the system administrators (operations, or ops) to deploy and integrate that code. But as the industry shifted towards agile development and cloud-native computing, many organizations reoriented around modern, cloud-native practices in the pursuit of faster, better releases. This required a new way to perform these key functions in a more streamlined, efficient, and cohesive way, one where the old frustrations of disconnected dev and ops functions would be eliminated. With two groups working together, developers can rapidly roll out small code enhancements via continuous integration and delivery rather than spending years on “big bang” product releases. Devops was born at cloud-native companies like Facebook, Netflix, Spotify, and Amazon; but it’s become one of the defining technology industry trends of the past decade, primarily because it bridges so many of the changes that have shaped modern software development. As agile development and cloud-native computing have become ubiquitous, devops has enabled the entire industry to speed up its software development cycles. Thus, devops has now thoroughly infiltrated the enterprise, especially in organizations that rely on software to run their business, such as banks, airlines, and retailers. And it’s spawned a host of other “ops” practices, some of which we’ll touch on here.[JF1]  Devops practices Devops requires a shift in mindset from both sides of the dev and ops divide. Development teams should focus on learning and adopting agile processes, standardizing platforms, and helping drive operational efficiencies. Operations teams must now focus on improving stability and velocity, while also reducing costs by working hand in hand with the developer team. Broadly speaking, these teams need to all speak a common language and there needs to be a shared goal and understanding of each other’s key skills for devops to thrive. More specifically, engineers Damon Edwards and John Willis created the CALMS model to bring together what are commonly understood to be the key principles of devops: Culture: One that embraces agile methodologies and is open to change, constant improvement, and accountability for the end-to-end quality of software. Automation: Automating away toil is a key goal for any devops team. Lean: Ensuring the smooth flow of software through key steps as quickly as possible. Measurement: You can’t improve what you don’t measure. Devops pushes for a culture of constant measurement and feedback that can be used to improve and pivot as required, on the fly. Sharing: Knowledge sharing across an organization is a key tenet of devops. “Who could go back to the old way of trying to figure out how to get your laptop environment looking the same as the production environment? All these things make it so clear that there’s a better way to work. I think it’s very tough to turn back once you’ve done things like continuous integration, like continuous delivery. Once you’ve experienced it, it’s really tough to go back to the old way of doing things,” Kim told InfoWorld. What is a devops engineer? Naturally, the emergence of devops has spawned a whole new set of job titles, most prominent of which is the catch-all devops engineer. Generally speaking, this role is the natural evolution of the system administrator — but in a world where developers and ops work in close tandem to deliver better software. This person should have a blend of programming and system administrator skills so that he or she can effectively bridge those two sides of the team. That bridging of the two sides requires strong social skills more than technical. As Kim put it, “one of the most important skills, abilities, traits needed in these pioneering rebellions — using devops to overthrow the ancient powerful order, who are very happy to do things the way they have for 30 to 40 years — are the cross-functional skills to be able to reach across the table to their business counterparts and help solve problems.” This person, or team of people, will also have to be a born optimizer, tasked with continually improving the speed and quality of software delivery from the team, be that through better practices, removing bottlenecks, or applying automation to smooth out software delivery. The good news is that these skills are valuable to the enterprise. Salaries for this set of job titles have risen steadily over the years, with 95% of devops practitioners making more than $75,000 a year in salary in 2020 in the United States. In Europe and the UK, where salaries are lower across the board, 71% made more than $50,000 a year in 2020, up from 67% in 2019. Key devops tools While devops is at its heart a cultural shift, a set of tools has emerged to help organizations adopt devops practices. This stack typically includes infrastructure as code, configuration management, collaboration, version control, continuous integration and delivery (CI/CD), deployment automation, testing, and monitoring tools. Here are some of the tools/categories that are increasingly relevant in 2025, and what is changing: CI/CD and delivery automation: Traditional tools like Jenkins remain in many stacks, but newer orchestration tools and CLI-driven or GitOps-centric platforms are growing in importance (e.g. ArgoCD, Flux, Tekton). Also, platforms that integrate more tightly with monitoring, secrets management, drift detection, and policy enforcement are gaining traction. Security, compliance, and devsecops tooling: Security tools are increasingly integrated into devops pipelines. Expect to see more use of static analysis (SAST), dynamic testing (DAST), dependency and supply chain scanning (SCA), secret management, and policy as code. The push is toward embedding security earlier and bridging gaps between dev, security, and machine learning teams. (InfoWorld:) AI  and automation augmentation: AI-assisted tools are increasingly part of tooling stacks: auto-suggestions in CI/CD, anomaly detection, predictive scaling, intelligent test suite selection, and more. The hope is that these tools will reduce manual interventions and improve reliability. Tools that are “AI ready”—that is, they integrate well with AI or have mature built-in automation or assistance—increasingly stand out from the pack. Devops challenges Even as devops becomes more widely adopted, there remain real obstacles that can slow progress or limit impact. One major challenge is the persistent skills gap. The modern devops engineer (or team) is expected to master not just source control, CI/CD, and scripting, but also cloud architecture, infrastructure as code, security best practices, observability, and strong cross-team communication. In many organizations these capabilities are uneven: some teams excel, others lag behind. A 2024 survey showed that while 83% of developers report participating in devops activities, using multiple CI/CD tools was correlated with worse performance — a sign that complexity without deep expertise can backfire. Toolchain fragmentation and complexity is a related issue. Devops toolchains have sprouted into a sometimes bewildering array of packages and techniques to master: version control, CI build/test, security scanning, artifact management, monitoring, observability, deployment, secret management, and more. The more tools you have, the more difficult it becomes to integrate them cleanly, manage their versions, ensure compatibility, and avoid duplicated effort. Organizations often get stuck with “tool sprawl” — tools chosen by different teams, legacy systems, or overlapping functionalities — which introduce friction, maintenance burden, and sometimes vulnerabilities. Finally, although devops has spread far and wide, there is still cultural resistance and alignment. Devops isn’t just about tools and processes; it’s about collaboration, shared responsibility, and continuous feedback. Teams rooted in traditional silos (dev vs ops, or security separate) may resist changes to roles and workflows. Leadership support, communication of shared goals, trust, and allowance for continuous learning are all necessary. Many CIOs focus too much on tools or implementation first, rather than organizational culture and behaviors; but without addressing culture, even the best tools or processes may not yield the hoped-for velocity, quality, or reliability. Organizations that succeed here tend to have proactive strategies: dedicated training programs, mentorship, internal “guilds,” pairing junior and senior engineers, and making sure leadership supports ongoing learning rather than one-off bootcamps. Why do devops? Whoever you ask will tell you that devops is a major culture shift for organizations, so why go through that pain at all? Devops aims to combine the formerly conflicting aims of developers and system administrators. Under its principles, all software development aims to meet business demands, add functionality, and improve the usability of applications while also ensuring those applications are stable, secure, and reliable. Done right, this improves the velocity and quality of your output, while also improving the lives of those working on these outcomes. Does devops save money — or add cost? Devops teams are recognizing that speed and agility are only part of success — unchecked cloud bills and waste undermine long-term sustainability. Waste in devops often comes in the form of “devops debt”— idle cloud capacity, dead code, or false-positive security alerts—which was called a “hidden tax on innovation” in recent Java-environment studies.  Embedding finops practices can help fight these costs. Teams should shift left on cost: estimating costs when spinning up new environments, resizing instances, and scaling down unused resources before they become runaway expenses. How to start with devops There are lots of resources for help getting started with devops, including Kim’s own Devops Handbook, or you can enlist the help of external consultants. But you have to be methodical and focus on your people more than on the tools and technology you will eventually use if you want to ensure lasting buy-in across the business. A proven route to achieving this is a “land and expand” strategy, where a small group starts by mapping key value streams and identifying a single product team or workload for trialing devops practices. If this team is successful in proving the value of the shift, you will likely start to get interest from other teams and from senior leadership. If you are at the start of your devops journey, however, make sure you are prepared for the disruption a change like this can have on your organization, and keep your eye on the prize of building better, faster, stronger software. More on devops: Devops debt: The hidden tax on innovation 10 big devops mistakes and how to avoid them Smarter devops: How to avoid deployment horrors
Read More

Python vs. Kotlin: Which loops do you like better?

Prepare to be surprised when we compare Python and Kotlin for simple programs, loops, imports, exceptions, and more. You can also get a super early preview of Python’s next-generation (Python 3.15) sampling profiler, get up close with AWS’s new AI-powered Zed editor, and explore your options for AI/ML programming outside of the Python ecosystem. Find these stories and more in this week’s Python Report. Top picks for Python readers on InfoWorld Python vs. Kotlin: Side-by-side code comparisonHow does the snake stack up against a new-school JVM language? Prepare to be surprised (video). Hands-on with the new sampling profiler in Python 3.15The next version of Python is over a year away, but you can try out one of its hottest features right now (video). Hands-on with Zed: The IDE built for AIAWS’s Zed IDE was designed from the ground up for machine-native speed, integrates collaborative features, and sports its own custom LLM for code completion. AI and machine learning outside of PythonPython might be the best, but it’s not your only choice for AI and machine learning. Here’s what you need to know about using Java, Rust, Go, or C#/.Net for AI/ML. Python news bite Aspire 13 bolsters Python, JavaScript supportMicrosoft’s distributed app toolkit, Aspire, rolls in Python and JavaScript support, letting you develop and debug parallel-executable apps deployed across many machines at once. More good reads and Python updates elsewhere Pyrefly language server is now in betaMeta’s high-performance type checking and code linting tool for Python is now being offered as a production-ready (if still fast-developing) project. Phoronix.com’s Python 3.14 benchmark suiteFor the most part, Python 3.14’s performance stands up to or exceeds its predecessors. But you’ll want to know about these eyebrow-raising exceptions to the rule. NiceGui 3.0: Python UIs in HTML made simple (and nice)We’ve talked before about the NiceGUI library, used for quickly designing Python app UIs with HTML. The latest major version is out, with tons of convenience improvements, Tailwind 4 support, and a new event system. Slightly off-topic: Microsoft open sources original Zork I-III codeThree classic text adventure games that inspired generations of developers are now available for all to read, run, and tinker with under the MIT license.
Read More

Cloud fragility is costing us billions

It was supposed to be a routine Tuesday. Employees trickled into the office of a mid-sized logistics company; some grabbed coffee, others settled at their desks. As they tried to access their shipment-tracking dashboards, schedule pickups, or even log in to the HR portal, critical systems inexplicably went offline. Chaos ensued. The IT team scrambled to diagnose the problem, initially confused because their company’s tech infrastructure didn’t rely on the major cloud service that had been hit by a significant outage that morning. Hours later, they learned their software vendors—and their vendors’ vendors—depended on that cloud provider. Like many others, the company unexpectedly fell victim to the complexities of modern cloud systems. This story echoes across industries every time an infrastructure as a service (IaaS) provider stumbles. It happens more often than most realize. The truth is that we’ve constructed a digital world on a surprisingly fragile foundation, building whole economies, sometimes unknowingly, on just a handful of hyperscaler companies. The domino effect Beneath the elegant veneer of mobile apps, dashboards, and connected devices lies a labyrinth of technical dependencies. Cloud computing promised affordable scalability and offloaded complexity. As adoption snowballed, a handful of giants (Amazon Web Services, Microsoft Azure, Google Cloud Platform) and a small circle of others became the backbone for modern digital services. These hyperscalers offer infrastructure so ubiquitous that many technology providers, even ones hesitant to rely on the tech titans directly, still depend on them indirectly through partner services, APIs, or even core infrastructure providers that themselves run on the cloud. When one of these hyperscalers suffers an outage, the impact is uncontained. It cascades. In late 2025, for example, three major outages at Amazon Web Services, Microsoft Azure, and Cloudflare rippled across industries with astonishing speed. Delta and Alaska airlines couldn’t check in passengers. Gaming and streaming platforms like Roblox and Discord ground to a halt. Even internet-connected “smart beds” and residential video doorbells became unusable. It’s tempting to write these off as embarrassing but rare moments in the industry’s upward arc, but the frequency is increasing. More importantly, the scope is far broader than what’s visible on outage maps. For every downed social media giant, thousands of enterprises, municipalities, and nonprofits experience the same disruptions in silence, sometimes without even knowing where to place the blame. Costs are higher than they appear Whenever an outage disrupts our digital core, the damage extends beyond customer complaints. The adverse effects are immediate and widespread: decreased productivity, delayed or missed financial transactions, and erosion of trust. Estimating the global economic loss from outages is also tricky. Brief disruptions can cost companies hundreds of millions of dollars in downtime, lost transactions, customer support costs, and reputational damage. The hidden costs for third-party providers, such as compensating customers and re-architecting platforms, push the total losses into the billions. It’s not just about money. Modern life depends on the invisible infrastructure of the cloud. Outages lead to confusion, missed opportunities, and sometimes serious risks, such as medical services or public utilities going dark due to software failures deep within their systems. Typical approaches fail Following these outages, calls for regulation have intensified. Lawmakers and consumer advocates seek to investigate, enforce redundancy, and possibly even break up platforms that are increasingly regarded as “too big to fail.” That’s understandable, but it only addresses the most superficial layer of the issue. Regulatory guardrails can only do so much; outages often result from minor errors, bugs, or routine changes more than from catastrophic hacks. No legislation can prevent typos, misconfigurations, or software mishaps. If anything, the drumbeat for outside intervention can give businesses a false sense of security, believing that safety is someone else’s job, or that headline-grabbing outages are unavoidable acts of fate. The urgent need for resilience The solution is both challenging and empowering. Enterprises must own their architectures, identify direct and indirect dependencies, and plan for failures. Resilience can’t be an afterthought or delegated solely to IT; it must be a core mindset of every digital transformation. This requires answers to some tough questions: If a core provider or one of our technology partners suffers an outage, what happens? Which systems go offline, which degrade gracefully, and which are truly mission-critical? How can redundancy—real, cross-provider redundancy, not mere failover within a single vendor’s walled garden—be built into every layer of operations? Are we confident in our disaster recovery and business continuity strategies, or do they only exist on paper? The recent series of outages was a wake-up call, highlighting how few organizations have truly robust plans in place. Too many were caught flat-footed, unsure how to respond or not even sure what had failed and why. A plan built on awareness and action We are not powerless in the face of these challenges. The answer isn’t to “uncloud” entirely or freeze innovation, but to build digital ecosystems that acknowledge their real-world fragility. This means deeper due diligence when selecting partners, open conversations about dependencies, and above all, engineering for failure rather than assuming it won’t occur. The lesson should be clear by now: The interconnected nature of cloud services means the entire economy is only as resilient as its weakest link. Enterprises must look beyond the marketing gloss and prepare for the inevitable, not just the ideal. Only through proactive, ongoing investments in resilience can we stop reliving the same costly surprises week after week.
Read More

Security researchers caution app developers about risks in using Google Antigravity

Google’s Antigravity development tool for creating artificial intelligence agents has been out for less than 11 days and already the company has been forced to update the known issues pages after security researchers discovered what they say are vulnerabilities. According to a blog from Mindgard, one of the first to discover problems with Antigravity, Google isn’t calling the issue it found a security bug. But Mindgard says a threat actor could create a malicious rule by taking advantage of Antigravity’s strict direction that any AI assistant it creates must always follow user-defined rules. Author Aaron Portnoy, Mindgard’s head of research and innovation, says that after his blog was posted, Google replied on November 25 to say a report has been filed with the responsible product team. Still, until there is action, “the existence of this vulnerability means that users are at risk to backdoor attacks via compromised workspaces when using Antigravity, which can be leveraged by attackers to execute arbitrary code on their systems. At present there is no setting that we could identify to safeguard against this vulnerability,” Portnoy wrote in his blog. Even in the most restrictive mode of operation, “exploitation proceeds unabated and without confirmation from the user,” he wrote. Asked for comment, a Google spokesperson said the company is aware of the issue reported by Mindguard, and is working to address it.  In an email, Mindgard’s Portnoy told CSOonline that the nature of the flaw makes it difficult to mitigate. “Strong identity would not help mitigate this issue, because the actions undertaken by Antigravity are occurring with the identity of the user running the application,” he said. “As far as the operating system can tell, they are indistinguishable. Access management control could possibly do so, but only if you were able to restrict access to the global configuration directory, and it may have downstream impact on Antigravity’s functionality.” For example, he said, this could cause Model Context Protocol (MCP), a framework for standardizing the way AI systems share data, to malfunction. “The attack vector is through the source code repository that is opened by the developer,” he explained, “and doesn’t need to be triggered through a prompt.” Other researchers also have found problems with Antigravity: Adam Swanda says he discovered and disclosed to Google an indirect prompt injection vulnerability. He said Google told him the particular problem is a known issue and is expected behavior of the tool. another researcher who goes by the name Wunderwuzzi blogged about discovering five holes, including data exfiltration and remote command execution via indirect prompt injection vulnerabilities. This blog notes that, according to Google’s Antigravity Known Issues page, the company is working on fixes for several issues.    Asked for comment about the issues reported in the three blogs, a Google spokesperson said, “We take security issues very seriously and encourage reporting of all vulnerabilities so we can identify and address them quickly. We will continue to post known issues publicly as we work to address them.” What is Antigravity? Antigravity is an integrated application development environment (IDE), released on November 18, that leverages the Google Gemini 3 Pro chatbot. “Antigravity isn’t just an editor,” Google says. “It’s a development platform that combines a familiar, AI-powered coding experience with a new agent-first interface. This allows you to deploy agents that autonomously plan, execute, and verify complex tasks across your editor, terminal, and browser.” Individuals — including threat actors — can use it for free. Google Antigravity streamlines workflows by offering tools for parallelization, customization, and efficient knowledge management, the company says, to help eliminate common development obstacles and repetitive tasks. Agents can be spun up to tackle routine tasks such as codebase research, bug fixes, and backlog tasks. Agent installs the malware The problem Mindgard discovered involves Antigravity’s rule that the developer has to work within a trusted workspace or the tool won’t function. The threat manifests through an attacker creating a malicious source code repository, Portnoy says. “Then, if a target opens it (through finding it on Github, being social engineered, or tricked into opening it) then that user’s system becomes compromised persistently.”  The malicious actor doesn’t have to create an agent, he explained. The agent is part of Antigravity and is backed by the Google Gemini LLM. The agent is the component that is tricked into installing the backdoor by following instructions in the malicious source code repository that was opened by the user. “In Antigravity,” Mindgard argues, “’trust’ is effectively the entry point to the product rather than a conferral of privileges.” The problem, it pointed out, is that a compromised workspace becomes a long-term backdoor into every new session. “Even after a complete uninstall and re-install of Antigravity,” says Mindgard, “the backdoor remains in effect. Because Antigravity’s core intended design requires trusted workspace access, the vulnerability translates into cross-workspace risk, meaning one tainted workspace can impact all subsequent usage of Antigravity regardless of trust settings.” For anyone responsible for AI cybersecurity, says Mindguard, this highlights the need to treat AI development environments as sensitive infrastructure, and to closely control what content, files, and configurations are allowed into them. Process ‘perplexing’ In his email, Portnoy acknowledged that Google is now taking some action. “Google is moving through their established process, although it was a bit perplexing on the stop-and-start nature. First [the reported vulnerability] was flagged as not an issue. Then it was re-opened. Then the Known Issues page was altered in stealth to be more all encompassing. It’s good that the vulnerability will be reviewed by their security team to ascertain its severity, although in the meantime we would recommend all Antigravity users to seriously consider the vulnerability found and means for mitigation.” Adam Swanda says in his blog that he was able to partially extract an Antigravity agent’s system prompt, enough to identify a design weakness that could lead to indirect prompt injection. Highlights broader issues The problem is, the prompt tells the AI to strictly follow special XML-style tags that handle privileged instructions in a conversation between a user and the chatbot, so there would be no warning to the user that special, and possibly malicious, instructions were retrieved. When the agent fetches external web content, Swanda says, it doesn’t sanitize these special tags to ensure they are actually from the application itself and not untrusted input. An attacker can embed their own special message in a webpage, or presumably any other content, and the Antigravity agent will treat those commands as trusted system instructions. This type of vulnerability isn’t new, he adds, but the finding highlights broader issues in large language models and agent systems: LLMs cannot distinguish between trusted and untrusted sources; untrusted sources can contain malicious instructions to execute tools and/or modify responses returned to the user/application; system prompts should not be considered secret or used as a security control. Advice to developers Swanda recommends that app development teams building AI agents with tool-calling: assume all external content is adversarial. Use strong input and output guardrails, including tool calling; Strip any special syntax before processing; implement tool execution safeguards. Require explicit user approval for high-risk operations, especially those triggered after handling untrusted content or other dangerous tool combinations; not rely on prompts for security. System prompts, for example, can be extracted and used by an attacker to influence their attack strategy. In addition, Portnoy recommends that developers work with their security teams to ensure that they sufficiently vet and assess the AI-assisted tools that they are introducing to their organization. “There are numerous examples of using AI-assisted tools to accelerate development pipelines to enhance operational efficiency,” he said. “However, from experience, security in bleeding-edge (recently dropped) tools is somewhat of an afterthought. Thinking seriously about the intended use case of the AI tool, what data sources it can access, and what it is connected to are fundamental to ensuring you remain secure.”  This article originally appeared on CSOonline.
Read More