AI Accelerator Institute

How generative AI is revolutionizing drug discovery and development

Dr Nikolay Burlutskiy — Fri, 25 Apr 2025 17:32:35 GMT

This article comes from Dr Nikolay Burlutskiy’s talk at our London 2024 Generative AI Summit. Check out his full presentation and the wealth of OnDemand resources waiting for you.

Nikolay is currently the Senior Manager of GenAI Platforms at Mars

Have you ever wondered just how long it takes to bring a new medicine to market? For those of us working in the pharmaceutical industry, the answer is clear: it can take decades and cost billions. As a computer scientist leading the AI team at AstraZeneca, I've seen firsthand just how complicated the drug discovery process is.

Trust me, there’s a lot of work that goes into developing a drug. But the real challenge lies in making the process more efficient and accessible, which is where generative AI comes in.

In this article, I’ll walk you through the critical role that AI plays in accelerating drug discovery, particularly in areas like medical imaging, predicting patient outcomes, and creating synthetic data to address data scarcity. While AI has incredible potential, it’s not without its challenges. From building trust with pathologists to navigating regulatory requirements, there’s a lot to consider.

So, let’s dive into how AI is reshaping the way we approach drug discovery and development – and why it matters to the future of healthcare.

Generative AI’s role in enhancing medical imaging

One of the most powerful applications of generative AI in drug development is its ability to analyze medical images – a process that’s essential for diagnosing diseases like cancer, which can be difficult to detect early on.

In the world of pathology, we’re increasingly moving away from using traditional microscopes, and using digital images instead. With digitized tissue biopsies, we now have access to incredibly detailed scans that show every single cell. The sheer volume of this data – sometimes containing over 100,000 x 100,000 pixels – makes it almost impossible for human pathologists to analyze every single detail, but AI can handle this level of complexity.

At AstraZeneca, we’ve been using generative AI to help analyze these vast datasets. One of the most exciting developments is in medical imaging, where AI models can quickly identify cancerous cells, segment different areas of tissue, and even predict patient outcomes.

For example, AI can help predict whether a patient will respond to a specific drug – information that’s invaluable for companies like ours, as we work to develop treatments that will provide real, tangible benefits for patients.

In my work, we leverage powerful AI techniques such as variational autoencoders and generative adversarial networks (GANs) to build these models. These AI techniques can help us learn from medical images and generate synthetic data that can be used to train AI models more effectively.

Computer Vision in Healthcare: Download the eBook today

Unlock the mystery of the innovative intersection of technology and medicine with our latest eBook, Computer Vision in Healthcare.

AI Accelerator InstituteMarisa Garanhel

How AI is transforming financial modeling & sales forecasting in enterprise tech

Supreeth Meka — Fri, 25 Apr 2025 07:35:00 GMT

AI is emerging as a key differentiator in enterprise finance. As traditional financial models struggle to keep up with the pace of change, enterprise tech organizations are turning to AI to unlock faster, more accurate, and insight-driven decision-making.

Drawing from my experience in sales planning and forecasting in the enterprise tech sector, I’ve seen firsthand how AI is reshaping how global enterprises forecast revenue, optimize GTM strategies, and manage P&L risk.

This article explores how AI is transforming financial modeling and sales forecasting (two pillars of enterprise strategy) and helping finance teams shift from reactive to proactive operations.

1. Why traditional forecasting falls short

There are three main reasons why traditional forecasting is falling short:

Lack of broader business context

Sales forecasters and financial modelers frequently lack visibility into wider organizational shifts such as changes in product strategy, marketing campaigns, or operational execution that affect demand and performance. This makes it difficult to fine-tune models for niche business dynamics or rapidly changing market conditions.

Inflexibility

They often have an inability to account for real-time changes in demand, market shifts, economic conditions, tariffs, or sales performance.

Human bias

Over-reliance on gut-feel projections leads to inaccurate financial planning.

In many enterprise settings, these limitations create friction between planning and execution across business functions, finance, sales, and marketing. Misaligned forecasts result in delayed strategic actions and misused resources, which are issues that AI is now well-positioned to solve.

2. What makes AI a game-changer for financial modeling

Cross-functional simulations tailored by domain experts

One of AI’s most transformative strengths lies in its ability to empower every function within the enterprise to personalize simulations using their domain-specific expertise. For example:

The pricing team can continuously adjust models based on real-time strategy updates.
The product team can simulate outcomes tied to roadmap changes or launch timing.
The marketing team can incorporate variable lead generation budgets or campaign performance assumptions.

Likewise, GTM leaders can simulate how scaling inside sales headcount could drive more transactional business and enhance margins. These deeply integrated, cross-functional simulations not only improve forecast precision but also drive strategic alignment and execution agility across the business.

Real-time forecast adjustments

Unlike static quarterly models, AI allows finance leaders to refresh forecasts dynamically, giving real-time visibility into revenue performance. This is particularly useful in fast-evolving segments like AI infrastructure, where product cycles and demand signals change rapidly.

How generative AI is revolutionizing drug discovery and development

Discover how generative AI is transforming drug discovery, medical imaging, and patient outcomes to accelerate advancements with AstraZeneca

AI Accelerator InstituteDr Nikolay Burlutskiy

3. Practical use cases in enterprise finance

AI-powered lead scoring & targeting

Inspired by the Lean Startup's 'Build-Measure-Learn' cycle, one effective AI use case is building a lean, predictive lead scoring model.

Organizations can:

Develop an initial AI model based on historical data to identify high-probability buyers.
Continuously refine lead targeting with real-time behavioral and market data.
Deploy a pilot program with a focused sales team to test and validate the model’s effectiveness.
Measure conversion rates, learn from outcomes, and iterate the scoring logic.

Smart bundling & pricing optimization

Following lead scoring, enterprises can create value by applying AI to product bundling and pricing strategies. This includes:

Building AI-driven recommendations for optimal hardware/software bundles based on customer profiles.
Integrating dynamic pricing capabilities that react to competitor behavior and market demand.
Running A/B pricing tests within specific customer segments to evaluate effectiveness.
Collecting feedback from sales teams to iteratively enhance pricing logic and usability.

Automated revenue forecasting

Another valuable use case involves enhancing revenue visibility and predictability. Organizations can:

Better predict conversion rates for large strategic deals and transactional segments, enabling more reliable revenue planning across deal sizes.
Forecast transactional business growth patterns tied to seasonal cycles, marketing triggers, or high-velocity sales channels.
Continuously refine revenue projections by integrating demand signals, channel performance, and seasonality.
Establish feedback loops between finance and GTM teams to adjust models based on real-world performance.

The great web rebuild: Infrastructure for the AI agent era

AI agents require rethinking trust, authentication, and security—see how Agent Passports and new protocols will redefine online interactions.

AI Accelerator InstituteSahar Mor

4. How AI enhances execution and GTM strategy

Smarter pipeline management

AI can streamline pipeline visibility and improve forecast reliability through:

Collaborative pipeline reviews with finance and sales using AI-generated risk scores and close probabilities.
Analysis of competitor dynamics and market share shifts at the product and geo level to understand how winning or losing specific deals affects strategic positioning.
Enhanced understanding of how pipeline outcomes impact both profitability and long-term growth trajectories.

Improved sales productivity

AI boosts front-line efficiency by guiding sales teams to focus efforts on the right product segments expected to experience a surge in demand (such as those driven by OS refresh cycles, compliance deadlines, or emerging industry triggers), enabling them to strategically capture growth opportunities.

AI also helps to prioritize accounts while providing accurate bundling suggestions based on buyer profiles and sales history to increase deal size and win rates.

Tighter finance-sales alignment

AI serves as a bridge between strategic planning and operational execution by:

Providing shared insights to drive collaboration between FP&A, GTM, and sales teams.
Enabling joint decision-making based on real-time financial and sales data.
Improving coordination between business units through unified performance metrics.
Reducing misalignment and strategic blind spots across planning cycles.

5. Key considerations for implementation

Data readiness: Clean, structured data is critical. Integrating CRM, ERP, and planning systems improves AI effectiveness.
Human oversight: AI augments, not replaces, finance leadership. Human intuition is still key for context and judgment.
Change management: Teams need training and adoption support to fully leverage AI’s potential.

Conclusion

AI is redefining how enterprise tech companies forecast, plan, and execute. From lead targeting to revenue modeling and cross-functional scenario planning, it brings precision, agility, and alignment to financial operations.

By breaking silos and enabling real-time collaboration across finance, GTM, and sales, AI turns forecasting into a growth engine. Companies that embed AI into their processes will be better positioned to anticipate market shifts, improve profitability, and lead with confidence.

*Disclaimer: The views expressed in this article are my own and do not reflect the official policy or position of any organization. *

The great web rebuild: Infrastructure for the AI agent era

Sahar Mor — Thu, 17 Apr 2025 11:38:35 GMT

It's December 2028. Sarah's AI agent encounters an unusual situation while booking her family's holiday trip to Japan. The multi-leg journey requires coordinating with three different airlines, two hotels, and a local tour operator.

As the agent begins negotiations, it presents its "agent passport"—a cryptographic attestation of its delegation rights and transaction history. The vendors' systems instantly verify the agent's authorization scope, spending limits, and exposed metadata like age and passport number.

Within seconds, the agent has established secure payment channels and begun orchestrating the complex booking sequence. When one airline's system flags the rapid sequence of international bookings as suspicious, the agent smoothly provides additional verification, demonstrating its legitimate delegation chain back to Sarah.

What would have triggered fraud alerts and CAPTCHA challenges in 2024 now flows seamlessly in an infrastructure built for autonomous AI agents.

—> The future, four years from now.

In my previous essay, we explored how websites and applications must evolve to accommodate AI agents. Now we turn to the deeper infrastructural shifts that make such agent interactions possible.

The systems we've relied on for decades: CAPTCHAs, credit card verification, review platforms, and authentication protocols, were all built with human actors in mind. As AI agents transition from experimental curiosities to fully operational assistants, the mechanisms underpinning the digital world for decades are beginning to crack under the pressure of automation.

The transition to an agent-first internet won't just streamline existing processes—it will unlock entirely new possibilities that were impractical in a human-centric web. Tasks that humans find too tedious or time-consuming become effortless through automation.

Instead of clicking 'Accept All' on cookie banners, agents can granularly optimize privacy preferences across thousands of sites. Rather than abandoning a cart due to complex shipping calculations, agents can simultaneously compare multiple courier services and customs implications.

Even seemingly simple tasks like comparing prices across multiple vendors, which humans typically limit to 2-3 sites, can be executed across hundreds of retailers in seconds. Perhaps most importantly, agents can maintain persistent relationships with services, continuously monitoring for price drops, policy changes, or relevant updates that humans would miss.

This shift from manual, limited interactions to automated, comprehensive engagement represents not just a change in speed, but a fundamental expansion of what's possible online.

Amid these sweeping changes, a new gold rush is emerging. Just as the shift to mobile created opportunities for companies like Uber and Instagram to reinvent existing services, the transition to agent-first infrastructure opens unprecedented possibilities for founders.

From building next-generation authentication systems and trust protocols to creating agent-mediated data marketplaces, entrepreneurs have a chance to establish the foundational layers of this new paradigm. In many ways, we're returning to the internet's early days, where core infrastructure is being reimagined from the ground up—this time for an autonomous, agent-driven future.

In this second post of the AI Agents series, we’ll focus on the foundational infrastructure changes that underlie the agent-first internet: new authentication mechanisms, trust systems, novel security challenges, and agent-to-agent protocols, setting the stage for the more commerce-oriented transformations we’ll explore in the following post.

This article was originally published here at AI Tidbits, where you can read more of Sahar's fascinating perspectives on AI-related topics.

Proving you're a human an agent

Remember when "proving you're not a robot" meant deciphering distorted text or selecting crosswalk images? Those mechanisms become obsolete in a world where legitimate automated actors are the norm rather than the exception.

Today’s CAPTCHAs, designed to block bots, have become increasingly complex due to advances in multimodal AI. Paradoxically, these mechanisms now hinder real humans while sophisticated bots often bypass them. As AI outpaces human problem-solving in these domains, CAPTCHAs risk becoming obsolete, reducing website conversions, and frustrating legitimate users.

The challenge shifts from proving humanity to verifying the agent has been legitimately delegated and authorized by a human user.

I recently failed a CAPTCHA three times before finally passing on the fourth attempt. Now picture an 80-year-old attempting to decipher increasingly convoluted challenges

Today’s rate-limiting mechanisms assume human-paced interactions, relying heavily on IP-based throttling to manage access. But in a world of AI agents, what constitutes "fair use" of digital services? In an agent-driven internet, automated browsing will become not just accepted but essential. Cloudflare, Akamai, and similar services will need to pivot from simplistic IP-based throttling to sophisticated agent-aware frameworks.

As businesses grapple with these challenges, a new solution is emerging—one that shifts the paradigm from blocking automated traffic to authenticating and managing it intelligently. Enter the Agent Passport.

Imagine a digital credential that encapsulates an agent's identity and permissions—cryptographically secured and universally recognized. Unlike simple API keys or OAuth tokens, these passports maintain a verifiable chain of trust from the agent back to its human principal. They carry rich metadata about permissions scope, spending limits, and authorized behaviors, allowing services to make nuanced decisions about agent access and capabilities.

By integrating Agent Passports, business websites like airlines can distinguish between legitimate, authorized agents and malicious actors. New metrics, such as agent reliability scores and behavioral analysis, could ensure fair access while mitigating abuse, balancing security with the need to allow agent-driven traffic.

Authentication mechanisms, such as signing up and signing in, must also evolve for an agent-first internet. Websites will need to determine not just an agent's identity but also its authorized scope—what data the agent is authorized to access (‘read’) and what actions it is permitted to execute (‘write’).

Google Login revolutionized online authentication by centralizing access with a single credential, reducing friction and enhancing security. Similarly, agent passports could create a universal standard for agent authentication, simplifying multi-platform access while maintaining robust authorization controls.

Companies like Auth0 and Okta could adapt by offering agent-specific identity frameworks, enabling seamless integration of these passports into their authentication platforms. Meanwhile, consumer companies like Google and Apple could extend their authentication and wallet services to seamlessly support agent-mediated interactions, bridging the gap between human and agent use cases.

A new protocol for Agent-to-Agent communication

In the early days of the web, protocols like HTTP emerged to standardize how browsers and servers communicated. In much the same way, the rise of agent-mediated interactions demands a new foundational layer: an Agent-to-Agent Communication Protocol (AACP). This protocol would formalize how consumer agents and business agents discover each other’s capabilities, authenticate identities, negotiate trust parameters, and exchange actionable data—all while ensuring both parties operate within well-defined boundaries.

Just as Sarah's travel agent from the intro paragraph seamlessly coordinated with multiple airlines and hotels, AACP enables complex multi-party interactions that would be tedious or impossible for humans to manage manually.

Much like HTTPS introduced encryption and certificates to authenticate servers and protect user data, AACP would implement cryptographic attestation for agents. Trusted third-party authorities, similar to today’s certificate authorities, would issue digital “agent certificates” confirming an agent’s legitimacy, delegation chain, and operational scope. This ensures that when a consumer’s travel-planning agent communicates with an airline’s booking agent, both sides can instantly verify authenticity and adherence to agreed-upon standards.

A potential implementation of the AACP protocol. A full example of booking an airline ticket can be found here.

Without such a protocol, a rogue agent might impersonate a trusted retailer to trick consumer agents into unauthorized transactions, or a malicious consumer agent could spoof credentials to overwhelm a merchant’s infrastructure. By mandating cryptographic proof, robust authentication handshakes, and behavior logs, AACP mitigates these threats before meaningful data or funds change hands.

The handshake phase in AACP would include mutual disclosure of the agents’ technical stacks—such as which LLM or language configuration they use—and their supported capabilities. Once established, the protocol would also govern “write-like operations” (e.g., initiating a payment or updating account details) by enforcing strict sign-offs with auditable cryptographic signatures. Every action would leave a verifiable trail of authorization that can be reviewed and validated after the fact.

Finally, AACP would incorporate locale and language negotiation at the protocol level. Although agents can translate and interpret content dynamically, specifying a preferred language or locale upfront helps streamline interactions. This new protocol weaves together trust, authentication, and contextual awareness, forging a resilient substrate on which the agent-first internet can reliably function.

Trust and reputation reimagined

When we navigate the internet, our judgment of a website's credibility hinges on a blend of visual and social cues. We look for secure HTTPS connections, professional design, and familiar branding to assure us that a site is trustworthy. No one wants to input their credit card information on a site that looks like it was built in the early 2000s. User reviews and star ratings on platforms like Trustpilot and G2 further influence our decisions, offering insights drawn from shared human experiences.

Perhaps no aspect of online commerce requires more fundamental reimagining than trust and reputation systems. In an agent-mediated economy, traditional cues for reliability fall short. AI agents can't interpret visual aesthetics or branding elements–they operate on data, protocols, and cryptographic proofs.

Trust mechanisms must pivot from human perception to machine-readable verifications. For instance, an agent might verify a seller's identity through cryptographic attestations and assess service quality via automated compliance records, ensuring decisions are based on objective, tamper-proof data. Traditional review platforms like Trustpilot and G2, built around subjective human experiences and star ratings, will also become increasingly obsolete.

The emerging alternative is a new trust infrastructure built on quantifiable, machine-readable metrics. Instead of relying on potentially AI-generated reviews, a problem that has already undermined traditional review systems, agents could assess services using benchmarks like delivery time reliability, system uptime, or refund processing speed—measurable metrics that ensure objective evaluations rather than subjective human reviews.

This could involve decentralized reputation networks where trust is established through cryptographically verified interaction histories and smart contract execution records. Such systems would offer objective assessments of service quality, enabling agents to make informed decisions without relying on potentially biased or manipulated human reviews.

Moreover, the feedback loop between consumers and businesses will evolve dramatically. Instead of sending generic emails requesting reviews—a method often resulting in low response rates—commerce websites can engage directly with your AI agent to collect timely feedback about specific topics like shipping or product quality.

They might offer incentives like future store credit to encourage participation. The human user could provide a brief impression, such as "The cordless vacuum cleaner works well, but the battery life is short." The agent then takes this input, contextualizes it with additional product data, and generates a comprehensive review that highlights key features and areas for improvement. This process not only saves time for the user but also provides businesses with richer, more actionable insights.

Trustpilot and G2 could pivot by introducing agent-oriented verification systems, such as machine-readable trust scores derived from operational metrics like service accuracy, delivery consistency, and customer support responsiveness, enabling agents to evaluate businesses programmatically.

Information sharing in the age of AI agents demands a fundamental reinvention of the current consent and data access model. Rather than blunt instruments like cookie banners and privacy policies, websites will implement structured data requirement protocols—machine-readable manifests that explicitly declare what information is needed and why.

This granular control would operate at multiple levels of specificity. For example, an agent could share your shirt size (L) with a retailer while withholding your exact measurements. It might grant 24-hour access to your travel dates, but permanent access to your seating preferences.

When a service requests location data, your agent could share your city for shipping purposes but withhold your exact address until purchase confirmation. These permissions wouldn't be just binary yes/no choices—they could include sophisticated rules like "share my phone number only during business hours" or "allow access to purchase history solely for personalization, not marketing."

Such granular controls, impossible to manage manually at scale, become feasible when delegated to AI agents operating under precise constraints.

AI agents would also act as sophisticated information gatekeepers, maintaining encrypted personal data vaults and negotiating data access in real time.

These mechanisms will fundamentally shift the balance of power in data-sharing dynamics. GDPR-like frameworks may evolve to include provisions for dynamic, agent-mediated consent, allowing for more granular data-sharing agreements tailored to specific tasks.

Websites might implement real-time negotiation protocols, where agents can evaluate and respond to data requests based on their principal's preferences, preserving privacy while optimizing functionality.

New attack vectors

The shift to agent-mediated interaction introduces novel security challenges. Agent impersonation and jailbreaking agents are two examples.

Jailbreaking AI agents poses significant risks, as manipulated agents could act outside their intended scope, leading to unintended purchases or other errors. Techniques like instruction-tuning poisoning or adversarial suffix manipulation could alter an agent’s behavior during critical tasks.

For example, adversarial instructions embedded in websites’ HTML might influence an agent’s purchasing logic, bypassing its human-defined constraints. Robust safeguards and continuous monitoring will be essential to prevent these vulnerabilities.

Agent impersonation adds a complex layer to cybersecurity challenges. Malicious actors could spoof an agent's credentials to access sensitive data or execute fraudulent transactions. Addressing this threat demands robust multi-layered verification protocols, such as cryptographic identity verification paired with continuous behavioral monitoring, to ensure authenticity and safeguard sensitive interactions.

Building the new web - opportunities for founders

The web’s agent-first future has no established playbook, and that’s exactly where founders thrive. Entirely new product categories are waiting to be defined: agent-to-agent compliance dashboards, cryptographic attestation services that replace outdated CAPTCHAs, and dynamic data-sharing frameworks that make “privacy by design” a reality.

Platforms that offer standardized “agent passports,” identity brokerages that verify delegation rights, agent-native payment gateways, and trust ecosystems driven by machine-readable performance metrics—each of these represents a greenfield opportunity to set the standards of tomorrow’s internet.

Startups anticipating these shifts can position themselves as foundational players in an agent-driven economy, opening new channels of value creation and establishing a competitive edge before the rest of the market catches up.

Some concrete areas include:

Trustpilot for agents - creating machine-readable trust metrics and reputation systems that help agents evaluate services and vendors
Okta for AI agents - building the identity and authentication layer that manages agent credentials, permissions, and delegation chains
OneTrust for agents - creating the new standard for privacy preference management, turning today's basic cookie banners into sophisticated data-sharing frameworks where agents can negotiate and manage granular permissions across thousands of services
Cloudflare for agent traffic - developing intelligent rate-limiting and traffic management systems designed for agent-scale operations
LastPass for agent permissions - building secure vaults that manage agent credentials and access rights across services
AWS CloudFront for agent data - creating CDN-like infrastructure optimized for agent-readable formats and rapid agent-to-agent communication
McAfee security for agents - developing security platforms that protect against agent impersonation and novel attack vectors

Go build.

AIOps in action: AI & automation transforming IT operations

Aleksandr Karavanin — Mon, 14 Apr 2025 10:19:18 GMT

The advancement of digital frameworks has created new hurdles for business IT operations. A company’s network, cloud infrastructure, and streams of data need to be monitored and secured to meet performance and availability requirements, which directly cuts into productivity.

These demands are nearly impossible to cope with under traditional workflows due to outdated approaches relying on reactive monitoring and manual debugging.

The use of artificial intelligence for IT operations (AIOps) has become a breakthrough with regard to IT operation streamlining and business growth.

AIOps applies predictive IT maintenance, proactive incident detection, and scalable automation through AI and machine learning, thus bolstering IT operations. Optimized management of resources, minimal downtime, and efficient IT service management (ITSM) transform AIOps into a framework that is crucial for modern-day enterprises.

Understanding AIOps and its role in IT operations

AIOps refers to the application of AI and machine learning technologies to IT operations. It enhances decision-making and automation by analyzing vast amounts of data from numerous sources, such as logs, metrics, and network traffic. Key capabilities of AIOps include:

Data ingestion and correlation: Aggregating IT data from multiple sources.
Anomaly detection: Identifying irregular patterns that indicate potential operational issues (such as misconfigurations), potential failures or security threats.
Root cause analysis: Automatically diagnosing issues to pinpoint the source of disruptions.
Automated remediation: Implementing fixes without human intervention, reducing mean time to resolution (MTTR).

AIOps not only enhances IT operations with advanced analytics and automation but also represents a paradigm shift in how IT teams manage infrastructure and incidents. Unlike traditional IT operations, which rely on reactive monitoring and manual intervention, AIOps enable proactive action by continuously analyzing data to predict and prevent failures before they impact performance.

Top trends: big data

Alongside artificial intelligence, machine learning, and other similar technologies, big data is paving the way for what is called the Fourth Industrial Revolution, or ‘Industry 4.0’.

AI Accelerator InstituteMarisa Garanhel

Traditional IT operations rely on reactive monitoring, where teams respond to alarm notifications only after a problem has already caused system downtime. This approach not only prolongs downtime but also drives up operational costs. Furthermore, the reliance on human interaction introduces additional inefficiencies and increases the risk of incorrect results, ultimately hindering IT teams' ability to deliver seamless service

Furthermore, AIOps enables one to be proactive by constant data analysis to foresee and prevent failures. So, by implementing AI into IT procedures, organizations are able to optimize infrastructure management, enhance security, and automate the remediation of incidents.

Real-world use cases of AIOps in predictive maintenance and incident response

A. Predictive maintenance with AIOps

One of the primary advantages of AIOps is its ability to perform predictive maintenance. By using AI-driven analytics, organizations can detect system anomalies before they escalate into failures. This is how AIOps Enables Predictive Maintenance:

Pattern recognition: Machine learning models can be trained to recognize the expected behavior of a system, analyzing performance data to identify trends and patterns. By doing so, these models can predict potential failures or misconfigurations before they occur, enabling proactive maintenance and minimizing downtime.
Proactive interventions: Upon detection of potential issues, automated runbooks can be triggered to swiftly address the problem, minimizing downtime and ensuring business continuity. In cases where human intervention is unavoidable, IT teams can proactively schedule maintenance during planned downtime or off-peak hours, preventing system issues from impacting end users and reducing the risk of service disruptions.

Moreover, predictive maintenance offers a range of key benefits that help organizations optimize operations and reduce costs:

Reduced downtime: Proactively addressing issues prevents costly outages.
Operational efficiency: Automating maintenance reduces the workload on IT teams.

In order to illustrate the impact of predictive maintenance in action, let’s look at a case study where AIOps played a crucial role in preventing server failures.

One of the best examples of AIOps in action is Netflix's Simian Army, a set of tools employed to make its streaming service reliable.

Among its ranks is Chaos Monkey, which randomly kills instances in Netflix's cloud infrastructure to test the system's ability to survive failure. This is done in advance so that Netflix can detect and fix problems before they impact users, making the system more robust and minimizing downtime.

Free Artificial Intelligence Membership - Become an Insider

Join thousands of other artificial intelligence professionals and test drive you AIAI membership without spending a dime.

AI Accelerator InstituteNasi Rwigema

B. AIOps in incident response and resolution

Having observed how AIOps can actively avoid system failure through predictive maintenance, it is also essential to appreciate its contribution towards improving incident response and resolution.

While AIOps aid in anticipating and avoiding failures, they also assist organizations by automating the identification and resolution of unforeseen incidents, minimizing disruption, and enabling quicker recovery. This leads naturally into the discussion about how AIOps aids in incident response.

AIOps enhances incident response by using automated anomaly detection and resolution processes. Through continuous system monitoring, AI can detect ongoing threats in real-time, unauthorized login attempts or performance anomalies, to ensure problems are detected in a timely manner.

Furthermore, AIOps enables IT Service Management tools to automate the response process. It generates tickets, allocates tasks, and even applies resolutions automatically, all without human intervention, reducing the time and effort required to resolve incidents and preventing operations from becoming derailed.

It also applies to ITSM functions like root cause analysis and issue tracking, where diagnostics are accelerated by AI to enable quicker response to high-priority issues. Moreover, AIOps' integrated platform with helpdesk software guarantees proper case management and seamless team coordination, increasing overall IT service efficiency and reducing resolution times.

A notable example of AIOps in action is a case study of a major multinational financial services organization. The organization implemented Moogsoft's AIOps platform to automate incident management processes. By automating event correlation and noise reduction, the bank decreased MTTD by 35% and MTTR by 43%. These decreases led to greater operational efficiency and a more responsive IT environment.

Building a scalable AIOps architecture

AIOps significantly enhance incident response through the ability to automatically detect and remediate, thus allowing organizations to respond instantly to issues and maintain business continuity.

To effectively leverage AIOps, however, there is a need to create a scalable architecture that will be able to manage increasing data sizes and still be effective as the IT infrastructure grows.

This leads to the essential elements that are the building blocks of an AIOps solution, enabling faster detection of incidents, accurate predictions, and seamless automation of IT operations. These are the key components of AIOps-Driven IT Infrastructure:

Data ingestion layer: Collecting logs, metrics, and event data from diverse IT sources.
AI and ML models: Analyzing patterns, detecting anomalies, and making predictions.
Automation and orchestration: Executing remediation actions and optimizing workflows.

After defining the correct AIOps architecture, it is important to implement it in a manner that provides the best benefits. Best practices for effective deployment and sustained accomplishment are best adopted by organizations as per business objectives and IT processes. The best practices embed AI-driven operations natively and maximize their impact:

Selecting the right AIOps tools: Choose platforms that align with business objectives.
Ensuring seamless integration: AIOps should work with existing IT workflows and monitoring solutions.
Building a feedback loop: Continuously refine AI models to enhance accuracy and effectiveness.

By following these best practices, organizations can maximize the potential of AIOps and enhance their IT operations. However, as organizations scale their AIOps solutions, they must also confront certain challenges that can hinder their growth and effectiveness. Addressing these challenges is crucial to maintaining the value of AIOps as the IT environment continues to evolve.

Two key challenges include handling vast amounts of data, where AI models must process extensive datasets efficiently, and overcoming resistance to automation, as IT teams may need training to trust AI-driven operations. Therefore, addressing these challenges is essential for achieving successful and scalable AIOps deployment.

Gold-copy data & AI in the trade lifecycle process

Use AI to streamline the trade lifecycle, reduce manual breaks, and sync data across systems for faster, more accurate investment decisions.

AI Accelerator InstituteParth Prafulbhai Sonara

The future of AIOps in IT operations

In the future, as AIOps continues to advance, various emerging trends are defining its role in IT operations. One significant trend is the creation of AI-driven self-healing systems, where system recovery mechanisms will be automated in order to self-correct faults without human involvement. This technological advancement will revolutionize operational efficiency by allowing systems to correct issues in advance.

Apart from this, the integration with edge computing will seek to enhance AIOps' capability to manage distributed IT environments better. With more devices and sources of data being executed at the edge, AIOps must scale and accommodate such decentralized networks.

Moreover, Cloud-native AIOps solutions are gaining popularity with greater flexibility and scalability for hybrid and multi-cloud environments. These advances will allow firms to deploy AIOps in increasingly complex IT landscapes.

In addition to these advancements, data security and privacy concerns are coming to the fore ever more, with AIOps becoming mature.

This can be achieved through stronger encryption and compliance features, whereby sensitive information is effectively safeguarded. Besides, as decision making is becoming more and more reliant on AI models, building transparent AI models is essential to ensure trust.

Through the application of explainable AI (XAI) techniques, organizations will be able to offer greater transparency regarding how decisions are made by AI systems, ensuring stakeholders that AI is used ethically and responsibly.

By embracing these new trends and addressing data privacy concerns, AIOps can lead the way to the future of IT operations, making them autonomous, secure, and efficient units.

Conclusion

To sum up, AIOps is revolutionizing IT operations by enabling predictive maintenance, proactive incident management, and automated scalability. Organizations are able to maximize efficiency, reduce downtime, and simplify IT service management by leveraging the capabilities of AI and machine learning.

As AI and automation technologies continue to evolve, AIOps is set to become the key to orchestrating complex IT infrastructures. Further, organizations that adopt AIOps will gain a competitive edge by optimizing their operations and providing users with seamless digital experiences. Finally, in the future, AIOps will no longer be seen just as an assisting tool but will emerge as the backbone of intelligent IT management, driving both innovation and business excellence in the digital era.

Have you checked our free Insider plan?

Access exclusive talks, templates, and more for free.

Check out the plan below:

Free Artificial Intelligence Membership - Become an Insider

Join thousands of other artificial intelligence professionals and test drive you AIAI membership without spending a dime.

AI Accelerator InstituteNasi Rwigema

The truth about enterprise AI agents (and how to get value from them)

Ryan Priem — Fri, 11 Apr 2025 16:19:37 GMT

This article comes from Ryan Priem’s talk at our Washington, D.C. 2025 Generative AI Summit. Check out his full presentation and the wealth of OnDemand resources waiting for you.

What’s the point of AI if it doesn’t actually make your workday easier?

That’s the question I keep coming back to – and the one that ultimately brought me into the generative AI space.

I’m Ryan Priem, and I lead sales for Glean here in the East. After more than two decades in tech, working in data and analytics at places like Snowflake and EMC, I saw something shift. Large language models weren’t just impressive – they were starting to offer real, measurable value.

But there’s a catch: value doesn’t come from the model alone. It comes from how well you apply it.

That’s what drew me to Glean. We’re focused on using AI to solve actual workplace problems. Whether it’s helping someone find the right document, answer a critical question, or automate a tedious task, we’re building AI that works the way people do.

This article is a walk-through of what that journey looks like and what it really takes to build useful, scalable agents that people actually want to use.

Let’s dive in.

What work AI systems actually do (and why they matter now)

We classify ourselves as a "work AI" company. What that means is we’re focused on three core use cases:

Find something. Think of enterprise search – Google-like capabilities across your entire data corpus. We’ve built 120+ native connectors that index everything from Slack and Teams to Confluence, Salesforce, and SharePoint.
Answer something. This is where generative AI kicks in. It’s about providing accurate, relevant answers from within your organization’s ecosystem – like what Microsoft Copilot does, but across all your apps.
Do something. This is the really exciting part: task automation. Whether it’s preparing for a meeting, writing follow-up notes, creating a social media post, or resolving a support ticket – these are the everyday things that slow people down. We help you automate them.

The key to all of this is reducing friction. If you can find the right doc in seconds, get the right answer immediately, and offload repetitive tasks to an agent, you can spend more time doing the high-impact work that actually moves the business forward.

AI agents: Automation and intelligent assistance (2025 guide)

AI agents are intelligent software entities designed to operate autonomously and achieve specific goals.

AI Accelerator InstituteMuhammad Rushad Arab

How to 8‑bit quantize large models using bits and bytes

Robert McMenemy — Wed, 09 Apr 2025 10:44:16 GMT

Deep learning is consistently changing so many fields, from NLP (natural language processing) to computer vision.

However, as these models continue to grow in size and complexity, the demands on the hardware required for memory and compute continue to skyrocket. In light of this, there are promising strategies to overcome these challenges, one of which is quantization. This lowers the precision of numbers used in the model without a noticeable loss in performance.

In this article, I will dive into the theoretical processes underlying this strategy and show the practical implementation of 8‑bit quantization within a large parameter model, in this case, we will be using the IBM Granite model and BitsAndBytes for quantization.

Introduction

The quick growth of deep learning has resulted in an arms race of models boasting billions of parameters, which, in most cases, achieve stellar performance but require enormous computational resources.

As engineers and researchers look for methods to make these large models more efficient, quantization has shown to be an incredibly effective solution. By lowering the bit width of number representations from 32‑bit floating point to x‑bit integers, quantization decreases the overall model size, speeds up inference, and cuts energy consumption, all while keeping a high accuracy in the output.

I will explore the concepts and techniques behind 8‑bit quantization in this article. I will explain the approach's benefits, outline the theory behind it, and walk you through the process step by step.

I will then show you a practical application: quantizing the IBM Granite model using BitsAndBytes.

Understanding quantization

At its core, quantization is the process of mapping input values from a quite large set (usually continuous and high-precision) to a much smaller and more discrete set, which has lower precision. Deep learning typically involves converting 32‑bit floating‑point numbers to x‑bit integer alternatives.

The result is a massive reduction in memory usage and computation time.

Benefits of quantization

Lower memory footprint: Lower precision means that each parameter requires much less memory.
Increased speed: Integer math is generally much faster than floating‑point operations (FlOps), especially on hardware optimized for low‑bit computations.
Energy efficiency: Lower precision computations consume far less power, making them ideal for mobile and edge devices.

Types of quantization

Uniform quantization: This method maps a range of floating‑point values uniformly to integer values.
Non‑uniform quantization: Uses a more complicated mapping based on the distribution of the weights or activations of the network.
Symmetric vs. asymmetric quantization:
- Symmetric: Uses the same scale and zero‑point for positive and negative values.
- Asymmetric: Allows different scales and zero‑points, which is useful for distributions that are not centered around zero.

AI assistants: Only as smart as your knowledge base

AI assistants need real-time, seamless connections to your company’s databases, documents, and internal communication tools to realize their full potential.

AI Accelerator InstituteMarisa Garanhel

Why 8‑bit quantization?

8‑bit quantization is when each weight or activation in the model is fully represented with 8 bits, thus offering us 256 discrete values.

This approach helps maintain compression and precision by enabling:

Memory savings: Lowering the uint from 32 bits to 8 bits per parameter can cut the memory footprint by up to 75%.
Speed gains: Many hardware accelerators and CPUs are fully optimized for 8‑bit arithmetic, which massively improves inference times.
Minimal accuracy loss: With careful calibration and potentially fine‑tuning, the degradation in performance with 8-bit quantization is often minimal.
Deployment on edge devices: The reduced model size and faster computations make 8‑bit quantized models perfect for devices with limited computational resources.

Theoretical underpinnings of quantization

Quantization is thoroughly rooted in signal processing and numerical analysis. The objective here is to reduce precision whilst also controlling the quantization error, the difference between the original value and its quantized version.

Quantization error

Scale and zero‑point

A linear mapping is normally used to perform quantization:

Scale (S): Sets the step size between our quantized values.
Zero‑point (Z): The integer value assigned to the real number zero.

The process normally involves a calibration phase to determine the optimal scale and zero‑point values. This is then followed by the actual quantization of weights and activations.

Quantization Aware Training (QAT) vs. Post‑Training Quantization (PTQ)

Quantization Aware Training (QAT): This integrates a simulated quantization into the training process, allowing the model to adapt its weights to quantization noise.
Post‑Training Quantization (PTQ): Applies quantization to a pre‑trained model using calibration data. PTQ is simpler and faster to implement but it may incur a slightly larger accuracy drop compared to QAT.

Steps in 8‑bit quantization

Applying 8‑bit quantization includes some essential steps:

Preprocessing and calibration

Step 1: Investigate the Model's Dynamic Range

Before quantization, we need to know the weights and activation ranges:
Collect Statistics: Pass a part of the dataset through the model to collect statistics (min, max, mean, standard deviation) for all the layers.
Establish Ranges: Based on these statistics, create quantization ranges, possibly clipping outliers to create a tighter range.

Step 2: Calibration

Calibration is the process of selecting the best scale and zero-point for each tensor or layer:

Min/Max Calibration: Uses the minimum and maximum that were observed.
Percentile Calibration: Uses some percentile (e.g., 99.9th percentile) to avoid outliers. Calibration must be correct since poor decisions will result in significant loss of accuracy.

Quantization Aware Training vs. Post‑Training Quantization

Quantization Aware Training (QAT):

Advantages: Greater precision as the model learns how to compensate for quantization distortion.
Cons: Involves modifying the training procedure and extra computation.

Post‑Training Quantization (PTQ):

Advantages: It's much easier to implement because the model is already pre-trained.
Disadvantages: It can sometimes result in a greater reduction in accuracy, specifically in precision-based models.

For most big models, a small loss of accuracy from PTQ is fine, while mission-critical applications can use QAT.

LLM economics: How to avoid costly pitfalls

Avoid costly LLM pitfalls: Learn how token pricing, scaling costs, and strategic prompt engineering impact AI expenses—and how to save.

AI Accelerator InstituteMarisa Garanhel

8-bit quantization applied

No matter which deep learning environment—PyTorch, TensorFlow, or ONNX—the concepts of 8‑bit quantization remain the same.

Practical considerations

Before implementing quantization, consider the following:

Hardware support

Ensure that the target hardware (CPUs, GPUs, or special accelerators like TPUs) natively supports 8‑bit operations.

Libraries

PyTorch: Gives us built-in support for QAT and PTQ through its designated quantization module.
TensorFlow Lite: Offers us utilities to transform models to an 8‑bit quantized format, especially for embedded and mobile applications.
ONNX Runtime: Supports quantized models for use across different platforms.

Model Structure: Not all the layers in the model are created equal when quantized.

Convolutional and fully connected layers will generally be fine, but some activation and normalization layers may need further special treatment.

Fine-Tuning: Fine-tuning the quantized model on a small calibration dataset can help restore any performance loss due to quantization noise.

BitsAndBytes: A specialized library for 8‑bit quantization

BitsAndBytes is an independent library that helps us further streamline the 8‑bit quantization process for very large models. Frameworks like PyTorch offer us native quantization support. However, BitsAndBytes provides additional optimizations designed to convert 32‑bit floating point weights into 8‑bit integers.

With a simple config flag (e.g., load_in_8bit=True), it enables significant reductions in memory usage and speeds up inference without requiring massive code modifications.

Model structure: Not all layers are equally amenable to quantization. Convolutional and fully connected layers usually perform well under quantization, but some of the activation and normalization layers may need special treatment.
Fine‑tuning: Fine‑tuning the quantized model on a small calibration dataset can help us recover any performance loss due to quantization noise.

Integrating BitsAndBytes with your workflow

For seamless integration, BitsAndBytes can be used alongside other popular frameworks like PyTorch. When you pre-configure your model with BitsAndBytes, you simply have to specify the quantization configuration during model loading.

This tells the system to automatically convert the weights from 32‑bit integers to 8‑bit integers on the fly thus reducing the overall memory footprint by up to 75% and enhancing inference speed, which is ideal for deployment in resource-constrained environments.

For example, by setting up your model with:

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

you can achieve a quick switch to 8‑bit precision. This approach not only optimizes memory usage but also maintains high performance, making it a valuable addition to your deep learning workflow.

Case study: Quantizing IBM Granite with 8‑bit using BitsAndBytes

IBM Granite is a 2‑billion parameter model designed for instruction‑following tasks. Due to its enormous size, it is possible to quantize IBM Granite to 8‑bit to reduce its memory footprint significantly with good performance.

IBM Granite quantization: Example code

The following is the code segment for configuring IBM Granite with 8‑bit quantization:

# Setup IBM Granite model using 8-bit quantization.
model_name = "ibm-granite/granite-3.1-2b-instruct"
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=quantization_config,
device_map="balanced", # Adjust as needed based on available GPU memory.
torch_dtype=torch.float16
)
tokeniser = AutoTokeniser.from_pretrained(model_name)

Code breakdown

Model Selection:

The model_name variable sets up the IBM Granite model to be used for instruction execution.

Quantization Setup:

BitsAndBytesConfig(load_in_8bit=True) activates 8‑bit quantization. It is a flag that informs the model loader to quantize 32‑bit floating point to 8‑bit integer.

Model loading:

AutoModelForCausalLM.from_pretrained() loads the model using the specified configuration. The parameter device_map="balanced" helps distribute the model across available GPUs, and torch_dtype=torch.float16 ensures that any remaining computation uses half‑precision.

Tokenizer initialization:

The tokenizer is instantiated with AutoTokeniser.from_pretrained(model_name) and guarantees the input text undergoes correct preprocessing for the quantized model.
This method not only lowers the memory usage of the model by as much as 75%, it also increases inference speed, making it particularly suitable for deployment in memory-limited settings, such as edge devices.

Gold-copy data & AI in the trade lifecycle process

Use AI to streamline the trade lifecycle, reduce manual breaks, and sync data across systems for faster, more accurate investment decisions.

AI Accelerator InstituteParth Prafulbhai Sonara

Barriers and best practices

Even though 8-bit quantization is highly advantageous, it also has some challenges:

Challenges

Accuracy degradation

Some models can suffer from a loss of accuracy after quantization due to quantization noise.

Calibration difficulty

It is important to determine appropriate calibration data and techniques and may be difficult, especially for models with a broad dynamic range.

Hardware constraints

Ensure that your target deployment platform fully supports 8‑bit operation, or performance will be disappointing.

Best practices full calibration

Use a representative data set to accurately calibrate the model's weights and activations.

Layer-by-layer analysis

Determine which layers are sensitive to quantization and evaluate the necessity to retain them at a higher precision.

Progressive evaluation

Quantization is not a one-shot fix. Repeat your strategy in turn experimenting with different calibration techniques and potentially mixing PTQ with QAT.

Use framework tools

Utilize the high-level quantization utilities integrated into frameworks such as PyTorch and TensorFlow, as these utilities are always being improved and updated.

Fine‑tuning

If possible, optimize the quantized model on a subset of your data to recover any performance loss due to quantization.

Conclusion

Quantization and 8‑bit quantization are powerful techniques for reducing the memory footprint and accelerating the inference of large models. By converting 32‑bit floating‑point values to 8‑bit integers, you can achieve significant memory savings and speedups with minimal accuracy loss.

In the current article, we discussed the theoretical foundations of quantization and expounded on the steps involved in preprocessing, calibration, and choosing between quantization-aware training and post-training quantization.

We then gave practical examples using popular frameworks, finishing with a case study involving the quantization of the IBM Granite model using BitsAndBytes.

As models in deep learning increase in size, mastering techniques like 8‑bit quantization will be needed to deploy efficient state‑of‑the‑art systems: right from the data center down to edge devices.

Regardless of whether you're an AI researcher or a deployment engineer, understanding how to make large models optimized is a needed skill in today's AI landscape.

The application of 8-bit quantization through tools such as BitsAndBytes allows the reduction of the computational and memory overhead of big models, such as IBM Granite, to be achieved for more scalable, efficient, and energy-consumption-friendly deployment in diverse applications and hardware platforms.

Happy quantizing, and may every bit and byte count in your models become leaner, faster, and more efficient!

Connect with like-minded AI professionals and enthusiasts at our in-person events across the globe.

Check out where we'll be this year, and join us to discuss emerging topics with some of the world's leading AI minds.

AI Accelerator Institute | Summit calendar

Unite with applied AI’s builders & execs. Join Generative AI Summit, Agentic AI Summit, LLMOps Summit & Chief AI Officer Summit in a city near you.

.style-65cdffdcd48a30cb7da3635f-logo- { position: relative; display: block; &:hover .WrapperHandleClick { display: block; } display: flex;position: relative;width: 136px;height: 100%;margin-right: 0px;float: left;background-size: contain;background-repeat: no-repeat;background-position-y: center;background-image: url('https://assetsacara.com/production/organizations/62876e3d645e9fcb6e40225e/1722353907815-AIAIFULLLOGOSECONDARYONWHITE.webp');margin-left: 30px; @media (max-width: 76.8em) { margin-right: 2rem;width: 120px;margin-left: 10px;background-image: url('https://assetsacara.com/production/organizations/62876e3d645e9fcb6e40225e/1722353907815-AIAIFULLLOGOSECONDARYONWHITE.webp'); } @media (max-width: 37.5em) { background-image: url('https://assetsacara.com/production/organizations/62876e3d645e9fcb6e40225e/1722353907815-AIAIFULLLOGOSECONDARYONWHITE.webp');display: block;flex-direction: row-reverse;width: 5.5rem;min-width: 120px; } }

Building scalable image data pipelines for AI training

Heng Shi — Mon, 07 Apr 2025 09:53:11 GMT

Artificial intelligence forms the heart of the digital revolution in the advent of the twenty-first century. Handling big data through fine-grained data pipelines is crucial for perfect AI training, and such a requirement is felt more strongly in computer vision applications.

AI models, mainly deep learning models, need large volumes of labeled image data for efficient training and reasoning. A well-designed, scalable image processing pipeline ensures that AI systems are appropriately trained with quality-prepared data to ensure accuracy by minimizing errors in model training and optimizing their performance.

This article discusses essential components and necessary strategies for implementing efficient and scalable image data pipelines for the training of AI models.

Scalable image data pipelines: A need

Image-based AI applications have been infamous for being extremely data-hungry. Be it image classification, object detection, or facial recognition, all of these models require millions of images to learn from. The images have to be preprocessed before training: resized, normalized, and often augmented. As data starts to scale up, such operations become increasingly complex, and one needs a strong and flexible pipeline that could handle a variety of tasks like:

Ingestion of Data: Ingest a large volume of image data coming from different sources very fast.
Preprocessing of Data: Raw image data is transformed into forms that are usable in the training of models, including resizing, cropping, and augmentation.
Storage of Data: Preprocessed data should be stored in a manner such that during training, it can be accessed fast.
Scalability: The system should scale up with larger and larger data without a drop in performance.
Automation and monitoring: Automate the repetitive tasks, while at the same time keeping track of what happens in the pipeline to maintain it at peak efficiency level, therefore capturing potential problems before they emerge.

Mastering data clustering: Your guide to K-means & K-means++

K-means clustering is an unsupervised machine learning algorithm used for clustering or grouping similar data points together in a dataset.

AI Accelerator InstituteNishan Jain

Key components of scalable image data pipelines

1. Data Ingestion

Data ingestion refers to the initial step in an image data pipeline that deals with source image data collection, coming from a variety of sources—public image repositories, company databases, or web scrapings. Since the size of the image dataset spans from thousands to millions of files, efficient mechanisms for their ingesting need to be designed.

Best practices for data ingestion:

Batch processing: Ingest large datasets in batches for smooth handling of high volumes.
Streaming data ingestion: Streaming data should be directly fed into the pipeline from cameras or IoT devices in certain real-time applications to avoid latency and ensure freshness.
Data versioning: Versioning of datasets allows tracking changes and ensures the integrity of the training datasets.

After ingestion, the raw images will undergo preprocessing. This will involve several steps, such as resizing images to uniform dimensions, normalizing pixel values, converting image formats, and augmenting data by rotation, flipping, or color modification. That is an effective way of synthetically increasing the size of a dataset, to enhance model robustness.

2. Efficient data pre-processing:

Parallel processing: If the images are preprocessed in parallel across multiple nodes, this greatly reduces the time to prepare large datasets.
Use of GPUs: Image preprocessing—especially augmentation—is greatly helped by the parallelism afforded by GPUs.
Pipeline automation: Automatic preprocessing pipelines with either TensorFlow's tf.data or PyTorch's DataLoader simplify the process.

3. Data storage and management

This calls for a storage approach that will allow for swift retrieval while training, offer scalability, and be inexpensive.

Popular large-scale image data pipelines use distributed storage systems, such as Amazon S3 or Google Cloud Storage. These provide high availability and scalability while allowing one to store huge datasets without being puzzled by complicated infrastructure at your side.

Key considerations for image data storage:

Object storage: Employ an object storage system like Amazon S3, which can handle unstructured data and store images in large amounts.
Data caching: For repeatedly accessed images, a caching mechanism could be developed to minimize retrieval times, especially during model training.
Data compression: Compression of image files reduces storage costs and time taken in transferring the images without losing quality.

4. Distributed processing and scalability

Among the major considerations in building an image data pipeline, scalability is paramount since datasets keep increasing. This can be supported with distributed processing frameworks like Apache Spark or Dask that allow the processing of huge data in parallel across several machines, ensuring scalability and reduction of processing times.

Scaling strategies for image data pipelines:

Horizontal scaling: By adding nodes, the load can be scaled across a number of servers. This is quite advantageous in datasets of large-scale images.
Serverless architecture: Leverage serverless compute, such as AWS Lambda or Google Cloud Functions, to perform common image data processing tasks without concerns about the management of an underlying server.

AI assistants: Only as smart as your knowledge base

AI assistants need real-time, seamless connections to your company’s databases, documents, and internal communication tools to realize their full potential.

AI Accelerator InstituteMarisa Garanhel

5. Model training and data access

Once the image data is ingested, processed, and stored, it is ready to train. Training requires efficient mechanisms for data access and must be able to scale up to large-scale distributed training on multiple machines or GPUs.

Major machine learning platforms like TensorFlow, PyTorch, and Apache MXNet support distributed training, allowing models to leverage huge datasets without bottlenecks.

Optimizing data access toward training:

Prefetching: Use data prefetching whereby batches of images are loaded into memory while the model is still operating on the previous batch to reduce I/O wait times as much as possible.
Shuffling and batching: Shuffling prevents overfitting, and batching allows models to train on subsets of data, gaining efficiency.
Integration with distributed storage: Ensure your training environment is tightly integrated with the distributed storage system. This cuts down latency and ensures quick access to training data.

6. Monitoring, automation, and maintenance

The pipeline would be continuously monitored to ensure that, by means of automated tasks in charge of recurrent processes such as data ingestion, preprocessing, and error checking, everything happens efficiently.

Monitoring tools such as Prometheus or Grafana can keep track of performance metrics while alerting mechanisms signal issues such as failing processes or resource bottlenecks.

Best practices for monitoring and maintenance:

Automate tasks: Use Apache Airflow and Kubeflow Pipelines as scheduling tools.
Error detection and retries: Identify error conditions in data processing jobs and build retry logic.
Log collection and alerts: Leverage logging frameworks and alerting systems to monitor the health of pipelines.

Best practices for scalable image data pipelines

Leverage cloud-native solutions: The use of cloud-native solutions provides much-needed flexibility, scalability, and optimization of costs. AWS S3, Google Cloud Storage, and Azure Blob Storage make it easy to manage big image datasets.
Data governance: Provide versioning, labeling, and access controls over the datasets for security coherence.
Optimize for cost: Image data pipelines are costly in large-scale systems. Use storage tiers—hot and cold storage—to manage data costs optimally.
Automate and test regularly: Regular testing of the pipeline on the integrity of data and preprocessing ensures predictable performance. This helps catch potential problems before they cause issues in model training.

Conclusion

Designing and sustaining scalable image data processing pipelines for AI training involves careful planning of each step—from ingestion and preprocessing to storage, scalability, and monitoring. Distributed processing, cloud-native utilities, and automation create efficient and agile pipelines that cope with growing volumes of data, laying a solid foundation for robust, high-performing AI models.

Agent-Responsive Design: Rethinking the web for an agentic future

Sahar Mor — Fri, 04 Apr 2025 06:39:09 GMT

It's November 2028. Maya's personal AI agent quietly handles her holiday shopping, easily navigating dozens of e-commerce sites. Unlike the clunky chatbots of 2024, her agent seamlessly parses product specifications, compares prices, and makes purchase decisions based on her preferences.

"The boots for your sister," it explains, "are from that sustainable brand you both discussed last month - I found them at 20% off and confirmed they'll arrive before your family gathering." What would have taken Maya hours of manual searching now happens automatically, thanks to a web rebuilt for agent-first interaction.

—> The future, three years from now.

As we approach the end of 2024, a new paradigm shift is emerging in how we build and interact with the internet. With rapid advances in AI reasoning capabilities, tech giants and innovative startups alike are racing to define the next evolution of digital interaction: AI agents, .

Google, Apple, OpenAI, and Anthropic have all declared AI agents as their primary focus for 2025. This transformation promises to be as significant as the web and mobile revolutions were and represents perhaps the most natural interface for LLM-powered technology, far more intuitive and capable than the chatbots that preceded it.

In the recent No Priors Podcast, Nvidia’s CEO Jensen Huang stated that "there's no question we're gonna have AI employees of all kinds” that would "augment every single job in the company”.

Moreover, Gartner projects that by 2028, 33% of enterprise software applications will include agentic AI, up from less than 1% today, enabling 15% of day-to-day work decisions to be made autonomously. This rapid adoption mirrors the mobile revolution of the early 2010s but with potentially more far-reaching implications for how we interact with digital services.

AI agents: Automation and intelligent assistance (2025 guide)

AI agents are intelligent software entities designed to operate autonomously and achieve specific goals.

AI Accelerator InstituteMuhammad Rushad Arab

What sets AI agents apart?

While there's ongoing debate about what an AI Agent is, at its core, what sets agents apart from traditional software is their ability to autonomously plan and adapt.

Unlike rule-based systems that follow predetermined paths, agents can formulate strategies, execute them, and—most importantly—adjust their approach based on outcomes and changing circumstances. Think of them as digital assistants that don't just follow a script, but actually reason about the best way to achieve your goals.

If a planned action fails or yields unexpected results, an agent can reassess and chart a new course, much like a human would. This flexibility and autonomous decision-making capability marks a departure from traditional software, which can only respond in pre-programmed ways.

The use of tools

Central to agents' capabilities is their sophisticated use of tools. Much like a handyman who knows when to use a screwdriver versus a hammer, agents must determine which tools to use, when to use them, and how to use them effectively.

For instance, when helping you plan a trip, an agent might first use a calendar tool to check your availability, then a flight search API to find options, and finally a weather service to ensure you pack appropriately. The key isn't just having access to these tools — it's the agent's ability to reason about their use and orchestrate them intelligently to accomplish complex tasks.

This article was originally published here at AI Tidbits, where you can read more of Sahar's fascinating perspectives on AI-related topics.

From mobile-first to agent-first

Remember when 'www' stood for something closer to 'Wild Wild West' than 'World Wide Web'? The early 2000s internet was an untamed digital frontier, where users navigated through a maze of pop-ups, fought off malware, and relied on bookmarked URLs just to find their way around.

The early 2010s, when mobile exploded, weren’t that different as businesses scrambled to make their websites mobile-responsive. That shift wasn't just about resizing content for smaller screens–it fundamentally changed how we approached web design, user experience, and digital strategy. It created a whole new field of website and mobile optimization: choosing the best colors and text copy to increase traffic, conversion rates, and stickiness.

The agentic AI inflection point

Today, we stand at a similar inflection point with AI agents.

Just as mobile-responsive design emerged from the need to serve smartphone users better, "agent-responsive design" is emerging as websites adapt to serve AI agents. But unlike the mobile revolution, which was about accommodating human users on different devices, the agent revolution requires us to rethink our fundamental assumptions about who – or what – is consuming our digital content.

In this agent-first era, websites will undergo a dramatic transformation. Gone are the days of flashy advertisements, elaborate typography, and resource-heavy images — elements that consume bandwidth but provide little value to AI agents.

Instead, we're moving toward streamlined, efficient interfaces that prioritize function over form. These new websites will feature minimalist designs optimized for machine parsing, structured data layers that enable rapid information extraction, standardized interaction patterns that reduce processing overhead, and resource-efficient components that minimize token usage and computation costs.

This evolution extends beyond traditional websites. Mobile applications are already being reimagined with agent-interaction layers, as evidenced by recent novel methods like Apple's Ferret-UI 2 and CAMPHOR, enabling seamless agent navigation of mobile interfaces while maintaining human usability.

Google and Microsoft also invest in this space, as demonstrated in their recent papers AndroidWorld and WindowsAgentArena, respectively. Both are fully functional environments for developers to build and test agents.

The incentives are becoming clear: optimize for agents, and you'll unlock new channels of engagement and commerce. Ignore them, and you risk becoming invisible in the emerging agent-first internet.

What is Agent Responsive Design?

At its core, agent-responsive design represents a radical departure from traditional web design principles. Instead of optimizing for human visual perception and engagement, websites must provide clear, structured interfaces that agents can efficiently navigate and interact with.

This transformation will likely unfold in two phases:

Phase 1: Hybrid optimization

Initially, websites will maintain dual interfaces: one optimized for human users and a "shadow" version optimized for agents. This agent-optimized version will feature:

Enhanced semantic markup with clear structure and purpose
Unobfuscated HTML that welcomes rather than blocks automated interaction
Well-defined aria-label labels and metadata to help agents choose and interact with the right UI components
Direct access to knowledge bases and documentation by exposing information beyond what’s visible on the “website interface”, giving the querying agents access to their RAG to easily retrieve information such as refund policy or answer questions the agent has based on their help docs. Also, after being authenticated, providing easy access to user-related information such as last purchases or stored payment methods.
Streamlined authentication and authorization protocols

Phase 2: API-first architecture

The second phase will move beyond traditional UI components, focusing on exposing clean, well-documented APIs that agents can directly interact with. Consumer websites like Amazon, TurboTax, and Chase will:

Provide clear documentation of available tools and capabilities. The agent will leverage its reasoning engine and the task the human delegated to plan the tools and sequence that it needs to use.
Offer structured workflows with explicit input/output specifications
Enable direct access to business logic and user data
Support sophisticated authentication mechanisms for agent-based interactions

AI agents will make traditional A/B testing obsolete

In an agent-first world, the traditional approach to A/B testing becomes obsolete. Instead of testing different button colors or copy variations for human users, companies like Amazon will need to optimize for agent interaction efficiency and task completion rates.

These A/B tests will target similar metrics as today: purchases, sign-ups, etc., employing LLMs to generate and test thousands of agent personas without the need for lengthy user testing cycles.

This new paradigm of testing will require new success metrics such as:

Model compatibility across different AI providers (GPT, Claude, etc.) - each language model has its own nuances. Optiziming can help businesses squeeze a few more percentage points for conversion, bounce rate, etc.
Task completion rate for the human-delegated task at hand, like purchasing a product or subscribing to a newsletter
Token efficiency and latency optimization, enabling lightning-fast interactions while minimizing computational overhead and associated costs
Authentication and security protocol effectiveness, ensuring robust protection while maintaining frictionless agent operations

The competitive landscape in this new era will be shaped significantly by model providers' unique advantages. Companies like OpenAI and Google, with their vast user interaction data, will possess an inherent edge in creating agents that deeply understand user preferences and behaviors. However, this also creates an opportunity for innovation in the form of universal memory and context layers, like what mem0 is pitching with their recently released Chrome extension—systems that can bridge different models, devices, and platforms to create a cohesive user experience.

Drawing from Sierra's τ-bench research, we can anticipate the emergence of standardized benchmarks for measuring agent-readiness across verticals and task types, similar to how we currently measure mobile responsiveness or page load times.

New discovery protocol - Agent Engine Optimization (AEO)

Just as websites evolved from manually curated directories to sophisticated search engine optimization, the agent era demands a new discovery mechanism. The question isn't just about findability—it's about actionability: how do agents identify and interact with the most relevant and capable digital services?

In 2005, Google introduced the Sitemap protocol to improve search engine crawling efficiency, enable discovery of hidden content, and provide webmasters with a standardized method for communicating site structure and content updates to search engines. What is the Sitemap equivalent for AI agents?

Just as SEO emerged to help websites become discoverable in search engines with Google’s inaugural PageRank algorithm, Agent Engine Optimization (AEO) will become crucial for visibility in an agent-first web. Back in Aug 2023, I called it Language Model Ranking Optimization.

This new protocol will go beyond traditional sitemaps, providing agents with structured information about websites:

Available services and capabilities like signing up, placing an order, booking a flight seat
Authentication requirements - what actions require authentication
Data schemas and API endpoints - what data does each action/endpoint need? What is mandatory vs. optional?
Privacy and security protocols - how information is being stored
Service level agreements like refund and shipping guidelines and data retention policy

Exposing such information will become a standard feature in website builders like Shopify and Wix, much like mobile responsiveness is today. These platforms will automatically generate and maintain agent-interaction layers, democratizing access to the agent-first economy for businesses of all sizes.

Companies will need to optimize not just for search engines but for an emerging ecosystem of agent directories and registries that help autonomous agents discover and interact with digital services.

AI literacy: Essential for today's workforce and businesses

Sarah Abdelhady — Wed, 02 Apr 2025 09:39:05 GMT

Bringing generative AI into a business or even public sector organization is consuming the minds of leaders around the world, some approaching the topic with a positive outlook and some with caution.

According to BCG, a mere 10% of organisations have managed to successfully integrate gen AI into their workflows at scale, gaining a significant advantage over their competitors who are in danger of falling behind in this rapidly evolving landscape.

But what does AI literacy mean?

AI literacy is having the knowledge and practical understanding of AI, its use cases, power, and limitations. This could include prompting mastery, the ability to identify when and how to use it, to critically evaluate its outputs, and being able to adapt in a workplace empowered with AI.

AI literacy is a key skill set the whole workforce must master and employers must endorse.

Why AI literacy matters

Future-proofing your business

I briefly mentioned competitiveness above, AI driven companies have a strong competitive advantage in their market, according to HBR.

Those AI-first companies are also way ahead of the market in scaling AI predictive solutions and, hence, future-proofing their business. AI literacy enables this innovation across the whole workforce.

More importantly, the future workforce is already relying on AI in their education, and it is critical that businesses and public sector organisations remain top of mind for the job market of the present and the future.

Your guide to agentic AI

Agentic AI refers to artificial intelligence systems that act autonomously, make decisions, set goals, and adapt to their environment with minimal human intervention.

AI Accelerator InstituteMarisa Garanhel

Boosting productivity across all teams

In a study done by Harvard Business School, AI can help your employees do work 25% faster and 40% better.

AI can help employees across multiple departments across marketing, sales, human resources, customer service, and even frontline operations boost their productivity by helping automate tasks and provide business intelligence, data analysis, and more.

Boosting AI literacy would liberate the workforce from admin tasks, allowing more time for strategic and deep work.

Empower your decision-makers with insights

AI literacy is key in ensuring your workforce is mastering the art of prompting to ensure they are attaining the right insights they need to make more informed decisions.

AI tools can help reduce research time by up to 70-80%, which enables fast action by your teams when needed. More importantly, AI literacy is critical to ensuring your teams are able to evaluate AI outputs and insights, as well as understand potential biases or limitations of AI, especially in sensitive industries like legal and healthcare.

Enhance collaboration

AI features are now being offered by most productivity vendors we use and love, introducing the technology in meeting solutions, chat, and emails.

Avail these solutions, and then train and empower your workforce to make the most out of the improved collaboration benefits that AI brings to the workplace. AI literacy across the organization can help bridge gaps between the different teams, for example, technical and non-technical roles, as well as international teams working across languages.

Ensuring responsible use of this emerging technology

It is critical that beyond learning how to maximize the benefits of this technology, your workforce is aware of the ethical implications of AI, evaluating and selecting the most secure vendor, and training your teams on important topics like bias, responsibility, and fairness in AI. This helps your organization avoid the risks associated with its increased usage.

How to develop AI literacy

Accessible resources and continuous learning

The development of AI literacy begins with access to appropriate tools and resources. Fortunately, knowledge of AI does not require an advanced technical background.

Excellent starting points include online courses and books such as Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans, and Avi Goldfarb, which provide approachable explanations of AI's business applications and implications.

Engaging with community forums like Reddit’s r/MachineLearning or attending free webinars by institutions such as Stanford University is an often overlooked way to gain some practical insights. These resources cater to varying levels of familiarity, making AI literacy accessible to everyone.

AI is an ever-evolving field. Staying updated is essential. You need to learn continuously to adapt effectively. On the industry scale, this approach builds a workforce that is proactive, well-informed, and resilient in the face of change.

Free Artificial Intelligence Membership - Become an Insider

Join thousands of other artificial intelligence professionals and test drive you AIAI membership without spending a dime.

AI Accelerator InstituteNasi Rwigema

Hands-on experience

AI learning can be accelerated by applying it practically. When you work with AI, you gain an understanding of the ways in which it can be applied and of the ways in which it cannot yet be applied or might not ever be applied.

Hands-on experience with AI leads to a more profound comprehension of the technology and its potential. Staying with the technology, in whatever form it currently takes, is vital to maintaining a necessary foundation.

We can develop our expertise in AI by competing in contests, working in interactive learning spaces, and by all sorts of meaningful event organizing that serves to introduce ASU to the kinds of things we're up to with AI. How is this happening?

Work is happening at the level of individual students who are experimenting with the kinds of platforms that integrate AI into productivity tasks such as data visualization or customer support.

Direct engagement with AI technologies and nurturing a culture of continuous learning around these technologies enable individuals and organizations to gain a much clearer understanding of the capabilities and limitations of AI. And this is not just preparation for today; it is quite literally preparation for the future.

Q&A with Fiddler AI: Observability, security & making the world a better place

Jordan Dunne — Tue, 01 Apr 2025 15:29:23 GMT

Fiddler AI, a pioneer in AI observability and security for LLM and MLOps, has the mission of making the world a better place.

Following his session at Generative AI Summit Washington, D.C., I spoke with Nick Nolan, Solutions Engineering Manager at Fiddler AI, to dive into how they're seeking to achieve this.

Q: Fiddler AI emphasizes building trust into AI systems. Can you elaborate on how your AI Observability platform achieves this, particularly concerning explainability and transparency?

Fiddler is the pioneer in AI observability and security for LLM and MLOps. Fiddler’s mission is to make the world a better place, as many of the decisions consumers make every day are influenced by AI.

As AI continues to be deeply integrated into society, Fiddler addresses on two key areas:

1) Helping enterprise AI teams deliver responsible AI applications.

2) Ensuring that people interacting with AI receive fair, safe, and trustworthy responses.

As AI advances, particularly in generative AI, more policies will be introduced to enforce regulations around governance, risk, and compliance. These regulations will help enterprises strengthen oversight of their AI systems while protecting consumers from harmful, toxic, or biased outcomes.

We continue our mission to lead the way in helping enterprises deploy and use AI responsibly, ensuring trustworthy AI, and safeguarding consumers from harmful and unsafe outcomes. We support enterprises at every stage of their AI journey in establishing long-term responsible AI practices.From accelerating LLMOps and MLOps, driving business value, and mitigating risks to building customer satisfaction, we help establish responsible AI practices across all products as companies mature in their AI journey. Fiddler does so by supporting three key areas: monitoring, security and governance. We monitor a comprehensive set of LLM and ML metrics (performance, accuracy, drift, PII leakage, hallucination, safety, custom metrics), and if the metrics monitored go below a specific threshold, teams are alerted for immediate action using explainability and diagnostics for root cause analysis. Fiddler provides audit evidence and audit trail for regulatory and compliance reporting.

Q: With the rapid adoption of LLMs, what unique challenges do enterprises face in monitoring these models, and how does Fiddler's LLM Observability address these
issues?

When GenAI emerged, companies rapidly began prompt engineering, fine-tuning, testing, and evaluating LLMs. This gave rise to numerous GenAI vendors specializing in pre-production testing and evaluation. However, LLM testing remains stuck in pre-production, neglecting the most challenging part: monitoring LLMs in real-world environments. Without oversight in production, enterprises risk hallucinations, safety concerns, and privacy breaches, harming users and the company. These risks limit their ability to explore AI innovation, keeping LLMs confined to testing environments.

The Fiddler Trust Service and its Guardrails solution help enterprises alleviate this issue. Fiddler Guardrails is the industry’s fastest guardrails with <100 ms response time, cost-effective, and can be securely deployed in the customers’ environment, VPC or air-gapped.

Customers use Fiddler to monitor LLM and ML metrics and link these metrics directly to their business KPIs, gaining valuable insights into how LLM and ML deployments impact business outcomes. This contributes to the ROI of positive business outcomes in several ways.

1. Fiddler enables the delivery of high-performance AI by allowing companies to launch and update models faster, leading to improved decision-making and revenue growth.

2. It reduces operational costs by accelerating the launch of AI applications, improving efficiency, identifying model issues, and minimizing downtime. The time savings for data science and ML engineers, who can monitor and debug models in minutes instead of weeks, boosts productivity.

3. Fiddler ensures responsible AI governance by minimizing reputational risks, reducing bias, and improving customer satisfaction, leading to higher Net Promoter Scores (NPS).

Q: The concept of Responsible AI is central to Fiddler's mission. What strategies and tools do you provide to help organizations identify and mitigate biases in their AI models?

The Fiddler AI Observability and Security platform addresses bias through a comprehensive approach. Our platform continuously monitors models for bias and fairness issues across protected attributes, providing real-time alerts when models exhibit biased behavior. The platform's explainability and model diagnostics capabilities deliver deep insights into model decision-making, enabling teams to perform detailed root cause analysis that uncovers the underlying factors contributing to bias. When issues are identified, Fiddler provides actionable insights for model and application improvement, allowing teams to implement targeted fixes rather than starting from scratch.

Beyond detection and mitigation, Fiddler supports responsible AI governance by automatically generating audit evidence for both internal reviews and external regulatory requirements. This documentation captures model behavior, fairness metrics, and mitigation efforts, helping organizations demonstrate compliance with governance, risk management, and compliance (GRC) standards. By integrating continuous monitoring, explainable AI, and robust governance capabilities, Fiddler transforms responsible AI from a theoretical goal into an operational reality.

Integral Ad Science (IAS), a leader in Ad Tech, partnered with Fiddler to enhance the scope and speed of their digital ad measurement products while ensuring
compliance with AI regulations.

By using Fiddler’s AI Observability platform, IAS reduced monitoring costs, established rigorous oversight of their ML models, and improved collaboration with a unified view of model metrics. This enabled faster product launches, ensured compliance with regulatory standards, and provided audit evidence to stakeholders, all while promoting responsible AI practices through monitoring, explainability, and governance.

Full case study here.

Q: In the context of AI governance and compliance, how does Fiddler assist enterprises in meeting regulatory standards and ensuring ethical AI practices?

Fiddler helps enterprises navigate emerging regulations like the EU AI Act and AI Bill of Rights through three key capabilities.

First, we provide continuous monitoring of critical compliance metrics—tracking hallucinations, safety, and privacy in LLMs, while monitoring performance, accuracy, and drift in ML models.

Second, we automatically generate comprehensive audit evidence required for regulatory reviews and internal governance, crucial for enterprises’ compliance strategies.

Third, our platform enables ethical AI practices through specialized fairness monitoring that tracks intersectional bias across protected attributes. By integrating monitoring, documentation, and risk mitigation capabilities, Fiddler transforms compliance from a burden into a competitive advantage, helping companies innovate and build AI applications responsibly.

Q: As AI technologies evolve, how does Fiddler stay ahead in providing solutions that cater to emerging challenges in AI observability and security?

Fiddler stays ahead of emerging AI challenges through a combination of dedicated research, customer-driven innovation, and adaptive platform development. Our data science team leads AI research initiatives using research-based methodologies specifically designed to address real customer challenges in generative AI.

A prime example is our development of the Fiddler Trust Service, powered by proprietary Fiddler Trust Models. These fine-tuned, task-specific models form the foundation of our LLM solution, enabling LLM scoring, monitoring, and implementing effective guardrails. We've optimized the Fiddler Trust Models to deliver industry-leading guardrails with response times under 100 milliseconds, making it significantly faster than competing solutions.

Our approach prioritizes enterprise requirements, ensuring our solutions are not only powerful but also practical. The Fiddler Trust Models are designed to be cost-effective, minimizing the need for extensive inference or GPU resources that drive up operational costs. We've also built our platform with deployment flexibility in mind, allowing secure implementation within customers' environments, including cloud or VPC deployments.

What distinguishes our research approach is its direct connection to customer needs. Rather than pursuing research for its own sake, we focus on solving specific challenges our enterprise customers face in deploying and managing AI systems. This customer-centric innovation model ensures our platform continuously evolves to address the most pressing real-world observability and security challenges in AI.

Q: Collaboration is key in the tech industry. Can you discuss any notable partnerships Fiddler has formed to enhance its platform's capabilities?

Fiddler has established strategic partnerships with leading technology providers to create a comprehensive ecosystem that enhances our AI observability and security capabilities and extends our market reach.

Our collaboration with AWS includes deep integration with both Amazon SageMaker AI and Amazon Bedrock. The SageMaker AI integration enables customers to access Fiddler's enterprise-grade AI observability directly within their existing MLOps workflows, eliminating additional security hurdles and accelerating the deployment of models into production. Our Amazon Bedrock integration extends these capabilities to LLM applications, allowing organizations to monitor and safeguard generative AI deployments with the same level of rigor.

We've also established a powerful partnership with NVIDIA that combines their NIM (NVIDIA Inference Microservices) and NeMo Guardrails with Fiddler's observability and security platform. As a native integration to NVIDIA's NeMo Guardrails orchestration platform, we provide the industry's fastest guardrails with response times under 100 ms. This cost-effective and secure solution enables enterprises to deploy LLM applications at scale while effectively moderating conversations for hallucinations, safety violations, and jailbreak attempts. The integration allows prompts and responses processed by NIM to be automatically routed to Fiddler for guardrails and monitoring, providing comprehensive operational oversight.

Our partnership with Google Cloud has made the Fiddler AI Observability Platform available on Google Cloud Marketplace, providing seamless integration with Google Vertex AI and other Google Cloud AI offerings. This collaboration simplifies procurement through consolidated billing and allows customers to use pre-approved budgets for greater agility.

Additionally, we've partnered with Domino Data Lab to address the needs of government agencies deploying high-stakes AI systems. This collaboration enables agencies to develop models at scale with Domino while using Fiddler to validate model transparency, interpretability, and trustworthiness in production environments. These strategic partnerships and amongst others reflect our commitment to providing AI observability and security that integrates seamlessly with the broader AI ecosystem, helping organizations deploy responsible, accurate, and safe AI.

Q: How does Fiddler's platform integrate with existing MLOps workflows, and what benefits does this integration bring to enterprise clients?

The Fiddler AI Observability and Security platform is designed with flexibility at its core, seamlessly integrating into customers' existing tech infrastructure for both LLM and MLOps. Our fundamental goal is enabling enterprises to confidently ship LLM and ML applications into production where they can generate real business value.

For traditional MLOps workflows, Fiddler connects natively with ML platforms including Amazon SageMaker AI, Vertex AI, DataRobot, Domino, and homegrown ML systems. These integrations allow data science teams to monitor model performance, data drift, and fairness metrics without disrupting their established development environments. Our platform also plugs directly into enterprise data infrastructure like Google BigQuery, Databricks, Snowflake, Azure Data Lake, and Amazon S3, enabling efficient data flow between systems.

For LLM applications, Fiddler provides specialized integrations with generative AI frameworks and platforms. We integrate with gateway frameworks like NVIDIA NIM, Portkey, and custom solutions to monitor prompt-response interactions. Our connections to Gen AI platforms including Amazon Bedrock, NVIDIA NeMo, and Together.ai enable comprehensive LLM monitoring for hallucination rates, safety violations, and response quality.

What sets Fiddler apart is our adaptability to customers' unique environments. Rather than forcing organizations to change their workflows or technology stacks, our solution is designed to be flexible for seamless implementation and accelerates time-to-value, allowing enterprises to move AI projects from experimentation to production more quickly and with greater confidence.

This flexible integration strategy delivers significant benefits:

1. Reduced barriers to production deployment by providing the monitoring, protection, and governance capabilities needed to satisfy stakeholders.

2. Accelerated value realization from AI investments by enabling faster, more confident production releases.

3. Unified visibility across heterogeneous AI environments spanning both traditional ML and LLMs.

4. Enhanced collaboration between security, AI application engineers, data science teams, ML engineers, and business stakeholders.

5. Lower total cost of ownership by eliminating the need for separate monitoring solutions for different model types.

By adapting to customers' environments rather than forcing them to adapt to us, Fiddler helps enterprises transform AI from promising experiments into production systems that deliver measurable business impact.

Q: With the increasing focus on data privacy, how does Fiddler ensure that its AI observability tools align with data protection regulations?

Fiddler's approach to data privacy and protection is built on a comprehensive security framework designed specifically for enterprise AI deployments. We've implemented robust measures to ensure our observability platform aligns with stringent data protection regulations while providing the insights organizations need.

Our multi-layered security architecture begins with sophisticated access controls. We implement Role-Based Access Controls (RBAC) that provide granular authorizations, ensuring that only authorized personnel can access specific data and functionality. This capability is particularly important for organizations with complex teams and governance requirements.

For data protection, we employ industry-leading encryption standards. All communications with and within customer clusters use HTTPS/TLS 1.2 (or higher), while data-at-rest is secured with AES-256 key encryption. We maintain daily backups of encrypted customer data and implement secure deletion procedures at the end of retention periods to prevent unauthorized access to historical information.

Our commitment to compliance is demonstrated through SOC2 certification and HIPAA compliance, validating our security practices against recognized industry standards. Security is integrated throughout our software development lifecycle, with rigorous controls including security design reviews, threat modeling, application security scans, container image scanning, and network monitoring.

Importantly, Fiddler's platform architecture respects data sovereignty principles – customer data remains within their premises, never leaving their controlled environment. This architecture is crucial for organizations operating under regulations like GDPR that have strict requirements about data movement and storage.

For AI-specific challenges, our platform provides specialized capabilities to detect potential privacy issues in model behavior. We generate comprehensive metrics after model registration to help identify concept drift, data leakage, and other anomalies that might indicate privacy vulnerabilities. These monitoring capabilities allow organizations to detect and respond to issues before they result in regulatory violations.

By combining these technical safeguards with our privacy-by-design philosophy, Fiddler ensures that organizations can implement robust AI observability and security while maintaining full compliance with data protection regulations.

Q: Looking ahead, what are Fiddler AI's strategic priorities for the next five years in advancing AI observability and fostering responsible AI adoption?

Fiddler's five-year strategy centers on expanding our AI observability leadership while enabling enterprises to deploy responsible AI at scale. Building on our 2024 launch of the Fiddler Trust Service — the industry's first comprehensive solution for LLM scoring that enables LLM monitoring and powers the industry’s fastest, cost-effective guardrails with <100 ms response time — we're focused on addressing increasingly complex AI ecosystems.

We're developing enhanced capabilities for agentic AI systems through advanced tracing features like Traces and Spans, providing granular insights for complex application architectures. Our roadmap emphasizes customer-driven innovation, with continued research into optimizing AI systems for performance, cost-effectiveness, and security to ensure all stakeholders experience responsible, equitable, and accurate AI.

We're also investing in making responsible AI more accessible through educational resources and simplified implementation frameworks. This democratization effort will help organizations at all AI maturity levels implement effective governance without impeding innovation. Throughout these initiatives, we'll maintain our focus on helping customers extract measurable business value from their AI investments, transforming promising experiments into mission-critical AI systems that deliver tangible results while upholding the highest ethical standards.

Our ultimate goal remains enabling enterprises to confidently deploy AI that generates real business value while ensuring that everyone—enterprises, employees, and end-users alike—benefits from AI that is responsible, equitable, and safe.

Zero trust and AI: The next evolution in cybersecurity strategy

Nazy Fouladirad — Mon, 31 Mar 2025 09:24:59 GMT

Traditional approaches to cybersecurity have always been to defend the digital perimeter surrounding internal networks. However, with the popularity of remote work and cloud computing technologies, conventional security strategies are no longer as effective at protecting organizations.

Zero trust has now become the go-to security approach. Its guiding concepts are built around the mindset of "never trust, always verify." Each user, access device, and network connection is strictly evaluated and monitored regardless of where they originate from.

Artificial intelligence (AI) has become an addition to zero trust security architecture. With the ability to analyze large volumes of information and apply complex processes to automate security functions, AI has helped how modern businesses approach their security planning.

Understanding zero trust in modern organizations

Digital environments have changed the cybersecurity paradigm in many different ways, as businesses have moved toward highly connected infrastructures.. Zero trust security models assume every network connection within the organization is a potential threat and requires various strategies to address them effectively.

Zero trust models work on several core principles that include:

Providing minimum access privileges: Employees should only be given access to information and systems that are absolutely essential for the job function they perform. This limits unauthorized access at all times, and in the event a security breach does occur, the damage is contained to a minimum.
Creation of isolated network areas: Rather than having a single company network, organizations should segment their systems and databases into smaller, isolated networks. This limits an attacker's access to only a part of the system in the event of a successful perimeter breach.
Constant verification: All users and devices are checked and rechecked frequentlyTrust is never assumed, and all activity is closely monitored regardless of who is gaining access or what they're doing.
Assumed breaches: With zero trust, potential breaches are always viewed as a possibility. Because of this, security strategies don't just focus on prevention, but also limiting the possible damage from a successful attack.

Identity-centric security has now become an essential element for building a strong cybersecurity posture and improved operational resilience. A big part of this process is safeguarding sensitive information and making sure that even if breaches do occur, it's less likely that it becomes compromised.

The role of AI in strengthening zero trust models

Bringing AI and zero trust together represents a major step forward for cybersecurity. AI's power to analyze large datasets, spot unusual network activity, and automate security responses makes the core principles of zero trust even stronger, allowing for a more flexible and resilient defense.

Improving identity and access management

With leveraging AI, managing various identities and provisioning system access within a zero trust environment can be improved. Machine learning models can scan user behaviors looking for anomalies indicative of compromised accounts or potentially dangerous network activity. Adaptive authentication protocols can then use these risk-based assessments to change various security validation parameters dynamically.

AI technology also helps automate authentication processes when validating user identities. They can help facilitate new user setups, streamlining IT processes while at the same time minimizing human error. This added efficiency reduces the strain and resource requirements of IT support teams and significantly reduces the possibility of accidentally giving out wrong access permissions.

Intelligent threat detection and response

Traditional security measures can overlook subtle, yet important indicators of malicious network activity. However, machine learning algorithms can aid in detecting these threats ahead of time, resulting in a far more proactive approach to threat response.

Autonomous threat hunting and incident resolution can reduce the time necessary to identify and contain breaches while mitigating any associated damage. With AI, network monitoring processes can be done automatically, allowing security personnel to act faster if and when the time comes.

AI can also provide organizations with predictive analytics that help to guard against possible attacks by anticipating them before they occur. By using threat intelligence gathered from external vendors, and at the same time, checking for system vulnerabilities, essential steps can be taken to tighten security defenses to avoid any weaknesses from being exploited.

Automating data security and governance processes

AI systems can help sensitive business information be protected in real time. As data is collected, it can be automatically classified into various categories. This dynamic classification allows AI systems to apply relevant security controls to certain datasets, helping to align with various compliance requirements while adhering to any of the organization's specific data management policies.

Another important security element for modern organizations is data loss prevention (DLP). AI-driven DLP solutions can be configured to automatically supervise the way users access and relocate information within a system. This helps to identify potential data manipulation and greatly minimizes the danger of unauthorized system access and data leakage.

Next-generation automation: running artificial intelligence at the edge

My name is Helenio Gilabert. In this article, I’m going to tell you all about how we run artificial intelligence at Edge Solutions. It’s such a fascinating topic, but it can be a little overwhelming.

AI Accelerator InstituteHelenio Gilabert

New security challenges and considerations

Though AI drastically improves the capabilities of traditional zero-trust models, it also can present additional security considerations that require organizations’ attention. Some of these include:

Data privacy and ethical concerns

When applying AI in zero trust settings, balancing security and personal privacy is critical. Organizations need to be certain that their methods of collecting and analyzing data are done within the scope of applicable privacy laws and ethical boundaries.

Bias in AI systems should be dealt with as well. Machine learning algorithms trained on outdated data are capable of producing inaccurate results that could lead to more passive security measures being put in place. Organizations need to ensure that any of their AI-driven systems have supporting policies in place to prevent these biased analyses from taking place.

Integration and implementation challenges

Integrating AI into a zero trust framework isn't always straightforward. Complications can surface - especially when it comes to system and network compatibility. Organizations need to ensure that their AI solutions can be seamlessly integrated into the existing tech stack and that there aren't any potential barriers that will impede data flow to and from critical systems.

Another operational challenge with AI-driven security systems is finding qualified talent to operate them. Companies will likely need to allocate dedicated resources for training and staff development to keep systems functioning effectively.

The importance of regular AI model training

AI solutions, especially those that use complex learning algorithms, aren't a “set-it-and-forget-it” implementation. With cyber threats constantly evolving, maintaining the effectiveness of AI-driven systems requires regular model training.

Without regular intervals of AI model retraining, these systems won't function accurately and efficiently over time. An AI model must be regularly reviewed and modified to avoid false positive alerts, broken automation, or inadequate threat mitigation protocols.

The future of cybersecurity

Integrating AI with zero trust architecture has changed how businesses can approach their cybersecurity initiatives. As cyberthreats become increasingly more sophisticated, then the need for increased automation and identity-centric security planning will only continue to grow.

With the proper implementation strategies in place, organizations can benefit from enhanced threat management, streamlined access management, and a more proactive approach to data protection.

Have you checked our 2025 events calendar?

We'll be all over the globe, so why not have a look and see if we're anywhere near you?

Join us and network with like-minded AI experts in your industry.

AI Accelerator Institute | Summit calendar

Unite with applied AI’s builders & execs. Join Generative AI Summit, Agentic AI Summit, LLMOps Summit & Chief AI Officer Summit in a city near you.

How to secure LLMs with the fastest guardrails for peak AI performance

Nick Nolan — Fri, 28 Mar 2025 16:00:46 GMT

This article comes from Nick Nolan’s talk at our Washington DC 2025 Generative AI Summit. Check out his full presentation and the wealth of OnDemand resources waiting for you.

What happens when a powerful AI model goes rogue? For organizations embracing AI, especially large language models (LLMs), this is a very real concern. As these technologies continue to grow and become central to business operations, the stakes are higher than ever – especially when it comes to securing and optimizing them.

I’m Nick Nolan, and as the Solutions Engineering Manager at Fiddler, I’ve had countless conversations with companies about the growing pains of adopting AI. While AI’s potential is undeniable – transforming industries and adding billions to the economy – it also introduces a new set of challenges for LLMs in production, particularly around security, performance, and control.

So in this article, I’ll walk you through some of the most pressing concerns organizations face when implementing AI and how securing LLMs with the right guardrails can make all the difference in ensuring they deliver value without compromising safety or quality.

Let’s dive in.

The growing role of AI and LLMs

We’re at an exciting moment in AI. Right now, research shows around 72% of large enterprises are using AI in some way, and it’s clear that generative AI is definitely on the rise – about 65% of companies are either using it or planning to.

On top of this, AI is also expected to add an enormous amount to the global economy – around $15.7 trillion by 2030, but let’s keep in mind that these numbers are just projections. We can only guess where this journey will take us, but there’s no denying that AI is changing the game.

But here’s the thing: while the excitement is real, so are the risks. The use of AI, particularly generative AI, comes with a unique set of challenges – especially when it comes to ensuring its security, quality and accuracy. This is where guardrails come into play.

If organizations do AI wrong, the cost of failure can be astronomical – not just financially, but also in terms of reputational damage and compliance issues.

LLM economics: How to avoid costly pitfalls

Avoid costly LLM pitfalls: Learn how token pricing, scaling costs, and strategic prompt engineering impact AI expenses—and how to save.

AI Accelerator InstituteMarisa Garanhel

AI assistants: Only as smart as your knowledge base

Marisa Garanhel — Fri, 28 Mar 2025 09:07:49 GMT

Artificial intelligence assistants are quickly becoming vital tools in modern workplaces, transforming how businesses operate by making everyday tasks simpler and faster.

But despite their widespread adoption and advanced capabilities, even the best AI assistants today face a significant limitation: they often lack access to a company's internal knowledge.

AI assistants need real-time, seamless connections to your company's databases, documents, and internal communication tools to realize their full potential. This integration ensures they're brilliant and contextually aware, making them genuinely valuable workplace assets.

The rise of AI assistants

AI assistants are smart applications that understand commands and use a conversational AI interface to conduct tasks. They’re often embedded into dedicated hardware and even incorporated with several systems.

Unlike chatbots, AI assistants are less limited in both intelligence and functionality. They have more agency and advanced abilities, like contextual understanding and personalization. From drafting emails to summarizing reports, these assistants are everywhere.

Some of the more popular AI assistants are:

ChatGPT from OpenAI
Gemini from Google
Claude from Anthropic
DeepSeek from High-Flyer

In business, these large language models (LLMs) can also help you with data analysis, task automation, workflow streamlining, and more. They can be mostly free if you don’t need to scale up, although some users might struggle with the free versions when it comes to tasks that involve uploading or downloading data.

However, even the more advanced AI assistants are missing something that makes them truly useful in your workplace: they don’t have access to your company’s knowledge and information. Without that, these assistants are simply guessing.

And that’s a problem.

LLM economics: How to avoid costly pitfalls

Avoid costly LLM pitfalls: Learn how token pricing, scaling costs, and strategic prompt engineering impact AI expenses—and how to save.

AI Accelerator InstituteMarisa Garanhel

The knowledge gap: Why AI assistants struggle

Picture this: you ask your AI assistant about a specific company policy you need to quote, a conversation that’s buried in Slack, or a past project you need vital information from. You’re likely to get a vague and generic answer or, even worse, something that’s completely irrelevant or downright wrong.

That’s because these AI assistants don’t have access to the right data – your data – and rely on public information instead. As they aren’t drawing from internal knowledge that sits behind your business, you’ll often find issues with their responses.

Wasted time searching for answers the AI should be able to provide.
Frustration when employees get irrelevant or outdated responses.
AI that feels like more of a novelty than a real workplace tool.

If an AI assistant is to work in a business environment, it needs more than intelligence. It needs context, otherwise it won’t be helpful for your employees.

The fix: Connecting AI assistants to your knowledge base

How do you tackle the information problem?

The answer is simple: the AI assistants have to be plugged into your company’s internal database. When they have real-time access to company documents, emails, Slack threads, and more, they can help you the way your business needs.

But how can AI assistants help your business by being connected to your company data?

When you connect an AI assistant to your institutional knowledge base with policies, documentation, manuals, and more, they’ll be able to provide you with accurate and contextual answers on a wider variety of topics.

This could change how employees share knowledge in the workplace, moving from a tedious process of manual document searching to a more conversational, self-service experience. Employees’ wait times and support costs will be reduced by simply asking assistants and getting instant replies.

A custom AI assistant lets you quality customers and offers personalized solutions by taking care of repetitive and time-consuming tasks. Your employees can then focus on improving products and strategic work.

This streamlined strategy leads to increased efficiency and productivity, which greatly reduces bottlenecks and improves output. And as AI assistants can also handle companies’ growing needs, they’ll adapt to increased workloads and offer long-term ROI and usability.

How Glean makes AI smarter

That’s where Glean comes in. Glean connects AI assistants directly to your company’s knowledge, turning them into real, reliable workplace tools. It’s designed to integrate AI capabilities into your company’s data for up-to-date and context-aware answers.

Here’s what that means in practice:

Real-time data synchronization

Glean's connectors support real-time synchronization, making sure that any updates in the source applications are immediately reflected. This means that your assistant will always work with the most current information, enhancing its responses' accuracy and timeliness.

Real-world applications and challenges in ML implementation

There are a variety of real-world applications of machine learning, including predictive analytics, computer vision, and more.

AI Accelerator InstituteVishnuprasanth Vijaya Kumar

Comprehensive data integration

An extensive data integration makes sure that your AI assistant can access a wide range of company data, which allows it to offer relevant and informed responses. Glean connects with over 100 enterprise applications like Box, Confluence, Dropbox, GitHub, Gmail, Google Drive, Jira, Microsoft Teams, OneDrive, Outlook, Salesforce, ServiceNow, SharePoint, Slack, and Zendesk.

Permissions-aware responses

Strictly enforcing the same permissions in your company’s data sources, Glean ensures that users only have access to the information they have permission to see. This keeps your data secure and in compliance with regulations while still delivering the relevant answers.

Personalized results and semantic understanding

Glean Assistant uses deep learning-based language models, meaning it understands natural language queries and can deliver intuitive interactions. Every personalized result takes into consideration ongoing projects, the user’s role, and collaborations for tailored information.

Universal knowledge access

As it combines external web information with your internal company data, Glean Assistant is ideal for researching internal projects and accessing publicly available insights in just one platform. The integration makes it much easier for a comprehensive understanding and informed decision-making.

AI-driven content generation and analysis

Glean Assistant can analyze structured and unstructured data simultaneously across your company’s applications, documents, and even the web. It offers assistance in supporting a smarter decision-making process by drafting deliverables and finding relevant insights.

A seamless integration with your company’s data ecosystem and advanced AI techniques allow for Glean Assistant to enhance your productivity.

The smarter way forward

AI assistants have the potential to transform the workplace significantly, but only if they have access to accurate and relevant internal information. Connecting them directly to internal knowledge allows companies to move from nice-to-have AI to must-have AI.

Glean makes that shift seamless, turning AI from a frustrating gimmick into a powerful, reliable assistant. This enhances productivity and empowers employees to achieve more meaningful outcomes.

Gold-copy data & AI in the trade lifecycle process

Parth Prafulbhai Sonara — Wed, 26 Mar 2025 15:29:19 GMT

The current end-to-end trade lifecycle is highly dependent on having accurate data at each stage. The goal of the Investment iook of records (IBOR) system is to ensure the trade, position, and cash data match the custodian and for the accounting book of records (ABOR) system for this same data set to match the fund accountant.

There are other stakeholders in the process, including broker systems, transfer agents, central clearing parties, etc, depending on the type and location of execution. A position that reflects identically across all systems is known as having been “straight-through processed”; in other words, systems have recognized the trade, and datasets are in line, or at least, within tolerance.

While efficient, the addressal and eventual resolution of non-STP executions remains highly manual. Stakeholders typically compare data points across multiple systems, beginning as upstream as possible, and gradually move down the lifecycle to the root cause of the break. This investigation takes time, creates noise across the value chain, and most importantly, creates uncertainty for the front office to take new decisions.

The proposal is to leverage AI to continually create and refine gold-copy data at each stage of the life cycle through comparison with sources and link downstream processes to automatically update in real-time with the accurate datasets. Guardrails should also be implemented in case of material differences.

Leveraging AI to accelerate sales effectiveness

Artificial Intelligence can be an extremely useful tool for organizations looking to improve the effectiveness of their sales and marketing activities.

AI Accelerator InstituteArup Chakravarti

Introduction

Let’s analyze the current process with an example - a vanilla bond is about to undergo a payment-in-kind (PIK) corporate action (PIKs occur when an issuer decides to capitalize interest it would have paid in cash as additional security). Assume that the vendor an IBOR system is using utilizes an ACT/360 day-count (to calculate accrual) than the custodian (who uses ACT/365):

On ex-date, the PIK will process with a higher capitalization than the custodian and a mismatch will form between IBOR and Bank.
This mismatch will first be uncovered on ex-date, assuming the bank sends MT567 (corp. action status) and flags the positional difference between the two systems.
Next, on SD+1, this will again be flagged when the bank sends MT535 (position statement), showing the mismatch during position reconciliation.
Finally, if investment accounting is run on ex-date or on SD+1, there’ll be a mismatch between IBOR and the fund accountant, where the balance sheet and statement of change in net asset reports will again show an exception for the security.

This simple example illustrates how one mismatch well upstream in the lifecycle causes three separate breaks in the downstream chain; in other words, three different segments of users (corp. action user, reconciliation user, and an accounting user are all investigating the same root cause).

Once the IBOR system’s data is resolved, each of these user segments need to coordinate the waterfall logic to have each of the downstream system/process updated.

The problem

Unfortunately, such occurrences are common. As front-to-middle-to-back investment systems become more integrated, inaccurate data at any point in the process chain creates inefficiencies across a number of user segments and forces multiple users to analyze the same exception (or the effect of that exception) on their respective tools.

Downstream users that are reconciling to the bank or the fund accountant will notice the security mismatch but would not immediately recognize the root cause of day count difference. These users would typically undertake the below tasks to investigate:

Raise an inquiry with the bank’s MT535 statement to explain the position difference
Raise an inquiry with the fund accountant’s statement to explain the position difference
Raise inquiry with the internal data team to specify IBOR’s position calculations
Once aware of a recent corp. action, raise inquiry with the internal COAC team to investigate the processing of the PIK

As seen, multiple teams’ energy and capacity are being expended to investigate the root cause and all being undertaken manually.

On the other hand, an AI process that could continually query multi-source datasets should have been proactively able to flag the day count discrepancy prior to the corp. action processing, as well as automatically inform downstream teams of potential inaccuracy in the specific position of the PIK security.

While any changes to user data from AI should still undergo a reviewer check, such proactive detection and communication drastically increases resolution times and should reduce user frustration.

LLM economics: How to avoid costly pitfalls

Avoid costly LLM pitfalls: Learn how token pricing, scaling costs, and strategic prompt engineering impact AI expenses—and how to save.

AI Accelerator InstituteMarisa Garanhel

The proposal

Let's look at the corporate action workflow in detail. Users typically create a “gold-copy” event once they’ve “scrubbed” data from multiple sources and created an accurate, up-to-date copy of the event that will occur. This is ideal in many ways: scrubbing multiple sources ensures there’s less chance of an incorrect feed from a single vendor, creating process gaps.

We need AI to undertake this process continuously. IBOR systems should, at minimum, be subscribed to two or more vendors from whom data should be retrieved. Any change to the dataset should be continually updated (either through a push or pull API mechanism). This would work as follows:

A new public security is set up in the marketplace with public identifiers including CUSIP, ISIN, SEDOL etc.
The data vendors supplying the feed to IBOR systems should feed this through automatically, once the required minimum data point details are populated.
- IBOR systems, at this point, would create this security within their data systems
- Any mismatches across vendors should be reviewed by a user, and appropriate values chosen (if deemed necessary)
Any updates the securities undergo from that point in the market should be automatically captured and security updated in the IBOR system
- At this point, downstream applications that leverage the application should automatically flag a security market update and the impending event-driven update
  - This informs users that the dataset they’re seeing may be stale vs. external processes that may be receiving up-to-date data
- To protect against the risk of inaccurate data from a single vendor, only a dataset that is consistent across all vendors should be automatically updated
- Data updates from a single vendor only should be prompted to a user to review and approve
Once underlying securities are updated, this would be considered an ‘event’, which should drive updates to all downstream applications that rely on the security update (called event-driven updates)
- Event-driven updates greatly reduce the number of manual touches downstream users need to make for inaccuracies that have been identified upstream
- Once all applications are in line with the updated data sets, the security market update flag should be removed automatically.

Potential concerns

While exciting, the use of AI and event-driven updates raises a few concerns worth discussing - data capacity/storage, potential timing differences with external participants, and materiality/tolerance.

Let’s address the latter first - materiality/tolerance. Securities can undergo immaterial changes from time to time that may have little to no impact on all upstream and downstream processes in the trade lifecycle.

As a result, a set of fields and tolerances should be identified to be flagged in case of market updates (core dataset). If the updates occur on these specific fields and they’re outside of the existing tolerance, IBOR systems should consume the updates provided by vendors.

If updates occur on any other fields (or are within tolerance), the updates should be rejected. This would ensure the system leverages the efficiency of AI without the inefficiency of noise.

Secondly, there is potential for timing differences with external participants. While the IBOR system may have up-to-date data, external participants (e.g., banks or fund accounting systems) may continue to leverage stale or outdated datasets.

AI and data analytics-driven finance transformation

AI and data analytics are valuable assets for discovering real-time insights, proactive decision-making, and predictive capabilities.

AI Accelerator InstituteArsalan Sheikh

There should be an audit history available of the core dataset’s historical data; in other words, if the bank/fund accounting system refers to any of the audit datasets, an automatic note should be sent to the external participant informing them of stale data and to recheck against external market vendors.

Finally, there is the concern about data capacity. There’s no doubt that continual querying, validation, and updates of core datasets by multiple vendors, along with maintaining audit data, will increase data consumption and storage costs.

A number of companies are required by law to keep an audit history of at least five years, and adding the above requirement would certainly expand the capacity requirements. Making security updates to solely the core data sets and allowing tolerance should help to manage some of this required capacity.

Future

Despite these strong concerns highlighted, the use of AI is still valuable to design and implement across the trade lifecycle process and would be substantially more valuable than the costs that would likely be incurred. While much of the examples in this paper discussed public securities, the universe is substantially wider in private securities with much less high-quality data.

With the investing world transitioning to increased investments in private securities, leveraging AI will continue to pay dividends across both universes.

LLM economics: How to avoid costly pitfalls

Marisa Garanhel — Tue, 25 Mar 2025 12:37:17 GMT

Large Language Models (LLMs) like GPT-4 are advanced AI systems designed to process and generate human-like text, transforming how businesses leverage AI.

GPT-4’s pricing model (32k context) charges $0.06 per 1,000 input tokens and $0.12 per 1,000 output tokens, which makes it a scalable option for businesses. However, it can become expensive very quickly when it comes to production environments.

New models cross-reference all bits of data, or tokens, that deal with other tokens in order to both quantify and understand the context behind each pair. The result? Quadratic behavior of algorithms that becomes more and more expensive as the number of tokens increases.

And scaling isn’t linear; costs increase quadratically when it comes to the length of sequences. If you need to scale up to handle text that’s 10x longer, the cost will go up 10,000 times, and so on.

This can be a significant setback for scaling projects; the hidden cost of AI impacts sustainability, resources, and requirements. This lack of insight can lead to businesses overspending or inefficiently allocating resources.

Where costs lie

Let’s look deeper into tokens, per-token pricing, and how everything works.

Tokens are the smallest unit of text processed by models – something simple like an exclamation mark can be a token. Input tokens are used whenever you enter anything into the LLM query box, and output tokens are used when the LLM answers your query.

On average, 740 words are equivalent to around 1,000 tokens.

Inference costs

Here’s an illustrative example of how costs can exponentially grow:

Input tokens: $0.50 per million tokens

Output tokens: $1.50 per million tokens

Month	Users/ Avg. prompts per user	Input/output tokens per prompt	Total input tokens	Total output tokens	Input cost	Output cost	Total monthly cost
1	1,000/20	200/300	4,000,000	6,000,000	$2	$9	$11
3	10,000/25	200/300	50,000,000	75,000,000	$25	$122.50	$137.50
6	50,000/30	200/300	300,000,000	450,000,000	$150	$675	$825
9	200,000/35	200/300	1,400,000,000	2,100,000,000	$700	$3,150	$3,850
12	1,000,000/40	200/300	8,000,000,000	12,000,000,000	$4,000	$18,000	$22,000

As LLM adoption expands, the user numbers grow exponentially and not linearly. Users engage more frequently with the LLM, and the number of prompts per user increases. The number of total tokens increases significantly as a result of increased users, prompts, and token usage, leading to costs multiplying monthly.

What does it mean for businesses?

Anticipating exponential cost growth becomes essential. For example, you’ll need to forecast token usage and implement techniques to minimize token consumption through prompt engineering. It’s also vital to keep monitoring usage trends closely in order to avoid unexpected cost spikes.

Latency versus efficiency tradeoff

Let’s look into GPT-4 vs. GPT-3.5 pricing and performance comparison.

Model	Context window (max tokens)	Input price	Output price
GPT-3.5 Turbo	4,000	$0.0015	$0.0020
GPT-3.5 Turbo	16,000	$0.0030	$0.0040
GPT-4	8,000	$0.03	$0.06
GPT-4	32,000	$0.06	$0.12
GPT-4 Turbo	128,000	$0.01	$0.03

Latency refers to how quickly models respond; a faster response leads to better user experiences, especially when it comes to real-time applications. In this case, GPT-3.5 Turbo offers lower latency because it has simpler computational requirements. GPT-4 standard models have higher latency due to processing more data and using deeper computations, which is the tradeoff for more complex and accurate responses.

Efficiency is the cost-effectiveness and accuracy of the responses you receive from the LLMs. The higher the efficiency, the more value per dollar you get. GPT-3.5 Turbo models are extremely cost-efficient, offering quick responses at low cost, which is ideal for scaling up user interactions.

GPT-4 models deliver better accuracy, reasoning, and context awareness at much higher costs, making them less efficient when it comes to price but more efficient for complexity. GPT-4 Turbo is a more balanced offering; it’s more affordable than GPT-4, but it offers better quality responses than GPT-3.5 Turbo.

To put it simply, you have to balance latency, complexity, accuracy, and cost based on your specific business needs.

High-volume and simple queries: GPT-3.5 Turbo (4K or 16K).

Perfect for chatbots, FAQ automation, and simple interactions.

Complex but high-accuracy tasks: GPT-4 (8K or 32K).

Best for sensitive tasks requiring accuracy, reasoning, or high-level understanding.

Balanced use-cases: GPT-4 Turbo (128K).

Ideal where higher quality than GPT-3.5 is needed, but budgets and response times still matter.

Experimentation and iteration

Trial-and-error prompt adjustments can take multiple iterations and experiments. Each of these iterations consumes both input and output tokens, which leads to increased costs in LLMs like GPT-4. If not monitored closely, incremental experimentation will very quickly accumulate costs.

You can fine-tune models to improve the responses; this requires extensive testing and repeated training cycles. These fine-tuning iterations require significant token usage and data processing, which increases costs and overhead.

The more powerful the model, like GPT-4 and GPT-4 Turbo, the more these hidden expenses multiply because of higher token rates.

Activity	Typical usage	GPT-3.5 Turbo cost	GPT-4 cost
Single prompt test iteration	~2,000 tokens (input/output total)	$0.0035	$0.18
500 iterations (trial/error)	~1,000,000 tokens	$1.75	$90
Fine-tuning (multiple trials)	~10M tokens	$35	$1,800

(Example assuming average prompt/response token counts.)

Strategic recommendations to ensure efficient experimentation without adding overhead or wasting resources:

Start with cheaper models (e.g., GPT-3.5 Turbo) for experimentation and baseline prompt testing.
Progressively upgrade to higher-quality models (GPT-4) once basic prompts are validated.
Optimize experiments: Establish clear metrics and avoid redundant iterations.

Vendor pricing and lock-in risks

First, let’s have a look at some of the more popular LLM providers and their pricing:

OpenAI

Model	Context length	Pricing
GPT-4	8K tokens	Input: $0.03 per 1,000 tokens Output: $0.06 per 1,000 tokens
GPT4	32K tokens	Input: $0.06 per 1,000 tokens Output: $0.12 per 1,000 tokens
GPT4 Turbo	128K tokens	Input: $0.01 per 1,000 tokens Output: $0.03 per 1,000 tokens

Anthropic

Claude 3.7 Sonnet

Claude.ai plans

Input: $3 per million tokens ($0.003 per 1,000 tokens)

Output: $15 per million tokens ($0.015 per 1,000 tokens)

Free: Access to basic features

Pro plan: $20 per month (Enhanced features for individual users)

Team plan (minimum 5 users):

$30 per user per month (monthly billing) or $25 per user per month (annual billing)

Enterprise plan: Custom pricing tailored to organizational needs.

Google

Gemini Advanced

Gemini Code Assist Enterprise

Included in the Google One AI Premium plan

$19.99 per month.

Includes 2 TB of storage for Google Photos, Drive, and Gmail

$45 per user per month with a 12-month commitment

Promotional rate of $19 per user per month available until March 31, 2025

Committing to just one vendor means you have reduced negotiation leverage, which can lead to future price hikes. Limited flexibility increases costs when you switch providers, considering prompts, code, and workflow dependencies. Hidden overheads like fine-tuning experiments when migrating vendors can increase expenses even more.

When thinking strategically, businesses should keep flexibility in mind and consider a multi-vendor strategy. Make sure to keep monitoring evolving prices to avoid costly lock-ins.

How companies can save on costs

Tasks like FAQ automation, routine queries, and simple conversational interactions don’t need large-scale and expensive models. You can use cheaper and smaller models like GPT-3.5 Turbo or a fine-tuned open-source model.

LLaMA or Mistral are great fine-tuned smaller open-source model choices for document classification, service automation, or summarization. GPT-4, for example, should be saved for high accuracy and high-value tasks that’ll justify incurring higher costs.

Prompt engineering directly affects token consumption, as inefficient prompts will use more tokens and increase costs. Keep your prompts concise by removing unnecessary information; instead, structure your prompts into templates or bullet points to help models respond with clearer and shorter outputs.

You can also break up complex tasks into smaller and sequential prompts to reduce the total token usage.

Example:

Original prompt:

"Explain the importance of sustainability in manufacturing, including environmental, social, and governance factors." (~20 tokens)

Optimized prompt:

"List ESG benefits of sustainable manufacturing." (~8 tokens, ~60% reduction)

To further reduce costs, you can use caching and embedding-based retrieval methods (Retrieval-Augmented Generation, or RAG). Should the same prompt show up again, you can offer a cached response without needing another API call.

For new queries, you can store data embeddings in databases. You can retrieve relevant embeddings before passing only the relevant context to the LLM, which minimizes prompt length and token usage.

Lastly, you can actively monitor costs. It’s easy to inadvertently overspend when you don’t have the proper visibility into token usage and expenses. For example, you can implement dashboards to track real-time token usage by model. You can also set a spending threshold alert to avoid going over budget. Regular model efficiency and prompt evaluations can also present opportunities to downgrade models to cheaper versions.

Start small: Default to GPT-3.5 or specialized fine-tuned models.

Engineer prompts carefully, ensuring concise and clear instructions.

Adopt caching and hybrid (RAG) methods early, especially for repeated or common tasks.

Implement active monitoring from day one to proactively control spend and avoid

The smart way to manage LLM costs

After implementing strategies like smaller task-specific models, prompt engineering, active monitoring, and caching, teams often find that a systematic approach to operationalize these approaches at scale is needed.

The manual operation of model choices, prompts, real-time monitoring, and more can very easily become both complex and resource-intensive for businesses. This is where you’ll find the need for a cohesive layer to orchestrate your AI workflows.

Vellum streamlines iteration, experimentation, and deployment. As an alternative to manually optimizing each component, Vellum will help your teams choose the appropriate models, manage prompts, and fine-tune solutions in one integrated solution.

It’s a central hub that allows you to operationalize cost-saving strategies without increasing costs or complexity.

Here’s how Vellum helps:

Prompt optimization

You’ll have a structured, test-driven environment to effectively refine prompts, including a side-by-side comparison across multiple models, providers, and parameters. This helps your teams identify the best prompt configurations quickly.

Vellum significantly reduces the cost of iterative experimentation and complexity by offering built-in version control. This ensures that your prompt improvements are efficient, continuous, and impactful.

There’s no need to keep your prompts on Notion, Google Sheets, or in your codebase; have them in a single place for seamless team collaboration.

Model comparison and selection

You can compare LLM models objectively by running side-by-side systematic tests with clearly defined metrics. Model evaluation across the multiple existing providers and parameters is made simpler.

Businesses have transparent and measurable insights into performance and costs, which helps to accurately select the models with the best balance of quality and cost-effectiveness. Vellum allows you to:

Run multiple models side-by-side to clearly show the differences in quality, cost, and response speed.
Measure key metrics objectively, such as accuracy, relevance, latency, and token usage.
Quantify cost-effectiveness by identifying which models achieve similar or better outputs at lower costs.
Track experiment history, which leads to informed, data-driven decisions rather than subjective judgments.

Real-time cost tracking

Enjoy detailed and granular insights into LLM spending through tracking usage across the different models, projects, and teams. You’ll be able to precisely monitor the prompts and workflows that drive the highest token consumption and highlight inefficiencies.

This transparent visualization allows you to make smarter decisions; teams can adjust usage patterns proactively and optimize resource allocation to reduce overall AI-related expenses. You’ll have insights through intuitive dashboards and real-time analytics in one simple location.

Seamless model switching

Avoid vendor lock-in risks by choosing the most cost-effective models; Vellum gives you insights into the evolving market conditions and performance benchmarks. This flexible and interoperable platform allows you to keep evaluating and switching seamlessly between different LLM providers like Anthropic, OpenAI, and others.

Base your decision-making on real-time model accuracy, pricing data, overall value, and response latency. You won’t be tied to a single vendor’s pricing structure or performance limitations; you’ll quickly adapt to leverage the most efficient and capable models, optimizing costs as the market dynamics change.

Final thoughts: Smarter AI spending with Vellum

The exponential increase in token costs that arise with the business scaling of LLMs can often become a significant challenge. For example, while GPT-3.5 Turbo offers cost-effective solutions for simpler tasks, GPT-4’s higher accuracy and context-awareness often come at higher expenses and complexity.

Experimentation also drives up costs; repeated fine-tuning and prompt adjustments are further compounded by vendor lock-in potential. This limits competitive pricing advantages and reduces flexibility.

Vellum comprehensively addresses these challenges, offering a centralized and efficient platform that allows you to operationalize strategic cost management:

Prompt optimization. Quickly refining prompts through structured, test-driven experimentation significantly cuts token usage and costs.
Objective model comparison. Evaluate multiple models side-by-side, making informed decisions based on cost-effectiveness, performance, and accuracy.
Real-time cost visibility. Get precise insights into your spending patterns, immediately highlighting inefficiencies and enabling proactive cost control.
Dynamic vendor selection. Easily compare and switch between vendors and models, ensuring flexibility and avoiding costly lock-ins.
Scalable management. Simplify complex AI workflows with built-in collaboration tools and version control, reducing operational overhead.

With Vellum, businesses can confidently navigate the complexities of LLM spending, turning potential cost burdens into strategic advantages for more thoughtful, sustainable, and scalable AI adoption.

AI Accelerator Institute

How generative AI is revolutionizing drug discovery and development

Generative AI’s role in enhancing medical imaging

How AI is transforming financial modeling & sales forecasting in enterprise tech

1. Why traditional forecasting falls short

Lack of broader business context

Inflexibility

Human bias

2. What makes AI a game-changer for financial modeling

Cross-functional simulations tailored by domain experts

Real-time forecast adjustments

3. Practical use cases in enterprise finance

AI-powered lead scoring & targeting

Smart bundling & pricing optimization

Automated revenue forecasting

4. How AI enhances execution and GTM strategy

Smarter pipeline management

Improved sales productivity

Tighter finance-sales alignment

5. Key considerations for implementation

Conclusion

The great web rebuild: Infrastructure for the AI agent era

Proving you're a human an agent

A new protocol for Agent-to-Agent communication

Trust and reputation reimagined

The new data-sharing economy

New attack vectors

Building the new web - opportunities for founders

AIOps in action: AI & automation transforming IT operations

Understanding AIOps and its role in IT operations

Real-world use cases of AIOps in predictive maintenance and incident response

A. Predictive maintenance with AIOps

Moreover, predictive maintenance offers a range of key benefits that help organizations optimize operations and reduce costs:

In order to illustrate the impact of predictive maintenance in action, let’s look at a case study where AIOps played a crucial role in preventing server failures.

B. AIOps in incident response and resolution

Building a scalable AIOps architecture

The future of AIOps in IT operations

Conclusion

The truth about enterprise AI agents (and how to get value from them)

What work AI systems actually do (and why they matter now)

How to 8‑bit quantize large models using bits and bytes

Introduction

Understanding quantization

Benefits of quantization

Types of quantization

Why 8‑bit quantization?

Theoretical underpinnings of quantization

Quantization error

Scale and zero‑point

Quantization Aware Training (QAT) vs. Post‑Training Quantization (PTQ)

Steps in 8‑bit quantization

Preprocessing and calibration

Quantization Aware Training vs. Post‑Training Quantization

Quantization Aware Training (QAT):

Post‑Training Quantization (PTQ):

8-bit quantization applied

Practical considerations

Hardware support

Libraries

BitsAndBytes: A specialized library for 8‑bit quantization

Integrating BitsAndBytes with your workflow

Case study: Quantizing IBM Granite with 8‑bit using BitsAndBytes

IBM Granite quantization: Example code

Code breakdown

Barriers and best practices

Challenges

Accuracy degradation

Calibration difficulty

Hardware constraints

Best practices full calibration

Layer-by-layer analysis

Progressive evaluation

Use framework tools

Fine‑tuning

Conclusion

Building scalable image data pipelines for AI training

Scalable image data pipelines: A need

Key components of scalable image data pipelines

1. Data Ingestion

2. Efficient data pre-processing:

Q: With the rapid adoption of LLMs, what unique challenges do enterprises face in monitoring these models, and how does Fiddler's LLM Observability address these
issues?