Opportunities And Challenges When Using LLMs In The Data Space

Key Takeaways
- Technical and business users need fundamentally different LLM interfaces: IDE integrations for engineers, conversational chat for business decision-makers.
- LLM security in data infrastructure requires role-based access, query validation, and comprehensive audit trails for every interaction with production data.
- Successful LLM adoption depends on strategic implementation matching user personas, not wholesale rollout; thoughtful change management outperforms aggressive deployments.
Large Language Models (LLMs) are transforming how organizations interact with their data infrastructure, offering unprecedented capabilities for both technical and business users. However, this transformation brings unique opportunities and challenges that vary significantly based on user personas, security requirements, and implementation approaches. This writeup explores these dimensions through the lens of practical implementation using tools like Keboola MCP and various client interfaces.
The Persona Divide: Technical vs. Business Users
Technical Users: The Builders and Maintainers
Technical users—data engineers, analysts, and developers—approach LLMs with a fundamentally different mindset than their business counterparts. Their primary intent revolves around:
Intent and Use Cases:
- Creating and debugging complex data pipelines
- Writing and optimizing SQL queries and transformations
- Automating repetitive configuration tasks
- Optimizing data processing
For these users, LLMs serve as intelligent coding assistants that understand context, suggest optimizations, and accelerate development cycles. They seek precision, control, and the ability to review and modify generated code before execution.
However, data engineering operates largely on a binary principle: configurations either function flawlessly and reliably, or they fail. Engineers are unlikely to accept 90% functionality, such as an incomplete SQL query or a partially defined data extraction.
Preferred Interfaces: Technical users gravitate toward IDE-integrated solutions like VSCode or Cursor, where LLMs enhance their existing workflows. These tools allow them to maintain version control, leverage syntax highlighting, and access debugging capabilities while benefiting from AI assistance. The integration feels natural—it's an enhancement of their familiar environment rather than a replacement.
Suggested scopes of work:
- To understand:
  - Project documentation & descriptions
  - Data validation
  - Debugging
- To build:
  - Component configurations
  - Custom Python
  - Transformations
  - Data Apps
  - Whole Pipeline (flows)
Business Users: The Consumers and Decision Makers
Business users approach data with questions rather than queries. Their relationship with LLMs is fundamentally different:
Intent and Use Cases:
- Extracting insights from data without writing code
- Creating reports and visualizations
- Understanding data relationships and trends
- Making data-driven decisions quickly
- Asking natural language questions about their data
- Understanding data processing setups (who, what, how)
- Adding business/semantic context to the data
These users need abstraction from technical complexity. They want to ask "What were our top-performing products last quarter?" rather than write JOIN statements and GROUP BY clauses.
However, because LLMs are non-deterministic, the reliability and reproducibility of results become harder to guarantee. Overcoming these challenges typically requires support from more technically savvy users.
Preferred Interfaces: Business users prefer conversational interfaces like Claude or custom chat applications. These tools provide a familiar, accessible entry point to data exploration without requiring technical knowledge. The conversation history becomes their audit trail, and the natural language interaction removes barriers to data access.
Suggested scopes of work:
- To understand:
  - New user onboarding
  - Explain complicated logic
  - Code comprehension
  - Explore data
- To improve:
  - Project documentation
  - Descriptions
  - (Data validation)
  - (Debugging)
Security Considerations: The Critical Foundation
The integration of LLMs into data infrastructure introduces novel security challenges that organizations must address comprehensively:
Data Privacy and Exposure
When LLMs interact with sensitive business data, several risks emerge:
Context Leakage: LLMs require context to provide accurate responses, but this context often includes sensitive information. Every query and every piece of data shared with the model potentially exposes confidential information. Organizations must implement data policies and handling protocols, and restrict usage to approved LLM technology.
Audit and Compliance: Every interaction between users and LLMs touching production data must be logged and auditable. This includes not just the queries but also the responses, the data accessed, and the transformations applied. Compliance with regulations like GDPR, HIPAA, or SOC 2 requires careful consideration of how LLM interactions are recorded and retained.
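As a minimal sketch of what one auditable interaction record could look like, the snippet below builds a JSON line per LLM interaction. The field names, the `build_audit_record` helper, and the in-memory list standing in for an append-only store are all assumptions made for illustration, not a prescribed schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(user_id, role, prompt, generated_sql, tables_accessed, response_text):
    """Return one JSON-serializable audit entry for a single LLM interaction."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "role": role,
        "prompt": prompt,
        "generated_sql": generated_sql,
        "tables_accessed": sorted(tables_accessed),
        # Assumption: store a digest of the response so the log does not
        # become a second copy of sensitive data. Keep the full text instead
        # if your retention policy requires it.
        "response_sha256": hashlib.sha256(response_text.encode("utf-8")).hexdigest(),
    }


def append_audit_record(log_lines, record):
    """Append the record as one JSON line; log_lines stands in for an append-only store."""
    log_lines.append(json.dumps(record, sort_keys=True))


audit_log = []
record = build_audit_record(
    user_id="u-123",
    role="business_analyst",
    prompt="What were our top-performing products last quarter?",
    generated_sql="SELECT product, SUM(revenue) FROM sales GROUP BY product",
    tables_accessed={"sales"},
    response_text="Product A led with ...",
)
append_audit_record(audit_log, record)
```

Whether to retain full responses or only a digest is a policy decision; the digest variant shown here trades replayability for a smaller exposure surface.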
Access Control and Authorization
The democratization of data access through LLMs must not compromise security:
Role-Based Access: The LLM interface must respect existing data governance policies. When a business user asks about data outside their scope, the response should depend on their authorization level. This requires tight integration between the LLM layer and existing identity and access management systems, or access control based on explicit dataset permissions, such as access limited to specific data catalogs.
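The idea of explicit dataset permissions can be sketched in a few lines. The role names, table names, and the `ROLE_GRANTS` mapping below are hypothetical; in practice the grants would come from the identity provider or data catalog rather than a hard-coded dictionary.

```python
# Hypothetical role-to-dataset grants, hard-coded only for illustration.
ROLE_GRANTS = {
    "finance_analyst": {"sales", "invoices"},
    "marketing_analyst": {"sales", "campaigns"},
}


def visible_tables(role, all_tables):
    """Return only the tables the role may see, preserving catalog order."""
    granted = ROLE_GRANTS.get(role, set())
    return [t for t in all_tables if t in granted]


def authorize(role, requested_tables):
    """Raise PermissionError if any requested table is outside the role's grants."""
    denied = sorted(set(requested_tables) - ROLE_GRANTS.get(role, set()))
    if denied:
        raise PermissionError(f"role {role!r} may not access: {', '.join(denied)}")


catalog = ["sales", "invoices", "campaigns", "salaries"]
```

Filtering the catalog with `visible_tables` before it reaches the model keeps out-of-scope tables from appearing in the prompt at all, while `authorize` acts as a second check on whatever the model generates. Unknown roles get an empty grant set, so the default is to deny.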
Query Validation: For technical users, LLMs might generate complex queries that could potentially access unauthorized data or perform unintended operations. Implementing query validation and sandboxing mechanisms becomes crucial to prevent accidental or malicious data exposure.
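One way to approach query validation is a static check before anything reaches the database. The sketch below is an assumption-laden illustration: it uses regular expressions rather than a real SQL parser, and the function name and rules are made up for this example.

```python
import re

# Keywords that indicate a write or schema change.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE|MERGE)\b",
    re.IGNORECASE,
)
# Naive table-reference extraction: the identifier following FROM or JOIN.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def validate_query(sql, allowed_tables):
    """Return a list of problems; an empty list means the query passed these checks."""
    problems = []
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        problems.append("multiple statements are not allowed")
    if not re.match(r"(?i)(SELECT|WITH)\b", statement):
        problems.append("only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        problems.append("query contains a write or DDL keyword")
    referenced = {name.lower() for name in TABLE_REF.findall(statement)}
    unknown = sorted(referenced - {t.lower() for t in allowed_tables})
    if unknown:
        problems.append("tables not permitted: " + ", ".join(unknown))
    return problems
```

This check deliberately errs toward rejection: a string literal containing a semicolon or the word "delete" is flagged, and CTE names are treated as unknown tables. A production validator would more likely rely on a proper SQL parser for the target dialect.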
Client Diversity: Matching Tools to Users
The choice of client interface significantly impacts adoption and effectiveness:
VSCode/Cursor for Technical Teams
These IDE-integrated solutions excel for technical users because they:
- Maintain familiar workflows and keyboard shortcuts
- Provide immediate code validation and syntax highlighting
- Enable seamless collaboration through version control integration
- Support debugging and testing within the same environment
- Allow gradual adoption—users can choose when to engage AI assistance
The integration feels like a natural evolution of existing tools rather than a disruptive change, leading to higher adoption rates among technical teams.
Claude and Custom Chat Interfaces for Business Teams
Conversational interfaces succeed with business users because they:
- Remove technical barriers to data access
- Provide explanations alongside results
- Build confidence through natural language interaction
- Create a self-documenting trail of analysis through conversation history
- Enable iterative exploration through follow-up questions
Custom chat interfaces can further enhance this experience by incorporating organization-specific terminology, branded experiences, and integrated visualization capabilities.
Challenges and Mitigation Strategies
The Hallucination Problem
LLMs may generate plausible-looking but incorrect queries or analyses. Mitigation strategies include:
- Implementing validation layers that check generated SQL against the schema
- Requiring human review for critical operations
- Providing clear confidence indicators for generated responses
- Training users to verify results against known data points
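The schema check in the first bullet above can be approximated cheaply. The sketch below assumes the schema is available as DDL and uses SQLite from Python's standard library as a stand-in engine: it creates empty tables and asks SQLite to plan the generated query with `EXPLAIN`, which fails on tables or columns that do not exist. The helper name and the example schema are illustrative.

```python
import sqlite3


def check_sql_against_schema(sql, schema_ddl):
    """Compile sql against empty tables built from schema_ddl.

    Returns None when SQLite can plan the query, otherwise the error message.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # EXPLAIN makes SQLite parse and plan the statement, which surfaces
        # unknown tables and columns without touching any real data.
        conn.execute("EXPLAIN " + sql)
        return None
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return str(exc)
    finally:
        conn.close()


SCHEMA = """
CREATE TABLE sales (product TEXT, revenue REAL, sold_at TEXT);
"""

ok = check_sql_against_schema(
    "SELECT product, SUM(revenue) FROM sales GROUP BY product", SCHEMA
)
bad_column = check_sql_against_schema("SELECT profit FROM sales", SCHEMA)
bad_table = check_sql_against_schema("SELECT * FROM orders", SCHEMA)
```

SQLite's dialect differs from most warehouse engines, so this catches hallucinated table and column names but not dialect-specific syntax. A query that passes can still be semantically wrong, which is why the human review and known-data-point checks remain necessary.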
Performance at Scale
As usage grows, performance challenges emerge:
- Query optimization becomes crucial when LLM-generated queries hit large datasets
- Caching strategies must balance freshness with response time
- Resource allocation needs to account for varying query complexity
- Token limits may restrict complex analytical tasks
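The freshness versus response-time trade-off in caching can be made concrete with a time-to-live cache. This is a minimal in-process sketch; the class name, the injectable clock, and the choice of the query text as cache key are assumptions for illustration.

```python
import time


class TTLCache:
    """Cache query results for ttl_seconds; older entries count as misses."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (stored_at, value)

    def get_or_compute(self, key, compute):
        """Return a fresh cached value for key, or call compute() and store its result."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]
        value = compute()
        self._entries[key] = (now, value)
        return value
```

A short TTL favors freshness at the cost of more warehouse queries; a long TTL does the opposite. Different TTLs per dataset, depending on how often the underlying tables are refreshed, are a natural extension.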
Change Management
Successful adoption requires thoughtful change management:
- Training programs tailored to different user personas
- Clear governance policies for LLM usage
- Gradual rollout with pilot groups
- Continuous feedback loops for improvement
- Documentation of best practices and limitations
Future Directions
The convergence of LLMs and data infrastructure is still evolving. Future developments likely include:
Advanced Reasoning: Next-generation models will better understand complex business logic and data relationships, reducing errors and improving insight quality.
Multimodal Capabilities: Integration of visual data analysis, voice interfaces, and even video explanations will further lower barriers to data access.
Autonomous Agents: LLMs will evolve from assistants to autonomous agents capable of monitoring data, identifying anomalies, and proactively suggesting optimizations.
Federated Learning: Organizations will be able to benefit from collective learning while maintaining data privacy through federated approaches.
Implementing with Keboola MCP
Keboola's Model Context Protocol (MCP) server provides a practical framework for addressing the challenges described above while maximizing the opportunities:
Architecture and Integration
Keboola MCP acts as an intelligent middleware layer between LLMs and data infrastructure. It provides:
Standardized Communication: The MCP server establishes a consistent protocol for LLM interactions with data platforms, regardless of the client interface. This standardization simplifies security auditing and ensures consistent behavior across different user personas.
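To illustrate what a consistent protocol means in practice: MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below only builds such a request. The tool name `run_query` and its `sql` argument are hypothetical and are not taken from the Keboola MCP server; a client would discover the real tool names from the server.

```python
import itertools
import json

_request_ids = itertools.count(1)


def make_tool_call(tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "run_query" and "sql" are placeholder names used for illustration only.
request = make_tool_call("run_query", {"sql": "SELECT COUNT(*) FROM sales"})
payload = json.dumps(request)
```

Because every client, whether an IDE or a chat application, sends the same message shape, the server can apply one set of checks and one logging format to all of them.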
Context Management: Rather than exposing raw database connections to LLMs, Keboola MCP manages context intelligently. It understands the data model, relationships, and business logic, providing rich context to the LLM while maintaining security boundaries.
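A rough sketch of the general idea, not of Keboola MCP's actual implementation: the model receives table and column metadata rather than a database connection or row data. The `Table` and `Column` types, the rendering format, and the character budget are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    description: str = ""


@dataclass
class Table:
    name: str
    description: str
    columns: list = field(default_factory=list)


def render_context(tables, max_chars=2000):
    """Render table metadata (never row data) as prompt context within a size budget."""
    blocks = []
    used = 0
    for table in tables:
        lines = [f"Table {table.name}: {table.description}"]
        for col in table.columns:
            suffix = f" - {col.description}" if col.description else ""
            lines.append(f"  {col.name} ({col.dtype}){suffix}")
        block = "\n".join(lines)
        cost = len(block) + (1 if blocks else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break  # drop whole tables rather than cutting one mid-description
        blocks.append(block)
        used += cost
    return "\n".join(blocks)


sales = Table(
    "sales",
    "One row per order line",
    [Column("product", "TEXT", "Product name"), Column("revenue", "REAL")],
)
context = render_context([sales])
```

Dropping whole tables once the budget is reached keeps each description intact, which matters when token limits constrain how much schema fits in a prompt.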

Persona-Specific Benefits
For Technical Users:
- Direct integration with development environments through MCP-compatible clients
- Access to component configurations and transformation logic
- Ability to generate, test, and deploy code within existing workflows
- Preservation of version control and collaboration features
For Business Users:
- Simplified natural language interface to complex data operations
- Automatic translation of business questions into technical implementations
- Guided data exploration without technical knowledge requirements
- Consistent results regardless of technical proficiency
Security Implementation
Keboola MCP addresses security concerns through:
Abstraction Layer: By sitting between the LLM and data infrastructure, it can enforce security policies consistently, validate queries, and ensure appropriate data access.
Data Access: Keboola's project and Workspace concepts allow data access to be strictly isolated to selected “certified” data catalogs.
Audit Trail: All interactions flow through the MCP server, creating a comprehensive audit log for compliance and security monitoring.
Controlled Execution: Rather than allowing direct database access, the MCP server can validate and sanitize all operations before execution.
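Controlled execution can also be enforced at the moment a statement runs, as a second line of defense behind validation. The sketch below is an illustration using SQLite from Python's standard library, not a description of how Keboola MCP executes queries: it turns on SQLite's `query_only` pragma so writes fail, and caps the number of rows returned. The helper name and row limit are assumptions.

```python
import sqlite3


def run_read_only(conn, sql, max_rows=1000):
    """Run sql with writes disabled and return at most max_rows rows."""
    conn.execute("PRAGMA query_only = ON")
    try:
        cursor = conn.execute(sql)
        return cursor.fetchmany(max_rows)
    finally:
        # Restore normal behavior for trusted code that shares the connection.
        conn.execute("PRAGMA query_only = OFF")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("A", 10.0), ("B", 20.0), ("C", 5.0)]
)
conn.commit()

top_two = run_read_only(
    conn, "SELECT product FROM sales ORDER BY revenue DESC", max_rows=2
)
```

On a warehouse engine the equivalent would typically be a read-only role or a dedicated workspace with limited grants; the point is that the restriction lives in the execution layer and does not depend on the generated SQL being well-behaved.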
Conclusion
The integration of LLMs into data infrastructure represents a paradigm shift in how organizations interact with their data. Success requires careful consideration of user personas, robust security implementations, and thoughtful tool selection. Solutions like Keboola MCP provide a framework for navigating these challenges while maximizing opportunities.
By acknowledging the different needs of technical and business users, implementing comprehensive security measures, and choosing appropriate client interfaces, organizations can harness the transformative power of LLMs while maintaining data integrity and security. The key lies not in wholesale adoption but in strategic implementation that enhances existing workflows while opening new possibilities for data interaction.
As this space continues to evolve, organizations that thoughtfully balance innovation with security, accessibility with control, and automation with human oversight will be best positioned to realize the full potential of LLMs in their data operations.

Martin Fisher
Field CTO


