How to secure data access for AI Agents

With the growing adoption of AI agents, we are seeing more applications being built around AI-driven capabilities. These applications can range from small demos that highlight what AI agents can do, to full-scale enterprise integrations that enhance existing software products. For example, an AI agent could be embedded within a data governance platform for tasks like data profiling, or entire AI-first applications can be built from scratch.

Regardless of their size, these AI-driven applications may serve either a single user or many users across different teams within an enterprise. Naturally, these users often have varying levels of access to data and AI functionalities. This leads to important questions:

Does the data consumer have the right to use information generated by AI agents during data processing?
Should a user be able to view AI agent computations if they do not have direct access to the original data?
How do we manage access control for AI computations and data outputs across users with different privilege levels?

This article explores these challenges and demonstrates different implementation approaches for managing AI agent data access securely.

The source code is available on GitHub if you want to try it out.

Overview

An AI agent-based application typically includes these components:

Human Users — People interacting with AI agents and systems.
Systems — Software components that interact with APIs, databases, and other tools.
AI Agents — Entities that perform tasks, process data, and communicate with users and systems.
Data Sources — Structured and unstructured repositories accessed by AI agents such as databases.
Large Language Models (LLMs) — The underlying models powering the intelligence of the agents.

In such an environment:

Human users interact with AI agents through interfaces or systems.
AI agents access data through tools and APIs.
Tools fetch data via API calls, database queries, or access to structured/unstructured storage.

Access control must be enforced at several levels:

Controlling who can use the AI application.
Controlling which AI agents a user can interact with.
Controlling which data agents can retrieve for computation.
Controlling what parts of computed outputs a user can see.
Enforcing database-level access (table-level, row-level, column-level).

The key risk is that if AI agent orchestration bypasses these controls, a user could inadvertently be exposed to sensitive or restricted data.

We must put checkpoints in place to guarantee that users are only exposed to the data they are allowed to see, even if they are unaware of all the sources the AI agents use.
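To make the idea of layered checkpoints concrete, here is a minimal sketch that checks application, agent, and tool access in order and reports the first layer that fails. The `Policy` class, its field names, and `check_access` are hypothetical and are not part of the article's source code.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical per-user policy; the field names are illustrative only."""
    can_use_app: bool = False
    allowed_agents: set = field(default_factory=set)
    allowed_tools: set = field(default_factory=set)


def check_access(policy, agent, tool):
    """Return (allowed, reason), checking the outermost layer first."""
    if not policy.can_use_app:
        return False, "application access denied"
    if agent not in policy.allowed_agents:
        return False, f"agent '{agent}' not permitted"
    if tool not in policy.allowed_tools:
        return False, f"tool '{tool}' not permitted"
    return True, "ok"


# Assumed policy for a restricted user such as User B.
user_b = Policy(
    can_use_app=True,
    allowed_agents={"data_collector", "data_presenter"},
    allowed_tools={"search_ibm_news", "query_limited_finance_data"},
)

print(check_access(user_b, "data_collector", "query_finance_data"))
print(check_access(user_b, "data_collector", "search_ibm_news"))
```

Denying by default (an empty `Policy` allows nothing) means a missing entry results in a refusal rather than an accidental grant.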

Base Scenario

In our setup, we define two users: User A and User B.

User A has full access to all resources, tools, and data.
User B has restricted access, depending on the resource type.

The application is designed with two AI agents:

Data Collector Agent: Gathers financial data from various sources.
Data Presenter Agent: Formats and presents the collected data.

The setup below uses the Azure AI Agent Service. See the full source code on GitHub.

import os
from datetime import datetime

from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FunctionTool, ToolSet
from azure.identity import DefaultAzureCredential

...

project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=os.environ["PROJECT_CONNECTION_STRING"],
)

# Shared tools
tools = FunctionTool([search_ibm_news, query_finance_data, fetch_financial_data])
toolset = ToolSet()
toolset.add(tools)

# Create Data Collector Agent
data_collector = project_client.agents.create_agent(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    name=f"data-collector-agent-{datetime.now().strftime('%Y%m%d%H%M')}",
    description="Collects financial data from Neon and Alpha Vantage.",
    instructions="""
    You are an AI researcher focused on collecting financial data. Use your tools to:
    - Query Neon Postgres database
    - Fetch IBM stock data from Alpha Vantage
    """,
    toolset=toolset,
)

# Create Data Presenter Agent
data_presenter = project_client.agents.create_agent(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    name=f"data-presenter-agent-{datetime.now().strftime('%Y%m%d%H%M')}",
    description="Presents and summarizes collected financial data.",
    instructions="""
    You are a summarization assistant. Your task is to format and present financial data insights in a concise, readable way.
    """
)

The Data Collector agent has access to five tools:

Internet search using Serper API
Full database query (Neon Postgres)
Limited columns query
Limited rows query (row-level restricted)
API call to Alpha Vantage

Example Tool Definitions:

...

def query_finance_data():
    """Query all finance data from Neon."""
    try:
        conn = psycopg2.connect(NEON_DB_URL)
        cursor = conn.cursor(cursor_factory=RealDictCursor)
        cursor.execute("SELECT * FROM finance")
        rows = cursor.fetchall()
        conn.close()
        return "\n".join([str(row) for row in rows]) if rows else "No finance data found."
    except Exception as e:
        return f"Error querying Neon: {str(e)}"

def query_limited_finance_data():
    """Query limited finance data (company and stock price only) from Neon."""
    try:
        conn = psycopg2.connect(NEON_DB_URL)
        cursor = conn.cursor(cursor_factory=RealDictCursor)
        cursor.execute("SELECT company, stock_price FROM finance")
        rows = cursor.fetchall()
        conn.close()
        return "\n".join([str(row) for row in rows]) if rows else "No limited finance data found."
    except Exception as e:
        return f"Error querying Neon: {str(e)}"

def query_row_level_finance_data():
    """Query row level restricted finance data from Neon."""
    try:
        conn = psycopg2.connect(NEON_DB_URL)
        cursor = conn.cursor(cursor_factory=RealDictCursor)
        cursor.execute("SELECT * FROM finance WHERE user_role = 'restricted'")
        rows = cursor.fetchall()
        conn.close()
        return "\n".join([str(row) for row in rows]) if rows else "No row level restricted data found."
    except Exception as e:
        return f"Error querying Neon: {str(e)}"

def fetch_financial_data():
    """Fetch IBM financial data from Alpha Vantage API."""
    url = f"https://www.alphavantage.co/query?function=GLOBAL_QUOTE&symbol=IBM&apikey={ALPHA_VANTAGE_API_KEY}"
    response = requests.get(url)
    if response.status_code == 200:
        data = response.json().get("Global Quote", {})
        return "\n".join([f"{k}: {v}" for k, v in data.items()]) if data else "No stock data found."
    return f"API error: {response.text}"

def search_ibm_news(query="IBM Q4 earnings"):
    """Search IBM financial news using Serper API."""
    url = "https://google.serper.dev/search"
    headers = {
        "X-API-KEY": SERPER_API_KEY,
        "Content-Type": "application/json"
    }
    payload = {"q": query}
    response = requests.post(url, headers=headers, json=payload)
    if response.status_code == 200:
        results = response.json().get("organic", [])
        return "\n".join([f"{r['title']} - {r['link']}" for r in results[:3]])
    return f"Serper API error: {response.text}"

User Roles and Identity Management

Instead of hard-coding permissions in the application code, we manage user roles externally, either in a file such as user_roles.yaml or, ideally, through an identity management service such as Microsoft Entra ID.

Example user_roles.yaml:

users:
  - username: user_a
    roles:
      - admin
      - full_data_access
  - username: user_b
    roles:
      - restricted
      - limited_api_access
      - restricted_db
      - row_restricted
      - mask_data

The AI application reads these roles at runtime and dynamically adjusts agent behavior and available tools accordingly.
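One way to adjust the available tools at runtime is to build the tool list from the user's roles before the agent is created. The sketch below is an assumption about how that selection could look. `select_tools` is a hypothetical helper, and the tool functions are replaced by stand-ins so the example runs on its own.

```python
# Stand-ins for the article's tool functions so the sketch is self-contained.
def search_ibm_news():
    return "news"


def query_finance_data():
    return "full"


def query_limited_finance_data():
    return "limited"


def query_row_level_finance_data():
    return "rows"


def fetch_financial_data():
    return "stock"


def select_tools(roles):
    """Pick the tool functions to hand to the Data Collector agent."""
    tools = [search_ibm_news]  # web search is open to every role in this sketch

    # Choose exactly one database tool; the first matching role wins.
    if "restricted_db" in roles:
        tools.append(query_limited_finance_data)
    elif "row_restricted" in roles:
        tools.append(query_row_level_finance_data)
    else:
        tools.append(query_finance_data)

    # The Alpha Vantage tool is withheld from users with limited API access.
    if "limited_api_access" not in roles:
        tools.append(fetch_financial_data)
    return tools


print([f.__name__ for f in select_tools(["admin", "full_data_access"])])
```

The resulting list could be passed to FunctionTool in the same way as the hard-coded list in the agent setup. A tool that is never registered is not exposed to the model at all, so this does not depend on the model following a prompt.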

Implementation and Scenarios

Scenario 1: Unrestricted Access

In this case, User A can use all tools and see all data without any limits. User B, on the other hand, has some restrictions. We assume there are no limits on what data can be collected or processed for users with full access.

...

def get_user_roles(username):
    with open('user_roles.yaml', 'r') as file:
        users = yaml.safe_load(file)['users']
    for user in users:
        if user['username'] == username:
            return user['roles']
    return []

# Create toolset
toolset = ToolSet()
toolset.add(FunctionTool([search_ibm_news, query_finance_data, fetch_financial_data]))

# Create Data Collector Agent
data_collector = project_client.agents.create_agent(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    name=f"data-collector-{datetime.now().strftime('%Y%m%d%H%M')}",
    description="Collects IBM financial data from Neon, APIs, and Web.",
    instructions="Use your tools to collect financial and stock data related to IBM.",
    toolset=toolset,
)

# Create Data Presenter Agent
data_presenter = project_client.agents.create_agent(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
    name=f"data-presenter-{datetime.now().strftime('%Y%m%d%H%M')}",
    description="Formats and summarizes collected financial data.",
    instructions="Summarize the financial data collected earlier into a nice report.",
)

# Create conversation thread
thread = project_client.agents.create_thread()

# Set current user
current_user = "user_a"
roles = get_user_roles(current_user)

# Dynamic tasks
print("\n🛠️ Configuring tasks based on user roles...\n")

# Internet search task
project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content="Search for IBM's Q4 financial results from the web.",
)
project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=data_collector.id)

...

Both agents operate normally, and all information is accessible to the user.

------------------------------------------------------------
🧑 User at 2025-04-25 17:09:37
-----------------------------
Collect IBM financial records and latest stock data.
------------------------------------------------------------
🤖 Agent (asst_8CdWRMTZH9uXPHGGYSAoZmvA) at 2025-04-25 17:09:46
--------------------------------------------------------------
Here is the compiled financial data and stock details for IBM:

### Financial Records:
- **Company**: IBM
- **Revenue**: $75,000 million
- **Profit**: $5,000 million
- **Stock Price** (recorded at the time): $145.32

### Latest Stock Data:
- **Symbol**: IBM
- **Opening Price**: $231.1750
- **High**: $232.7800
- **Low**: $224.4401
- **Price**: $229.3300
- **Volume**: 15,428,144 shares
- **Latest Trading Day**: April 24, 2025
- **Previous Close**: $245.4800
- **Change**: -$16.1500
- **Change Percent**: -6.5789%

🧑 User at 2025-04-25 17:09:52
-----------------------------
Summarize the financial data collected earlier.
------------------------------------------------------------
🤖 Agent (asst_OIDOm5IwQ7enimTyaQrAuHEs) at 2025-04-25 17:09:55
--------------------------------------------------------------
### IBM Financial Summary:
1. **Revenue**: $75 billion  
2. **Profit**: $5 billion  
3. **Stock Performance**:  
   - Latest price: $229.33 (April 24, 2025)  
   - Daily change: -$16.15 (-6.58%)

Scenario 2: Block Database Access for User B

In this situation, User B should not be allowed to access the database directly. This setup makes sure that users without the right permissions cannot pull data from the database and can only use other available sources instead.

In many business applications, there are different ways to control who can access the database:

Service Account Model: The whole application connects to the database using one service account that has access to everything. This makes things easier to manage, but every user ends up having the same level of access, without any user-specific restrictions.
User Privilege Validation: The system checks each user’s permissions before allowing them to run database queries. This can be done in two ways:
- Role-Based Access Control (RBAC): Users are given roles that define which tables, rows, or queries they can access.
- Middleware Enforcement: A layer between the app and the database checks what users are allowed to do before sending their queries to the database.
Row-Level Security: The database itself is set up to automatically show different rows of data depending on who the user is and what they are allowed to see. Discover how to enable row-level security using Neon RLS.
Proxy Database Access: Instead of letting users connect directly to the database, they go through an API layer that applies all the necessary security checks based on their role.
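The middleware enforcement option above can be sketched as a decorator that inspects the caller's roles before a database tool runs. This is an illustration only. `AccessDenied` and `forbid_roles` are hypothetical names, and the tool body is a placeholder rather than the real Neon query.

```python
import functools


class AccessDenied(Exception):
    """Raised when the caller's roles do not permit the requested tool."""


def forbid_roles(*blocked):
    """Decorator: refuse the call if the user holds any of the blocked roles."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(roles, *args, **kwargs):
            hit = set(blocked) & set(roles)
            if hit:
                raise AccessDenied(
                    f"{func.__name__} blocked for roles: {sorted(hit)}"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@forbid_roles("restricted_db")
def query_finance_data():
    # Placeholder body; the real tool would run the SQL query against Neon.
    return "full finance rows"


print(query_finance_data(["admin", "full_data_access"]))
try:
    query_finance_data(["restricted", "restricted_db"])
except AccessDenied as exc:
    print(f"Denied: {exc}")
```

Because the check runs inside the tool call, it still applies if the agent decides to invoke the full query on its own.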

...
# Set current user
current_user = "user_b"
roles = get_user_roles(current_user)

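if "restricted_db" in roles:
    query_task = "Query limited finance data"
    query_tool = "query_limited_finance_data"
else:
    # Fall back to the full query so query_task is always defined
    query_task = "Query full finance data"
    query_tool = "query_finance_data"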

project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=f"{query_task} from the Neon database.",
)
project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=data_collector.id)
...

We enforce this by replacing the full database query task with the limited one for restricted users. See the full source code on GitHub.

------------------------------------------------------------
🧑 User: Query limited finance data from the Neon database.
------------------------------------------------------------
🤖 Assistant: From the Neon database, here is limited financial data for IBM:

- **Revenue**: $75,000 million
- **Profit**: $5,000 million
- **Stock Price**: $145.32

Scenario 3: Restrict API Access

In this case, User B should not be allowed to access certain external APIs. There are several ways to control API access:

API Gateway Policies: API gateways can be set up to check user roles and only allow users with the right permissions to make API requests.
Middleware Layer: The application can use a middleware that checks a user’s role before sending any API requests, blocking those who do not have access.
Token-Based Access Control: API authentication tokens can carry information about what a user is allowed to do, limiting which APIs or endpoints they are allowed to call.

...
# Financial API task (only if allowed)
if "limited_api_access" not in roles:
    project_client.agents.create_message(
        thread_id=thread.id,
        role="user",
        content="Fetch IBM's latest stock data from Alpha Vantage.",
    )
    project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=data_collector.id)
...

This simulates API-level access control, typically enforced via API Gateways, middleware, or token scopes.
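For the token scope variant, a minimal sketch might check a scope claim before the API tool is invoked. It assumes the token has already been verified and decoded into a dictionary, and that scopes arrive as a space-delimited string, as in OAuth 2.0. The scope name `stock:read` and both helper functions are hypothetical.

```python
def has_scope(token_claims, required_scope):
    """Return True if the space-delimited 'scope' claim grants required_scope."""
    granted = token_claims.get("scope", "").split()
    return required_scope in granted


def call_stock_api(token_claims, fetch):
    """Run fetch() only when the token carries the assumed stock:read scope."""
    if not has_scope(token_claims, "stock:read"):
        return "Access denied: missing scope stock:read"
    return fetch()


# Claims are assumed to come from an already-verified token.
user_a_claims = {"sub": "user_a", "scope": "news:read stock:read"}
user_b_claims = {"sub": "user_b", "scope": "news:read"}

print(call_stock_api(user_a_claims, lambda: "IBM quote"))
print(call_stock_api(user_b_claims, lambda: "IBM quote"))
```

Splitting the claim into a list before the membership test avoids substring matches, so a scope such as `stock:readonly` would not satisfy `stock:read`.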

Scenario 4: Limit Access to Specific Database Columns

In this case, User B should not be able to access certain columns in the database and can only view a limited set, such as company and stock_price. Table-level and column-level access control can be handled in a few ways:

Database Permissions: The database can be set up to control which tables and columns a user can see or query based on their assigned role. Read more about how to manage database access.
Query Proxy Layer: A middleware layer can sit between the user and the database, blocking access to restricted tables or columns before the queries are even sent to the database.

Instead of querying full tables, agents only retrieve selected columns.
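A query proxy layer for column restrictions can be as simple as an allowlist of columns per role. The sketch below is an assumption of how that might look. The `revenue` and `profit` column names are inferred from the sample output, and `visible_columns`, `build_query`, and `project_rows` are hypothetical helpers.

```python
# Assumed schema for the finance table; only company and stock_price
# are confirmed by the article's limited query.
ALL_COLUMNS = ("company", "revenue", "profit", "stock_price")
ALLOWED_COLUMNS = {"restricted_db": ("company", "stock_price")}


def visible_columns(roles):
    """Return the columns a user may read; unrestricted users get all of them."""
    for role, columns in ALLOWED_COLUMNS.items():
        if role in roles:
            return columns
    return ALL_COLUMNS


def build_query(roles):
    """Build the SELECT from the allowlist (constants only, never user input)."""
    return f"SELECT {', '.join(visible_columns(roles))} FROM finance"


def project_rows(rows, roles):
    """Drop columns the user may not see from rows that were already fetched."""
    columns = visible_columns(roles)
    return [{c: row[c] for c in columns if c in row} for row in rows]


sample = [{"company": "IBM", "revenue": 75000, "profit": 5000, "stock_price": 145.32}]
print(build_query(["restricted_db"]))
print(project_rows(sample, ["restricted_db"]))
```

Building the column list from constants keeps the query safe from injection, and `project_rows` offers a second check when rows come from a source that cannot be filtered in SQL.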

Scenario 5: Restrict Access to Specific Rows

In this scenario, User B should not be able to view certain rows in a database table. Row-level security can be enforced in two main ways:

Database Row-Level Policies: The database applies rules that control which rows each user is allowed to see, based on their access rights.
Application-Level Filtering: The application itself checks and filters the query results before showing the data to the user, making sure they only see what they are permitted to access.

...
# Set current user
current_user = "user_b"
roles = get_user_roles(current_user)

# Finance database task (first matching role wins, so restricted_db takes precedence over row_restricted)
if "restricted_db" in roles:
    query_task = "Query limited finance data"
    query_tool = "query_limited_finance_data"
elif "row_restricted" in roles:
    query_task = "Query row level restricted finance data"
    query_tool = "query_row_level_finance_data"
else:
    query_task = "Query full finance data"
    query_tool = "query_finance_data"

project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=f"{query_task} from the Neon database.",
)
project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=data_collector.id)
...

Row-level security ensures that restricted users cannot access data outside their assigned scope.
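As a sketch of application-level row filtering, the example below passes the caller's role to the query as a parameter instead of hard-coding it. It uses an in-memory SQLite table as a stand-in for Neon so that it runs without a database server. The table contents and the `rows_for_role` helper are illustrative assumptions.

```python
import sqlite3

# In-memory stand-in for the finance table; role assignments are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE finance (company TEXT, stock_price REAL, user_role TEXT)"
)
conn.executemany(
    "INSERT INTO finance VALUES (?, ?, ?)",
    [("IBM", 145.32, "restricted"), ("Apple", 179.95, "admin")],
)


def rows_for_role(conn, user_role):
    """Return only the rows tagged with the caller's role."""
    cursor = conn.execute(
        "SELECT company, stock_price FROM finance "
        "WHERE user_role = ? ORDER BY company",
        (user_role,),
    )
    return cursor.fetchall()


print(rows_for_role(conn, "restricted"))
```

With psycopg2 the placeholder would be `%s` rather than `?`. Database-side row-level policies remain the stronger option, because they apply even when a query bypasses the application.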

Scenario 6: Mask Sensitive Output Fields

In this scenario, User B should not be able to see certain sensitive information in the results generated by the AI. This can be handled in two ways:

Data Masking: Sensitive parts of the results are hidden or replaced before being shown to the user.
Post-Processing Filters: The AI agent reviews the output and removes or hides any restricted information before presenting it.

# Set current user
current_user = "user_b"
roles = get_user_roles(current_user)

# Data presenter task
if "mask_data" in roles:
    present_instruction = "Summarize the financial data but mask revenue and profit."
else:
    present_instruction = "Summarize the collected financial data into a clean report."

project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content=present_instruction,
)
project_client.agents.create_and_process_run(thread_id=thread.id, agent_id=data_presenter.id)

The Data Presenter agent applies masking before summarizing results.

🧑 User: Summarize the financial data but mask revenue and profit.
------------------------------------------------------------
🤖 Assistant: Here’s the summarized financial data with revenue and profit details masked:

1. **IBM**: Stock Price - $145.32
2. **Apple**: Stock Price - $179.95
3. **Microsoft**: Stock Price - $314.10
4. **Google**: Stock Price - $2,801.12
5. **Amazon**: Stock Price - $142.92
6. **Meta**: Stock Price - $302.56
7. **Tesla**: Stock Price - $199.35
8. **Netflix**: Stock Price - $412.75
9. **Nvidia**: Stock Price - $450.99
10. **Samsung**: Stock Price - $70.10
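Masking through a prompt depends on the model following the instruction. A complementary option is to mask the fields in code before the data reaches the Data Presenter agent. The sketch below assumes the rows are dictionaries with `revenue` and `profit` keys. `mask_fields` is a hypothetical helper.

```python
def mask_fields(rows, roles, sensitive=("revenue", "profit")):
    """Replace sensitive values with a placeholder for users holding mask_data."""
    if "mask_data" not in roles:
        return rows
    return [
        {key: ("***" if key in sensitive else value) for key, value in row.items()}
        for row in rows
    ]


rows = [{"company": "IBM", "revenue": 75000, "profit": 5000, "stock_price": 145.32}]
print(mask_fields(rows, ["restricted", "mask_data"]))
print(mask_fields(rows, ["admin"]))
```

Because the masked values never enter the conversation thread, the presenter cannot reveal them even if a later prompt asks for the original figures.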

Conclusion

AI agents open up new challenges in managing data access, especially in multi-user environments where different privilege levels exist. By combining proper authentication, API security, database-level filtering, and AI output moderation, it is possible to build AI-driven applications that are both powerful and secure.

This project demonstrates how role-aware agents can dynamically adjust their behavior based on user permissions while responsibly accessing multiple sources like Neon databases, financial APIs, and the open web.

Following these practices helps AI-powered systems comply with data governance policies and protect sensitive information from unauthorized access.
