Masking of Sensitive LLM Data

Masking is a feature that allows precise control over the tracing data sent to the Langfuse server. With custom masking functions, you can control and sanitize the data that gets traced and sent to the server. Whether it’s for compliance reasons or to protect user privacy, masking sensitive data is a crucial step in responsible application development. It enables you to:

Redact sensitive information from trace or observation inputs and outputs.
Customize the content of events before transmission.
Implement fine-grained data filtering based on your specific requirements.

Learn more about Langfuse’s data security and privacy measures concerning the stored data in our security and compliance overview.

How it works

You define a custom masking function and pass it to the Langfuse client constructor.
All event inputs and outputs are processed through this function.
The masked data is then sent to the Langfuse server.

This approach ensures that you have complete control over the event input and output data traced by your application.
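
The three steps above can be illustrated without the SDK. The following is a minimal sketch, not Langfuse's actual implementation: `build_event` and `redact_secrets` are hypothetical names standing in for the client-side step that applies the mask to inputs and outputs before anything is transmitted.

```python
from typing import Any, Callable, Optional

MaskFn = Callable[[Any], Any]


def build_event(name: str, input: Any, output: Any, mask: Optional[MaskFn] = None) -> dict:
    """Hypothetical stand-in for the client step that prepares an event for sending."""
    if mask is not None:
        # Input and output each pass through the masking function first
        input, output = mask(input), mask(output)
    return {"name": name, "input": input, "output": output}


def redact_secrets(data: Any) -> Any:
    """Replace strings that start with SECRET_ and leave everything else untouched."""
    if isinstance(data, str) and data.startswith("SECRET_"):
        return "REDACTED"
    return data


event = build_event("demo", "SECRET_INPUT", "public result", mask=redact_secrets)
print(event)
# {'name': 'demo', 'input': 'REDACTED', 'output': 'public result'}
```

Only the event payload is masked in this sketch; the values your own code works with stay unchanged.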

The Python SDK v3 is currently in beta. See the SDK v3 documentation for more details.

Define a masking function. The masking function will apply to all event inputs and outputs regardless of the Langfuse-maintained integration you are using.

from typing import Any
 
def masking_function(data: Any, **kwargs) -> Any:
    """Function to mask sensitive data before sending to Langfuse."""
    if isinstance(data, str) and data.startswith("SECRET_"):
        return "REDACTED"
 
    # For more complex data structures, mask each element recursively
    elif isinstance(data, dict):
        return {k: masking_function(v) for k, v in data.items()}
    elif isinstance(data, list):
        return [masking_function(item) for item in data]
 
    return data
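
The same recursive pattern can also filter by dictionary key rather than by string prefix. The sketch below is one possible variant, assuming the sensitive fields are known in advance; `SENSITIVE_KEYS` and `mask_by_key` are hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical key names; adapt to the fields your application actually uses
SENSITIVE_KEYS = {"password", "api_key", "ssn"}


def mask_by_key(data: Any, **kwargs) -> Any:
    """Redact values stored under sensitive keys, recursing into dicts and lists."""
    if isinstance(data, dict):
        return {
            k: "REDACTED"
            if isinstance(k, str) and k.lower() in SENSITIVE_KEYS
            else mask_by_key(v)
            for k, v in data.items()
        }
    if isinstance(data, (list, tuple)):
        return [mask_by_key(item) for item in data]
    if isinstance(data, str) and data.startswith("SECRET_"):
        return "REDACTED"
    return data


payload = {
    "user": "alice",
    "password": "hunter2",
    "notes": ["SECRET_TOKEN", "ok"],
    "nested": {"api_key": "abc"},
}
masked = mask_by_key(payload)
print(masked)
# {'user': 'alice', 'password': 'REDACTED', 'notes': ['REDACTED', 'ok'], 'nested': {'api_key': 'REDACTED'}}
```

The function builds new containers instead of mutating its argument, so the original `payload` is left intact for the application.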

Apply the masking function when initializing the Langfuse client:

from langfuse import Langfuse
 
langfuse = Langfuse(mask=masking_function)

With the decorator:

from langfuse import Langfuse, observe
 
langfuse = Langfuse(mask=masking_function)
 
 
@observe()
def my_function():
    # This data will be masked before being sent to Langfuse
    return "SECRET_DATA"
 
result = my_function()
print(result)  # Original: "SECRET_DATA"
 
# The trace output in Langfuse will have the output masked as "REDACTED"

Using context managers:

from langfuse import Langfuse
 
langfuse = Langfuse(mask=masking_function)
 
with langfuse.start_as_current_span(
    name="sensitive-operation",
    input="SECRET_INPUT_DATA"
) as span:
    # ... processing ...
    span.update(output="SECRET_OUTPUT_DATA")
 
# Both input and output will be masked as "REDACTED" in Langfuse

With the Python SDK v2, define a masking function:

def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data

Use with the @observe() decorator:

from langfuse.decorators import langfuse_context, observe
 
langfuse_context.configure(mask=masking_function)
 
@observe()
def fn():
    return "SECRET_DATA"
 
fn()
 
langfuse_context.flush()
 
# The trace output in Langfuse will have the output masked as "REDACTED".

Use with the low-level SDK:

from langfuse import Langfuse
 
langfuse = Langfuse(mask=masking_function)
 
trace = langfuse.trace(output="SECRET_DATA")
 
langfuse.flush()
 
# The trace output in Langfuse will have the output masked as "REDACTED".

With the JS/TS SDK, pass the masking function to the Langfuse constructor:

import { Langfuse } from "langfuse";
 
function maskingFunction(params: { data: any }) {
  if (typeof params.data === "string" && params.data.startsWith("SECRET_")) {
    return "REDACTED";
  }
 
  return params.data;
}
 
const langfuse = new Langfuse({ mask: maskingFunction });
 
const trace = langfuse.trace({
  output: "SECRET_DATA",
});
 
await langfuse.flushAsync();
 
// The trace output in Langfuse will have the output masked as "REDACTED".

See JS/TS SDK docs for more details.

When using the Python SDK v3, the masking function provided on client initialization will apply to all event inputs and outputs regardless of the Langfuse-maintained integration you are using.

See the Python SDK v3 tab for more details.

When using the OpenAI SDK Integration with the Python SDK v2, set openai.langfuse_mask to the masking function:

from langfuse.openai import openai
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
openai.langfuse_mask = masking_function
 
completion = openai.chat.completions.create(
  name="test-chat",
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a bot."},
    {"role": "user", "content": "1 + 1 = "}],
  temperature=0,
)
 
openai.flush_langfuse()

When using the integration with the @observe() decorator (see interop docs), set the masking function via the langfuse_context:

from langfuse.decorators import langfuse_context, observe
from langfuse.openai import openai
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
langfuse_context.configure(mask=masking_function)
 
@observe()
def fn():
    completion = openai.chat.completions.create(
      name="test-chat",
      model="gpt-3.5-turbo",
      messages=[
        {"role": "system", "content": "You are a calculator."},
        {"role": "user", "content": "1 + 1 = "}],
      temperature=0,
    )
 
fn()

When using the Python SDK v3, the masking function provided on client initialization will apply to all event inputs and outputs regardless of the Langfuse-maintained integration you are using.

See the Python SDK v3 tab for more details.

When using the Langchain CallbackHandler, you can pass mask as a keyword argument:

from langfuse.callback import CallbackHandler
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
handler = CallbackHandler(
  mask=masking_function
)

When using the integration with the @observe() decorator (see interop docs), set mask via the langfuse_context:

from langfuse.decorators import langfuse_context, observe
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
langfuse_context.configure(mask=masking_function)
 
@observe()
def fn():
    langfuse_handler = langfuse_context.get_current_langchain_handler()
 
    # Pass handler to invoke of your langchain chain/agent
    chain.invoke({"person": person}, config={"callbacks":[langfuse_handler]})
 
fn()

When using the JS/TS CallbackHandler from langfuse-langchain, you can pass mask to the constructor:

import { CallbackHandler } from "langfuse-langchain";
 
function maskingFunction(params: { data: any }) {
  if (typeof params.data === "string" && params.data.startsWith("SECRET_")) {
    return "REDACTED";
  }
 
  return params.data;
}
 
const handler = new CallbackHandler({
  mask: maskingFunction,
});

The LlamaIndex integration is not supported in the Python SDK v3. Please use a community-maintained OTEL-based integration instead.

When using the LlamaIndex integration, pass the mask to the LlamaIndexInstrumentor constructor:

from langfuse.llama_index import LlamaIndexInstrumentor
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
instrumentor = LlamaIndexInstrumentor(mask=masking_function)
 
with instrumentor.observe():
    # ... your LlamaIndex index creation ...
 
    index.as_query_engine().query("What is the capital of France?")
 
instrumentor.flush()

When using the integration with the @observe() decorator (see interop docs), set the mask via the langfuse_context:

from langfuse.decorators import langfuse_context, observe
from langfuse.llama_index import LlamaIndexInstrumentor
from llama_index.core import VectorStoreIndex
 
def masking_function(data):
  if isinstance(data, str) and data.startswith("SECRET_"):
    return "REDACTED"
 
  return data
 
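langfuse_context.configure(mask=masking_function)
 
# Create the instrumentor used inside the decorated function
instrumentor = LlamaIndexInstrumentor()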
 
@observe()
def llama_index_fn(question: str):
    # Get IDs
    current_trace_id = langfuse_context.get_current_trace_id()
    current_observation_id = langfuse_context.get_current_observation_id()
 
    # Pass to instrumentor
    with instrumentor.observe(
        trace_id=current_trace_id,
        parent_observation_id=current_observation_id,
        update_parent=False
    ) as trace:
        # Run application (doc1 and doc2 stand for your LlamaIndex documents)
        index = VectorStoreIndex.from_documents([doc1, doc2])
        response = index.as_query_engine().query(question)
 
        return response

Examples

The following examples show how to use the masking feature. They use the Langfuse decorator, but you can use the low-level SDK or the JS/TS SDK analogously.

Example 1: Redacting Credit Card Numbers

In this example, we’ll demonstrate how to redact credit card numbers from strings using a regular expression. This helps in complying with PCI DSS by ensuring that credit card numbers are not transmitted or stored improperly.

Langfuse’s masking feature allows you to define a custom masking function with parameters, which you then pass to the Langfuse client constructor. This function is applied to all event inputs and outputs, processing each piece of data to mask or redact sensitive information according to your specifications. By ensuring that all events are processed through your masking function before being sent, Langfuse guarantees that only the masked data is transmitted to the Langfuse server.

Steps:

Import necessary modules.
Define a masking function that uses a regular expression to detect and replace credit card numbers.
Configure the masking function in Langfuse.
Create a sample function to simulate processing sensitive data.
Observe the trace to see the masked output.

import re
from langfuse.decorators import langfuse_context, observe
 
# Step 2: Define the masking function
def masking_function(data):
    if isinstance(data, str):
        # Regular expression to match credit card numbers (Visa, MasterCard, AmEx, etc.)
        pattern = r'\b(?:\d[ -]*?){13,19}\b'
        data = re.sub(pattern, '[REDACTED CREDIT CARD]', data)
    return data
 
# Step 3: Configure the masking function
langfuse_context.configure(mask=masking_function)
 
# Step 4: Create a sample function with sensitive data
@observe()
def process_payment():
    # Simulated sensitive data containing a credit card number
    transaction_info = "Customer paid with card number 4111 1111 1111 1111."
    return transaction_info
 
# Step 5: Observe the trace
result = process_payment()
 
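print(result)
# Prints the original string: Customer paid with card number 4111 1111 1111 1111.
# The trace output in Langfuse will read: Customer paid with card number [REDACTED CREDIT CARD].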
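
A digit-run pattern like the one above also matches long order IDs or timestamps. One way to reduce such false positives is to redact only matches that pass the Luhn checksum used by payment card numbers. The following is a sketch under that assumption; `luhn_valid` and `mask_cards` are hypothetical helper names, not part of Langfuse.

```python
import re

# Same pattern as in the example above: 13 to 19 digits, optionally separated
CARD_PATTERN = re.compile(r"\b(?:\d[ -]*?){13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(data, **kwargs):
    """Redact digit runs that look like card numbers and pass the Luhn check."""
    if not isinstance(data, str):
        return data

    def replace(match):
        text = match.group()
        return "[REDACTED CREDIT CARD]" if luhn_valid(text) else text

    return CARD_PATTERN.sub(replace, data)


print(mask_cards("Customer paid with card number 4111 1111 1111 1111."))
# Customer paid with card number [REDACTED CREDIT CARD].
print(mask_cards("Order id 1234567890123456 shipped."))
# Order id 1234567890123456 shipped.
```

The trade-off is that a mistyped card number fails the checksum and passes through unmasked, so the plain regex may be the safer default when over-redaction is acceptable.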

Redacted trace in Langfuse 1

Link to the trace in Langfuse

Example 2: Using the `llm-guard` library

In this example, we’ll use the Anonymize scanner from llm-guard to remove personal names and other PII from the data. This is useful for anonymizing user data and protecting privacy.

Find out more about the llm-guard library in their documentation.

Steps:

Install the llm-guard library.
Import necessary modules.
Initialize the Vault and configure the Anonymize scanner.
Define a masking function that uses the Anonymize scanner.
Configure the masking function in Langfuse.
Create a sample function to simulate processing data with PII.
Observe the trace to see the masked output.

pip install llm-guard

from langfuse.decorators import langfuse_context, observe
from llm_guard.vault import Vault
from llm_guard.input_scanners import Anonymize
from llm_guard.input_scanners.anonymize_helpers import BERT_LARGE_NER_CONF
 
# Step 3: Initialize the Vault and configure the Anonymize scanner
vault = Vault()
 
def create_anonymize_scanner():
    scanner = Anonymize(
        vault,
        recognizer_conf=BERT_LARGE_NER_CONF,
        language="en"
    )
    return scanner
 
# Step 4: Define the masking function
# Create the scanner once; building it on every call would reload the NER model
scanner = create_anonymize_scanner()
 
def masking_function(data):
    if isinstance(data, str):
        # Scan and redact the data
        sanitized_data, is_valid, risk_score = scanner.scan(data)
        return sanitized_data
    return data
 
# Step 5: Configure the masking function
langfuse_context.configure(mask=masking_function)
 
# Step 6: Create a sample function with PII
@observe()
def generate_report():
    # Simulated data containing personal names
    report = "John Doe met with Jane Smith to discuss the project."
    return report
 
# Step 7: Observe the trace
result = generate_report()
 
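print(result)
# Prints the original string: John Doe met with Jane Smith to discuss the project.
# The trace output in Langfuse will read: [REDACTED_PERSON] met with [REDACTED_PERSON] to discuss the project.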
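
Because the masking function runs on every event input and output, a model-based scanner can become a bottleneck when the same strings recur. One possible mitigation is to memoize results with `functools.lru_cache`. In the sketch below, `expensive_scan` is a hypothetical stand-in for a slow call such as `scanner.scan(data)`; it is not the llm-guard API.

```python
from functools import lru_cache


def expensive_scan(text: str) -> str:
    """Hypothetical stand-in for a slow, model-based scanner call."""
    expensive_scan.calls += 1
    return text.replace("John Doe", "[REDACTED_PERSON]")


expensive_scan.calls = 0  # call counter, used here only to show the cache working


@lru_cache(maxsize=1024)
def cached_scan(text: str) -> str:
    """Memoize scan results so repeated strings are scanned only once."""
    return expensive_scan(text)


def masking_function(data, **kwargs):
    if isinstance(data, str):
        return cached_scan(data)
    return data


first = masking_function("John Doe signed the contract.")
second = masking_function("John Doe signed the contract.")
print(first)
# [REDACTED_PERSON] signed the contract.
print(expensive_scan.calls)
# 1
```

Note that the cache keeps the unmasked strings in process memory as keys, so keep `maxsize` small or skip caching if that is a concern for your deployment.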

Redacted trace in Langfuse

Link to the trace in Langfuse 2

Example 3: Masking Email and Phone Numbers

You can extend the masking function to redact other types of PII such as email addresses and phone numbers using regular expressions.

import re
from langfuse.decorators import langfuse_context, observe
 
def masking_function(data):
    if isinstance(data, str):
        # Mask email addresses (including multi-label domains such as mail.example.com)
        data = re.sub(r'\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b', '[REDACTED EMAIL]', data)
        # Mask phone numbers
        data = re.sub(r'\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b', '[REDACTED PHONE]', data)
    return data
 
langfuse_context.configure(mask=masking_function)
 
@observe()
def contact_customer():
    info = "Please contact John at john.doe@example.com or call 555-123-4567."
    return info
 
result = contact_customer()
 
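print(result)
# Prints the original string: Please contact John at john.doe@example.com or call 555-123-4567.
# The trace output in Langfuse will read: Please contact John at [REDACTED EMAIL] or call [REDACTED PHONE].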
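
As the number of patterns grows, it can help to keep each rule in its own function and combine them into the single callable that is passed as `mask`. The sketch below shows one way to do that. `compose_masks` is a hypothetical helper, and returning a placeholder when a rule raises is a design choice made here so that unmasked data is never returned on failure; it is not documented Langfuse behavior.

```python
import re
from typing import Any, Callable

MaskFn = Callable[[Any], Any]


def mask_email(data: Any) -> Any:
    if isinstance(data, str):
        return re.sub(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b", "[REDACTED EMAIL]", data)
    return data


def mask_phone(data: Any) -> Any:
    if isinstance(data, str):
        return re.sub(r"\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b", "[REDACTED PHONE]", data)
    return data


def compose_masks(*masks: MaskFn, fallback: str = "[MASKING FAILED]") -> MaskFn:
    """Chain masking rules in order; return `fallback` if any rule raises."""

    def combined(data: Any, **kwargs) -> Any:
        try:
            for mask in masks:
                data = mask(data)
            return data
        except Exception:
            # Fail closed: never hand back data that may be only partially masked
            return fallback

    return combined


masking_function = compose_masks(mask_email, mask_phone)
print(masking_function("Please contact John at john.doe@example.com or call 555-123-4567."))
# Please contact John at [REDACTED EMAIL] or call [REDACTED PHONE].
```

The combined function can be unit-tested on sample strings before it is configured in Langfuse, which makes regressions in individual patterns easier to catch.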