Operations Reference¶
The DeepTagger node currently supports one primary operation: Extract Data. This page provides detailed documentation on how to use it.
Extract Data Operation¶
Extracts structured data from a document or text using a trained DeepTagger project.
Parameters¶
Operation¶
Type: Dropdown (fixed)
Value: Extract Data
Description: The type of operation to perform. Currently only data extraction is supported.
Project ID¶
Type: String
Required: Yes
Description: The ID of your trained DeepTagger project.
Format: fo_ followed by a timestamp (e.g., fo_1759714105892)
How to find:
- Go to https://deeptagger.com/das/fos
- Click on your project
- Copy the ID from the URL:
Example: fo_1759714105892
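For example, assuming the project page URL simply appends the project ID to the listing URL (the exact path pattern is illustrative and may differ in your account):
```
https://deeptagger.com/das/fos/fo_1759714105892
```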
Hint: The node also shows this information as hint text under the Project ID field.
Input Type¶
Type: Dropdown
Required: Yes
Options:
- File (default) - Upload a document file
- Text - Send raw text for extraction
Description: Determines whether you're sending a file (PDF, image) or raw text content.
When to use File:
- Processing PDFs, images, or scanned documents
- Working with binary data from previous nodes
- Handling documents uploaded via webhooks
When to use Text:
- Extracting from plain text, emails, or form submissions
- Processing text data from APIs or databases
- Handling Markdown, HTML, or other text formats
Binary Property (for File input)¶
Type: String
Default: data
Required: Yes (when Input Type = File)
Description: Name of the binary property containing the file data.
Common values:
- data (default for most nodes)
- file (some HTTP nodes)
- attachment (email nodes)
How it works: n8n passes binary data between nodes using named properties. This parameter tells the DeepTagger node where to find the file data from the previous node.
Example: If a previous node outputs binary data in a property called invoice, set this to invoice.
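If you're not sure what the property is called, a small sketch using n8n's standard Code node (run once for all items) can list the binary property names on each incoming item:
```javascript
// List the binary property names each incoming item carries,
// so you know what to enter in "Binary Property".
return $input.all().map((item) => ({
  json: { binaryProperties: Object.keys(item.binary ?? {}) },
}));
```
Each output item then shows, for example, ["data"] or ["attachment0"].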
Text (for Text input)¶
Type: Multi-line String
Required: Yes (when Input Type = Text)
Description: The raw text content to extract data from.
Usage:
- Can be hardcoded text (for testing)
- Or a dynamic expression such as {{$json["body"]}} (from the previous node)
Example:
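A minimal sketch, assuming the previous node outputs an item with a body field holding the raw text (the field name is illustrative):
```json
{
  "body": "Invoice INV-2025-001\nVendor: Acme Corporation\nTotal: $1,234.56"
}
```
With that shape, setting Text to {{$json["body"]}} passes the raw text to DeepTagger.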
Input Requirements¶
For File Input¶
The previous node must output binary data. Compatible source nodes include:
- HTTP Request - Download files from URLs
- Webhook - Receive uploaded files
- Google Drive - Read files from Drive
- Dropbox - Read files from Dropbox
- Email (IMAP) - Extract attachments
- Read Binary File - Load files from disk
- FTP - Download files via FTP
Binary data structure:
{
  "data": {
    "data": "base64encodeddata...",
    "mimeType": "application/pdf",
    "fileName": "invoice.pdf"
  }
}
For Text Input¶
The previous node must output JSON data containing text. Compatible source nodes include:
- HTTP Request - API responses with text
- Webhook - Form submissions
- Email - Email body text
- Google Sheets - Cell content
- Database - Query results
- Set - Manually set text value
JSON data structure:
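A sketch of the item shape, assuming the text arrives in a single field (the field name is illustrative):
```json
{
  "text": "Order #4521\nCustomer: Jane Doe\nTotal: $99.00"
}
```
Point the Text parameter at that field with an expression such as {{$json["text"]}}.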
Output¶
The node returns structured JSON data extracted from the document.
Success Output¶
{
  "invoice_number": "INV-2025-001",
  "date": "2025-01-15",
  "total": "$1,234.56",
  "vendor": "Acme Corporation",
  "line_items": [
    {
      "description": "Widget A",
      "quantity": 10,
      "price": "$10.00"
    },
    {
      "description": "Widget B",
      "quantity": 5,
      "price": "$20.00"
    }
  ]
}
The exact structure depends on your DeepTagger project configuration.
Error Output¶
If an error occurs (and "Continue on Fail" is enabled):
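The exact fields depend on the failure, but a sketch of what the output item may look like (the error field name and message text are illustrative):
```json
{
  "error": "Request failed: project fo_1759714105892 not found"
}
```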
Configuration Options¶
Continue on Fail¶
Location: Node settings (click the three-dot menu)
Description: If enabled, the workflow continues even if the DeepTagger node fails. The error is returned as JSON.
Use cases:
- Batch processing where some documents may fail
- Fault-tolerant workflows
- Logging errors without stopping the workflow
When enabled: The error is returned in the node's output JSON (see Error Output above) and the workflow continues with the next item.
When disabled: Workflow execution stops and shows the error message.
Usage Examples¶
Example 1: Extract Invoice Data from File Upload¶
Workflow: Webhook → DeepTagger → Google Sheets
DeepTagger Configuration:
- Operation: Extract Data
- Project ID: fo_1759714105892 (your invoice project)
- Input Type: File
- Binary Property: data
Webhook receives file upload via multipart/form-data.
DeepTagger extracts:
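Assuming an invoice project shaped like the Success Output shown above, the extracted item might look like:
```json
{
  "invoice_number": "INV-2025-001",
  "date": "2025-01-15",
  "total": "$1,234.56",
  "vendor": "Acme Corporation"
}
```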
Google Sheets appends row with extracted data.
Example 2: Extract Receipt Data from Email¶
Workflow: Email Trigger (IMAP) → Filter → DeepTagger → Airtable
DeepTagger Configuration:
- Operation: Extract Data
- Project ID: fo_1759722334567 (your receipt project)
- Input Type: File
- Binary Property: attachment0 (first attachment)
Filter ensures email has PDF attachment.
DeepTagger processes the attachment.
Airtable creates record with extracted data.
Example 3: Extract Data from Text (Form Submission)¶
Workflow: Webhook (form submission) → Set → DeepTagger → Database
Set Node formats the form data:
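A sketch of the Set node value, assuming the webhook delivers name, email, and message fields (all field names are illustrative):
```
Field: text
Value: {{$json["name"] + "\n" + $json["email"] + "\n" + $json["message"]}}
```
The DeepTagger Text parameter then reads the combined string via {{$json["text"]}}, as configured below.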
DeepTagger Configuration:
- Operation: Extract Data
- Project ID: fo_1759733445678 (your form project)
- Input Type: Text
- Text: {{$json["text"]}}
Database node inserts structured data.
Example 4: Batch Process Documents from Google Drive¶
Workflow: Google Drive List → Loop → Google Drive Download → DeepTagger → Spreadsheet → Move File
Google Drive List finds new PDFs in a folder.
Loop processes each file individually.
Google Drive Download gets the file binary data.
DeepTagger Configuration:
- Operation: Extract Data
- Project ID: fo_1759744556789 (your document project)
- Input Type: File
- Binary Property: data
Spreadsheet logs extracted data.
Move File archives processed documents.
Example 5: Extract Contract Terms with Error Handling¶
Workflow: Document source → DeepTagger → IF → success path / error path
DeepTagger Configuration:
- Operation: Extract Data
- Project ID: fo_1759755667890 (your contract project)
- Input Type: File
- Binary Property: data
- Settings: ✅ Continue on Fail
IF Node checks for errors:
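A sketch of the IF condition, assuming a failed extraction surfaces an error field as in the Error Output section (the field name is an assumption):
```
Value 1: {{$json["error"]}}
Operation: Is Empty
```
Items where error is empty take the true (success) branch; everything else takes the false (error) branch.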
Success Path: Send to Airtable, notify Slack.
Error Path: Log to error database, send alert email.
Expressions and Dynamic Values¶
Using Dynamic Project IDs¶
If you have multiple projects and want to select dynamically:
// Based on document type
{{$json["docType"] === "invoice" ? "fo_1759714105892" : "fo_1759722334567"}}
// From previous node
{{$json["projectId"]}}
// From workflow variables
{{$vars.INVOICE_PROJECT_ID}}
Using Dynamic Text¶
Extract from different sources:
// Email body
{{$json["body"]["text"]}}
// HTTP response
{{$json["content"]}}
// Database query result
{{$json["documentText"]}}
// Multiple fields concatenated
{{$json["title"] + "\n" + $json["description"]}}
Accessing Binary Data from Specific Nodes¶
Different nodes name their binary output differently:
// HTTP Request (binary)
Binary Property: data
// Email IMAP (first attachment)
Binary Property: attachment0
// Google Drive
Binary Property: data
// Read Binary File
Binary Property: data
Performance Considerations¶
Processing Time¶
- Text extraction: ~2-5 seconds
- PDF processing: ~5-15 seconds (depends on page count)
- Image processing: ~3-10 seconds (depends on resolution and complexity)
File Size Limits¶
- Maximum file size: Check your DeepTagger plan limits
- Recommended: Keep files under 10MB for best performance
- Large PDFs: Consider splitting into smaller chunks
Rate Limits¶
- Check your DeepTagger API plan for rate limits
- For high-volume workflows, implement:
    - Retry logic with exponential backoff (see the sketch after this list)
    - Queue management
    - Batch processing with delays
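A minimal sketch of the backoff idea in an n8n Code node (run once for all items), assuming you carry an attempt counter on each item and feed delaySeconds into a downstream Wait node; the field names are illustrative:
```javascript
// Compute an exponential backoff delay per item: 1s, 2s, 4s, ... capped at 60s.
return $input.all().map((item) => {
  const attempt = item.json.attempt ?? 0;
  return {
    json: {
      ...item.json,
      attempt: attempt + 1,
      delaySeconds: Math.min(2 ** attempt, 60),
    },
  };
});
```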
Best Practices¶
- Batch Processing: Add small delays between nodes when processing many documents
- Error Handling: Always enable "Continue on Fail" for batch workflows
- Caching: For identical documents, consider caching results
- Monitoring: Log all extractions for auditing and debugging
Next Steps¶
- Example Workflows - Detailed workflow examples
- Troubleshooting - Common issues and solutions
- API Reference - Direct API usage
Future Operations¶
Future versions may include:
- List Projects - Get all available projects
- Train Model - Add training examples via API
- Batch Extract - Process multiple documents in one call
- Get Extraction Status - Check async processing status
Stay tuned for updates!