
Operations Reference

The DeepTagger node currently supports one primary operation: Extract Data. This page provides detailed documentation on how to use it.

Extract Data Operation

Extracts structured data from a document or text using a trained DeepTagger project.

Parameters

Operation

Type: Dropdown (fixed)

Value: Extract Data

Description: The type of operation to perform. Currently only data extraction is supported.

Project ID

Type: String

Required: Yes

Description: The ID of your trained DeepTagger project.

Format: fo_ followed by a timestamp (e.g., fo_1759714105892)

How to find:

  1. Go to https://deeptagger.com/das/fos
  2. Click on your project
  3. Copy the ID from the URL:

     https://deeptagger.com/das/fos/fo_1759714105892
                                    ^^^^^^^^^^^^^^^^
                                    This is your Project ID

Example: fo_1759714105892
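If you need to pull the ID out of a project URL programmatically (for example in an n8n Code node), the format above makes this a one-line pattern match. A minimal sketch:

```javascript
// Extract a DeepTagger project ID (fo_ followed by digits) from a project URL.
// Returns null if the URL does not contain an ID in that format.
function extractProjectId(url) {
  const match = url.match(/fo_\d+/);
  return match ? match[0] : null;
}

console.log(extractProjectId("https://deeptagger.com/das/fos/fo_1759714105892"));
// → "fo_1759714105892"
```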

Hint: The node displays hint text with this information beneath the Project ID field.

Input Type

Type: Dropdown

Required: Yes

Options:

  • File (default) - Upload a document file
  • Text - Send raw text for extraction

Description: Determines whether you're sending a file (PDF, image) or raw text content.

When to use File:

  • Processing PDFs, images, scanned documents
  • Working with binary data from previous nodes
  • Documents uploaded via webhooks

When to use Text:

  • Extracting from plain text, emails, form submissions
  • Text data from APIs or databases
  • Markdown, HTML, or other text formats

Binary Property (for File input)

Type: String

Default: data

Required: Yes (when Input Type = File)

Description: Name of the binary property containing the file data.

Common values:

  • data (default for most nodes)
  • file (some HTTP nodes)
  • attachment (email nodes)

How it works: n8n passes binary data between nodes using named properties. This parameter tells the DeepTagger node where to find the file data from the previous node.

Example: If a previous node outputs binary data in a property called invoice, set this to invoice.
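Conceptually, the lookup works like the sketch below. This is a simplified illustration of how a binary property name resolves against an n8n item, not the node's actual source:

```javascript
// Simplified illustration: `item.binary` holds named binary payloads,
// and the Binary Property parameter selects one by name.
function getBinaryData(item, propertyName) {
  const binary = item.binary && item.binary[propertyName];
  if (!binary) {
    throw new Error(`No binary data found in property "${propertyName}"`);
  }
  return binary; // { data, mimeType, fileName }
}

const item = {
  binary: {
    invoice: { data: "base64...", mimeType: "application/pdf", fileName: "invoice.pdf" },
  },
};
console.log(getBinaryData(item, "invoice").fileName); // → "invoice.pdf"
```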

Text (for Text input)

Type: Multi-line String

Required: Yes (when Input Type = Text)

Description: The raw text content to extract data from.

Usage:

  • Can be hardcoded text (for testing)
  • Or a dynamic expression: {{$json["body"]}} (from the previous node)

Example:

Invoice #12345
Date: 2025-01-15
Total: $1,234.56

Input Requirements

For File Input

The previous node must output binary data. Compatible source nodes include:

  • HTTP Request - Download files from URLs
  • Webhook - Receive uploaded files
  • Google Drive - Read files from Drive
  • Dropbox - Read files from Dropbox
  • Email (IMAP) - Extract attachments
  • Read Binary File - Load files from disk
  • FTP - Download files via FTP

Binary data structure:

{
  "data": {
    "data": "base64encodeddata...",
    "mimeType": "application/pdf",
    "fileName": "invoice.pdf"
  }
}

For Text Input

The previous node must output JSON data containing text. Compatible source nodes include:

  • HTTP Request - API responses with text
  • Webhook - Form submissions
  • Email - Email body text
  • Google Sheets - Cell content
  • Database - Query results
  • Set - Manually set text value

JSON data structure:

{
  "text": "Invoice content here..."
}

Output

The node returns structured JSON data extracted from the document.

Success Output

{
  "invoice_number": "INV-2025-001",
  "date": "2025-01-15",
  "total": "$1,234.56",
  "vendor": "Acme Corporation",
  "line_items": [
    {
      "description": "Widget A",
      "quantity": 10,
      "price": "$10.00"
    },
    {
      "description": "Widget B",
      "quantity": 5,
      "price": "$20.00"
    }
  ]
}

The exact structure depends on your DeepTagger project configuration.

Error Output

If an error occurs (and "Continue on Fail" is enabled):

{
  "error": "Project not found"
}

Configuration Options

Continue on Fail

Location: Node settings (click the three dots menu)

Description: If enabled, the workflow continues even if the DeepTagger node fails. The error is returned as JSON.

Use cases:

  • Batch processing where some documents may fail
  • Fault-tolerant workflows
  • Logging errors without stopping the workflow

When enabled:

{
  "error": "Failed to extract data: Invalid project ID"
}

When disabled: Workflow execution stops and shows an error message.

Usage Examples

Example 1: Extract Invoice Data from File Upload

Workflow:

Webhook → DeepTagger → Google Sheets

DeepTagger Configuration:

  • Operation: Extract Data
  • Project ID: fo_1759714105892 (your invoice project)
  • Input Type: File
  • Binary Property: data

Webhook receives file upload via multipart/form-data.

DeepTagger extracts:

{
  "invoice_number": "INV-123",
  "total": "$500.00",
  "date": "2025-01-15"
}

Google Sheets appends row with extracted data.

Example 2: Extract Receipt Data from Email

Workflow:

Email Trigger → Filter → DeepTagger → Airtable

DeepTagger Configuration:

  • Operation: Extract Data
  • Project ID: fo_1759722334567 (your receipt project)
  • Input Type: File
  • Binary Property: attachment0 (first attachment)

Filter ensures email has PDF attachment.

DeepTagger processes the attachment.

Airtable creates record with extracted data.

Example 3: Extract Data from Text (Form Submission)

Workflow:

Webhook → Set → DeepTagger → Database

Set Node formats the form data:

{
  "text": "{{$json.body.formContent}}"
}

DeepTagger Configuration:

  • Operation: Extract Data
  • Project ID: fo_1759733445678 (your form project)
  • Input Type: Text
  • Text: {{$json["text"]}}

Database node inserts structured data.

Example 4: Batch Process Documents from Google Drive

Workflow:

Schedule → Google Drive List → Loop → Google Drive Download → DeepTagger → Spreadsheet → Move File

Google Drive List finds new PDFs in a folder.

Loop processes each file individually.

Google Drive Download gets the file binary data.

DeepTagger Configuration:

  • Operation: Extract Data
  • Project ID: fo_1759744556789 (your document project)
  • Input Type: File
  • Binary Property: data

Spreadsheet logs extracted data.

Move File archives processed documents.

Example 5: Extract Contract Terms with Error Handling

Workflow:

Dropbox Trigger → DeepTagger → IF → [Success Path] → [Error Path]

DeepTagger Configuration:

  • Operation: Extract Data
  • Project ID: fo_1759755667890 (your contract project)
  • Input Type: File
  • Binary Property: data
  • Settings: ✅ Continue on Fail

IF Node checks for errors:

{{$json["error"] === undefined}}

Success Path: Send to Airtable, notify Slack.

Error Path: Log to error database, send alert email.
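With Continue on Fail enabled, the routing the IF node performs is just a check for the error key. The same test, sketched in plain JavaScript for clarity:

```javascript
// Route DeepTagger output items by presence of an "error" key —
// the same test as the IF expression {{$json["error"] === undefined}}.
function routeResults(items) {
  const success = items.filter((item) => item.error === undefined);
  const failed = items.filter((item) => item.error !== undefined);
  return { success, failed };
}

const { success, failed } = routeResults([
  { invoice_number: "INV-123", total: "$500.00" },
  { error: "Project not found" },
]);
console.log(success.length, failed.length); // → 1 1
```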

Expressions and Dynamic Values

Using Dynamic Project IDs

If you have multiple projects and want to select dynamically:

// Based on document type
{{$json["docType"] === "invoice" ? "fo_1759714105892" : "fo_1759722334567"}}

// From previous node
{{$json["projectId"]}}

// From workflow variables
{{$vars.INVOICE_PROJECT_ID}}
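When more than two projects are involved, the ternary above gets unwieldy; a lookup table scales better. A sketch, reusing the placeholder project IDs from the examples above:

```javascript
// Map document types to project IDs (IDs here are illustrative,
// matching the placeholder values used elsewhere on this page).
const PROJECT_IDS = {
  invoice: "fo_1759714105892",
  receipt: "fo_1759722334567",
};

function projectIdFor(docType) {
  const id = PROJECT_IDS[docType];
  if (!id) throw new Error(`No DeepTagger project configured for "${docType}"`);
  return id;
}

console.log(projectIdFor("invoice")); // → "fo_1759714105892"
```

Failing fast on an unknown type is deliberate: a silently wrong project ID would extract with the wrong schema.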

Using Dynamic Text

Extract from different sources:

// Email body
{{$json["body"]["text"]}}

// HTTP response
{{$json["content"]}}

// Database query result
{{$json["documentText"]}}

// Multiple fields concatenated
{{$json["title"] + "\n" + $json["description"]}}

Accessing Binary Data from Specific Nodes

Different nodes name their binary output differently:

// HTTP Request (binary)
Binary Property: data

// Email IMAP (first attachment)
Binary Property: attachment0

// Google Drive
Binary Property: data

// Read Binary File
Binary Property: data

Performance Considerations

Processing Time

  • Text extraction: ~2-5 seconds
  • PDF processing: ~5-15 seconds (depends on page count)
  • Image processing: ~3-10 seconds (depends on resolution and complexity)

File Size Limits

  • Maximum file size: Check your DeepTagger plan limits
  • Recommended: Keep files under 10MB for best performance
  • Large PDFs: Consider splitting into smaller chunks

Rate Limits

  • Check your DeepTagger API plan for rate limits
  • For high-volume workflows, implement:
      • Retry logic with exponential backoff
      • Queue management
      • Batch processing with delays
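Exponential backoff can be sketched as a generic async wrapper. The `callDeepTagger` function in the usage note is hypothetical, standing in for whatever call you are rate-limited on:

```javascript
// Retry a function with exponential backoff (delays in milliseconds).
// Generic sketch for wrapping rate-limited API calls.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Usage would look like `await withRetry(() => callDeepTagger(item))`, where `callDeepTagger` is your own wrapper around the extraction request.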

Best Practices

  1. Batch Processing: Add small delays between nodes when processing many documents
  2. Error Handling: Always enable "Continue on Fail" for batch workflows
  3. Caching: For identical documents, consider caching results
  4. Monitoring: Log all extractions for auditing and debugging

Next Steps

Future Operations

Future versions may include:

  • List Projects - Get all available projects
  • Train Model - Add training examples via API
  • Batch Extract - Process multiple documents in one call
  • Get Extraction Status - Check async processing status

Stay tuned for updates!