Training Models

Best practices for training DeepTagger to extract data accurately.

How Many Examples?

  • Minimum: 3 examples
  • Recommended: 5-10 examples
  • Optimal: 15-20 examples with variation

More examples generally improve accuracy, especially for complex documents.
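As a rough illustration, the thresholds above can be turned into a small check that runs before training. This is only a sketch: `training_set_tier` is a hypothetical helper, not part of DeepTagger, and the handling of counts the list does not mention (4, and 11 to 14) is an assumption.

```python
def training_set_tier(num_examples: int) -> str:
    """Classify a training set size using the thresholds listed above.

    Hypothetical helper for illustration; not a DeepTagger function.
    """
    if num_examples < 3:
        return "insufficient"  # below the stated minimum of 3
    if num_examples < 5:
        return "minimum"       # 3 is the stated minimum; 4 grouped here by assumption
    if num_examples < 15:
        return "recommended"   # 5-10 per the list; 11-14 grouped here by assumption
    return "optimal"           # 15 or more; the list also asks for variation


# Example: decide whether to collect more documents before training.
tier = training_set_tier(4)
needs_more = tier in ("insufficient", "minimum")
```

A count alone says nothing about variation, so a set that reaches the "optimal" tier can still be too uniform to train well.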

Choosing Training Examples

Include variety:

  • Different layouts
  • Different vendors/sources
  • Edge cases (missing fields, unusual formats)
  • Both simple and complex documents
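One way to check variety is to tally how many distinct layouts and vendors the training set covers. The sketch below assumes you keep your own metadata for each example as a plain dictionary; the `layout` and `vendor` keys and both function names are hypothetical and not part of DeepTagger.

```python
from collections import Counter


def coverage_report(examples, keys=("layout", "vendor")):
    """Count how often each value of each metadata key appears."""
    report = {}
    for key in keys:
        counts = Counter(ex.get(key, "unknown") for ex in examples)
        report[key] = dict(counts)
    return report


def underrepresented(report, min_distinct=2):
    """Return the keys that have fewer than min_distinct distinct values."""
    return [key for key, counts in report.items() if len(counts) < min_distinct]


# Hypothetical metadata kept alongside three training documents.
examples = [
    {"layout": "single-page", "vendor": "Acme"},
    {"layout": "single-page", "vendor": "Globex"},
    {"layout": "single-page", "vendor": "Acme"},
]
report = coverage_report(examples)
gaps = underrepresented(report)
```

Here every example shares one layout, so `gaps` flags `layout` as the dimension that needs more varied documents.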

Annotation Best Practices

  1. Be precise - Select exactly the text you want
  2. Be consistent - Select each field the same way in every document
  3. Include context - Don't select too little or too much
  4. Label correctly - Use correct field names
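Some of these practices can be checked mechanically before training. The sketch below assumes annotations are available as dictionaries with `field` and `text` keys. That shape is an assumption for illustration, not DeepTagger's export format, and `lint_annotations` is a hypothetical helper.

```python
def lint_annotations(annotations, allowed_fields):
    """Return (index, problem) pairs for annotations that look suspicious."""
    problems = []
    for i, ann in enumerate(annotations):
        field = ann["field"]
        text = ann["text"]
        if field not in allowed_fields:
            # "Label correctly": field names must match exactly.
            problems.append((i, f"unknown field name: {field!r}"))
        if text != text.strip():
            # "Be precise": stray whitespace suggests a sloppy selection.
            problems.append((i, "selection has leading or trailing whitespace"))
        if not text.strip():
            problems.append((i, "selection is empty"))
    return problems


# Hypothetical annotations; the second and third contain deliberate mistakes.
annotations = [
    {"field": "invoice_number", "text": "INV-001"},
    {"field": "Invoice_Number", "text": " INV-002 "},
    {"field": "total", "text": ""},
]
problems = lint_annotations(annotations, allowed_fields={"invoice_number", "total"})
```

A check like this catches only surface mistakes. Whether a selection includes the right amount of context still needs a human review.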

Iterative Improvement

  1. Train with initial examples
  2. Test on new documents
  3. Review extraction results
  4. Correct errors and add as training examples
  5. Repeat
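The loop above can be sketched in code. Everything here is a stand-in: `train` is any callable that takes examples and returns an extractor, and the "expected" fields represent a human's corrected result from the review step. None of these names come from DeepTagger, where these steps may be done entirely in the interface.

```python
from typing import Callable, Dict, List, Tuple

Document = str
Fields = Dict[str, str]
Example = Tuple[Document, Fields]
Extractor = Callable[[Document], Fields]


def improve_iteratively(
    train: Callable[[List[Example]], Extractor],
    initial_examples: List[Example],
    review_batches: List[List[Example]],
) -> List[Example]:
    """Retrain once per batch, adding each corrected document as an example."""
    examples = list(initial_examples)
    for batch in review_batches:
        extract = train(examples)              # 1. train with current examples
        for doc, corrected in batch:           # 2. test on new documents
            predicted = extract(doc)           # 3. review extraction results
            if predicted != corrected:         # 4. add corrected errors as examples
                examples.append((doc, corrected))
    return examples                            # 5. repeat with the next batch


def toy_train(examples: List[Example]) -> Extractor:
    """Toy stand-in for training: memorizes documents it has already seen."""
    memory = dict(examples)
    return lambda doc: memory.get(doc, {})


initial = [("doc-a", {"total": "10"})]
batches = [
    [("doc-a", {"total": "10"}), ("doc-b", {"total": "20"})],
    [("doc-b", {"total": "20"})],
]
final_examples = improve_iteratively(toy_train, initial, batches)
```

With the toy extractor, `doc-b` fails in the first batch and is added as an example. The second batch then passes without adding anything, which is the stopping signal to look for in practice.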

Common Issues

See Best Practices for troubleshooting training issues.