Training Models
Best practices for training DeepTagger to extract data accurately.
How Many Examples?
- Minimum: 3 examples
- Recommended: 5-10 examples
- Optimal: 15-20 examples with variation
More examples generally improve accuracy, especially for complex documents.
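As a quick illustration, the guideline above can be written as a small helper. This is only a sketch: the function name `example_count_tier` is hypothetical and not part of DeepTagger, and it assumes that counts falling between the listed ranges (11-14) stay in the "recommended" tier and that anything above 20 still counts as "optimal".

```python
def example_count_tier(n: int) -> str:
    """Map a number of training examples to the guideline tier it reaches."""
    if n < 0:
        raise ValueError("example count cannot be negative")
    if n >= 15:
        return "optimal"        # 15-20 examples (assumed: more is also fine)
    if n >= 5:
        return "recommended"    # 5-10 examples (assumed: 11-14 land here too)
    if n >= 3:
        return "minimum"        # 3-4 examples
    return "below minimum"      # fewer than 3 examples


for count in (2, 3, 7, 15):
    print(count, example_count_tier(count))
```

Note that the "optimal" tier also calls for variation among the examples, which a simple count cannot verify.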
Choosing Training Examples
Include variety:
- Different layouts
- Different vendors/sources
- Edge cases (missing fields, unusual formats)
- Both simple and complex documents
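One way to check a training set for variety before training is to tally a few descriptive tags per example. The sketch below assumes you keep your own metadata for each example; the keys `layout` and `vendor` and both function names are hypothetical and do not come from DeepTagger.

```python
from collections import Counter
from typing import Dict, Iterable, List


def variety_report(examples: Iterable[Dict[str, str]], keys: List[str]) -> Dict[str, Counter]:
    """Count how often each value appears for every metadata key."""
    examples = list(examples)
    return {key: Counter(ex.get(key, "unknown") for ex in examples) for key in keys}


def lacking_variety(report: Dict[str, Counter], min_distinct: int = 2) -> List[str]:
    """Return the keys that have fewer than min_distinct different values."""
    return [key for key, counts in report.items() if len(counts) < min_distinct]


# Hypothetical metadata for two training examples.
examples = [
    {"layout": "single-column", "vendor": "Acme"},
    {"layout": "single-column", "vendor": "Globex"},
]
report = variety_report(examples, ["layout", "vendor"])
print(lacking_variety(report))  # keys where every example looks the same
```

A key reported here suggests adding examples that differ along that dimension, such as a document with another layout.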
Annotation Best Practices
- Be precise: select exactly the text you want extracted
- Be consistent: select a given field the same way in every document
- Include the right amount of context: don't select too little or too much
- Label correctly: use the correct field name for each selection
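Some of these checks can be automated on your side before annotations are submitted. The following is a minimal sketch, assuming annotations are available as a plain mapping from field name to selected text. That data shape and the function `check_annotation` are assumptions for illustration, not DeepTagger's format or API.

```python
from typing import Dict, List, Set


def check_annotation(document: str, annotations: Dict[str, str], allowed_fields: Set[str]) -> List[str]:
    """Return a list of human-readable problems found in one document's annotations."""
    problems = []
    for field, selected in annotations.items():
        # Label correctly: the field name must be one of the expected names.
        if field not in allowed_fields:
            problems.append(f"{field}: unknown field name")
        # Be precise: stray whitespace usually means the selection is too wide.
        if selected != selected.strip():
            problems.append(f"{field}: selection has leading or trailing whitespace")
        # The selected text should appear verbatim in the document.
        if selected not in document:
            problems.append(f"{field}: selection does not appear in the document")
    return problems


doc = "Invoice total: 42.00 USD"
print(check_annotation(doc, {"total": "42.00"}, {"total"}))
print(check_annotation(doc, {"totl": " 42.00"}, {"total"}))
```

Consistency across documents (for example, always including or always excluding the currency) still needs a manual review.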
Iterative Improvement
- Train with initial examples
- Test on new documents
- Review extraction results
- Correct errors and add the corrected documents as training examples
- Repeat
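The cycle above can be sketched as a loop. This is an illustration under stated assumptions, not DeepTagger's API: `train` is a placeholder callable you would supply, assumed to take a list of (document, expected fields) pairs and return an extraction function, and `improve_iteratively` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

Document = str
Fields = Dict[str, str]
Example = Tuple[Document, Fields]
Extractor = Callable[[Document], Fields]


def improve_iteratively(
    train: Callable[[List[Example]], Extractor],
    initial_examples: List[Example],
    test_set: List[Example],
    max_rounds: int = 5,
) -> Tuple[Extractor, List[Example]]:
    """Train, test, review, fold corrected errors back in, and repeat."""
    examples = list(initial_examples)
    extract = train(examples)  # train with initial examples
    for _ in range(max_rounds):
        # Test on new documents and review the extraction results.
        errors = [(doc, expected) for doc, expected in test_set if extract(doc) != expected]
        # Only corrected documents not already in the training set are added.
        new_examples = [e for e in errors if e not in examples]
        if not new_examples:
            break
        examples.extend(new_examples)
        extract = train(examples)  # retrain and repeat
    return extract, examples
```

Once a test document has been added to the training set it no longer measures accuracy on unseen input, so each round is best evaluated on fresh documents.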
Common Issues
See Best Practices for troubleshooting training issues.