Document Text Extraction
tagd-ai automatically extracts text from images, PDFs, documents, and spreadsheets, making their content searchable.
Automatic Extraction
When you upload files to a tag, text extraction happens automatically.
Supported File Types
| Type | Formats | Extraction Method |
|---|---|---|
| Images | PNG, JPEG, WebP, GIF, BMP, TIFF | OCR (Optical Character Recognition) |
| PDFs | PDF | Text extraction + OCR for scanned pages |
| Documents | DOCX, DOC | Document parsing |
| Spreadsheets | XLSX, XLS | Cell content extraction |
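The table above can be read as a simple lookup, which can be handy for checking a file before uploading it. The following is a minimal sketch, not part of tagd-ai: the function name `extraction_method` and the file-extension spellings (for example `.jpg` for JPEG and `.tif` for TIFF) are assumptions made for illustration, and tagd-ai may detect file types differently.

```python
from pathlib import Path
from typing import Optional

# Extension -> extraction method, following the table above.
# The extension spellings are assumptions for illustration only.
EXTRACTION_METHODS = {
    **dict.fromkeys(
        [".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp", ".tif", ".tiff"],
        "OCR",
    ),
    ".pdf": "Text extraction + OCR for scanned pages",
    **dict.fromkeys([".docx", ".doc"], "Document parsing"),
    **dict.fromkeys([".xlsx", ".xls"], "Cell content extraction"),
}


def extraction_method(filename: str) -> Optional[str]:
    """Return the extraction method for a file, or None if unsupported."""
    return EXTRACTION_METHODS.get(Path(filename).suffix.lower())


print(extraction_method("receipt.JPG"))  # OCR
print(extraction_method("notes.txt"))    # None
```

The lookup is case-insensitive on the extension, so `receipt.JPG` and `receipt.jpg` are treated the same way.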
What Gets Extracted
- Images: Any visible text in photos, screenshots, or graphics
- PDFs: All text content, including scanned documents
- Documents: Full document text and formatting
- Spreadsheets: Cell values and structure
How OCR Works
For images and scanned PDFs:
- Upload - Add an image or scanned PDF to a tag
- Analysis - AI identifies text regions
- Recognition - Characters are converted to text
- Indexing - Text becomes searchable
OCR Accuracy
Best results with:
- Clear, legible text
- Good image quality
- Standard fonts
- Adequate contrast
Works well for:
- Printed documents
- Screenshots
- Photos of text
- Handwritten text (limited; depends on legibility)
Using Extracted Text
AI Search
Search across document content:
"Find invoices with amount over $500"
"Documents mentioning product warranty"
"PDFs from 2024"
AI Chat
Ask questions about document content:
"What are the key terms in this contract?"
"Summarize this PDF"
"What dates are mentioned in this document?"
View Extracted Text
- Open the tag that contains the document
- Click the file block
- Click View Extracted Text
- Review the full extracted content
Use Cases
Receipts and Invoices
Upload photos of receipts:
- Amount is extracted
- Vendor name captured
- Date recognized
- Searchable records
Business Cards
Photograph business cards:
- Contact name extracted
- Phone/email captured
- Company identified
- Easy to find later
Whiteboard Photos
Capture meeting whiteboards:
- Text becomes searchable
- Ideas preserved
- Notes accessible
Legal Documents
Upload contracts and agreements:
- Full text searchable
- Find specific clauses
- AI answers questions
Product Labels
Photograph product information:
- Specifications extracted
- Model numbers captured
- Ingredients readable
Handwritten Notes
Photograph handwritten pages:
- Text recognized (when legible)
- Notes become searchable
- Works best with clear writing
Best Practices
For Better Extraction
Image quality:
- Good lighting
- Clear focus
- Minimal glare
- Adequate resolution (300+ DPI for print)
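As a rough guide to what 300 DPI means in practice, the pixel dimensions a scan or photo needs follow directly from the physical page size: pixels = inches × DPI. A minimal sketch of that arithmetic (the helper name `min_pixels` is hypothetical):

```python
from typing import Tuple


def min_pixels(width_in: float, height_in: float, dpi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions needed to capture a page of the given size (in inches) at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# A US Letter page (8.5 x 11 in) at 300 DPI
print(min_pixels(8.5, 11))  # (2550, 3300)
```

A photo of a full page that is much smaller than this is likely below the recommended resolution.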
Document tips:
- Standard fonts work best
- High contrast (black on white)
- Avoid decorative fonts
- Clean, undamaged pages
Organizing Documents
- Name files descriptively
- Group related documents
- Use folders for categories
- Add context in tag title
Multi-Language Support
OCR supports text in:
- English
- Spanish
- French
- German
- Italian
- Portuguese
- Chinese (Simplified & Traditional)
- Japanese
- Korean
- Arabic
- Hebrew
- Russian
- And 50+ more languages
Mixed-language documents are handled automatically.
Privacy & Security
Data Processing
- Documents processed securely
- Extracted text stored encrypted
- Only accessible to you
- Deleted with the file
No Data Training
- Your documents are never used for AI training
- Content remains private
- Enterprise-grade security
Troubleshooting
Text Not Extracted
- Check file format is supported
- Verify image quality
- Ensure text is visible/legible
- Try higher resolution
Poor Accuracy
- Improve image lighting
- Increase resolution
- Crop to relevant area
- Avoid blurry images
Processing Failed
- Check file isn't corrupted
- Verify file size is reasonable
- Try re-uploading
- Contact support if persistent
Slow Processing
Large documents may take longer. Typical processing times:
- Multi-page PDFs: 1-2 minutes
- High-resolution images: 30 seconds
- Standard documents: Under 10 seconds
Plan Limits
| Plan | Pages/Month |
|---|---|
| Free | 50 pages |
| Pro | 500 pages |
| Enterprise | Unlimited |
Each image counts as 1 page. PDFs count by actual page count.
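The counting rule can be illustrated with a short calculation. This is a sketch only: the names `pages_used`, `remaining`, and `PLAN_LIMITS`, and the input shape (a list of `(kind, page_count)` pairs), are hypothetical and not a tagd-ai API. The limits are copied from the table above. How documents and spreadsheets are counted is not stated here, so the sketch covers images and PDFs only.

```python
from typing import Iterable, Optional, Tuple

# Monthly page limits from the table above; None means unlimited.
PLAN_LIMITS = {"free": 50, "pro": 500, "enterprise": None}


def pages_used(uploads: Iterable[Tuple[str, int]]) -> int:
    """Total pages consumed: 1 per image, actual page count per PDF."""
    total = 0
    for kind, page_count in uploads:
        if kind == "image":
            total += 1
        elif kind == "pdf":
            total += page_count
        else:
            raise ValueError(f"counting rule not documented for {kind!r}")
    return total


def remaining(plan: str, used: int) -> Optional[int]:
    """Pages left this month, or None for an unlimited plan."""
    limit = PLAN_LIMITS[plan]
    return None if limit is None else max(limit - used, 0)


# Two photos plus a 12-page and a 30-page PDF: 2 + 12 + 30 = 44 pages
uploads = [("image", 1), ("image", 1), ("pdf", 12), ("pdf", 30)]
used = pages_used(uploads)
print(used, remaining("free", used))  # 44 6
```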