When to Use
- Reading documents - Get the text from PDFs, Word docs, PowerPoints
- Processing attachments - Read files that came with emails
- Reading text from images - Extract words from photos, scans, screenshots
- Importing data - Read CSV or Excel files
- Processing forms - Get the filled-in data from PDF forms
Supported File Types
Documents
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text and image-based (via OCR) |
| Word | .docx, .doc | Modern and legacy formats |
| PowerPoint | .pptx, .ppt | Extracts slide text |
| Text | .txt, .md, .rtf | Plain text files |
Spreadsheets
| Format | Extension | Notes |
|---|---|---|
| Excel | .xlsx, .xls | Returns structured data |
| CSV | .csv | Parsed into rows/columns |
Images
| Format | Extension | Notes |
|---|---|---|
| Images | .png, .jpg, .jpeg, .gif, .webp | OCR extraction |
| TIFF | .tiff | Often used for scans |
Example: Email Attachment Processor
Extract and analyze content from email attachments.
Example: Invoice Data Extraction
Extract structured data from PDF invoices.
Example: Resume Scanner
Process job applications.
Working with Multi-Page Documents
Process All Pages Together
Process Pages Individually
Extract Specific Pages
Configuration:
pages: "1,5-10"
Only extracts pages 1 and 5 through 10.
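To make the page syntax concrete, here is a minimal Python sketch of how a value such as "1,5-10" could be expanded into page numbers. It is an illustration only, not the node's actual parser. The function name parse_page_spec and the choice to skip pages past the end of the document are assumptions.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a page spec such as "1,5-10" or "all" into 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,3"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Assumption: pages beyond the end of the document are skipped, not an error.
        pages.update(range(start, min(end, page_count) + 1))
    return sorted(pages)


print(parse_page_spec("1,5-10", page_count=12))
```

For a 12-page document, "1,5-10" expands to pages 1, 5, 6, 7, 8, 9, and 10.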
Handling Tables
For documents with tables, use structured extraction mode.
Reading Text from Images and Scans
If your file is an image or a scanned document, use OCR mode to read the text:
Set extraction mode to: ocr
OCR takes longer (usually 2-5 seconds per page) and isn’t always perfect. Double-check important extractions.
- Use higher quality images
- Make sure the text is clear and readable
- Note that handwriting usually doesn’t work well
When Things Go Wrong
File reading can fail because:
- The file is damaged
- The file type isn’t supported
- The file is password-protected
- The file is too big
File Size Limits
| File Type | Max Size |
|---|---|
| PDF | 50 MB |
| Word/PowerPoint | 25 MB |
| Images | 10 MB |
| CSV/Excel | 25 MB |
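If files reach this node from your own code, checking them first can catch two of the failure causes above (unsupported type, file too big). The Python sketch below is one possible pre-check, not part of the product. It assumes the 50 MB limit applies to PDFs, that 1 MB means 1,048,576 bytes, and that plain text files have no documented limit. The helper name preflight is hypothetical.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes

# Size limits per group, as listed in the table above (PDF row is an assumption).
MAX_BYTES = {
    "PDF": 50 * MB,
    "Word/PowerPoint": 25 * MB,
    "Images": 10 * MB,
    "CSV/Excel": 25 * MB,
}

# Extension -> size-limit group; None means no documented limit.
LIMIT_GROUP = {
    ".pdf": "PDF",
    ".docx": "Word/PowerPoint", ".doc": "Word/PowerPoint",
    ".pptx": "Word/PowerPoint", ".ppt": "Word/PowerPoint",
    ".png": "Images", ".jpg": "Images", ".jpeg": "Images",
    ".gif": "Images", ".webp": "Images", ".tiff": "Images",
    ".xlsx": "CSV/Excel", ".xls": "CSV/Excel", ".csv": "CSV/Excel",
    ".txt": None, ".md": None, ".rtf": None,
}


def preflight(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, reason) for a file before sending it to the node."""
    ext = PurePath(filename).suffix.lower()
    if ext not in LIMIT_GROUP:
        return False, f"unsupported file type: {ext or '(none)'}"
    group = LIMIT_GROUP[ext]
    if group is None:
        return True, "ok (no documented size limit)"
    limit = MAX_BYTES[group]
    if size_bytes > limit:
        return False, f"{group} files are limited to {limit // MB} MB"
    return True, "ok"


print(preflight("scan.PNG", 11 * MB))
```

A check like this cannot detect damaged or password-protected files. Those still have to be handled when the node reports a failure.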
Tips
Settings
What to call this node (shown on the canvas).
A short code to reference this node’s content.
Where the file is coming from:
- url - Download it from a web address
- base64 - The file content encoded as text
- previous_node - A file from an earlier step (like an email attachment)
The web address to download from (if using URL source).
The encoded file content (if using base64 source).
Reference to a file from an earlier node (if using previous_node source).
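For the base64 source, the file's bytes have to be encoded as text before they are handed to the node. The Python sketch below shows that step using the standard base64 module. The keys "source" and "content" in the returned dictionary are placeholders for illustration, not confirmed setting names.

```python
import base64


def build_base64_input(raw: bytes) -> dict:
    """Encode raw file bytes as base64 text and wrap them in a sample config."""
    encoded = base64.b64encode(raw).decode("ascii")
    # Hypothetical field names; check the node's settings for the real ones.
    return {"source": "base64", "content": encoded}


config = build_base64_input(b"hello")
print(config["content"])
```

In practice the bytes would come from reading a file in binary mode, for example with open(path, "rb").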
How to read the file:
- text - Just get the words
- structured - Try to keep tables and lists intact
- ocr - Read text from images or scanned documents
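As a rough guide to picking among the three modes, the Python sketch below follows the guidance on this page: images and scans use ocr, documents with tables use structured, and everything else uses text. The function name suggest_mode and its two flags are hypothetical. Whether a PDF is scanned or contains tables has to be supplied by the caller.

```python
from pathlib import PurePath

# Image extensions listed under Supported File Types.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".tiff"}


def suggest_mode(filename: str, is_scanned: bool = False, has_tables: bool = False) -> str:
    """Suggest an extraction mode: "ocr", "structured", or "text"."""
    ext = PurePath(filename).suffix.lower()
    if ext in IMAGE_EXTS or is_scanned:
        return "ocr"  # images and scanned documents need OCR
    if has_tables:
        return "structured"  # try to keep tables and lists intact
    return "text"  # just get the words


print(suggest_mode("invoice.pdf", has_tables=True))
```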
Which pages to read (for documents with multiple pages). Examples:
"1-5", "1,3,5", or "all".
Outputs
The extracted text content.
Array of page contents (for multi-page documents).
File metadata:
- pageCount - Number of pages
- fileType - Detected file type
- fileSize - Size in bytes
- title - Document title (if available)
- author - Document author (if available)
- createdAt - Creation date (if available)
Extracted tables (when using structured mode).
Whether extraction completed successfully.
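A later step will usually check the success flag before using anything else. The Python sketch below shows one way to do that. The top-level keys "success", "text", "pages", "metadata", and "tables" are assumed names for the outputs described above. Only the metadata field names (pageCount, title, and so on) come from this page.

```python
def summarize_result(result: dict) -> str:
    """Build a one-line summary of an extraction result, or report failure."""
    if not result.get("success"):
        return "extraction failed"

    meta = result.get("metadata") or {}
    pages = result.get("pages") or []
    tables = result.get("tables") or []
    text = result.get("text") or ""

    title = meta.get("title") or "(untitled)"  # title is only present if available
    page_count = meta.get("pageCount", len(pages))
    return f"{title}: {page_count} page(s), {len(text)} characters, {len(tables)} table(s)"


# Hypothetical sample shaped like the outputs described above.
sample = {
    "success": True,
    "text": "Hello world",
    "pages": ["Hello", "world"],
    "metadata": {"pageCount": 2, "fileType": "pdf"},
    "tables": [],
}
print(summarize_result(sample))
```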
Related Nodes
Ask AI
Analyze and extract insights from parsed content.
Loop
Process multiple files or pages.
