Read File Node
The Parse File node reads the text content from files. Got a PDF you need to analyze? An email attachment to process? An image with text in it? This node extracts the text so you can work with it.
When to Use
- Reading documents - Get the text from PDFs, Word docs, PowerPoints
- Processing attachments - Read files that came with emails
- Reading text from images - Extract words from photos, scans, screenshots
- Importing data - Read CSV or Excel files
- Processing forms - Get the filled-in data from PDF forms
Supported File Types
Documents
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Text and image-based (via OCR) |
| Word | .docx, .doc | Modern and legacy formats |
| PowerPoint | .pptx, .ppt | Extracts slide text |
| Text | .txt, .md, .rtf | Plain text files |
Spreadsheets
| Format | Extension | Notes |
|---|---|---|
| Excel | .xlsx, .xls | Returns structured data |
| CSV | .csv | Parsed into rows/columns |
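"Parsed into rows/columns" usually means that CSV text becomes rows keyed by the header line. The sketch below shows that mapping using Python's standard csv module. It illustrates the idea only. This page does not say whether the node treats the first row as headers or converts numeric strings.

```python
import csv
import io


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


sample = "name,amount\nAcme,120.50\nGlobex,99.00\n"
rows = parse_csv(sample)
print(rows[0]["name"], rows[1]["amount"])  # Acme 99.00
```

Note that every value comes back as a string. Converting "120.50" to a number is a separate step.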
Images
| Format | Extension | Notes |
|---|---|---|
| Images | .png, .jpg, .jpeg, .gif, .webp | OCR extraction |
| TIFF | .tiff | Often used for scans |
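You may want to route files before they reach this node. Here is a minimal sketch of an extension check built from the tables above; the category names are illustrative. The node reports a detected fileType in its metadata, so it may inspect file contents rather than trust the extension. This sketch only looks at the extension.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the Supported File Types section (PDF grouped with documents).
SUPPORTED = {
    "document": {".pdf", ".docx", ".doc", ".pptx", ".ppt", ".txt", ".md", ".rtf"},
    "spreadsheet": {".xlsx", ".xls", ".csv"},
    "image": {".png", ".jpg", ".jpeg", ".gif", ".webp", ".tiff"},
}


def file_category(filename: str) -> Optional[str]:
    """Return "document", "spreadsheet", or "image", or None if unsupported."""
    ext = Path(filename).suffix.lower()  # case-insensitive: ".PDF" -> ".pdf"
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return None


for name in ["Q3-report.PDF", "leads.csv", "receipt.jpeg", "backup.zip"]:
    print(name, "->", file_category(name))
```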
Example: Email Attachment Processor
Extract and analyze content from email attachments:
Example: Invoice Data Extraction
Extract structured data from PDF invoices:
Example: Resume Scanner
Process job applications:
Working with Multi-Page Documents
Process All Pages Together
Process Pages Individually
Extract Specific Pages
Configuration: pages: "1,5-10"
This extracts only pages 1 and 5 through 10.
Handling Tables
For documents with tables, use structured extraction mode:
Reading Text from Images and Scans
If your file is an image or a scanned document, use OCR mode to read the text. Set the extraction mode to: ocr
OCR takes longer (usually 2-5 seconds per page) and isn’t always perfect, so double-check important extractions.
For better OCR results:
- Use higher quality images
- Make sure the text is clear and readable
- Note that handwriting usually doesn’t work well
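The 2-5 seconds per page figure lets you estimate how long an OCR step might take before you set timeouts in a workflow. Here is a small sketch of that arithmetic; ocr_time_range is a hypothetical helper.

```python
def ocr_time_range(page_count: int, low_s: float = 2.0, high_s: float = 5.0) -> tuple[float, float]:
    """Return (fastest, slowest) estimated OCR time in seconds."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * low_s, page_count * high_s


fastest, slowest = ocr_time_range(20)
print(f"20 pages: about {fastest:.0f}-{slowest:.0f} seconds")  # 20 pages: about 40-100 seconds
```

This is only a rough guide. Real timings may fall outside this range.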
When Things Go Wrong
File reading can fail because:
- The file is damaged
- The file type isn’t supported
- The file is password-protected
- The file is too big
File Size Limits
| File Type | Max Size |
|---|---|
| PDF | 50 MB |
| Word/PowerPoint | 25 MB |
| Images | 10 MB |
| CSV/Excel | 25 MB |
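Two of the failure causes above, an unsupported type and an oversized file, can be caught before the file is sent. The sketch below checks both. precheck is a hypothetical helper, and the sketch makes two assumptions: the 50 MB row of the size table applies to PDFs, and "MB" means 1,000,000 bytes. Damaged or password-protected files still have to be caught from the node's result.

```python
from pathlib import Path
from typing import Optional

# Assumption: "MB" means 1,000,000 bytes. If the service uses 1,048,576,
# this check is slightly stricter than needed, which errs on the safe side.
MB = 1_000_000

# Limits from the File Size Limits table (50 MB row assumed to be PDF).
SIZE_LIMITS = [
    ({".pdf"}, 50 * MB),
    ({".docx", ".doc", ".pptx", ".ppt"}, 25 * MB),
    ({".png", ".jpg", ".jpeg", ".gif", ".webp", ".tiff"}, 10 * MB),
    ({".csv", ".xlsx", ".xls"}, 25 * MB),
]
# Supported text formats that the table gives no limit for.
NO_LISTED_LIMIT = {".txt", ".md", ".rtf"}


def precheck(filename: str, size_bytes: int) -> Optional[str]:
    """Return a likely rejection reason, or None if the file passes these checks."""
    ext = Path(filename).suffix.lower()
    if ext in NO_LISTED_LIMIT:
        return None
    for extensions, limit in SIZE_LIMITS:
        if ext in extensions:
            if size_bytes > limit:
                return f"{filename} is {size_bytes / MB:.1f} MB; the limit is {limit // MB} MB"
            return None
    return f"{ext or 'no extension'} is not a supported file type"


print(precheck("scan.png", 12_000_000))  # scan.png is 12.0 MB; the limit is 10 MB
print(precheck("report.pdf", 8_000_000))  # None
```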
Tips
Settings
What to call this node (shown on the canvas).
A short code to reference this node’s content.
Where the file is coming from:
- url - Download it from a web address
- base64 - The file content encoded as text
- previous_node - A file from an earlier step (like an email attachment)
The web address to download from (if using URL source).
The encoded file content (if using base64 source).
Reference to a file from an earlier node (if using previous_node source).
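The three source options differ only in how the raw bytes are obtained. Here is a sketch of that step. It assumes files from earlier steps are available in a dictionary keyed by reference. The reference format ("email.attachment") and both helper names are hypothetical.

```python
import base64
import urllib.request
from typing import Mapping, Optional


def load_file_bytes(
    source: str,
    value: str,
    previous_files: Optional[Mapping[str, bytes]] = None,
) -> bytes:
    """Resolve the file's raw bytes for each of the three source options."""
    if source == "url":
        # Download the file; the 30-second timeout is an arbitrary choice.
        with urllib.request.urlopen(value, timeout=30) as response:
            return response.read()
    if source == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(value, validate=True)
    if source == "previous_node":
        # Hypothetical: earlier steps' files stored in a dict keyed by reference.
        if previous_files is None or value not in previous_files:
            raise KeyError(f"no file for reference {value!r}")
        return previous_files[value]
    raise ValueError(f"unknown source: {source!r}")


def to_base64(data: bytes) -> str:
    """Encode raw bytes as the text form the base64 source expects."""
    return base64.b64encode(data).decode("ascii")


encoded = to_base64(b"Hello, invoice!")
print(load_file_bytes("base64", encoded))  # b'Hello, invoice!'
```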
How to read the file:
- text - Just get the words
- structured - Try to keep tables and lists intact
- ocr - Read text from images or scanned documents
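This page recommends OCR for images and scans and structured mode for tables. A simple heuristic for picking a mode based on that guidance could look like the sketch below. It is an illustration, not the node's own selection logic, and choose_mode is a hypothetical helper.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".tiff"}
TABLE_EXTENSIONS = {".csv", ".xlsx", ".xls"}


def choose_mode(filename: str, is_scanned: bool = False, has_tables: bool = False) -> str:
    """Pick "text", "structured", or "ocr" from the file type and two hints."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS or is_scanned:
        return "ocr"  # no text layer to read, so OCR is needed
    if ext in TABLE_EXTENSIONS or has_tables:
        return "structured"  # keep rows and columns intact
    return "text"


print(choose_mode("receipt.JPG"), choose_mode("q3.xlsx"), choose_mode("memo.docx"))
```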
Which pages to read (for documents with multiple pages). Examples: 1-5, 1,3,5, or all.
Outputs
The extracted text content.
Array of page contents (for multi-page documents).
File metadata:
- pageCount - Number of pages
- fileType - Detected file type
- fileSize - Size in bytes
- title - Document title (if available)
- author - Document author (if available)
- createdAt - Creation date (if available)
Extracted tables (when using structured mode).
Whether extraction completed successfully.
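As an illustration of using these outputs in a later step, the sketch below builds a one-line summary. The metadata keys (pageCount, fileType, fileSize, title) come from the list above. The top-level field names (success, text, pages, metadata) and the sample values are hypothetical placeholders.

```python
from typing import Any


def summarize(result: dict[str, Any]) -> str:
    """Build a one-line summary from a node result (top-level names are hypothetical)."""
    if not result.get("success"):
        return "Extraction failed"
    meta = result.get("metadata", {})
    title = meta.get("title") or "Untitled"  # title is optional
    pages = meta.get("pageCount", len(result.get("pages", [])))
    size_kb = meta.get("fileSize", 0) / 1000  # fileSize is in bytes
    return f"{title} ({meta.get('fileType', 'unknown')}, {pages} pages, {size_kb:.0f} KB)"


example = {
    "success": True,
    "text": "Invoice 123 page one. Terms page two.",
    "pages": ["Invoice 123 page one.", "Terms page two."],
    "metadata": {"pageCount": 2, "fileType": "pdf", "fileSize": 48213, "title": "Invoice 123"},
}
print(summarize(example))  # Invoice 123 (pdf, 2 pages, 48 KB)
```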
