Llamaparserb
A Ruby client for the LlamaIndex Parsing API. This gem parses various document formats (PDF, DOCX, etc.) into text or markdown. It is loosely based on the Python client.
Installation
Add this line to your application's Gemfile:
gem 'llamaparserb'
And then execute:
$ bundle install
Or install it yourself as:
$ gem install llamaparserb
Usage
Basic Usage
require 'llamaparserb'
# Initialize client with API key
client = Llamaparserb::Client.new(ENV['LLAMA_CLOUD_API_KEY'])
# Parse a file from disk (to text by default)
text = client.parse_file('path/to/document.pdf')
# Parse an in-memory file (requires file type)
require 'open-uri'
file_content = URI.open('https://example.com/document.pdf')
text = client.parse_file(file_content, 'pdf')
# Parse a file to markdown
client = Llamaparserb::Client.new(ENV['LLAMA_CLOUD_API_KEY'], result_type: "markdown")
markdown = client.parse_file('path/to/document.pdf')
# Parse a file from a URL
markdown = client.parse_file('https://example.com/document.pdf')
File Input Options
The parse_file method accepts three types of inputs:
- File path (String):
client.parse_file('path/to/document.pdf')
- IO object (requires file type parameter):
# From a URL
file_content = URI.open('https://example.com/document.pdf')
client.parse_file(file_content, 'pdf')
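# From memory (StringIO wraps a String, so read the bytes first)
require 'stringio'
pdf_bytes = File.binread('path/to/document.pdf')
io = StringIO.new(pdf_bytes)
client.parse_file(io, 'pdf')
# From a Tempfile
require 'tempfile'
temp_file = Tempfile.new(['document', '.pdf'])
client.parse_file(temp_file, 'pdf')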
- URL (String):
client.parse_file('https://example.com/document.pdf')
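When the input is an IO object, the file type has to be supplied separately. A minimal sketch of deriving that string from a filename, assuming the type is simply the lowercase extension without the dot (as in the 'pdf' examples above). file_type_for is a hypothetical helper, not part of the gem, and it does not handle URLs with query strings:

```ruby
# Hypothetical helper (not part of llamaparserb): derive the file type
# string for an IO input from a filename or URL path.
# Assumption: the type is the lowercase extension without the leading dot.
def file_type_for(name)
  ext = File.extname(name.to_s).delete_prefix('.').downcase
  raise ArgumentError, "cannot infer file type from #{name.inspect}" if ext.empty?

  ext
end

file_type_for('reports/Q3 Summary.PDF')              # => "pdf"
file_type_for('https://example.com/a/document.docx') # => "docx"
```

The result can then be passed as the second argument, for example client.parse_file(io, file_type_for(original_filename)).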
Advanced Options
client = Llamaparserb::Client.new(
  ENV['LLAMA_CLOUD_API_KEY'],
  {
    # Basic Configuration
    result_type: "markdown", # Output format: "text" or "markdown"
    num_workers: 4, # Number of workers for concurrent processing
    check_interval: 1, # How often to check job status (seconds)
    max_timeout: 2000, # Maximum time to wait for parsing (seconds)
    verbose: true, # Enable detailed logging
    show_progress: true, # Show progress during parsing
    ignore_errors: true, # Return nil instead of raising errors
    # Language and Parsing Options
    language: :en, # Target language for parsing
    parsing_instruction: "", # Custom parsing instructions
    skip_diagonal_text: false, # Skip diagonal text in documents
    invalidate_cache: false, # Force reprocessing of cached documents
    do_not_cache: false, # Disable caching of results
    # Processing Modes
    fast_mode: false, # Enable faster processing (may reduce quality)
    premium_mode: false, # Enable premium parsing features
    continuous_mode: false, # Process document as continuous text
    do_not_unroll_columns: false, # Keep columnar text structure
    # Page Handling
    split_by_page: true, # Split result by pages
    page_separator: "\n\n", # Custom page separator
    page_prefix: "Page ", # Text to prepend to each page
    page_suffix: "\n", # Text to append to each page
    target_pages: [1,2,3], # Array of specific pages to process
    bounding_box: { # Specify area to parse (coordinates in pixels)
      x1: 0, y1: 0, # Top-left corner
      x2: 612, y2: 792 # Bottom-right corner
    },
    # OCR and Image Processing
    disable_ocr: false, # Disable Optical Character Recognition
    disable_image_extraction: false, # Disable image extraction from documents
    take_screenshot: false, # Capture screenshot of document
    # Advanced Processing Features
    gpt4o_mode: false, # Enable GPT-4o mode
    gpt4o_api_key: "key", # API key for GPT-4o mode
    guess_xlsx_sheet_names: false, # Attempt to guess Excel sheet names
    is_formatting_instruction: false, # Use formatting instructions
    annotate_links: false, # Include link annotations in output
    # Multimodal Processing
    vendor_multimodal_api_key: "key", # API key for multimodal processing
    use_vendor_multimodal_model: false, # Enable multimodal model
    vendor_multimodal_model_name: "model", # Specify multimodal model
    # Integration Options
    webhook_url: "https://...", # URL for webhook notifications
    http_proxy: "http://...", # HTTP proxy configuration
    # Azure OpenAI Configuration
    azure_openai_deployment_name: "deployment", # Azure OpenAI deployment name
    azure_openai_endpoint: "endpoint", # Azure OpenAI endpoint
    azure_openai_api_version: "2023-05-15", # Azure OpenAI API version
    azure_openai_key: "key" # Azure OpenAI API key
  }
)
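With this many options it can help to keep shared defaults in one place and override only what differs per call site. A minimal sketch, assuming the options hash is passed as the second argument to Llamaparserb::Client.new as in the example above. DEFAULT_PARSE_OPTIONS, VALID_RESULT_TYPES, and build_parse_options are hypothetical names, and the result_type check only mirrors the two values named in the comments:

```ruby
# Hypothetical defaults (names are illustrative, not part of the gem).
DEFAULT_PARSE_OPTIONS = {
  result_type: 'markdown',
  check_interval: 1,
  max_timeout: 2000,
  verbose: false
}.freeze

VALID_RESULT_TYPES = %w[text markdown].freeze

# Merge per-call overrides into the defaults and validate result_type.
def build_parse_options(overrides = {})
  options = DEFAULT_PARSE_OPTIONS.merge(overrides)
  result_type = options[:result_type].to_s
  unless VALID_RESULT_TYPES.include?(result_type)
    raise ArgumentError, "result_type must be one of #{VALID_RESULT_TYPES.join(', ')}"
  end

  options
end

options = build_parse_options(result_type: 'text', target_pages: [1, 2, 3])
# client = Llamaparserb::Client.new(ENV['LLAMA_CLOUD_API_KEY'], options)
```

Validating locally surfaces a mistyped option value before any request is sent.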
Feature-Specific Options
Page Processing
- split_by_page: Split the document into separate pages
- page_separator: Custom text to insert between pages
- page_prefix/page_suffix: Add custom text before/after each page
- target_pages: Process only specific pages
- bounding_box: Parse only a specific area of the document
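If individual pages are needed downstream, one option is to pick a separator that cannot occur in the document and split on it afterwards. A rough sketch, assuming the parsed result is a single String in which pages are joined by the configured page_separator; this README does not spell out the exact output shape, so verify against real output first. PAGE_SEPARATOR and split_pages are hypothetical names:

```ruby
# Hypothetical separator, chosen to be unlikely to appear in real documents.
PAGE_SEPARATOR = "\n<<<PAGE_BREAK>>>\n"

# Split parsed output into pages, stripping whitespace and dropping empty pages.
# Assumption: pages in `parsed_text` are joined by `separator`.
def split_pages(parsed_text, separator = PAGE_SEPARATOR)
  parsed_text.to_s.split(separator).map(&:strip).reject(&:empty?)
end

# Stand-in for text returned by client.parse_file when the client was
# created with page_separator: PAGE_SEPARATOR.
sample = "Intro text#{PAGE_SEPARATOR}Second page#{PAGE_SEPARATOR}Third page\n"
pages = split_pages(sample)
pages.each_with_index { |page, i| puts "Page #{i + 1}: #{page}" }
```

Calling to_s on the input means a nil result (the default failure mode) yields an empty array instead of an exception.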
OCR and Image Processing
- disable_ocr: Turn off Optical Character Recognition
- disable_image_extraction: Disable image extraction from documents
- take_screenshot: Generate document screenshots
- skip_diagonal_text: Ignore text at diagonal angles
Advanced Processing
- continuous_mode: Process text as a continuous stream
- do_not_unroll_columns: Preserve column structure
- guess_xlsx_sheet_names: Auto-detect Excel sheet names
- annotate_links: Include document hyperlinks in output
- is_formatting_instruction: Use special formatting instructions
Performance Options
- fast_mode: Faster processing with potential quality trade-offs
- premium_mode: Access to premium features
- invalidate_cache/do_not_cache: Control result caching
- num_workers: Configure concurrent processing
Integration Features
- webhook_url: Receive processing notifications
- http_proxy: Configure proxy settings
Azure OpenAI Integration
Configure Azure OpenAI services with:
- azure_openai_deployment_name
- azure_openai_endpoint
- azure_openai_api_version
- azure_openai_key
Multimodal Processing
Enable advanced multimodal processing with:
- vendor_multimodal_api_key
- use_vendor_multimodal_model
- vendor_multimodal_model_name
Supported File Types
The client supports a wide range of file formats including:
- Documents: PDF, DOCX, DOC, RTF, TXT
- Presentations: PPT, PPTX
- Spreadsheets: XLS, XLSX, CSV
- Images: JPG, PNG, TIFF
- And many more
See the SUPPORTED_FILE_TYPES constant for the complete list.
Error Handling
By default, the client will return nil and print an error message if something goes wrong. You can change this behavior with the ignore_errors option:
# Raise errors instead of returning nil
client = Llamaparserb::Client.new(api_key, ignore_errors: false)
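Because the default mode signals failure with nil, callers need to check the result before using it. A minimal sketch of a wrapper that treats both nil and a raised error as failure. parse_or_nil is a hypothetical helper, and rescuing StandardError is an assumption, since the specific exception classes the gem raises are not listed here:

```ruby
# Hypothetical wrapper: works with any object that responds to parse_file,
# whether it was created with ignore_errors true or false.
def parse_or_nil(client, source, file_type = nil)
  result = file_type ? client.parse_file(source, file_type) : client.parse_file(source)
  warn "parse_file returned nil for #{source.inspect}" if result.nil?
  result
rescue StandardError => e
  # Assumption: errors raised with ignore_errors: false are StandardError subclasses.
  warn "parse_file failed for #{source.inspect}: #{e.class}: #{e.message}"
  nil
end

# text = parse_or_nil(client, 'path/to/document.pdf')
# puts text if text
```

Either way the caller sees a String or nil, so downstream code needs only one check.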
Logging
By default, the client uses Ruby's standard Logger with output to STDOUT. You can configure logging in several ways:
# Use default logger with debug level output
client = Llamaparserb::Client.new(api_key, verbose: true)
# Use default logger with info level (less output)
client = Llamaparserb::Client.new(api_key, verbose: false)
# Use custom logger
custom_logger = Logger.new('llamaparse.log')
custom_logger.level = Logger::INFO
client = Llamaparserb::Client.new(api_key, logger: custom_logger)
# Use Rails logger in a Rails app
client = Llamaparserb::Client.new(api_key, logger: Rails.logger)
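Since the logger option takes a standard Logger, the standard library's level and formatter hooks apply. A small sketch of a logger with a compact one-line format. build_parse_logger is a hypothetical helper; only the logger: option itself comes from the examples above:

```ruby
require 'logger'

# Hypothetical helper: build a Logger with a compact one-line format.
# `io` may be a file path, $stdout, or any IO-like object.
def build_parse_logger(io = $stdout, level: Logger::INFO)
  logger = Logger.new(io)
  logger.level = level
  logger.progname = 'llamaparserb'
  logger.formatter = proc do |severity, time, progname, msg|
    "#{time.utc.strftime('%Y-%m-%dT%H:%M:%SZ')} [#{progname}] #{severity}: #{msg}\n"
  end
  logger
end

logger = build_parse_logger
logger.info('logger configured')
# client = Llamaparserb::Client.new(api_key, logger: logger)
```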
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/heidar/llamaparserb.
License
The gem is available as open source under the terms of the MIT License.