A Python client library for the Instaparser API, providing a simple and intuitive interface for parsing articles, generating summaries, and processing PDFs.
pip install instaparser

from instaparser import InstaparserClient
# Initialize the client with your API key
client = InstaparserClient(api_key="your-api-key")
# Parse an article from a URL
article = client.Article(url="https://example.com/article")
# Access article properties
print(article.title)
print(article.body) # HTML or text content
print(article.author)
print(article.words)

- Article Parsing: Extract clean HTML or text from web articles
- Summary Generation: Generate AI-powered summaries with key sentences
- PDF Processing: Parse PDFs from URLs or file uploads
- Error Handling: Comprehensive exception handling for API errors
- Type Hints: Full type annotations for better IDE support
Parse articles from URLs or HTML content:
from instaparser import InstaparserClient
client = InstaparserClient(api_key="your-api-key")
# Parse from URL (HTML output)
article = client.Article(url="https://example.com/article")
print(article.html) # HTML content
print(article.body) # Same as html when output='html'
# Parse from URL (text output)
article = client.Article(url="https://example.com/article", output="text")
print(article.text) # Plain text content
print(article.body) # Same as text when output='text'
# Parse from HTML content
html_content = "<html><body><h1>Title</h1><p>Content</p></body></html>"
article = client.Article(url="https://example.com/article", content=html_content)
# Disable cache
article = client.Article(url="https://example.com/article", use_cache=False)

The Article object provides access to all parsed metadata:
article = client.Article(url="https://example.com/article")
# Basic properties
article.url # Canonical URL
article.title # Article title
article.site_name # Website name
article.author # Author name
article.date # Published date (UNIX timestamp)
article.description # Article description
article.thumbnail # Thumbnail image URL
article.words # Word count
article.is_rtl # Right-to-left language flag
# Content
article.body # HTML or text (depending on output format)
article.html # HTML content (if output='html')
article.text # Plain text (if output='text')
# Media
article.images # List of images
article.videos # List of embedded videos

Generate AI-powered summaries:
# Generate summary
summary = client.Summary(url="https://example.com/article")
print(summary.overview) # Concise summary
print(summary.key_sentences) # List of key sentences
# Stream summary with callback (for real-time updates)
def on_stream_line(line):
    print(f"Streaming: {line}")

summary = client.Summary(
    url="https://example.com/article",
    stream_callback=on_stream_line
)

Parse PDFs from URLs or files. The PDF class inherits from Article, so it has all the same properties:
# Parse PDF from URL
pdf = client.PDF(url="https://example.com/document.pdf")
# Parse PDF from file
with open('document.pdf', 'rb') as f:
    pdf = client.PDF(file=f)
# Parse PDF with text output
pdf = client.PDF(url="https://example.com/document.pdf", output="text")
print(pdf.text)
print(pdf.body) # Same as text when output='text'
# Access all Article properties
print(pdf.title)
print(pdf.words)
print(pdf.images)

The SDK provides specific exception types for different error scenarios:
from instaparser import (
    InstaparserClient,
    InstaparserAuthenticationError,
    InstaparserRateLimitError,
    InstaparserValidationError,
    InstaparserAPIError,
)
client = InstaparserClient(api_key="your-api-key")
try:
    article = client.Article(url="https://example.com/article")
except InstaparserAuthenticationError:
    print("Invalid API key")
except InstaparserRateLimitError:
    print("Rate limit exceeded")
except InstaparserValidationError:
    print("Invalid request parameters")
except InstaparserAPIError as e:
    print(f"API error: {e} (status: {e.status_code})")

Main client class for interacting with the Instaparser API.
Initialize the client.
api_key: Your Instaparser API key
Article(url: str, content: Optional[str] = None, output: str = 'html', use_cache: bool = True) -> Article
Parse an article from a URL or HTML content.
- url: URL of the article (required)
- content: Optional HTML content to parse instead of fetching from URL
- output: Output format, 'html' (default) or 'text'
- use_cache: Whether to use cache (default: True)
Returns: Article object
Summary(url: str, content: Optional[str] = None, use_cache: bool = True, stream_callback: Optional[Callable[[str], None]] = None) -> Summary
Generate a summary of an article.
- url: URL of the article (required)
- content: Optional HTML content to parse instead of fetching from URL
- use_cache: Whether to use cache (default: True)
- stream_callback: Optional callback function called for each line of streaming response. If provided, enables streaming mode.
Returns: Summary object with key_sentences and overview attributes
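Since stream_callback receives one string per streamed line, a callable object can collect those lines for later use. The following is a minimal sketch: StreamCollector is a hypothetical helper, not part of the SDK, and the loop at the bottom only simulates what the client would do while streaming.

```python
from typing import List


class StreamCollector:
    """Hypothetical helper (not part of the SDK): a callable that stores streamed lines."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def __call__(self, line: str) -> None:
        # Matches the documented callback shape: Callable[[str], None].
        self.lines.append(line)

    def text(self) -> str:
        # Join everything received so far into one string.
        return "\n".join(self.lines)


collector = StreamCollector()

# Simulated streaming so the sketch runs without the SDK or network access.
# In real use the collector would be passed as stream_callback=collector.
for chunk in ["First line", "Second line"]:
    collector(chunk)

print(collector.text())
```

Passing an object instead of a plain function keeps the received lines available after the Summary call returns.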
PDF(url: Optional[str] = None, file: Optional[Union[BinaryIO, bytes]] = None, output: str = 'html', use_cache: bool = True) -> PDF
Parse a PDF from a URL or file.
- url: URL of the PDF (required for GET request)
- file: PDF file to upload (required for POST request)
- output: Output format, 'html' (default) or 'text'
- use_cache: Whether to use cache (default: True)
Returns: PDF object (inherits from Article)
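Any of the three client methods above may raise InstaparserRateLimitError, so callers sometimes wrap them in a retry loop. The sketch below shows one possible approach using exponential backoff. The call_with_retry helper, its parameters, and FakeRateLimitError are all hypothetical names introduced for illustration; the stand-in exception lets the example run without the SDK, and whether retrying suits your rate limits is an assumption to check against the API's documentation.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff when retry_on is raised."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")


class FakeRateLimitError(Exception):
    """Stand-in for InstaparserRateLimitError so the sketch runs without the SDK."""


calls = {"count": 0}
delays: List[float] = []


def flaky_parse() -> str:
    # Fails twice, then succeeds, imitating a temporary rate limit.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError()
    return "parsed"


# Record the delays instead of actually sleeping.
result = call_with_retry(flaky_parse, (FakeRateLimitError,), sleep=delays.append)
print(result, delays)
```

With the SDK installed, the same helper could presumably wrap a real call, for example `call_with_retry(lambda: client.Article(url="https://example.com/article"), (InstaparserRateLimitError,))`.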
Represents a parsed article from Instaparser.
- url: Canonical URL
- title: Article title
- site_name: Website name
- author: Author name
- date: Published date (UNIX timestamp)
- description: Article description
- thumbnail: Thumbnail image URL
- words: Word count
- is_rtl: Right-to-left language flag
- images: List of images
- videos: List of embedded videos
- body: Article body (HTML or text)
- html: HTML content (if output was 'html')
- text: Plain text content (if output was 'text')
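Because date is a UNIX timestamp, it usually needs converting before display. A minimal sketch using only the standard library follows; published_at is a hypothetical helper name, and the handling of None assumes the date may be missing for some articles, which the documentation above does not state.

```python
from datetime import datetime, timezone
from typing import Optional


def published_at(timestamp: Optional[float]) -> Optional[datetime]:
    """Convert a UNIX timestamp (seconds) to a timezone-aware UTC datetime."""
    if timestamp is None:
        # Assumption: date may be absent for some articles.
        return None
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# 1700000000 is an arbitrary example value, not real API output.
print(published_at(1700000000).isoformat())
```

In real use the argument would be article.date from a parsed Article or PDF.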
Represents a parsed PDF from Instaparser. Inherits from Article and has all the same properties. PDFs always have is_rtl=False and videos=[].
Represents a summary result from Instaparser.
- key_sentences: List of key sentences extracted from the article
- overview: Concise summary of the article
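The two attributes combine naturally into a short plain-text digest. The sketch below is illustrative only: SummaryLike is a hypothetical stand-in exposing the same two attribute names so the example runs without the SDK, and render_summary is not an SDK function.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SummaryLike:
    """Hypothetical stand-in with the two documented Summary attributes."""

    overview: str
    key_sentences: List[str] = field(default_factory=list)


def render_summary(summary) -> str:
    """Render the overview followed by a bulleted list of key sentences."""
    lines = [summary.overview, ""]
    lines += [f"- {sentence}" for sentence in summary.key_sentences]
    # rstrip removes the trailing blank line when there are no key sentences.
    return "\n".join(lines).rstrip()


example = SummaryLike("Short overview.", ["First point.", "Second point."])
print(render_summary(example))
```

Since render_summary only reads overview and key_sentences, an object returned by client.Summary should work in place of the stand-in.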
License: MIT
For support, email support@instaparser.com or visit https://instaparser.com.