Instaparser Python Library

A Python client library for the Instaparser API, providing a simple and intuitive interface for parsing articles, generating summaries, and processing PDFs.

Installation

pip install instaparser

Quick Start

from instaparser import InstaparserClient

# Initialize the client with your API key
client = InstaparserClient(api_key="your-api-key")

# Parse an article from a URL
article = client.Article(url="https://example.com/article")

# Access article properties
print(article.title)
print(article.body)  # HTML or text content
print(article.author)
print(article.words)

Features

Article Parsing: Extract clean HTML or text from web articles
Summary Generation: Generate AI-powered summaries with key sentences
PDF Processing: Parse PDFs from URLs or file uploads
Error Handling: Comprehensive exception handling for API errors
Type Hints: Full type annotations for better IDE support

Usage

Article Parsing

Parse articles from URLs or HTML content:

from instaparser import InstaparserClient

client = InstaparserClient(api_key="your-api-key")

# Parse from URL (HTML output)
article = client.Article(url="https://example.com/article")
print(article.html)  # HTML content
print(article.body)  # Same as html when output='html'

# Parse from URL (text output)
article = client.Article(url="https://example.com/article", output="text")
print(article.text)  # Plain text content
print(article.body)  # Same as text when output='text'

# Parse from HTML content
html_content = "<html><body><h1>Title</h1><p>Content</p></body></html>"
article = client.Article(url="https://example.com/article", content=html_content)

# Disable cache
article = client.Article(url="https://example.com/article", use_cache=False)

Article Properties

The Article object provides access to all parsed metadata:

article = client.Article(url="https://example.com/article")

# Basic properties
article.url          # Canonical URL
article.title        # Article title
article.site_name    # Website name
article.author       # Author name
article.date         # Published date (UNIX timestamp)
article.description  # Article description
article.thumbnail    # Thumbnail image URL
article.words        # Word count
article.is_rtl       # Right-to-left language flag

# Content
article.body         # HTML or text (depending on output format)
article.html         # HTML content (if output='html')
article.text         # Plain text (if output='text')

# Media
article.images       # List of images
article.videos       # List of embedded videos

Summary Generation

Generate AI-powered summaries:

# Generate summary
summary = client.Summary(url="https://example.com/article")

print(summary.overview)        # Concise summary
print(summary.key_sentences)   # List of key sentences

# Stream summary with callback (for real-time updates)
def on_stream_line(line):
    print(f"Streaming: {line}")

summary = client.Summary(
    url="https://example.com/article",
    stream_callback=on_stream_line
)

PDF Processing

Parse PDFs from URLs or files. The PDF class inherits from Article, so it has all the same properties:

# Parse PDF from URL
pdf = client.PDF(url="https://example.com/document.pdf")

# Parse PDF from file
with open('document.pdf', 'rb') as f:
    pdf = client.PDF(file=f)

# Parse PDF with text output
pdf = client.PDF(url="https://example.com/document.pdf", output="text")
print(pdf.text)
print(pdf.body)  # Same as text when output='text'

# Access all Article properties
print(pdf.title)
print(pdf.words)
print(pdf.images)

Error Handling

The SDK provides specific exception types for different error scenarios:

from instaparser import (
    InstaparserClient,
    InstaparserAuthenticationError,
    InstaparserRateLimitError,
    InstaparserValidationError,
    InstaparserAPIError,
)

client = InstaparserClient(api_key="your-api-key")

try:
    article = client.Article(url="https://example.com/article")
except InstaparserAuthenticationError:
    print("Invalid API key")
except InstaparserRateLimitError:
    print("Rate limit exceeded")
except InstaparserValidationError:
    print("Invalid request parameters")
except InstaparserAPIError as e:
    print(f"API error: {e} (status: {e.status_code})")

API Reference

InstaparserClient

Main client class for interacting with the Instaparser API.

`init(api_key: str)`

Initialize the client.

api_key: Your Instaparser API key

`Article(url: str, content: Optional[str] = None, output: str = 'html', use_cache: bool = True) -> Article`

Parse an article from a URL or HTML content.

url: URL of the article (required)
content: Optional HTML content to parse instead of fetching from URL
output: Output format - 'html' (default) or 'text'
use_cache: Whether to use cache (default: True)

Returns: Article object

`Summary(url: str, content: Optional[str] = None, use_cache: bool = True, stream_callback: Optional[Callable[[str], None]] = None) -> Summary`

Generate a summary of an article.

url: URL of the article (required)
content: Optional HTML content to parse instead of fetching from URL
use_cache: Whether to use cache (default: True)
stream_callback: Optional callback function called for each line of streaming response. If provided, enables streaming mode.

Returns: Summary object with key_sentences and overview attributes

`PDF(url: Optional[str] = None, file: Optional[Union[BinaryIO, bytes]] = None, output: str = 'html', use_cache: bool = True) -> PDF`

Parse a PDF from a URL or file.

url: URL of the PDF (required for GET request)
file: PDF file to upload (required for POST request)
output: Output format - 'html' (default) or 'text'
use_cache: Whether to use cache (default: True)

Returns: PDF object (inherits from Article)

Article

Represents a parsed article from Instaparser.

Properties

url: Canonical URL
title: Article title
site_name: Website name
author: Author name
date: Published date (UNIX timestamp)
description: Article description
thumbnail: Thumbnail image URL
words: Word count
is_rtl: Right-to-left language flag
images: List of images
videos: List of embedded videos
body: Article body (HTML or text)
html: HTML content (if output was 'html')
text: Plain text content (if output was 'text')

PDF

Represents a parsed PDF from Instaparser. Inherits from Article and has all the same properties. PDFs always have is_rtl=False and videos=[].

Summary

Represents a summary result from Instaparser.

Properties

key_sentences: List of key sentences extracted from the article
overview: Concise summary of the article

License

MIT

Support

For support, email support@instaparser.com or visit https://instaparser.com.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
instaparser		instaparser
tests		tests
.gitignore		.gitignore
BUILD.md		BUILD.md
LICENSE		LICENSE
README.md		README.md
example.py		example.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Instaparser Python Library

Installation

Quick Start

Features

Usage

Article Parsing

Article Properties

Summary Generation

PDF Processing

Error Handling

API Reference

InstaparserClient

`init(api_key: str)`

`Article(url: str, content: Optional[str] = None, output: str = 'html', use_cache: bool = True) -> Article`

`Summary(url: str, content: Optional[str] = None, use_cache: bool = True, stream_callback: Optional[Callable[[str], None]] = None) -> Summary`

`PDF(url: Optional[str] = None, file: Optional[Union[BinaryIO, bytes]] = None, output: str = 'html', use_cache: bool = True) -> PDF`

Article

Properties

PDF

Summary

Properties

License

Support

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors 1

Languages

License

Instapaper/instaparser-python

Folders and files

Latest commit

History

Repository files navigation

Instaparser Python Library

Installation

Quick Start

Features

Usage

Article Parsing

Article Properties

Summary Generation

PDF Processing

Error Handling

API Reference

InstaparserClient

__init__(api_key: str)

Article(url: str, content: Optional[str] = None, output: str = 'html', use_cache: bool = True) -> Article

Summary(url: str, content: Optional[str] = None, use_cache: bool = True, stream_callback: Optional[Callable[[str], None]] = None) -> Summary

PDF(url: Optional[str] = None, file: Optional[Union[BinaryIO, bytes]] = None, output: str = 'html', use_cache: bool = True) -> PDF

Article

Properties

PDF

Summary

Properties

License

Support

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors 1

Languages

`init(api_key: str)`

`Article(url: str, content: Optional[str] = None, output: str = 'html', use_cache: bool = True) -> Article`

`Summary(url: str, content: Optional[str] = None, use_cache: bool = True, stream_callback: Optional[Callable[[str], None]] = None) -> Summary`

`PDF(url: Optional[str] = None, file: Optional[Union[BinaryIO, bytes]] = None, output: str = 'html', use_cache: bool = True) -> PDF`

Packages