You've spent 45 minutes copy-pasting rows from a competitor's pricing table into a spreadsheet. Then you manually transcribed their team page contact info. Then you saved their blog post as a PDF that looks like it was designed by a fax machine. There's a better way: onHover's Extract & Export tool — a Chrome extension feature that lets you extract data from websites in the format you actually need — tables as CSV, articles as Markdown, pages as clean PDF. One click, right format, done.
What you can extract
Finding email addresses on websites
Open the Extract & Export tab and hit "Find Emails." The email finder scans the full rendered document — including dynamically loaded content — and lists every email address it finds, deduplicated and sortable.
Three situations where this saves real time:
- Building a contact list from a conference speaker directory or event site
- Auditing your own pages for exposed email addresses that attract spam bots
- Gathering contributor emails from an open source project's contributors page
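The core of an email finder like this can be sketched in a few lines. The Python below is an illustration, not onHover's implementation: it assumes the page's rendered text is already available as a string, it uses a deliberately simplified address pattern, and the name `find_emails` is made up for this example.

```python
import re

# Simplified pattern for illustration; real-world address matching has more edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(page_text: str) -> list[str]:
    """Return the unique email addresses in page_text, sorted alphabetically."""
    seen = set()
    for match in EMAIL_RE.findall(page_text):
        # Lowercase so the same address in different casing is counted once.
        seen.add(match.lower())
    return sorted(seen)


sample = "Contact Ana@Example.com or bob@example.org. Again: ana@example.com."
print(find_emails(sample))
```

Lowercasing before deduplication is a choice made for this sketch so that `Ana@Example.com` and `ana@example.com` count as one address.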
Convert webpage tables to CSV
Any <table> on the page gets a download button in the viewer. This HTML-table-to-CSV export works on pricing comparison tables, leaderboards, analytics dashboards that render HTML tables, and government data portals that provide no official export option. Column headers are preserved, and merged cells are handled correctly.
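To make the table-to-CSV idea concrete, here is a minimal Python sketch that turns one table's HTML into CSV text using only the standard library. It is not how onHover is implemented (the extension reads the live DOM in the browser). The class and function names are hypothetical, and padding a `colspan` cell with blank columns is just one assumed way to treat merged cells; `rowspan` and nested tables are ignored.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect cell text from the <tr> rows of an HTML table string."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # row currently being built
        self._cell = None    # text fragments of the cell currently open
        self._colspan = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._colspan = int(dict(attrs).get("colspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = " ".join("".join(self._cell).split())
            self._row.append(text)
            # Assumption: a cell spanning N columns is followed by N-1 blanks
            # so later cells stay aligned with their headers.
            self._row.extend([""] * (self._colspan - 1))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_csv(table_html: str) -> str:
    parser = TableToRows()
    parser.feed(table_html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Because the `csv` module does the writing, cells that contain commas or quotes are quoted automatically instead of breaking the column layout.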
Works on paginated tables
The CSV export captures whatever is currently rendered in the DOM, so only the rows currently loaded are included. For paginated tables, use the page's "show all" option first if it has one, then export once instead of copy-pasting each page of results.
Page to Markdown conversion
The Markdown export runs a semantic pass on the page — extracts the primary content block, converts headings, lists, links, and code blocks to proper Markdown, and strips navigation, ads, and sidebar boilerplate. What you get is a clean, portable document.
Particularly useful for:
- Saving documentation pages to Obsidian, Notion, or Bear without the junk
- Turning a blog post into an editable draft for repurposing
- Creating a clean text version of an article to feed into an LLM context window
Page to PDF and Word export
The PDF export applies print-specific CSS that hides headers, footers, navbars, and sidebar clutter before rendering. The result actually looks like a document — not a browser screenshot. The Word export (.docx) is the right choice when you need to hand off web content to someone who'll edit it further — client reports, stakeholder summaries, content workflows where Markdown isn't an option.
A real workflow example
You're doing a competitor content audit. For each blog post you find interesting: extract to Markdown, paste into your notes. For their pricing page: export the comparison table as CSV. For their team page: pull the email addresses. An afternoon of competitive research that used to mean hours of copy-paste now takes 20 minutes. The data was already structured on the page — you just needed a tool that respects that.
Why copy-paste fails at scale
Copy-paste works for one thing at a time. You see a piece of data, you select it, you paste it somewhere. That's fine for a single value. It falls apart the moment you need more than a handful of items from a structured source.
Try copy-pasting a 60-row pricing table. Depending on where you paste it, you often get a wall of text with no column separation, because the table structure doesn't reliably survive the trip through the clipboard. Now you spend time manually adding tabs or commas, reconstructing column relationships from memory. Or try copy-pasting a list of 40 speaker names and emails from a conference page: you're selecting each one individually, tabbing to your spreadsheet, pasting, tabbing back, and repeating. After the first ten you're making mistakes.
The structural data is already there in the HTML. The browser has already parsed it into a proper DOM tree with <tr>, <td>, href attributes, heading hierarchy — all of it. Copy-paste throws that structure away and gives you plain text. A browser extension reads from the rendered DOM directly, which means it gets the structure for free. That's the fundamental advantage: you're not fighting the clipboard, you're working with what the browser already knows.
External scraping tools have the opposite problem. They re-fetch the page from scratch, which means they have to handle authentication again, they miss dynamically rendered content (anything loaded by JavaScript after the initial HTML) unless they run a full browser engine themselves, and they can't see what you're actually looking at. They see what the server sends, which is often different from what your logged-in session renders. The extension is already inside the rendering context. It sees exactly what you see.
JSON export for developers
CSV is the right format when the destination is a spreadsheet or a non-technical audience. For developers, JSON is usually better — it preserves hierarchy, it's directly consumable by code, and it doesn't flatten nested structure into something you then have to reconstruct.
When you export a page as JSON from the onHover developer toolkit, here's what you get: the heading tree as a nested structure (h1 containing its child h2s, each h2 containing its child h3s), all links with their anchor text and href, all images with their src and alt text, and any tables as nested arrays of row arrays. The email extraction output is a flat array of strings — one address per entry — which pastes directly into a tool that expects one-per-line input.
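The heading nesting described above can be sketched as follows. This is an illustration in Python under stated assumptions, not onHover's exporter: the input is a flat, document-order list of `(level, text)` pairs, and the field names `level`, `text`, and `children` are hypothetical rather than the tool's actual JSON schema.

```python
import json


def build_heading_tree(headings):
    """Nest a flat, document-order list of (level, text) pairs by heading level."""
    root = {"level": 0, "text": None, "children": []}
    stack = [root]  # path from the root to the most recently added heading
    for level, text in headings:
        node = {"level": level, "text": text, "children": []}
        # Pop until the top of the stack is a shallower heading: that is the parent.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


flat = [(1, "Pricing"), (2, "Plans"), (3, "Pro"), (2, "FAQ")]
print(json.dumps(build_heading_tree(flat), indent=2))
```

A stack keeps this a single pass over the headings, and it also copes with pages that skip a level (an h3 directly under an h1 simply becomes that h1's child).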
This format drops straight into a script. If you're building a competitive analysis tool, an internal knowledge base, or a monitoring script that tracks page structure changes — you don't want to write a parser for CSV. JSON is already the shape your code expects.
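As a rough idea of what such a monitoring script could look like, the Python sketch below compares the link lists of two exports of the same page. The shape it assumes (a top-level `links` array of objects with `text` and `href` keys) is a guess based on the description above, not a documented schema, and the function name is hypothetical.

```python
import json


def added_and_removed_links(old_export: str, new_export: str):
    """Compare the link hrefs in two JSON exports of the same page."""
    old = {link["href"] for link in json.loads(old_export)["links"]}
    new = {link["href"] for link in json.loads(new_export)["links"]}
    # Links only in the new export were added; links only in the old one were removed.
    return sorted(new - old), sorted(old - new)
```

The same set-difference approach would work for headings or image sources if the export exposes them as lists.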
Using Markdown export for research notes
The Markdown export does something most page-save tools don't: it reads the page semantically instead of just dumping HTML. It identifies the main content block — the article body, the documentation section, the primary text — and ignores everything that isn't content. No nav links. No footer boilerplate. No cookie banner text that got included because it was technically in the DOM.
Headings become #, ##, ###. Lists become proper Markdown bullet or numbered lists. Links stay intact with their anchor text. Code blocks get fenced with triple backticks. The output pastes cleanly into Obsidian, Notion, Bear, Logseq — any note-taking tool that renders Markdown — and looks like a real document.
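The element-to-Markdown mapping above can be sketched with Python's standard HTML parser. This is a simplified illustration, not onHover's converter: it handles only h1 to h3, paragraphs, flat lists, links, and pre blocks, and it stands in for main-content detection by emitting text only when it sits inside a heading, paragraph, or list item. All names here are hypothetical.

```python
import re
from html.parser import HTMLParser

FENCE = "`" * 3  # Markdown code fence


class MarkdownSketch(HTMLParser):
    """Convert a small HTML subset (h1-h3, p, flat ul/ol, a, pre) to Markdown."""

    def __init__(self):
        super().__init__()
        self.out = []           # output fragments, joined at the end
        self._ol_count = None   # next item number inside <ol>, None inside <ul>
        self._href = ""         # target of the <a> currently open
        self._in_text = 0       # >0 inside a heading, paragraph, or list item
        self._in_pre = False

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.out.append("#" * int(tag[1]) + " ")
            self._in_text += 1
        elif tag == "p":
            self._in_text += 1
        elif tag in ("ul", "ol"):
            self._ol_count = 1 if tag == "ol" else None
        elif tag == "li":
            if self._ol_count is None:
                self.out.append("- ")
            else:
                self.out.append(f"{self._ol_count}. ")
                self._ol_count += 1
            self._in_text += 1
        elif tag == "a" and self._in_text:
            self._href = dict(attrs).get("href") or ""
            self.out.append("[")
        elif tag == "pre":
            self._in_pre = True
            self.out.append(FENCE + "\n")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n\n")
            self._in_text -= 1
        elif tag == "li":
            self.out.append("\n")
            self._in_text -= 1
        elif tag in ("ul", "ol"):
            self.out.append("\n")
        elif tag == "a" and self._in_text:
            self.out.append(f"]({self._href})")
        elif tag == "pre":
            self._in_pre = False
            self.out.append("\n" + FENCE + "\n\n")

    def handle_data(self, data):
        if self._in_pre:
            self.out.append(data)                        # keep code whitespace as-is
        elif self._in_text:
            self.out.append(re.sub(r"\s+", " ", data))   # collapse whitespace runs


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip() + "\n"
```

Text that is not inside a heading, paragraph, or list item (a nav link, for example) is dropped, which is a crude stand-in for the boilerplate stripping described earlier.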
One use case we didn't anticipate when building this: Markdown as LLM context input. Raw HTML is a terrible context window filler: often well over half of it is tags and attributes that the model has to read past before getting to actual content. Clean Markdown carries far more information per token. If you're doing research and want to feed a page into a Claude or GPT conversation, exporting to Markdown first generally works better than pasting raw HTML or trying to summarize manually.
Markdown length vs HTML length
A typical documentation page exported as raw HTML runs 40,000–80,000 characters. The same page as clean Markdown is usually 5,000–15,000 characters. That's roughly 5–8x more content you can fit into a fixed context window, and the model spends its attention on actual words, not tag attributes.
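The multiplier comes from dividing the HTML size by the Markdown size at each end of the quoted ranges. A quick check in Python, using only the figures above (the function name is made up for this example, and character counts are only a rough proxy for tokens):

```python
def size_ratio(html_chars: int, markdown_chars: int) -> float:
    """How many times larger the raw HTML is than the Markdown export."""
    return html_chars / markdown_chars


# Pair the low ends and the high ends of the two ranges quoted above.
low_end = size_ratio(40_000, 5_000)     # 8.0
high_end = size_ratio(80_000, 15_000)   # about 5.3
print(f"{high_end:.1f}x to {low_end:.1f}x")
```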
A note on responsible use of email extraction
Email extraction is one of those features that has genuinely useful applications and also obvious misuse potential. We want to be straightforward about where we stand on this.
The legitimate cases are real. Auditing your own site for exposed addresses that bots can harvest. Compiling a contact list from a conference speaker directory where all the information is publicly posted by the organizers. Collecting contributor emails from an open source project's GitHub contributors page when you're reaching out about a security disclosure. These are things developers actually need to do.
The tool works on a per-page, per-use basis. You visit a page, you run the extraction, you get the emails on that page. There's no crawling, no bulk scraping across thousands of pages, no data stored outside your current browser session. When you close the panel, the results are gone. This is an intentional design constraint, not a limitation we're planning to remove.
Using extracted emails for unsolicited mass outreach isn't a use case we support. Beyond being generally unwelcome, it can violate laws such as CAN-SPAM in the US, GDPR in the EU, or equivalent legislation elsewhere. The tool works per-page because that maps to the legitimate use cases. If your use case requires bulk collection at scale, this isn't the right tool, and honestly, you should think carefully about whether the use case is one you want to pursue.
Why browser extensions win for data extraction
Desktop scraping apps and server-side crawlers have a structural problem: they re-fetch the page. They send an HTTP request, get back HTML, and parse that. For static pages with no authentication, this is fine. For the modern web, it misses most of what matters.
A significant portion of pages that contain interesting data are JavaScript-rendered. The initial HTML response is essentially an empty shell — a script tag and a root div. The actual content populates after the JS runs. A server-side scraper sees the shell. A browser extension sees the finished page, because it runs after the browser has already executed all the JavaScript and rendered the full DOM.
Authentication is the other issue. You're logged into a SaaS dashboard. The data you want is behind that login. A desktop app has to handle your session credentials, manage cookies, potentially solve CAPTCHA challenges — it's a whole separate authentication layer to build and maintain. The browser extension runs in your existing authenticated session. You're already logged in. The extension reads the DOM you're already looking at. No credential management, no session handling, no re-authentication.
This is why we built the extraction features in onHover as a Chrome extension rather than as a standalone tool. The browser is doing the hard work — rendering, JavaScript execution, authentication — and we're just reading the result. For structured data extraction from the modern web, that architecture is genuinely better than alternatives that try to replicate the browser's work from the outside.