
What pages exist on my site that aren't linked from anywhere?

Pages that exist but aren’t linked from anywhere are URLs that return a normal response (e.g. 200) but never appear as the target of a link on your site. To list them you need two things: a full set of “existing” URLs (e.g. from your sitemap or a crawl) and the set of URLs that are linked. Whatever is in the first set but not the second is your answer. This guide explains how to build both sets, how to avoid common errors, and what to do with the list: improving SEO, cleaning up, or documenting the site.

Why “pages that aren’t linked from anywhere” matters

  • SEO — Unlinked pages get no internal link equity. Google may still find them via sitemap or external links, but they’re not reinforced by your own site structure. Linking to important ones can help; removing or redirecting obsolete ones keeps the site tidy.
  • Cleanup — Many unlinked pages are legacy, test, or duplicate. A full list supports redirects, consolidation, and removal.
  • Audits and handoffs — When selling, documenting, or handing off a site, “pages that exist but aren’t linked” is a clear, actionable deliverable.

What you’re actually measuring

“Aren’t linked from anywhere” here means: no other page on the same site links to this URL. So:

  • Counted as unlinked: URLs that appear in your sitemap or were discovered by crawl but never appear as the target of an <a href="..."> (or equivalent) on any page you’ve crawled on that site.
  • Not counted as unlinked: the homepage and other entry points you start the crawl from (you reach them directly, so they don’t need an inbound link). Also, links from other sites don’t count; only internal links matter here.

You need a full URL set (“every URL we consider on the site”) and a linked URL set (“every same-site URL that appears as a link target”). Unlinked = full set − linked set.

Step 1: Build the full URL set

Your goal is a complete list of URLs that “exist” on the site. The most reliable base is the sitemap.

Get sitemap URLs

  • Read robots.txt for every Sitemap: line. Also try common paths: /sitemap.xml, /sitemap_index.xml, /sitemap-index.xml.
  • Fetch each sitemap. If it’s an index (references other sitemaps), fetch those too. Extract every <loc> URL.
  • Normalize: pick one form per URL (e.g. origin + pathname, no fragment, one rule for trailing slash). Deduplicate. This set is “every URL we know about” from the sitemap.
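The robots.txt and sitemap parsing above can be sketched with only the standard library. This is a minimal sketch: the function names are my own, and fetching each file over the network is left to the caller, so the parsing logic stays testable.

```python
import xml.etree.ElementTree as ET

# Sitemap documents live in this XML namespace.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return every URL named on a 'Sitemap:' line of robots.txt."""
    urls = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

def locs_from_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Parse one sitemap document.

    Returns (page_urls, child_sitemap_urls). For a <sitemapindex>,
    the page list is empty and the child list holds the referenced
    sitemaps; for a <urlset>, the reverse.
    """
    root = ET.fromstring(xml_text)
    locs = [e.text.strip() for e in root.iter(NS + "loc")]
    if root.tag == NS + "sitemapindex":
        return [], locs
    return locs, []
```

To build the full set, fetch each child sitemap returned by `locs_from_sitemap` and recurse until only page URLs remain, then normalize and deduplicate.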

Optional: add URLs from a crawl

If you crawl the site by following links, you’ll discover more URLs (e.g. old or forgotten pages not in the sitemap). Add those to the full set so you’re not only checking “sitemap URLs that aren’t linked”—you’re also catching “crawled URLs that aren’t linked.” For many audits, sitemap-only is enough; for a full picture, sitemap + crawl is better.

Step 2: Crawl and collect every linked URL

You need every same-site URL that appears as a link target on pages you’ve crawled.

Where to start the crawl

  • Start at the homepage and other important entry URLs (main section indexes, key landing pages). Those are the “roots” of your link graph.
  • For each HTML page, parse all links: <a href="..."> and any other attributes you use (e.g. data-href, href in other elements). Resolve relative URLs to absolute using the page’s URL. Keep only same-site URLs (same scheme and host; decide whether www and non-www are the same site and normalize accordingly).
  • Add each linked URL to a set. Use the same normalization as the full set (same trailing-slash rule, no fragment). Crawl in waves (e.g. breadth-first): take all new same-site URLs from the current batch, add them to the queue, fetch those pages, repeat. Stop when no new same-site links appear or when you hit a limit.
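The breadth-first crawl above can be sketched like this. It is a sketch, not a production crawler: `fetch_html` (returns a page’s HTML) and `normalize` (your canonical-form rule) are assumptions injected by the caller, which also keeps the traversal logic testable offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def crawl_linked_set(roots, fetch_html, normalize):
    """BFS from the root URLs; return every same-site URL that
    appears as a link target on any crawled page."""
    site = urlparse(normalize(roots[0])).netloc
    queue = deque(normalize(r) for r in roots)
    visited, linked = set(), set()
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        parser = LinkParser()
        parser.feed(fetch_html(page))
        for href in parser.hrefs:
            # Resolve relative hrefs against the page URL, then normalize.
            url = normalize(urljoin(page, href))
            if urlparse(url).netloc != site:
                continue  # internal links only
            linked.add(url)
            if url not in visited:
                queue.append(url)
    return linked
```

Note that the roots themselves end up in `linked` only if some page actually links to them, which matches the definition in Step 3.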

Why normalization matters

If the full set has https://example.com/page and the crawl records https://example.com/page/, they’re the same page but different strings. Pick one canonical form (e.g. strip trailing slash everywhere) and apply it to both sets. Otherwise you’ll wrongly count pages as “unlinked” when they are linked under a slightly different URL.
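One possible canonical form is sketched below. The exact rules, including whether to fold www into the bare host, are a per-site choice; treat this as an assumption to adapt, not the one correct policy.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """One canonical form per URL: lowercase the host, drop the
    fragment, strip the trailing slash (except the bare root).
    Folding www into the bare host is a site-specific assumption."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same site
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))
```

Apply the same function to both the full set and the linked set, so `/page` and `/page/` never count as different pages.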

Step 3: Subtract to get unlinked pages

  • Unlinked = full URL set − linked set.
  • Optionally remove the homepage and known entry points (e.g. login, signup) that you don’t expect to be linked from elsewhere.
  • The result is: “These pages exist on my site but aren’t linked from anywhere.”
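The subtraction itself is a plain set difference (the helper name and the `entry_points` parameter are my own):

```python
def unlinked_pages(full_set, linked_set, entry_points=()):
    """Full URL set minus linked set, minus any entry points
    (homepage, login, etc.) you expect to be unlinked anyway."""
    return set(full_set) - set(linked_set) - set(entry_points)
```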

Step 4: Act on the list

  • Link — If the page should be discoverable, add a relevant in-link from a suitable page (category, hub, or related content).
  • Redirect or remove — If the page is obsolete, redirect to the right URL or remove it and drop it from the sitemap.
  • Leave as-is — Some pages (thank-you, one-off campaigns, legal) may stay unlinked on purpose; document them so the next person knows.

Re-run after big content or structure changes.

Common mistakes

  • Only using sitemap as full set — You’ll only find “sitemap URLs that aren’t linked.” That’s useful, but you’ll miss unlinked pages that aren’t in the sitemap. Add crawl-discovered URLs if you want full coverage.
  • HTML-only crawl on a JS-heavy site — If most links are injected by JavaScript, an HTML-only crawler undercounts links and overcounts “unlinked” pages. Use a JS-aware crawler or accept that the list is “not linked in raw HTML.”
  • Inconsistent normalization — Different trailing-slash or query-string handling between the two sets produces wrong results. Use one canonical form for both.
  • Treating external links as internal — Only same-site links count. A page linked only from another domain is still “unlinked” from your site.

Frequently asked questions

What’s the difference between “unlinked” and “orphan”?
In this guide they’re the same: a page that exists but isn’t linked from anywhere on your site. Some tools use “orphan” or “unlinked from nav” to mean the same thing.

Will Google index unlinked pages?
Yes, if it discovers them (e.g. via sitemap or old backlinks). Unlinked doesn’t mean “not indexed”; it means “no internal link from your site.”

Should I remove unlinked pages from the sitemap?
Only if they shouldn’t exist or shouldn’t be indexed. If the page is valuable, add a link. If it’s obsolete, redirect or remove it and then remove it from the sitemap.

How often should I check?
After major launches, migrations, or CMS changes. For large sites, a quarterly or semi-annual run is often enough.

Get the list without building a crawler

Comparing the sitemap (and an optional crawl) to the linked set requires a crawler and consistent normalization. A tool that does this and reports “unlinked from nav” or “not linked from anywhere” gives you the same list without scripting.

Hidden Pages does this: enter your site URL, run a scan, and get a report that includes URLs that exist (from sitemap or crawl) but aren’t linked from the main site—so you can see every such page in one place.

Summary

To find “what pages exist on my site that aren’t linked from anywhere”: build a full URL set (sitemap + optional crawl), crawl the site to collect every same-site link target, normalize both sets the same way, then take the difference. That’s your list. Use it to add links, redirect, or remove. Avoid HTML-only crawl on JS-heavy sites and inconsistent normalization. For a ready-made list, use a scanner that reports unlinked pages.

Run a free scan →