Indexing: Being Found vs. Being Stored
Session 3.4 · ~5 min read
Crawling and indexing are two different steps. Crawling means Google visited your page. Indexing means Google decided it was worth storing. Many website owners assume that if Google can find their pages, those pages will appear in search results. This is wrong. Google crawls billions of pages and actively rejects a significant portion of them.
The Crawling-to-Indexing Pipeline
```mermaid
flowchart TD
    A["URL Discovered"] --> B["Crawled"]
    B --> C{"Quality Check"}
    C -->|Pass| D["Indexed<br/>(appears in search)"]
    C -->|Fail| E["Not Indexed<br/>(crawled but rejected)"]
    C -->|Defer| F["Discovered<br/>(not yet crawled)"]
    style D fill:#2a2a28,stroke:#6b8f71,color:#ede9e3
    style E fill:#2a2a28,stroke:#c47a5a,color:#ede9e3
    style F fill:#2a2a28,stroke:#c8a882,color:#ede9e3
    style C fill:#2a2a28,stroke:#c8a882,color:#ede9e3
```
The quality check is where most pages fail. Google evaluates whether a page provides unique value, whether it has sufficient content, whether the site has enough authority to justify indexing the page, and whether the page adds something that existing indexed pages do not already cover.
The Two Problem Statuses
Google Search Console reports two distinct "not indexed" statuses that website owners frequently confuse:
| Status | What It Means | Why It Happens | Severity |
|---|---|---|---|
| Discovered, currently not indexed | Google knows the URL exists but has not crawled it yet | Low crawl priority. Google does not think the page is worth crawling now. | Medium. The page may be crawled later, or never. |
| Crawled, currently not indexed | Google crawled the page and decided not to index it | Google saw the content and rejected it. Thin content, duplicate content, or low entity authority. | High. Google explicitly rejected this page. |
"Crawled, currently not indexed" is Google telling you: "We looked at this page and it was not worth storing." This is not a technical error. It is a quality judgment.
Why Google Refuses to Index Pages
Google's indexing standards have tightened significantly. Following the May 2025 quality review, sites relying on mass-produced, thin, or AI-generated content without editorial oversight saw severe drops in crawl priority and indexing rates. Pages showcasing genuine firsthand experience saw visibility improve, while generic content dropped substantially.
The most common reasons for indexing rejection:
- Thin content: The page has too little text to be useful. Under 300 words on a topic that competitors cover in 2,000 words.
- Duplicate content: The page says roughly the same thing as another page on your site or elsewhere on the web.
- No unique entity value: The page does not add anything that the existing index does not already have.
- Low site authority: The overall domain has insufficient entity signals for Google to justify indexing additional pages from it.
- Soft 404 behavior: The page returns a 200 status code but has almost no content, behaving like an error page.
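The last three checks above can be turned into a rough triage script. This is a sketch using illustrative word-count thresholds drawn from the bullets (300 words vs. a 2,000-word competitor benchmark); the `classify_page` function and its cutoffs are assumptions for demonstration, not published Google limits.

```python
def classify_page(status_code: int, word_count: int,
                  competitor_word_count: int = 2000) -> str:
    """Rough indexing-risk triage for a single page.

    Thresholds are heuristics from the rejection reasons above,
    not official Google cutoffs.
    """
    if status_code != 200:
        return "http-error"
    if word_count < 50:
        # Returns 200 but is nearly empty: behaves like a soft 404.
        return "soft-404 risk"
    if word_count < 300 and competitor_word_count >= 2000:
        # Far thinner than what already ranks for the topic.
        return "thin-content risk"
    return "ok"

print(classify_page(200, 20))    # soft-404 risk
print(classify_page(200, 250))   # thin-content risk
print(classify_page(200, 2400))  # ok
```

Running this against a crawl export of your own site surfaces the pages most likely to sit in the "crawled but rejected" bucket before Search Console tells you.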
The Entity Authority Connection
Here is where entity infrastructure directly affects indexing. Google allocates indexing budget partly based on entity authority. A recognized entity with a Knowledge Panel, consistent citations, and strong structured data gets more pages indexed because Google trusts the source. An unrecognized entity with no external corroboration gets fewer pages indexed because Google has no reason to trust that the content is authoritative.
This creates a feedback loop. Weak entity signals lead to fewer indexed pages. Fewer indexed pages mean fewer opportunities to build topical authority. Less topical authority further weakens entity signals. Breaking this loop requires building entity infrastructure first, then publishing content that Google is willing to index.
Diagnosing Your Indexing Health
In Google Search Console, navigate to the Pages report. The overview shows your total indexed pages and total non-indexed pages. Click into the non-indexed section to see the specific reasons for each excluded URL.
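If you export the non-indexed detail as CSV, you can tally the exclusion reasons programmatically. A minimal sketch, assuming columns named `url` and `reason` — rename these to match the headers in your actual Search Console export, which may differ:

```python
import csv
from collections import Counter

def tally_not_indexed(csv_path: str) -> Counter:
    """Count URLs per exclusion reason from a Pages report export.

    Assumes columns 'url' and 'reason'; adjust to the headers
    in your actual CSV.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row["reason"] for row in csv.DictReader(f))

# Demo with a tiny hypothetical export.
with open("not_indexed.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    w.writerow(["url", "reason"])
    w.writerow(["/a", "Crawled - currently not indexed"])
    w.writerow(["/b", "Crawled - currently not indexed"])
    w.writerow(["/c", "Discovered - currently not indexed"])

print(tally_not_indexed("not_indexed.csv"))
```

The counts map directly onto the two problem statuses from the table above, so a monthly export gives you a trend line rather than a one-off snapshot.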
Key metrics to track:
| Metric | Healthy Range | Problem Indicator |
|---|---|---|
| Index rate (indexed / total submitted) | Above 80% | Below 50% indicates systemic quality issues |
| "Crawled, not indexed" count | Under 10% of total pages | Above 30% indicates content quality rejection |
| "Discovered, not indexed" count | Under 20% (temporary for new pages) | Persistent for weeks means low crawl priority |
| Time from publish to index | Under 7 days | Over 30 days indicates authority deficit |
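The thresholds in this table can be checked automatically. A sketch that flags each problem indicator — the `index_health` function is hypothetical, and every cutoff is a rule of thumb from the table, not a Google guarantee:

```python
def index_health(indexed: int, submitted: int,
                 crawled_not_indexed: int, discovered_not_indexed: int,
                 days_to_index: float) -> list:
    """Flag problems per the rule-of-thumb thresholds above."""
    flags = []
    rate = indexed / submitted if submitted else 0.0
    if rate < 0.5:
        flags.append(f"index rate {rate:.0%}: systemic quality issues")
    if submitted and crawled_not_indexed / submitted > 0.3:
        flags.append("over 30% crawled-not-indexed: content rejection")
    if submitted and discovered_not_indexed / submitted > 0.2:
        flags.append("over 20% discovered-not-indexed: low crawl priority")
    if days_to_index > 30:
        flags.append("over 30 days to index: authority deficit")
    return flags

# Example site: 40 of 100 submitted pages indexed.
for flag in index_health(40, 100, 35, 15, 45):
    print(flag)
```

For this example site, three of the four indicators fire; only the "discovered, not indexed" share (15%) stays inside the healthy range.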
Improving Your Index Rate
The fix is not to request indexing for every rejected page. That treats the symptom. The fix is to address the underlying cause: improve content quality, strengthen entity signals, and ensure each page provides unique value that the existing index lacks.
Pages that Google has crawled and rejected need substantive improvement before they will be reconsidered. Adding a paragraph is not enough. The page needs to become the best available resource on its specific topic for your specific audience.
Further Reading
- Crawled, Currently Not Indexed: Meaning and Fixes - SEOTesting's guide to understanding and resolving indexing rejection.
- Why Pages Are "Crawled, Not Indexed" After May 2025 - Marsiglia Digital on the impact of Google's 2025 quality standards on indexing.
- Crawled, Currently Not Indexed: Why It Happens and How to Fix It - Index Machine on systematic approaches to indexing problems.
Assignment
In Google Search Console, go to Pages and review the "Not indexed" section.
- Count how many pages are "Crawled, currently not indexed." These are pages Google has explicitly rejected.
- Count how many pages are "Discovered, currently not indexed." These are pages Google has deprioritized.
- For the top five rejected pages, read the content and honestly assess: is this page better than what already exists in Google's index for the same topic?
- Calculate your index rate: indexed pages divided by total submitted pages. If below 50%, you have a systemic quality or authority problem.
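The final calculation is a single division. A worked example with hypothetical numbers (180 indexed out of 400 submitted):

```python
def index_rate(indexed: int, submitted: int) -> float:
    """Indexed pages divided by total submitted pages."""
    return indexed / submitted if submitted else 0.0

rate = index_rate(180, 400)
print(f"{rate:.0%}")  # 45%
if rate < 0.5:
    print("Below 50%: systemic quality or authority problem")
```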