Canonical Tags, Hreflang, and Duplicate Content
Session 7.7 · ~5 min read
Duplicate content is one of the most common technical SEO problems, and one of the most damaging to entity authority. When multiple URLs on your site (or across the web) serve the same or substantially similar content, Google must decide which version to index. If it chooses the wrong one, your entity signals on the preferred page get diluted or ignored entirely.
Canonical tags, hreflang attributes, and proper URL management are the tools you use to tell Google which version of your content is the authoritative one. They focus your entity signals on the right pages instead of spreading them thin across duplicates.
How Duplicate Content Affects Entity Signals
Diagram: how duplicate URLs affect entity signals.
- Page A: has Organization schema, entity description, sameAs.
- Page B: /about/index.html. Same content, same schema.
- Page C: /about/?ref=email. Same content with URL parameter.
- All three lead to "Google finds duplicate content," which raises the question "Which URL is canonical?"
- If you specify: canonical tag on A, entity signals consolidated, strong entity signal on the preferred URL.
- If Google guesses: Google picks B or C, entity signals may be split, diluted entity signal across multiple URLs.
In the diagram above, three URLs serve the same About page content. Without a canonical tag, Google must guess which URL is the "real" one. If it guesses wrong, the entity signals on your preferred URL may not be the ones Google indexes.
Key concept: Canonical tags do not prevent crawling. They tell Google which URL should be the indexed version. Google treats the canonical tag as a strong hint, not an absolute directive. But in most cases, Google respects a properly implemented canonical tag.
Common Canonicalization Issues
| Issue | Example | Entity Impact | Fix |
|---|---|---|---|
| www vs. non-www | https://example.com and https://www.example.com both serve content | Entity signals split between two hostnames | 301 redirect one to the other. Set canonical on all pages. |
| HTTP vs. HTTPS | http://example.com and https://example.com both accessible | Entity signals split. Also a security issue. | 301 redirect HTTP to HTTPS. |
| Trailing slash vs. no trailing slash | /about/ and /about both serve content | Minor signal dilution | Choose one format, redirect the other. Set canonical. |
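| URL parameters | /about/?utm_source=email and /about/ serve same content | Tracking parameters create duplicate URLs | Set canonical to the clean URL (without parameters) and use the clean URL in internal links and the sitemap. Google Search Console's URL Parameters tool was retired in 2022, so it is no longer an option. |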
| Index file variations | /about/, /about/index.html, /about/index.php all serve same page | Multiple indexed versions of entity pages | Redirect all variations to one canonical URL. |
| Pagination duplicates | /blog/ and /blog/page/1/ serve identical content | Minor. Affects blog pages more than entity pages. | Canonical /blog/page/1/ to /blog/, or 301 redirect it. Do not rely on rel="next"/"prev": Google announced in 2019 that it no longer uses them for indexing. |
| Print or AMP versions | /about/ and /about/print/ or /about/amp/ serve same content | Duplicate entity content across versions | Canonical to the main version. |
| Case sensitivity | /About/ and /about/ treated as different URLs by some servers | Duplicate content with different URLs | Standardize to lowercase. Redirect uppercase variations. |
| Session IDs in URLs | /about/?sessionid=abc123 creates unique URL per visitor | Potentially infinite duplicate URLs | Remove session IDs from URLs. Use cookies instead. |
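Several of the fixes in the table amount to one rule: every variation should map to a single URL. The sketch below shows that mapping in Python under an assumed policy (HTTPS, non-www, lowercase paths, trailing slash, tracking parameters removed). The host, the parameter names, and the `canonicalize` function are illustrative choices, not a standard, and the output is only a starting point for redirect rules or canonical tags.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical policy choices; adjust to match your own site.
CANONICAL_HOST = "example.com"          # non-www chosen for illustration
TRACKING_PREFIXES = ("utm_",)           # parameter name prefixes to drop
TRACKING_PARAMS = {"ref", "sessionid"}  # exact parameter names to drop
INDEX_FILES = ("index.html", "index.php")


def canonicalize(url: str) -> str:
    """Map a URL variation to the single canonical form under the policy above."""
    parts = urlsplit(url)

    # www vs. non-www and HTTP vs. HTTPS: one host, always https.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host != CANONICAL_HOST:
        raise ValueError(f"unexpected host: {parts.netloc}")

    # Case sensitivity: lowercase the path. This assumes no path on the
    # site depends on uppercase characters.
    path = parts.path.lower() or "/"

    # Index file variations: /about/index.html -> /about/
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Trailing slash: this sketch assumes directory-style URLs ending in "/".
    # File-like paths (for example a PDF) would need an exception.
    if not path.endswith("/"):
        path += "/"

    # URL parameters and session IDs: drop tracking keys, keep the rest.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(("https", CANONICAL_HOST, path, urlencode(kept), ""))


print(canonicalize("http://WWW.Example.com/About/index.html?utm_source=email"))
```

Parameters that change the content (such as a page number) are deliberately kept, because stripping them would point distinct pages at the same canonical URL.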
Implementing Canonical Tags
A canonical tag is an HTML element placed in the <head> section of a page that specifies the preferred URL for that content.
<link rel="canonical" href="https://example.com/about/" />
Every page on your site should have a self-referencing canonical tag (pointing to its own URL) at minimum. This tells Google "this is the definitive URL for this content" even when no duplicate exists. It is a preventive measure.
For pages that are duplicates of another page, the canonical tag should point to the original:
<!-- On the duplicate page /about/?ref=email -->
<link rel="canonical" href="https://example.com/about/" />
Canonical Tag Rules
| Rule | Explanation |
|---|---|
| Every page needs a canonical tag | Self-referencing canonicals prevent future duplicate issues |
| Use absolute URLs | Always include the full URL with protocol and domain |
| Canonical must return 200 status | Do not canonical to a page that redirects or returns a 404 |
| Canonical must be indexable | Do not canonical to a noindex page |
| One canonical per page | Multiple canonical tags confuse Google. Use only one. |
| Match canonical with sitemap | The URL in your sitemap should be the canonical version |
| Match canonical with internal links | Link to the canonical URL, not duplicate variations |
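Some of these rules can be checked by reading a page's HTML. The following is a minimal sketch using only the Python standard library. It covers "one canonical per page" and "use absolute URLs," and compares the tag against the URL you expect. It does not fetch anything, so the 200-status and indexability rules still need a separate check. The names `CanonicalCollector` and `check_canonical` are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.hrefs.append(attr.get("href") or "")


def check_canonical(html: str, expected_url: str) -> list[str]:
    """Return the problems found by static inspection of one page's HTML."""
    collector = CanonicalCollector()
    collector.feed(html)
    hrefs = collector.hrefs

    if not hrefs:
        return ["no canonical tag found"]

    problems = []
    if len(hrefs) > 1:
        problems.append(f"{len(hrefs)} canonical tags found; use only one")

    # Judge the first tag only; any extra tags were already reported above.
    target = urlsplit(hrefs[0])
    if not (target.scheme and target.netloc):
        problems.append("canonical is not an absolute URL")
    elif hrefs[0] != expected_url:
        problems.append(f"canonical is {hrefs[0]}, expected {expected_url}")
    return problems


page = '<html><head><link rel="canonical" href="/about/"></head></html>'
print(check_canonical(page, "https://example.com/about/"))
```

For a self-referencing canonical, pass the page's own URL as `expected_url`. For a known duplicate, pass the URL of the original.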
Hreflang for Multilingual Sites
If your entity operates in multiple languages or regions, hreflang tags tell Google which language version of a page to show to which users. Without hreflang, Google may show the English version to French users, or the US version to UK users.
<link rel="alternate" hreflang="en" href="https://example.com/about/" />
<link rel="alternate" hreflang="id" href="https://example.com/id/about/" />
<link rel="alternate" hreflang="x-default" href="https://example.com/about/" />
Hreflang implementation is complex and error-prone. The most common mistakes:
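- Missing return tags: If page A has an hreflang pointing to page B, page B must have an hreflang pointing back to page A. Each page should also list itself among its alternates.
- Wrong language codes: Use ISO 639-1 language codes (en, id, fr, de), optionally followed by an ISO 3166-1 alpha-2 region code (en-US, en-GB). A region code cannot be used on its own, and the code for the United Kingdom is GB, not UK.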
- Missing x-default: The x-default tag tells Google which page to show when no language match exists.
- Hreflang on non-canonical URLs: Hreflang tags should only reference canonical URLs.
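The first three mistakes can be checked mechanically once the hreflang annotations have been collected. The sketch below assumes they are already in a dictionary that maps each page URL to its `{code: alternate URL}` pairs. Collecting that data and the `audit_hreflang` name are left as assumptions. The code check only tests the shape of a code (two-letter language, optional two-letter region, or x-default): it does not confirm the code is registered, and it rejects script subtags such as zh-Hant. Whether each URL is canonical is not checked here.

```python
import re

# Shape check only: "xx", "xx-YY", or "x-default".
HREFLANG_SHAPE = re.compile(r"^(?:x-default|[a-z]{2}(?:-[a-z]{2})?)$", re.IGNORECASE)


def audit_hreflang(pages: dict[str, dict[str, str]]) -> list[str]:
    """pages maps each page URL to its {hreflang code: alternate URL} annotations."""
    problems = []
    for page, alternates in pages.items():
        # Wrong language codes.
        for code in alternates:
            if not HREFLANG_SHAPE.match(code):
                problems.append(f"{page}: malformed code {code!r}")

        # Missing x-default.
        if "x-default" not in alternates:
            problems.append(f"{page}: missing x-default")

        # Missing return tags: every other page this one points to must
        # list this page among its own alternates.
        for target in sorted(set(alternates.values()) - {page}):
            if page not in pages.get(target, {}).values():
                problems.append(f"{page}: no return tag from {target}")
    return problems


en = "https://example.com/about/"
idn = "https://example.com/id/about/"
annotations = {
    en: {"en": en, "id": idn, "x-default": en},
    idn: {"id": idn},  # forgot the English alternate and x-default
}
for problem in audit_hreflang(annotations):
    print(problem)
```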
For entity authority, hreflang ensures that your entity signals reach the right audience. If you have an Indonesian About page and an English About page, each with appropriate schema markup, hreflang tells Google to show the Indonesian version to Indonesian users and the English version to English users.
Diagram: hreflang decision flow. Two users search the brand name, one of them in the US. If hreflang is configured, Google serves the matching version: /id/about/ with Indonesian entity signals, or /about/ with English entity signals for the user in the US. If hreflang is not configured, Google guesses and may serve the English page.
Only 55% of websites implement self-referencing canonical tags, and only 38% properly handle URL parameters. If you implement all five practices in the chart above, your entity signals will be significantly more focused than the average website.
Auditing Your Canonicalization
To audit your site's canonicalization:
- View the source of every entity-critical page. Check for a <link rel="canonical"> tag.
- Verify the canonical URL uses the correct protocol (https://), domain (www or non-www, whichever you chose), and path (with or without trailing slash).
- Try accessing your entity pages with different URL variations (www, non-www, with and without trailing slash, with a random parameter like ?test=1). Each variation should either redirect to the canonical URL or carry a canonical tag pointing to it.
- Check Google Search Console's "Duplicate" entries in the Pages report. These show where Google has found duplicates and which canonical it selected.
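The URL-variation step is easier to repeat if the variations are generated rather than typed by hand. This sketch builds the list for one canonical URL. The `url_variations` helper is hypothetical, and `?test=1` is the same arbitrary parameter suggested in the list above. It only builds the URLs. Requesting each one and checking for a redirect or a canonical tag is a separate step.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variations(canonical: str) -> list[str]:
    """List non-canonical spellings of a URL that should resolve to it."""
    parts = urlsplit(canonical)
    host = parts.netloc
    other_host = host[4:] if host.startswith("www.") else "www." + host

    path = parts.path or "/"
    if path == "/":
        other_path = path  # the root path has no slashless form
    elif path.endswith("/"):
        other_path = path.rstrip("/")
    else:
        other_path = path + "/"

    candidates = [
        urlunsplit(("http", host, path, "", "")),              # protocol
        urlunsplit((parts.scheme, other_host, path, "", "")),  # www / non-www
        urlunsplit((parts.scheme, host, other_path, "", "")),  # trailing slash
        urlunsplit((parts.scheme, host, path, "test=1", "")),  # random parameter
    ]

    # Drop the canonical itself and any repeats, keeping the order above.
    seen = {canonical}
    result = []
    for url in candidates:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


for url in url_variations("https://example.com/about/"):
    print(url)
```

Each printed URL should either return a 301 to the canonical URL or serve a page whose canonical tag points to it.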
Further Reading
- Google. "Consolidate Duplicate URLs." Google Search Central. developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls
- Google. "Tell Google About Localized Versions of Your Page." Google Search Central. developers.google.com/search/docs/specialty/international/localized-versions
- Mueller, John. "Canonical Tags and Duplicate Content." Google Search Central YouTube. youtube.com/googlewebmasters
- Moz. "The Canonical Tag: A Comprehensive Guide." moz.com/learn/seo/canonicalization
Assignment
- View the source of your homepage, About page, Contact page, and Services page. Does each have a self-referencing canonical tag? If not, add one to each page.
- Test URL variations for your homepage: try with and without www, with and without trailing slash, and with a random URL parameter (?test=1). For each variation, check: does it redirect to the canonical URL, or does it serve duplicate content?
- Open Google Search Console and check the Pages report for any duplicate content issues. Record the number of pages flagged as "Duplicate without user-selected canonical" and "Duplicate, submitted URL not selected as canonical."
- If your site has multiple language versions, audit your hreflang implementation. Verify that return tags exist on every referenced page and that all hreflang URLs are canonical.
- Create a canonicalization policy document for your site: which URL format is canonical (www or non-www, trailing slash or not), how URL parameters should be handled, and which pages need explicit canonical tags.