This article explains what URL canonicalization is and how to use it.
As a website grows, many of its pages start to resemble each other, and on a large site some duplicate content is hard to avoid. When two pages are identical, search engines cannot tell which one should receive the traffic. To solve this problem, you can nominate the URL you prefer, called the canonical URL.
If you run a blog or a large website, you have probably met the duplicate URL problem: the more URLs a site has, the more likely it is that several of them serve the same content.
Canonical URLs address this problem, and this article covers what you need to know about them.
What is URL canonicalization?
URL canonicalization is the process of choosing one representative URL for a set of duplicate or near-duplicate pages.
When two pages are identical in every way and rank for the same keyword, search engines cannot tell which of the two should receive the traffic.
The URL you choose as the representative one is called the canonical URL.
How Google chooses the canonical URL
Google treats pages as duplicates when the same page can be reached through multiple URLs or when different pages contain the same information.
For example, the same page may have separate desktop and mobile versions. In this case, Google selects one URL as the canonical URL and crawls it, while the other URLs are treated as duplicates and crawled less often.
It is better to tell Google which URL you prefer than to let Google pick a canonical URL you did not want.
Multiple URLs of the same data
Canonical URLs are one technical approach to handling duplicate content. Consider the following example:
Let’s say one of your posts is filed under two categories and can therefore be reached at two URLs.
- If both URLs lead to the same post or product, you can select one of them as the canonical URL to tell Google or any other search engine which one to display in the search results.
- A canonical URL also helps search engines identify the original version of an article. If you wrote a post for someone else’s website and also want to publish it on your own, the canonical URL indicates which copy should be treated as the original.
How to find out the canonical URL
You can check for yourself whether someone has republished your website’s content and marked their own copy as the canonical one.
Copying of this kind is common today.
Some people present other people’s content as their own, and some authors sell the same article to others while also posting it on their own website.
Here’s how to determine which URL a page declares as canonical:
- Search the page’s source code for the rel="canonical" tag; its href attribute holds the canonical URL.
- The tag is read only by search engines; it does not affect or disturb your visitors in any way.
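The source-code check described above can also be scripted. The following is a minimal sketch using only the Python standard library; the class name, the function name, and the sample page are illustrative assumptions, not part of any SEO tool.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens, so split before testing.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html_text):
    """Return the first canonical URL declared in html_text, or None."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical page source used only for illustration.
sample_page = """
<html><head>
<title>Green dresses</title>
<link rel="canonical" href="https://example.com/dresses/green-dresses">
</head><body></body></html>
"""
print(find_canonical(sample_page))
```

If the function returns a URL on a domain you do not control, that page is telling search engines that the other site holds the original.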
Know when you can redirect URLs, and when you can use canonical
With a redirect, one of two equivalent URLs is retired, and requests for the retired URL are sent to the other URL.
Because anyone who clicks the retired URL ends up at the new URL, the old URL should eventually stop appearing in search results.
The forwarding happens on your server, not in the search engine: the search engine simply ranks URLs by keyword, and a user who clicks a redirected result finds that they have been sent to a different URL. It works much like call forwarding on a mobile phone.
When a mobile number is often switched off or out of network coverage, people forward its calls to another number.
With a canonical tag, by contrast, the user is not sent anywhere: both URLs remain reachable, and only search engines are told which one you prefer.
So if you want to keep the duplicate URL available to visitors while consolidating it in search, a canonical tag is the better option; if the duplicate URL is being retired, use a redirect.
How to specify a canonical with rel="canonical" and other methods
There are several ways to indicate your preferred canonical URL for duplicate or very similar pages to Google Search.
In descending order of how strongly they influence canonicalization, they are redirects, rel="canonical" link annotations, and sitemap inclusion.
Remember that these techniques can be combined to increase their effect.
The likelihood that your preferred canonical URL appears in search results increases if you use two or more methods.
Although we encourage you to use these techniques, none of them are required; your website will probably do fine without any canonical preference specified.
If you don’t specify a canonical URL, Google decides for itself which version of the URL to show users in search.
Note: Keep in mind that if you use a CMS like WordPress, Wix, or Blogger, you may not be able to edit your HTML directly. Instead, your CMS may have a search engine settings page or some other way to inform search engines about the canonical URL. Look for your CMS’s instructions on editing your page (for example, search for “WordPress Set the Canonical Element”).
Motives for defining a canonical URL
It isn’t usually necessary to specify a canonical preference for your URLs, but there are a few reasons why you might want to tell Google explicitly which page in a group of duplicate or similar pages is canonical:
- To consolidate signals for similar or duplicate pages. This helps search engines combine the signals they have for the different URLs (such as links to them) into a single, preferred URL, so that links from other websites to the duplicates count toward your preferred version.
- To simplify tracking metrics for a piece of content. Getting aggregated metrics for a particular piece of content is more difficult when it is spread across many URLs.
- To manage syndicated content. If you syndicate your content for publication on other domains, you may want a specific URL to appear in search results.
- To avoid spending crawl time on duplicate pages. Googlebot gets more out of your website when it spends its time crawling new (or updated) pages rather than multiple copies of the same content.
Follow these best practices for all canonicalization methods:
- Don’t use the robots.txt file for canonicalization purposes.
- Don’t use the URL removal tool for canonicalization. It hides every version of a URL from search results.
- Don’t specify different URLs as canonical for the same page using different canonicalization methods (for example, don’t list one URL in the sitemap and a different URL for the same page in rel="canonical").
- Don’t use noindex to prevent a page from being selected as canonical. The purpose of noindex is to exclude pages from the index, not to control which page is chosen as canonical.
- If you use hreflang elements, be sure to specify a canonical page in the same language, or in the best possible substitute language if no canonical page exists for the same language.
When creating internal links on your website, link to canonical URLs instead of duplicate URLs. Linking consistently to the URLs you consider canonical helps Google understand your preferences.
Various canonicalization techniques
The following overview contrasts the various canonicalization techniques, highlighting their benefits and drawbacks in terms of maintainability and effectiveness in various situations.
Method and description
1. rel="canonical" link element.
Add a link element to the code of every duplicate page, pointing to the canonical page.
Pros:
- Can map an unlimited number of duplicate pages.
Cons:
- Can increase the page size.
- On large websites, or websites where URLs change frequently, the mapping can be difficult to maintain.
- Works only for HTML pages, not for files such as PDFs. In those cases, you can use the rel="canonical" HTTP header.
2. rel="canonical" HTTP header.
Send a rel="canonical" header in the response for your page.
Pros:
- Does not increase the page size.
- Can map an unlimited number of duplicate pages.
Cons:
- On large websites, or websites where URLs change frequently, the mapping can be difficult to maintain.
3. Sitemap.
Indicate in the sitemap which pages are canonical.
Pros:
- Easy to do and maintain, especially on larger sites.
Cons:
- Google still needs to find the duplicates associated with each canonical you declare in the sitemap.
- A weaker signal to Google than the rel="canonical" mapping method.
4. Redirects.
Use redirects to tell Googlebot that the redirect target is a better version than the original URL. Use this only when retiring a duplicate page.
5. AMP variant.
If one of your variants is an AMP page, follow the AMP guidelines to indicate the canonical page and the AMP variant.
Use rel="canonical" link annotations
Google supports the rel="canonical" link annotation described in RFC 6596. You can provide the annotation in two different ways: as a rel="canonical" link element in the HTML of the page, or as a rel="canonical" HTTP header.
Both methods are supported, but using them together increases the potential for errors (for example, you may provide one URL in the HTTP header and another in the rel="canonical" link element), so it is safer to pick one.
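To illustrate the kind of mismatch mentioned above, here is a rough sketch that compares the canonical URL found in a Link header with the one found in the HTML. The function names are hypothetical, and the header parsing is deliberately simplified; it is not a complete parser for the Link header syntax.

```python
import re

# One link-value: <URL> followed by its parameters, up to the next comma.
_LINK_VALUE = re.compile(r'<([^>]*)>([^,<]*)')
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r'\brel\s*=\s*(?:"([^"]*)"|([^\s;"]+))', re.IGNORECASE)


def canonical_from_link_header(header_value):
    """Return the URL tagged rel="canonical" in a Link header value, or None."""
    for url, params in _LINK_VALUE.findall(header_value):
        rel = _REL_PARAM.search(params)
        if rel is None:
            continue
        tokens = (rel.group(1) or rel.group(2) or "").lower().split()
        if "canonical" in tokens:
            return url
    return None


def canonical_conflict(header_value, html_canonical):
    """Return (header_url, html_url) when both are set and differ, else None."""
    header_url = canonical_from_link_header(header_value or "")
    if header_url and html_canonical and header_url != html_canonical:
        return (header_url, html_canonical)
    return None


# Illustrative values: the header and the HTML disagree about the canonical URL.
print(canonical_conflict(
    '<https://example.com/a>; rel="canonical"',
    "https://example.com/b",
))
```

A non-empty result means the page sends two different canonical signals, which is exactly the situation to avoid.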
The rel="canonical" link element
The rel="canonical" link element, also known as the canonical element, is placed in the head section of the HTML to indicate that another page is representative of the content on the current page.
Suppose you want https://example.com/dresses/green-dresses to be the canonical URL, even though many URLs can reach the same content. Follow these steps to designate it as the canonical URL.
1. In the head section of every duplicate page, include a link element with the rel="canonical" attribute that points to the canonical page, for example: <link rel="canonical" href="https://example.com/dresses/green-dresses" />
2. If the canonical page has a mobile version at a different URL, add a rel="alternate" link element to it, pointing to the mobile version of the page.
3. Add any hreflang or other elements that are appropriate for the page.
Use absolute paths rather than relative paths in the rel="canonical" link element. Google supports relative paths, but we don’t recommend them because they can cause problems down the road (for example, if you inadvertently allow your test site to be crawled).
The rel="canonical" link element is only accepted if it appears in the head section of the HTML, so make sure that at least the head section is valid HTML.
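If your templates are generated by code, the absolute-URL advice can be enforced when the element is rendered. This is a small sketch, assuming a hypothetical helper named canonical_link_element; it is not part of any CMS or framework.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link_element(url):
    """Render a rel="canonical" link element, rejecting relative URLs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {url!r}")
    # Escape the URL so characters such as & are valid inside the attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(canonical_link_element("https://example.com/dresses/green-dresses"))
```

Raising an error for a relative path is a design choice for this sketch; a template could instead resolve the path against a configured site root.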
The rel="canonical" HTTP header
If you can change your server’s configuration, you can use a rel="canonical" HTTP header instead of the HTML element to indicate the canonical URL for any document type that Search supports, including non-HTML documents such as PDF files.
Google currently supports this method for web search results only.
If you publish content in multiple file formats, such as PDF and Microsoft Word, each with its own URL, you can return a rel="canonical" HTTP header to tell Googlebot which URL is canonical for the non-HTML files.
For example, to indicate that the PDF version of a document should be treated as canonical, you can add the following HTTP header to the response for the .docx version:
HTTP/1.1 200 OK
Link: <https://www.example.com/downloads/white-paper.pdf>; rel="canonical"
As with the link element, use an absolute URL in the rel="canonical" HTTP header and, per RFC 2616, use only double quotes (not single quotes) in the header value.
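On the server side, the header shown above could be produced by a small helper. The sketch below is an illustration only: the function names are hypothetical, and the .docx path in the mapping is an assumption, since only the PDF URL is given here.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from a duplicate's path to its canonical URL.
# The .docx path is an assumption made for this sketch.
CANONICAL_FOR = {
    "/downloads/white-paper.docx": "https://www.example.com/downloads/white-paper.pdf",
}


def canonical_link_header(url):
    """Return a (name, value) pair for a rel="canonical" Link header."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"canonical URL must be absolute: {url!r}")
    if any(ch in url for ch in '<>" '):
        raise ValueError(f"URL contains characters that need encoding: {url!r}")
    # Angle brackets wrap the URL; the rel value uses double quotes.
    return ("Link", f'<{url}>; rel="canonical"')


def extra_headers(path):
    """Return the extra response headers to send for path (possibly empty)."""
    canonical = CANONICAL_FOR.get(path)
    return [canonical_link_header(canonical)] if canonical else []


print(extra_headers("/downloads/white-paper.docx"))
```

How the returned pair is attached to the response depends on your web server or framework.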
Use a sitemap
Choose a canonical URL for each of your pages and list those URLs in a sitemap.
All pages listed in a sitemap are suggested as canonicals; Google decides which pages (if any) are duplicates based on how similar their content is.
On a large site, listing the preferred canonical URLs in a sitemap is an easy way to define canonicals.
Sitemaps are also a useful way of telling Google which pages on your website matter most to you.
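As a rough illustration, a sitemap that lists only canonical URLs can be generated with the Python standard library. The function name and the example URLs are assumptions for this sketch, and the xml_declaration argument requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(canonical_urls):
    """Return sitemap XML (as bytes) with one <url><loc> entry per unique URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Deduplicate and sort so each canonical URL appears exactly once.
    for url in sorted(set(canonical_urls)):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Illustrative URLs; list only the versions you want treated as canonical.
sitemap_xml = build_sitemap([
    "https://example.com/dresses/green-dresses",
    "https://example.com/dresses/red-dresses",
])
print(sitemap_xml.decode("utf-8"))
```

The duplicate URLs are simply left out: the sitemap carries the preference by listing only the versions you want indexed.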
Use redirects
To retire duplicate URLs with the fastest effect, use 3xx HTTP (also known as server-side) redirects.
Let’s say your page can be reached at several URLs.
Choose one of those URLs as your canonical URL, and then use redirects to send traffic from the other URLs to it.
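The redirect decision itself can be expressed as a small function. This is a sketch under stated assumptions: the canonical host and the duplicate host names are hypothetical, and a real site would normally configure this in the web server rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"  # assumed preferred host
DUPLICATE_HOSTS = {"example.com", "home.example.com"}  # hypothetical duplicates


def redirect_for(url):
    """Return (301, canonical_url) if url should be redirected, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != CANONICAL_HOST and host not in DUPLICATE_HOSTS:
        return None  # not one of our hosts, leave it alone
    if host == CANONICAL_HOST and parts.scheme == CANONICAL_SCHEME:
        return None  # already the canonical form
    # Keep the path and query string; the fragment never reaches the server.
    target = urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    return (301, target)


print(redirect_for("http://example.com/dresses?size=m"))
```

Status code 301 is used here because it marks the move as permanent.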
In addition to the methods you specify explicitly, Google also relies on a set of canonicalization signals that are generally based on the site’s setup, such as preferring HTTPS over HTTP and preferring URLs in hreflang clusters.
Prefer HTTPS over HTTP for canonical URLs
Google prefers HTTPS pages over the equivalent HTTP pages as canonical, except when there are issues or conflicting signals such as the following:
- The HTTPS page has an invalid SSL certificate.
- The HTTPS page contains insecure dependencies (other than images).
- The HTTPS page redirects users to or through an HTTP page.
- The HTTPS page has a rel="canonical" link to the HTTP page.
Although Google’s systems prefer HTTPS pages by default, you can reinforce this behavior by doing any of the following:
- Add redirects from the HTTP page to the HTTPS page.
- Add a rel="canonical" link from the HTTP page to the HTTPS page.
- Implement HSTS.
To prevent Google from incorrectly making the HTTP page canonical, avoid the following:
- Avoid bad TLS/SSL certificates and HTTPS-to-HTTP redirects, because they cause Google to prefer HTTP very strongly. Implementing HSTS cannot override this strong preference.
- Don’t list the HTTP versions of your pages, rather than the HTTPS versions, in your sitemap or hreflang annotations.
- Don’t deploy your SSL/TLS certificate for the wrong host variant, for example example.com serving a certificate issued for one of its subdomains. The certificate must either match the full host name of your site or be a wildcard certificate that covers multiple subdomains of a domain.
URLs with hreflang clusters are preferred
To support site localization efforts, Google prefers URLs that are part of hreflang clusters when choosing canonicals.
For example, if
https://example.com/de-de/cats and https://example.com/de-ch/cats
point to each other with reciprocal hreflang annotations, but not to the /de-at/ page, the de-de and de-ch pages will be preferred as canonical over the /de-at/ page, which is not part of the hreflang cluster.
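The idea of a reciprocal hreflang cluster can be sketched as a simple lookup. The data structure, the function name, and the full /de-at/ URL below are assumptions for illustration; this shows what "mutual" means and does not describe how Google computes clusters.

```python
def reciprocal_cluster(hreflang_links, start_url):
    """Return start_url plus every URL that it links to and that links back."""
    cluster = {start_url}
    for other in hreflang_links.get(start_url, set()):
        if start_url in hreflang_links.get(other, set()):
            cluster.add(other)
    return cluster


DE_DE = "https://example.com/de-de/cats"
DE_CH = "https://example.com/de-ch/cats"
DE_AT = "https://example.com/de-at/cats"  # assumed URL for the /de-at/ page

# Page URL -> set of URLs it names in its hreflang annotations.
hreflang_links = {
    DE_DE: {DE_CH},
    DE_CH: {DE_DE},
    DE_AT: set(),  # carries no hreflang annotations
}

print(sorted(reciprocal_cluster(hreflang_links, DE_DE)))
```

Here the de-de and de-ch pages form a cluster, while the de-at page stays outside it because nothing links to it and it links to nothing.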
Fix canonicalization issues
Use the URL Inspection tool to find out which page Google considers canonical.
Even if you explicitly specify a canonical page, Google may choose a different one for a variety of reasons, such as the quality of the content.
Before troubleshooting, consider whether your users would be better served by the canonical URL Google chose than by the one you prefer.
There are many reasons why the canonical URL that was chosen may not be the one you wanted to appear in the search results.
Common canonicalization issues
Language variants without localized annotations
If you operate multiple websites that serve the same content to users in different parts of the world, be sure to follow the recommendations for localized sites. For example, if you have separate websites for English-speaking users in the US, UK, and Australia but the content is the same, adding hreflang annotations to your pages helps ensure that users in each region see the appropriate page.
Incorrect canonical elements
Some content management systems (CMS) or CMS plugins use canonicalization techniques incorrectly and point URLs to external websites. To find out whether this is the case, inspect your HTML with your browser’s developer tools. If your website is signaling an unexpected canonical URL preference, possibly through improper use of rel="canonical" or 3xx redirects, contact your CMS provider and report the problem.
Some hosting misconfigurations can cause unexpected cross-domain URL selection:
- A misconfigured server may respond with the contents of example.com when a request is made for a URL on another website.
- Two unrelated web servers may return identical soft 404 pages that Google does not recognize as error pages. Contact your host if you experience this problem.
Some attacks inject code into a website so that it returns HTTP 3xx redirects or adds cross-domain rel="canonical" link annotations to the HTML or HTTP headers, typically pointing to URLs that host harmful or spammy content. In these circumstances, Google’s algorithms may select the spammy or malicious URL instead of the URL on the compromised website.
A copycat website
Sometimes Google’s algorithms may choose a URL from an external website that hosts your content without your knowledge. If you believe that another website is duplicating your content without permission, in violation of copyright law, you can ask the site’s host to remove it. In addition, you can ask Google to remove the infringing pages from its search results by submitting a request under the Digital Millennium Copyright Act.
Remember that if the canonical URL belongs to a Search Console property that you don’t own, you won’t be able to see the traffic attributed to your duplicate page.
How to use Canonical URL?
If you can configure your server, you can send the rel="canonical" HTTP header so that search engines recognize the canonical URL; Google currently supports this method for web search results only. Alternatively, choose the canonical URL for each of your pages and list it in the sitemap.
How to fix the canonical URLs?
There are two ways to tell Google which of two identical pages to favor: implement a 301 redirect, or add a canonical tag to the page.
How to remove the canonical URL?
The steps depend on your CMS. In editors that provide it, click the Actions menu at the top of the page, then choose Title and Properties, and use the dropdown box to keep or turn off the canonical URL.
How to create a canonical link in HTML?
Add a link element with the rel="canonical" attribute to the head section of the page.
How to find the canonical URL?
Look in the page’s source code for a link element of the form <link rel="canonical" href="inserturl.com">.