#6398 – Joomla Duplicate Content Probs

Posted in ‘Pre-sale questions’
Friday, 24 March 2017 11:44 UTC
martinsai
Hi Guys,
it is about the following project: https://www.patlector-patentanwalt.net/de/
When I search google.de for site:https://www.patlector-patentanwalt.net/de, I find search results like those in the attached image. Those results are useless, Joomla-generated URLs. Can your extension help me get rid of them? Or is a canonical plugin enough for this?
If your sh404 extension can help, I will definitely buy it.
cheers martin

Sorry, I cannot attach a picture, only links.
The unwanted links look like:
https://www.patlector-patentanwalt.net/de/88-slideshow.html
https://www.patlector-patentanwalt.net/de/93-designschutz.html
They produce duplicate content.
 
Friday, 24 March 2017 11:53 UTC
wb_weeblr
Hi

It all depends on why you have such pages in Google. For instance, /de/88-slideshow.html is a valid URL, it returns some content, etc
- what is the canonical URL for this page?
- Where did Google find a link to this page?

I'm not sure what a canonical plugin can do about that. The problem is automatically finding the canonical URL for a page, and I'm not aware of any plugin that can do that. Even sh404SEF can do it automatically only in a limited number of cases (what we actually do is prevent duplicate URLs from appearing in the first place). You'll be able to add a canonical manually on any URL of course, but that's only useful if the problem is small, i.e. you only have a limited number of such URLs (and if you have a limited number of such URLs, then just don't bother, Google will sort it out).

So the real question is: how did Google find those URLs?
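To answer the first question above (what is the canonical URL for this page?), here is a minimal sketch that reads the rel="canonical" link out of a page's HTML. It uses only Python's standard library. The sample HTML and the example.com address are made up for illustration and are not taken from the site in this ticket.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, for illustration only
sample = """<html><head>
<link rel="canonical" href="https://www.example.com/de/designschutz.html">
</head><body></body></html>"""

print(find_canonical(sample))
```

If two URLs that show the same content both report the same canonical, search engines have a hint about which one to keep. If the function returns None, the page declares no canonical at all.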

You can probably have that fixed by sh404SEF, which will not create multiple URLs for the same page, but then some of your URLs may/will change from the existing, already indexed ones. So you will have to handle that as well, through redirects or manual customization of the URLs to match the old ones.

Rgds
 
Friday, 24 March 2017 12:13 UTC
martinsai
OK, thanks for the reply. Let's make it simpler:
I want to get rid of, for example: https://www.patlector-patentanwalt.net/de/95-patentanwalt.html

What worries me is that if I search for site:https://www.patlector-patentanwalt.net/de
Google shows URLs like the ones above, and those URLs lead to pages that are not displayed correctly and are duplicate content. See this example:
https://www.patlector-patentanwalt.net/de/93-designschutz.html
https://www.patlector-patentanwalt.net/de/designschutz/design-beispielfall.html

cheers, martin
 
Friday, 24 March 2017 12:19 UTC
wb_weeblr
Hi

I want to get rid of, for example: https://www.patlector-patentanwalt.net/de/95-patentanwalt.html
And it's hard to get rid of them entirely if you don't know where they come from. Where did Google find them on your site?

Yes, with sh404SEF, if such a URL is requested by Google, it gets a 404 because the URL doesn't exist, while Joomla SEF will try desperately to display something. In your example, Joomla SEF sees the article ID 93 and displays that article. That's a problem you won't have with sh404SEF.
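The difference between the two behaviours can be sketched as follows. This is a simplified illustration and not the actual code of Joomla or sh404SEF: the CONTENT and KNOWN_URLS tables and both function names are hypothetical.

```python
import re

# Hypothetical content table: item ID -> title
CONTENT = {93: "Designschutz"}

# Hypothetical table of URLs the site has deliberately created
KNOWN_URLS = {"/de/designschutz/design-beispielfall.html": 93}


def id_based_route(path):
    """Accept known URLs, and also any URL whose last segment starts
    with an existing numeric ID, whatever text follows the ID."""
    if path in KNOWN_URLS:
        return 200
    match = re.search(r"/(\d+)-[^/]*\.html$", path)
    if match and int(match.group(1)) in CONTENT:
        return 200
    return 404


def lookup_route(path):
    """Accept only URLs stored in the table; everything else is a 404."""
    return 200 if path in KNOWN_URLS else 404


for url in ("/de/designschutz/design-beispielfall.html",
            "/de/93-designschutz.html",
            "/de/93-anything-else.html"):
    print(url, id_based_route(url), lookup_route(url))
```

With ID-based routing, every variation of the text after "93-" answers with a valid page, so one item can be reachable under many addresses. With a lookup table, only the stored address answers and the stray ones return 404.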


Rgds
 
Friday, 24 March 2017 13:16 UTC
martinsai
Sorry to bother you again:
Where did Google find them on your site?

I don't know what you mean by "where". Google indexes this URL https://www.patlector-patentanwalt.net/de/93-designschutz.html
and also indexes this URL 1 https://www.patlector-patentanwalt.net/de/designschutz/design-beispielfall.html
Both lead to the same result. As I understand it, your extension helps me get rid of URL 1.

2. There are useless results like this one: https://www.patlector-patentanwalt.net/de/88-slideshow.html
Please look at this page, it is useless.
There is no article with ID 88! Joomla generates it because I use a slideshow module. Will your extension help me here?

cheers martin
 
Friday, 24 March 2017 13:23 UTC
wb_weeblr
Hi

I don't know what you mean by "where".
Google can only index what it finds on your site, it cannot "invent" a URL such as https://www.patlector-patentanwalt.net/de/93-designschutz.html.

Search engines crawl your site (ie read your home page, search for all links on it, then read each of these links, looking also for more links, and so on) to find pages. If Google indexed /de/93-designschutz.html, it means somewhere on your site, or maybe on somebody else's site, or in an RSS feed, etc, this link exists and Google found it.
Then another problem is that when Google tried to read the content of this page, Joomla accepted it and returned a valid page content. But again, the initial problem is that the link exists somewhere and Google could find it.
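The crawling process described above can be sketched on a small in-memory link graph, to show how one would look for the page that exposes an unwanted URL. The SITE dictionary and its paths are hypothetical. A real check would fetch pages over HTTP and parse their links, or use a crawling tool.

```python
from collections import deque


def find_referrers(links, start, target):
    """Breadth-first crawl from start; return the pages linking to target.

    links maps each page to the list of URLs found on that page.
    """
    seen = {start}
    queue = deque([start])
    referrers = []
    while queue:
        page = queue.popleft()
        for href in links.get(page, []):
            if href == target and page not in referrers:
                referrers.append(page)
            if href not in seen:
                seen.add(href)
                queue.append(href)
    return referrers


# Hypothetical site structure, for illustration only
SITE = {
    "/de/": ["/de/designschutz.html", "/de/suche.html"],
    "/de/designschutz.html": ["/de/"],
    "/de/suche.html": ["/de/93-designschutz.html"],
}

print(find_referrers(SITE, "/de/", "/de/93-designschutz.html"))
```

An empty result for a URL that Google has indexed would suggest the link lives outside the crawled pages, for example on another site or in a feed.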

there is no article with ID 88! Joomla generates it because I use a slideshow module.
There you found the source of all your issues. It's not Joomla. It's your slideshow module generating bad links.


Will your extension help me here?
I cannot answer that, because obviously the problem is in your slideshow module, which is creating bad links.

Rgds
 
Friday, 24 March 2017 13:47 UTC
martinsai
OK, to sum up: would you say that your extension can remove many unnecessary URLs like those mentioned above and thus improve Google ranking?
 
Friday, 24 March 2017 13:55 UTC
wb_weeblr
Hi

I don't know. It depends on how broken your slideshow is and why it generates bad links.

If it follows Joomla API and passes URLs to sh404SEF, then we can do a better job than the Joomla SEF.

If it only generates them directly, there is nothing we can do.

Even if you do not fix the problem in your slideshow, sh404SEF can still help you as it will 404 any URL not in the database, but again it all depends on what your slideshow is doing.

Rgds
 
Friday, 24 March 2017 14:02 UTC
martinsai
Search engines crawl your site (ie read your home page, search for all links on it, then read each of these links, looking also for more links, and so on) to find pages. If Google indexed /de/93-designschutz.html, it means somewhere on your site, or maybe on somebody else's site, or in an RSS feed, etc, this link exists and Google found it.

Then another problem is that when Google tried to read the content of this page, Joomla accepted it and returned a valid page content. But again, the initial problem is that the link exists somewhere and Google could find it.


I don't think that there is an internal or external link to https://www.patlector-patentanwalt.net/de/93-designschutz.html
I think the only reason is that Joomla generates those URLs, as is widely known, because there is a category with ID 93.
 
Friday, 24 March 2017 14:13 UTC
wb_weeblr
Hi

I don't think that there is an internal or external link to https://www.patlector-patentanwalt.net/de/93-designschutz.html
If Google found it, then there is a public link to this page on the internet, whether on your site or elsewhere.

I think the only reason is that Joomla generates those URLs, as is widely known, because there is a category with ID 93.
Joomla generates URLs to display them (essentially). Google can only index a URL it finds somewhere on a web page.

The only way this sort of thing usually happens is if that link is found on a search result page (Googlebot often performs searches on your site to find new links).

Searching for designschutz on your site does return the category link as the first result, but the link is https://www.patlector-patentanwalt.net/de/component/content/category/93-designschutz.html, so that's not it. Note that the search also returns what seem to be more bad links.

Anyway, at this stage we both have to move forward, so I would say in conclusion that sh404SEF will indeed do a better job and most likely avoid most of this mess. There is no guarantee, because you are running an extension or template that's not doing what it should, and the only real thing you can do is give it a go. As mentioned earlier, installing sh404SEF will likely change some of your URLs, so you will have to deal with that, either by adding redirects or by manually customizing the URLs that differ so they match the old ones.
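As a rough sketch of the redirect option, the snippet below turns a list of old and new paths into Apache mod_alias "Redirect 301" lines. This is a web-server-level alternative shown only for illustration, not how sh404SEF handles redirects internally. The paths, the host and the function name are hypothetical.

```python
def redirect_rules(url_map, host):
    """Build one 'Redirect 301' line per URL whose path has changed.

    url_map maps an old path to its new path; host is the site origin.
    """
    rules = []
    for old_path, new_path in sorted(url_map.items()):
        if old_path == new_path:
            continue  # unchanged URL, nothing to redirect
        rules.append(f"Redirect 301 {old_path} {host}{new_path}")
    return rules


# Hypothetical old -> new mapping, for illustration only
URL_MAP = {
    "/de/old-page.html": "/de/new-page.html",
    "/de/kontakt.html": "/de/kontakt.html",
}

for line in redirect_rules(URL_MAP, "https://www.example.com"):
    print(line)
```

A permanent (301) redirect tells search engines that the old address has moved, so already indexed URLs are not simply lost when the new ones take over.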

Rgds
 