#4428 – Main URL is reverting after being explicitly selected

Posted in ‘sh404SEF’
Monday, 18 December 2017 18:45 UTC
webmasterhlc
 Greetings:

Our ticket was closed (original ticket: #4303) before we'd had the opportunity to take the advice and perform the actions recommended. I wanted to take this opportunity to follow up and see if you might have any further suggestions, as our issue remains unresolved. Since our last exchange:

We did a complete purge of the sh404SEF database but this did not correct the issue.

We did an update to Joomla (now currently running 3.8.2), but this did not correct the issue.

We verified that the settings recommended in 4303 were consistent with our configuration - they were, so there was nothing to change.

In your response, you asked a few questions:

- is the URL that gets back to #1 spot always the same? ---No. I am attaching a log so you can see.

- if so, how does it differ from the other, especially the "right" one? just a difference in Itemid? ---N/A

- what are the actual and full non-sef for the "good" main URL, and the wrong one? ---Please see the attached log.

This is obviously very bizarre, so any further insights would be appreciated. We've been trying to understand what might be different about this page versus any other page - it does use a unique layout template from all the other pages, but there's nothing in the template code that looks to us like it might be causing this.

Thanks much!

Jon
Tuesday, 19 December 2017 06:51 UTC
wb_weeblr
Hi Jon,

Thanks for your thorough analysis and logs. This is indeed a very difficult case, and so far I do not have a good idea about why this is happening.
Looking at your logs, the duplicates are a typical case of Joomla/extensions using many different Itemids to link to a page, so that's not unusual at all.
What's unheard of is the selected "main" URL changing all by itself. That's simply weird: there's no code in sh404SEF that can do that, unless you (or someone else on the team) did one of the following:

- purge URLs
- "delete with duplicates"
- under some circumstances, manually customizing one of the duplicates

In all those instances, the duplicates are removed and so the selected "main" URL is also cleared. As long as none of those operations is performed, the manually selected main URL should stay exactly as selected.

We've been trying to understand what might be different about this page versus any other page - it does use a unique layout template from all the other pages, but there's nothing in the template code that looks to us like it might be causing this.
So this is the only page that experiences this behavior (I mean losing the main URL)?

Note that the growing number of duplicates for a specific page is not caused by something happening on that page. The duplicates come from links to that page, so they are located on other pages of the site: for instance in a "recent articles" module, or sometimes just a link in the footer that does not have a proper Itemid and gets one automatically from Joomla.

Rgds
 
Tuesday, 19 December 2017 21:13 UTC
webmasterhlc
Thanks for the response!

We have been deleting the URL with duplicates in most instances when we find that sh404 has selected the wrong non-SEF URL. Sh404 then immediately creates some duplicates and selects another wrong non-SEF URL, and we then manually select the correct non-SEF URL.

We’ve also tried NOT deleting the URL with duplicates but just manually selecting the correct non-SEF URL.

--Both methods result in sh404 changing, seemingly spontaneously, to the wrong non-SEF URL within a day or so.

We haven’t done the other two things that you mention.

I am confident that something is causing sh404SEF to revert to the wrong URL - but based on your responses, my own experience, etc., I have no idea what, or what to look at. We've used this software a long time and never seen anything like this before.

Would it help at all to have access to our site administration? I could grant you access to look around if that would be potentially beneficial.

Or do you have other suggestions for how we might track this down?
Wednesday, 20 December 2017 10:02 UTC
wb_weeblr
Hi

--Both methods result in sh404 changing, seemingly spontaneously, to the wrong non-SEF URL within a day or so.
Yes but that's the problem: there is nothing in sh404SEF that can do that (aside from the situations I mentioned) and as you noted, sh404SEF has been around for quite a few years now.

Would it help at all to have access to our site administration? I could grant you access to look around if that would be potentially beneficial.
Please do so, although I have no specific idea to check at the moment; I'll just review your settings.

Can you confirm that:

1 - this happens on a number of different URLs, not just the single one you mentioned above?
2 - if this is happening on other URLs, is there a pattern, like maybe all are from the same component?
3 - On your log attached to this thread, what do you mean by "Reassign alias"? When making a URL the "Main" one, aliases should be transferred. However, indeed, if the re-assignment of the main URL is done by some bug or external cause, then the process of transferring aliases would not happen.

4 - If possible, the next step would be to enable "Record URL source":
a/ Start with a working situation
b/ Enable "Record URL source"
c/ Disable it as soon as the error happens again (on any single URL)
d/ Click the "Source" button for that URL, on the SEF URL manager.
The problem with this is that "Record URL source" gathers a lot of data. If your site has significant traffic, even after a day or two you can have tens of megabytes of data stored in the #__sh404sef_urls_src db table. Not a problem if your hosting is up to the task, but it might be an issue on smaller hosting packages. Also, afterwards, you will have to empty that table from phpMyAdmin or equivalent; there's no UI inside of sh404SEF to do that.
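
For emptying the table from phpMyAdmin, the statements might look like the sketch below. It assumes the table prefix is jos_ (a hypothetical value; the real one is the $dbprefix setting in Joomla's configuration.php), since #__ is only Joomla's placeholder for that prefix. The script does not touch the database: it just prints two standard MySQL statements to paste into phpMyAdmin, one to check the table size and one to empty it.

```php
<?php
// Assumption: the site's table prefix is "jos_" (see $dbprefix in configuration.php).
$prefix = 'jos_';

// Joomla writes "#__" as a placeholder for the real table prefix.
$table = str_replace('#__', $prefix, '#__sh404sef_urls_src');

// Size of the table in megabytes (standard MySQL information_schema query).
$sizeSql = "SELECT ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb "
	. "FROM information_schema.TABLES "
	. "WHERE table_schema = DATABASE() AND table_name = '" . $table . "';";

// Empties the table. This cannot be undone, so export first if the data is needed.
$truncateSql = 'TRUNCATE TABLE `' . $table . '`;';

echo $sizeSql, "\n", $truncateSql, "\n";
```

Checking the size first gives an idea of whether an export is realistic before the table is emptied.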

Rgds
 
Wednesday, 20 December 2017 17:38 UTC
webmasterhlc

I can assure you this is happening spontaneously. We make the setting, the setting is saved - we verify the page on the front end and our breadcrumbs appear as expected. A day or two goes by and we notice the page no longer has breadcrumbs. We go back and the non-SEF URL is no longer correct - but no human has made this change. I wish I could record this!

1 - this happens on a number of different URLs, not just the single one you mentioned above?

This only happens on the single URL. No other page has this issue.

3 - On your log attached to this thread, what do you mean by "Reassign alias"?

This just means that we manually re-select the correct alias (since we deleted with duplicates, we have to re-designate the page alias). It's not really relevant to the issue, I don't think.

We did actually come across the "Record URL source" option on our own and played with it, then disabled it, unsure if it would help. We will take this step next (we have about 70 GB available on the drive at the moment). It may be a week, with the holiday and everything, before I can get back with you.

Thanks again for your time!

Wednesday, 20 December 2017 18:02 UTC
webmasterhlc
I just got some more info on item 3 - our admin needs to recreate the xxxx.org/conference alias each time she resets the non-SEF URL in sh404. The alias still works after sh404 resets to an incorrect non-SEF URL, and when you open the list of duplicates for the AC home page, the alias is listed next to the correct non-SEF URL. But when you set that correct URL as the main one, the alias gets deleted and needs to be recreated. From your response, it sounds like this isn’t supposed to happen.
Wednesday, 20 December 2017 18:52 UTC
wb_weeblr
Hi

This only happens on the single URL. No other page has this issue.
OK, that's useful, not sure how, but that's quite important.

It may be a week, with the holiday and everything, before I can get back with you.

We'll be off ourselves from this Friday evening until Wednesday 3/01.

But when you set that correct URL as the main one, the alias gets deleted and needs to be recreated. From your response, it sounds like this isn’t supposed to happen.

Yes, it is supposed to happen in that case. When you select a new "main" URL, the aliases are copied from the current "main" to the new "main".
As the currently selected main is wrong (when she tries to fix the problem) and has no alias, then this is what gets copied to the new "main", in effect wiping out the existing, correct, alias.

So this is a consequence of the bad non-sef being selected as the main in an unknown way. When that happens, it seems this new main is added disregarding the normal sh404SEF procedure, which would preserve all aliases.
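
The alias behaviour described above can be illustrated with a toy model. This is not sh404SEF code: the array layout, the selectMain() function and the use of rank 0 for the main URL are all assumptions made purely for illustration.

```php
<?php
// Toy model only, not sh404SEF code. Each entry is a duplicate non-SEF URL
// with a rank (assumed: 0 = main) and the aliases attached to it.
function selectMain(array $urls, $newMainKey)
{
	$aliases = array();
	foreach ($urls as $key => $url) {
		if ($url['rank'] === 0) {
			// Take the aliases of the current main URL and demote it.
			$aliases = $url['aliases'];
			$urls[$key]['rank'] = 1;
			$urls[$key]['aliases'] = array();
		}
	}
	// The new main gets the old main's aliases, replacing its own list.
	$urls[$newMainKey]['rank'] = 0;
	$urls[$newMainKey]['aliases'] = $aliases;
	return $urls;
}

// Made-up state after the unexplained switch: the wrong URL is main and has
// no alias, while the correct URL still carries the "conference" alias.
$urls = array(
	'wrong'   => array('rank' => 0, 'aliases' => array()),
	'correct' => array('rank' => 1, 'aliases' => array('conference')),
);

$urls = selectMain($urls, 'correct');
var_dump($urls['correct']['aliases']); // empty array: the alias was overwritten
```

In this model the alias disappears only because the wrong URL became main without its aliases being carried over first, which matches the symptom reported.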

1 - Do you have any custom code running on this site? Anything other than standard, unmodified extensions?

Very troubling that this happens only on one single URL.

2 - Is it a custom URL or automatically generated one?
3 - How is it linked to? from a menu? does it have multiple menu items linking to it? is it linked from a module displayed on multiple pages?

Rgds
 
Thursday, 21 December 2017 15:40 UTC
webmasterhlc
Thanks for the clarification on the alias. I spoke with our site administrator in more detail about this and shared your observations / questions. Here are our combined thoughts:

This is the only page on the live site that uses this particular template. It’s also the only page that uses a Javascript "Loading..." function, but this problem was happening before we added that. Notably, we have not seen this problem on the dev site version of this page - but that site gets very little traffic comparatively.

The URL is an automatically generated one, so far as we know. One thing about this page: http://xxxx.org/ redirects to it. However, this isn't a unique situation for this page - we have other redirects for other pages.

The AC home page is linked to from the sidebar (module) of all the other conference pages, as well as in the body of various pages throughout the site. These embedded text links would have (should have) been created by linking to the menu item using JCE editor’s link tool. There aren’t multiple menu items that point to that page, though, and it's worth noting that we have a number of other pages that meet the same basic criteria.

It is also linked to on the homepage in ribbon 4, which is a module that displays article content.

The AC home page is also linked to twice in the Programs and Events megamenu section. This is not unique to this page—we use the same methods throughout the megamenu.

I recognize these details might not shed light on anything, but I'll continue to share evidence as we are able. Please let me know if you have any other questions we can address, and in the meantime have a great holiday!

Jon
Friday, 22 December 2017 09:37 UTC
wb_weeblr
Hi

Yes, logging all info available here is the right thing to do, even if it doesn't help immediately. What you describe about the links is indeed pretty usual, and should not cause that particular issue on that particular URL.

So I guess next step is to use the URL source recording.

Great holiday to you too!

Cheers
 
Thursday, 04 January 2018 14:44 UTC
webmasterhlc
Happy New Year!

My admins ran into the problem and fixed it, then turned on logging as directed. Only two hours later the link had "reverted" to the wrong one again, so they turned off logging. The log table grew to 1.4 GB in that brief time period, but I was not aware that they had done all this until a couple days later. During that time, our normal daily backup of the database (using mysqlbak) started crashing the web server. When I attempted to extract just the contents of the single table, this too would crash the web server. I opted to give up, wipe the table clean, and request that our host increase the available memory cache for MySQL going forward.

What's the best way to go about this? I expect that we will have the same outcome - a very large table. Do you want me to send the entire contents of the table, assuming I can extract it? Or should I perform a query to narrow the results (what would the query look like?)?
Friday, 05 January 2018 17:06 UTC
wb_weeblr
Hi

Thanks and the same to you! Let's hope in 2018 we can tackle this issue ;)

I expect that we will have the same outcome - a very large table.
Yes, the url_src table content is directly linked to the amount of traffic you have, so on a busy site, the table size can increase very fast.

You could perform a query that searches only for the URL we want, but you would still store a very large amount of data in the table, so that's no good. Here is what we can do instead:

1 - Download and install the latest dev version from the download area. It has a couple of bug fixes for the last version (some might be important). I have just added to it a filter on the URL source recording process, that lets you programmatically select which data to log.
2 - Create a folder: /libraries/weeblr
3 - In that folder, create a file called sh404sef_functions.php
4 - Paste the following content in that file:

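<?php
/**
 * sh404SEF hooks file
 */

defined('_JEXEC') or die();

// Filter the URL source data just before sh404SEF records it.
ShlHook::add(
	'sh404sef_store_url_source_data',
	function ($data) {
		// Discard any record whose SEF URL is not the page being investigated.
		if (!wbContains($data['routed_url'], 'Programs-Events/conference.html'))
		{
			$data = null;
		}
		// Returning the data untouched lets it be recorded normally.
		return $data;
	}
);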


What this will do is filter the data just before it is recorded in the Record source URL table. Inside the function, we check if the SEF URL contains Programs-Events/conference.html (that's the URL having the problem, right?).
If not, we discard the data. If yes, we return the data untouched, which will then be recorded normally.

5 - After saving that file, truncate the url_src table, and enable the recording again. Then look at the DB table: it should grow much more slowly, and most importantly should only record data about that specific URL.

In case this file causes any issue, just rename it to something other than sh404sef_functions.php; that will disable it entirely.
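
The filter logic can also be tried outside Joomla before the file is deployed. The sketch below is a minimal standalone version: containsStandIn() is a hypothetical replacement for wbContains(), assumed here to be a plain substring check, and the sample records are made up for illustration.

```php
<?php
// Hypothetical stand-in for the wbContains() helper used in the hook;
// assumed here to be a plain substring check.
function containsStandIn($haystack, $needle)
{
	return strpos($haystack, $needle) !== false;
}

// Same logic as the hook callback: keep the record only for the problem URL.
$filter = function ($data) {
	if (!containsStandIn($data['routed_url'], 'Programs-Events/conference.html')) {
		return null; // discarded: nothing gets recorded
	}
	return $data; // returned untouched: recorded normally
};

// Made-up sample records, for illustration only.
$kept    = $filter(array('routed_url' => 'Programs-Events/conference.html'));
$dropped = $filter(array('routed_url' => 'About-Us/contact.html'));

var_dump($kept !== null);    // bool(true)
var_dump($dropped === null); // bool(true)
```

Because the check is a substring match, any SEF URL that merely contains Programs-Events/conference.html would be recorded too.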

Rgds

 
Tuesday, 09 January 2018 13:17 UTC
webmasterhlc
The code above (and directions) is super helpful and I will do it - but before I do, I managed to snag a 19MB CSV from the table.

https://www.xxxx.org/tmp/hlc_2017__1-9-18__.csv.zip

Let me know if that's sufficiently helpful - if not, I'll do the above ASAP.

Thanks for your help with this!

Jon
Wednesday, 10 January 2018 10:07 UTC
wb_weeblr
Hi

So I have looked into the file, but it does not provide insight on anything. We cannot see when the "bad" URL is made the main one and that's because the URL rank is not recorded in the logged data.

So I have added this feature and the current dev version also records the rank. This may tell us at least on which page the problem occurs. What I would recommend now:

1 - Truncate the url src table
2 - Download and install the modified sh404SEF version from this page
3 - Implement the code provided above, to limit the amount of recorded data to only the page we want
4 - Enable URL source recording again until the problem comes back, then disable it and export the data in the log table

Hope it works!

Rgds


 
Tuesday, 23 January 2018 21:07 UTC
webmasterhlc
I apologize for the delay - this is now all in place. As of 3pm our time, the URL in sh404SEF was correct. We enabled logging and the log is filling up (40 rows in >5 minutes) with references to the conference.html page.

Thanks for your help with all this - I'll update you when we do the export.
Thursday, 25 January 2018 15:50 UTC
webmasterhlc
I am not certain when the URL "reverted" but it has done so. Attaching a zip of a SQL dump of the _sh404sef_urls_src table.

Friday, 26 January 2018 09:28 UTC
wb_weeblr
Hi

There is no ZIP attached. Maybe it's too big? You can always upload it to this private online folder.

Rgds
 
Wednesday, 07 February 2018 14:43 UTC
webmasterhlc
I couldn't get the One Drive upload to work without a login, and I didn't want to troubleshoot that so I just posted the file to our server:

https://www.xxxx.org/tmp/hlc_2017__1-25-18__.sql.zip

Thanks for your patience! I was out with the flu and then had offsite training for a week, so I have been slow to catch up. Please let me know your thoughts on the data at your earliest convenience.

Wednesday, 07 February 2018 15:04 UTC
wb_weeblr
Hi

I downloaded the file, so you can remove it now. I'll get back to you with any findings.

Rgds
 
Wednesday, 21 February 2018 18:14 UTC
webmasterhlc
Greetings - did the data reveal anything?
Monday, 26 February 2018 10:13 UTC
wb_weeblr
Hi

Sorry about the delay! The data did not reveal anything, because it still does not contain the rank information (i.e. whether the URL is stored as main or as a duplicate).

As this is not built into the main release, and I assume you have the current release installed, can you check that your #__sh404sef_urls_src table does have a "rank" column?

If so, what is the exact and full sh404SEF version you are running?

Best regards

 
Friday, 02 March 2018 18:53 UTC
webmasterhlc
Good news - after the latest Joomla update the problem has gone away. I'm not clear why, but there it is - thank you for all your help! I realize issues like this can be maddening, and your willingness to dig deeper despite the inherent mystery is much appreciated.

Please close this ticket.
Monday, 05 March 2018 08:53 UTC
wb_weeblr
Hi

Well, I guess I'm happy! That news is good, though even more troubling, as I have no idea how something in Joomla could cause this to happen. There were indeed multiple changes to the Joomla router in each of the last versions, so that's certainly the link, but I guess we'll basically never know. That's sad really, because it means it can happen again.

Anyway, glad your life's a bit simpler now!

Best regards

 
Tuesday, 20 March 2018 05:34 UTC
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.