
#63 – Follow up to #27 - htaccess regex rule problems

Posted in ‘sh404SEF’
Monday, 13 April 2015 16:10 UTC
waltcooley
  This is a follow-up to the issue we were having with htaccess regex rules, redirecting the old URLs to the new ones created with sh404sef.

I've been round and round with our hosting company's support trying to get this working, and they told us to 'use sh404sef and create manual redirects using the Joomla redirect' (which is not practical when you have 3000+ different old URLs to redirect). So I'm asking here one more time to see if maybe you can spot the problem or offer some sort of practical solution.

Here's what we have right now in our htaccess file:

RewriteRule ^topics\/([-0-9a-zA-Z]+)?\/([0-9]+)(-)([0-9a-z,-]+)? http://xxxx.net/topics/$1/$4 [L,R=301]


Things work fine except when we have an article whose title starts with a number, like
"9 New Year's Resolutions for Cattle Producers", which generates a URL like this:
http://xxxx.net/topics/management/3542-9-new-years-resolutions-for-cattle-producers


When that happens the rule automatically removes the "9" from the article alias, resulting in:
http://xxxx.net/topics/management/new-years-resolutions-for-cattle-producers


I've tried placing a new htaccess rule in advance of the above rule:
Redirect 301 /topics/management/3542-9-new-years-resolutions-for-cattle-producers http://xxxx.net/topics/management/9-new-years-resolutions-for-cattle-producers

or
rewriterule /topics/management/3542-9-new-years-resolutions-for-cattle-producers http://xxxx.net/topics/management/9-new-years-resolutions-for-cattle-producers [r=301,nc,L]


But the 'generic' (regex) rule seems to always run and remove the leading number from the article name/alias.

I've also tried a regex like this:
RewriteRule ^topics\/([-0-9a-zA-Z]+)?\/([0-9]{1,4}?)(-)([0-9,a-z,-]+)? http://xxxx.net/topics/$1/$4 [L,R=301]

to try to isolate the initial article number from the number appearing at the start of the article name, but it doesn't solve the problem either.

I've had several people suggest multiple solutions, but haven't been able to get this to work.
I'm at the end of my ability to resolve this.

This isn't rocket science and should be a rather common need in the world of Joomla (esp. for those migrating from standard Joomla SEF to sh404sef URLs).
I just need to redirect old URLs with article numbers to new ones (with sh404sef) without the article numbers.

Are you able to look at this and see what might solve the problem?
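
One possible explanation, offered as a hypothesis and not confirmed anywhere in this thread: an R=301 redirect makes the browser issue a brand-new request, and the new URL (9-new-years-...) still matches the generic rule, so it is redirected a second time and loses its leading "9". A single-pass regex tester would not show this. The Python sketch below only simulates that behaviour with the two patterns quoted above. apply_rule and follow_redirects are made-up helper names, and Python's re module stands in for Apache's regex engine.

```python
import re

# The two patterns from the post, copied with the same groups and character classes.
# In a .htaccess file the path is matched without its leading slash.
GENERIC_RULE = re.compile(r"^topics/([-0-9a-zA-Z]+)?/([0-9]+)(-)([0-9a-z,-]+)?")
LAZY_RULE = re.compile(r"^topics/([-0-9a-zA-Z]+)?/([0-9]{1,4}?)(-)([0-9,a-z,-]+)?")


def apply_rule(path, rule=GENERIC_RULE):
    """Return the redirect target for `path`, or None if the rule does not match."""
    match = rule.search(path)
    if match is None:
        return None
    section = match.group(1) or ""
    alias = match.group(4) or ""
    # Mirrors the substitution http://xxxx.net/topics/$1/$4 (host omitted).
    return f"topics/{section}/{alias}"


def follow_redirects(path, rule=GENERIC_RULE, limit=10):
    """Re-apply the rule to every redirect target, as a browser following 301s would."""
    chain = [path]
    for _ in range(limit):
        target = apply_rule(chain[-1], rule)
        if target is None:
            break
        chain.append(target)
    return chain


if __name__ == "__main__":
    old = "topics/management/3542-9-new-years-resolutions-for-cattle-producers"
    for step in follow_redirects(old):
        print(step)
```

In this simulation the chain has three entries: the old URL, the intended URL with the "9", and then the URL without it. The lazy {1,4}? variant gives the same chain, because the second request is matched just like the first. If the hypothesis holds, it would also account for the explicit rules not helping: their target is the very URL the generic rule strips on the next request. Separately, mod_rewrite in a .htaccess file matches the path without its leading slash, so a pattern beginning with /topics would not match there.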
Monday, 13 April 2015 18:34 UTC
wb_weeblr
Well, I'm of the opposite opinion: this is totally rocket science. I try to avoid htaccess rules at all costs and, contrary to popular belief, I'm no expert at them at all, because sh404SEF is not involved in the URL rewriting. It only comes into play once the request has arrived in Joomla!
It's the web server that does URL rewriting, not Joomla.

That said, your rule not working is very strange. It should work. It does work on the htaccess testers that I know of, for instance (http://htaccess.madewithlove.be/).
It also works on the regexp tester I use when I have to write a PHP regexp, as you can see here: https://weeblr.com/images/screenshots/N06jSGFKiRF6zx2P4I5UDv5Z7lClgE.png. So it must be something really tricky.

However, on my local Windows Apache, it doesn't work, and I don't know why.
At this point, I'd suggest taking the problem to people familiar with sysadmin work. My best bet would be serverfault.com, the equivalent of stackoverflow.com, but for sysadmins.
There are plenty of replies to htaccess questions there, so there's good hope a guru will show up and give good advice.

 
Friday, 17 April 2015 23:24 UTC
waltcooley
I just wanted to follow up on this and let you know what we found.

After having several fairly advanced Apache admins look at this (and LOTS of testing), we concluded that complex rules like these are not reliable in Apache.

We even had 2 rules that were identical, except for the folder in the match path (so it was matching 'topics' vs 'news'). The rule for 'topics' worked, but the rule for 'news' didn't.

So we're in the process of manually creating unique rewrite rules for EVERY old URL to the new URL (we'll use some scripting to create those entries) and then place them in the htaccess.
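
The scripting step mentioned above could look something like this minimal Python sketch. It assumes the old URLs are available as a plain list of paths of the form /section/category/id-alias, and it emits one Redirect 301 line per URL, in the same shape as the manual rule tried earlier in this ticket. The names OLD_URL, redirect_line and build_redirects are hypothetical, the /news/ path is an invented sample, and http://xxxx.net is the placeholder host used in this ticket.

```python
import re

# Old-style path: /<section>/<category>/<numeric id>-<alias>
OLD_URL = re.compile(r"^/([^/]+)/([^/]+)/([0-9]+)-([^/]+)$")


def redirect_line(old_path, host="http://xxxx.net"):
    """Build one 'Redirect 301' line, or return None if the path has no numeric ID."""
    match = OLD_URL.match(old_path)
    if match is None:
        return None
    section, category, _article_id, alias = match.groups()
    # Only the first numeric segment (the article ID) is dropped; the alias is kept whole.
    return f"Redirect 301 {old_path} {host}/{section}/{category}/{alias}"


def build_redirects(old_paths, host="http://xxxx.net"):
    """Return redirect lines for every path that looks like an old-style URL."""
    lines = []
    for path in old_paths:
        line = redirect_line(path.strip(), host)
        if line is not None:
            lines.append(line)
    return lines


if __name__ == "__main__":
    sample = [
        "/topics/management/3542-9-new-years-resolutions-for-cattle-producers",
        "/news/industry/120-market-update",  # invented example path
    ]
    print("\n".join(build_redirects(sample)))
```

Because the article ID is stripped once, when the file is generated, an alias that itself starts with a number keeps it, and each redirect target is a literal URL that no pattern rule needs to process again.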

After consulting with several SEO/server experts, they convinced me/us that the best place for these rules (to keep Google reporting properly, etc.) is in the htaccess.

I wish there was some extension that would do this (create a bunch of rewrite rules that take old URLs and then using a regex rule, output the new url), but, because this would likely only need to be a one-use tool, there probably isn't much market for it.

If you're interested (if it will help others) I'll document the process and share with you (although, this might be worth submitting to the Joomla magazine for all to see).


Thanks again for your help in this process.
Saturday, 18 April 2015 18:43 UTC
wb_weeblr
Hi

Thanks for the feedback.

I'm not really convinced that Apache rules are unreliable; that would be a huge thing, I think. That doesn't mean I know what the problem is, though. Did you take it to serverfault.com? That's where the real people can be found.

Considering the route you're taking, and if you control the server, I'd suggest putting the rewrite rules in the Apache configuration file, httpd.conf. That way, they'll be read once at startup, instead of pretty much on each request as happens when you use an .htaccess file.

The alternative, having a plugin created for you by a freelance programmer, would cost close to nothing and might well be a very good one. The regexp could be a parameter of the plugin; that should not be too tricky.
I do have an item in my bug tracker to set up redirects based on regexp rules, but that's not there yet, and there's no ETA on this.

Rgds
 
Sunday, 03 May 2015 05:34 UTC
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.