#7854 – What about the "|" character?

Posted in ‘sh404SEF’
Monday, 07 June 2021 07:31 UTC
TheSDHotel

Hey there,

In both the "Strip characters" setting and the "Character replacements list" setting, the | character is used as a separator to decide which characters to strip/replace.

But what if you want to strip the | character itself? I guess that should be stripped by default? But it looks like it's not stripped. Is there any way to strip it? Maybe by escaping it like this?

\||-

but not sure if that would work.

Let me know, thanks :)

Monday, 07 June 2021 09:06 UTC
wb_weeblr

Hi

But what if you want to strip the | character itself? I guess that should be stripped by default?

Why would you strip it? Perfectly valid, which is why I used it as a separator, I guess.

Best regards

Yannick Gaultier

weeblr.com / @weeblr

 

 
Monday, 07 June 2021 09:20 UTC
TheSDHotel

 

Why would you strip it?

Not sure, for the same reason that you may want to strip any other character? Which is mostly personal preference for a lot of those characters :D

But either way, in this particular case, the reason is a different one...

perfectly valid which is why I used it as a separator I guess.

Feed validator doesn't seem to think so... Don't ask me why. But that's how I found out.

[Image: 5YQfZWR.png]

 

Monday, 07 June 2021 10:04 UTC
wb_weeblr

Hi

You're right in that it's now an "unsafe" character. That is, it's valid, but "some processors may give it a special meaning". Guess your particular feed processor does.

How exactly are those links created? When I tested:

- if sh404SEF is set to use the article alias, well, Joomla strips the | so it is not there

- if sh404SEF is set to use the article title (where those strip characters are used), the | character is properly URL-encoded to %7C (i.e. /sample-data-articles/article-with-%7C-in-title)
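For readers who want to reproduce the %7C encoding mentioned above, here is a minimal Python sketch. It is not sh404SEF or Joomla code; it only uses the standard library's `urllib.parse.quote`, and the helper name `encode_path` is made up for illustration.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a URL path, leaving "/" untouched.

    Hypothetical helper for illustration only, not part of sh404SEF.
    """
    # quote() keeps letters, digits and "_.-~" as-is and percent-encodes
    # everything else, except the characters listed in `safe`.
    return quote(path, safe="/")


raw = "/sample-data-articles/article-with-|-in-title"
encoded = encode_path(raw)
print(encoded)           # the | is now %7C
print(unquote(encoded))  # decoding gives back the original path
```

A feed generator that ran its links through something like this before output would emit the encoded form, which is the form the browser shows in the URL bar.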

Best regards

Yannick Gaultier

weeblr.com / @weeblr

 

 

 
Monday, 07 June 2021 10:55 UTC
TheSDHotel

Guess your particular feed processor does.

This is the feed validator recommended by Google btw: http://validator.w3.org/feed/

if sh404SEF is set to use the article title (where those strip characters are used), the | character is properly URL-encoded to %7C (i.e. /sample-data-articles/article-with-%7C-in-title)

Interesting. So I noticed that while the URL Manager in sh404SEF does not encode the character, when you actually visit the URL, it's encoded in the browser URL bar like you say.

However, it is not encoded in the Feed URL (which is where I noticed the issue)

If I were you, I would strip the | character by default in sh404SEF, so you can keep using it as a separator in the settings with no issue.

 

Monday, 07 June 2021 12:18 UTC
wb_weeblr

Hi

If I were you, I would strip the | character by default in sh404SEF, so you can keep using it as a separator in the settings with no issue.

Not backward compatible, so it cannot happen by default.

This is the feed validator recommended by Google btw: http://validator.w3.org/feed/

That's the one I use.

 when you actually visit the URL, it's encoded in the browser URL bar like you say.

I think it's the browser doing this; in the content (the page source), the | character is present as-is.

However, it is not encoded in the Feed URL (which is where I noticed the issue)

Nor in Joomla content. And that's kind of expected as, again, it's a valid character (unsafe but valid, not reserved, and does not require encoding per RFC 3986).

I think I'll try to add a specific syntax for characters to exclude such as \| as you said, or just a double ||
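To make the escaping idea concrete, here is a rough Python sketch of how a |-separated setting could treat \| as a literal pipe. This is only an illustration under assumptions: sh404SEF is a PHP extension and no such syntax exists in it as of this ticket; the function names `split_escaped` and `strip_characters` are hypothetical.

```python
def split_escaped(setting: str, sep: str = "|", esc: str = "\\") -> list[str]:
    """Split `setting` on `sep`, treating `esc + sep` as a literal `sep`.

    Hypothetical parser for illustration; not sh404SEF's actual behavior.
    """
    items: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(setting):
        ch = setting[i]
        if ch == esc and i + 1 < len(setting) and setting[i + 1] == sep:
            # Escaped separator: keep the separator itself as data.
            current.append(sep)
            i += 2
        elif ch == sep:
            # Unescaped separator: close the current item.
            items.append("".join(current))
            current = []
            i += 1
        else:
            # Any other character, including a lone backslash, is kept.
            current.append(ch)
            i += 1
    items.append("".join(current))
    # Drop empty entries produced by leading, trailing or doubled separators.
    return [item for item in items if item]


def strip_characters(title: str, setting: str) -> str:
    """Remove every character (or string) listed in `setting` from `title`."""
    for token in split_escaped(setting):
        title = title.replace(token, "")
    return title


print(split_escaped(r"\||-"))
print(strip_characters("Article | with-pipe", r"\||-"))
```

Because a backslash that is not followed by | is kept as ordinary data, existing lists without the escape sequence would parse the same way as before under this scheme.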

Best regards

Yannick Gaultier

weeblr.com / @weeblr

 

 

 
Monday, 07 June 2021 12:23 UTC
TheSDHotel

Not backward compatible, so it cannot happen by default.

What's the issue? I don't see the backward compatibility issue

Will never be an issue unless people purge all URLs. Which is a destructive action anyway.

Won't affect existing URLs.

Monday, 07 June 2021 12:51 UTC
wb_weeblr

Hi

Will never be an issue unless people purge all URLs. Which is a destructive action anyway.

Unfortunately, people do that all the time. Lots of people do that after each update, for instance. They should not. I explain they should not. They do.

Best regards

Yannick Gaultier

weeblr.com / @weeblr

 

 
Monday, 07 June 2021 15:22 UTC
TheSDHotel

Haha. 

Unfortunately, people do that all the time. Lots of people do that after each update, for instance. They should not. I explain they should not. They do.

For anyone to notice such "BC Break" they would have to:

- Be stupid enough to purge all URLs

- Happen to use the article title instead of alias to generate URLs

- Happen to have an article with a | character in the title

- Notice or care that the URL is generated differently the next time sh404SEF generates it after the purge (which I don't think they ever will, given that purging URLs in the first place means they have no idea what they're doing)

So I don't think you will ever have a single complaint about this if you strip it by default :D

So honestly I would just strip it by default and call it a day (which sounds like the fastest solution).

Or this (but this sounds complicated):

I think I'll try to add a specific syntax for characters to exclude such as \| as you said, or just a double ||

Or even:

Ignore it and forget it, it's such a minor issue :D

Monday, 07 June 2021 15:48 UTC
wb_weeblr

Hi

So I don't think you will ever have a single complaint about this if you strip it by default :D

I don't care so much about complaints; that's not the problem. The challenge when you build a product such as mine is to do whatever gives the best result, sometimes despite users, or rather taking into account what users actually do.

Getting 404s after an update is a major problem.

Ignore it and forget it, it's such a minor issue :D

Sounds best to me

 