Unix Technical Forum

Re: [GENERAL] plperl and regexps with accented characters- incompatible?

#1
04-15-2008, 11:32 PM
Andrew Dunstan



Andrew Dunstan wrote:
>
>
>
> Greg Sabino Mullane wrote:
>>
>> Yes, we might want to consider making utf8 come pre-loaded for
>> plperl. There is no direct or easy way to do it (we don't have
>> finer-grained control than the 'require' opcode), but we could
>> probably dial back restrictions, 'use' it, and then reset the Safe
>> container to its defaults. Not sure what other problems that may
>> cause, however. CCing to hackers for discussion there.
>>
>>
>>

>
> UTF8 is automatically on for strings passed to plperl if the db
> encoding is UTF8. That includes the source text. Please be more
> precise about what you want.
>
> BTW, the perl docs say this about the utf8 pragma:
>
> Do not use this pragma for anything else than telling Perl that your
> script is written in UTF-8.
>
> There should be no need to do that - we will have done it for you. So
> any attempt to use the utf8 pragma in plperl code is probably broken
> anyway.
>
>


Ugh, in testing I see some nastiness here without any explicit require.
It looks like there's an implicit require if the text contains certain
chars. I'll see what I can do to fix the bug, although I'm not sure if
it's possible.
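
A rough way to look at this outside PostgreSQL is a bare Safe compartment running a case-insensitive match on a string with an accented character. This is only a sketch: `$container`, the test string and the pattern are made up for illustration and are not the plperl setup. Whether the match is trapped or succeeds depends on the Perl and Safe versions installed, so no particular output is claimed.

```perl
use strict;
use warnings;
use Safe;

# Default compartment: the default operator mask does not permit 'require'.
my $container = Safe->new;

# Hypothetical test data. \x{263a} forces Perl to store the string as UTF-8
# internally; \x{e9} and \x{c9} are e-acute in lower and upper case.
my $code = 'my $s = "caf\x{e9} \x{263a}"; $s =~ /CAF\x{c9}/i ? "matched" : "no match"';

# reval compiles and runs the code inside the compartment; errors land in $@.
my $result = $container->reval($code);

if ($@) {
    print "trapped: $@";
} else {
    print "result: $result\n";
}
```

If the regex engine tries to load utf8 support behind the scenes, the error text in `$@` should mention the trapped `require` operation.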

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#2
04-15-2008, 11:32 PM
Andrew Dunstan



Andrew Dunstan wrote:
>
>
> Ugh, in testing I see some nastiness here without any explicit
> require. It looks like there's an implicit require if the text
> contains certain chars. I'll see what I can do to fix the bug,
> although I'm not sure if it's possible.
>
>


Looks like it's going to be very hard, unless someone has some brilliant
insight I'm missing :-(

Maybe we need to consult the perl coders.

cheers

andrew

#3
04-15-2008, 11:32 PM
Greg Sabino Mullane




> Ugh, in testing I see some nastiness here without any explicit
> require. It looks like there's an implicit require if the text
> contains certain chars.


Exactly.

> Looks like it's going to be very hard, unless someone has some
> brilliant insight I'm missing :-(


The only way I see around it is to do:

$PLContainer->permit('require');
....
$PLContainer->reval('use utf8;');
....
$PLContainer->deny('require');

Not ideal. Part of me says we do this because something like //i
shouldn't suddenly fail just because you added an accented
character. The other part of me says to just have people use plperlu.
At the very least, we should probably mention it in the docs as
a gotcha.
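
As a standalone script, the sequence above might look roughly like this. `$PLContainer` is the name used in the snippet; the `PLPerl` namespace, the test string and the pattern are assumptions for illustration, and the elided steps are left out rather than guessed. This is a sketch of the idea, not the plperl code, and it may well fail on some Perl versions.

```perl
use strict;
use warnings;
use Safe;

# Hypothetical stand-in for plperl's container; the namespace name is assumed.
my $PLContainer = Safe->new('PLPerl');

# Temporarily allow 'require' so that 'use utf8' can compile in the container.
$PLContainer->permit('require');
$PLContainer->reval('use utf8;');
my $load_error = $@;

# Put the restriction back before any user code runs.
$PLContainer->deny('require');

# Try a case-insensitive match on a string with an accented character.
# \x{263a} forces the string to be stored as UTF-8 internally.
my $matched = $PLContainer->reval(
    'my $s = "caf\x{e9} \x{263a}"; $s =~ /CAF\x{c9}/i ? 1 : 0'
);
my $match_error = $@;

print "preload error: ", ($load_error  || "none\n");
print "match result:  ", (defined $matched ? $matched : 'undef'), "\n";
print "match error:   ", ($match_error || "none\n");
```

Printing both error slots makes it easy to see which step, the preload or the later match, was rejected by the operator mask.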

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200711132155
http://biglumber.com/x/web?pk=2529DF...9B906714964AC8




#4
04-15-2008, 11:32 PM
Andrew Dunstan



Greg Sabino Mullane wrote:
>> Ugh, in testing I see some nastiness here without any explicit
>> require. It looks like there's an implicit require if the text
>> contains certain chars.
>>

>
> Exactly.
>
>
>> Looks like it's going to be very hard, unless someone has some
>> brilliant insight I'm missing :-(
>>

>
> The only way I see around it is to do:
>
> $PLContainer->permit('require');
> ...
> $PLContainer->reval('use utf8;');
> ...
> $PLContainer->deny('require');
>
> Not ideal.


I tried something like that briefly and it failed. The trouble is, I
think, that since the engine tries a require it fails on the op test
before it even looks to see if the module is already loaded. If you have
made something work then please show me, no matter how grotty.

> Part of me says we do this because something like //i
> shouldn't suddenly fail just because you added an accented
> character. The other part of me says to just have people use plperlu.
> At the very least, we should probably mention it in the docs as
> a gotcha.
>
>


I think we should search harder for a solution, but I don't have time
right now. If you want to submit a warning for the docs in a patch we
can get that in.

cheers

andrew

#5
04-15-2008, 11:32 PM
Tom Lane

Andrew Dunstan <andrew@dunslane.net> writes:
> I tried something like that briefly and it failed. The trouble is, I
> think, that since the engine tries a require it fails on the op test
> before it even looks to see if the module is already loaded.


I think we have little choice but to report this as a Perl bug. It
essentially means that a "safe" interpreter cannot decide to preload
modules that it thinks are safe; and to add insult to injury, the
engine is apparently trying to require utf8 in some very low-level,
hidden-behind-the-scenes place, yet using high-level trappable
operations to do that. Maybe those are two different bugs. Either
utf8 is part of the Perl core or it isn't; you can't have it both ways.

regards, tom lane

#6
04-15-2008, 11:35 PM
Greg Sabino Mullane




Just as a followup, I reported this as a bug and it is
being looked at and discussed:

http://rt.perl.org/rt3//Public/Bug/D....html?id=47576

Appears there is no easy resolution yet.

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200711281358
http://biglumber.com/x/web?pk=2529DF...9B906714964AC8




#7
04-15-2008, 11:35 PM
Tom Lane
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?

Andrew Dunstan <andrew@dunslane.net> writes:
> + * Fill in just enough information to set up this perl
> + * function in the safe container and call it.
> + * For some reason not entirely clear, it prevents errors that
> + * can arise from the regex code later trying to load
> + * utf8 modules.


How many versions of Perl have you tried this against?

regards, tom lane

#8
04-15-2008, 11:35 PM
Kris Jurka
Re: [PATCHES] Re: [GENERAL] plperl and regexps with accented characters - incompatible?



On Thu, 29 Nov 2007, Andrew Dunstan wrote:

> The version I tested against is 5.8.8 - the latest stable release. The
> 5.8 series started in 2003 from what I can see - if anyone has a
> sufficiently old system that they can test on 5.6.2 that will be useful.


I've got a 5.6.1 perl here, but it wasn't built shared, so I can't test
plperl. I ran the test case Greg posted to the perl bug tracker and it
doesn't fail, so unless you're concerned that your change will break 5.6,
then it doesn't look like 5.6 needs a fix.

Kris Jurka
