Unix Technical Forum

tsearch filenames unlikes special symbols and numbers

  #1 (permalink)  
Old 04-15-2008, 09:48 PM
Pavel Stehule
 
Posts: n/a
Default tsearch filenames unlikes special symbols and numbers

Hello

I found a small bug:

postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
DictFile= 'cs_czutf');
ERROR: invalid text search configuration file name "cs_czutf"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
DictFile= 'csczutf8');
ERROR: invalid text search configuration file name "csczutf8"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
DictFile= "csczutf8");
ERROR: invalid text search configuration file name "csczutf8"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
DictFile= "cs_czutf");
ERROR: invalid text search configuration file name "cs_czutf"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
DictFile= "csczutf");
ERROR: could not open dictionary file
"/usr/local/pgsql/share/tsearch_data/csczutf.dict": není souborem ani
adresářem [Czech: "is neither a file nor a directory"]

regards
Pavel Stehule

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

  #2 (permalink)  
Old 04-15-2008, 09:48 PM
Oleg Bartunov
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

I just tried on CVS HEAD and it seems something is broken:

postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
TEMPLATE = ispell,
DictFile = russian-utf8.dict,
AffFile = russian-utf8.aff,
StopWords = russian
);
ERROR: syntax error at or near "-"
LINE 3: DictFile = russian-utf8.dict,

postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
TEMPLATE = ispell,
DictFile = 'russian-utf8.dict',
AffFile = 'russian-utf8.aff',
StopWords = russian
);
ERROR: invalid text search configuration file name "russian-utf8.dict"


Honestly speaking, I have no time to follow the constantly changing syntax,
but the documentation
http://momjian.us/main/writings/pgsq...ictionary.html
doesn't make clear what's wrong.

Also, I'm wondering whether we really need to show all schemas that have no
text search configurations defined? It looks rather strange.

postgres=# \dF
                   List of text search configurations
       Schema       |    Name    |              Description
--------------------+------------+---------------------------------------
 information_schema |            |
 pg_catalog         | danish     | Configuration for danish language
 pg_catalog         | dutch      | Configuration for dutch language
 pg_catalog         | english    | Configuration for english language
 pg_catalog         | finnish    | Configuration for finnish language
 pg_catalog         | french     | Configuration for french language
 pg_catalog         | german     | Configuration for german language
 pg_catalog         | hungarian  | Configuration for hungarian language
 pg_catalog         | italian    | Configuration for italian language
 pg_catalog         | norwegian  | Configuration for norwegian language
 pg_catalog         | portuguese | Configuration for portuguese language
 pg_catalog         | romanian   | Configuration for romanian language
 pg_catalog         | russian    | Configuration for russian language
 pg_catalog         | simple     | simple configuration
 pg_catalog         | spanish    | Configuration for spanish language
 pg_catalog         | swedish    | Configuration for swedish language
 pg_catalog         | turkish    | Configuration for turkish language
 pg_temp_1          |            |
 pg_toast           |            |
 pg_toast_temp_1    |            |
 public             |            |
(21 rows)

Another problem I see is the broken dictionary and parser examples in the
documentation:
http://momjian.us/main/writings/pgsq...y-example.html
http://momjian.us/main/writings/pgsq...r-example.html

The include files in the dictionary example are now in the tsearch directory:

#include "tsearch/ts_locale.h"
#include "tsearch/ts_public.h"
#include "tsearch/ts_utils.h"

I didn't test the parser example.

Oleg

PS. Sorry, I missed the latest syntax changes, but I really don't understand
the use of parentheses and commas in SQL. It's so strange.
I remember Peter raised objections at the very beginning.


On Sun, 2 Sep 2007, Pavel Stehule wrote:

> Hello
>
> I found a small bug:
>
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
> DictFile= 'cs_czutf');
> ERROR: invalid text search configuration file name "cs_czutf"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
> DictFile= 'csczutf8');
> ERROR: invalid text search configuration file name "csczutf8"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
> DictFile= "csczutf8");
> ERROR: invalid text search configuration file name "csczutf8"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
> DictFile= "cs_czutf");
> ERROR: invalid text search configuration file name "cs_czutf"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell,
> DictFile= "csczutf");
> ERROR: could not open dictionary file
> "/usr/local/pgsql/share/tsearch_data/csczutf.dict": není souborem ani
> adresářem
>
> regards
> Pavel Stehule


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

  #3 (permalink)  
Old 04-15-2008, 09:48 PM
Tom Lane
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

Oleg Bartunov <oleg@sai.msu.su> writes:
> postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
> TEMPLATE = ispell,
> DictFile = 'russian-utf8.dict',
> AffFile = 'russian-utf8.aff',
> StopWords = russian
> );
> ERROR: invalid text search configuration file name "russian-utf8.dict"


I made it reject all but latin letters, which is the same restriction
that's in place for timezone set filenames. That might be overly
strong, but we definitely have to forbid "." and "/" (and "\" on
Windows). Do we want to restrict it to letters, digits, underscore?
Or does it need to be weaker than that?
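
For illustration only, the "letters, digits, underscore" option could be sketched in C as below. This is a minimal sketch, not the code that is in CVS: the function name ts_filename_is_valid is hypothetical, and it assumes that both upper- and lower-case ASCII letters would be accepted.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's actual code): accept a base name
 * only if it is non-empty and consists solely of ASCII letters, digits,
 * and underscores. Anything else, including '.', '/', '\\' and '-',
 * is rejected.
 */
bool
ts_filename_is_valid(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_";

    if (name == NULL || name[0] == '\0')
        return false;

    /* strspn() returns the length of the leading run of allowed bytes. */
    return strspn(name, allowed) == strlen(name);
}
```

Under such a rule the names cs_czutf and csczutf8 from the original report would pass, while russian-utf8.dict would still be rejected because of the "-" and the ".".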

> Also, I'm wondering do we really need to show all schemas without
> text search configurations defined ? Looks rather stranger.


Um ... I don't see that; I get

regression=# \dF
               List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | Configuration for danish language
 pg_catalog | dutch      | Configuration for dutch language
 pg_catalog | english    | Configuration for english language
 pg_catalog | finnish    | Configuration for finnish language
 pg_catalog | french     | Configuration for french language
 pg_catalog | german     | Configuration for german language
 pg_catalog | hungarian  | Configuration for hungarian language
 pg_catalog | italian    | Configuration for italian language
 pg_catalog | norwegian  | Configuration for norwegian language
 pg_catalog | portuguese | Configuration for portuguese language
 pg_catalog | romanian   | Configuration for romanian language
 pg_catalog | russian    | Configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | Configuration for spanish language
 pg_catalog | swedish    | Configuration for swedish language
 pg_catalog | turkish    | Configuration for turkish language
(16 rows)

Are you sure you're using CVS-head psql?

> Another problem I see are broken examples of dictionary and parser in
> documentation:
> http://momjian.us/main/writings/pgsq...y-example.html
> http://momjian.us/main/writings/pgsq...r-example.html


Yeah, I wanted to discuss that with you. Code examples in sgml docs are
a bad idea: they're impossible to use as actual templates, because of
all the weird markup changes, and there's no easy way to notice if
they're broken. It would be better to remove these from the docs and
set them up as contrib modules.

regards, tom lane

  #4 (permalink)  
Old 04-15-2008, 09:48 PM
Gregory Stark
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers


"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Oleg Bartunov <oleg@sai.msu.su> writes:
>> postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
>> TEMPLATE = ispell,
>> DictFile = 'russian-utf8.dict',
>> AffFile = 'russian-utf8.aff',
>> StopWords = russian
>> );
>> ERROR: invalid text search configuration file name "russian-utf8.dict"

>
> I made it reject all but latin letters, which is the same restriction
> that's in place for timezone set filenames. That might be overly
> strong, but we definitely have to forbid "." and "/" (and "\" on
> Windows). Do we want to restrict it to letters, digits, underscore?
> Or does it need to be weaker than that?


What's the problem with "."?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

  #5 (permalink)  
Old 04-15-2008, 09:48 PM
Tom Lane
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I made it reject all but latin letters, which is the same restriction
>> that's in place for timezone set filenames. That might be overly
>> strong, but we definitely have to forbid "." and "/" (and "\" on
>> Windows). Do we want to restrict it to letters, digits, underscore?
>> Or does it need to be weaker than that?


> What's the problem with "."?


../../../../etc/passwd

Possibly we could allow '.' as long as we forbade /, but the other
trouble with allowing . is that it encourages people to try to specify
the filetype suffix (as indeed Oleg was doing). I'd prefer to keep the
suffixes out of the SQL object definitions, with an eye to possibly
someday migrating all the configuration data inside the database.
There's a reasonable argument for restricting the names used for these
things in the SQL definitions to be valid SQL identifiers, so that that
will work nicely...
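
To make the two concerns above concrete, here is a rough C sketch of how a bare name might be turned into a path under a fixed directory, with the suffix supplied by the server rather than by the SQL definition. The function name build_ts_path and its signature are assumptions made for this example; the directory and file name in the usage note come from the error message in the original report.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/<name>.<ext>" into buf.
 *
 * The name is refused if it is empty or contains '.', '/' or '\\', so
 * "../../../../etc/passwd" cannot climb out of dir and the caller cannot
 * smuggle in its own suffix. Returns false on refusal or if the result
 * does not fit in buf.
 */
bool
build_ts_path(char *buf, size_t buflen,
              const char *dir, const char *name, const char *ext)
{
    int n;

    if (name == NULL || name[0] == '\0' || strpbrk(name, "./\\") != NULL)
        return false;

    /* snprintf() reports the length it wanted; check for truncation. */
    n = snprintf(buf, buflen, "%s/%s.%s", dir, name, ext);
    return n >= 0 && (size_t) n < buflen;
}
```

Called with the directory /usr/local/pgsql/share/tsearch_data, the name csczutf and the extension dict, this yields the path shown in the "could not open dictionary file" error earlier in the thread; a name containing "../" is refused before any path is built.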

regards, tom lane

  #6 (permalink)  
Old 04-15-2008, 09:48 PM
Gregory Stark
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> I made it reject all but latin letters, which is the same restriction
>>> that's in place for timezone set filenames. That might be overly
>>> strong, but we definitely have to forbid "." and "/" (and "\" on
>>> Windows). Do we want to restrict it to letters, digits, underscore?
>>> Or does it need to be weaker than that?

>
>> What's the problem with "."?

>
> ../../../../etc/passwd
>
> Possibly we could allow '.' as long as we forbade /,


Right, traditionally the only characters forbidden in filenames in Unix are /
and nul. If we want the files to play nice in Gnome etc then we should
restrict them to ascii since we don't know what encoding the gui expects.

Actually I think in Windows \ : and . are problems (not allowed more than one
dot in dos).

> There's a reasonable argument for restricting the names used for these
> things in the SQL definitions to be valid SQL identifiers, so that that
> will work nicely...


Ah

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

  #7 (permalink)  
Old 04-15-2008, 09:48 PM
Trevor Talbot
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

On 9/2/07, Gregory Stark <stark@enterprisedb.com> wrote:

> Right, traditionally the only characters forbidden in filenames in Unix are /
> and nul. If we want the files to play nice in Gnome etc then we should
> restrict them to ascii since we don't know what encoding the gui expects.
>
> Actually I think in Windows \ : and . are problems (not allowed more than one
> dot in dos).


Reserved characters in Windows filenames are < > : " / \ | ? *
DOS limitations aren't relevant on the OS versions Postgres supports.

...but I thought this was about opening existing files, not creating
them, in which case the only relevant limitation is path separators.
Any other reserved characters are going to result in no open file,
rather than a security hole.
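
The distinction drawn here, between characters that can redirect a path and characters that merely make the open fail, could be sketched in C roughly as follows. Both function names are hypothetical; the reserved-character set is the one listed above.

```c
#include <stdbool.h>
#include <string.h>

/*
 * True if the name contains a path separator ('/' everywhere, '\\' on
 * Windows). These are the characters that could steer the open to a
 * different directory.
 */
bool
has_path_separator(const char *name)
{
    return strpbrk(name, "/\\") != NULL;
}

/*
 * True if the name contains any character Windows reserves in file
 * names: < > : " / \ | ? *
 * Apart from the separators, these would only make the open fail.
 */
bool
has_windows_reserved_char(const char *name)
{
    return strpbrk(name, "<>:\"/\\|?*") != NULL;
}
```

Only the first check matters for the security argument; the second is stricter and would matter mainly if the server ever had to create such files.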

  #8 (permalink)  
Old 04-15-2008, 09:48 PM
Magnus Hagander
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > Gregory Stark <stark@enterprisedb.com> writes:
> >> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> >>> I made it reject all but latin letters, which is the same restriction
> >>> that's in place for timezone set filenames. That might be overly
> >>> strong, but we definitely have to forbid "." and "/" (and "\" on
> >>> Windows). Do we want to restrict it to letters, digits, underscore?
> >>> Or does it need to be weaker than that?

> >
> >> What's the problem with "."?

> >
> > ../../../../etc/passwd
> >
> > Possibly we could allow '.' as long as we forbade /,

>
> Right, traditionally the only characters forbidden in filenames in Unix are /
> and nul. If we want the files to play nice in Gnome etc then we should
> restrict them to ascii since we don't know what encoding the gui expects.
>
> Actually I think in Windows \ : and . are problems (not allowed more than one
> dot in dos).


\ and : are problems.

. is not a problem. We don't support 16-bit windows anyway, and multiple
dots works fine on any system we support.

//Magnus

  #9 (permalink)  
Old 04-15-2008, 09:48 PM
Tom Lane
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote:
>> Actually I think in Windows \ : and . are problems (not allowed more
>> than one dot in dos).


> \ and : are problems.


Is : really a problem, given that the name in question will be appended
to a known directory's path?

> . is not a problem. We don't support 16-bit windows anyway, and multiple
> dots works fine on any system we support.


I'm not convinced that . is issue-free. On most if not all versions of Unix,
you are allowed to open a directory as a file and read the filenames it
contains. While I don't say it'd be easy to manage that through
tsearch, there's at least a potential for discovering the filenames
present in . and .. --- how much do we care about that?
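
One way to guard against the directory case, independent of which characters are allowed, would be to check the file type before reading. The following is only a sketch assuming a POSIX system; is_regular_file is a hypothetical name, and nothing in this thread says tsearch does this.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical check: true only if path names an existing regular file.
 * Directories such as "." and ".." fail the S_ISREG() test, and a
 * missing or inaccessible path fails the stat() call itself.
 */
bool
is_regular_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;

    return S_ISREG(st.st_mode) != 0;
}
```

A check made before the open leaves a window in which the path can change, so calling fstat() on the already opened descriptor would be the sturdier variant of the same idea.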

regards, tom lane

  #10 (permalink)  
Old 04-15-2008, 09:48 PM
Magnus Hagander
 
Posts: n/a
Default Re: tsearch filenames unlikes special symbols and numbers

On Mon, Sep 03, 2007 at 09:27:19AM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote:
> >> Actually I think in Windows \ : and . are problems (not allowed more
> >> than one dot in dos).

>
> > \ and : are problems.

>
> Is : really a problem, given that the name in question will be appended
> to a known directory's path?


Yes. It won't work - the API calls will reject it.

> > . is not a problem. We don't support 16-bit windows anyway, and multiple
> > dots works fine on any system we support.

>
> I'm not convinced that . is issue-free. On most if not all versions of Unix,
> you are allowed to open a directory as a file and read the filenames it
> contains. While I don't say it'd be easy to manage that through
> tsearch, there's at least a potential for discovering the filenames
> present in . and .. --- how much do we care about that?


I just meant that it's not a problem on Win32 to have a file with multiple
dots in the name. There can certainly be *other* reasons for it. I don't
really see the need to have an extra dot in the filename in this particular
case, so I'd certainly be fine with restricting this one a lot more.

//Magnus
