This is a discussion on "tsearch filenames unlikes special symbols and numbers" within the pgsql Hackers forums, part of the PostgreSQL category.

Hello,

I found a small bug:

postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'cs_czutf');
ERROR: invalid text search configuration file name "cs_czutf"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'csczutf8');
ERROR: invalid text search configuration file name "csczutf8"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf8");
ERROR: invalid text search configuration file name "csczutf8"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "cs_czutf");
ERROR: invalid text search configuration file name "cs_czutf"
postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf");
ERROR: could not open dictionary file "/usr/local/pgsql/share/tsearch_data/csczutf.dict": No such file or directory

Regards,
Pavel Stehule

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly

I just tried on CVS HEAD, and it seems something is broken:

postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
TEMPLATE = ispell,
DictFile = russian-utf8.dict,
AffFile = russian-utf8.aff,
StopWords = russian
);
ERROR: syntax error at or near "-"
LINE 3: DictFile = russian-utf8.dict,

postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
TEMPLATE = ispell,
DictFile = 'russian-utf8.dict',
AffFile = 'russian-utf8.aff',
StopWords = russian
);
ERROR: invalid text search configuration file name "russian-utf8.dict"

Honestly speaking, I have no time to follow the constantly changing syntax, but the documentation
http://momjian.us/main/writings/pgsq...ictionary.html
doesn't make clear what's wrong.

Also, I'm wondering: do we really need to show all schemas without text search configurations defined? Looks rather strange.

postgres=# \dF
                 List of text search configurations
       Schema       |    Name    |              Description
--------------------+------------+---------------------------------------
 information_schema |            |
 pg_catalog         | danish     | Configuration for danish language
 pg_catalog         | dutch      | Configuration for dutch language
 pg_catalog         | english    | Configuration for english language
 pg_catalog         | finnish    | Configuration for finnish language
 pg_catalog         | french     | Configuration for french language
 pg_catalog         | german     | Configuration for german language
 pg_catalog         | hungarian  | Configuration for hungarian language
 pg_catalog         | italian    | Configuration for italian language
 pg_catalog         | norwegian  | Configuration for norwegian language
 pg_catalog         | portuguese | Configuration for portuguese language
 pg_catalog         | romanian   | Configuration for romanian language
 pg_catalog         | russian    | Configuration for russian language
 pg_catalog         | simple     | simple configuration
 pg_catalog         | spanish    | Configuration for spanish language
 pg_catalog         | swedish    | Configuration for swedish language
 pg_catalog         | turkish    | Configuration for turkish language
 pg_temp_1          |            |
 pg_toast           |            |
 pg_toast_temp_1    |            |
 public             |            |
(21 rows)

Another problem I see is the broken examples of a dictionary and a parser in the documentation:
http://momjian.us/main/writings/pgsq...y-example.html
http://momjian.us/main/writings/pgsq...r-example.html

The include files in the dictionary example are now in the tsearch directory:

#include "tsearch/ts_locale.h"
#include "tsearch/ts_public.h"
#include "tsearch/ts_utils.h"

I didn't test the parser example.

Oleg

PS. Sorry, I missed the last syntax changes, but I really don't understand the usage of parentheses and commas in SQL. It's so strange. I remember Peter raised an objection at the very beginning.

On Sun, 2 Sep 2007, Pavel Stehule wrote:
> Hello,
>
> I found a small bug:
>
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'cs_czutf');
> ERROR: invalid text search configuration file name "cs_czutf"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= 'csczutf8');
> ERROR: invalid text search configuration file name "csczutf8"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf8");
> ERROR: invalid text search configuration file name "csczutf8"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "cs_czutf");
> ERROR: invalid text search configuration file name "cs_czutf"
> postgres=# CREATE TEXT SEARCH DICTIONARY cz1(TEMPLATE = ispell, DictFile= "csczutf");
> ERROR: could not open dictionary file "/usr/local/pgsql/share/tsearch_data/csczutf.dict": No such file or directory
>
> Regards,
> Pavel Stehule
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Oleg Bartunov <oleg@sai.msu.su> writes:
> postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
> TEMPLATE = ispell,
> DictFile = 'russian-utf8.dict',
> AffFile = 'russian-utf8.aff',
> StopWords = russian
> );
> ERROR: invalid text search configuration file name "russian-utf8.dict"

I made it reject all but latin letters, which is the same restriction
that's in place for timezone set filenames. That might be overly
strong, but we definitely have to forbid "." and "/" (and "\" on
Windows). Do we want to restrict it to letters, digits, underscore?
Or does it need to be weaker than that?

> Also, I'm wondering: do we really need to show all schemas without
> text search configurations defined? Looks rather strange.

Um ... I don't see that; I get

regression=# \dF
             List of text search configurations
   Schema   |    Name    |              Description
------------+------------+---------------------------------------
 pg_catalog | danish     | Configuration for danish language
 pg_catalog | dutch      | Configuration for dutch language
 pg_catalog | english    | Configuration for english language
 pg_catalog | finnish    | Configuration for finnish language
 pg_catalog | french     | Configuration for french language
 pg_catalog | german     | Configuration for german language
 pg_catalog | hungarian  | Configuration for hungarian language
 pg_catalog | italian    | Configuration for italian language
 pg_catalog | norwegian  | Configuration for norwegian language
 pg_catalog | portuguese | Configuration for portuguese language
 pg_catalog | romanian   | Configuration for romanian language
 pg_catalog | russian    | Configuration for russian language
 pg_catalog | simple     | simple configuration
 pg_catalog | spanish    | Configuration for spanish language
 pg_catalog | swedish    | Configuration for swedish language
 pg_catalog | turkish    | Configuration for turkish language
(16 rows)

Are you sure you're using CVS-head psql?

> Another problem I see is the broken examples of a dictionary and a parser in
> the documentation:
> http://momjian.us/main/writings/pgsq...y-example.html
> http://momjian.us/main/writings/pgsq...r-example.html

Yeah, I wanted to discuss that with you. Code examples in sgml docs are
a bad idea: they're impossible to use as actual templates, because of
all the weird markup changes, and there's no easy way to notice if
they're broken. It would be better to remove these from the docs and
set them up as contrib modules.

regards, tom lane
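
To make the two policies mentioned in this message concrete, here is a minimal sketch in C. It is not the PostgreSQL source: the function names are hypothetical, and "latin letters" is assumed to mean ASCII a-z and A-Z only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a PostgreSQL function): ASCII a-z or A-Z only. */
bool is_ascii_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Strict policy described in the thread: letters only, non-empty. */
bool name_is_letters_only(const char *name)
{
    if (name == NULL || *name == '\0')
        return false;
    for (const char *p = name; *p != '\0'; p++)
    {
        if (!is_ascii_letter(*p))
            return false;
    }
    return true;
}

/* Relaxed policy floated in the thread: letters, digits, underscore. */
bool name_is_letters_digits_underscore(const char *name)
{
    if (name == NULL || *name == '\0')
        return false;
    for (const char *p = name; *p != '\0'; p++)
    {
        bool is_digit = (*p >= '0' && *p <= '9');

        if (!is_ascii_letter(*p) && !is_digit && *p != '_')
            return false;
    }
    return true;
}
```

Under these assumptions, `csczutf` passes both checks, `cs_czutf` and `csczutf8` pass only the relaxed one, and `russian-utf8.dict` fails both, which is consistent with the errors reported earlier in the thread.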

"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Oleg Bartunov <oleg@sai.msu.su> writes:
>> postgres=# CREATE TEXT SEARCH DICTIONARY ru_ispell (
>> TEMPLATE = ispell,
>> DictFile = 'russian-utf8.dict',
>> AffFile = 'russian-utf8.aff',
>> StopWords = russian
>> );
>> ERROR: invalid text search configuration file name "russian-utf8.dict"
>
> I made it reject all but latin letters, which is the same restriction
> that's in place for timezone set filenames. That might be overly
> strong, but we definitely have to forbid "." and "/" (and "\" on
> Windows). Do we want to restrict it to letters, digits, underscore?
> Or does it need to be weaker than that?

What's the problem with "."?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I made it reject all but latin letters, which is the same restriction
>> that's in place for timezone set filenames. That might be overly
>> strong, but we definitely have to forbid "." and "/" (and "\" on
>> Windows). Do we want to restrict it to letters, digits, underscore?
>> Or does it need to be weaker than that?

> What's the problem with "."?

../../../../etc/passwd

Possibly we could allow '.' as long as we forbade /, but the other
trouble with allowing . is that it encourages people to try to specify
the filetype suffix (as indeed Oleg was doing). I'd prefer to keep the
suffixes out of the SQL object definitions, with an eye to possibly
someday migrating all the configuration data inside the database.
There's a reasonable argument for restricting the names used for these
things in the SQL definitions to be valid SQL identifiers, so that that
will work nicely...

regards, tom lane
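
The `../../../../etc/passwd` concern can be illustrated with a short C sketch. It assumes, as the error message earlier in the thread suggests, that the server appends the user-supplied name and a fixed suffix to a known directory. `build_config_path` and its parameters are hypothetical names chosen for illustration, not PostgreSQL functions.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "<basedir>/<name>.<ext>" into out.
 * Returns 0 on success, -1 if the name is rejected or out is too small.
 */
int build_config_path(char *out, size_t outlen,
                      const char *basedir, const char *name, const char *ext)
{
    int n;

    /* Reject empty names and any name containing '/', '\' or '.'. */
    if (name == NULL || name[0] == '\0' || strpbrk(name, "/\\.") != NULL)
        return -1;

    /* The suffix is supplied by the caller's code, never by the user. */
    n = snprintf(out, outlen, "%s/%s.%s", basedir, name, ext);
    if (n < 0 || (size_t) n >= outlen)
        return -1;

    return 0;
}
```

Rejecting the dot in this sketch does two jobs at once: names such as `.` and `..` can never be formed, and a user-supplied suffix like `russian-utf8.dict` is refused, which keeps suffixes out of the SQL definition as argued above.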

"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> I made it reject all but latin letters, which is the same restriction
>>> that's in place for timezone set filenames. That might be overly
>>> strong, but we definitely have to forbid "." and "/" (and "\" on
>>> Windows). Do we want to restrict it to letters, digits, underscore?
>>> Or does it need to be weaker than that?
>
>> What's the problem with "."?
>
> ../../../../etc/passwd
>
> Possibly we could allow '.' as long as we forbade /,

Right, traditionally the only characters forbidden in filenames in Unix are /
and nul. If we want the files to play nice in Gnome etc then we should
restrict them to ascii since we don't know what encoding the gui expects.

Actually I think in Windows \ : and . are problems (not allowed more than one
dot in dos).

> There's a reasonable argument for restricting the names used for these
> things in the SQL definitions to be valid SQL identifiers, so that that
> will work nicely...

Ah

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

On 9/2/07, Gregory Stark <stark@enterprisedb.com> wrote:
> Right, traditionally the only characters forbidden in filenames in Unix are /
> and nul. If we want the files to play nice in Gnome etc then we should
> restrict them to ascii since we don't know what encoding the gui expects.
>
> Actually I think in Windows \ : and . are problems (not allowed more than one
> dot in dos).

Reserved characters in Windows filenames are < > : " / \ | ? *

DOS limitations aren't relevant on the OS versions Postgres supports.

...but I thought this was about opening existing files, not creating
them, in which case the only relevant limitation is path separators. Any
other reserved characters are going to result in no open file, rather
than a security hole.
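
The distinction drawn in this message, between characters that merely make a name unopenable and characters that change which directory is reached, can be sketched in C as two separate checks. The function names are hypothetical, and the reserved-character set is the one listed in the message itself.

```c
#include <stdbool.h>
#include <string.h>

/* Reserved characters in Windows filenames, as listed in the thread. */
static const char WINDOWS_RESERVED[] = "<>:\"/\\|?*";

/* True if the name contains any character from the reserved list. */
bool has_windows_reserved_char(const char *name)
{
    return strpbrk(name, WINDOWS_RESERVED) != NULL;
}

/*
 * True if the name contains '/' or '\'. On the poster's argument these
 * are the only characters that matter when opening an existing file,
 * because they can redirect the lookup to another directory.
 */
bool has_path_separator(const char *name)
{
    return strpbrk(name, "/\\") != NULL;
}
```

In this sketch a name such as `a:b` trips only the first check, while `../../../../etc/passwd` trips both, which is the difference between a failed open and a name that points outside the intended directory.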

On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > Gregory Stark <stark@enterprisedb.com> writes:
> >> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> >>> I made it reject all but latin letters, which is the same restriction
> >>> that's in place for timezone set filenames. That might be overly
> >>> strong, but we definitely have to forbid "." and "/" (and "\" on
> >>> Windows). Do we want to restrict it to letters, digits, underscore?
> >>> Or does it need to be weaker than that?
> >
> >> What's the problem with "."?
> >
> > ../../../../etc/passwd
> >
> > Possibly we could allow '.' as long as we forbade /,
>
> Right, traditionally the only characters forbidden in filenames in Unix are /
> and nul. If we want the files to play nice in Gnome etc then we should
> restrict them to ascii since we don't know what encoding the gui expects.
>
> Actually I think in Windows \ : and . are problems (not allowed more than one
> dot in dos).

\ and : are problems.

. is not a problem. We don't support 16-bit windows anyway, and multiple
dots work fine on any system we support.

//Magnus

Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote:
>> Actually I think in Windows \ : and . are problems (not allowed more
>> than one dot in dos).

> \ and : are problems.

Is : really a problem, given that the name in question will be appended
to a known directory's path?

> . is not a problem. We don't support 16-bit windows anyway, and multiple
> dots work fine on any system we support.

I'm not convinced that . is issue-free. On most if not all versions of Unix,
you are allowed to open a directory as a file and read the filenames it
contains. While I don't say it'd be easy to manage that through
tsearch, there's at least a potential for discovering the filenames
present in . and .. --- how much do we care about that?

regards, tom lane

On Mon, Sep 03, 2007 at 09:27:19AM -0400, Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Mon, Sep 03, 2007 at 07:47:14AM +0100, Gregory Stark wrote:
> >> Actually I think in Windows \ : and . are problems (not allowed more
> >> than one dot in dos).
>
> > \ and : are problems.
>
> Is : really a problem, given that the name in question will be appended
> to a known directory's path?

Yes. It won't work - the API calls will reject it.

> > . is not a problem. We don't support 16-bit windows anyway, and multiple
> > dots work fine on any system we support.
>
> I'm not convinced that . is issue-free. On most if not all versions of Unix,
> you are allowed to open a directory as a file and read the filenames it
> contains. While I don't say it'd be easy to manage that through
> tsearch, there's at least a potential for discovering the filenames
> present in . and .. --- how much do we care about that?

I just meant that it's not a problem on Win32 to have a file with multiple
dots in the name. There can certainly be *other* reasons for it.

I don't really see the need to have an extra dot in the filename in this
particular case, so I'd certainly be fine with restricting this one a lot
more.

//Magnus
|
|