Unix Technical Forum

DatabaseMetaData.getExtraNameCharacters

This is a discussion on DatabaseMetaData.getExtraNameCharacters within the pgsql Interfaces jdbc forums, part of the PostgreSQL category; --> Hi all, We are developing a very simple application that permits users to define their own tables. We would ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Interfaces jdbc

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-15-2008, 11:18 PM
Giuseppe Sacco
 
Posts: n/a
Default DatabaseMetaData.getExtraNameCharacters

Hi all,
We are developing a very simple application that permits users to define
their own tables.
We would like to leave the user the freedom to use any characters for
table and field identifiers.

We tried using accented letters, like è or ò, and it works. Then we saw
the method java.sql.DatabaseMetaData.getExtraNameCharacters() and we
thought about verifing our identifiers against this method.

The problem is that getExtraNameCharacters() returns an empty string, at
least using a 7.4 jdbc driver.

Is this the right behaviour? (I mean: should we refrain to use "strange"
letters?)

Thanks a lot,
Giuseppe


---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-15-2008, 11:18 PM
Kris Jurka
 
Posts: n/a
Default Re: DatabaseMetaData.getExtraNameCharacters



On Wed, 25 May 2005, Giuseppe Sacco wrote:

> We are developing a very simple application that permits users to define
> their own tables. We would like to leave the user the freedom to use any
> characters for table and field identifiers.
>
> We tried using accented letters, like è or ò, and it works. Then we saw
> the method java.sql.DatabaseMetaData.getExtraNameCharacters() and we
> thought about verifing our identifiers against this method.
>
> The problem is that getExtraNameCharacters() returns an empty string, at
> least using a 7.4 jdbc driver.
>


Postgresql allows pretty much any character that is valid in your encoding
to be used in an unquoted identifier, so trying to list them all would be
impossible, but there should be no problem using them.

Kris Jurka

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-15-2008, 11:18 PM
Giuseppe Sacco
 
Posts: n/a
Default Re: DatabaseMetaData.getExtraNameCharacters

Hi Kris,

Il giorno mer, 25-05-2005 alle 12:08 -0500, Kris Jurka ha scritto:
[...]
> Postgresql allows pretty much any character that is valid in your encoding
> to be used in an unquoted identifier, so trying to list them all would be
> impossible, but there should be no problem using them.


Thanks for your reply, but I would really like to have a list of valid
or invalid characters in order to test the string before executing the
statement. There are characters that should be excluded: $%&#^°§[} ...

If the valid characters list is really huge, is there any list of
invalid characters and a JDBC3.0 method that return it?

I checked that mySQL gives '#$' as result string. What are the results
given by other JDBCs?

Bye,
Giuseppe


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-15-2008, 11:18 PM
Kris Jurka
 
Posts: n/a
Default Re: DatabaseMetaData.getExtraNameCharacters



On Wed, 25 May 2005, Giuseppe Sacco wrote:

> Thanks for your reply, but I would really like to have a list of valid
> or invalid characters in order to test the string before executing the
> statement. There are characters that should be excluded: $%&#^°§[} ...
>
> If the valid characters list is really huge, is there any list of
> invalid characters and a JDBC3.0 method that return it?


The backend lexing rules (src/backend/parser/scan.l) show this for
identifiers:

ident_start [A-Za-z\200-\377_]
ident_cont [A-Za-z\200-\377_0-9\$]
identifier {ident_start}{ident_cont}*

So \200-\377 is octal for any character with the high bit set. The list
of what characters this could map to numbers in the tens of thousands for
unicode, so it's not really feasibly to return in this method.

JDBC does not offer a method to retrieve the list of invalid characters.
I'm guessing because for other databases that number would be the
extremely large list we have for valid ones. The above lexing rules will
tell you though.

Kris Jurka


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-15-2008, 11:18 PM
Giuseppe Sacco
 
Posts: n/a
Default Re: DatabaseMetaData.getExtraNameCharacters

Il giorno mer, 25-05-2005 alle 13:25 -0500, Kris Jurka ha scritto:
[...]
> ident_start [A-Za-z\200-\377_]
> ident_cont [A-Za-z\200-\377_0-9\$]
> identifier {ident_start}{ident_cont}*
>
> So \200-\377 is octal for any character with the high bit set. The list
> of what characters this could map to numbers in the tens of thousands for
> unicode, so it's not really feasibly to return in this method.

[...]

Thanks again Kris,
if I understand correclty, the valid charset is the one computed by the
attached class. It seems to me that this is 191 characters long.
Could you please let me know where I am wrong?

Giuseppe


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-15-2008, 11:18 PM
Kris Jurka
 
Posts: n/a
Default Re: DatabaseMetaData.getExtraNameCharacters



On Wed, 25 May 2005, Giuseppe Sacco wrote:

> Il giorno mer, 25-05-2005 alle 13:25 -0500, Kris Jurka ha scritto:
> [...]
> > ident_start [A-Za-z\200-\377_]
> > ident_cont [A-Za-z\200-\377_0-9\$]
> > identifier {ident_start}{ident_cont}*
> >
> > So \200-\377 is octal for any character with the high bit set. The list
> > of what characters this could map to numbers in the tens of thousands for
> > unicode, so it's not really feasibly to return in this method.

>
> if I understand correclty, the valid charset is the one computed by the
> attached class. It seems to me that this is 191 characters long.
> Could you please let me know where I am wrong?


You are assuming that each character is only one byte. The backend lexing
rules are a byte by byte operation, but the JDBC side is returning a
String of characters. Consider the character "Latin Small Letter s with
Acute" (\u015B) gets encoded in UTF-8 as C5 9B or \305\233 in octal. This
is one character in the result of getExtraNameCharacters.

Kris Jurka

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-15-2008, 11:18 PM
Giuseppe Sacco
 
Posts: n/a
Default Re: DatabaseMetaData.getExtraNameCharacters

Il giorno mer, 25-05-2005 alle 14:59 -0500, Kris Jurka ha scritto:
[...]
> You are assuming that each character is only one byte. The backend lexing
> rules are a byte by byte operation, but the JDBC side is returning a
> String of characters. Consider the character "Latin Small Letter s with
> Acute" (\u015B) gets encoded in UTF-8 as C5 9B or \305\233 in octal. This
> is one character in the result of getExtraNameCharacters.


Thank you very much for your reply.

Yes, I assumed that this was a one-byte character encoding and I didn't
know that \200 < \305\233 < \377.

I was also assuming (maybe wrongly) that JDBC driver convert java utf-8
characters to ASCII when connecting to postgresql backend. This is why I
counted 191 characters.

I also made a test: "CREATE TABLE A\u015B ( A VARCHAR(30) )"

fails when calling java.sql.Statement.execute(). The error is:
ERROR: syntax error at or near "["

Looking at the postgresql log, I found the statement was translated in
"CREATE TABLE A[ ( A VARCHAR(30))"

Similarly, "CREATE TABLE A\u00C0 ( A VARCHAR(30) )" succeeds.

so maybe postgresql doesn't accept a broad range of utf-8 characters,
while it might accept the ASCII characters from \200 to \377 when it is
a Character.isLetterOrDigit().

May it be something dependant on the database encoding?

Thanks again,
Giuseppe


---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:42 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com