Andrew Dunstan wrote:
> Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>> I've just been looking at the state machine in wparser_def.c. I
>>> think the processing for entities is also a few bob short in the
>>> pound. It recognises decimal numeric character references, but not
>>> hexadecimal numeric character references. That's fairly silly since
>>> the HTML spec specifically says the latter are "particularly
>>> useful". The rules for named entities are also deficient w.r.t.
>>> digits, just like the case of tags that Tom noticed. This isn't
>>> academic: HTML features a number of named entities with digits in
>>> the name (sup2, frac14 for example).
>>>
>>> In XML at least, legal names are defined by the following rules from
>>> the spec:
>>> ...
>>> [A-Za-z:_][A-Za-z0-9:_.-]*
>>>
>>> I suggest we use that or something very close to it as the rule for
>>> names in these patterns.
>>
>> No objections here. Who wants to patch wparser_def?
>
> I can get to it some time in the next week. - rather snowed under
> right now.
>
> BTW, I'm also suspicious of the clause that allows <?xml ... it
> appears that it will allow <?xfoo and <?XFOO also, which seems quite
> odd, especially the latter.

Here's a patch that fixes the patterns for numeric entities and tag names, and removes the upper case 'X' case in the special case for an XML prolog. There are still some oddities, but I decided against making heroic efforts to fix them. It's probably less important if the patterns are slightly too liberal (e.g. accepting <a href="qwe<qwe>"> ) than if they don't recognize what they are alleged to recognize.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
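For readers following along, the rules discussed in this post can be sketched as regular expressions. This is only an illustration and not the patch itself: the real parser in wparser_def.c is a hand-written C state machine, and the names XML_NAME, ENTITY and classify_entity below are hypothetical. Accepting both "x" and "X" in the hexadecimal prefix is an assumption of this sketch.

```python
import re

# XML-style name rule quoted in the thread.
XML_NAME = r"[A-Za-z:_][A-Za-z0-9:_.-]*"

# Hypothetical entity pattern covering the three forms discussed:
#   named        &frac14;
#   decimal      &#169;
#   hexadecimal  &#xA9;   (accepting "X" as well is an assumption here)
ENTITY = re.compile(
    rf"&(?:(?P<name>{XML_NAME})|#(?P<dec>[0-9]+)|#[xX](?P<hex>[0-9A-Fa-f]+));"
)


def classify_entity(text):
    """Return 'named', 'decimal', 'hexadecimal', or None if no match."""
    m = ENTITY.fullmatch(text)
    if m is None:
        return None
    if m.group("name") is not None:
        return "named"
    if m.group("dec") is not None:
        return "decimal"
    return "hexadecimal"


if __name__ == "__main__":
    for sample in ["&sup2;", "&frac14;", "&#169;", "&#xA9;", "&2sup;"]:
        print(sample, classify_entity(sample))
```

Because the name rule allows digits after the first character, entities such as sup2 and frac14 are recognized, while a name starting with a digit is still rejected.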

Andrew Dunstan <andrew@dunslane.net> writes:
> Here's a patch that fixes the patterns for numeric entities, tag names,
> and removes the upper case 'X' case in the special case for an XML
> prolog. There are still some oddities, but I decided against making
> heroic efforts to fix them. It's probably less important if the patterns
> are slightly too liberal (e.g. accepting <a href="qwe<qwe>"> ) than if
> they don't recognize what they are alleged to recognize.

I don't approve of the changes to the exposed token type names, but the state machine changes seem sane first-glance.

regards, tom lane

Tom Lane wrote:
> I don't approve of the changes to the exposed token type names, but
> the state machine changes seem sane first-glance.

Well, I think it's just plain wrong to describe as HTML tags and entities things that just aren't. In any case, what I changed was not the name (or alias, to be more precise), but the exposed description. The aliases (tag, entity) would remain the same.

cheers

andrew

Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> I don't approve of the changes to the exposed token type names, but
>> the state machine changes seem sane first-glance.
>
> Well, I think it's just plain wrong to describe as HTML tags and
> entities things that just aren't.

Maybe, but "HTML-type" is an unhelpful description. Isn't there a more general markup standard that subsumes both HTML and XML? (I seem to recall that SGML might be that, but not sure.)

regards, tom lane

Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> Tom Lane wrote:
>>> I don't approve of the changes to the exposed token type names, but
>>> the state machine changes seem sane first-glance.
>>
>> Well, I think it's just plain wrong to describe as HTML tags and
>> entities things that just aren't.
>
> Maybe, but "HTML-type" is an unhelpful description. Isn't there a more
> general markup standard that subsumes both HTML and XML? (I seem to
> recall that SGML might be that, but not sure.)

Most people haven't heard of SGML. I'd settle for "XML tag" or maybe "XML/HTML tag". Any other bids?

cheers

andrew

On Monday, 19 November 2007, Tom Lane wrote:
> Maybe, but "HTML-type" is an unhelpful description. Isn't there a more
> general markup standard that subsumes both HTML and XML? (I seem to
> recall that SGML might be that, but not sure.)

I think "XML tag" would actually cover anything that would be valid as an HTML tag. (As opposed to the fact that an XML document is not a superset of an HTML document.) SGML might be too broad. It would require us to recognize "</>" and "<>" and perhaps a few other odd things.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
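To make the point about "</>" and "<>" concrete, here is a minimal sketch of a tag pattern built on the XML name rule quoted earlier in the thread. It is not how wparser_def.c does it; TAG and looks_like_tag are hypothetical names, and the pattern is deliberately liberal about what follows the tag name.

```python
import re

# XML-style name rule quoted earlier in the thread.
XML_NAME = r"[A-Za-z:_][A-Za-z0-9:_.-]*"

# Hypothetical tag pattern: "<" or "</", a name, then anything up to the
# first ">". Attribute syntax is not validated at all.
TAG = re.compile(rf"</?{XML_NAME}[^>]*>")


def looks_like_tag(text):
    """True if the whole string matches the sketch tag pattern."""
    return TAG.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ['<a href="x">', "</p>", "<h1>", "<br/>", "<>", "</>"]:
        print(repr(sample), looks_like_tag(sample))
```

Since a name needs at least one character, the SGML empty tags "<>" and "</>" fall outside this rule, while ordinary HTML tags such as h1 are covered.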

Peter Eisentraut <peter_e@gmx.net> writes:
> On Monday, 19 November 2007, Tom Lane wrote:
>> Maybe, but "HTML-type" is an unhelpful description. Isn't there a more
>> general markup standard that subsumes both HTML and XML? (I seem to
>> recall that SGML might be that, but not sure.)
>
> I think "XML tag" would actually cover anything that would be valid as an HTML
> tag.

+1 for "XML tag", then.

regards, tom lane

Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> On Monday, 19 November 2007, Tom Lane wrote:
>>> Maybe, but "HTML-type" is an unhelpful description. Isn't there a more
>>> general markup standard that subsumes both HTML and XML? (I seem to
>>> recall that SGML might be that, but not sure.)
>>
>> I think "XML tag" would actually cover anything that would be valid as an HTML
>> tag.
>
> +1 for "XML tag", then.

Changed to XML tag and XML entity. Code names adjusted accordingly. Committed.

cheers

andrew
|
|