Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 02:34 AM
Rodrigo Hjort
 
Posts: n/a
Default LIKE, leading percent, bind parameters and indexes

PG-Hackers,

I got the following picture:

detran=# \d sa_dut.tb_usuario
Table "sa_dut.tb_usuario"
Column | Type | Modifiers
-------------------------+-----------------------------+-----------
numprocesso | bigint | not null
nome | character varying(44) |
nomemae | character varying(44) |
datanascimento | date |
Indexes:
"tb_usuario_pkey" PRIMARY KEY, btree (numprocesso)
"ix_usuario_11" btree (nome varchar_pattern_ops, nomemae
varchar_pattern_ops)
"ix_usuario_13" btree (datanascimento, nome varchar_pattern_ops)

As I do not use C locale, I created indexes based on "varchar_pattern_ops".
The issue I'm having is based on the following queries:

select * from TB_USUARIO where nome like 'TATIANA CRISTINA G%';
select * from TB_USUARIO where nome like '%TATIANA CRISTINA G%';

For some reasons, I'm not using text-search engines, like TSearch2, but only
the LIKE operator.
Here are the query plans involved:


detran=# explain analyze select count(*) as x0_0_ from sa_dut.TB_PROCESSO
processo0_, sa_dut.TB_USUARIO usuario1_ where (usuario1_.NOME like 'TATIANA
CRISTINA G%' and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO);

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11.94..11.95 rows=1 width=0) (actual
time=143.970..143.972rows=1 loops=1)
-> Nested Loop (cost=0.00..11.94 rows=1 width=0) (actual time=
143.935..143.949 rows=1 loops=1)
-> Index Scan using ix_usuario_11 on tb_usuario usuario1_ (cost=
0.00..6.01 rows=1 width=8) (actual time=93.884..93.889 rows=1 loops=1)
Index Cond: (((nome)::text ~>=~ 'TATIANA CRISTINA
G'::character varying) AND ((nome)::text ~<~ 'TATIANA CRISTINA H'::character
varying))
Filter: ((nome)::text ~~ 'TATIANA CRISTINA G%'::text)
-> Index Scan using tb_processo_pkey on tb_processo processo0_
(cost=0.00..5.91 rows=1 width=8) (actual time=50.041..50.044 rows=1 loops=1)
Index Cond: (processo0_.numprocesso = "outer".numprocesso)
Total runtime: 144.176 ms

detran=# explain analyze select count(*) as x0_0_ from sa_dut.TB_PROCESSO
processo0_, sa_dut.TB_USUARIO usuario1_ where
(usuario1_.NOME like '%TATIANA CRISTINA G%' and
processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO);

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=67534.55..67534.56 rows=1 width=0) (actual time=
8101.957..8101.959 rows=1 loops=1)
-> Nested Loop (cost=0.00..67534.55 rows=1 width=0) (actual time=
5404.106..8101.923 rows=1 loops=1)
-> Seq Scan on tb_usuario usuario1_ (cost=0.00..67528.62 rows=1
width=8) (actual time=5404.056..8101.862 rows=1 loops=1)
Filter: ((nome)::text ~~ '%TATIANA CRISTINA G%'::text)
-> Index Scan using tb_processo_pkey on tb_processo
processo0_ (cost=0.00..5.91 rows=1 width=8) (actual
time=0.034..0.037rows=1 loops=1)
Index Cond: (processo0_.numprocesso = "outer".numprocesso)
Total runtime: 8102.105 ms


We use Java, and recently we made an effort in order to avoid the leading
'%' on LIKE expressions.
The problem is that it wasn't solved, and then I made the following Java
code to verify it.

What happens is that only the "004" block uses the index! The "002" code,
which also has no leading percent, does a sequential scan. The difference
between them is that "002" uses bind parameters.

Is it concerned to the JDBC Driver or PostgreSQL itself? What could be done
in order to fix it?
I could use static parameters, but then the queries would have to be
reparsed each time on the backend, missing cache advantages.

************************************************** **************************************************
package db;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SelectLike {

public SelectLike() {
long qtd = 0L, inicio = 0L, tempo[] = {0,0,0,0};

try {
Class.forName("org.postgresql.Driver");
} catch (ClassNotFoundException e) {
e.printStackTrace();
}

Connection con = null;
String dbURL = "jdbcostgresql://10.15.61.6/database";
try {
con = DriverManager.getConnection(dbURL, "user", "password");

String sql = "select count(*) as x0_0_ from
sa_dut.TB_PROCESSO processo0_, sa_dut.TB_USUARIO usuario1_ where
(usuario1_.NOME like ? and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO)";
String nome = "TATIANA CRISTINA G";

PreparedStatement ps = null;
ResultSet rs = null;

//001 - '%NAME%' binded
if (ps != null) ps.close();
ps = con.prepareStatement(sql);
ps.setString(1, "%" + nome + "%");
inicio = System.currentTimeMillis();
rs = ps.executeQuery();
rs.next();
qtd = rs.getLong(1);
rs.close();
tempo[0] = System.currentTimeMillis() - inicio;

//002 - 'NAME%' binded
if (ps != null) ps.close();
ps = con.prepareStatement(sql);
ps.setString(1, nome + "%");
inicio = System.currentTimeMillis();
rs = ps.executeQuery();
rs.next();
qtd = rs.getLong(1);
rs.close();
tempo[1] = System.currentTimeMillis() - inicio;

//003 - '%NAME%' static
if (ps != null) ps.close();
String sql1 = sql.replaceFirst("\\?", "'%" + nome + "%'");
ps = con.prepareStatement(sql1);
inicio = System.currentTimeMillis();
rs = ps.executeQuery();
rs.next();
qtd = rs.getLong(1);
rs.close();
tempo[2] = System.currentTimeMillis() - inicio;

//004 - 'NAME%' static
if (ps != null) ps.close();
String sql2 = sql.replaceFirst("\\?", "'" + nome + "%'");
ps = con.prepareStatement(sql2);
inicio = System.currentTimeMillis();
rs = ps.executeQuery();
rs.next();
qtd = rs.getLong(1);
rs.close();
tempo[3] = System.currentTimeMillis() - inicio;

ps.close();
con.close();
} catch (SQLException e) {
e.printStackTrace();
}

System.out.println("QTD: " + qtd + "\n\n");
for (int ii = 0; ii < tempo.length; ii++)
System.out.println(ii + ": " + tempo[ii]);
}

public static void main(String[] args) {
new SelectLike();
}

}
************************************************** **************************************************

--
Regards,

Rodrigo Hjort
http://icewall.org/~hjort

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 02:34 AM
Tom Lane
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes

"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
> What happens is that only the "004" block uses the index! The "002" code,
> which also has no leading percent, does a sequential scan. The difference
> between them is that "002" uses bind parameters.


Yeah. The LIKE index optimization depends on seeing a constant LIKE
pattern at plan time --- otherwise the planner doesn't know what
indexscan parameters to generate. So a bound-parameter query loses.

Ideas for improving this situation are welcome ... it's not an easy
problem ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 02:34 AM
Qingqing Zhou
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes


"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> Yeah. The LIKE index optimization depends on seeing a constant LIKE
> pattern at plan time --- otherwise the planner doesn't know what
> indexscan parameters to generate. So a bound-parameter query loses.
>


AFAICS the problem is not restricted to LIKE, we can easily find a lot of
similar problems caused by the actual parameters. For example, SeqScan vs.
IndexScan vs. BitmapIndexScan for a range query. So an improvement is
definitely needed.

> Ideas for improving this situation are welcome ... it's not an easy
> problem ...
>

IMHO basically we have two ways to get better plan: one is to have a set of
alternative plans for prepare queries. This will add some cost but PREPARE
is supposed to do only once against a lot of EXECUTE. But still, the biggest
problem is that number of plans is not controllable.

Another way is to generate a plan on the fly. What we do is to let some
REPLAN nodes sit on top of some critical plan node: at the execution, we
will compare the actual numbers we get and the estimated number we have
(mabye "rows"?), once we find that a re-plan efforts might be deserved, we
will get a new plan on the fly. In this way, I think a not-too-big patch
will do. I remember there is a paper talking about this somewhere but not
remember clearly. -- This method can handle the range query problem above,
but not for LIKE. So we may have to kludge some code to handle LIKE
especially :-(.

Regards,
Qingqing




Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 02:36 AM
Rodrigo Hjort
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes

I'm not used to the PG Internals. But let me see if I understood that.

The LIKE operator, when applied on a static string and it is not preceded by
'%', causes the planner to search for some indexes in the table in order to
make a index scan. Otherwise, i.e. using leading '%' on static text or bound
paremeter, makes the planner always do a sequential scan. Is that the
scenario?

--
Rodrigo Hjort
http://icewall.org/~hjort


2006/5/23, Tom Lane <tgl@sss.pgh.pa.us>:
>
> "Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
> > What happens is that only the "004" block uses the index! The "002"

> code,
> > which also has no leading percent, does a sequential scan. The

> difference
> > between them is that "002" uses bind parameters.

>
> Yeah. The LIKE index optimization depends on seeing a constant LIKE
> pattern at plan time --- otherwise the planner doesn't know what
> indexscan parameters to generate. So a bound-parameter query loses.
>
> Ideas for improving this situation are welcome ... it's not an easy
> problem ...
>
> regards, tom lane
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 02:36 AM
Andrew Sullivan
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes

On Thu, May 25, 2006 at 02:18:10PM -0300, Rodrigo Hjort wrote:
> make a index scan. Otherwise, i.e. using leading '%' on static text or bound
> paremeter, makes the planner always do a sequential scan. Is that the
> scenario?


I think more exactly, the planner can't possibly know how to plan an
indexscan with a leading '%', because it has nowhere to start.

Think of it this way: if you go to the public library, and say, "I
want a book. I can't remember its name exactly, but it starts with
'daytime'," you can find it by going to the title index and browsing
for things that start that way. If you go to the public library, and
say, "There's this book I want, but I can't remember the title. It's
red," you're going to have a lot of books to look through. Maybe all
of them.

If it were important enough -- say you left a $10,000 cheque inside
-- you might just start looking. Maybe you'll get lucky, and hit it.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what
you told them to. That actually seems sort of quaint now.
--J.D. Baldwin

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 02:36 AM
Dave Cramer
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes

These are two confusing issues.

One is the use of a leading percent sign.

What Tom pointed out was with a bound parameter the planner can't
make any assumptions about indexes.

Leading percent signs can be made to use indexes by creating a
functional index on the column which reverses the order of the
column, then using the same function in the select

Dave
On 25-May-06, at 1:46 PM, Andrew Sullivan wrote:

> On Thu, May 25, 2006 at 02:18:10PM -0300, Rodrigo Hjort wrote:
>> make a index scan. Otherwise, i.e. using leading '%' on static
>> text or bound
>> paremeter, makes the planner always do a sequential scan. Is that the
>> scenario?

>
> I think more exactly, the planner can't possibly know how to plan an
> indexscan with a leading '%', because it has nowhere to start.
>
> Think of it this way: if you go to the public library, and say, "I
> want a book. I can't remember its name exactly, but it starts with
> 'daytime'," you can find it by going to the title index and browsing
> for things that start that way. If you go to the public library, and
> say, "There's this book I want, but I can't remember the title. It's
> red," you're going to have a lot of books to look through. Maybe all
> of them.
>
> If it were important enough -- say you left a $10,000 cheque inside
> -- you might just start looking. Maybe you'll get lucky, and hit
> it.
>
> A
>
> --
> Andrew Sullivan | ajs@crankycanuck.ca
> I remember when computers were frustrating because they *did*
> exactly what
> you told them to. That actually seems sort of quaint now.
> --J.D. Baldwin
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
>



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 02:37 AM
Rodrigo Hjort
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes

>
> I think more exactly, the planner can't possibly know how to plan an
> indexscan with a leading '%', because it has nowhere to start.
>


The fact is that index scan is performed on LIKE expression on a string not
preceded by '%', except when bound parameter is used.

select * from table where field like 'THE NAME%'; -- index scan
select * from table where field like '%THE NAME%'; -- seq scan
select * from table where field like :bind_param; -- seq scan (always)

Regards,

Rodrigo Hjort
http://icewall.org/~hjort

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 02:37 AM
Greg Stark
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes


"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:

> > I think more exactly, the planner can't possibly know how to plan an
> > indexscan with a leading '%', because it has nowhere to start.

>
> The fact is that index scan is performed on LIKE expression on a string not
> preceded by '%', except when bound parameter is used.
>
> select * from table where field like 'THE NAME%'; -- index scan
> select * from table where field like '%THE NAME%'; -- seq scan
> select * from table where field like :bind_param; -- seq scan (always)


Just for reference I found that both Oracle and MSSQL (back when last I used
it, many years ago) did use an index scan for the following case:

select * from table where field like :bind_param || '%'

At the time this seemed perfectly logical but now that I have more experience
it seems hard to justify. There's no principled reason to think this is any
more likely than a plain :bind_param to be an indexable scan.

However in practice this worked great. I rarely if ever put % characters into
the bind parameter and the index scan was exactly what I, as a user, expected.

Even if there's resistance to having this form be treated as indexable there
is certainly a use case for something like this. If not this then something
like

WHERE escape(:bind_param)||'%'

but that would be pretty hard to recognize, certainly much harder than a
simple :bind_param || '%'.


--
greg


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 02:37 AM
Jim C. Nasby
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes

On Thu, May 25, 2006 at 08:41:17PM -0300, Rodrigo Hjort wrote:
> >
> >I think more exactly, the planner can't possibly know how to plan an
> >indexscan with a leading '%', because it has nowhere to start.
> >

>
> The fact is that index scan is performed on LIKE expression on a string not
> preceded by '%', except when bound parameter is used.
>
> select * from table where field like 'THE NAME%'; -- index scan
> select * from table where field like '%THE NAME%'; -- seq scan
> select * from table where field like :bind_param; -- seq scan (always)


Since I'm somewhat doubtful of coming up with a generic means for
dealing with plan changes based on different bound parameter values any
time soon...

How difficult would it be to make LIKE check the value of the bound
parameter for a starting % and use that information to decide on a query
plan? IMHO this is worth making into a special case in the planner,
because it's very easy to detect and makes a tremendous difference in
the query plan/performance.

Also, might a bitmap scan be a win for the %string case? Presumably it's
much faster to find matching rows via an index and then go back into the
heap for them; unless you're matching a heck of a lot of rows.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 02:37 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: LIKE, leading percent, bind parameters and indexes

On Fri, May 26, 2006 at 11:38:41AM -0500, Jim C. Nasby wrote:
> > select * from table where field like 'THE NAME%'; -- index scan
> > select * from table where field like '%THE NAME%'; -- seq scan
> > select * from table where field like :bind_param; -- seq scan (always)

>
> How difficult would it be to make LIKE check the value of the bound
> parameter for a starting % and use that information to decide on a query
> plan? IMHO this is worth making into a special case in the planner,
> because it's very easy to detect and makes a tremendous difference in
> the query plan/performance.


Planning doesn't work that way. Like is just a function invokation, the
planner doesn't ask or tell it where it is in the plan. And it's not
the function that determines how the query is planned.

> Also, might a bitmap scan be a win for the %string case? Presumably it's
> much faster to find matching rows via an index and then go back into the
> heap for them; unless you're matching a heck of a lot of rows.


This is an interesting thought. Currently, AFAICS, the bitmap-scan code
only considers operators that are indexable, just like for narmal index
scans. However, in this case the query could scan the entire index,
apply the LIKE to each one and produce a bitmap of possible matches.
Then do a bitmap scan over the table to check the results.

Not just LIKE could use this, but any function marked STABLE. You'd
have to weigh up the cost of scanning the *entire* index (because we
don't have any actual restriction clauses) against avoiding a full
table scan.

Actually, if you're going to scan the whole index, maybe you can use
the recent changes that allow VACUUM to scan the index sequentially,
rather than by index order. Surely a sequential disk scan over the
index to create the bitmap would a big win.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEdzhbIB7bNG8LQkwRArFdAJ0WBpJdqUkyx3UKtaBykw RakcQLTQCfSOJL
pc/BCROHdQnjyB/D2LeNI6M=
=ZhSh
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:32 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451