vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| PG-Hackers, I got the following picture: detran=# \d sa_dut.tb_usuario Table "sa_dut.tb_usuario" Column | Type | Modifiers -------------------------+-----------------------------+----------- numprocesso | bigint | not null nome | character varying(44) | nomemae | character varying(44) | datanascimento | date | Indexes: "tb_usuario_pkey" PRIMARY KEY, btree (numprocesso) "ix_usuario_11" btree (nome varchar_pattern_ops, nomemae varchar_pattern_ops) "ix_usuario_13" btree (datanascimento, nome varchar_pattern_ops) As I do not use C locale, I created indexes based on "varchar_pattern_ops". The issue I'm having is based on the following queries: select * from TB_USUARIO where nome like 'TATIANA CRISTINA G%'; select * from TB_USUARIO where nome like '%TATIANA CRISTINA G%'; For some reasons, I'm not using text-search engines, like TSearch2, but only the LIKE operator. Here are the query plans involved: detran=# explain analyze select count(*) as x0_0_ from sa_dut.TB_PROCESSO processo0_, sa_dut.TB_USUARIO usuario1_ where (usuario1_.NOME like 'TATIANA CRISTINA G%' and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=11.94..11.95 rows=1 width=0) (actual time=143.970..143.972rows=1 loops=1) -> Nested Loop (cost=0.00..11.94 rows=1 width=0) (actual time= 143.935..143.949 rows=1 loops=1) -> Index Scan using ix_usuario_11 on tb_usuario usuario1_ (cost= 0.00..6.01 rows=1 width=8) (actual time=93.884..93.889 rows=1 loops=1) Index Cond: (((nome)::text ~>=~ 'TATIANA CRISTINA G'::character varying) AND ((nome)::text ~<~ 'TATIANA CRISTINA H'::character varying)) Filter: ((nome)::text ~~ 'TATIANA CRISTINA G%'::text) -> Index Scan using tb_processo_pkey on tb_processo processo0_ (cost=0.00..5.91 rows=1 width=8) (actual time=50.041..50.044 rows=1 loops=1) Index Cond: (processo0_.numprocesso = "outer".numprocesso) Total runtime: 144.176 ms detran=# explain analyze select count(*) as x0_0_ from sa_dut.TB_PROCESSO processo0_, sa_dut.TB_USUARIO usuario1_ where (usuario1_.NOME like '%TATIANA CRISTINA G%' and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=67534.55..67534.56 rows=1 width=0) (actual time= 8101.957..8101.959 rows=1 loops=1) -> Nested Loop (cost=0.00..67534.55 rows=1 width=0) (actual time= 5404.106..8101.923 rows=1 loops=1) -> Seq Scan on tb_usuario usuario1_ (cost=0.00..67528.62 rows=1 width=8) (actual time=5404.056..8101.862 rows=1 loops=1) Filter: ((nome)::text ~~ '%TATIANA CRISTINA G%'::text) -> Index Scan using tb_processo_pkey on tb_processo processo0_ (cost=0.00..5.91 rows=1 width=8) (actual time=0.034..0.037rows=1 loops=1) Index Cond: (processo0_.numprocesso = "outer".numprocesso) Total runtime: 8102.105 ms We use Java, and recently we made an effort in order to avoid the leading '%' on LIKE expressions. The problem is that it wasn't solved, and then I made the following Java code to verify it. What happens is that only the "004" block uses the index! The "002" code, which also has no leading percent, does a sequential scan. The difference between them is that "002" uses bind parameters. Is it concerned to the JDBC Driver or PostgreSQL itself? What could be done in order to fix it? I could use static parameters, but then the queries would have to be reparsed each time on the backend, missing cache advantages. ************************************************** ************************************************** package db; import java.sql.Connection; import java.sql.DriverManager; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; public class SelectLike { public SelectLike() { long qtd = 0L, inicio = 0L, tempo[] = {0,0,0,0}; try { Class.forName("org.postgresql.Driver"); } catch (ClassNotFoundException e) { e.printStackTrace(); } Connection con = null; String dbURL = "jdbc try { con = DriverManager.getConnection(dbURL, "user", "password"); String sql = "select count(*) as x0_0_ from sa_dut.TB_PROCESSO processo0_, sa_dut.TB_USUARIO usuario1_ where (usuario1_.NOME like ? and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO)"; String nome = "TATIANA CRISTINA G"; PreparedStatement ps = null; ResultSet rs = null; //001 - '%NAME%' binded if (ps != null) ps.close(); ps = con.prepareStatement(sql); ps.setString(1, "%" + nome + "%"); inicio = System.currentTimeMillis(); rs = ps.executeQuery(); rs.next(); qtd = rs.getLong(1); rs.close(); tempo[0] = System.currentTimeMillis() - inicio; //002 - 'NAME%' binded if (ps != null) ps.close(); ps = con.prepareStatement(sql); ps.setString(1, nome + "%"); inicio = System.currentTimeMillis(); rs = ps.executeQuery(); rs.next(); qtd = rs.getLong(1); rs.close(); tempo[1] = System.currentTimeMillis() - inicio; //003 - '%NAME%' static if (ps != null) ps.close(); String sql1 = sql.replaceFirst("\\?", "'%" + nome + "%'"); ps = con.prepareStatement(sql1); inicio = System.currentTimeMillis(); rs = ps.executeQuery(); rs.next(); qtd = rs.getLong(1); rs.close(); tempo[2] = System.currentTimeMillis() - inicio; //004 - 'NAME%' static if (ps != null) ps.close(); String sql2 = sql.replaceFirst("\\?", "'" + nome + "%'"); ps = con.prepareStatement(sql2); inicio = System.currentTimeMillis(); rs = ps.executeQuery(); rs.next(); qtd = rs.getLong(1); rs.close(); tempo[3] = System.currentTimeMillis() - inicio; ps.close(); con.close(); } catch (SQLException e) { e.printStackTrace(); } System.out.println("QTD: " + qtd + "\n\n"); for (int ii = 0; ii < tempo.length; ii++) System.out.println(ii + ": " + tempo[ii]); } public static void main(String[] args) { new SelectLike(); } } ************************************************** ************************************************** -- Regards, Rodrigo Hjort http://icewall.org/~hjort |
| |||
| "Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes: > What happens is that only the "004" block uses the index! The "002" code, > which also has no leading percent, does a sequential scan. The difference > between them is that "002" uses bind parameters. Yeah. The LIKE index optimization depends on seeing a constant LIKE pattern at plan time --- otherwise the planner doesn't know what indexscan parameters to generate. So a bound-parameter query loses. Ideas for improving this situation are welcome ... it's not an easy problem ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| "Tom Lane" <tgl@sss.pgh.pa.us> wrote > > Yeah. The LIKE index optimization depends on seeing a constant LIKE > pattern at plan time --- otherwise the planner doesn't know what > indexscan parameters to generate. So a bound-parameter query loses. > AFAICS the problem is not restricted to LIKE, we can easily find a lot of similar problems caused by the actual parameters. For example, SeqScan vs. IndexScan vs. BitmapIndexScan for a range query. So an improvement is definitely needed. > Ideas for improving this situation are welcome ... it's not an easy > problem ... > IMHO basically we have two ways to get better plan: one is to have a set of alternative plans for prepare queries. This will add some cost but PREPARE is supposed to do only once against a lot of EXECUTE. But still, the biggest problem is that number of plans is not controllable. Another way is to generate a plan on the fly. What we do is to let some REPLAN nodes sit on top of some critical plan node: at the execution, we will compare the actual numbers we get and the estimated number we have (mabye "rows"?), once we find that a re-plan efforts might be deserved, we will get a new plan on the fly. In this way, I think a not-too-big patch will do. I remember there is a paper talking about this somewhere but not remember clearly. -- This method can handle the range query problem above, but not for LIKE. So we may have to kludge some code to handle LIKE especially :-(. Regards, Qingqing |
| |||
| I'm not used to the PG Internals. But let me see if I understood that. The LIKE operator, when applied on a static string and it is not preceded by '%', causes the planner to search for some indexes in the table in order to make a index scan. Otherwise, i.e. using leading '%' on static text or bound paremeter, makes the planner always do a sequential scan. Is that the scenario? -- Rodrigo Hjort http://icewall.org/~hjort 2006/5/23, Tom Lane <tgl@sss.pgh.pa.us>: > > "Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes: > > What happens is that only the "004" block uses the index! The "002" > code, > > which also has no leading percent, does a sequential scan. The > difference > > between them is that "002" uses bind parameters. > > Yeah. The LIKE index optimization depends on seeing a constant LIKE > pattern at plan time --- otherwise the planner doesn't know what > indexscan parameters to generate. So a bound-parameter query loses. > > Ideas for improving this situation are welcome ... it's not an easy > problem ... > > regards, tom lane > |
| |||
| On Thu, May 25, 2006 at 02:18:10PM -0300, Rodrigo Hjort wrote: > make a index scan. Otherwise, i.e. using leading '%' on static text or bound > paremeter, makes the planner always do a sequential scan. Is that the > scenario? I think more exactly, the planner can't possibly know how to plan an indexscan with a leading '%', because it has nowhere to start. Think of it this way: if you go to the public library, and say, "I want a book. I can't remember its name exactly, but it starts with 'daytime'," you can find it by going to the title index and browsing for things that start that way. If you go to the public library, and say, "There's this book I want, but I can't remember the title. It's red," you're going to have a lot of books to look through. Maybe all of them. If it were important enough -- say you left a $10,000 cheque inside -- you might just start looking. Maybe you'll get lucky, and hit it. A -- Andrew Sullivan | ajs@crankycanuck.ca I remember when computers were frustrating because they *did* exactly what you told them to. That actually seems sort of quaint now. --J.D. Baldwin ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| These are two confusing issues. One is the use of a leading percent sign. What Tom pointed out was with a bound parameter the planner can't make any assumptions about indexes. Leading percent signs can be made to use indexes by creating a functional index on the column which reverses the order of the column, then using the same function in the select Dave On 25-May-06, at 1:46 PM, Andrew Sullivan wrote: > On Thu, May 25, 2006 at 02:18:10PM -0300, Rodrigo Hjort wrote: >> make a index scan. Otherwise, i.e. using leading '%' on static >> text or bound >> paremeter, makes the planner always do a sequential scan. Is that the >> scenario? > > I think more exactly, the planner can't possibly know how to plan an > indexscan with a leading '%', because it has nowhere to start. > > Think of it this way: if you go to the public library, and say, "I > want a book. I can't remember its name exactly, but it starts with > 'daytime'," you can find it by going to the title index and browsing > for things that start that way. If you go to the public library, and > say, "There's this book I want, but I can't remember the title. It's > red," you're going to have a lot of books to look through. Maybe all > of them. > > If it were important enough -- say you left a $10,000 cheque inside > -- you might just start looking. Maybe you'll get lucky, and hit > it. > > A > > -- > Andrew Sullivan | ajs@crankycanuck.ca > I remember when computers were frustrating because they *did* > exactly what > you told them to. That actually seems sort of quaint now. > --J.D. Baldwin > > ---------------------------(end of > broadcast)--------------------------- > TIP 6: explain analyze is your friend > ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| > > I think more exactly, the planner can't possibly know how to plan an > indexscan with a leading '%', because it has nowhere to start. > The fact is that index scan is performed on LIKE expression on a string not preceded by '%', except when bound parameter is used. select * from table where field like 'THE NAME%'; -- index scan select * from table where field like '%THE NAME%'; -- seq scan select * from table where field like :bind_param; -- seq scan (always) Regards, Rodrigo Hjort http://icewall.org/~hjort |
| |||
| "Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes: > > I think more exactly, the planner can't possibly know how to plan an > > indexscan with a leading '%', because it has nowhere to start. > > The fact is that index scan is performed on LIKE expression on a string not > preceded by '%', except when bound parameter is used. > > select * from table where field like 'THE NAME%'; -- index scan > select * from table where field like '%THE NAME%'; -- seq scan > select * from table where field like :bind_param; -- seq scan (always) Just for reference I found that both Oracle and MSSQL (back when last I used it, many years ago) did use an index scan for the following case: select * from table where field like :bind_param || '%' At the time this seemed perfectly logical but now that I have more experience it seems hard to justify. There's no principled reason to think this is any more likely than a plain :bind_param to be an indexable scan. However in practice this worked great. I rarely if ever put % characters into the bind parameter and the index scan was exactly what I, as a user, expected. Even if there's resistance to having this form be treated as indexable there is certainly a use case for something like this. If not this then something like WHERE escape(:bind_param)||'%' but that would be pretty hard to recognize, certainly much harder than a simple :bind_param || '%'. -- greg ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Thu, May 25, 2006 at 08:41:17PM -0300, Rodrigo Hjort wrote: > > > >I think more exactly, the planner can't possibly know how to plan an > >indexscan with a leading '%', because it has nowhere to start. > > > > The fact is that index scan is performed on LIKE expression on a string not > preceded by '%', except when bound parameter is used. > > select * from table where field like 'THE NAME%'; -- index scan > select * from table where field like '%THE NAME%'; -- seq scan > select * from table where field like :bind_param; -- seq scan (always) Since I'm somewhat doubtful of coming up with a generic means for dealing with plan changes based on different bound parameter values any time soon... How difficult would it be to make LIKE check the value of the bound parameter for a starting % and use that information to decide on a query plan? IMHO this is worth making into a special case in the planner, because it's very easy to detect and makes a tremendous difference in the query plan/performance. Also, might a bitmap scan be a win for the %string case? Presumably it's much faster to find matching rows via an index and then go back into the heap for them; unless you're matching a heck of a lot of rows. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| ||||
| On Fri, May 26, 2006 at 11:38:41AM -0500, Jim C. Nasby wrote: > > select * from table where field like 'THE NAME%'; -- index scan > > select * from table where field like '%THE NAME%'; -- seq scan > > select * from table where field like :bind_param; -- seq scan (always) > > How difficult would it be to make LIKE check the value of the bound > parameter for a starting % and use that information to decide on a query > plan? IMHO this is worth making into a special case in the planner, > because it's very easy to detect and makes a tremendous difference in > the query plan/performance. Planning doesn't work that way. Like is just a function invokation, the planner doesn't ask or tell it where it is in the plan. And it's not the function that determines how the query is planned. > Also, might a bitmap scan be a win for the %string case? Presumably it's > much faster to find matching rows via an index and then go back into the > heap for them; unless you're matching a heck of a lot of rows. This is an interesting thought. Currently, AFAICS, the bitmap-scan code only considers operators that are indexable, just like for narmal index scans. However, in this case the query could scan the entire index, apply the LIKE to each one and produce a bitmap of possible matches. Then do a bitmap scan over the table to check the results. Not just LIKE could use this, but any function marked STABLE. You'd have to weigh up the cost of scanning the *entire* index (because we don't have any actual restriction clauses) against avoiding a full table scan. Actually, if you're going to scan the whole index, maybe you can use the recent changes that allow VACUUM to scan the index sequentially, rather than by index order. Surely a sequential disk scan over the index to create the bitmap would a big win. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFEdzhbIB7bNG8LQkwRArFdAJ0WBpJdqUkyx3UKtaBykw RakcQLTQCfSOJL pc/BCROHdQnjyB/D2LeNI6M= =ZhSh -----END PGP SIGNATURE----- |
| Thread Tools | |
| Display Modes | |
|
|