This is a discussion on Proposed patch to make mergejoin cost estimation more symmetric within the Pgsql Patches forums, part of the PostgreSQL category; --> There was some discussion earlier today http://archives.postgresql.org/pgsql...2/msg00081.php about how mergejoin cost estimation is not smart about the situation where ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| There was some discussion earlier today http://archives.postgresql.org/pgsql...2/msg00081.php about how mergejoin cost estimation is not smart about the situation where we will have to scan a lot of rows on one side of the join before reaching the range of values found on the other side (and hence having some chance of finding join pairs). Ideally, that effect should be factored into the startup cost estimate for the join. This consideration is the exact dual of one that we already do have code for, namely that the merge can stop early if the range of values on one side ends long before that of the other. So I looked into whether that code couldn't be extended to handle this issue too. It turns out that it can be, and it actually becomes more symmetrical because it's considering both max and min values not just max. Also, I found that there were a couple of new-in-8.3 bugs in that area --- it's been extended to make estimates for descending-order mergejoins, but it wasn't getting that quite right. Since something needs to be done to that code anyway, I'm considering applying the attached patch. It's a bit larger than I would normally consider to be a good idea for an optional patch in late beta. But then again I wouldn't have the slightest hesitation about back-patching a change of this size after final release, so it seems attractive to put it in now. Aside from the patch, I have attached a test script that exercises merge join planning for some simple cases, and the plans output by the script in CVS HEAD with/without the patch. The cost estimates with the patch are in line with expectation, the estimates without, not so much. In particular, the existing bug can be seen at work here in that the sixth and eighth test cases ("big join highm on b=h order by b desc" and "big join high on b=h order by b desc") are given unreasonably small cost estimates by the unpatched code. (Note: the two sets of numbers vary a bit because they were working with nonidentical ANALYZE statistics.) Any objections to applying the patch? regards, tom lane Index: src/backend/optimizer/path/costsize.c ================================================== ================= RCS file: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v retrieving revision 1.189 diff -c -r1.189 costsize.c *** src/backend/optimizer/path/costsize.c 15 Nov 2007 22:25:15 -0000 1.189 --- src/backend/optimizer/path/costsize.c 6 Dec 2007 23:46:32 -0000 *************** *** 1372,1383 **** double outer_path_rows = PATH_ROWS(outer_path); double inner_path_rows = PATH_ROWS(inner_path); double outer_rows, ! inner_rows; double mergejointuples, rescannedtuples; double rescanratio; ! Selectivity outerscansel, ! innerscansel; Selectivity joininfactor; Path sort_path; /* dummy for result of cost_sort */ --- 1372,1387 ---- double outer_path_rows = PATH_ROWS(outer_path); double inner_path_rows = PATH_ROWS(inner_path); double outer_rows, ! inner_rows, ! outer_skip_rows, ! inner_skip_rows; double mergejointuples, rescannedtuples; double rescanratio; ! Selectivity outerstartsel, ! outerendsel, ! innerstartsel, ! innerendsel; Selectivity joininfactor; Path sort_path; /* dummy for result of cost_sort */ *************** *** 1444,1453 **** * A merge join will stop as soon as it exhausts either input stream * (unless it's an outer join, in which case the outer side has to be * scanned all the way anyway). Estimate fraction of the left and right ! * inputs that will actually need to be scanned. We use only the first ! * (most significant) merge clause for this purpose. Since ! * mergejoinscansel() is a fairly expensive computation, we cache the ! * results in the merge clause RestrictInfo. */ if (mergeclauses && path->jpath.jointype != JOIN_FULL) { --- 1448,1459 ---- * A merge join will stop as soon as it exhausts either input stream * (unless it's an outer join, in which case the outer side has to be * scanned all the way anyway). Estimate fraction of the left and right ! * inputs that will actually need to be scanned. Likewise, we can ! * estimate the number of rows that will be skipped before the first ! * join pair is found, which should be factored into startup cost. ! * We use only the first (most significant) merge clause for this purpose. ! * Since mergejoinscansel() is a fairly expensive computation, we cache ! * the results in the merge clause RestrictInfo. */ if (mergeclauses && path->jpath.jointype != JOIN_FULL) { *************** *** 1478,1514 **** outer_path->parent->relids)) { /* left side of clause is outer */ ! outerscansel = cache->leftscansel; ! innerscansel = cache->rightscansel; } else { /* left side of clause is inner */ ! outerscansel = cache->rightscansel; ! innerscansel = cache->leftscansel; } if (path->jpath.jointype == JOIN_LEFT) ! outerscansel = 1.0; else if (path->jpath.jointype == JOIN_RIGHT) ! innerscansel = 1.0; } else { /* cope with clauseless or full mergejoin */ ! outerscansel = innerscansel = 1.0; } ! /* convert selectivity to row count; must scan at least one row */ ! outer_rows = clamp_row_est(outer_path_rows * outerscansel); ! inner_rows = clamp_row_est(inner_path_rows * innerscansel); /* * Readjust scan selectivities to account for above rounding. This is * normally an insignificant effect, but when there are only a few rows in * the inputs, failing to do this makes for a large percentage error. */ ! outerscansel = outer_rows / outer_path_rows; ! innerscansel = inner_rows / inner_path_rows; /* cost of source data */ --- 1484,1544 ---- outer_path->parent->relids)) { /* left side of clause is outer */ ! outerstartsel = cache->leftstartsel; ! outerendsel = cache->leftendsel; ! innerstartsel = cache->rightstartsel; ! innerendsel = cache->rightendsel; } else { /* left side of clause is inner */ ! outerstartsel = cache->rightstartsel; ! outerendsel = cache->rightendsel; ! innerstartsel = cache->leftstartsel; ! innerendsel = cache->leftendsel; } if (path->jpath.jointype == JOIN_LEFT) ! { ! outerstartsel = 0.0; ! outerendsel = 1.0; ! } else if (path->jpath.jointype == JOIN_RIGHT) ! { ! innerstartsel = 0.0; ! innerendsel = 1.0; ! } } else { /* cope with clauseless or full mergejoin */ ! outerstartsel = innerstartsel = 0.0; ! outerendsel = innerendsel = 1.0; } ! /* ! * Convert selectivities to row counts. We force outer_rows and ! * inner_rows to be at least 1, but the skip_rows estimates can be zero. ! */ ! outer_skip_rows = rint(outer_path_rows * outerstartsel); ! inner_skip_rows = rint(inner_path_rows * innerstartsel); ! outer_rows = clamp_row_est(outer_path_rows * outerendsel); ! inner_rows = clamp_row_est(inner_path_rows * innerendsel); ! ! Assert(outer_skip_rows <= outer_rows); ! Assert(inner_skip_rows <= inner_rows); /* * Readjust scan selectivities to account for above rounding. This is * normally an insignificant effect, but when there are only a few rows in * the inputs, failing to do this makes for a large percentage error. */ ! outerstartsel = outer_skip_rows / outer_path_rows; ! innerstartsel = inner_skip_rows / inner_path_rows; ! outerendsel = outer_rows / outer_path_rows; ! innerendsel = inner_rows / inner_path_rows; ! ! Assert(outerstartsel <= outerendsel); ! Assert(innerstartsel <= innerendsel); /* cost of source data */ *************** *** 1522,1535 **** outer_path->parent->width, -1.0); startup_cost += sort_path.startup_cost; run_cost += (sort_path.total_cost - sort_path.startup_cost) ! * outerscansel; } else { startup_cost += outer_path->startup_cost; run_cost += (outer_path->total_cost - outer_path->startup_cost) ! * outerscansel; } if (innersortkeys) /* do we need to sort inner? */ --- 1552,1569 ---- outer_path->parent->width, -1.0); startup_cost += sort_path.startup_cost; + startup_cost += (sort_path.total_cost - sort_path.startup_cost) + * outerstartsel; run_cost += (sort_path.total_cost - sort_path.startup_cost) ! * (outerendsel - outerstartsel); } else { startup_cost += outer_path->startup_cost; + startup_cost += (outer_path->total_cost - outer_path->startup_cost) + * outerstartsel; run_cost += (outer_path->total_cost - outer_path->startup_cost) ! * (outerendsel - outerstartsel); } if (innersortkeys) /* do we need to sort inner? */ *************** *** 1542,1555 **** inner_path->parent->width, -1.0); startup_cost += sort_path.startup_cost; run_cost += (sort_path.total_cost - sort_path.startup_cost) ! * innerscansel * rescanratio; } else { startup_cost += inner_path->startup_cost; run_cost += (inner_path->total_cost - inner_path->startup_cost) ! * innerscansel * rescanratio; } /* CPU costs */ --- 1576,1593 ---- inner_path->parent->width, -1.0); startup_cost += sort_path.startup_cost; + startup_cost += (sort_path.total_cost - sort_path.startup_cost) + * innerstartsel * rescanratio; run_cost += (sort_path.total_cost - sort_path.startup_cost) ! * (innerendsel - innerstartsel) * rescanratio; } else { startup_cost += inner_path->startup_cost; + startup_cost += (inner_path->total_cost - inner_path->startup_cost) + * innerstartsel * rescanratio; run_cost += (inner_path->total_cost - inner_path->startup_cost) ! * (innerendsel - innerstartsel) * rescanratio; } /* CPU costs */ *************** *** 1571,1578 **** * joininfactor. */ startup_cost += merge_qual_cost.startup; run_cost += merge_qual_cost.per_tuple * ! (outer_rows + inner_rows * rescanratio); /* * For each tuple that gets through the mergejoin proper, we charge --- 1609,1619 ---- * joininfactor. */ startup_cost += merge_qual_cost.startup; + startup_cost += merge_qual_cost.per_tuple * + (outer_skip_rows + inner_skip_rows * rescanratio); run_cost += merge_qual_cost.per_tuple * ! ((outer_rows - outer_skip_rows) + ! (inner_rows - inner_skip_rows) * rescanratio); /* * For each tuple that gets through the mergejoin proper, we charge *************** *** 1597,1604 **** { MergeScanSelCache *cache; ListCell *lc; ! Selectivity leftscansel, ! rightscansel; MemoryContext oldcontext; /* Do we have this result already? */ --- 1638,1647 ---- { MergeScanSelCache *cache; ListCell *lc; ! Selectivity leftstartsel, ! leftendsel, ! rightstartsel, ! rightendsel; MemoryContext oldcontext; /* Do we have this result already? */ *************** *** 1617,1624 **** pathkey->pk_opfamily, pathkey->pk_strategy, pathkey->pk_nulls_first, ! &leftscansel, ! &rightscansel); /* Cache the result in suitably long-lived workspace */ oldcontext = MemoryContextSwitchTo(root->planner_cxt); --- 1660,1669 ---- pathkey->pk_opfamily, pathkey->pk_strategy, pathkey->pk_nulls_first, ! &leftstartsel, ! &leftendsel, ! &rightstartsel, ! &rightendsel); /* Cache the result in suitably long-lived workspace */ oldcontext = MemoryContextSwitchTo(root->planner_cxt); *************** *** 1627,1634 **** cache->opfamily = pathkey->pk_opfamily; cache->strategy = pathkey->pk_strategy; cache->nulls_first = pathkey->pk_nulls_first; ! cache->leftscansel = leftscansel; ! cache->rightscansel = rightscansel; rinfo->scansel_cache = lappend(rinfo->scansel_cache, cache); --- 1672,1681 ---- cache->opfamily = pathkey->pk_opfamily; cache->strategy = pathkey->pk_strategy; cache->nulls_first = pathkey->pk_nulls_first; ! cache->leftstartsel = leftstartsel; ! cache->leftendsel = leftendsel; ! cache->rightstartsel = rightstartsel; ! cache->rightendsel = rightendsel; rinfo->scansel_cache = lappend(rinfo->scansel_cache, cache); Index: src/backend/utils/adt/selfuncs.c ================================================== ================= RCS file: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v retrieving revision 1.241 diff -c -r1.241 selfuncs.c *** src/backend/utils/adt/selfuncs.c 15 Nov 2007 22:25:16 -0000 1.241 --- src/backend/utils/adt/selfuncs.c 6 Dec 2007 23:46:32 -0000 *************** *** 128,135 **** int rangelo, int rangehi); static char *convert_string_datum(Datum value, Oid typid); static double convert_timevalue_to_scalar(Datum value, Oid typid); ! static bool get_variable_maximum(PlannerInfo *root, VariableStatData *vardata, ! Oid sortop, Datum *max); static Selectivity prefix_selectivity(VariableStatData *vardata, Oid vartype, Oid opfamily, Const *prefixcon); static Selectivity pattern_selectivity(Const *patt, Pattern_Type ptype); --- 128,135 ---- int rangelo, int rangehi); static char *convert_string_datum(Datum value, Oid typid); static double convert_timevalue_to_scalar(Datum value, Oid typid); ! static bool get_variable_range(PlannerInfo *root, VariableStatData *vardata, ! Oid sortop, Datum *min, Datum *max); static Selectivity prefix_selectivity(VariableStatData *vardata, Oid vartype, Oid opfamily, Const *prefixcon); static Selectivity pattern_selectivity(Const *patt, Pattern_Type ptype); *************** *** 2172,2189 **** * we can estimate how much of the input will actually be read. This * can have a considerable impact on the cost when using indexscans. * * clause should be a clause already known to be mergejoinable. opfamily, * strategy, and nulls_first specify the sort ordering being used. * ! * *leftscan is set to the fraction of the left-hand variable expected ! * to be scanned (0 to 1), and similarly *rightscan for the right-hand ! * variable. */ void mergejoinscansel(PlannerInfo *root, Node *clause, Oid opfamily, int strategy, bool nulls_first, ! Selectivity *leftscan, ! Selectivity *rightscan) { Node *left, *right; --- 2172,2195 ---- * we can estimate how much of the input will actually be read. This * can have a considerable impact on the cost when using indexscans. * + * Also, we can estimate how much of each input has to be read before the + * first join pair is found, which will affect the join's startup time. + * * clause should be a clause already known to be mergejoinable. opfamily, * strategy, and nulls_first specify the sort ordering being used. * ! * The outputs are: ! * *leftstart is set to the fraction of the left-hand variable expected ! * to be scanned before the first join pair is found (0 to 1). ! * *leftend is set to the fraction of the left-hand variable expected ! * to be scanned before the join terminates (0 to 1). ! * *rightstart, *rightend similarly for the right-hand variable. */ void mergejoinscansel(PlannerInfo *root, Node *clause, Oid opfamily, int strategy, bool nulls_first, ! Selectivity *leftstart, Selectivity *leftend, ! Selectivity *rightstart, Selectivity *rightend) { Node *left, *right; *************** *** 2196,2209 **** Oid opno, lsortop, rsortop, leop, revleop; ! Datum leftmax, rightmax; double selec; /* Set default results if we can't figure anything out. */ ! *leftscan = *rightscan = 1.0; /* Deconstruct the merge clause */ if (!is_opclause(clause)) --- 2202,2224 ---- Oid opno, lsortop, rsortop, + lstatop, + rstatop, + ltop, leop, + revltop, revleop; ! bool isgt; ! Datum leftmin, ! leftmax, ! rightmin, rightmax; double selec; /* Set default results if we can't figure anything out. */ ! /* XXX should default "start" fraction be a bit more than 0? */ ! *leftstart = *rightstart = 0.0; ! *leftend = *rightend = 1.0; /* Deconstruct the merge clause */ if (!is_opclause(clause)) *************** *** 2229,2258 **** /* * Look up the various operators we need. If we don't find them all, it ! * probably means the opfamily is broken, but we cope anyway. */ switch (strategy) { case BTLessStrategyNumber: ! lsortop = get_opfamily_member(opfamily, op_lefttype, op_lefttype, ! BTLessStrategyNumber); ! rsortop = get_opfamily_member(opfamily, op_righttype, op_righttype, ! BTLessStrategyNumber); ! leop = get_opfamily_member(opfamily, op_lefttype, op_righttype, ! BTLessEqualStrategyNumber); ! revleop = get_opfamily_member(opfamily, op_righttype, op_lefttype, ! BTLessEqualStrategyNumber); break; case BTGreaterStrategyNumber: /* descending-order case */ ! lsortop = get_opfamily_member(opfamily, op_lefttype, op_lefttype, ! BTGreaterStrategyNumber); ! rsortop = get_opfamily_member(opfamily, op_righttype, op_righttype, ! BTGreaterStrategyNumber); ! leop = get_opfamily_member(opfamily, op_lefttype, op_righttype, ! BTGreaterEqualStrategyNumber); ! revleop = get_opfamily_member(opfamily, op_righttype, op_lefttype, ! BTGreaterEqualStrategyNumber); break; default: goto fail; /* shouldn't get here */ --- 2244,2346 ---- /* * Look up the various operators we need. If we don't find them all, it ! * probably means the opfamily is broken, but we just fail silently. ! * ! * Note: we expect that pg_statistic histograms will be sorted by the ! * '<' operator, regardless of which sort direction we are considering. */ switch (strategy) { case BTLessStrategyNumber: ! isgt = false; ! if (op_lefttype == op_righttype) ! { ! /* easy case */ ! ltop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTLessStrategyNumber); ! leop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTLessEqualStrategyNumber); ! lsortop = ltop; ! rsortop = ltop; ! lstatop = lsortop; ! rstatop = rsortop; ! revltop = ltop; ! revleop = leop; ! } ! else ! { ! ltop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTLessStrategyNumber); ! leop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTLessEqualStrategyNumber); ! lsortop = get_opfamily_member(opfamily, ! op_lefttype, op_lefttype, ! BTLessStrategyNumber); ! rsortop = get_opfamily_member(opfamily, ! op_righttype, op_righttype, ! BTLessStrategyNumber); ! lstatop = lsortop; ! rstatop = rsortop; ! revltop = get_opfamily_member(opfamily, ! op_righttype, op_lefttype, ! BTLessStrategyNumber); ! revleop = get_opfamily_member(opfamily, ! op_righttype, op_lefttype, ! BTLessEqualStrategyNumber); ! } break; case BTGreaterStrategyNumber: /* descending-order case */ ! isgt = true; ! if (op_lefttype == op_righttype) ! { ! /* easy case */ ! ltop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTGreaterStrategyNumber); ! leop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTGreaterEqualStrategyNumber); ! lsortop = ltop; ! rsortop = ltop; ! lstatop = get_opfamily_member(opfamily, ! op_lefttype, op_lefttype, ! BTLessStrategyNumber); ! rstatop = lstatop; ! revltop = ltop; ! revleop = leop; ! } ! else ! { ! ltop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTGreaterStrategyNumber); ! leop = get_opfamily_member(opfamily, ! op_lefttype, op_righttype, ! BTGreaterEqualStrategyNumber); ! lsortop = get_opfamily_member(opfamily, ! op_lefttype, op_lefttype, ! BTGreaterStrategyNumber); ! rsortop = get_opfamily_member(opfamily, ! op_righttype, op_righttype, ! BTGreaterStrategyNumber); ! lstatop = get_opfamily_member(opfamily, ! op_lefttype, op_lefttype, ! BTLessStrategyNumber); ! rstatop = get_opfamily_member(opfamily, ! op_righttype, op_righttype, ! BTLessStrategyNumber); ! revltop = get_opfamily_member(opfamily, ! op_righttype, op_lefttype, ! BTGreaterStrategyNumber); ! revleop = get_opfamily_member(opfamily, ! op_righttype, op_lefttype, ! BTGreaterEqualStrategyNumber); ! } break; default: goto fail; /* shouldn't get here */ *************** *** 2260,2325 **** if (!OidIsValid(lsortop) || !OidIsValid(rsortop) || !OidIsValid(leop) || !OidIsValid(revleop)) goto fail; /* insufficient info in catalogs */ ! /* Try to get maximum values of both inputs */ ! if (!get_variable_maximum(root, &leftvar, lsortop, &leftmax)) ! goto fail; /* no max available from stats */ ! ! if (!get_variable_maximum(root, &rightvar, rsortop, &rightmax)) ! goto fail; /* no max available from stats */ /* * Now, the fraction of the left variable that will be scanned is the * fraction that's <= the right-side maximum value. But only believe ! * non-default estimates, else stick with our 1.0. Also, if the sort ! * order is nulls-first, we're going to have to read over any nulls too. */ ! selec = scalarineqsel(root, leop, false, &leftvar, rightmax, op_righttype); if (selec != DEFAULT_INEQ_SEL) ! { ! if (nulls_first && HeapTupleIsValid(leftvar.statsTuple)) ! { ! Form_pg_statistic stats; ! ! stats = (Form_pg_statistic) GETSTRUCT(leftvar.statsTuple); ! selec += stats->stanullfrac; ! CLAMP_PROBABILITY(selec); ! } ! *leftscan = selec; ! } /* And similarly for the right variable. */ ! selec = scalarineqsel(root, revleop, false, &rightvar, leftmax, op_lefttype); if (selec != DEFAULT_INEQ_SEL) { ! if (nulls_first && HeapTupleIsValid(rightvar.statsTuple)) ! { ! Form_pg_statistic stats; stats = (Form_pg_statistic) GETSTRUCT(rightvar.statsTuple); ! selec += stats->stanullfrac; ! CLAMP_PROBABILITY(selec); } - *rightscan = selec; } ! /* ! * Only one of the two fractions can really be less than 1.0; believe the ! * smaller estimate and reset the other one to exactly 1.0. If we get ! * exactly equal estimates (as can easily happen with self-joins), believe ! * neither. ! */ ! if (*leftscan > *rightscan) ! *leftscan = 1.0; ! else if (*leftscan < *rightscan) ! *rightscan = 1.0; ! else ! *leftscan = *rightscan = 1.0; fail: ReleaseVariableStats(leftvar); --- 2348,2480 ---- if (!OidIsValid(lsortop) || !OidIsValid(rsortop) || + !OidIsValid(lstatop) || + !OidIsValid(rstatop) || + !OidIsValid(ltop) || !OidIsValid(leop) || + !OidIsValid(revltop) || !OidIsValid(revleop)) goto fail; /* insufficient info in catalogs */ ! /* Try to get ranges of both inputs */ ! if (!isgt) ! { ! if (!get_variable_range(root, &leftvar, lstatop, ! &leftmin, &leftmax)) ! goto fail; /* no range available from stats */ ! if (!get_variable_range(root, &rightvar, rstatop, ! &rightmin, &rightmax)) ! goto fail; /* no range available from stats */ ! } ! else ! { ! /* need to swap the max and min */ ! if (!get_variable_range(root, &leftvar, lstatop, ! &leftmax, &leftmin)) ! goto fail; /* no range available from stats */ ! if (!get_variable_range(root, &rightvar, rstatop, ! &rightmax, &rightmin)) ! goto fail; /* no range available from stats */ ! } /* * Now, the fraction of the left variable that will be scanned is the * fraction that's <= the right-side maximum value. But only believe ! * non-default estimates, else stick with our 1.0. */ ! selec = scalarineqsel(root, leop, isgt, &leftvar, rightmax, op_righttype); if (selec != DEFAULT_INEQ_SEL) ! *leftend = selec; /* And similarly for the right variable. */ ! selec = scalarineqsel(root, revleop, isgt, &rightvar, leftmax, op_lefttype); if (selec != DEFAULT_INEQ_SEL) + *rightend = selec; + + /* + * Only one of the two "end" fractions can really be less than 1.0; + * believe the smaller estimate and reset the other one to exactly 1.0. + * If we get exactly equal estimates (as can easily happen with + * self-joins), believe neither. + */ + if (*leftend > *rightend) + *leftend = 1.0; + else if (*leftend < *rightend) + *rightend = 1.0; + else + *leftend = *rightend = 1.0; + + /* + * Also, the fraction of the left variable that will be scanned before + * the first join pair is found is the fraction that's < the right-side + * minimum value. But only believe non-default estimates, else stick with + * our own default. + */ + selec = scalarineqsel(root, ltop, isgt, &leftvar, + rightmin, op_righttype); + if (selec != DEFAULT_INEQ_SEL) + *leftstart = selec; + + /* And similarly for the right variable. */ + selec = scalarineqsel(root, revltop, isgt, &rightvar, + leftmin, op_lefttype); + if (selec != DEFAULT_INEQ_SEL) + *rightstart = selec; + + /* + * Only one of the two "start" fractions can really be more than zero; + * believe the larger estimate and reset the other one to exactly 0.0. + * If we get exactly equal estimates (as can easily happen with + * self-joins), believe neither. + */ + if (*leftstart < *rightstart) + *leftstart = 0.0; + else if (*leftstart > *rightstart) + *rightstart = 0.0; + else + *leftstart = *rightstart = 0.0; + + /* + * If the sort order is nulls-first, we're going to have to skip over any + * nulls too. These would not have been counted by scalarineqsel, and + * we can safely add in this fraction regardless of whether we believe + * scalarineqsel's results or not. But be sure to clamp the sum to 1.0! + */ + if (nulls_first) { ! Form_pg_statistic stats; + if (HeapTupleIsValid(leftvar.statsTuple)) + { + stats = (Form_pg_statistic) GETSTRUCT(leftvar.statsTuple); + *leftstart += stats->stanullfrac; + CLAMP_PROBABILITY(*leftstart); + *leftend += stats->stanullfrac; + CLAMP_PROBABILITY(*leftend); + } + if (HeapTupleIsValid(rightvar.statsTuple)) + { stats = (Form_pg_statistic) GETSTRUCT(rightvar.statsTuple); ! *rightstart += stats->stanullfrac; ! CLAMP_PROBABILITY(*rightstart); ! *rightend += stats->stanullfrac; ! CLAMP_PROBABILITY(*rightend); } } ! /* Disbelieve start >= end, just in case that can happen */ ! if (*leftstart >= *leftend) ! { ! *leftstart = 0.0; ! *leftend = 1.0; ! } ! if (*rightstart >= *rightend) ! { ! *rightstart = 0.0; ! *rightend = 1.0; ! } fail: ReleaseVariableStats(leftvar); *************** *** 3778,3797 **** } /* ! * get_variable_maximum ! * Estimate the maximum value of the specified variable. ! * If successful, store value in *max and return TRUE. * If no data available, return FALSE. * ! * sortop is the "<" comparison operator to use. (To extract the ! * minimum instead of the maximum, just pass the ">" operator instead.) */ static bool ! get_variable_maximum(PlannerInfo *root, VariableStatData *vardata, ! Oid sortop, Datum *max) { Datum tmax = 0; ! bool have_max = false; Form_pg_statistic stats; int16 typLen; bool typByVal; --- 3933,3953 ---- } /* ! * get_variable_range ! * Estimate the minimum and maximum value of the specified variable. ! * If successful, store values in *min and *max, and return TRUE. * If no data available, return FALSE. * ! * sortop is the "<" comparison operator to use. This should generally ! * be "<" not ">", as only the former is likely to be found in pg_statistic. */ static bool ! get_variable_range(PlannerInfo *root, VariableStatData *vardata, Oid sortop, ! Datum *min, Datum *max) { + Datum tmin = 0; Datum tmax = 0; ! bool have_data = false; Form_pg_statistic stats; int16 typLen; bool typByVal; *************** *** 3809,3815 **** get_typlenbyval(vardata->atttype, &typLen, &typByVal); /* ! * If there is a histogram, grab the last or first value as appropriate. * * If there is a histogram that is sorted with some other operator than * the one we want, fail --- this suggests that there is data we can't --- 3965,3971 ---- get_typlenbyval(vardata->atttype, &typLen, &typByVal); /* ! * If there is a histogram, grab the first and last values. * * If there is a histogram that is sorted with some other operator than * the one we want, fail --- this suggests that there is data we can't *************** *** 3823,3864 **** { if (nvalues > 0) { tmax = datumCopy(values[nvalues - 1], typByVal, typLen); ! have_max = true; } free_attstatsslot(vardata->atttype, values, nvalues, NULL, 0); } ! else { ! Oid rsortop = get_commutator(sortop); ! ! if (OidIsValid(rsortop) && ! get_attstatsslot(vardata->statsTuple, ! vardata->atttype, vardata->atttypmod, ! STATISTIC_KIND_HISTOGRAM, rsortop, ! &values, &nvalues, ! NULL, NULL)) ! { ! if (nvalues > 0) ! { ! tmax = datumCopy(values[0], typByVal, typLen); ! have_max = true; ! } ! free_attstatsslot(vardata->atttype, values, nvalues, NULL, 0); ! } ! else if (get_attstatsslot(vardata->statsTuple, ! vardata->atttype, vardata->atttypmod, ! STATISTIC_KIND_HISTOGRAM, InvalidOid, ! &values, &nvalues, ! NULL, NULL)) ! { ! free_attstatsslot(vardata->atttype, values, nvalues, NULL, 0); ! return false; ! } } /* ! * If we have most-common-values info, look for a large MCV. This is * needed even if we also have a histogram, since the histogram excludes * the MCVs. However, usually the MCVs will not be the extreme values, so * avoid unnecessary data copying. --- 3979,4002 ---- { if (nvalues > 0) { + tmin = datumCopy(values[0], typByVal, typLen); tmax = datumCopy(values[nvalues - 1], typByVal, typLen); ! have_data = true; } free_attstatsslot(vardata->atttype, values, nvalues, NULL, 0); } ! else if (get_attstatsslot(vardata->statsTuple, ! vardata->atttype, vardata->atttypmod, ! STATISTIC_KIND_HISTOGRAM, InvalidOid, ! &values, &nvalues, ! NULL, NULL)) { ! free_attstatsslot(vardata->atttype, values, nvalues, NULL, 0); ! return false; } /* ! * If we have most-common-values info, look for extreme MCVs. This is * needed even if we also have a histogram, since the histogram excludes * the MCVs. However, usually the MCVs will not be the extreme values, so * avoid unnecessary data copying. *************** *** 3869,3899 **** &values, &nvalues, NULL, NULL)) { ! bool large_mcv = false; FmgrInfo opproc; fmgr_info(get_opcode(sortop), &opproc); for (i = 0; i < nvalues; i++) { ! if (!have_max) { ! tmax = values[i]; ! large_mcv = have_max = true; } ! else if (DatumGetBool(FunctionCall2(&opproc, tmax, values[i]))) { tmax = values[i]; ! large_mcv = true; } } ! if (large_mcv) tmax = datumCopy(tmax, typByVal, typLen); free_attstatsslot(vardata->atttype, values, nvalues, NULL, 0); } *max = tmax; ! return have_max; } --- 4007,4047 ---- &values, &nvalues, NULL, NULL)) { ! bool tmin_is_mcv = false; ! bool tmax_is_mcv = false; FmgrInfo opproc; fmgr_info(get_opcode(sortop), &opproc); for (i = 0; i < nvalues; i++) { ! if (!have_data) { ! tmin = tmax = values[i]; ! tmin_is_mcv = tmax_is_mcv = have_data = true; ! continue; ! } ! if (DatumGetBool(FunctionCall2(&opproc, values[i], tmin))) ! { ! tmin = values[i]; ! tmin_is_mcv = true; } ! if (DatumGetBool(FunctionCall2(&opproc, tmax, values[i]))) { tmax = values[i]; ! tmax_is_mcv = true; } } ! if (tmin_is_mcv) ! tmin = datumCopy(tmin, typByVal, typLen); ! if (tmax_is_mcv) tmax = datumCopy(tmax, typByVal, typLen); free_attstatsslot(vardata->atttype, values, nvalues, NULL, 0); } + *min = tmin; *max = tmax; ! return have_data; } Index: src/include/nodes/relation.h ================================================== ================= RCS file: /cvsroot/pgsql/src/include/nodes/relation.h,v retrieving revision 1.150 diff -c -r1.150 relation.h *** src/include/nodes/relation.h 15 Nov 2007 22:25:17 -0000 1.150 --- src/include/nodes/relation.h 6 Dec 2007 23:46:32 -0000 *************** *** 993,1000 **** int strategy; /* sort direction (ASC or DESC) */ bool nulls_first; /* do NULLs come before normal values? */ /* Results */ ! Selectivity leftscansel; /* scan fraction for clause left side */ ! Selectivity rightscansel; /* scan fraction for clause right side */ } MergeScanSelCache; /* --- 993,1002 ---- int strategy; /* sort direction (ASC or DESC) */ bool nulls_first; /* do NULLs come before normal values? */ /* Results */ ! Selectivity leftstartsel; /* first-join fraction for clause left side */ ! Selectivity leftendsel; /* last-join fraction for clause left side */ ! Selectivity rightstartsel; /* first-join fraction for clause right side */ ! Selectivity rightendsel; /* last-join fraction for clause right side */ } MergeScanSelCache; /* Index: src/include/utils/selfuncs.h ================================================== ================= RCS file: /cvsroot/pgsql/src/include/utils/selfuncs.h,v retrieving revision 1.41 diff -c -r1.41 selfuncs.h *** src/include/utils/selfuncs.h 7 Nov 2007 22:37:24 -0000 1.41 --- src/include/utils/selfuncs.h 6 Dec 2007 23:46:32 -0000 *************** *** 161,168 **** extern void mergejoinscansel(PlannerInfo *root, Node *clause, Oid opfamily, int strategy, bool nulls_first, ! Selectivity *leftscan, ! Selectivity *rightscan); extern double estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows); --- 161,168 ---- extern void mergejoinscansel(PlannerInfo *root, Node *clause, Oid opfamily, int strategy, bool nulls_first, ! Selectivity *leftstart, Selectivity *leftend, ! Selectivity *rightstart, Selectivity *rightend); extern double estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows); create table big as select * from generate_series(1,100000) b; create table low as select * from generate_series(1,1000) l; create table lowp as select l+9000 as l from generate_series(1,1000) l; create table highm as select l+90000 as h from generate_series(1,1000) l; create table high as select l+100000-1000 as h from generate_series(1,1000) l; analyze big; analyze low; analyze lowp; analyze high; analyze highm; create index bigi on big(b); create index lowi on low(l); create index lowpi on lowp(l); create index highi on high(h); create index highmi on highm(h); set enable_nestloop TO 0; set enable_hashjoin TO 0; explain select * from big join low on b=l order by b; explain select * from big join low on b=l order by b desc; explain select * from big join lowp on b=l order by b; explain select * from big join lowp on b=l order by b desc; explain select * from big join highm on b=h order by b; explain select * from big join highm on b=h order by b desc; explain select * from big join high on b=h order by b; explain select * from big join high on b=h order by b desc; explain select * from big join low on b=l order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=0.00..89.31 rows=1000 width=8) Merge Cond: (big.b = low.l) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using lowi on low (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join low on b=l order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=3266.70..3356.01 rows=1000 width=8) Merge Cond: (big.b = low.l) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using lowi on low (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join lowp on b=l order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=306.00..395.48 rows=1000 width=8) Merge Cond: (big.b = lowp.l) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using lowpi on lowp (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join lowp on b=l order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=2960.53..3050.01 rows=1000 width=8) Merge Cond: (big.b = lowp.l) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using lowpi on lowp (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join highm on b=h order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=2968.49..3057.34 rows=1000 width=8) Merge Cond: (big.b = highm.h) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using highmi on highm (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join highm on b=h order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=298.67..387.53 rows=1000 width=8) Merge Cond: (big.b = highm.h) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using highmi on highm (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join high on b=h order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=3267.82..3355.97 rows=1000 width=8) Merge Cond: (big.b = high.h) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using highi on high (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join high on b=h order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=0.05..88.19 rows=1000 width=8) Merge Cond: (big.b = high.h) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using highi on high (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join low on b=l order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=0.00..84.43 rows=1000 width=8) Merge Cond: (big.b = low.l) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using lowi on low (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join low on b=l order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=0.00..3356.01 rows=1000 width=8) Merge Cond: (big.b = low.l) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using lowi on low (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join lowp on b=l order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=0.00..349.01 rows=1000 width=8) Merge Cond: (big.b = lowp.l) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using lowpi on lowp (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join lowp on b=l order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=0.00..3356.01 rows=1000 width=8) Merge Cond: (big.b = lowp.l) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using lowpi on lowp (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join highm on b=h order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=0.00..3063.81 rows=1000 width=8) Merge Cond: (big.b = highm.h) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using highmi on highm (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join highm on b=h order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=0.00..56.08 rows=1000 width=8) Merge Cond: (big.b = highm.h) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using highmi on highm (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join high on b=h order by b; QUERY PLAN ------------------------------------------------------------------------------ Merge Join (cost=0.00..3355.87 rows=1000 width=8) Merge Cond: (big.b = high.h) -> Index Scan using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan using highi on high (cost=0.00..43.25 rows=1000 width=4) (4 rows) explain select * from big join high on b=h order by b desc; QUERY PLAN --------------------------------------------------------------------------------------- Merge Join (cost=0.00..56.08 rows=1000 width=8) Merge Cond: (big.b = high.h) -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) -> Index Scan Backward using highi on high (cost=0.00..43.25 rows=1000 width=4) (4 rows) ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| On Thu, 2007-12-06 at 19:19 -0500, Tom Lane wrote: > Since something needs to be done to that code anyway, I'm considering > applying the attached patch. It's a bit larger than I would normally > consider to be a good idea for an optional patch in late beta. But then > again I wouldn't have the slightest hesitation about back-patching a > change of this size after final release, so it seems attractive to put > it in now. > > Aside from the patch, I have attached a test script that exercises merge > join planning for some simple cases, and the plans output by the script > in CVS HEAD with/without the patch. The cost estimates with the patch > are in line with expectation, the estimates without, not so much. > In particular, the existing bug can be seen at work here in that the > sixth and eighth test cases ("big join highm on b=h order by b desc" and > "big join high on b=h order by b desc") are given unreasonably small > cost estimates by the unpatched code. (Note: the two sets of numbers > vary a bit because they were working with nonidentical ANALYZE > statistics.) Looks good. > Any objections to applying the patch? None. I do have a longer term issue that there is no information provided by EXPLAIN to allow us to differentiate these two conditions. That makes it harder to understand the basis of the plans and also gets everybody used to seeing EXPLAINs that can't easily be explained, which leads to people not reporting problems that exist. I doubt we can fix anything now, but increased debugging/logging output is definitely required in some form. > explain select * from big join highm on b=h order by b desc; > QUERY PLAN > --------------------------------------------------------------------------------------- > Merge Join (cost=298.67..387.53 rows=1000 width=8) > Merge Cond: (big.b = highm.h) > -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) > -> Index Scan Backward using highmi on highm (cost=0.00..43.25 rows=1000 width=4) > (4 rows) > explain select * from big join high on b=h order by b desc; > QUERY PLAN > --------------------------------------------------------------------------------------- > Merge Join (cost=0.05..88.19 rows=1000 width=8) > Merge Cond: (big.b = high.h) > -> Index Scan Backward using bigi on big (cost=0.00..3050.26 rows=100000 width=4) > -> Index Scan Backward using highi on high (cost=0.00..43.25 rows=1000 width=4) > (4 rows) -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| "Tom Lane" <tgl@sss.pgh.pa.us> writes: > Any objections to applying the patch? I like applying it. You don't include any negative tests and corner cases as well; cases where the new code shouldn't be disturbing the results such as a symmetric join or a join against a single-row table. The comments make me think you ran them but just didn't show them though. What about a merge join against an empty table? I suppose there would just be no statistics? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Simon Riggs <simon@2ndquadrant.com> writes: > I do have a longer term issue that there is no information provided by > EXPLAIN to allow us to differentiate these two conditions. Sure there is: the new startup condition affects the estimated plan startup cost (only) and the existing termination condition affects the estimated total cost (only). regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Gregory Stark <stark@enterprisedb.com> writes: > What about a merge join against an empty table? I suppose there would just be > no statistics? Yeah. The defenses against silly results in mergejoinscansel should be enough to prevent it from doing anything really insane. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > I do have a longer term issue that there is no information provided by > > EXPLAIN to allow us to differentiate these two conditions. > > Sure there is: the new startup condition affects the estimated plan > startup cost (only) and the existing termination condition affects the > estimated total cost (only). Yes, but how can you tell by looking at the explain? I think the point is that the "fraction that would be skipped" should be reported somehow. -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC Criptografía: Poderosa técnica algorítmica de codificación que es empleada en la creación de manuales de computadores. ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Yes, but how can you tell by looking at the explain? I think the point > is that the "fraction that would be skipped" should be reported somehow. It is: it's directly reflected in the startup cost. Previously, a mergejoin would always have startup cost equal to the sum of its input startup costs (hence, zero in the cases of interest here). regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Fri, 2007-12-07 at 09:48 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > I do have a longer term issue that there is no information provided by > > EXPLAIN to allow us to differentiate these two conditions. > > Sure there is: the new startup condition affects the estimated plan > startup cost (only) and the existing termination condition affects the > estimated total cost (only). I think we all accept that you're the master here. The question is how will we know to report problems to you? If we all get used to the idea that sometimes the numbers vary, then we're in the situation where we have to say to people "its magic". That destroys any feedback loop you have for problem reporting and then you get out of touch with what is happening on the ground. It's a long term issue, so I have no complaints about what you've done, its just something to think about. I've been exactly here before in one of my past lives and I don't want to reinvent that particular wheel. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > I think we all accept that you're the master here. The question is how > will we know to report problems to you? Hm, I think I agree. The problem is where to draw the line. Ultimately everything in the statistics tables and more could potentially be relevant. The other problem is that currently our explain output is a bit of a hack. It's just text we print out and we're limited in how much data we can put in that without it being unwieldy. When we get the table-based or xml-based or some other machine-readable explain plan we could stuff a lot more data in there and have it be the UI's responsibility to decide what data to display. When that happens it would be nice to have the raw data used to generate the cost estimations. At least the most important factors. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |