This is a discussion on insert if not exist and delete old rows within the MySQL forums, part of the Database Server Software category; --> Hi to everyboby ;-) i've a simple table with two (or three, according to what you think about it ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi to everyboby ;-) i've a simple table with two (or three, according to what you think about it :-) ) fields: id | description | check I periodically (say, every day, or every hour, or what you prefer) read somewhere the information to store in this table. Well, if the row is already present in the table (that is if a record with the same id is already stored), i dont want to insert it again (in order to save time, and resources, isnt it?). Otherwise I'll insert as a new record. Moreover, I'd want to insert a "check" field so that: - if the record is already in the table I wont insert it again (but i'll check that it's a good record), - if the record is not in the table I'll insert it (and I'll check that it's a good record), and - at the end I want to delete old records, that is all the records that I'v not checked. Any idea about how to complete this task? Thank you. -- rob4you |
| |||
| On 12 Feb, 11:53, rob4you <KIEDIM...@KIEDIMELO.IT> wrote: > Hi to everyboby ;-) > > i've a simple table with two (or three, according to what you think > about it :-) ) fields: > > id | description | check > > I periodically (say, every day, or every hour, or what you prefer) read > somewhere the information to store in this table. > Well, if the row is already present in the table (that is if a record > with the same id is already stored), i dont want to insert it again (in > order to save time, and resources, isnt it?). > Otherwise I'll insert as a new record. > > Moreover, I'd want to insert a "check" field so that: > - if the record is already in the table I wont insert it again (but > i'll check that it's a good record), > - if the record is not in the table I'll insert it (and I'll check that > it's a good record), and > - at the end I want to delete old records, that is all the records that > I'v not checked. > > Any idea about how to complete this task? > > Thank you. > > -- > rob4you UPDATE `simple_table` SET `check` = 0; INSERT INTO `simple_table` (`id,`description`,`check`) VALUES (1,'FRED',1) ON DUPLICATE KEY SET `check` = 1; DELETE FROM `simple_table` WHERE `check` = 0 |
| |||
| UPDATE `simple_table` SET `check` = 0; REPLACE `simple_table` (`id,`description`,`check`) SET id = '1', description='aaa', check='1' DELETE FROM `simple_table` WHERE `check` = 0 --- http://ascii.mastervb.net -- ASCII Art Generator http://anagram.mastervb.net -- Anagram Finder http://www.mastervb.net/phpbooks -- Best PHP Books On Feb 12, 6:53 pm, rob4you <KIEDIM...@KIEDIMELO.IT> wrote: > Hi to everyboby ;-) > > i've a simple table with two (or three, according to what you think > about it :-) ) fields: > > id | description | check > > I periodically (say, every day, or every hour, or what you prefer) read > somewhere the information to store in this table. > Well, if the row is already present in the table (that is if a record > with the same id is already stored), i dont want to insert it again (in > order to save time, and resources, isnt it?). > Otherwise I'll insert as a new record. > > Moreover, I'd want to insert a "check" field so that: > - if the record is already in the table I wont insert it again (but > i'll check that it's a good record), > - if the record is not in the table I'll insert it (and I'll check that > it's a good record), and > - at the end I want to delete old records, that is all the records that > I'v not checked. > > Any idea about how to complete this task? > > Thank you. > > -- > rob4you |
| |||
| > UPDATE `simple_table` SET `check` = 0; > INSERT INTO `simple_table` (`id,`description`,`check`) VALUES > (1,'FRED',1) > ON DUPLICATE KEY SET `check` = 1; > DELETE FROM `simple_table` WHERE `check` = 0 Thank you for your reply. On an italian ng they've suggested me the following solution: CREATE TEMPORARY TABLE `temp` (`id` int primary key, `descr` text); INSERT INTO `temp` (`id,`descr`) VALUES (1,'FRED'); INSERT INTO `temp` (`id,`descr`) VALUES (2,'BOB'); -- ... TRUNCATE `simple_table`; INSERT INTO `simple_table` SELECT * FROM `temp`; So according to them the solution does not consider any "check" field. They told me this is better because in your solution MySQL has to perform twice a full-scan (both with the UPDATE and with the DELETE), and also a check on the INSERT. What do you think about it? -- rob4you |
| |||
| > UPDATE `simple_table` SET `check` = 0; > REPLACE `simple_table` (`id,`description`,`check`) SET id = '1', > description='aaa', check='1' > DELETE FROM `simple_table` WHERE `check` = 0 Thank you for you reply. Please read my reply to Captain Paralytic, and if you want, let me know what you think about that. -- rob4you |
| |||
| On 13 Feb, 10:33, rob4you <KIEDIM...@KIEDIMELO.IT> wrote: > > UPDATE `simple_table` SET `check` = 0; > > INSERT INTO `simple_table` (`id,`description`,`check`) VALUES > > (1,'FRED',1) > > ON DUPLICATE KEY SET `check` = 1; > > DELETE FROM `simple_table` WHERE `check` = 0 > > Thank you for your reply. > > On an italian ng they've suggested me the following solution: > > CREATE TEMPORARY TABLE `temp` (`id` int primary key, `descr` text); > > INSERT INTO `temp` (`id,`descr`) VALUES (1,'FRED'); > INSERT INTO `temp` (`id,`descr`) VALUES (2,'BOB'); > -- ... > > TRUNCATE `simple_table`; > INSERT INTO `simple_table` SELECT * FROM `temp`; > > So according to them the solution does not consider any "check" field. > They told me this is better because in your solution MySQL has to > perform twice a full-scan (both with the UPDATE and with the DELETE), > and also a check on the INSERT. > > What do you think about it? > > -- > rob4you 1) I think that in your case my INSERT ... ON DUPLICATE KEY ... is much more efficienct than using the REPLACE syntax. 2) I think that INSERTING data into a temporary table and then copying all that data via inserts is likely to be far more inefficient than deleting from themain table, unless there are a very large number of deletes compared to inserts. 3) I think that an INDEX on the check field will save a full table scan the second time. 4) I think (and this to me is the killer): Using the temporary table method, for he period of time between the TRUNCATE and the completion of the second INSERT, your real table is not available for use as all the records that should be there are missing for a varying period of time. Using the check method, the required records are always there. 5) I think assuggested above it depends on things like quantities of data to be inserted/refreshed/deleted and availability questions, none of which you have given any indication about here. Both ideas will work, but we do not have enough data to indicate which is better in your case. |
| |||
| > I think ... Thank you. Your comments are very helpfull. Well, I'll try to explain better the situation: periodically (say each two hours), my php script searches on a foreign website for some particular 'text', and inserts them in my db. These 'texts' can be more less 10000. Two hours later, my php script searches again all the particular 'texts'. These 'texts' will be again 10000 more less. Probably, the new one will be just 10. So, i've to perform these operations on the db: - insert all the new 'texts' - delete all the old 'texts', that is the 'texts' that my php script doesnt find anymore in the site. These old 'texts' will be 10-20, so a few number - of course mantain the still available 'texts', that is the 'texts' that have already been inserted previously in the db, and are still present on the site. Is is clearer now? Which is the optimal solution for this case? -- rob4you |
| |||
| On 13 Feb, 11:52, rob4you <KIEDIM...@KIEDIMELO.IT> wrote: > > I think ... > > Thank you. Your comments are very helpfull. > > Well, I'll try to explain better the situation: > > periodically (say each two hours), my php script searches on a foreign > website for some particular 'text', and inserts them in my db. These > 'texts' can be more less 10000. > Two hours later, my php script searches again all the particular > 'texts'. > These 'texts' will be again 10000 more less. Probably, the new one will > be just 10. > > So, i've to perform these operations on the db: > - insert all the new 'texts' > - delete all the old 'texts', that is the 'texts' that my php script > doesnt find anymore in the site. These old 'texts' will be 10-20, so a > few number > - of course mantain the still available 'texts', that is the 'texts' > that have already been inserted previously in the db, and are still > present on the site. > > Is is clearer now? > Which is the optimal solution for this case? > > -- > rob4you OK, let's take what you've said: Every 2 hours you will collect 10000 records of which 10 are new. Using the temp table method you will INSERT 10000 records into one table, you will then copy those 10000 records from that table and insert them into another table. Using my method you will update just under 20000 records and insert 10 new ones So far my method has moved half the amount of data of the other method as the initial delete will not actually have to move any data. Assuming that you have an index on the check field, with my method you will now use it to simply delete 10-20 records. During all this, with my method the database is available for use to read, whereas with the other method means that it is not available for a period. I think my one wins in this scenario. |
| |||
| > Assuming that you have an index on the check field, with my method you > will now use it to simply delete 10-20 records. In the specific, how do you suggest to set the index for the check field? > I think my one wins in this scenario. I'll try both methods, and i'll see which is completed in less time. They told me that your method is heavy because of the twice full-scan, but as you suggest, with the index it should become lighter. -- rob4you |
| ||||
| On 13 Feb, 12:07, rob4you <KIEDIM...@KIEDIMELO.IT> wrote: > > Assuming that you have an index on the check field, with my method you > > will now use it to simply delete 10-20 records. > > In the specific, how do you suggest to set the index for the check > field? > > > I think my one wins in this scenario. > > I'll try both methods, and i'll see which is completed in less time. > They told me that your method is heavy because of the twice full-scan, > but as you suggest, with the index it should become lighter. > > -- > rob4you My method moves far less data and thus is likely to be far better in this case. I don't understand what you mean by how to set the index for the check field? Please try to re-phrase. |