Tuesday, March 20, 2012

Processing one row at a time

I'm populating a new table based on information in an existing table. The new table is a list of all "items" and contains a primary key. The old table is a database of receipts where items can appear many times in any order.

I have put together the off-the-shelf components to do this, using a Lookup transformation to see if the item is already in the new table. The problem is that, because there's so much repetition in the old table, I need to process it one row at a time. Batch processing generates errors because the Lookup doesn't detect duplicates within the same buffer.

I tried setting the "DefaultBufferMaxRows" property of the task to 1, but that doesn't seem to have any effect.

To get the data from the old table, I'm using an OLE DB source. To get the data into the new table, I'm using the OLE DB Command transformation with parameters to execute an INSERT statement.
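
For reference, the statement in the OLE DB Command is just a parameterized INSERT along these lines (the table and column names here are simplified placeholders, not the actual schema):

    INSERT INTO dbo.Items (ItemID, ItemLookupCode, Description)
    VALUES (?, ?, ?)

The ? markers get mapped to the input columns on the transformation's Column Mappings tab.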

This is a job I have to do exactly once, so I don't care if I have to run it overnight. I'd rather have a simple, easy-to-understand but inefficient script so that I completely understand what it's doing.

Any help on forcing SSIS to process one row at a time?

Lee Silverman
JackRabbit Sports

Could you use an Aggregate transform to eliminate the duplicates, or a GROUP BY in the source select?
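
Just as a sketch, something like this in the source select (made-up table and column names):

    SELECT ItemLookupCode, MAX(Description) AS Description
    FROM dbo.Receipts
    GROUP BY ItemLookupCode

|||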

Could you use an Aggregate transform to eliminate the duplicates, or a GROUP BY in the source select?

Unfortunately no. Items with the same unique identifier have often had important changes made to their descriptive fields over the course of the last two or three years. When the Lookup finds a unique identifier that is already in the table, the branch that handles the match checks whether the descriptive info has changed and, if so, updates the data in the table. Likewise, the branch that handles unmatched unique identifiers needs to handle the case where the new item shares an item lookup number with a previous item and, if so, mark the previous item as expired.
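
For each source row, the logic amounts to something like this (pseudo-T-SQL; the @variables stand for the current row's column values, and the names are simplified):

    IF EXISTS (SELECT 1 FROM dbo.Items WHERE ItemID = @ItemID)
    BEGIN
        -- Matched branch: refresh the descriptive fields if they changed.
        UPDATE dbo.Items
        SET Description = @Description
        WHERE ItemID = @ItemID
          AND Description <> @Description;
    END
    ELSE
    BEGIN
        -- Unmatched branch: if the new item reuses an existing item
        -- lookup number, mark the previous item as expired first.
        UPDATE dbo.Items
        SET Expired = 1
        WHERE ItemLookupCode = @ItemLookupCode;

        INSERT INTO dbo.Items (ItemID, ItemLookupCode, Description, Expired)
        VALUES (@ItemID, @ItemLookupCode, @Description, 0);
    END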

The data in the source table is ordered sequentially, and the logic of the script requires that each line item be processed one at a time. It's going to be slow, but once it's done it'll be done.

Lee Silverman
JackRabbit Sports

|||

Could you use a Foreach Loop Container with the ADO enumerator to loop through your recordset of tableA and then do your manipulations/lookups within that?

|||

You could use an OLE DB Command to do your inserts and updates, and disable caching on the lookups. That will force the lookups to requery the database for each row. I'm not sure that will work in your case, but it might be worth a try.

|||

How does one disable caching on lookups? Do you have to check "enable memory restriction" but then not check "enable caching"?

In any event, I solved the problem by writing a script in T-SQL, and I took the opportunity to teach myself how to use a cursor to scroll through the original table one line at a time. It's running now.
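
For anyone curious, the shape of the cursor script is roughly this (table, column, and type names are simplified placeholders, not my actual schema):

    DECLARE @ItemID int, @ItemLookupCode varchar(20), @Description varchar(255);

    DECLARE receipt_cursor CURSOR FAST_FORWARD FOR
        SELECT ItemID, ItemLookupCode, Description
        FROM dbo.Receipts
        ORDER BY ReceiptID;  -- whatever column defines the original sequence

    OPEN receipt_cursor;
    FETCH NEXT FROM receipt_cursor INTO @ItemID, @ItemLookupCode, @Description;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Same per-row match/update/expire logic described above.
        IF EXISTS (SELECT 1 FROM dbo.Items WHERE ItemID = @ItemID)
        BEGIN
            UPDATE dbo.Items
            SET Description = @Description
            WHERE ItemID = @ItemID AND Description <> @Description;
        END
        ELSE
        BEGIN
            UPDATE dbo.Items SET Expired = 1
            WHERE ItemLookupCode = @ItemLookupCode;

            INSERT INTO dbo.Items (ItemID, ItemLookupCode, Description, Expired)
            VALUES (@ItemID, @ItemLookupCode, @Description, 0);
        END

        FETCH NEXT FROM receipt_cursor INTO @ItemID, @ItemLookupCode, @Description;
    END

    CLOSE receipt_cursor;
    DEALLOCATE receipt_cursor;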

Lee

|||

Lee Silverman wrote:

How does one disable caching on lookups? Do you have to check "enable memory restriction" but then not check "enable caching"?

That's it: check "Enable memory restriction" and leave "Enable caching" unchecked, and the Lookup will go back to the database for every row.
