You could try to identify whether the 'special row' exists through a SQL statement in an Execute SQL Task, set a variable to '1' or '2' based on the result, and then have precedence constraints to two different data flow tasks evaluated as expressions on the value of that variable. Hope that makes sense; I realize it isn't all that clear.
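For example (just a sketch - the table and column names here are made up), the Execute SQL Task could run something like the following, with the single-row result mapped to an SSIS variable such as User::PathSelector on the Result Set page:

    -- Returns 1 when the 'special row' exists, otherwise 2 (hypothetical table/column names).
    SELECT CASE WHEN EXISTS (SELECT 1
                             FROM dbo.SourceTable
                             WHERE RowType = 'Special')
                THEN 1
                ELSE 2
           END AS PathSelector;

Then set the evaluation operation on each precedence constraint to Expression (or Expression and Constraint), using @[User::PathSelector] == 1 on the path to the first data flow task and @[User::PathSelector] == 2 on the path to the second.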
|||Yes, it makes sense, thanks. With that said, I have a new problem - I have to find out how a dataset/context is passed between different data flow tasks of a single control flow (using a DataReader Source and Destination, maybe?). (Did I mention I was a beginner... well, I am.) Another problem I have to solve is implementing a specific transformation. Within a dataset there are two distinct groups of rows; the distinction is based on the value of a single column. For each row of one group, the transformation should look up the matching row in the other group. If a matching row is found, then based on its column values, the column values of the row from the first group should be updated. Otherwise, an additional row should be appended to the dataset with (almost all) column values just copied from the first group's row. How does one do something like that - looping with inner looping, and appending rows to a dataset?
|||
You can use RAW files to persist the data between data flow tasks. They are extremely fast. The issue with using a recordset is that you have to use a Script Source to process it in the second data flow (the DataReader Source uses an ADO.NET connection to a database to retrieve data).
On your second problem, you could look at using a Conditional Split to split the groups, then a Merge Join to join them back together based on the matching columns. For adding the additional rows, you'd need to use a script component. There are probably other ways to do this; that's just the first that comes to mind.
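For example, the Conditional Split could route one group to its own output with an expression along these lines (GroupType is a made-up column name standing in for whatever column distinguishes the two groups), letting everything else fall through to the default output:

    GroupType == "A"

One thing to keep in mind with this approach: the Merge Join requires both of its inputs to be sorted on the join columns, so you'd either add Sort transformations or mark the source outputs as already sorted.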
|||I believe exporting data to files will present a security issue. Maybe I could do everything in the data flow alone - just use a single script component to determine whether processing is needed and store that in a variable, and then use a Conditional Split to direct the whole dataset either to the branch that does the processing or to the other branch where no processing is done. The question is how fast this solution would perform.

For the second problem, I still don't see how splitting the dataset would solve it - I don't know how I could reference the second part of the dataset while going through the first one. For each row in the first dataset a matching row has to be found in the second dataset. If found, the row in the first dataset gets updated (using values from the second group). If not found, a new row has to be appended to the first group. Maybe a Lookup component could be used for this - the default output when the matching column is found, otherwise the error output.
In this short time I couldn't find a way to append a row using a Script Transformation component. Could you please give me a reference to a book, website, or something similar where this information can be found?
|||
Stevo Slavic wrote:
I believe exporting data to files will present a security issue.
Perhaps, but the files aren't in plain text. You can also use staging tables in a database instead of using files.|||
Stevo Slavic wrote:
In this short time I couldn't find a way to append a row using a Script Transformation component. Could you please give me a reference to a book, website, or something similar where this information can be found?
You need to use an asynchronous output on the script component. See http://agilebi.com/cs/blogs/jwelch/archive/2007/09/14/dynamically-pivoting-rows-to-columns.aspx for an example of an async component. It's also documented in Books Online: http://msdn2.microsoft.com/zh-cn/library/ms136133.aspx
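Roughly, the script ends up looking like this (a minimal VB.NET sketch, assuming the output has been made asynchronous by setting its SynchronousInputID to None, and using made-up column names like CustomerId, Amount, and NeedsExtraRow that you'd replace with your own):

    Public Class ScriptMain
        Inherits UserComponent

        Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
            ' With an asynchronous output, nothing flows through automatically,
            ' so copy the incoming row to the output explicitly.
            Output0Buffer.AddRow()
            Output0Buffer.CustomerId = Row.CustomerId
            Output0Buffer.Amount = Row.Amount

            ' Append an additional row when needed (NeedsExtraRow is a
            ' hypothetical flag column used here purely for illustration).
            If Row.NeedsExtraRow Then
                Output0Buffer.AddRow()
                Output0Buffer.CustomerId = Row.CustomerId
                Output0Buffer.Amount = 0
            End If
        End Sub

    End Class

Because the output is asynchronous, every row you want downstream has to be added with AddRow, including the ones you're only passing through.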