Hi,
I'm trying to output an XML file using Data Services, but I'm hitting a performance problem: the job consumes all available memory on the system and eventually crashes when it runs out.
I've stripped out all but the essential parts of the dataflow, and it does appear to be the generation of the nested schema that is consuming all of the resources. I have tried writing the job both with a Query transform to generate the nested schema and with the XML_Map transform in normal mode (shown above); both give the same result.
I've set up iteration rules for the relevant levels:
- XML_Map at the top level has an iteration rule of the row_gen (which is set to 1 row).
- Product element iterates over the HEADER_DATA schema.
- ProductCrossReference element iterates over the Transform_BOMs schema.
- Value element iterates over the ATTRIBUTE_DATA schema.
The ProductCrossReference and Value elements also have where clauses that use the Product Id to link their input schema back to the output schema, so that each child row is matched to the correct product.
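To make the intended nesting concrete, the iteration rules and where clauses above amount to roughly the following, sketched in Python rather than Data Services. The column names (PRODUCT_ID, COMPONENT_ID, ATTRIBUTE_ID, VALUE), the XML attribute names and the root element name Products are placeholders I've made up for illustration; only the schema and element names come from my dataflow.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample rows; the real columns in my schemas differ.
HEADER_DATA = [{"PRODUCT_ID": "P1"}, {"PRODUCT_ID": "P2"}]
TRANSFORM_BOMS = [
    {"PRODUCT_ID": "P1", "COMPONENT_ID": "C1"},
    {"PRODUCT_ID": "P1", "COMPONENT_ID": "C2"},
]
ATTRIBUTE_DATA = [
    {"PRODUCT_ID": "P2", "ATTRIBUTE_ID": "Colour", "VALUE": "Red"},
]


def build_document(headers, boms, attributes):
    """Build the whole nested document in memory, one level per iteration rule."""
    # Top level: produced once (the row_gen set to 1 row). Root name is assumed.
    root = ET.Element("Products")
    # Product iterates over HEADER_DATA.
    for header in headers:
        product = ET.SubElement(root, "Product", ID=header["PRODUCT_ID"])
        # ProductCrossReference iterates over Transform_BOMs; the "if" plays
        # the role of the where clause on Product Id.
        for bom in boms:
            if bom["PRODUCT_ID"] == header["PRODUCT_ID"]:
                ET.SubElement(product, "ProductCrossReference",
                              ProductID=bom["COMPONENT_ID"])
        # Value iterates over ATTRIBUTE_DATA, again filtered on Product Id.
        for attribute in attributes:
            if attribute["PRODUCT_ID"] == header["PRODUCT_ID"]:
                value = ET.SubElement(product, "Value",
                                      AttributeID=attribute["ATTRIBUTE_ID"])
                value.text = attribute["VALUE"]
    return root


document = build_document(HEADER_DATA, TRANSFORM_BOMS, ATTRIBUTE_DATA)
xml_text = ET.tostring(document, encoding="unicode")
```

This only illustrates the shape of the output I'm after; I don't know how Data Services evaluates the rules internally.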
I can run this for small numbers of products without an issue, and the XML produced is perfect. However, when running on larger volumes (~150k products and a few hundred thousand BOMs), Data Services consumes all available memory on the system and then crashes. What I do notice when watching the smaller volumes complete is that DS appears to build the entire result in memory and only writes the XML file to disk once that is complete, rather than working in chunks.
I had wondered whether adding an XML_Map in batch mode would help, and looked at the example on the DS wiki of how to use it. However, it appears to accept only flat structures as input: when I try to use it with a nested structure as input, I get the following error:
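By "working in chunks" I mean something like the following Python sketch, where each Product element is written to the file as soon as it is built and then discarded, so memory use stays roughly constant regardless of volume. This is only an illustration of the behaviour I'm hoping for, not something Data Services offers as far as I know; the function name, the tuple layout, the attribute names and the root element name are all made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def write_products_streaming(out, products):
    """Write one Product element at a time instead of building the full tree.

    `products` is any iterable of (product_id, cross_refs, values) tuples,
    a hypothetical layout chosen for this sketch.
    """
    out.write("<Products>\n")  # assumed root element name
    count = 0
    for product_id, cross_refs, values in products:
        product = ET.Element("Product", ID=product_id)
        for ref in cross_refs:
            ET.SubElement(product, "ProductCrossReference", ProductID=ref)
        for attribute_id, text in values:
            value = ET.SubElement(product, "Value", AttributeID=attribute_id)
            value.text = text
        # Serialise this product only; nothing from earlier products is kept.
        out.write(ET.tostring(product, encoding="unicode"))
        out.write("\n")
        count += 1
    out.write("</Products>\n")
    return count


def sample_products():
    """Hypothetical data source yielding products one at a time."""
    yield ("P1", ["C1", "C2"], [])
    yield ("P2", [], [("Colour", "Red")])


buffer = io.StringIO()  # stands in for the output file
written = write_products_streaming(buffer, sample_products())
```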
VAL-4001010 15/12/2015 17:15:44 The target schema <XML_Map.ReplacementRules> cannot contain an iteration rule. The target schema node is not repeatable.
I have no control over iteration rules or repeatable elements in a batch mode version of XML_Map.
Any suggestions on how I can process this XML in batches, or reduce the memory usage so that the job completes correctly?
We are using Data Services 4.2 SP2.
Many thanks for any support