Channel: SCN : Unanswered Discussions - Data Services and Data Quality

Tracking down a bottleneck

I'm fairly new to Data Services, and I'm seeing a bottleneck in a job we run.  Several people are trying to troubleshoot it, but so far nothing obvious has been found.

 

The basic situation is that files come in from various sources and in one part of every job, elements of that file are compared against a master table to see if a master record already exists.  There's no real key between all the different sources, so there's a lot of very fuzzy matching going on in the joins.

 

The master table currently has about 40 million records.  An example new file I'm looking at has about 5,000 records.  The join has a condition where either a constructed key has to match OR a dozen fields from the file have to match the corresponding fields in the master table.

 

The job runs great, right up until the point where it starts this master table update, when everything slows down.  Looking at the monitor, it almost looks like the query is joining every row in the input to every row in the master: several line items in the monitor show that the query with the join I mentioned above has a row count of over a billion records.
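For context, here's a toy Python sketch of what I suspect is happening (the data and field names are made up, not from the actual job): an OR between two different match conditions can't be served by a single hash lookup, so the join degrades to comparing every input row against every master row. Splitting the OR into two separate equi-joins and unioning the results keeps each side as a cheap hash lookup.

```python
# Toy illustration (hypothetical data): an OR in a join predicate vs.
# two equi-joins unioned together. Not the actual Data Services job.

master = [{"key": f"K{i}", "fields": (i % 7, i % 11)} for i in range(10_000)]
incoming = [{"key": f"K{i * 3}", "fields": (i % 7, i % 11)} for i in range(50)]

# Naive OR-join: every input row is compared to every master row,
# i.e. len(incoming) * len(master) comparisons -- this is the blow-up.
naive_matches = [
    (r["key"], m["key"])
    for r in incoming
    for m in master
    if r["key"] == m["key"] or r["fields"] == m["fields"]
]

# Split form: build one hash index per match condition, then union the
# two result sets. Each probe is a constant-time dictionary lookup.
by_key = {m["key"]: m for m in master}
by_fields = {}
for m in master:
    by_fields.setdefault(m["fields"], []).append(m)

split_matches = set()
for r in incoming:
    if r["key"] in by_key:                      # equi-join on the key
        split_matches.add((r["key"], by_key[r["key"]]["key"]))
    for m in by_fields.get(r["fields"], []):    # equi-join on the fields
        split_matches.add((r["key"], m["key"]))

# Both forms find the same matches; only the work done differs.
assert split_matches == set(naive_matches)
```

In Data Services terms, I believe this would mean two separate Query transforms (one joining on the constructed key, one on the field list) merged afterward, rather than one query with the OR, though I'd welcome correction on whether the optimizer can push the OR form down to the database.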

 

Does anyone have any suggestions either as to what might be going wrong here, or on an approach to trying to troubleshoot?

