Advice on Web Application Data Correlation.
January 19, 2006 4:32 AM   Subscribe

What is the optimum way in an ASP.NET Web Application with SQL Server 2000 to perform the following data correlation?

Apologies for the open ended nature of the question and the fact that it is all over the show. I am about to embark on building a prototype and just wanted to run it by you all. I am going to have a consultation with a .NET SQL Server expert next week but would very much value any input in advance.

Below I have outlined the parameters of a multiple criteria decision which must be made by a web application :

There are two sets of data

Set A :

Static data, containing values relating to criteria against which data in Set B should be measured
Table A1 contains over 1000 records.
For each record in Table A1 there are 100 related numeric values in Table A2.
These 100 values in Table A2 fall into 6 Categories (1-6).
The number of records per category in Table A2 varies from 6 to 50.


Set B :

Calculated data, containing values for cases which must be measured against criteria described in Set A.
Table B1 contains one or more records.
For each record in Table B1 there are 200 related numeric values in Table B2.
100 of these values in Table B2 fall into 6 Categories (1-6).
The number of records per category in Table B2 varies from 6 to 50.


For each record in Table B1 and the associated categorized records in Table B2 there are 1000 directly related sets of values in Set A.

Now for the comparison. Take a record in Table B1 and perform a correlation between related values in Table B2 and those in Table A2.
This is done on a per category basis.
There are 6 in all (one per category) and the numbers of values in the correlation will vary between 6 (the smallest) and 50 (the largest).
The correlation used is the Pearson product-moment correlation, outlined here.
This gives rise to 6 values (correlation values 1-6) which range from -1 to +1.
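To make the per-category step concrete, here is a minimal sketch (in Python rather than .NET, purely for illustration; the dict-of-lists layout and the sample values are invented, not from the actual tables):

```python
import math

def pearson(xs, ys):
    # Pearson product-moment correlation of two equal-length sequences.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Values from Table A2 and Table B2 grouped by category (1-6);
# only two categories shown here, with made-up numbers.
a2 = {1: [1.0, 2.0, 3.0], 2: [2.0, 4.0, 6.0]}
b2 = {1: [1.1, 1.9, 3.2], 2: [6.0, 4.0, 2.0]}

# One correlation per category, each on 6-50 paired values in practice.
per_category_r = {cat: pearson(a2[cat], b2[cat]) for cat in a2}
```

Each resulting value lies in [-1, +1], one per category, matching the 6 correlation values described above.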

Next z values for correlation values 1-6 must be obtained by applying Fisher's transformation to each.

Finally, calculate the arithmetic mean of these 6 z values and then transform that mean z back to the correlation metric by applying the inverse of Fisher's transformation. This results in a value between -1 and +1 for each record in Table B1 against each record in Table A1.
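The z-averaging step above can be sketched as follows (again Python for illustration only; Fisher's transformation is artanh and its inverse is tanh; the six r values are invented):

```python
import math

def fisher_z(r):
    # Fisher's transformation: z = artanh(r) = 0.5 * ln((1 + r) / (1 - r))
    return math.atanh(r)

def inverse_fisher(z):
    # Back-transform a mean z to the correlation metric: r = tanh(z)
    return math.tanh(z)

# Six per-category correlation values (illustrative only)
rs = [0.42, 0.10, -0.31, 0.77, 0.05, 0.58]

zs = [fisher_z(r) for r in rs]
mean_z = sum(zs) / len(zs)
combined_r = inverse_fisher(mean_z)  # guaranteed to lie in (-1, +1)
```

Averaging in z-space rather than averaging the r values directly is the standard way to combine correlations, since z is approximately normally distributed while r is not.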


Now to my 'question': what is the optimum way to achieve the above in the following situation?

100 people are using the web application at the same time, each user equating to a record in Table B1. They each request the result of the correlation between their data (Table B2) and all the data in Set A.

This involves 100 (people) * 6 correlations (containing 100 values in total) * 1000 records in Set A.
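A quick back-of-envelope on that workload, using the counts stated above:

```python
users = 100                 # concurrent users, one Table B1 record each
set_a_records = 1000        # static criteria records in Table A1
correlations_per_pair = 6   # one correlation per category
values_per_pair = 100       # total values across the 6 categories

pairs = users * set_a_records                       # B1-vs-A1 record pairs
total_correlations = pairs * correlations_per_pair  # correlations to compute
total_values = pairs * values_per_pair              # data points touched
```

So a full pass is on the order of 100,000 record pairs and 600,000 correlations, which is why keeping the static Set A data cached rather than re-querying it per request matters.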

The result thankfully does not need to be delivered immediately, and I am going to perform the processing in a Windows Service on the webserver, processing the requests one at a time. There is a whole load of other stuff being handled by that service as well.

I am looking for reactions, considerations, advice, pointers and warnings relating to the task at hand. I want to keep the load on the webserver and the SQL Server to a minimum and basically implement it in the best way possible.

Would you recommend upgrading to SQL Server 2005?
How should I access the data in Set A, Set B?
What are the questions I have not asked and what are the answers you would suggest?

Thanks in advance.
posted by kenaman to Computers & Internet (2 answers total)
 
That's a lot to digest before coffee.

Is the data static or dynamic? If static, precompute everything into one denormalized table, indexed appropriately so each user can find their record quickly.

Then you can go a step further and render the resulting web page to a static HTML page for each user. That would run mighty fast.
posted by SNACKeR at 5:01 AM on January 19, 2006


Response by poster: Is the data static or dynamic?
The data in Set A (the 1000 records) is static. The data in Set B can't be generated without input from the user. When they have completed their input, the request is made.

There is a whole lot of other stuff going on at the same time, including the generation of the data in Set B (1 record for Table B1 and the associated 200 in Table B2), some heavy-duty reporting (not in HTML), etc. All of this is done and working well since it was separated into the Windows Service. However, given the amount of number crunching involved in the additional development outlined above, and the fact that this is a Web Application, I am conscious that the upper user limit might be reduced, and I need it to be as high as possible.
posted by kenaman at 5:17 AM on January 19, 2006

