# Need help on structuring programming problem...

PeterR

Mon, 10/06/2014 - 10:04 am

I am trying to implement a program that for testing purposes should generate synthetic data.

In general, I have a certain value X that I want to emulate with a system that offers only a predefined set of possible values (call them A, B, C, D).

So A, B, C or D can be chosen to emulate X, or to come as close to X as the restrictions of the system allow.

For creating synthetic data, I now want to program a small macro that "decides" on the outcome of a simulated experiment using the probabilities extracted from this data.

For example, I have the following data:

| System | Value | counts/1000 |
| --- | --- | --- |
| 1 | X | 12.03 |
| 2 | A | 11.91 |
| 2 | B | 11.91 |
| 2 | C | 10.54 |
| 2 | D | 15.23 |

I would now have to turn this input into probabilities, along these lines:

- A is very close to X, so it should get the highest probability of being chosen to stand for X in the new system

- B is exactly as close to X as A, so A and B should have the same probability of being chosen as the best-suited replacement for X

- C and D deviate from X to different extents, so this should be taken into account when calculating their probabilities relative to A and B

- probabilities of A,B,C,D should sum to 1

How would you implement this? I'm not asking for exact code, just the general idea of how to solve this problem.

Really would appreciate your input here, thanks a lot in advance for any help!

Regards,

Peter

* Take fc_i = abs(T - S_i)/T as the metric of "closeness", where T is the target and S_i is the signal

* Eliminate duplicate values of fc_i

* Generate the equation sum ac_i * fc_i = 1, which in matrix notation becomes Ac * Fc = I

* Solve for the coefficient terms in Ac

For example, the first Fc term in your set is fc_A = (12.03 - 11.91)/12.03
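As a worked illustration of this first step only, applying the metric to the values in the original post (T = 12.03) gives the following, rounded to four decimals. A smaller fc_i means the value is closer to the target.

```latex
\begin{align*}
fc_A &= \frac{|12.03 - 11.91|}{12.03} \approx 0.0100 \\
fc_B &= \frac{|12.03 - 11.91|}{12.03} \approx 0.0100 \\
fc_C &= \frac{|12.03 - 10.54|}{12.03} \approx 0.1239 \\
fc_D &= \frac{|12.03 - 15.23|}{12.03} \approx 0.2660
\end{align*}
```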

--

J. J. Weimer

Chemistry / Chemical & Materials Engineering, UAHuntsville

October 6, 2014 at 10:16 am

Thanks a lot for the quick reply :)

Sounds promising, will try to implement this.

But wouldn't eliminating duplicates mean neglecting A (or B), since they have the same closeness?

Is there a command in IGOR to solve for the coefficients? (I am not very fond of matrix algebra ^^)

What would the matrix look like for the example?

Another big question mark for me at the moment: assuming I have calculated the probabilities correctly, how do I decide which of the replacements is chosen according to its probability?

I know there has to be a randomization process in the code somewhere, but at the moment I cannot figure out exactly how it should be done.

Regards,

Peter

October 6, 2014 at 10:31 am

    // rn is a random number between 0 and 1
    if (rn < 0.4)
        (choose A)
    elseif (rn < 0.8)
        (choose B)
    elseif (rn < 0.95)
        (choose C)
    else
        (choose D)
    endif
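For readers who want to try the idea outside Igor, here is a minimal Python sketch of the same chain (it is not Igor code). The thresholds 0.4, 0.8 and 0.95 are the cumulative sums of the example probabilities 0.4, 0.4, 0.15 and 0.05. The function name `choose` is made up for this illustration.

```python
import random


def choose(rn):
    """Map a number rn in [0, 1) to an outcome using cumulative thresholds."""
    if rn < 0.4:        # interval [0.0, 0.4), width 0.40
        return "A"
    elif rn < 0.8:      # interval [0.4, 0.8), width 0.40
        return "B"
    elif rn < 0.95:     # interval [0.8, 0.95), width 0.15
        return "C"
    else:               # interval [0.95, 1.0), width 0.05
        return "D"


# One simulated experiment: random.random() is uniform on [0, 1).
outcome = choose(random.random())
print(outcome)
```

Each outcome is selected with a probability equal to the width of its interval, which is why the thresholds have to be cumulative rather than the individual probabilities.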

John Weeks

WaveMetrics, Inc.

support@wavemetrics.com

October 6, 2014 at 04:07 pm

Using your proposed structure, I have meanwhile managed to calculate the "closeness" as proposed by jjweimer.

For X=12.03 they are (same example as above):

A = 73.39 %

B = 87.61 %

C = 99.00 %

D = 99.00 %

The appropriate probabilities I calculated so far are:

A = 0.204445

B = 0.244038

C = 0.275758

D = 0.275758

The question I have now is: how do I deal with values that have exactly the same probability? The probabilities do sum to 1, but they are so close to each other that I have the feeling that choosing a random number between 0 and 1 would not make the correct decision...

Any ideas?

October 7, 2014 at 03:07 am

Notice that in John's example the if...elseif... chain contains cumulative probabilities. Hence a random number between 0 and 1 can be used to provide a sensible bias in the decision.

Methinks something is not right here - you wanted A and B to have the highest probabilities.

I would suggest something like the following:

1. Calculate the absolute difference of each value from the target X:

D_A = abs (A - X) , and similarly for B, C & D

2. Sum these:

Sum = D_A + D_B + D_C + D_D

3. Calculate a probability based on these differences:

P_A = (1 - D_A / Sum) / (N - 1) , and similarly for B, C & D

where N = number of values (4 in this case)

4. Construct a decision function along the lines of:

    if (rn < P_A)
        (choose A)
    elseif (rn < P_A + P_B)
        (choose B)
    elseif (rn < P_A + P_B + P_C)
        (choose C)
    else
        (choose D)
    endif

For the values you provided, the probabilities are (approximately):

P_A = 0.325

P_B = 0.325

P_C = 0.233

P_D = 0.117
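Steps 1 to 3 can be checked with a short Python sketch (not Igor code). The function name `closeness_probabilities` and the dictionary layout are assumptions made for this illustration; the formula is the one given in step 3.

```python
def closeness_probabilities(target, values):
    """Return {name: P} with P = (1 - D / Sum) / (N - 1), where D = |value - target|."""
    diffs = {name: abs(v - target) for name, v in values.items()}
    total = sum(diffs.values())
    n = len(values)
    if total == 0 or n < 2:
        # The formula divides by Sum and by (N - 1), so it is undefined here.
        raise ValueError("need at least two values and a non-zero sum of differences")
    return {name: (1 - d / total) / (n - 1) for name, d in diffs.items()}


values = {"A": 11.91, "B": 11.91, "C": 10.54, "D": 15.23}
probs = closeness_probabilities(12.03, values)
for name, p in probs.items():
    print(name, round(p, 3))   # A 0.325, B 0.325, C 0.233, D 0.117
```

The probabilities always sum to 1, because the terms (1 - D / Sum) add up to N - 1. One edge case to keep in mind: if every value equals the target, Sum is zero and the formula cannot be evaluated, so the sketch raises an error instead.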

HTH,

Kurt

October 7, 2014 at 03:43 am

But I still wonder whether A and B are really treated equal in this decision making if-construct...?

October 7, 2014 at 04:52 am

Hi Peter,

Perhaps thinking of it like this will help:

The random number rn lies between 0 and 1 and has a uniform distribution. This means that, for example, the probability of 0.2 <= rn < 0.3 is the same as the probability of 0.5 <= rn < 0.6, which is the same as the probability of 0.9 <= rn < 1.0, and so on.

The if...elseif... construct is basically saying

if 0.0 <= rn < 0.325 then do 'A'

if 0.325 <= rn < 0.650 then do 'B'

( and so on for C and D).

In other words, the 'range' of values of rn that gives rise to 'A' has the same width as the 'range' of values that gives rise to 'B'. Given that rn is equally likely to take any value within the 0 to 1 range, the outcomes 'A' and 'B' must have the same probability.
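Written out for a uniform rn on the interval from 0 to 1, the probability of landing in a sub-interval is simply its width, so the two intervals quoted above give identical probabilities:

```latex
\begin{align*}
P(a \le rn < b) &= b - a, \qquad 0 \le a \le b \le 1 \\
P(\text{A}) &= P(0 \le rn < 0.325) = 0.325 - 0 = 0.325 \\
P(\text{B}) &= P(0.325 \le rn < 0.650) = 0.650 - 0.325 = 0.325
\end{align*}
```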

HTH,

Kurt

October 7, 2014 at 05:09 am

I meanwhile calculated the correct probabilities, put them in a wave and sorted them with `Sort`.

    String Substitution
    Variable j
    for(j=0;j<(numpnts(Pool));j+=1)
        if (random < Probabilities_Sorted [j])
            Substitution = Pool_Sorted [j]
            break
        elseif(random < (Probabilities_Sorted [j] + Probabilities_Sorted [j+1]))
            Substitution = Pool_Sorted [j+1]
            break
        elseif(random < (Probabilities_Sorted [j] + Probabilities_Sorted [j+1] + Probabilities_Sorted [j+2]))
            Substitution = Pool_Sorted [j+2]
            break
        else
            Substitution = Pool_Sorted [j+3]
            break	// without this, the next pass (j=1) would index past the end of the waves
        endif
    endfor

The "pool" of possible values is made up of 4 values in this case.

In order to redesign the construct to achieve applicability for any size of pool, I think a Do-Loop must be applied...

Thanks for all your help, was a pleasure!

October 7, 2014 at 09:23 am

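    for(j=0;j<(numpnts(Pool));j+=1)
        // cumulative probability of entries 0..j; sum() takes X values,
        // which equal point numbers when the wave has default scaling
        if (random < sum(Probabilities_Sorted,0,j))
            Substitution = Pool_Sorted [j]
            break
        endif
    endfor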
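The same pool-size-independent selection can be sketched in Python (not Igor code) by building the cumulative sums once and searching them. The function name `pick` and the optional `rn` argument are assumptions made for this illustration; the probabilities are the rounded values quoted earlier in the thread.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pick(pool, probabilities, rn=None):
    """Return the pool entry whose cumulative-probability interval contains rn."""
    if rn is None:
        rn = random.random()                  # uniform on [0, 1)
    cumulative = list(accumulate(probabilities))
    # Index of the first cumulative value that is strictly greater than rn.
    index = bisect_right(cumulative, rn)
    # Guard against rounding: the last cumulative value may fall slightly below 1.
    return pool[min(index, len(pool) - 1)]


pool = ["A", "B", "C", "D"]
probabilities = [0.325, 0.325, 0.233, 0.117]
print(pick(pool, probabilities))
```

Passing `rn` explicitly makes the function easy to test with known inputs. Sorting the probabilities is not required for correctness, since each entry is still chosen with a probability equal to the width of its own interval.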

Thanks again for all the help!

Best regards,

Peter

October 7, 2014 at 10:25 am

How can I circumvent this unwanted anomaly?

October 13, 2014 at 10:27 pm

I may be missing something here, but I can't see the code where you calculated the probabilities.

I have re-checked the algorithm I presented previously and changing the target to X=11.91 (i.e. the same as A and B) I get the following probabilities:

P_A = 0.3333

P_B = 0.3333

P_C = 0.2360

P_D = 0.0974

The question of whether this method for calculating the probabilities is appropriate for your needs is one I cannot answer.

HTH,

Kurt

October 13, 2014 at 11:53 pm