Saad Mushtaq

Greenhorn

Posts: 21

posted 1 year ago

I have a program which reads pairs of integers from a file and stores them in a Point class which I have created. On each line of the file, the first integer is the x coordinate and the second is the y coordinate.

**All valid points have x-coordinates in the range [0, 40] and y-coordinates in the range [1, 20].**

The input file contains data like this:

*I performed some validation checks so that my program ignores invalid or out-of-range data.*

I then have to plot a regression line on top of those points.

**I have to use 2D arrays of chars and would prefer to do it this way instead of using the Point2D class or some other graphical classes of Java.**

**"X"s are used to represent the points, "-"s the regression line segments, and "*"s where a line segment and a point are located at the same spot.**

The formula used for the regression line is this:

Below is the code snippet where I think I need help:

This is how I print the char array, in case it helps solve my problem.

This is my program's output:

**My expected output is this:**

Can anyone help me with the formatting, please? Any hints or pointers that would set me in the right direction would be appreciated. I can provide more explanation if needed.

Campbell Ritchie

Marshal

Posts: 56541

172

posted 1 year ago

Saad Mushtaq wrote:. . . I have to use 2D arrays of chars and would prefer to do it this way instead of using Point2D class of java or some other graphical classes of java.

Apart from the fact that there is no such thing as a 2D array in Java®; what you see are arrays of arrays, which are different ... why do you have to use such a strange technique?

. . .

You are calling a point points and the list of points point? There is nothing like clear naming and that is nothing like clear naming.

Why are you using integer arithmetic in lines 11-12? Why have you got separate variables *n* and `count`? Why are you going on about an array and then using the size() method? Have you got a List<Point> rather than an array? You mean you are only using the array for displaying your results? What does line 22 mean, apart from being confusing enough to make the whole program incomprehensible all by itself?

I looked here and they had a different format for regression. It appears the Σ operator has a precedence higher than − and lower than ×.
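To illustrate the "arrays of arrays" remark, here is a minimal sketch (the class and method names are made up for this example and are not from the original program). It shows that a `char[][]` is an array whose elements are themselves `char[]` objects, so the row count and the column count are read from different places and a row can even be swapped for one of a different length.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the original poster's program.
final class ArrayOfArraysDemo {

    // Builds a grid with the given number of rows and columns, filled with spaces.
    static char[][] blankGrid(int rows, int cols) {
        char[][] grid = new char[rows][cols];
        for (char[] row : grid) {
            Arrays.fill(row, ' ');
        }
        return grid;
    }

    public static void main(String[] args) {
        char[][] graph = blankGrid(21, 42);
        System.out.println(graph.length);     // number of rows
        System.out.println(graph[0].length);  // number of columns in row 0

        // Each row is an independent array object, so it can be replaced
        // by an array of a different length (a "jagged" array).
        graph[0] = new char[5];
        System.out.println(graph[0].length);
    }
}
```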

Piet Souris

Master Rancher

Posts: 2044

75

posted 1 year ago

The formulas are correct. A very serious mistake is that the slope is declared as an integer, so if the slope is, say, 0.99, then this program uses a slope of 0. The same is true of the two means, but there the error is far less serious.
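The truncation can be seen in a small sketch. The numerator and denominator below are made-up numbers, not values from the original program, and the class name is hypothetical. With `int` operands the division discards the fractional part; widening one operand to `double` first keeps it.

```java
// Hypothetical illustration of integer vs. floating-point division.
final class SlopeTruncationDemo {

    // Integer division: the fractional part of the quotient is discarded.
    static int intSlope(int numerator, int denominator) {
        return numerator / denominator;
    }

    // Floating-point division: the numerator is widened to double first.
    static double doubleSlope(int numerator, int denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(intSlope(99, 100));    // prints 0
        System.out.println(doubleSlope(99, 100)); // prints 0.99
    }
}
```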

Further, trying to print the graph as a series of chars is indeed not simple. I think you have to sort the points by decreasing y first, to get the height correct. Not to mention the spaces that have to be inserted. Far easier is to use a simple JPanel, preferably using a dedicated coordinate system.

If you want to check the formulas yourself, with linear regression we are looking for constants a and b such that the vector y = (y1, y2, ... yn) in the series equations a * xi + b = yi (i = 1, ... n) is minimized in length.

We then get a function f(a, b) = sigma(a^2 * xi^2 + 2ab * xi - 2a * xi * yi + b^2 - 2b * yi + yi^2), which can be minimized by setting the partial derivatives with respect to a and b to 0. A bit tedious, but it rehearses things that one has long forgotten. Easier is to start from the equation Ax + b = y, and minimize this, using higher order differentials. But that is not for the faint of heart, although far less tedious to do.
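For anyone following along, the partial derivatives work out roughly as follows. This is a sketch: f(a, b) is read as the sum of squared errors (a * xi + b - yi)^2, all sums run over i = 1, ..., n, and the means written as x-bar and y-bar are notation introduced here.

```latex
\begin{aligned}
f(a,b) &= \sum_i (a x_i + b - y_i)^2 \\
\frac{\partial f}{\partial a} &= 2 \sum_i x_i \,(a x_i + b - y_i) = 0 \\
\frac{\partial f}{\partial b} &= 2 \sum_i (a x_i + b - y_i) = 0
  \;\Longrightarrow\; b = \bar{y} - a\,\bar{x} \\
a &= \frac{\sum_i x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_i x_i^2 - n\,\bar{x}^2}
   = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
\end{aligned}
```

The last line comes from substituting b into the first condition, so a is the slope and b the intercept of the regression line.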

Piet Souris

Master Rancher

Posts: 2044

75

posted 1 year ago

Piet Souris wrote:(...) such that the vector y = (y1, y2, ... yn) in the series equations a * xi + b = yi (i = 1, ... n) is minimized in length.

Can't believe I wrote that...

I meant to write: if we approach our original equations by writing, in vector notation, ax + b = y, then we try to minimize the error vector, that is, minimize (ax + b - y). We then calculate |ax + b - y|^2 and that leads to the long f(a, b) formula. Pfff... anyone still around?
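A minimal sketch of the closed-form result of that minimization, using `double` arithmetic throughout. The class name, method names and sample points are hypothetical, chosen so the answer is easy to check by hand; this is not the original poster's code.

```java
// Hypothetical helper: least-squares fit of y = a*x + b.
final class LeastSquaresDemo {

    // Returns {a, b}: the slope and intercept minimizing sum((a*x + b - y)^2).
    // Assumes at least two points with distinct x values.
    static double[] fit(double[] xs, double[] ys) {
        int n = xs.length;
        double sumX = 0;
        double sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += xs[i];
            sumY += ys[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        double sxy = 0;
        double sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (xs[i] - meanX) * (ys[i] - meanY);
            sxx += (xs[i] - meanX) * (xs[i] - meanX);
        }
        double a = sxy / sxx;
        double b = meanY - a * meanX;
        return new double[] {a, b};
    }

    // Squared length of the error vector (a*x + b - y).
    static double squaredError(double a, double b, double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            double e = a * xs[i] + b - ys[i];
            sum += e * e;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] xs = {0, 1, 2};
        double[] ys = {0, 2, 1};
        double[] ab = fit(xs, ys);
        System.out.println("slope = " + ab[0] + ", intercept = " + ab[1]);
        System.out.println("squared error = " + squaredError(ab[0], ab[1], xs, ys));
    }
}
```

Nudging either the slope or the intercept away from the fitted values should only increase the squared error, which is a quick way to sanity-check the formulas.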

Saad Mushtaq

Greenhorn

Posts: 21

posted 1 year ago

Campbell Ritchie wrote:Apart from the fact that there is no such thing as a 2D array in Java®; what you see are arrays of arrays, which are different ... why do you have to use such a strange technique?

. . .

You are calling a point points and the list of points point? There is nothing like clear naming and that is nothing like clear naming.

Why are you using integer arithmetic in lines 11-12? Why have you got separate variables n and count? Why are you going on about an array and then using the size() method? Have you got a List<Point> rather than an array? You mean you are only using the array for displaying your results? What does line 22 mean, apart from being confusing enough to make the whole program incomprehensible all by itself?

I have changed the int to float. Also, there is a Point2D class in Java. Here is the link in case you're interested: Java Doc

I have created a Point class of my own which stores the x and y coordinates from the text file. Then I have used an ArrayList to store those objects.

I am using the array just to display the output and draw a graph. I know there are several other techniques that I could have used, but I want to go with this one even though it is not elegant.

Campbell Ritchie

Marshal

Posts: 56541

172

posted 1 year ago

Saad Mushtaq wrote:. . . I have changed the int to float.

I think it is a bad idea to use `float`s; if you have to use floating‑point arithmetic, use `double`s.

Saad Mushtaq wrote:Also there is a Point2D class in java.

I think you were right to create your own point class rather than using Point2D.

Saad Mushtaq wrote:. . . i want to go with this one even though this is not elegant.

Are you really telling us you are happy writing not elegant code?
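One way to see why `double` is the safer default: a `float` carries only about 24 bits of precision, so it cannot represent every integer above 2^24 = 16,777,216, whereas a `double` can go much further. A small sketch with a hypothetical class name follows. The numbers are chosen to expose the limit and are far larger than anything in this thread's data, so this illustrates the general advice rather than diagnosing the original program.

```java
// Hypothetical demo of float vs. double precision.
final class FloatVsDoubleDemo {

    // Adding 1 to 2^24 in float arithmetic gives back 2^24,
    // because 16,777,217 is not representable as a float.
    static boolean floatLosesIncrement() {
        float f = 16_777_216f;
        float g = f + 1f;
        return g == f;
    }

    // The same addition in double arithmetic is exact.
    static boolean doubleKeepsIncrement() {
        double d = 16_777_216.0;
        double e = d + 1.0;
        return e == 16_777_217.0;
    }

    public static void main(String[] args) {
        System.out.println(floatLosesIncrement());   // prints true
        System.out.println(doubleKeepsIncrement());  // prints true
    }
}
```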

Saad Mushtaq

Greenhorn

Posts: 21

posted 1 year ago

Campbell Ritchie wrote:Are you really telling us you are happy writing not elegant code

Well, tbh my passion is clean coding, but for this particular problem I have to go with the not-so-elegant code. Okay, I was able to display some output. Here is my code. My program is supposed to show

**"X"s representing the points, "-"s the regression line segments, and "*"s where a line segment and a point are located at the same spot.**

My array is char[][] graph = new char[21][42] because the x-coordinates are in the range [0, 40] and the y-coordinates are in the range [1, 20].

This is a pic of my output.

The problem here is that my dashes stop midway through the graph and also my asterisk is off by one index. The correct output is given in the question. Do you know what I could be doing wrong? If you need more info I can provide it.

Knute Snortum

posted 1 year ago

I think we're going to need the full code to help debug this.

A note about "elegant" code. Poor formatting is a bug. It may not cause a compile error but it can hide errors that could be more easily spotted. Also, if you want others to help debug your code, formatting really is a must. This style guide will help you with one kind of formatting.

Saad Mushtaq

Greenhorn

Posts: 21

posted 1 year ago

Knute Snortum wrote:I think we're going to need the full code to help debug this.

A note about "elegant" code. Poor formatting is a bug. It may not cause a compile error but it can hide errors that could be more easily spotted. Also, if you want others to help debug your code, formatting really is a must. This style guide will help you with one kind of formatting.

Here is my full code.

Also, here is the text that I am reading in.

20 10

0 1

40 20

13 17

13 12

10 ?

the x coordinates represent time

15 0

10 20
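As a rough sketch of the validation described earlier in the thread (x in [0, 40], y in [1, 20], everything else ignored), the lines above could be filtered like this. The class and method names are hypothetical, and this is not the original poster's code, which is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of line-by-line validation.
final class PointParserDemo {

    // Returns {x, y} for a valid line, or null if the line should be skipped.
    static int[] parseLine(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            return null; // wrong number of fields, e.g. a line of prose
        }
        try {
            int x = Integer.parseInt(parts[0]);
            int y = Integer.parseInt(parts[1]);
            boolean inRange = x >= 0 && x <= 40 && y >= 1 && y <= 20;
            return inRange ? new int[] {x, y} : null;
        } catch (NumberFormatException e) {
            return null; // a field is not an integer, e.g. "10 ?"
        }
    }

    // Keeps only the valid points, in input order.
    static List<int[]> parseAll(List<String> lines) {
        List<int[]> points = new ArrayList<>();
        for (String line : lines) {
            int[] p = parseLine(line);
            if (p != null) {
                points.add(p);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "20 10", "0 1", "40 20", "13 17", "13 12",
                "10 ?", "the x coordinates represent time", "15 0", "10 20");
        System.out.println(parseAll(lines).size() + " valid points");
    }
}
```

With the sample data above, the lines "10 ?", "the x coordinates represent time" and "15 0" are the ones that get skipped.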

Saad Mushtaq

Greenhorn

Posts: 21

posted 1 year ago

I think I have a problem around lines 64 to 66; that's why my output is not similar to the correct output. It's been almost 2 days now and I have been changing things. I even removed the Point class which I created, and now I am storing the x and y values directly in arrays. So basically I have 3 arrays.

Saad Mushtaq

Greenhorn

Posts: 21

posted 1 year ago

Knute Snortum wrote:I think we're going to need the full code to help debug this.

A note about "elegant" code. Poor formatting is a bug. It may not cause a compile error but it can hide errors that could be more easily spotted. Also, if you want others to help debug your code, formatting really is a must. This style guide will help you with one kind of formatting.

Okay, I was able to get this output:

I have narrowed down my problem to the for loop that is putting dashes and asterisks on top of the plot. Can anyone please help me now? I have come so close to the end of this problem. I just need a little bit of help.

Piet Souris

Master Rancher

Posts: 2044

75

posted 1 year ago

(Edit: had I seen Knute's reply I wouldn't have written this reply. Anyway...)

I had a look this evening, and the solution is as simple as it is impossible.

First of all, in this part of the code:

you are only calculating points for the length of the first dimension, which is 21 (see the declaration of your graph).

That's why you only see 21 dashes.

So, to see them all you would plot the regression points for all the x values. These x-values range from 0 to 41, so the above loop should go from 0 to 41.
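The fix described here could look roughly like this, assuming the original `char[21][42]` layout with y as the first index. The slope and intercept values and all names are placeholders, since the original loop is not shown in the thread.

```java
import java.util.Arrays;

// Hypothetical sketch: one dash per column of a graph[y][x] array.
final class DashLoopDemo {

    // Blank 21 x 42 graph, matching the dimensions quoted in the thread.
    static char[][] blankGraph() {
        char[][] graph = new char[21][42];
        for (char[] row : graph) {
            Arrays.fill(row, ' ');
        }
        return graph;
    }

    // Puts a dash in every column whose fitted y value lies inside the array.
    // Returns the number of dashes written.
    static int plotLine(char[][] graph, double slope, double intercept) {
        int dashes = 0;
        int columns = graph[0].length; // 42 columns, not graph.length (21 rows)
        for (int x = 0; x < columns; x++) {
            int y = (int) Math.round(slope * x + intercept);
            if (y >= 0 && y < graph.length) { // skip rows outside the array
                graph[y][x] = '-';
                dashes++;
            }
        }
        return dashes;
    }

    public static void main(String[] args) {
        char[][] graph = blankGraph();
        int dashes = plotLine(graph, 0.25, 5.0); // placeholder slope and intercept
        System.out.println(dashes + " dashes written");
    }
}
```

Looping to `graph.length` instead of `graph[0].length` would stop after 21 columns, which matches the symptom of the dashes ending midway through the graph.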

Furthermore, you use the y-coordinates as your first dimension, which is very confusing.

Then, in your graph array, you use the x and y values of your observations (or from the calculated regression) as the coordinates. The value itself is then set to either a space, a dash or an 'X'.

This approach has some serious drawbacks. First of all, you set the dimensions of this graph to 'graph[21][41]' or thereabouts, since these two values are the maximum values of the observed x and y. But these are hard-coded, so you have to inspect your observation data first to see what these maximums should be. Why not calculate these maximums on the fly, when you read in the observations? But even then you have no guarantee that the calculated expected y-values of your regression will not cause an 'index out of bounds' exception.
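A minimal sketch of that idea, with hypothetical names: find the maxima from the observations, size the grid from them with a little headroom, and check every index before writing. For brevity it scans an array of already-read points; the same `Math.max` updates could just as well run inside the reading loop.

```java
import java.util.Arrays;

// Hypothetical sketch; names are not from the original program.
final class GridBoundsDemo {

    // Returns {maxX, maxY} over all points, where each point is {x, y}.
    // Assumes all coordinates are non-negative.
    static int[] maxima(int[][] points) {
        int maxX = 0;
        int maxY = 0;
        for (int[] p : points) {
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new int[] {maxX, maxY};
    }

    // Allocates a blank grid indexed as grid[x][y], with extraY spare cells
    // above maxY so fitted values slightly beyond the data still fit.
    static char[][] allocate(int maxX, int maxY, int extraY) {
        char[][] grid = new char[maxX + 1][maxY + 1 + extraY];
        for (char[] column : grid) {
            Arrays.fill(column, ' ');
        }
        return grid;
    }

    // True if (x, y) can safely be used as an index into the grid.
    static boolean inBounds(char[][] grid, int x, int y) {
        return x >= 0 && x < grid.length && y >= 0 && y < grid[x].length;
    }

    public static void main(String[] args) {
        int[][] points = {{20, 10}, {0, 1}, {40, 20}, {13, 17}, {13, 12}, {10, 20}};
        int[] max = maxima(points);
        char[][] grid = allocate(max[0], max[1], 5);
        System.out.println(grid.length + " x " + grid[0].length);
    }
}
```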

The other big drawback (see also my comment at DreamInCode) is that you cannot use any observation for which the x- or y-value is negative. Currently you skip these, but in general that could mess up your regression.

Well, I tried to repair things a little, but I'm afraid that the overall structure is still as messy and clumsy as the original code; I can't put it any more kindly.

Advice? I already advised using Excel or R for this work, which is much easier than Java. Another possibility, after reading about Swing, would be to use a JPanel with a dedicated coordinate system. That involves knowing a little linear algebra, but the maths involved is very much easier than the maths behind linear regression. The big advantage is that you can then draw the observations directly, without further ado.

Anyway, here is the code that I revised a little. Still a mess as said. I changed the graph so that the first coordinate represents the x-value, and the second the y-value.

That means that in the plotting routine, the outer loop steers the y-value, from maxY to 0, and the inner loop steers the x-values, from 0 to maxX. Both maxima are determined when I read in the observations. I gave the y some additional space, so that the regression y-values could go beyond the y-max without an immediate index-out-of-bounds exception. Pfff...
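The loop order described above can be sketched as follows. This is not the revised code from this post (which is not reproduced in the thread) but a small stand-in with hypothetical names, a tiny 5 x 3 grid and made-up points, using 'X' for a point, '-' for the line and '*' where both fall on the same cell.

```java
import java.util.Arrays;

// Hypothetical sketch of a grid[x][y] plot printed from the top row down.
final class CharPlotDemo {

    static char[][] blank(int width, int height) {
        char[][] grid = new char[width][height];
        for (char[] column : grid) {
            Arrays.fill(column, ' ');
        }
        return grid;
    }

    // Writes symbol at (x, y); a cell already holding a different symbol becomes '*'.
    // Coordinates outside the grid are silently ignored.
    static void mark(char[][] grid, int x, int y, char symbol) {
        if (x < 0 || x >= grid.length || y < 0 || y >= grid[x].length) {
            return;
        }
        char current = grid[x][y];
        if (current == ' ') {
            grid[x][y] = symbol;
        } else if (current != symbol) {
            grid[x][y] = '*';
        }
    }

    // One line cell per x value, at the rounded fitted y.
    static void drawLine(char[][] grid, double slope, double intercept) {
        for (int x = 0; x < grid.length; x++) {
            int y = (int) Math.round(slope * x + intercept);
            mark(grid, x, y, '-');
        }
    }

    // Outer loop over y from the top down, inner loop over x from left to right.
    static String render(char[][] grid) {
        StringBuilder sb = new StringBuilder();
        int height = grid[0].length;
        for (int y = height - 1; y >= 0; y--) {
            for (int x = 0; x < grid.length; x++) {
                sb.append(grid[x][y]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[][] grid = blank(5, 3);
        int[][] points = {{0, 2}, {1, 1}, {3, 2}}; // made-up observations
        for (int[] p : points) {
            mark(grid, p[0], p[1], 'X');
        }
        drawLine(grid, 0.5, 0.25); // placeholder slope and intercept
        System.out.print(render(grid));
    }
}
```

Because `mark` handles the overlap in one place, it does not matter whether the points or the line are drawn first.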

The output I got was:
