2024-01-18, 07:00 PM
(This post was last modified: 2024-01-18, 07:11 PM by visibleonbush1.)
Hi, I'm working on optimizing a piece of code. It performs well for smaller values of 'n' (e.g., 10,000), but when 'n' is significantly larger (e.g., 100,000), it seems to skip a considerable number of iterations. Specifically, with 'n' set to 100,000 it executes only 1,410,065,408 iterations instead of the expected 10,000,000,000. I'm asking here because with a smaller 'n' (10k) it works perfectly fine. Am I missing anything? Thank you in advance.
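A note on the numbers, offered as a hypothesis rather than a diagnosis: 1,410,065,408 is exactly 10,000,000,000 modulo 2^32, which is the value a 32-bit counter would hold after wrapping around. The sketch below only checks that arithmetic. wrapped_iteration_count is a hypothetical helper, not part of the program in this post, and it does not show what the OpenMP runtime actually does with the collapsed loop.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original program): returns n*n
 * truncated to 32 bits, i.e. the value a 32-bit counter would hold
 * after wrapping around. The product is computed in 64 bits first,
 * so the multiplication itself cannot overflow for any 32-bit n. */
uint32_t wrapped_iteration_count(uint32_t n)
{
    uint64_t full = (uint64_t)n * (uint64_t)n;
    return (uint32_t)full;   /* keeps the low 32 bits: full mod 2^32 */
}
```

Calling wrapped_iteration_count(100000) gives 1410065408, the count reported above, while wrapped_iteration_count(10000) gives 100000000 with no wrap, which would be consistent with the smaller case behaving correctly.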
Code:
int compute_forces( void )
{
    int n_intersections = 0;
    int nThread = omp_get_num_threads();
    int *nit = calloc(8, sizeof(int));
    long long asdf = 0;
    // Parallelized the outer function, because the internal would have a lot more overhead
    #pragma omp parallel for collapse(2) default(none) shared(ncircles, circles, EPSILON, K, nit) schedule(static, ncircles) reduction(+:n_intersections) reduction(+:asdf)
    for (int i=0; i<ncircles; i++) {
        for (int j=0; j<ncircles; j++) {
            //nit[omp_get_thread_num()]++;
            asdf++;
            if (j <= i) {
                continue;
            }
            const float deltax = circles[j].x - circles[i].x;
            const float deltay = circles[j].y - circles[i].y;
            const float dist = hypotf(deltax, deltay);
            const float Rsum = circles[i].r + circles[j].r;
            if (dist < Rsum - EPSILON) {
                n_intersections++;
                const float overlap = Rsum - dist;
                assert(overlap > 0.0);
                const float overlap_tmp = overlap / (dist + EPSILON);
                const float overlap_x = overlap_tmp * deltax;
                const float overlap_y = overlap_tmp * deltay;
                #pragma omp atomic
                circles[i].dx -= overlap_x;
                #pragma omp atomic
                circles[i].dy -= overlap_y;
                #pragma omp atomic
                circles[j].dx += overlap_x;
                #pragma omp atomic
                circles[j].dy += overlap_y;
            }
        }
    }