[Question] Parallel for missing iterations?

visibleonbush1 · (This post was last modified: 2024-01-18, 07:11 PM by visibleonbush1.)

Hi, I'm working on optimizing a piece of code. It performs well for smaller values of 'n' (e.g., 10,000), but when 'n' is significantly larger (e.g., 100,000), it seems to skip a considerable number of iterations. Specifically, with 'n' set to 100,000 it executes only 1,410,065,408 iterations out of the expected 10,000,000,000. I'm asking here because when n is not so big (10k) it works perfectly fine. Am I missing anything? Thank you in advance.

Code:
int compute_forces( void )
{
    int n_intersections = 0;
    int nThread = omp_get_num_threads();
    int *nit = calloc(8, sizeof(int));
    long long asdf = 0;

    // Parallelized the outer function, because the internal would have a lot more overhead
    #pragma omp parallel for collapse(2) default(none) shared(ncircles, circles, EPSILON, K, nit) schedule(static, ncircles) reduction(+:n_intersections) reduction(+:asdf)
    for (int i=0; i<ncircles; i++) {
        for (int j=0; j<ncircles; j++) {
            //nit[omp_get_thread_num()]++;
            asdf++;
            if (j <= i) {
                continue;
            }
            const float deltax = circles[j].x - circles[i].x;
            const float deltay = circles[j].y - circles[i].y;
            const float dist = hypotf(deltax, deltay);
            const float Rsum = circles[i].r + circles[j].r;
            if (dist < Rsum - EPSILON) {
                n_intersections++;
                const float overlap = Rsum - dist;
                assert(overlap > 0.0);
                const float overlap_tmp = overlap / (dist + EPSILON);
                const float overlap_x = overlap_tmp * deltax;
                const float overlap_y = overlap_tmp * deltay;
                #pragma omp atomic
                circles[i].dx -= overlap_x;
                #pragma omp atomic
                circles[i].dy -= overlap_y;
                #pragma omp atomic
                circles[j].dx += overlap_x;
                #pragma omp atomic
                circles[j].dy += overlap_y;
            }
        }
    }

Threshold · 2024-01-19, 03:52 AM

Would this line be causing any grief?

Code:
if (j <= i) {
    continue;
}

visibleonbush1 · (This post was last modified: 2024-01-20, 01:13 AM by visibleonbush1.)

(2024-01-19, 03:52 AM)Threshold Wrote: Would this line be causing any grief?

Code:
if (j <= i) { continue; }

No, because I put the counter before the if clause (it's the variable "asdf"; yeah, I know, not a good name).
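A possible explanation, not confirmed by anyone in this thread: the reported count is exactly 10,000,000,000 modulo 2^32 (10,000,000,000 - 2 × 4,294,967,296 = 1,410,065,408). Since asdf is a long long and cannot wrap at that value, this would be consistent with the collapsed iteration space of the two int loops being computed in a 32-bit type. The OpenMP specification leaves the integer type used for the collapsed iteration count implementation defined, so behaviour may differ between compilers. The sketch below checks the arithmetic and shows one thing to try, 64-bit loop indices. The function names are made up for illustration.

```c
#include <stdint.h>

/* Total iterations of an n-by-n loop nest, computed in 64 bits. */
uint64_t total_iterations(uint64_t n)
{
    return n * n;
}

/* The same total as it would look if it were kept in a 32-bit
   unsigned counter, i.e. the value modulo 2^32. */
uint32_t wrapped_to_32_bits(uint64_t total)
{
    return (uint32_t)total;
}

/* Counts the iterations of a collapsed loop nest using 64-bit indices,
   so the combined iteration count cannot overflow an int.
   The pragma is ignored (serial run) if OpenMP is not enabled. */
long long count_collapsed(long long n)
{
    long long count = 0;
    #pragma omp parallel for collapse(2) reduction(+:count)
    for (long long i = 0; i < n; i++) {
        for (long long j = 0; j < n; j++) {
            count++;
        }
    }
    return count;
}
```

With n = 100000, wrapped_to_32_bits(total_iterations(n)) gives 1410065408, the figure from the first post. Dropping collapse(2) and parallelizing only the outer loop would be another way to keep each loop's iteration count within int range.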

johndavidd8888 · (This post was last modified: 2024-05-02, 08:26 AM by johndavidd8888.)

(2024-01-18, 07:00 PM)visibleonbush1 Wrote: Hi, I'm working on optimizing a piece of code. It performs well for smaller values of 'n' (e.g., 10,000), but when 'n' is significantly larger (e.g., 100,000), it seems to skip a considerable number of iterations. Specifically, with 'n' set to 100,000 it executes only 1,410,065,408 iterations out of the expected 10,000,000,000. I'm asking here because when n is not so big (10k) it works perfectly fine. Am I missing anything? Thank you in advance.


It is worth evaluating the performance impact of the atomic directives. If they turn out to be a bottleneck, consider alternatives such as locks or other ways of protecting the shared data while keeping the overhead low.
