open.mp forum
[Question] Parallel for missing iterations? - Printable Version




Parallel for missing iterations? - visibleonbush1 - 2024-01-18

Hi, I'm working on optimizing a piece of code. It performs well for smaller values of 'n' (e.g., 10,000), but when 'n' is significantly larger (e.g., 100,000), it seems to skip a considerable number of iterations. Specifically, with 'n' set to 100,000 it only executes 1,410,065,408 iterations out of the expected 10,000,000,000. I'm asking here because with a smaller n (10k) it works perfectly fine. Am I missing anything? Thank you in advance.
Code:
int compute_forces( void )
{
    int n_intersections = 0;
    int nThread = omp_get_num_threads();
    int *nit = calloc(8, sizeof(int));
    long long asdf = 0;
    // Parallelize the outer loop, because parallelizing the inner one would have a lot more overhead
    #pragma omp parallel for collapse(2) default(none) shared(ncircles, circles, EPSILON, K, nit) schedule(static, ncircles) reduction(+:n_intersections) reduction(+:asdf)
    for (int i=0; i<ncircles; i++) {
        for (int j=0; j<ncircles; j++) {
            //nit[omp_get_thread_num()]++;
            asdf++;
            if (j <= i) {
                continue;
            }
           
            const float deltax = circles[j].x - circles[i].x;
            const float deltay = circles[j].y - circles[i].y;
            const float dist = hypotf(deltax, deltay);
            const float Rsum = circles[i].r + circles[j].r;

            if (dist < Rsum - EPSILON) {
                n_intersections++;
                const float overlap = Rsum - dist;
                assert(overlap > 0.0);

                const float overlap_tmp = overlap / (dist + EPSILON);
               
                const float overlap_x = overlap_tmp * deltax;
                const float overlap_y = overlap_tmp * deltay;

                #pragma omp atomic
                circles[i].dx -= overlap_x;
               
                #pragma omp atomic
                circles[i].dy -= overlap_y;

                #pragma omp atomic
                circles[j].dx += overlap_x;

                #pragma omp atomic
                circles[j].dy += overlap_y;
            }
        }
    }



RE: Parallel for missing iterations? - Threshold - 2024-01-19

Would this line be causing any grief?
Code:
if (j <= i) {
    continue;
}
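
The check quoted above only skips the force computation for pairs with j <= i; those iterations still run. One way to avoid visiting them at all is to start the inner loop at i + 1. The sketch below is a hypothetical counting helper (the name `count_pairs_triangular` and the `schedule(dynamic, 64)` choice are assumptions, not taken from the posted code). It drops `collapse(2)` because the inner bound now depends on `i`, which older OpenMP versions do not allow in a collapsed nest.

```c
/* Hypothetical helper: visits each unordered pair (i, j) with i < j once
   and returns how many pairs were visited. Without OpenMP enabled the
   pragma is ignored and the loop simply runs serially. */
long long count_pairs_triangular(long long n)
{
    long long pairs = 0;

    /* Only the outer loop is parallelized. Rows get shorter as i grows,
       so a dynamic schedule is used here (an assumption) to balance work. */
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:pairs)
    for (long long i = 0; i < n; i++) {
        for (long long j = i + 1; j < n; j++) {
            pairs++;  /* the force computation for pair (i, j) would go here */
        }
    }
    return pairs;
}
```

For n circles this visits n*(n-1)/2 pairs instead of n*n iterations, so roughly half of the loop iterations disappear along with the `continue`.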



RE: Parallel for missing iterations? - visibleonbush1 - 2024-01-20

(2024-01-19, 03:52 AM)Threshold Wrote: Would this line be causing any grief?
Code:
if (j <= i) {
    continue;
}

No, because I put the counter before the if clause (it's the variable "asdf"; yeah, I know, not a good name).
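
For what it's worth, the reported count equals 10,000,000,000 modulo 2^32. That would be consistent with a 32-bit truncation somewhere, for example in the collapsed iteration count (the OpenMP specification leaves the type used to compute it implementation defined) or in how the counter is printed. The thread does not confirm which, so treat this as an assumption. The sketch below checks the arithmetic and shows the same counting loop with 64-bit loop variables; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns n*n with only the low 32 bits kept, i.e. n*n modulo 2^32. */
uint32_t low32_of_square(uint64_t n)
{
    return (uint32_t)(n * n);
}

/* Same counting pattern as the posted loop, but with 64-bit loop
   variables and a 64-bit counter. Assumption: with wider loop variables
   the implementation also computes the collapsed count in a wider type. */
long long count_iterations(long long n)
{
    long long count = 0;

    #pragma omp parallel for collapse(2) reduction(+:count)
    for (long long i = 0; i < n; i++) {
        for (long long j = 0; j < n; j++) {
            count++;
        }
    }
    return count;
}
```

`low32_of_square(100000)` gives 1410065408, the number reported in the first post, while `low32_of_square(10000)` gives 100000000, which is why n = 10,000 looks fine. When printing a `long long` counter, `%lld` is the matching `printf` format.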


RE: Parallel for missing iterations? - johndavidd8888 - 2024-05-02

(2024-01-18, 07:00 PM)visibleonbush1 Wrote: Hi, I'm working on optimizing a piece of code. It performs well for smaller values of 'n' (e.g., 10,000), but when 'n' is significantly larger (e.g., 100,000), it seems to skip a considerable number of iterations. Specifically, with 'n' set to 100,000 it only executes 1,410,065,408 iterations out of the expected 10,000,000,000. I'm asking here because with a smaller n (10k) it works perfectly fine. Am I missing anything? Thank you in advance.
Evaluate the performance impact of the atomic directives. If they turn out to be a bottleneck, consider alternative ways of protecting the shared data, such as locks, while keeping the synchronization overhead low.
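
As one possible alternative to the atomics (not something proposed in the thread), each thread can accumulate into its own private copy of the output array, which OpenMP then sums at the end via an array-section reduction (available since OpenMP 4.5). The sketch below uses a simplified stand-in for the force update: the function name, the separate `x` and `acc` arrays, and the toy contribution `x[j] - x[i]` are all assumptions made for illustration.

```c
/* Hypothetical sketch: for every pair i < j, subtract a contribution
   from acc[i] and add it to acc[j], mirroring the four atomic updates
   in the posted code. Each thread works on a private copy of acc, so
   no atomics are needed; the copies are summed into acc at the end. */
void pairwise_accumulate(const float *x, float *acc, int n)
{
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:acc[:n])
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            const float f = x[j] - x[i];  /* toy stand-in for the overlap term */
            acc[i] -= f;
            acc[j] += f;
        }
    }
}
```

The trade-off is memory: every thread holds its own copy of the array, so for very large n this may cost more than the atomics save. Measuring both variants is the only reliable way to tell.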