[Question] Parallel for missing iterations?
#1
Rainbow 
Hi, I'm working on optimizing a piece of code. It works correctly for smaller values of 'n' (e.g., 10,000), but when 'n' is significantly larger (e.g., 100,000), it seems to skip a considerable number of iterations. Specifically, with 'n' set to 100,000 it only executes 1,410,065,408 iterations out of the expected 10,000,000,000. I'm asking here because with a smaller n (10k) it works perfectly fine. Am I missing anything? Thank you in advance.
Code:
int compute_forces( void )
{
    int n_intersections = 0;
    int nThread = omp_get_num_threads();
    int *nit = calloc(8, sizeof(int));
    long long asdf = 0;
    // Parallelize the outer loop, because parallelizing the inner loop would add a lot more overhead
    #pragma omp parallel for collapse(2) default(none) shared(ncircles, circles, EPSILON, K, nit) schedule(static, ncircles) reduction(+:n_intersections) reduction(+:asdf)
    for (int i=0; i<ncircles; i++) {
        for (int j=0; j<ncircles; j++) {
            //nit[omp_get_thread_num()]++;
            asdf++;
            if (j <= i) {
                continue;
            }
           
            const float deltax = circles[j].x - circles[i].x;
            const float deltay = circles[j].y - circles[i].y;
            const float dist = hypotf(deltax, deltay);
            const float Rsum = circles[i].r + circles[j].r;

            if (dist < Rsum - EPSILON) {
                n_intersections++;
                const float overlap = Rsum - dist;
                assert(overlap > 0.0);

                const float overlap_tmp = overlap / (dist + EPSILON);
               
                const float overlap_x = overlap_tmp * deltax;
                const float overlap_y = overlap_tmp * deltay;

                #pragma omp atomic
                circles[i].dx -= overlap_x;
               
                #pragma omp atomic
                circles[i].dy -= overlap_y;

                #pragma omp atomic
                circles[j].dx += overlap_x;

                #pragma omp atomic
                circles[j].dy += overlap_y;
            }
        }
    }
#2
Would this line be causing any grief?
Code:
if (j <= i) {
    continue;
}
#3
(2024-01-19, 03:52 AM)Threshold Wrote: Would this line be causing any grief?
Code:
if (j <= i) {
    continue;
}

No, because I put the counter before the if clause (it's the variable "asdf"; yeah, I know, not a good name).
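
For what it's worth, the reported count matches a 32-bit wraparound: 100,000 × 100,000 = 10,000,000,000, and 10,000,000,000 mod 2^32 = 1,410,065,408, which is exactly the number of iterations observed. The OpenMP specification leaves the integer type used to count the iterations of a collapsed loop implementation-defined, so with `int` loop variables and `collapse(2)` the combined count may well be computed in 32 bits. Below is a minimal sketch of that arithmetic, plus a possible workaround that uses 64-bit loop variables. The helper names are hypothetical and not part of the original code, and whether the workaround takes effect depends on the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size of a collapsed n-by-n iteration space,
   computed in 64 bits. */
static uint64_t collapsed_iterations_64(uint32_t n)
{
    return (uint64_t)n * (uint64_t)n;
}

/* The same product truncated to 32 bits, i.e. reduced modulo 2^32.
   This models a collapsed iteration count that is kept in a 32-bit type. */
static uint32_t collapsed_iterations_32(uint32_t n)
{
    return (uint32_t)((uint64_t)n * (uint64_t)n);
}

/* Assumed workaround: declare both loop variables as long long so that the
   collapsed iteration space is counted in a 64-bit type as well.
   Compiles with or without -fopenmp; without it the pragma is ignored and
   the loops simply run serially. */
static long long count_iterations(long long n)
{
    long long count = 0;

    #pragma omp parallel for collapse(2) reduction(+:count)
    for (long long i = 0; i < n; i++) {
        for (long long j = 0; j < n; j++) {
            count++;
        }
    }
    return count;
}
```

With n = 10,000 the product is 100,000,000, which still fits in 32 bits, so this would also explain why the smaller case behaves correctly.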
#4
(2024-01-18, 07:00 PM)visibleonbush1 Wrote: Hi, I'm working on optimizing a piece of code. It works correctly for smaller values of 'n' (e.g., 10,000), but when 'n' is significantly larger (e.g., 100,000), it seems to skip a considerable number of iterations. Specifically, with 'n' set to 100,000 it only executes 1,410,065,408 iterations out of the expected 10,000,000,000. I'm asking here because with a smaller n (10k) it works perfectly fine. Am I missing anything? Thank you in advance.

It is also worth evaluating the performance impact of the atomic directives. If they turn out to be a bottleneck, consider alternatives such as locks or per-thread private copies of the shared data that are merged after the loop, so the shared data stays protected while synchronization costs stay low.
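
One way to take synchronization out of the inner loop is to let each thread accumulate into its own buffer and merge the buffers once at the end. The sketch below illustrates only that pattern: `accumulate_pairs` and the constant weight `w` are hypothetical stand-ins for the original `overlap_x` / `overlap_y` updates, not the poster's actual force computation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the force update in the original loop: every
   pair (i, j) with j > i subtracts w from acc[i] and adds w to acc[j]. */
static void accumulate_pairs(float *acc, int n, float w)
{
    if (n <= 0) {
        return;
    }

    #pragma omp parallel
    {
        /* Private, zero-initialised buffer: the hot loop needs no atomics. */
        float *local = calloc((size_t)n, sizeof *local);
        assert(local != NULL);

        /* Starting j at i + 1 visits each unordered pair exactly once.
           The chunk size 64 is an arbitrary choice for this sketch. */
        #pragma omp for schedule(dynamic, 64)
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                local[i] -= w;
                local[j] += w;
            }
        }

        /* Merge step: one thread at a time adds its buffer to the shared array. */
        #pragma omp critical
        {
            for (int k = 0; k < n; k++) {
                acc[k] += local[k];
            }
        }

        free(local);
    }
}
```

The trade-off is memory: each thread holds n extra floats, and the merge costs n additions per thread, which is usually small next to the n²/2 pair evaluations. Parallelizing only the outer loop this way also avoids `collapse(2)`, so the iteration count stays at n.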