I know I must be doing something really dumb, but I have not been able to figure out what. I'm basically just doing the simple reduction example from the TBB book, but using the IPP sum routine (the sum of all pixels in an Ipp16u plane). It sounds simple enough, but it looks like join() is not being called enough times. Do I ever need to call join() explicitly, or does the system always call it?
    class Sum
    {
    public: // Methods
        Sum(Img *img) : m_img(img), m_sum(0) {}

        void operator () (const tbb::blocked_range<int> &range)
        {
            IppiSize sz;
            Ipp16u *pSrc = (Ipp16u*)m_img->getPixel(0, range.begin(), 0);
            I32 step = m_img->getStep(0);
            sz.width = m_img->getWidth(0);
            sz.height = range.size();
            if (ippStsNoErr != ippiSum_16u_C1R(pSrc, step, sz, &m_sum))
                throw std::runtime_error("ippiSum_16u_C1R failed!\n");
            printf("Sum for %d %d = %0.1f\n", range.begin(), range.end(), m_sum);
        }

        Sum(Sum &x, tbb::split) : m_img(x.m_img), m_sum(0) {}

        void join(const Sum &y)
        {
            printf("%0.1f = %0.1f + %0.1f\n", m_sum + y.m_sum, m_sum, y.m_sum);
            m_sum += y.m_sum;
        }

        F64 getSum() { return m_sum; }

    private: // Attributes
        Img *m_img;
        F64 m_sum;
    };
If I call operator() directly, it works fine. As soon as I put it inside a parallel_reduce(), I get the wrong (smaller) answer. Judging by the diagnostic prints in my code, all sub-regions are computed correctly, but not all of them end up in join() calls.
Peter
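
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: parallel_reduce calls join() itself, so there is no need to call it explicitly. It only splits a body when work is actually handed to another thread, which means the same body object can be given several sub-ranges in a row with no split and no join in between. operator() therefore has to add to the running total. In the code above, ippiSum_16u_C1R writes straight into m_sum, so each call would replace the previous partial sum, which would explain both the missing join() calls and the smaller result. The example below reproduces that call pattern without TBB or IPP. RowRange, SplitTag, RowSum and simulateReduce are hypothetical stand-ins for tbb::blocked_range, tbb::split, the Sum class and the scheduler, and a plain loop stands in for the IPP call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for tbb::blocked_range<int> and tbb::split.
struct RowRange { int begin; int end; };
struct SplitTag {};

// Reduction body with the same shape as the Sum class in the post.
class RowSum {
public:
    explicit RowSum(const std::vector<double>& rows) : m_rows(&rows), m_sum(0.0) {}

    // Splitting constructor: the new body starts from zero.
    RowSum(RowSum& other, SplitTag) : m_rows(other.m_rows), m_sum(0.0) {}

    void operator()(const RowRange& r) {
        assert(r.begin >= 0 && r.end <= static_cast<int>(m_rows->size()));
        // Compute into a local first (stand-in for the IPP call's output)...
        double partial = 0.0;
        for (int i = r.begin; i < r.end; ++i) partial += (*m_rows)[i];
        // ...then accumulate, because this body may already hold a
        // partial sum from an earlier sub-range.
        m_sum += partial;
    }

    void join(const RowSum& rhs) { m_sum += rhs.m_sum; }

    double sum() const { return m_sum; }

private:
    const std::vector<double>* m_rows;
    double m_sum;
};

// Hypothetical driver imitating one schedule a reducer may legally pick:
// the first body handles two sub-ranges back to back, then a split body
// handles the third, and the two are joined once.
double simulateReduce(const std::vector<double>& rows) {
    const int n = static_cast<int>(rows.size());
    const int a = n / 3;
    const int b = 2 * n / 3;

    RowSum left(rows);
    left(RowRange{0, a});
    left(RowRange{a, b});          // same body reused: no split, no join

    RowSum right(left, SplitTag{});
    right(RowRange{b, n});

    left.join(right);              // only one join for three sub-ranges
    return left.sum();
}
```

With rows {1, 2, 3, 4, 5, 6}, simulateReduce returns 21. If operator() assigned to m_sum instead of adding to it, the same schedule would give 18, because the second call would discard the first partial sum. In the original class the equivalent change would be to pass a local F64 to ippiSum_16u_C1R and then add it to m_sum.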