Thread performance

Post by Gregoire » 03 Dec 2008, 18:43

Hi,

I am testing Poco::Thread and getting strange results (compiled with MinGW, GCC 4.3.1, on Windows XP).

I measure execution times in three tests.

Test 1:
I lock and unlock a Mutex 1.0e8 times.
Result: Poco has the best result.


Test 2:
I start and join the same thread 1.0e8 times.
Result: Poco is equivalent to the best result.


Test 3:
I have an instance of an object whose method (protected by a lock) is called 1.0e8 times by N threads.

For small N (< 10) Poco has the best performance, but as N increases its execution time grows rapidly, whereas the other libraries stay roughly constant.

Do you know where this comes from?
Gregoire
 
Posts: 7
Joined: 02 Dec 2008, 19:59

Re: Thread performance

Post by alex » 03 Dec 2008, 20:36

> Do you know where it comes from ?

Someone may figure it out if you provide code.
alex
 
Posts: 1113
Joined: 11 Jul 2006, 16:27
Location: United_States

Re: Thread performance

Post by guenter » 03 Dec 2008, 22:34

It may have to do with our choice of CRITICAL_SECTION for implementing a Mutex. Just a guess, though. Do the other libraries use a Win32 mutex for their equivalents of Poco::Mutex/FastMutex?
guenter
 
Posts: 1117
Joined: 11 Jul 2006, 16:27
Location: Austria

Re: Thread performance

Post by Gregoire » 04 Dec 2008, 10:51

Here is the code:

Code:

#include <iostream>

#include "Poco/Mutex.h"
#include "Poco/Thread.h"
#include "Poco/Runnable.h"

#define NB_THREADS 50

Poco::Mutex mutex;
int iMaxCalls = 1e8;
int iCalls = 0;
int iID = 0;

class MyRunnable: public Poco::Runnable
{
    int _iID;
public:
    MyRunnable():_iID(iID++) {}
    void run()
    {
        for (;;)
        {
            Poco::ScopedLock<Poco::Mutex> lock(mutex);

            ++iCalls;

            if (iCalls >= iMaxCalls)
            {
                if (iCalls == iMaxCalls)
                    std::cout << "winner is " << _iID << std::endl;
                break;
            }
        }
    }
};

int main()
{
    MyRunnable callback[NB_THREADS];
    Poco::Thread t[NB_THREADS];

    for (unsigned int i = 0; i < NB_THREADS; ++i)
    {
        t[i].start(callback[i]);
    }

    for (unsigned int i = 0; i < NB_THREADS; ++i)
    {
        t[i].join();
    }

    return 0;
}
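
For readers without POCO or Boost at hand, the same contended-counter loop can be sketched with only the C++11 standard library. This is a stand-in under stated assumptions, not the code that produced any timing in this thread: `std::mutex`, `std::thread`, and the names `run_contended` and `ContendedResult` are illustrative and do not appear in the original posts. The thread count and iteration limit are parameters so that a short run is possible.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical result type (not part of POCO or Boost).
struct ContendedResult
{
    long final_count; // counter value after all threads have exited
    int  winner_id;   // index of the thread that reached max_calls exactly
};

// Hypothetical helper: num_threads workers share one mutex-protected counter
// and loop until the counter reaches max_calls (assumed to be >= 1).
ContendedResult run_contended(int num_threads, long max_calls)
{
    std::mutex mutex;
    long calls = 0;
    int winner = -1;

    auto worker = [&](int id)
    {
        for (;;)
        {
            std::lock_guard<std::mutex> lock(mutex);
            ++calls;
            if (calls >= max_calls)
            {
                if (calls == max_calls)
                    winner = id; // written while holding the lock
                break;
            }
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back(worker, i);
    for (auto& t : threads)
        t.join(); // join makes the final values visible to this thread

    return ContendedResult{calls, winner};
}
```

Calling `run_contended(4, 100000)` returns a final count of 100003: every thread other than the winner increments the counter once more before it sees the limit and exits. The original listing behaves the same way.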

Re: Thread performance

Post by Gregoire » 04 Dec 2008, 13:15

Every time I run this test, CPU usage peaks at 100% and then falls back to 50%. For the other libraries it is always at 80%. I guess the threads spend too much time blocked on the lock.



Average execution times:

For 5 threads:
My own Win32 API wrapper using critical sections: 43 sec
My own pthread wrapper: 21 sec
boost (Win32 API) using recursive_mutex: 18.2 sec
boost (pthread API) using recursive_mutex: 19.7 sec
Poco: 16.2 sec


For 15 threads:
My own Win32 API wrapper using critical sections: 57 sec
My own pthread wrapper: 29 sec
boost (Win32 API) using recursive_mutex: 18.5 sec
boost (pthread API) using recursive_mutex: 24.0 sec
Poco: 17.1 sec


For 30 threads:
My own Win32 API wrapper using critical sections: 72 sec
My own pthread wrapper: 30 sec
boost (Win32 API) using recursive_mutex: 18.2 sec
boost (pthread API) using recursive_mutex: 23.5 sec
Poco: 30 sec


For 50 threads:
My own Win32 API wrapper using critical sections: 87 sec
My own pthread wrapper: 30 sec
boost (Win32 API) using recursive_mutex: 18.5 sec
boost (pthread API) using recursive_mutex: 23 sec
Poco: 58 sec

Re: Re: Thread performance

Post by alex » 04 Dec 2008, 14:18

> Average execution times:

Assuming your Poco results are for Windows, I suspect the difference arises because the boost implementation is based on events.

What are your results with pthreads?

Re: Re: Re: Thread performance

Post by alex » 05 Dec 2008, 02:56

> > Average execution times:

I've been trying to reproduce your results, but something does not add up. Critical section beats events hands down. Additionally, it is very unlikely that boost would not suffer any performance penalty whatsoever with the increase of thread count. What does your boost code look like and how do you measure times?

Re: Re: Re: Re: Thread performance

Post by Gregoire » 05 Dec 2008, 11:22

Hi, thanks for your time. I really hope that we will end up saying that Poco has the best performance. Even if my test does not illustrate a practical case, I would like to understand what may go wrong with it so that we may choose Poco without second thoughts.

I'll try these benchmarks on another machine to compare (I have a laptop with an Intel Core2 Duo T8300 @ 2.40GHz).

Once again, what is strange is the behaviour of the CPU usage: it goes up to 100% and then stabilizes at 50%.

I measure time with the equivalent of a Poco::Stopwatch (not shown in the pieces of code I post).

Here is the code I use for boost:

Code:

#include <iostream>

#include "boost/thread.hpp"

#define NB_THREADS 50

boost::recursive_mutex mutex;
int iMaxCalls = 1e8;
int iCalls = 0;
int iID = 0;

struct MyRunnable
{
    int _iID;
public:
    MyRunnable():_iID(iID++) {}
    void operator()()
    {
        for (;;)
        {
            boost::recursive_mutex::scoped_lock lock(mutex);

            ++iCalls;

            if (iCalls >= iMaxCalls)
            {
                if (iCalls == iMaxCalls)
                    std::cout << "winner is " << _iID << std::endl;
                break;
            }
        }
    }
};

int main()
{
    MyRunnable callback[NB_THREADS];
    boost::thread t[NB_THREADS];
    for (unsigned int i = 0; i < NB_THREADS; ++i)
    {
        t[i] = boost::thread(callback[i]);
    }
    for (unsigned int i = 0; i < NB_THREADS; ++i)
    {
        t[i].join();
    }
    return 0;
}

Re: Re: Re: Re: Re: Thread performance

Post by alex » 05 Dec 2008, 12:26

> I really hope that we will end up saying that Poco has best performance. Even if my test does not illustrate a practical case I would like to understand what may go wrong with it so that we may choose Poco without second thoughts.

If boost mutex is truly so much more performant, we'll match that. In fact my performance results are from poco-ified boost basic_timed_mutex, which I ported taking your word for results and hoping to improve things. But then I could not reproduce the results you are reporting. I've run it on 32 and 64 bit XP Pro (always 32-bit app, though) and got exactly the opposite.

I could be missing something, or maybe I have a glitch in my port. I'll run the benchmarks with boost itself, and if I can reproduce your results I'll look again into my ported code. In the meantime, can you tell me what boost::recursive_mutex resolves to on your platform (i.e. what the underlying mutex mechanism is)?

> Once again what is strange is the behaviour of the CPU usage, it goes up to 100% and then stabilize at 50%).

I assume this is Poco? Did not see that, either.

Re: Re: Re: Re: Re: Re: Thread performance

Post by Gregoire » 05 Dec 2008, 13:14

I tried on another machine: Windows Vista, Intel Core2 Quad Q9300 @ 2.50GHz, with the same binaries (compiled on my laptop). I get the same results in terms of execution time, but this time the CPU load is constant: 70% for Poco and 40% for boost (using the Win32 API).

Other tests:

Test 1: the main thread acquires/releases a mutex 1e8 times:
Results:
Poco::Mutex: 4 seconds
boost::recursive_mutex: 9.5 seconds

Test 2: the main thread launches and joins a thread with a void callback 1e5 times.
Results:
Poco: 6.5 seconds
boost: 6.5 seconds
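
The code for these two tests is not posted, so what follows is only an illustration of the kind of loop they describe. It uses the C++11 standard library, not POCO or Boost; the function names are hypothetical, and the iteration counts are parameters so that a quick run is possible.

```cpp
#include <mutex>
#include <thread>

// Test 1 sketch: one thread locks and unlocks an uncontended mutex
// `iterations` times. Returns the number of lock/unlock pairs performed.
template <typename Mutex>
long lock_unlock_loop(long iterations)
{
    Mutex mutex;
    long done = 0;
    for (long i = 0; i < iterations; ++i)
    {
        mutex.lock();
        ++done;
        mutex.unlock();
    }
    return done;
}

// Test 2 sketch: start a thread running an empty callback and join it,
// `iterations` times. Returns the number of threads started and joined.
long start_join_loop(long iterations)
{
    long done = 0;
    for (long i = 0; i < iterations; ++i)
    {
        std::thread t([] {});
        t.join();
        ++done;
    }
    return done;
}
```

The template parameter lets the same loop run over `std::mutex` and `std::recursive_mutex`, for example `lock_unlock_loop<std::recursive_mutex>(100000000)`. Both loops run on a single thread with no contention, so they measure the fixed cost of the primitives and say nothing about how a mutex behaves with many waiting threads.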
