About Me
http://www.eecg.toronto.edu/~tamda/
User-Level Thread Migration
This patch adds (re-introduces) user-level thread migration to K42. The feature is targeted at single multithreaded applications whose workloads leave CPUs idle; it attempts to "soak up" this idleness with useful work.
When a dispatcher becomes idle, it looks for available work in a remote dispatcher's ready queue (within the same address space, of course).
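A minimal sketch of the idea in the paragraph above follows. It is written in Python for brevity rather than in K42's C++. The `Dispatcher` class, its `ready` deque, and `next_thread` are hypothetical names, not the patch's API, and stealing from the tail of the victim's queue is an assumed policy; the patch may choose threads differently.

```python
from collections import deque


class Dispatcher:
    """Hypothetical stand-in for one per-CPU user-level dispatcher."""

    def __init__(self, vp: int) -> None:
        self.vp = vp
        self.ready: deque = deque()  # runnable user-level threads

    def next_thread(self, peers):
        """Return the next thread to run, stealing from a peer if idle.

        `peers` are the dispatchers of the same address space
        (this dispatcher may be included; it is skipped).
        """
        if self.ready:
            return self.ready.popleft()  # run local work first
        for peer in peers:
            if peer is not self and peer.ready:
                # Assumed policy: take from the tail of the victim's queue,
                # leaving the threads it would run next in place.
                return peer.ready.pop()
        return None  # nothing to steal: stay idle
```

Taking from the tail is a common work-stealing choice because it disturbs the victim's next-to-run threads least; whether it is the right choice for K42's scheduler is not established here.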
Here are my initial SPECjbb2000 results with my user-level thread migration patch enabled. You can obtain my patch on the AGORA system.
- ~tamdavid/public/README_patch_threadmigration_20051102-1127EST
- ~tamdavid/public/patch_threadmigration_20051102-1127EST
Experimental Setup
- victim = k10
- #CPUs enabled = 4
- Power3 630+, 375 MHz
- physical RAM = ~1.5 GB
- OS = K42, CVS tree last updated June 6, 2005.
- JVM = J9
- Workload = SPECjbb2000
- JVM heap size = 1000 MB
- Each experiment was run 3 times, with a reboot between runs.
- Standard, SPECjbb2000-compliant runs: 30 s warmup, then 2 min of measured time.
- Non-compliant SPECjbb2000 runs: 5 min warmup, then 20 min of measured time.
- Runs using a multithreaded ray-tracing workload (SPECjvm98 mtrt).
Migration Enabled Parameters
- A 10 ms periodic timer updates each dispatcher's published ready-queue length.
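The sketch below illustrates how a periodically published queue length might be used, assuming (hypothetically) that an idle dispatcher picks the peer with the longest published queue. `PublishingDispatcher`, `publish`, `pick_victim`, and `start_publish_timer` are illustrative names only, and a Python thread stands in for K42's timer.

```python
import threading
from collections import deque

PUBLISH_PERIOD_S = 0.010  # 10 ms, the period used in these experiments


class PublishingDispatcher:
    def __init__(self, vp: int) -> None:
        self.vp = vp
        self.ready: deque = deque()
        self.published_len = 0  # snapshot that peers read; may be stale

    def publish(self) -> None:
        # Called from the periodic timer; peers only ever read this snapshot.
        self.published_len = len(self.ready)


def pick_victim(me, peers):
    """Hypothetical policy: steal from the peer with the longest published queue."""
    candidates = [p for p in peers if p is not me and p.published_len > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.published_len)


def start_publish_timer(dispatchers, stop: threading.Event) -> threading.Thread:
    """Refresh every dispatcher's published length once per period until `stop` is set."""
    def tick() -> None:
        # Event.wait returns False on timeout and True once `stop` is set.
        while not stop.wait(PUBLISH_PERIOD_S):
            for d in dispatchers:
                d.publish()

    t = threading.Thread(target=tick, daemon=True)
    t.start()
    return t
```

One plausible reason to publish a snapshot instead of reading remote queues directly is that idle dispatchers can scan their peers without touching the remote queues themselves; the trade-off is that the snapshot can be up to one period (10 ms) out of date.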
Graphs
Standard deviations for the two graphs above: summary.txt
Raw Data: No Migration
Raw Data: Migration
Observations
- Throughput:
- When the system is lightly loaded (from 1 to 4 warehouses), thread migration improves throughput.
- The initial thread migration design has high overheads, which cause a ~15% reduction in throughput under high load (more than 4 warehouses).
- Thread Spread:
- Thread migration consistently reduces thread spread.
- Without thread migration, thread spread is typically low only when the number of warehouses is a multiple of the number of processors.
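A simplified model shows why the warehouse count matters. Assume (hypothetically) that each warehouse is one busy thread, threads are placed round-robin with no migration, each CPU time-slices equally among its threads, and thread spread is measured as (fastest - slowest) / fastest. Under these assumptions, once there are more warehouses than CPUs, the spread is zero only when the warehouse count is a multiple of the CPU count, because otherwise some threads share a CPU with fewer peers than others. The function names are illustrative; the exact SPECjbb2000 spread metric and K42's placement policy may differ.

```python
def per_thread_share(warehouses: int, cpus: int) -> list[float]:
    """CPU share of each warehouse thread under static round-robin placement.

    Assumes one busy thread per warehouse and equal time-slicing per CPU.
    """
    counts = [0] * cpus
    for w in range(warehouses):
        counts[w % cpus] += 1
    shares: list[float] = []
    for n in counts:
        if n:  # skip CPUs that received no thread
            shares.extend([1.0 / n] * n)
    return shares


def thread_spread(rates: list[float]) -> float:
    """Assumed spread metric: (fastest - slowest) / fastest, as a percentage."""
    return 100.0 * (max(rates) - min(rates)) / max(rates)
```

For example, on 4 CPUs this model gives a spread of 0% for 8 warehouses but 50% for 5, 6, or 7 warehouses.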
Discussion
Initial thread placement in K42 may not always be effective for SPECjbb2000. Thread migration mitigates this problem by adjusting thread placement dynamically, although not yet very intelligently.
Conclusions
User-level thread migration is effective in reducing SPECjbb thread spread. However, the overheads of the migration mechanism are currently fairly high, and reducing them requires more work.
Future Work
Lots. This is a work in progress, and I am actively working in this area: there are many experiments to try, tweaks to make, further experiments to run, and redesigns/re-implementations to do.
Idleness Statistics
1 to 4 Warehouses, *Without* Migration
Here are some initial, approximate graphs of the distribution of idle incidents without migration.
Note: X-axis shows buckets of: 0-10us, 10-100us, 100-1000us, etc...
Combined approximate idleness distribution - histogram1to4warehouses.pdf
Per VP idleness distribution (approximately), trial #1 - histogrampervp1.pdf
Per VP idleness distribution (approximately), trial #2 - histogrampervp2.pdf
Per VP idleness distribution (approximately), trial #3 - histogrampervp3.pdf
Conclusion: A significant number of idle incidents last more than 1 ms (between 1 and 10 ms). The distribution of idleness *across* processors (VPs) is uneven.
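The decade buckets on the X-axis of these histograms (0-10us, 10-100us, 100-1000us, ...) can be computed as in the following sketch. It assumes idle durations are recorded in microseconds; `decade_bucket` and `histogram` are hypothetical helper names, not part of the patch. Each bucket includes its lower bound and excludes its upper bound.

```python
from collections import Counter


def decade_bucket(idle_us: float) -> str:
    """Map an idle duration (microseconds) to its decade bucket label."""
    lo, hi = 0, 10
    while idle_us >= hi:  # move up one decade at a time
        lo, hi = hi, hi * 10
    return f"{lo}-{hi}us"


def histogram(idle_times_us) -> Counter:
    """Count idle incidents per decade bucket."""
    return Counter(decade_bucket(t) for t in idle_times_us)
```

Integer multiplication is used instead of `log10` so that exact powers of ten (10, 100, 1000 us) always land in the bucket they start.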
1 to 4 Warehouses, With Migration
2005-10-21: I have examined the idleness incidents from 1 to 4 warehouses in more detail. The results are in idlestats1to4.txt.
Idleness distribution comparison - idledistn1to4warehouse_corrected.pdf
Note: X-axis shows buckets of: 0-10us, 10-100us, 100-1000us, etc...
Conclusion: From 1 to 4 warehouses, the throughput improvements roughly agree with the reduction in total idle time caused by migration.
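The reasoning behind this comparison can be written as a rough estimate. It assumes throughput X is proportional to busy CPU time and ignores migration overhead (which, per the throughput observations above, is not negligible under high load). With P CPUs, a measured interval T, and total idle time I summed across all CPUs, the numbers in the second line are purely illustrative and are not measurements from these runs.

```latex
\[
\frac{X_{\mathrm{mig}}}{X_{\mathrm{base}}}
  \approx \frac{P\,T - I_{\mathrm{mig}}}{P\,T - I_{\mathrm{base}}}
\]
% Illustrative only: P = 4, T = 120 s, I_base = 120 CPU-s, I_mig = 60 CPU-s
\[
\frac{480 - 60}{480 - 120} = \frac{420}{360} \approx 1.17
\]
```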
Summary of Other Observations:
- Migration reduces the average idle time per incident by 30% on average.
- Migration causes a 96% reduction in idle incidents that fall within the 10ms-100ms range.
- Migration reduces the number of idleness incidents and causes them to become more evenly distributed across CPUs.
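The summary figures above can be derived from per-VP idle logs roughly as sketched below. The input format (a dict mapping each VP to a list of idle durations in microseconds) is an assumption, and so is the evenness measure: the page does not say how evenness was judged, so the coefficient of variation of per-VP idle totals is used here purely as an illustration. All function names are hypothetical.

```python
from statistics import mean, pstdev


def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after` (assumes before != 0)."""
    return 100.0 * (before - after) / before


def count_in_range(idle_us, lo_us: float, hi_us: float) -> int:
    """Number of idle incidents with lo_us <= duration < hi_us."""
    return sum(1 for t in idle_us if lo_us <= t < hi_us)


def evenness(per_vp_totals) -> float:
    """Coefficient of variation of per-VP idle totals (0.0 = perfectly even)."""
    m = mean(per_vp_totals)
    return pstdev(per_vp_totals) / m if m else 0.0


def summarize(base: dict, mig: dict) -> dict:
    """Compare two runs given as {vp: [idle durations in us]}."""
    all_base = [t for ts in base.values() for t in ts]
    all_mig = [t for ts in mig.values() for t in ts]
    return {
        "avg_idle_reduction_pct": pct_reduction(mean(all_base), mean(all_mig)),
        # 10 ms - 100 ms expressed in microseconds
        "incidents_10_100ms_reduction_pct": pct_reduction(
            count_in_range(all_base, 10_000, 100_000),
            count_in_range(all_mig, 10_000, 100_000),
        ),
        "cv_base": evenness([sum(ts) for ts in base.values()]),
        "cv_mig": evenness([sum(ts) for ts in mig.values()]),
    }
```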
1 to 16 Warehouses
2005-10-19: I have examined the idleness incidents from 1 to 16 warehouses in more detail. The results are in idlestats.txt.
Here is a quick summary.
- Migration reduces the average idle time per incident by 29% on average.
- Migration causes a 37% reduction in idle incidents that fall within the 10ms-100ms range.
- Migration causes idleness incidents to become more evenly distributed across CPUs.
More Detailed Results
2005-10-28
Detailed text-based results are here: stats_incremental_comparison.txt
migrations.pdf - The number of migrations increases as more load is placed on the system.
idleincidents.pdf - Migration reduces the number of idle incidents significantly.
idletime.pdf - Total idle time is significantly reduced with migration enabled.
10to100ms.pdf - There is a significant reduction in idle incidents in the 10-100ms range.
1to10ms.pdf - There is a significant reduction in idle incidents in the 1-10ms range.
8 CPU Results
- victim = k0
- #CPUs enabled = 8
- RS64-IV, 601 MHz
- physical RAM = ~16 GB
- OS = K42, CVS tree last updated June 6, 2005.
- JVM = J9
- Workload = SPECjbb2000
- JVM heap size = 10000 MB
Here are some preliminary results, based on only 1 run of SPECjbb2000.
Throughput - specjbbthruput8cpus.pdf
Thread Spread - specjbbthreadspread8cpus.pdf
Although there are too few data points to make a strong statement, it appears that when thread spread is high, thread migration (load balancing) reduces it and also improves throughput.
Contact
If you have comments, questions, criticisms, or suggestions, please let me know: tamda@eecg.toronto.edu
