About Me

http://www.eecg.toronto.edu/~tamda/


User-Level Thread Migration

This patch adds (re-introduces) user-level thread migration to K42. The feature is targeted at single multithreaded applications whose workload leaves CPUs idle. It attempts to "soak up" this idleness with useful work.

When a dispatcher becomes idle, it looks for available work in a remote dispatcher's ready queue (within the same address space, of course).
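The idea can be illustrated with a minimal sketch. This is not the K42 code: the `Dispatcher` class, the `next_thread` method, and the "steal from the longest queue" policy are all assumptions made for illustration, and a real implementation would need locking around the remote ready queue.

```python
from collections import deque


class Dispatcher:
    """Hypothetical stand-in for a per-processor dispatcher (not K42's class)."""

    def __init__(self, name):
        self.name = name
        self.ready = deque()  # threads ready to run, oldest first

    def next_thread(self, peers):
        """Return the next thread to run, migrating one from a peer if idle."""
        if self.ready:
            return self.ready.popleft()
        # Idle: look at the other dispatchers in the same address space and
        # pick the one with the longest ready queue (an assumed policy).
        victim = max((p for p in peers if p is not self),
                     key=lambda p: len(p.ready), default=None)
        if victim is None or not victim.ready:
            return None  # no remote work available; remain idle
        # Take from the tail so the victim keeps its next-to-run thread.
        return victim.ready.pop()


d0, d1 = Dispatcher("vp0"), Dispatcher("vp1")
d1.ready.extend(["t1", "t2", "t3"])
migrated = d0.next_thread([d0, d1])  # d0 is idle, so it takes work from d1
```

Taking from the tail of the remote queue is only one possible choice; which thread to migrate, and from which dispatcher, is exactly the kind of policy the experiments below are meant to inform.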

Here are my initial SPECjbb2000 results with my user-level thread migration patch enabled. You can obtain my patch on the AGORA system.

Experimental Setup

Non-compliant SPECjbb2000 runs, each consisting of a 5-minute warmup followed by 20 minutes of measured time.

Using a multithreaded ray tracing workload (SPECjvm98 mtrt).

Migration Enabled Parameters

Graphs

  1. _specjbb2000throughput.pdf

  2. _specjbb2000threadspread.pdf

  3. Standard deviations for the above two graphs: summary.txt

Raw Data: No Migration

  1. _SPECjbb.021.results.ksh

  2. _SPECjbb.022.results.ksh

  3. _SPECjbb.023.results.ksh

Raw Data: Migration

  1. _SPECjbb.017.results.ksh

  2. _SPECjbb.018.results.ksh

  3. _SPECjbb.020.results.ksh

Observations

Discussion

Perhaps initial thread placement in K42 is not always effective for SPECjbb2000. Thread migration mitigates this problem by dynamically (although not too intelligently) adjusting thread placement.

Conclusions

User-level thread migration is effective in reducing SPECjbb thread spread. However, the overheads of the migration mechanism are currently fairly high, and more work is needed to reduce them.

Future Work

Lots. This is a work in progress, and I am actively working in this area. I have many experiments to try, tweaks to make, further experiments to run, redesigns and re-implementations to do, and so on.

Idleness Statistics

1 to 4 Warehouses, *Without* Migration

Here are some initial, approximate graphs of the distribution of idle incidents without migration.

Note: X-axis shows buckets of: 0-10us, 10-100us, 100-1000us, etc...
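As a rough sketch of how idle incidents could be sorted into these decade buckets: the labels beyond the first three, the cap at 100 ms, the function names, and the sample durations are all assumptions for illustration, not the script used to produce the graphs.

```python
from collections import Counter

# The first three labels come from the note above; the rest are an assumed
# continuation of the same factor-of-ten pattern.
BUCKET_LABELS = ["0-10us", "10-100us", "100-1000us",
                 "1-10ms", "10-100ms", ">=100ms"]


def bucket_index(idle_us):
    """Map an idle duration in microseconds to its decade bucket."""
    index, upper = 0, 10
    # Integer thresholds avoid floating-point surprises at bucket edges.
    while idle_us >= upper and index < len(BUCKET_LABELS) - 1:
        index += 1
        upper *= 10
    return index


def idle_histogram(durations_us):
    """Count idle incidents per bucket label."""
    return Counter(BUCKET_LABELS[bucket_index(d)] for d in durations_us)


# Hypothetical idle durations in microseconds, not measured data.
sample = [3, 42, 750, 1500, 25000]
hist = idle_histogram(sample)
```

Each bucket includes its lower bound and excludes its upper bound, so a 10us incident falls in the 10-100us bucket.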

  1. Combined approximate idleness distribution - histogram1to4warehouses.pdf

  2. Per VP idleness distribution (approximately), trial #1 - histogrampervp1.pdf

  3. Per VP idleness distribution (approximately), trial #2 - histogrampervp2.pdf

  4. Per VP idleness distribution (approximately), trial #3 - histogrampervp3.pdf

Conclusion: There is a significant number of idle incidents whose idle time exceeds 1 ms (between 1 and 10 ms). The distribution of idleness *across* processors (VPs) is uneven.

1 to 4 Warehouses, With Migration

2005-10-21: I have examined the idleness incidents, from 1 to 4 warehouses, in more detail. Here are the results: idlestats1to4.txt

Idleness distribution comparison - idledistn1to4warehouse_corrected.pdf

Note: X-axis shows buckets of: 0-10us, 10-100us, 100-1000us, etc...

Conclusion: The throughput improvement, from 1 to 4 warehouses, roughly agrees with the reduction in total idle time caused by migration.

Summary of Other Observations:

  1. Migration reduces avg idle time (per incident) by 30% on avg.
  2. Migration causes a 96% reduction in idle incidents that fall within the 10ms-100ms range.
  3. Migration reduces the number of idleness incidents and causes them to become more evenly distributed across CPUs.
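For readers who want to reproduce this kind of summary from their own traces, here is a minimal sketch of the two calculations involved. The function names and all input numbers are hypothetical, and the coefficient of variation is only one assumed way to quantify how evenly incidents are spread across CPUs; it is not necessarily the measure used for the observations above.

```python
from statistics import mean, pstdev


def percent_reduction(before, after):
    """Percentage by which a metric dropped relative to its baseline."""
    return 100.0 * (before - after) / before


def unevenness(per_cpu_counts):
    """Coefficient of variation of per-CPU idle-incident counts.

    0.0 means perfectly even; larger values mean a more skewed spread.
    """
    return pstdev(per_cpu_counts) / mean(per_cpu_counts)


# Hypothetical numbers for illustration only, not measured results.
avg_idle_reduction = percent_reduction(before=250.0, after=200.0)
skewed = unevenness([40, 5, 5, 10])    # most incidents on one CPU
even = unevenness([15, 15, 15, 15])    # same total, spread evenly
```

A lower `unevenness` value with migration enabled than without would correspond to the third observation in the list.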

1 to 16 Warehouses

2005-10-19: I have examined the idleness incidents, from 1 to 16 warehouses, in more detail. Here are the results: idlestats.txt

Here is a quick summary.

  1. Migration reduces avg idle time (per incident) by 29% on avg.
  2. Migration causes a 37% reduction in idle incidents that fall within the 10ms-100ms range.
  3. Migration causes idleness incidents to become more evenly distributed across CPUs.

More Detailed Results

2005-10-28

  1. Detailed text-based results are here: stats_incremental_comparison.txt

  2. migrations.pdf - The number of migrations increases as more load is placed on the system.

  3. idleincidents.pdf - Migration reduces the number of idle incidents significantly.

  4. idletime.pdf - Total idle time is significantly reduced with migration enabled.

  5. 10to100ms.pdf - There is a significant reduction in idle incidents in the 10-100ms range.

  6. 1to10ms.pdf - There is a significant reduction in idle incidents in the 1-10ms range.

8 CPU Results

Here are some preliminary results, based on only 1 run of SPECjbb2000.

Although there are not enough data points to make a strong statement, it appears that when thread spread is bad, thread migration / load-balancing reduces its severity and also improves throughput.

Contact

If you have comments, questions, criticisms, or suggestions, please let me know: tamda@eecg.toronto.edu

DavidTam (last edited 2006-03-02 18:39:34 by DavidTam)