We will provide a dataset, ‘data.csv’, which you can use to test your program. It is a 2-category, 10-feature dataset with 5×10^5 samples.
1 Random forest – Wikipedia
In order to implement a parallel random forest, the following tasks should be accomplished:
· Decision tree (without an early-stopping condition). It includes the following components:
1) Calculate information gain;
2) Split the data by finding the best feature based on task 1);
3) Create branches recursively based on task 2) until every leaf contains only a single category;
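Tasks 1) to 3) can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes numeric features, binary splits of the form `feature <= threshold`, and entropy-based information gain. All function names (`entropy`, `information_gain`, `best_split`, `build_tree`, `predict`) are hypothetical and do not come from the brief.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_split(X, y, features):
    """Return (gain, feature, threshold) for the best split, or None."""
    best = None
    for f in features:
        # Every distinct value except the largest is a candidate threshold,
        # so both children are always non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            gain = information_gain(y, left, right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(X, y):
    """Grow a tree until every leaf is pure (no early stopping)."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf: store the class label
    split = best_split(X, y, range(len(X[0])))
    if split is None:  # identical rows with mixed labels cannot be separated
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    left = [(row, label) for row, label in zip(X, y) if row[f] <= t]
    right = [(row, label) for row, label in zip(X, y) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left]),
            build_tree([r for r, _ in right], [l for _, l in right]))


def predict(tree, row):
    """Walk from the root to a leaf and return the label stored there."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree
```

The threshold search here rescans the data for every candidate value, which is simple to read but likely far too slow for a dataset of the size described above; sorting each feature once and updating class counts incrementally is one common alternative.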
· Random Forest
4) Bagging. Draw 100 bootstrap samples from the data, train a separate decision tree on each sample as in task 3), and combine their predictions by majority voting.
5) Each time a node of a decision tree is split, randomly select 3 features by sampling without replacement, and perform task 2) on those features only.
6) For the models from task 2) (single-layer decision tree), task 3) (decision tree), task 4) (bagging tree), and task 5) (random forest), perform 5-fold cross-validation (CV) and compare their average prediction accuracy (see footnote 2) on the validation set.
Model | Single-layer Decision Tree | Decision Tree | Bagging Tree | Random Forest
Accuracy | | | |
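The sampling and evaluation steps in tasks 4) to 6) can be sketched as small helpers. This is an illustrative outline under assumptions, not a prescribed design: the helper names are hypothetical, and a seeded `random.Random` instance is assumed so that runs are repeatable.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def sample_features(n_features, k, rng):
    """Pick k distinct feature indices (sampling without replacement)."""
    return rng.sample(range(n_features), k)


def majority_vote(predictions):
    """Most common label among per-tree predictions.

    On a tie, Counter.most_common returns the label seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and yield (train, validation) index lists for k folds."""
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

With 100 trees and two categories a 50/50 tie is possible, so the tie-breaking rule is worth stating explicitly in a report. Averaging `accuracy` over the five validation folds gives the figure to enter for each model.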
· Parallelization
7) For each of tasks 1)-4) above (excluding task 5), which parallelization method do you think should be used if the task is to be parallelized? Explain the reasons for your choice.
8) Choose one (not all!) of tasks 1)-4), and implement the parallelization of random forest based on the solution in task 7).
NOTE: Marks are given based on the correctness and clarity of your code. Your choice of approach will not affect your score.
9) Fix the number of decision trees in the random forest at 100, vary the number of processors, and measure the running time of the program for each setting. Record the results in the table below.
Processors | Running Time (s)
Max number of hardware threads on your |
2 Precision and recall – Wikipedia
Please estimate the speedup of your program. Does it achieve linear speedup? If not, why not?
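One way to turn the measured running times into a speedup estimate is sketched below. It assumes the usual definitions: speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p. Amdahl's law is included as one possible model for explaining sub-linear speedup. The function names and the `measure` helper are hypothetical, not part of the brief.

```python
import time


def measure(fn, *args):
    """Return the wall-clock seconds taken by one call to fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def speedup(t_serial, t_parallel):
    """S(p) = T(1) / T(p); equals p when speedup is perfectly linear."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """E(p) = S(p) / p; 1.0 means every processor is fully used."""
    return speedup(t_serial, t_parallel) / p


def amdahl_speedup(serial_fraction, p):
    """Upper bound on speedup when a fixed fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
```

As a purely made-up illustration (not a measurement), a run that took 80 s on 1 processor and 25 s on 4 would have a speedup of 3.2 and an efficiency of 0.8. Comparing such figures against `amdahl_speedup` for a guessed serial fraction can help separate the effect of serial code from other overheads such as process start-up, data copying, or load imbalance.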
LEARNING OUTCOMES
This assessment tests your ability to:
A. Demonstrate understanding of the concepts used in modern processors to increase performance.
B. Demonstrate optimization techniques for serial code.
C. Understand and apply parallel computing paradigms.
D. Write optimized programs designed for high-performance computing systems.
MARKING CRITERIA
The following table indicates what is expected for each classification category, highlighting generic marking criteria that bring together expectations in performance for each percentage (or alphabetical) band and the criteria that need to be satisfied.
Generic Marking Criteria
Criteria to be satisfied
· Outstanding work that is at the upper limit of performance.
· Work would be worthy of dissemination under appropriate conditions.
· Mastery of advanced methods and techniques at a level beyond that explicitly taught.
· Ability to synthesise and employ in an original way ideas from across the subject.
· In group work, there is evidence of an outstanding individual contribution.
· Excellent presentation.
· Outstanding command of critical analysis and judgment.
· Excellent range and depth of attainment of intended learning outcomes.
· Mastery of a wide range of methods and techniques.
· Evidence of study and originality clearly beyond the bounds of what has been taught.
· In group work, there is evidence of an excellent individual contribution.
· Excellent presentation.
· Able to display a command of critical thinking, analysis and
Upper Second
· Attained all the intended learning outcomes for a module or assessment.
· Able to use well a range of methods and techniques to come to conclusions.
· Evidence of study, comprehension, and synthesis beyond the bounds of what has been explicitly taught.
· Very good presentation of material.
· Able to employ critical analysis and judgement.
· Where group work is involved there is evidence of a productive individual contribution.
Lower Second
· Some limitations in attainment of learning objectives but has managed to grasp most of them.
· Able to use most of the methods and techniques taught.
· Evidence of study and comprehension of what has been taught.
· Adequate presentation of material.
· Some grasp of issues and concepts underlying the techniques and material taught.
· Where group work is involved there is evidence of a positive individual contribution.
· Limited attainment of intended learning outcomes.
· Able to use a proportion of the basic methods and techniques taught.
· Evidence of study and comprehension of what has been taught, but grasp insecure.
· Poorly presented.
· Some grasp of the issues and concepts underlying the techniques and material taught, but weak and incomplete.
· Attainment of only a minority of the learning outcomes.
· Able to demonstrate a clear but limited use of some of the basic methods and techniques taught.
· Weak and incomplete grasp of what has been taught.
· Deficient understanding of the issues and concepts underlying the techniques and material taught.
· Attainment of nearly all the intended learning outcomes deficient.
· Unable to use the methods and techniques taught, or to use them correctly.
· Inadequately and incoherently presented.
· Wholly deficient grasp of what has been taught.
· Lack of understanding of the issues and concepts underlying the techniques and material taught.
· Incoherence in presentation of information that hinders understanding.
· No significant assessable material, absent, or assessment missing a “must pass” component.
Specific Marking Scheme
The tasks in this assessment can be divided into 3 categories:
· Charts Presentation & Analysis: 6), 9);
· Essay: 7);
· Programs: all others.
Criteria(%)
Satisfactory
Very Limited
Demonstrated correctly implemented code that produces correct output. Excellent coding quality that follows best practices.
The program runs correctly and gives the expected results. However, special cases are not fully considered, or the program performs redundant calculations.
The program basically works correctly for major functionality, but with some conceptual problems.
The program implements some minor functionality, or incorrectly implements major functionality. There is a certain degree of misunderstanding about the requirements of the questions.
The program works incorrectly, shows only a limited attempt, or is irrelevant to the task.
Charts Presentation & Analysis
Excellent quality of report with clear structure, clear logic, concise writing, and pleasing visual aids.
Most of the results in the chart are correct, but there is a certain degree of sloppiness or wordiness in the overview and
Moderate quality of report with basic structure, where writing and visual aids can be improved.
Only some of the results in the chart are correct, or some of them are not filled in. The analysis of the results is obviously biased.
Limited or no attempt at a report.
Provides a detailed, accurate description of the methods. Provides a comprehensive comparison between the methods, including pros and cons, performance
The analysis provided demonstrates that the student’s understanding of the various methods is correct and that they have the ability to solve problems independently.
Provides an adequate description of the methods. A comparison is provided with some level of detail, but with some obvious mistakes.
There are obvious deviations in the understanding of the main methods, and it fails to reflect the ability to independently design algorithms. The description of the problem is vague, or the thought is
Limited or no description of methods.
Limited comparison provided.
Although there are certain flaws, or
incomplete.
incomplete.