Programming task description for Data Analyst

In this set of tasks, you will use made-up data on the flow of patients through an emergency department (ED). Think about how best to structure the data, analyze it, and communicate your approach and findings, including visual representations. Please submit your code for all tasks and indicate how long it took you to answer each question. Please also prepare a report with your answers. You will be evaluated on four criteria: (i) programming (organization, simplicity, commenting), (ii) clarity of presentation, (iii) correctness of the dataset and answers, and (iv) timeliness.
Physicians work in shifts: they begin work at a set time and stay until they have discharged their patients (usually past the official end of their shift). Patients are assigned to a physician immediately on arrival, unless the physician has not yet started his or her shift; in that case, the patient is assigned to the physician at the beginning of the shift. In the dataset test_data.txt, each row of comma-separated data represents one patient visit. The variables are as follows:
1. visit_num: Row identifier for the patient visit
2. phys_name: Physician
3. shiftid: String variable denoting the date and the beginning and end times of the physician’s shift. If the shift spans midnight, the date corresponds to the beginning time.
4. ed_tc: Date and time of patient arrival to ED
5. dcord_tc: Date and time of patient discharge order
6. xb_lntdc: Measure of expected log length of stay, based on patient demographics and medical conditions, where length of stay is the difference between dcord_tc and ed_tc (you can think of this as “patient severity”)
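Because shiftid is a string, it has to be turned into datetimes before arrivals and discharges can be compared with shift times. The sketch below assumes a hypothetical format such as "2012-01-03 22:00-06:00" (the real format in test_data.txt is not described here and must be checked) and applies the midnight rule above: the date belongs to the beginning time, so an end time at or before the start time falls on the next day.

```python
from datetime import datetime, timedelta


def parse_shiftid(shiftid: str) -> tuple[datetime, datetime]:
    """Convert a shiftid string into (shift_start, shift_end) datetimes.

    ASSUMPTION: shiftid is formatted like "2012-01-03 22:00-06:00"
    (date, start time, end time). Check the real format in test_data.txt
    and adjust the parsing accordingly.
    """
    date_part, times = shiftid.split(" ")
    start_txt, end_txt = times.split("-")
    start = datetime.strptime(f"{date_part} {start_txt}", "%Y-%m-%d %H:%M")
    end = datetime.strptime(f"{date_part} {end_txt}", "%Y-%m-%d %H:%M")
    # The date refers to the beginning time, so an end time that is not
    # later than the start time means the shift runs past midnight.
    if end <= start:
        end += timedelta(days=1)
    return start, end


start, end = parse_shiftid("2012-01-03 22:00-06:00")
print(start, end)  # 2012-01-03 22:00:00 2012-01-04 06:00:00
```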
Using a statistical program, perform the following tasks:
1. Summarize the data. Do some observations appear to be data entry errors (accounting for the fact that the phenomena described above, patients arriving before their physician’s shift starts and physicians staying past the end of their shift, are legitimate)?
2. Some patients may arrive before their physician’s shift starts and therefore have to wait. Other patients may be discharged after their physician’s shift ends (so the physician has to stay past the end of the shift). What percentage of visits falls into each of these categories?
3. Describe hourly patterns of patient arrivals and the average severity of these patients. How might one formally test whether patient severity is or is not predicted by hour of the day?
4. Create, and include with your solutions, a dataset recording the “census”, or number of patients under a physician’s care (patients who have arrived and have not yet been discharged), for each hour of the shift and up to 4 hours after the shift ends. Observations in this dataset should correspond to the shift (shiftid), physician (phys_name), and hour of shift (index). index should give the hour of the shift relative to the official shift end, such that the hour the shift ends has a value of 0, the hour before the shift ends is -1, and so on. Hint: you will need to transform the text in shiftid into numerical shift beginning and end times that capture both date and hour; ignore patient-hour observations that fall outside the shift times of interest. How does the census vary with time relative to the end of the shift? Discuss conceptually how you construct censuses, and note issues with discrete time.
5. Which physician appears to be the fastest at discharging patients? Answer this with a regression of log length of stay; you may also show results graphically. Discuss how you decided which variables to control for. What are potential threats to your assessment, and is there any evidence of them (you do not need to actually address these threats)? How robust are your estimates of physician effects to different specifications? Do you have any concerns given the finite number of patients per physician?
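For task 1, one way to screen for data entry errors is to encode each suspicious pattern as an explicit rule. The sketch below is a minimal illustration, not a complete audit: the thresholds MAX_EARLY_ARRIVAL and MAX_PLAUSIBLE_STAY are hypothetical values to be chosen after inspecting the data, and the rule that nobody should arrive after the shift ends is an inference from the assignment rule in the description.

```python
from datetime import datetime, timedelta

# Hypothetical cutoffs; pick them after looking at the distributions.
MAX_EARLY_ARRIVAL = timedelta(hours=12)   # arrival this long before shift start
MAX_PLAUSIBLE_STAY = timedelta(hours=48)  # arrival-to-discharge-order time


def flag_visit(shift_start: datetime, shift_end: datetime,
               ed_tc: datetime, dcord_tc: datetime) -> list[str]:
    """Return the reasons a visit looks like a data entry error.

    Arriving somewhat before the shift starts and being discharged after
    it ends are legitimate, so neither is flagged on its own.
    """
    reasons = []
    if dcord_tc <= ed_tc:
        # Also makes log length of stay undefined.
        reasons.append("discharge order at or before arrival")
    if ed_tc > shift_end:
        # Assumption: under the assignment rule, a patient arriving after
        # this shift ended would have gone to a physician still on shift.
        reasons.append("arrival after the shift ended")
    if shift_start - ed_tc > MAX_EARLY_ARRIVAL:
        reasons.append("arrival long before the shift started")
    if dcord_tc - ed_tc > MAX_PLAUSIBLE_STAY:
        reasons.append("implausibly long stay")
    return reasons


s, e = datetime(2012, 1, 3, 7), datetime(2012, 1, 3, 15)
print(flag_visit(s, e, datetime(2012, 1, 3, 8), datetime(2012, 1, 3, 7)))
# ['discharge order at or before arrival']
```

Returning a list of reasons, rather than a single yes/no, makes it easy to tabulate how many visits fail each rule in the summary.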
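For task 2, both categories reduce to comparing each visit’s times with its shift boundaries: arrival before the shift starts, and discharge order after the shift ends. A minimal sketch, assuming each visit has already been reduced to (shift_start, shift_end, ed_tc, dcord_tc) datetimes and that suspected data entry errors have been removed first:

```python
from datetime import datetime


def wait_and_overrun_shares(
    visits: list[tuple[datetime, datetime, datetime, datetime]],
) -> tuple[float, float]:
    """Percent of visits arriving before the shift starts, and percent
    discharged after the shift ends.

    Each visit is (shift_start, shift_end, ed_tc, dcord_tc). A visit can
    fall into both categories, so the two shares need not sum to 100.
    """
    n = len(visits)
    arrived_early = sum(ed_tc < start for start, _, ed_tc, _ in visits)
    discharged_late = sum(dcord_tc > end for _, end, _, dcord_tc in visits)
    return 100 * arrived_early / n, 100 * discharged_late / n


s, e = datetime(2012, 1, 3, 7), datetime(2012, 1, 3, 15)
visits = [
    (s, e, datetime(2012, 1, 3, 6), datetime(2012, 1, 3, 9)),            # early only
    (s, e, datetime(2012, 1, 3, 8), datetime(2012, 1, 3, 16)),           # late only
    (s, e, datetime(2012, 1, 3, 9), datetime(2012, 1, 3, 10)),           # neither
    (s, e, datetime(2012, 1, 3, 6, 30), datetime(2012, 1, 3, 15, 30)),  # both
]
print(wait_and_overrun_shares(visits))  # (50.0, 50.0)
```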
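For task 3, hourly patterns come from grouping visits by the hour of ed_tc. One formal test of whether hour of day predicts severity is a one-way ANOVA of xb_lntdc across arrival hours, which is the same as the joint F-test on hour-of-day dummies in an OLS regression of xb_lntdc on an intercept and those dummies. The sketch below computes only the F statistic; its p-value comes from the F distribution with k − 1 and n − k degrees of freedom, which a statistics package can supply. Input is assumed to be (ed_tc, xb_lntdc) pairs.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def group_by_hour(arrivals: list[tuple[datetime, float]]) -> dict[int, list[float]]:
    """Group severities (xb_lntdc) by the hour of day of arrival (ed_tc)."""
    by_hour = defaultdict(list)
    for ed_tc, severity in arrivals:
        by_hour[ed_tc.hour].append(severity)
    return dict(sorted(by_hour.items()))


def hourly_summary(arrivals: list[tuple[datetime, float]]) -> dict[int, tuple[int, float]]:
    """Map each arrival hour to (number of arrivals, mean severity)."""
    return {h: (len(v), fmean(v)) for h, v in group_by_hour(arrivals).items()}


def anova_f(arrivals: list[tuple[datetime, float]]) -> float:
    """One-way ANOVA F statistic for severity across arrival hours."""
    groups = list(group_by_hour(arrivals).values())
    means = [fmean(g) for g in groups]
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


arrivals = [
    (datetime(2012, 1, 3, 8, 10), 1.0),
    (datetime(2012, 1, 3, 8, 40), 2.0),
    (datetime(2012, 1, 3, 9, 5), 3.0),
    (datetime(2012, 1, 4, 9, 30), 4.0),
]
print(hourly_summary(arrivals))  # {8: (2, 1.5), 9: (2, 3.5)}
print(anova_f(arrivals))         # 8.0
```

The classic F-test assumes equal variance across hours; a regression with hour dummies and robust standard errors relaxes that assumption.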
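For task 4, one way to build a census is to take a snapshot at each hour mark from the shift start to 4 hours past the shift end, counting the patients who have been assigned (at max(ed_tc, shift start), per the assignment rule) and whose discharge order comes later. The sketch below makes two labeled assumptions: shifts begin and end on the hour, and index 0 is the snapshot taken exactly at the official shift end. Other conventions, such as counting anyone present at any point during the hour, are equally defensible.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def shift_census(shift_start: datetime, shift_end: datetime,
                 patients: list[tuple[datetime, datetime]],
                 hours_after: int = 4) -> list[tuple[int, int]]:
    """Hourly census for one physician-shift as (index, census) pairs.

    patients holds (ed_tc, dcord_tc) for visits assigned to this shift.
    index is 0 at the official shift end, -1 one hour earlier, and so on,
    up to hours_after hours past the end. Assumes shifts begin and end on
    the hour. Snapshot convention (one choice among several): a patient is
    counted at time t if assigned at or before t and not yet discharged.
    """
    hours_before = int((shift_end - shift_start) / HOUR)
    rows = []
    for index in range(-hours_before, hours_after + 1):
        t = shift_end + index * HOUR
        census = sum(
            # Early arrivals are assigned only when the shift begins.
            max(ed_tc, shift_start) <= t < dcord_tc
            for ed_tc, dcord_tc in patients
        )
        rows.append((index, census))
    return rows


s, e = datetime(2012, 1, 3, 7), datetime(2012, 1, 3, 9)
patients = [
    (datetime(2012, 1, 3, 6, 0), datetime(2012, 1, 3, 8, 30)),
    (datetime(2012, 1, 3, 7, 30), datetime(2012, 1, 3, 10, 15)),
    (datetime(2012, 1, 3, 8, 15), datetime(2012, 1, 3, 8, 45)),
]
print(shift_census(s, e, patients))
# [(-2, 1), (-1, 2), (0, 1), (1, 1), (2, 0), (3, 0), (4, 0)]
```

The third patient arrives and is discharged between two snapshots and is never counted, which illustrates the discrete-time issue: hourly snapshots miss short stays, while "present at any time during the hour" counting would overstate simultaneous load.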
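For task 5, a baseline specification regresses log length of stay on xb_lntdc plus physician indicator variables, with one physician as the reference; the fastest physician is then the one with the most negative effect (the reference has an effect of 0). The sketch below is a minimal, dependency-free illustration that solves the normal equations directly. It assumes length of stay is measured in hours (the unit only shifts the intercept) and computes no standard errors, which a statistics package would report and which matter when physicians have few patients.

```python
import math
from datetime import datetime, timedelta


def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def physician_effects(visits: list[tuple[str, datetime, datetime, float]],
                      reference: str) -> dict[str, float]:
    """Physician effects on log length of stay, relative to `reference`.

    visits holds (phys_name, ed_tc, dcord_tc, xb_lntdc). The model is
    log(LOS in hours) = a + b * xb_lntdc + physician dummies + error.
    """
    others = sorted({v[0] for v in visits} - {reference})
    X, y = [], []
    for phys, ed_tc, dcord_tc, severity in visits:
        los_hours = (dcord_tc - ed_tc) / timedelta(hours=1)
        y.append(math.log(los_hours))
        X.append([1.0, severity] + [1.0 if phys == p else 0.0 for p in others])
    beta = ols(X, y)
    return dict(zip(others, beta[2:]))


# Synthetic check: stays generated with known effects should be recovered.
true_effects = {"A": 0.0, "B": -0.3, "C": 0.2}
base = datetime(2012, 1, 3, 8)
visits = [
    (phys, base, base + timedelta(hours=math.exp(0.5 + sev + eff)), sev)
    for phys, eff in true_effects.items()
    for sev in (0.0, 0.5, 1.0)
]
effects = physician_effects(visits, reference="A")
print({p: round(b, 3) for p, b in effects.items()})  # {'B': -0.3, 'C': 0.2}
```

Robustness can be probed by adding controls to X, for example arrival-hour or shift indicators, and checking whether the ranking of physicians changes.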