Computer Science Question

User Generated

qngnpngrq_tvey

Description

Objective of Competition:

You will implement a stock market prediction system through multi-stage processing of stock market data. At each stage, you will process the raw time-series data using either an ML algorithm or preprocessing filters. I have provided you with 20 years' worth of daily data for 20 ticker symbols. You will use this data to train and test each stage of the algorithm.

Step 1: Clustering (completed in assignment 3)

In this step, you will cluster the candles (Open, High, Low, Close, and optionally Volume) of the daily data into n clusters. You choose the number of clusters and must justify that choice. Use a cluster validity measure to confirm that the clusters built by your method are good. You are free to try one or more clustering algorithms and keep the results of the one that gives the best cluster validity.
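One way to justify the number of clusters is to compare a validity score across several candidate values. The sketch below is only an illustration under stated assumptions: it uses k-means and the silhouette score from scikit-learn, the helper names `candle_features` and `pick_n_clusters` are hypothetical, the candle is normalized by its Open price (a choice not prescribed by the assignment), and the OHLC values are synthetic stand-ins for the real data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def candle_features(o, h, l, c):
    """Describe each candle by body and wick sizes relative to Open (assumed features)."""
    o, h, l, c = (np.asarray(a, dtype=float) for a in (o, h, l, c))
    body = (c - o) / o                      # signed: positive for up candles
    upper = (h - np.maximum(o, c)) / o      # upper wick
    lower = (np.minimum(o, c) - l) / o      # lower wick
    return np.column_stack([body, upper, lower])


def pick_n_clusters(features, candidates=(3, 5, 7)):
    """Fit k-means for each candidate n and return the n with the best silhouette."""
    scores = {}
    for n in candidates:
        km = KMeans(n_clusters=n, n_init=10, random_state=0).fit(features)
        scores[n] = silhouette_score(features, km.labels_)
    best_n = max(scores, key=scores.get)
    return best_n, scores


# Synthetic candles, used only so the sketch runs without the course data.
rng = np.random.default_rng(0)
o = 100 + rng.normal(0, 1, 500)
c = o * (1 + rng.normal(0, 0.02, 500))
h = np.maximum(o, c) * (1 + rng.uniform(0, 0.01, 500))
l = np.minimum(o, c) * (1 - rng.uniform(0, 0.01, 500))

feats = candle_features(o, h, l, c)
best_n, scores = pick_n_clusters(feats)
print(best_n, scores)
```

The candidates are odd numbers because Step 2 expects n = 2k+1 clusters in the symmetric case. Other validity indices or clustering algorithms could be swapped in the same loop.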

Step 2: Cluster labeling (to be completed)

You may visually inspect each cluster to perform this step. Use the cluster center and variance to create an ordered label for each cluster, ranging from -k to +k, so that the total number of clusters is n = 2k+1. Because there are negative and positive candles, the order represents the relative size of the candle in each direction. The example above assumes a symmetric distribution of positive and negative candles; your distribution may be asymmetric.

Use the cluster label to represent each candle, and generate a new file that has "Date" and the respective "Cluster Label" in place of the OHLC data.
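The ordering can also be derived programmatically rather than by eye. The sketch below assumes each cluster center has been reduced to one signed number, its mean candle body (Close minus Open, relative to Open); that reduction, the helper name `ordered_labels`, and the sample values are all hypothetical. The cluster whose center is closest to zero gets label 0, which also covers the asymmetric case.

```python
import numpy as np
import pandas as pd


def ordered_labels(center_body):
    """Map cluster id -> ordered label, with 0 given to the center nearest zero."""
    center_body = np.asarray(center_body, dtype=float)
    rank = np.argsort(np.argsort(center_body))          # rank[i] = sorted position of cluster i
    zero_rank = rank[np.argmin(np.abs(center_body))]    # position of the smallest candle
    return {cid: int(r - zero_rank) for cid, r in enumerate(rank)}


# Hypothetical signed body size of five cluster centers (cluster ids 0..4).
centers = [0.5, -2.0, 0.01, 1.5, -0.4]
mapping = ordered_labels(centers)
print(mapping)  # {0: 1, 1: -2, 2: 0, 3: 2, 4: -1}

# Replace the OHLC columns with the ordered label, keeping only the date.
dates = pd.date_range("2000-03-06", periods=5, freq="D")
cluster_ids = pd.Series([1, 2, 0, 3, 4])
out = pd.DataFrame({"Date": dates, "Cluster Label": cluster_ids.map(mapping)})
csv_text = out.to_csv(index=False)   # pass a path instead to write the file
```

Variance is not used here; it could serve as a tie-breaker when two centers have nearly the same body size.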

Step 3: Create trends using zigzag indicator:

Download zigzag.ipynb (attached below). It will help you calculate the long-term trend in the price data. The zigzag method expects an OHLC pandas DataFrame and returns the same object with an additional column called 'trend'. The value +1 represents an uptrend and -1 represents a downtrend. Run the program to understand how it works. The plot function will help you visualize the long-term trends.

For this assignment, you will calculate the 10% trend, which you set by passing a parameter to the zigzag function (pct=10.0). The generated trend values are your labels for the classification step.
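The provided zigzag.ipynb is not reproduced here, so the following is not its implementation. It is a minimal sketch of one common percentage-reversal scheme, assuming the trend is computed from Close prices only and that a leg ends when price moves pct percent against the running extreme. The function name `zigzag_trend` is hypothetical, and days before the first pct move would stay 0 if no move ever occurs.

```python
import numpy as np
import pandas as pd


def zigzag_trend(close, pct=10.0):
    """Label each day +1 (up leg) or -1 (down leg) using pct reversals."""
    close = np.asarray(close, dtype=float)
    thr = pct / 100.0
    trend = np.zeros(len(close), dtype=int)
    direction = 0   # 0 until the first pct move away from the first price
    start = 0       # first index of the leg currently being built
    ext = 0         # index of the running extreme of that leg
    for i in range(1, len(close)):
        if direction == 0:
            change = close[i] / close[0] - 1.0
            if change >= thr:
                direction, ext = 1, i
            elif change <= -thr:
                direction, ext = -1, i
        elif direction == 1:
            if close[i] > close[ext]:
                ext = i                              # new high extends the up leg
            elif close[i] <= close[ext] * (1 - thr):
                trend[start:ext + 1] = 1             # up leg confirmed through the high
                start, ext, direction = ext + 1, i, -1
        else:
            if close[i] < close[ext]:
                ext = i                              # new low extends the down leg
            elif close[i] >= close[ext] * (1 + thr):
                trend[start:ext + 1] = -1            # down leg confirmed through the low
                start, ext, direction = ext + 1, i, 1
    trend[start:] = direction                        # unfinished last leg
    return trend


# Made-up closing prices, for illustration only.
df = pd.DataFrame({"Close": [100, 105, 112, 120, 115, 106, 100, 104, 112, 118]})
df["trend"] = zigzag_trend(df["Close"].to_numpy(), pct=10.0)
print(df["trend"].tolist())  # [1, 1, 1, 1, -1, -1, -1, 1, 1, 1]
```

Note that a leg is only labeled after the reversal that ends it, so these labels use future information. That is acceptable for training targets but worth keeping in mind when interpreting results.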

Step 4: Create model that can predict trend

You now have daily data with its trend value. In this step, you will prepare the data of all 30 stocks for training a model that predicts the trend. Each daily sample is represented by its "Cluster label" as input and its "trend" as output. However, the system will not train well unless you give the cluster labels of the past n days as the input vector and predict the "trend" of day n+1 as the output. Ensure that the day for which you are predicting the trend is not included in the past-n-day cluster-label input feature vector.

For each of the 30 stocks, generate your experimental dataset using the method described in the previous paragraph. Then combine the data for all 30 stocks into one large dataset for training.
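The windowing described in Step 4 can be sketched as follows. The helper names `make_windows` and `combine_stocks` and the tiny label sequences are hypothetical; the point is only that row j uses days j through j+n-1 as input and the trend of day j+n as output, so the predicted day never appears in its own feature vector.

```python
import numpy as np


def make_windows(labels, trend, n):
    """Build (X, y): X[j] = labels[j:j+n], y[j] = trend[j+n]."""
    labels = np.asarray(labels)
    trend = np.asarray(trend)
    if len(labels) != len(trend):
        raise ValueError("labels and trend must have the same length")
    if len(labels) <= n:
        raise ValueError("need more than n days of data")
    X = np.stack([labels[j:j + n] for j in range(len(labels) - n)])
    y = trend[n:]
    return X, y


def combine_stocks(per_stock, n):
    """Window each stock separately, then stack, so no window spans two stocks."""
    parts = [make_windows(labels, trend, n) for labels, trend in per_stock]
    X = np.vstack([p[0] for p in parts])
    y = np.concatenate([p[1] for p in parts])
    return X, y


# Two made-up stocks with six days each.
stock_a = ([0, 1, 2, 3, 4, 5], [1, 1, -1, -1, 1, 1])
stock_b = ([2, 2, -1, 0, 1, -2], [-1, -1, -1, 1, 1, 1])

X_a, y_a = make_windows(*stock_a, n=3)
X_all, y_all = combine_stocks([stock_a, stock_b], n=3)
print(X_a.tolist(), y_a.tolist())  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]] [-1, 1, 1]
```

Windowing each stock before concatenating avoids windows that mix the end of one ticker with the start of another.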

Step 5: Train and Test your model 

You will perform n-fold cross-validation to check the consistency of your model. I suggest using n = 7. At the end of your experiment, generate a confusion matrix reporting the accuracies with their mean and standard deviation. At this point, I will provide you with a new ticker symbol to test how well your model generalizes. The accuracy on this file shall be printed after your program calculates the confusion matrix.
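A 7-fold evaluation that reports the mean and standard deviation of the confusion matrix might look like the sketch below. The classifier (a random forest), the helper name `cross_validate`, and the synthetic features are assumptions chosen for illustration; the assignment does not prescribe a model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold


def cross_validate(X, y, n_splits=7, seed=0):
    """Return mean and std of row-normalized confusion matrices, plus fold accuracies."""
    matrices, accuracies = [], []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(X):
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        # Rows are true classes (-1, +1); each row is normalized to sum to 1.
        matrices.append(
            confusion_matrix(y[test_idx], pred, labels=[-1, 1], normalize="true")
        )
        accuracies.append(float(np.mean(pred == y[test_idx])))
    matrices = np.array(matrices)
    return matrices.mean(axis=0), matrices.std(axis=0), np.array(accuracies)


# Synthetic stand-in for the windowed cluster labels and trend targets.
rng = np.random.default_rng(0)
X = rng.integers(-2, 3, size=(700, 5))
y = np.where(X.sum(axis=1) >= 0, 1, -1)

mean_cm, std_cm, accs = cross_validate(X, y)
print("mean confusion matrix:\n", mean_cm)
print("std confusion matrix:\n", std_cm)
print("accuracy: %.3f +/- %.3f" % (accs.mean(), accs.std()))
```

Shuffled folds are used here for simplicity. Because neighboring windows overlap in time, a split by stock or by date range may give a more honest estimate.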

Unformatted Attachment Preview

Jupyter notebook: AsignmentCopy.ipynb

[path assignment illegible in the preview]
all_files = glob.glob(path + "/*.csv")
for filename in all_files:
    name_file = filename[filename.find(' ')+1: filename.find('\\.')-3]
    exec('{} = pd.read_csv(filename, header=0)'.format(name_file))

In [55]: all_st = ['AAPL', 'AXP', 'BA', 'CAT', 'CSCO', 'CVX', 'DIS', 'GS', 'HD', 'IBM', 'INTC', 'JNJ', 'JPM', 'KO', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PFE', 'PG', 'TRV', 'UNH', 'UTX', 'V', 'VZ', 'WBA', 'WMT', 'XOM']

In [58]: #Check existence of the df names in global env
         for x in all_st:
             if ((x in locals()) | (x in globals())):
                 print(x)

In [59]: globals()[all_st[0]]

KeyError                                  Traceback (most recent call last)
----> 1 globals()[all_st[0]]

KeyError: 'AAPL'

In [64]: AAPL.head()
Out[64]:
   Unnamed: 0    Open    High     Low   Close   Volume  Adj_Open  Adj_High   Adj_Low  Adj_Close    Adj_Volume  Divident     Split
0  2000-03-06  126.00  129.13  125.00  125.69  1880000  3.905609  4.002629  3.874612     3.8960  58274.166600       0.0  0.030997
1  2000-03-07  126.44  127.44  121.12  122.87  2437600  3.919259  3.950256  3.754355     3.8086  75558.259624       0.0  0.030997
2  2000-03-08  122.87  123.94  118.56  122.00  2421700  3.808567  3.841734  3.674971     3.7816  75064.760000       0.0  0.030997
3  2000-03-09  120.87  125.00  118.25  122.25  2470700  3.746624  3.874642  3.665411     3.7894  76584.626421       0.0  0.030997
4  2000-03-10  121.69  127.94  121.00  125.75  2219700  3.772051  3.965784  3.750663     3.8979  68804.521909       0.0  0.030997
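The preview creates one global variable per CSV with exec and then looks the names up through globals(), which raises a KeyError when a generated name does not match the ticker. A dictionary keyed by ticker avoids depending on generated variable names. The sketch below assumes each file is named after its ticker (for example AAPL.csv), which the preview does not confirm; the helper name `load_ticker_frames` and the demo files are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_ticker_frames(folder):
    """Return {ticker: DataFrame}, using each CSV's file stem as the ticker."""
    frames = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        frames[csv_path.stem] = pd.read_csv(csv_path, header=0)
    return frames


# Demo on two tiny made-up files so the sketch runs without the course data.
with tempfile.TemporaryDirectory() as tmp:
    for ticker in ("AAPL", "IBM"):
        pd.DataFrame({"Open": [1.0, 2.0], "Close": [1.5, 1.8]}).to_csv(
            Path(tmp) / f"{ticker}.csv", index=False
        )
    frames = load_ticker_frames(tmp)

print(sorted(frames))           # ['AAPL', 'IBM']
print(frames["AAPL"].head())
```

With this layout, checking which tickers loaded is a matter of comparing the dictionary keys against the expected list, for example `[t for t in all_st if t not in frames]`.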

Explanation & Answer

Hey, let me know if you nee...

