GSP-327: Engineer Data in Google Cloud

GSP-327: Engineer Data in Google Cloud

Overview

Please execute the following tasks:

Task - 1: Clean your training data

 1CREATE OR REPLACE TABLE
 2taxirides.<Table_Name_as_mention_in_lab> AS
 3SELECT
 4(tolls_amount + fare_amount) AS <Fare Amount_as_mention_in_lab>,
 5pickup_datetime,
 6pickup_longitude AS pickuplon,
 7pickup_latitude AS pickuplat,
 8dropoff_longitude AS dropofflon,
 9dropoff_latitude AS dropofflat,
10passenger_count AS passengers,
11FROM
12taxirides.historical_taxi_rides_raw
13WHERE
14RAND() < 0.001
15AND trip_distance > 3       [Change_as_mention_in_lab]
16AND fare_amount >= 2.0      [Change_as_mention_in_lab]
17AND pickup_longitude > -78
18AND pickup_longitude < -70
19AND dropoff_longitude > -78
20AND dropoff_longitude < -70
21AND pickup_latitude > 37
22AND pickup_latitude < 45
23AND dropoff_latitude > 37
24AND dropoff_latitude < 45
25AND passenger_count > 3     [Change_as_mention_in_lab]

Task - 2: Create a BQML model called taxirides.fare_model

 1CREATE OR REPLACE MODEL taxirides.<Model Name_as_mention_in_lab>
 2TRANSFORM(
 3* EXCEPT(pickup_datetime)
 4 
 5, ST_Distance(ST_GeogPoint(pickuplon, pickuplat), ST_GeogPoint(dropofflon, dropofflat)) AS euclidean
 6, CAST(EXTRACT(DAYOFWEEK FROM pickup_datetime) AS STRING) AS dayofweek
 7, CAST(EXTRACT(HOUR FROM pickup_datetime) AS STRING) AS hourofday
 8)
 9OPTIONS(input_label_cols=['<Fare Amount_as_mention_in_lab>'], model_type='linear_reg')
10AS
11 
12SELECT * FROM taxirides.<Table_Name_as_mention_in_lab>

Task - 3: Perform a batch prediction on new data

1CREATE OR REPLACE TABLE taxirides.2015_fare_amount_predictions
2AS
3SELECT * FROM ML.PREDICT(MODEL taxirides.<Model Name_as_mention_in_lab>,(
4SELECT * FROM taxirides.report_prediction_data)
5)

Congratulations, you're all done with the lab 😄