  • Machine Learning Dev Log 001
    Uncategorized 2022. 1. 20. 09:43
    import os

    # Kaggle API credentials, read by the kaggle CLI (placeholder key: paste your own)
    os.environ['KAGGLE_USERNAME'] = 'whitesmithdp'  # username
    os.environ['KAGGLE_KEY'] = 'YOUR_KAGGLE_KEY'  # API key
     
     
     
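    First, set the Kaggle username and API key as environment variables; the kaggle CLI reads them to authenticate.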
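    Hardcoding the key means anyone who reads the notebook or this post can use it. A minimal sketch of an alternative, assuming the key is kept in a kaggle.json file in the format Kaggle issues ({"username": ..., "key": ...}); load_kaggle_credentials is a hypothetical helper, not part of the Kaggle package. If kaggle.json already sits at ~/.kaggle/kaggle.json, the kaggle CLI should find it without any environment variables.

```python
import json
import os


def load_kaggle_credentials(path):
    """Hypothetical helper: copy username/key from a kaggle.json-style file
    into KAGGLE_USERNAME / KAGGLE_KEY instead of hardcoding them."""
    with open(path, encoding="utf-8") as f:
        creds = json.load(f)
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]
```

    Usage, assuming an uploaded copy in the working directory: load_kaggle_credentials("kaggle.json").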
     
     
     
    !kaggle datasets download -d rsadiq/salary

    Download the dataset (in a notebook, the ! prefix runs a shell command).

     

     

    !unzip salary.zip
     
    Unpack the downloaded dataset.
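    !unzip only works where a shell with the unzip tool is available; Python's standard zipfile module does the same job in any environment. A minimal sketch; the helper name unpack is hypothetical.

```python
import zipfile


def unpack(zip_path, dest="."):
    """Extract every member of zip_path into dest and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

    Calling unpack("salary.zip") should leave Salary.csv in the current directory, assuming the archive contains it as the next step implies.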
     
     
     
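    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam, SGD  # SGD is used below; Adam is an alternative to try
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.model_selection import train_test_split

    # Load the unzipped CSV (columns used below: YearsExperience, Salary)
    df = pd.read_csv('Salary.csv')

    # In a notebook, this displays the last 5 rows
    df.tail(5)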
     
     
     
    The packages used, followed by loading Salary.csv and previewing its last five rows.
     
     
     
     
     
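    # Convert the two columns to float32 arrays
    x_data = np.array(df['YearsExperience'], dtype=np.float32)
    y_data = np.array(df['Salary'], dtype=np.float32)

    # Reshape into column vectors of shape (n_samples, 1): one feature per row
    x_data = x_data.reshape((-1, 1))
    y_data = y_data.reshape((-1, 1))

    print(x_data.shape)
    print(y_data.shape)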

    # Hold out 20% of the rows for validation; random_state fixes the shuffle
    x_train, x_val, y_train, y_val = train_test_split(x_data, y_data, test_size=0.2, random_state=2021)

    print(x_train.shape, x_val.shape)
    print(y_train.shape, y_val.shape)

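    # A single Dense unit with one input is a linear model: y = w * x + b
    model = Sequential([
      Dense(1)
    ])

    # MSE loss with plain SGD (recent Keras versions expect learning_rate, not lr)
    model.compile(loss='mean_squared_error', optimizer=SGD(learning_rate=0.01))

    model.fit(
        x_train,
        y_train,
        validation_data=(x_val, y_val),  # evaluated automatically at the end of every epoch
        epochs=100  # note the plural: epochs
    )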

     

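    Training on the dataset: both columns are reshaped into (n, 1) arrays, 20% of the rows are held out for validation, and a single-unit linear model is fit with SGD for 100 epochs.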
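    Keras Dense layers expect inputs with a feature axis, i.e. shape (n_samples, n_features), which is why both columns are reshaped with (-1, 1); the -1 lets NumPy infer the row count. The sketch below shows that on made-up toy values, plus a plain-NumPy shuffle-and-slice 80/20 split. split_80_20 is a hypothetical, simplified stand-in for train_test_split, not sklearn's implementation, so it picks different rows for the same seed.

```python
import numpy as np


def split_80_20(x, y, test_size=0.2, seed=2021):
    """Shuffle row indices and hold out test_size of the rows for validation.
    Simplified stand-in for sklearn's train_test_split (different shuffling)."""
    n = len(x)
    n_val = int(np.ceil(n * test_size))  # round the validation count up
    idx = np.random.default_rng(seed).permutation(n)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]


# Toy 1-D columns (made-up numbers, not the Salary.csv data)
years = np.arange(10, dtype=np.float32)
salary = 1000 * years + 30000

x = years.reshape((-1, 1))   # (10,) -> (10, 1)
y = salary.reshape((-1, 1))

x_train, x_val, y_train, y_val = split_80_20(x, y)
print(x.shape, x_train.shape, x_val.shape)
```

    Indexing x and y with the same permutation keeps each (x, y) pair together, which is the property any train/validation split must preserve.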

    Next, change the loss from mean_squared_error to mean_absolute_error (that is, loss='mean_absolute_error' in model.compile).
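    The two losses weigh errors differently: MSE averages squared residuals, so a single large miss dominates, while MAE averages absolute residuals. A small NumPy sketch with made-up numbers:

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return float(np.mean(np.abs(y_true - y_pred)))


y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 6.0])  # one prediction is off by 3

print(mse(y_true, y_pred))  # (0 + 0 + 9) / 3 = 3.0
print(mae(y_true, y_pred))  # (0 + 0 + 3) / 3 = 1.0
```

    One practical consequence: the MSE gradient is proportional to the residual, while the MAE gradient depends only on its sign, so with large targets such as salaries MSE produces much larger update steps for the same learning rate.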
     
     
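    Then try different lr (learning rate) values and re-run the training. In current TensorFlow/Keras the argument is spelled learning_rate, for example SGD(learning_rate=0.001); the old lr keyword is deprecated and recent versions reject it.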
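    Dense(1) on a single feature computes y = w * x + b, and SGD repeatedly moves w and b against the gradient of the loss, scaled by the learning rate. The sketch below mimics that with full-batch gradient descent on MSE in NumPy (Keras' fit actually uses mini-batches of 32 by default), on made-up data, to show how the learning rate changes the outcome. fit_linear is a hypothetical helper.

```python
import numpy as np


def fit_linear(x, y, lr, epochs):
    """Full-batch gradient descent on MSE for y ~ w * x + b (what Dense(1) computes)."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        err = (w * x + b) - y              # residuals
        grad_w = 2.0 / n * np.sum(err * x) # d(MSE)/dw
        grad_b = 2.0 / n * np.sum(err)     # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return float(w), float(b)


# Made-up data on the line y = 2x + 1
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

print(fit_linear(x, y, lr=0.1, epochs=2000))  # converges close to (2.0, 1.0)
print(fit_linear(x, y, lr=5.0, epochs=50))    # steps overshoot and the weights blow up
```

    The usable range depends on the scale of x and y, so the values that work on this toy data say nothing definite about the salary data; the same trade-off applies to SGD(learning_rate=...) in the Keras code.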
     
     
     
    # Predict on the validation inputs
    y_pred = model.predict(x_val)

    plt.scatter(x_val, y_val)              # actual salaries (default color)
    plt.scatter(x_val, y_pred, color='r')  # model predictions in red
    plt.show()

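    The red points are the values the trained model predicts; because Dense(1) with no activation is a linear function of x, they fall on a straight line. The other points are the actual salaries.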
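    The plot is a visual check; a single number is easier to compare across runs with different losses and learning rates. A minimal sketch, assuming y_val and y_pred are NumPy arrays shaped (n, 1) like the ones above; the values below are made-up placeholders and validation_report is a hypothetical helper. Because the model is linear, the learned slope and intercept can also be read back with model.layers[0].get_weights(), which returns the kernel and bias arrays.

```python
import numpy as np


def validation_report(y_val, y_pred):
    """Return MAE and RMSE (square root of MSE, in the same units as y) after flattening."""
    err = np.ravel(y_pred) - np.ravel(y_val)
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return {"mae": mae, "rmse": rmse}


# Placeholder arrays with the (n, 1) shape model.predict returns; not real results
y_val = np.array([[40000.0], [60000.0], [80000.0]])
y_pred = np.array([[42000.0], [58000.0], [83000.0]])
print(validation_report(y_val, y_pred))
```

    Reporting both metrics matches the two losses tried above: RMSE tracks what MSE optimizes, MAE what mean_absolute_error optimizes.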
