

MLOps - Batch Serving

Batch Serving

Batch serving puts a specific model into a batch prediction pipeline and collects its predictions.

 

Predictions produced on an hourly or daily cycle are batch serving.

Responses needed within a second or less are real-time serving.


Generating Batch Data

 

from datetime import datetime

from minio import Minio

# minio client (same endpoint and credentials as the serving script below)
client = Minio("0.0.0.0:9000", access_key="minio", secret_key="miniostorage", secure=False)

#
# upload data to minio
#
bucket_name = "not-predicted"
object_name = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

if not client.bucket_exists(bucket_name):
    client.make_bucket(bucket_name)

client.fput_object(bucket_name, object_name, "batch.csv")

Proceed the same way as in the earlier posts.

The object name is the upload timestamp, and the data file is uploaded under that name.

If the bucket does not exist yet, it is created first and then the file is uploaded.
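
For reference, batch.csv is just a CSV of feature rows with no label column. A minimal sketch of producing it, assuming for illustration the iris features this series' classifier was trained on (swap in your real feature source):

from sklearn.datasets import load_iris

# assumption: the model was trained on iris features; replace this with a pull
# from your actual data source
X, _ = load_iris(return_X_y=True, as_frame=True)

# write a small batch of unlabeled rows to the file uploaded above
X.sample(n=100, random_state=0).to_csv("batch.csv", index=False)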


Batch Serving

 

import os

import mlflow
import pandas as pd
from minio import Minio

# MLflow / MinIO endpoints and credentials (same values used throughout this series)
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://0.0.0.0:9000"
os.environ["MLFLOW_TRACKING_URI"] = "http://0.0.0.0:5001"
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "miniostorage"


def predict(run_id, model_name):
    #
    # load model
    #
    clf = mlflow.pyfunc.load_model(f"runs:/{run_id}/{model_name}")

    #
    # minio client
    #
    url = "0.0.0.0:9000"
    access_key = "minio"
    secret_key = "miniostorage"
    client = Minio(url, access_key=access_key, secret_key=secret_key, secure=False)

    #
    # get data list to predict
    #
    if not client.bucket_exists("predicted"):
        # create the predicted bucket on the first run
        client.make_bucket("predicted")
    not_predicted_list = [obj.object_name for obj in client.list_objects(bucket_name="not-predicted")]
    predicted_list = [obj.object_name for obj in client.list_objects(bucket_name="predicted")]
    to_predict_list = []
    for not_predicted in not_predicted_list:
        if not_predicted not in predicted_list:
            to_predict_list += [not_predicted]

    #
    # predict
    #
    for filename in to_predict_list:
        print("data to predict:", filename)
        # download and read data
        client.fget_object(bucket_name="not-predicted", object_name=filename, file_path=filename)
        data = pd.read_csv(filename)

        # predict
        pred = clf.predict(data)

        # save to minio prediction bucket; keep the same object name as the input
        # so the not-predicted / predicted comparison above stays valid
        pred_filename = f"pred_{filename}"
        pd.DataFrame(pred).to_csv(pred_filename, index=False)
        client.fput_object(bucket_name="predicted", object_name=filename, file_path=pred_filename)


if __name__ == "__main__":
    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument("--run-id", type=str)
    parser.add_argument("--model-name", type=str, default="my_model")
    args = parser.parse_args()

    #
    # predict
    #
    predict(args.run_id, args.model_name)
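
For a quick local sanity check before containerizing, the script can be run directly. The file name local_predict.py is assumed here because it is what the Dockerfile copies in later:

python local_predict.py --run-id <MLFLOW_RUN_ID>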

 

The script compares the not-predicted and predicted buckets: any object that exists in not-predicted but not yet in predicted is downloaded, run through the model, and its result is saved.
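
The same comparison can be written more compactly as a set difference; a small equivalent sketch of that step (the helper name objects_to_predict is just for illustration):

from minio import Minio


def objects_to_predict(client: Minio) -> list[str]:
    # object names present in not-predicted but still missing from predicted
    not_predicted = {obj.object_name for obj in client.list_objects(bucket_name="not-predicted")}
    predicted = {obj.object_name for obj in client.list_objects(bucket_name="predicted")}
    return sorted(not_predicted - predicted)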


So that others can also use the model for serving, ship the serving code as a Docker image.

 

docker build -t batch_predict -f batch.Dockerfile .

The -f flag is used because the project will accumulate several Dockerfiles.
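
The build log further down gives the rough shape of batch.Dockerfile. A sketch reconstructed from those steps; the boto3 pin is truncated in the log so it is left unpinned, and the ENTRYPOINT line is an assumption based on how the container is run below:

FROM amd64/python:3.9-slim

WORKDIR /usr/app/

RUN pip install -U pip && \
    pip install mlflow==2.3.2 boto3

COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY local_predict.py predict.py

# assumption: the run id is passed as a positional docker-run argument,
# matching `docker run batch-serving <run-id>` in the log below
ENTRYPOINT ["python", "predict.py", "--run-id"]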

 

docker run -it batch_predict <MLFLOW_RUN_ID>


(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section1 % docker build -t batch-serving -f batch.Dockerfile .
[+] Building 18.2s (12/12) FINISHED                                 docker:desktop-linux
 => [internal] load build definition from batch.Dockerfile                          0.0s
 => => transferring dockerfile: 386B                                                0.0s
 => [internal] load metadata for docker.io/amd64/python:3.9-slim                    1.5s
 => [auth] amd64/python:pull token for registry-1.docker.io                         0.0s
 => [internal] load .dockerignore                                                   0.0s
 => => transferring context: 2B                                                     0.0s
 => [1/6] FROM docker.io/amd64/python:3.9-slim@sha256:51c781cd11dd1f2a95e2bef833a5  0.0s
 => [internal] load build context                                                   0.0s
 => => transferring context: 2.36kB                                                 0.0s
 => CACHED [2/6] WORKDIR /usr/app/                                                  0.0s
 => CACHED [3/6] RUN pip install -U pip &&    pip install mlflow==2.3.2 boto3==1.2  0.0s
 => [4/6] COPY requirements.txt requirements.txt                                    0.0s
 => [5/6] RUN pip install -r requirements.txt                                      13.1s
 => [6/6] COPY local_predict.py predict.py                                          0.0s
 => exporting to image                                                              3.5s
 => => exporting layers                                                             3.5s
 => => writing image sha256:acccdf80d068eafb0b135f26874367b15e4849009610740ab5e541  0.0s
 => => naming to docker.io/library/batch-serving                                    0.0s

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section1 % docker images
REPOSITORY                        TAG       IMAGE ID       CREATED          SIZE
batch-serving                     latest    acccdf80d068   18 minutes ago   1.16GB
05_model_registry-mlflow-server   latest    424be2774d1b   2 days ago       1.13GB
mlflow-server                     latest    86b81958f80b   6 days ago       963MB
03_experiment-mlflow-server       latest    4f01cd2005e2   6 days ago       963MB
reproduce                         latest    6eaff9538a5c   8 days ago       531MB
minio/minio                       latest    81f7d6495208   12 days ago      147MB
postgres                          14.0      01b2dbb34042   2 years ago      354MB
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section1 % docker run batch-serving 9c067f37a61246ee9bd7b4ff36c7f2df
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.9/http/client.py", line 1285, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fffcd22a730>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 827, in urlopen
    return self.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 827, in urlopen
    return self.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 827, in urlopen
    return self.urlopen(
  [Previous line repeated 2 more times]
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=5001): Max retries exceeded with url: /api/2.0/mlflow/runs/get?run_uuid=9c067f37a61246ee9bd7b4ff36c7f2df&run_id=9c067f37a61246ee9bd7b4ff36c7f2df (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fffcd22a730>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 187, in http_request
    return _get_http_response_with_retries(
  File "/usr/local/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 118, in _get_http_response_with_retries
    return session.request(method, url, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=5001): Max retries exceeded with url: /api/2.0/mlflow/runs/get?run_uuid=9c067f37a61246ee9bd7b4ff36c7f2df&run_id=9c067f37a61246ee9bd7b4ff36c7f2df (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fffcd22a730>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/app/predict.py", line 71, in <module>
    predict(args.run_id, args.model_name)
  File "/usr/app/predict.py", line 17, in predict
    clf = mlflow.pyfunc.load_model(f"runs:/{run_id}/{model_name}")
  File "/usr/local/lib/python3.9/site-packages/mlflow/pyfunc/__init__.py", line 577, in load_model
    local_path = _download_artifact_from_uri(artifact_uri=model_uri, output_path=dst_path)
  File "/usr/local/lib/python3.9/site-packages/mlflow/tracking/artifact_utils.py", line 100, in _download_artifact_from_uri
    return get_artifact_repository(artifact_uri=root_uri).download_artifacts(
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 106, in get_artifact_repository
    return _artifact_repository_registry.get_artifact_repository(artifact_uri)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/artifact_repository_registry.py", line 72, in get_artifact_repository
    return repository(artifact_uri)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/runs_artifact_repo.py", line 26, in __init__
    uri = RunsArtifactRepository.get_underlying_uri(artifact_uri)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/artifact/runs_artifact_repo.py", line 39, in get_underlying_uri
    uri = get_artifact_uri(run_id, artifact_path, tracking_uri)
  File "/usr/local/lib/python3.9/site-packages/mlflow/tracking/artifact_utils.py", line 47, in get_artifact_uri
    run = store.get_run(run_id)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 134, in get_run
    response_proto = self._call_endpoint(GetRun, req_body)
  File "/usr/local/lib/python3.9/site-packages/mlflow/store/tracking/rest_store.py", line 56, in _call_endpoint
    return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
  File "/usr/local/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 296, in call_endpoint
    response = http_request(
  File "/usr/local/lib/python3.9/site-packages/mlflow/utils/rest_utils.py", line 205, in http_request
    raise MlflowException(f"API request to {url} failed with exception {e}")
mlflow.exceptions.MlflowException: API request to http://0.0.0.0:5001/api/2.0/mlflow/runs/get failed with exception HTTPConnectionPool(host='0.0.0.0', port=5001): Max retries exceeded with url: /api/2.0/mlflow/runs/get?run_uuid=9c067f37a61246ee9bd7b4ff36c7f2df&run_id=9c067f37a61246ee9bd7b4ff36c7f2df (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fffcd22a730>: Failed to establish a new connection: [Errno 111] Connection refused'))
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section1 %

The reason the run fails is Docker networking.

The host machine and the containers are isolated from each other.

Containers are also isolated from one another. Inside the batch-serving container, 0.0.0.0:5001 refers to the container itself, not to the MLflow server running in its own container, so the connection is refused.

 

Ah, this one took a lot of effort to solve. Name the files correctly, upload the batch data to MinIO again, rebuild the image, and before all of that make sure to run docker compose up --build.
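
The fix therefore has two parts: run the container on the compose network (the --network flag in the run below), and have the script inside the container reach MLflow and MinIO through names that resolve on that network instead of 0.0.0.0. A rough sketch of the in-container endpoint settings, assuming hypothetical compose service names mlflow-server and mlflow-artifact-store (use whatever names and internal ports your docker-compose.yaml defines); the Minio(...) url inside predict() needs the same change:

import os

# assumption: service names and internal ports come from docker-compose.yaml;
# on the 06_batch_serving_default network they resolve to the other containers
os.environ["MLFLOW_TRACKING_URI"] = "http://mlflow-server:5000"  # internal port, not the host-published 5001
os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://mlflow-artifact-store:9000"
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "miniostorage"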

 

(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section1 % docker run --network 06_batch_serving_default batch-serving 7f68f810112c4b10ae67b9fc73d29f97
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
2024/02/29 09:39:06 WARNING mlflow.pyfunc: The version of Python that the model was saved in, `Python 3.7.6`, differs from the version of Python that is currently running, `Python 3.9.18`, and may be incompatible
/usr/local/lib/python3.9/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0.2 when using version 1.2.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/lib/python3.9/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 1.0.2 when using version 1.2.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
data to predict: 2024-02-29 18:39:00
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section1 %

 

As shown in the log above, the result can now be confirmed.

 

Next, let's store the model inside the Docker image so it can be used immediately without any connection to MLflow; that is, download the model and bake it into the image.

 

import os

import mlflow

os.environ["MLFLOW_S3_ENDPOINT_URL"] = "http://0.0.0.0:9000"
os.environ["MLFLOW_TRACKING_URI"] = "http://0.0.0.0:5001"
os.environ["AWS_ACCESS_KEY_ID"] = "minio"
os.environ["AWS_SECRET_ACCESS_KEY"] = "miniostorage"

if __name__ == "__main__":
    from argparse import ArgumentParser

    parser = ArgumentParser()
    parser.add_argument("--run-id", type=str)
    parser.add_argument("--model-name", type=str, default="my_model")
    args = parser.parse_args()

    mlflow.artifacts.download_artifacts(run_id=args.run_id, artifact_path=args.model_name, dst_path="./downloads")

The code above downloads everything the model needs into the ./downloads directory.
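
The predict script baked into this image (model_predict.py, copied in as predict.py in the build below) then only needs to load the model from the local downloads/ directory instead of going through the tracking server; the MinIO data buckets are still handled exactly as in local_predict.py. A minimal sketch of that one change, assuming the default model name my_model:

import mlflow

# load the model baked into the image under ./downloads/<model-name>,
# so no MLflow tracking server is needed at prediction time
clf = mlflow.pyfunc.load_model("./downloads/my_model")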

(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section2 % docker build -t model_predict -f batch_image.Dockerfile .
[+] Building 115.8s (13/13) FINISHED                                docker:desktop-linux
 => [internal] load build definition from batch_image.Dockerfile                    0.0s
 => => transferring dockerfile: 413B                                                0.0s
 => [internal] load metadata for docker.io/amd64/python:3.9-slim                    1.6s
 => [auth] amd64/python:pull token for registry-1.docker.io                         0.0s
 => [internal] load .dockerignore                                                   0.0s
 => => transferring context: 2B                                                     0.0s
 => [1/7] FROM docker.io/amd64/python:3.9-slim@sha256:51c781cd11dd1f2a95e2bef833a5  0.0s
 => [internal] load build context                                                   0.0s
 => => transferring context: 1.07MB                                                 0.0s
 => CACHED [2/7] WORKDIR /usr/app/                                                  0.0s
 => [3/7] RUN pip install -U pip &&    pip install mlflow==2.3.2 minio==7.1.15     96.6s
 => [4/7] COPY requirements.txt requirements.txt                                    0.1s
 => [5/7] RUN pip install -r requirements.txt                                      15.4s
 => [6/7] COPY downloads/ /usr/app/downloads/                                       0.0s
 => [7/7] COPY model_predict.py predict.py                                          0.0s
 => exporting to image                                                              1.9s
 => => exporting layers                                                             1.9s
 => => writing image sha256:646bf888921600e2f464666069356242671dcae1f189a95a33e148  0.0s
 => => naming to docker.io/library/model_predict                                    0.0s

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section2 % docker network ls
NETWORK ID     NAME                        DRIVER    SCOPE
d1c8c0c7a7fa   03_experiment_default       bridge    local
58e06af341bd   05_model_registry_default   bridge    local
a8a6be375619   06_batch_serving_default    bridge    local
615e3e80795a   bridge                      bridge    local
b0469949ba5a   host                        host      local
659d54bb76fd   none                        null      local
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section2 % docker run --network 06_batch_serving_default model-predict
Unable to find image 'model-predict:latest' locally
docker: Error response from daemon: pull access denied for model-predict, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section2 % docker run --network 06_batch_serving_default model_predict
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
2024/03/01 12:30:24 WARNING mlflow.pyfunc: The version of Python that the model was saved in, `Python 3.7.6`, differs from the version of Python that is currently running, `Python 3.9.18`, and may be incompatible
/usr/local/lib/python3.9/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.0.2 when using version 1.2.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
/usr/local/lib/python3.9/site-packages/sklearn/base.py:318: UserWarning: Trying to unpickle estimator RandomForestClassifier from version 1.0.2 when using version 1.2.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
(myenv) (base) dinoqos@jangjeong-uui-MacBookAir section2 % docker run -it --entrypoint /bin/bash model_predict
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
root@1bfbbbc35117:/usr/app# ls
downloads  predict.py  requirements.txt
root@1bfbbbc35117:/usr/app# ls downloads
my_model
root@1bfbbbc35117:/usr/app#
