Project: Develop an Android mobile application to show patient friendly costs of care

Okay I agree with you @muarachmann, I will be there

@muarachmann I have joined meeting

@muarachmann

I think you got busy , you can message me here, I am available, I will join then

Hi @Darshpreet2000, sorry I had another appointment and couldn’t be available. I am about to test the pipeline on a demo branch (demo-data) with just one hospital in the url and lets see if we get the expected results. We still have plenty (not much) of things to tackle, the flask integration is just a way of testing the api with the client maybe locally or hosted. Do you think we can use it for generating the process files over the gui to keep our tech stack not to far rather than bringing in PyQT

1 Like

@Darshpreet2000, @r0bby, @sunbiz mind explaining me something. I think I am confused maybe how git works these days :slight_smile: . I am cloning the repo using git clone https://gitlab.com/muarachmann/lh-toolkit-cost-of-care-app-data-scraper.git and the default branch is develop, just has the source codes yet on cloning, it looks massive see about 600MB

And when i finally cd in to the directory, and run a ls -lh I see the files don’t even sum up to give a 1MB see

I wish to understand what’s going on and also when I look at my branches, it is only develop which is there and that’s the expected result.

I really can’t help as I didn’t really work with it closely.

However when you work with git, remember that it is a decentralized version control system, that means when you clone, you get a copy of every change made to the repo.

Every change you make results in a new git object since git objects are immutable.

1 Like

@r0bby please can you create back the token ($CI_TOKEN) for the bot, it seem to be expired, and we need it for pushing to the various branches . We keep getting this

remote: HTTP Basic: Access denied
fatal: Authentication failed for 'https://librebot:@gitlab.com/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper.git/'
ERROR: Job failed: exit code 1

@Darshpreet2000 FYI the pipeline looks great see - https://gitlab.com/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/-/jobs/675197476 . I think the minor auth issue should make it pass

1 Like

No problem, we can have it whenever you feel convenient.

Yes, we can generate process files with GUI. I can do that

I will check this by cloning in my system.

Thankyou @muarachmann

Because I created CDM & Data directory earlier which were large in size & I later removed them by commit but they were not removed from Git history of commits. So, that is why repo is same large as earlier.

Pushing a commit which removes files doesn’t make the git repo smaller.

We need to remove those files across the history of the git repo in order for its size to shrink.

Git keeps track of each and every line change made, it makes its history huge

One nitpick here: This is one of those rare circumstances where private messages are not only allowed but encouraged. That said, I will handle it.

1 Like

Okay @r0bby, from next time I will discuss this in private

@muarachmann @sunbiz @judywawira please check the content of this about screen

1 Like

@Darshpreet2000 see the chat.

@Darshpreet2000 see here, I am using CI_SSH_KEY the but pipeline is the same - https://gitlab.com/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/-/jobs/677019928 - Here is my branch - https://gitlab.com/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/-/tree/demo-data.

What could be wrong

Also see -

 col_indices.append(usecols_key.index(col))
ValueError: 'generic name' is not in list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "Process CDM/Alaska/Providence Seward Medical Center/process.py", line 71, in <module>
    a.ProcessXLSX(n, filename, hospitalId, charge,description, category, destination, sheet)
  File "/builds/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/ParseData.py", line 55, in ProcessXLSX
    content = pandas.read_excel(filename, sheet, skiprows=n, usecols=[charge, description], encoding='unicode_escape')
  File "/usr/local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 311, in read_excel
    return io.parse(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 868, in parse
    return self._reader.parse(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/excel/_base.py", line 492, in parse
    parser = TextParser(
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 2201, in TextParser
    return TextFileReader(*args, **kwds)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1126, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 2286, in __init__
    ) = self._infer_columns()
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 2656, in _infer_columns
    columns = self._handle_usecols(columns, columns[0])
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 2714, in _handle_usecols
    _validate_usecols_names(self.usecols, usecols_key)
  File "/usr/local/lib/python3.8/site-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
    raise ValueError(
ValueError: Usecols do not match columns, columns expected but not found: ['generic name', 'unit price']
/builds/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/CDM
Parsing /builds/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/Data/Alaska/Samuel Simmonds Memorial Hospital/SamuelSimmondsCDM.csv
(2583, 3)
No Data Found For /builds/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/Data/Alaska/Providence Valdez Medical Center
/builds/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/CDM/Alaska/PeaceHealth Medical Center Ketchikan Alaska.csv
Parsing 2020_peacehealth_medical_center_ketchikan_alaska.xlsx
(615, 4)
Parsing 2020_peacehealth_medical_center_ketchikan_alaska.xlsx
(235, 4)
/builds/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/CDM/Alaska/PeaceHealth Medical Center Cottage Grove Oregon.csv
Parsing 2020_peacehealth_medical_center_cottage_grove_oregon.xlsx
(329, 4)
Parsing 2020_peacehealth_medical_center_cottage_grove_oregon.xlsx
(255, 4)
Parsing 2020_peacehealth_medical_center_cottage_grove_oregon.xlsx
(77, 4)
/builds/librehealth/toolkit/cost-of-care/lh-toolkit-cost-of-care-app-data-scraper/CDM/Alaska/PeaceHealth Medical Center Eugene Oregon.csv
Parsing 2020_peacehealth_medical_center_eugene_oregon.xlsx
(475, 4)
Parsing 2020_peacehealth_medical_center_eugene_oregon.xlsx
(179, 4)
/builds/librehealth/toolkit/cost-of-care/lh-too

I will fix this, probably CDMs are updated & they have changed column names

Because pushing with an access token is not same as pushing with SSH key. I am also searching on how to push using SSH Key.

What I found is

We have to use GIT_SSH_COMMAND as a variable

I found it in Git Docs Git - git Documentation

GIT_SSH_COMMAND=CI_SSH_KEY git clone example

& then use simple git commands to clone or push

I will test this on my repo & I will create a MR for this

Great, I look forward to it. Also what about the history of the commits? Anything that can be done?

@muarachmann

Yes, I have reduced size of my forked repo by 250 MB using BFG repo cleaner or we can use git filter branch but BFG repo is faster than filter branch

After downloading BFG repo cleaner , I used this command which removed all files in History of commits which were greater than 1 MB

java -jar bfg.jar --strip-blobs-bigger-than 1M some-big-repo.git

I also have a better idea than this

We can create an orphan branch in repo which will start from scratch & we can push to it, so repo size will be 1 MB

So , I think we should create orphan branch instead of using BFG repo cleaner

I fixed CI to use a token. That token cannot be disclosed and would be very, very bad for us if ever leaked.

I hope you ticked the box which allows masking the variables in the CI?

1 Like