Conventional resampling methods
Jack-knife
Bootstrap
Suppose $X$ is calculated directly on the lattice, and $f(X)$ is what we are interested in
usually $f$ is a nonlinear function
Usually, $\langle f(X)\rangle \neq f(\langle X\rangle)$, and propagating the error of $X$ through a nonlinear $f$ analytically is impractical
Error estimation of $f(X)$ 🢩 resampling
Original Datasets: $\{X_i\},\ i = 1, \dots, N_{\rm cnfg}$
time slice: each $X_i$ is a vector over $t = 0, \dots, T-1$
$i$-th Jack-knife sample: $X^J_i = \frac{1}{N-1}\sum_{j \neq i} X_j = \frac{\sum_j X_j - X_i}{N-1}$
Estimator: $\bar{X}^J = \frac{1}{N}\sum_i X^J_i$, error $\sigma = \sqrt{N-1}\ \mathrm{std}(\{X^J_i\})$
$C = (N-1)\,\mathrm{cov}(\{X^J_i\})$: covariance matrix of the Jack-knife samples
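The definitions above in a few lines of numpy — a minimal sketch on fake data; the function name and toy shapes are mine, chosen to match the (Ncnfg, T) layout used below:

```python
import numpy as np

def jackknife(data):
    """Jack-knife resampling of data with shape (Ncnfg, T).

    The i-th sample is the mean of the dataset with configuration i
    removed, computed without a loop via (sum - data_i)/(N-1); the
    error carries the sqrt(N-1) rescaling factor.
    """
    N = data.shape[0]
    smpls = (np.sum(data, axis=0) - data) / (N - 1)
    cntrl = np.mean(smpls, axis=0)
    err = np.sqrt(N - 1) * np.std(smpls, axis=0)
    return smpls, cntrl, err

rng = np.random.default_rng(0)
toy = rng.normal(1.0, 0.1, size=(50, 8))  # 50 fake configs, 8 time slices
smpls, cntrl, err = jackknife(toy)
```

For uncorrelated data this reproduces the usual standard error of the mean; its real advantage is that $f$ can be applied to each sample before taking the spread.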
Vectorize your code instead of using loops
#!/usr/bin/env python3
# encoding: utf-8
import numpy as np
from os import listdir
from iog_reader import iog_read
from re import match
import cProfile
from functools import reduce

iog_path = "./Data/"
T = 72
files = listdir(iog_path)
iogs = [file for file in files if match(r".*2pt.*\.dat\.iog", file) is not None]
Ncnfg = len(iogs)

# read the real part of the 2pt function for each configuration
c2pt = np.zeros((Ncnfg, T))
for indx in range(Ncnfg):
    dat = iog_read(iog_path + iogs[indx], ["cnfg", "hdrn", "t"])
    c2pt[indx] = dat.loc[dat["hdrn"] == "0"]["Re"].to_numpy()

c2pt_jcknf_1 = np.zeros((Ncnfg, T))
c2pt_jcknf_2 = np.zeros((Ncnfg, T))
c2pt_jcknf_3 = np.zeros((Ncnfg, T))

def c2pt_jcknf_loop_1():  # double loop over configs and time slices
    for i in range(Ncnfg):
        for j in range(T):
            c2pt_jcknf_1[i, j] = np.mean(np.delete(c2pt, [i], axis=0)[:, j])

def c2pt_jcknf_loop_2():  # single loop over configs
    for i in range(Ncnfg):
        c2pt_jcknf_2[i] = np.mean(np.delete(c2pt, [i], axis=0), axis=0)

def c2pt_jcknf():  # fully vectorized, no explicit loop
    global c2pt_jcknf_3
    c2pt_sum = np.sum(c2pt, axis=0)
    c2pt_jcknf_3 = (c2pt_sum - c2pt) / (Ncnfg - 1)

cProfile.run('c2pt_jcknf_loop_1()')
print(" ------ \n")
cProfile.run('c2pt_jcknf_loop_2()')
print(" ------ \n")
cProfile.run('c2pt_jcknf()')
print(" ------ \n")

# check that the three methods agree to a relative precision of 1e-8
print(reduce(lambda x, y: x & y, np.fabs((c2pt_jcknf_1 - c2pt_jcknf_2) / c2pt_jcknf_2) < 1e-8))
print(reduce(lambda x, y: x & y, np.fabs((c2pt_jcknf_2 - c2pt_jcknf_3) / c2pt_jcknf_2) < 1e-8))
Output:
115780 function calls in 0.082 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
4824 0.003 0.000 0.039 0.000 <__array_function__ internals>:177(delete)
4824 0.002 0.000 0.036 0.000 <__array_function__ internals>:177(mean)
......
------
1612 function calls in 0.001 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
67 0.000 0.000 0.001 0.000 <__array_function__ internals>:177(delete)
67 0.000 0.000 0.001 0.000 <__array_function__ internals>:177(mean)
......
------
13 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <__array_function__ internals>:177(sum)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
......
------
[ True True True True True True True True True True True True ......
Original Datasets: $\{X_i\},\ i = 1, \dots, N_{\rm cnfg}$
time slice: each $X_i$ is a vector over $t$
$b$-th Bootstrap sample: draw $N_{\rm cnfg}$ configurations with replacement and average, $X^B_b = \frac{1}{N}\sum_{k=1}^{N} X_{i_k}$
Repeat the resampling $N_{\rm bs}$ times; usually $N_{\rm bs}$ is several hundred or more
$\{X^B_b\}$ denote the resampled data
Calculate $f(X^B_b)$ for each sample
Central value: mean of $\{f(X^B_b)\}$
Error estimation: std of $\{f(X^B_b)\}$
Jack-knife resample 🢩 error rescaled by $\sqrt{N-1}$
Bootstrap resample 🢩 error is the plain std of the samples
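A minimal numpy sketch of the bootstrap recipe above (the function name, the default Nbs, and the toy data are mine):

```python
import numpy as np

def bootstrap(data, Nbs=500, rng=None):
    """Bootstrap resampling of data with shape (Ncnfg, T).

    Each of the Nbs samples is the mean over Ncnfg configurations drawn
    with replacement; the central value and error are the plain mean and
    std of the samples (no sqrt(N-1) factor, unlike jack-knife).
    """
    if rng is None:
        rng = np.random.default_rng()
    N = data.shape[0]
    idx = rng.integers(0, N, size=(Nbs, N))  # (Nbs, Ncnfg) resampling table
    smpls = np.mean(data[idx], axis=1)       # (Nbs, T) bootstrap samples
    return smpls, np.mean(smpls, axis=0), np.std(smpls, axis=0)

rng = np.random.default_rng(0)
toy = rng.normal(1.0, 0.1, size=(50, 8))  # 50 fake configs, 8 time slices
bs_smpls, bs_cntrl, bs_err = bootstrap(toy, Nbs=500, rng=rng)
```

The fancy-indexing step `data[idx]` builds all Nbs resampled datasets at once, so no Python loop over samples is needed.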
Parallelize your code to do the resampling and fitting
🢩 over time slice indices
🢩 over cnfg indices
$t$: time slices (w.r.t. the source)
$p$: fit parameters 🢩 e.g. $c_0, E_0, c_1, \Delta E$
e.g. for the two-point function (two-state fit): $C_2(t) = c_0\, e^{-E_0 t}\left(1 + c_1\, e^{-\Delta E\, t}\right)$
- $\bar{C}_2(t)$ 🢩 mean value of the $t$-th time slices
- $\sigma_{C_2}(t)$ 🢩 standard deviation of the $t$-th time slices
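The two-state fit mentioned above can be sketched as a plain python function; the symbols c0, E0, c1, dE and the ansatz C2(t) = c0·exp(-E0·t)·(1 + c1·exp(-dE·t)) are one standard choice, not necessarily the exact model on the slide:

```python
import numpy as np

def c2pt_two_state(t, c0, E0, c1, dE):
    """Two-state ansatz C2(t) = c0*exp(-E0*t)*(1 + c1*exp(-dE*t));
    dE = E1 - E0 is the gap to the first excited state, whose
    contribution dies off at large t."""
    return c0 * np.exp(-E0 * t) * (1.0 + c1 * np.exp(-dE * t))

t = np.arange(32)
c2 = c2pt_two_state(t, c0=0.25, E0=0.11, c1=0.5, dE=0.4)
```

At large t the log ratio log(C2(t)/C2(t+1)) approaches the ground-state energy E0, which is why the fit window starts well away from the source.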
python array indices start from 0
the numpy package has built-in mean, std, cov functions
numpy: numpy.mean, numpy.std, numpy.cov support high-rank tensors
axis parameter in numpy functions: if R, R' are numpy arrays, S = np.mean(R, axis=1) averages over the second axis
R*R' 🢩 elementwise product; check R.shape, S.shape
$\chi^2$ 🢩 measures the deviation between data and model
least-squares fit 🢩 optimize the parameters for a minimal BUT reasonable $\chi^2$
lsqfit 🢩 a python package performing least-squares fits
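A minimal illustration of the axis parameter (the shapes are my own example):

```python
import numpy as np

R = np.arange(12.0).reshape(4, 3)  # e.g. 4 configurations x 3 time slices
Rp = np.ones((4, 3))

S = np.mean(R, axis=1)  # average over axis 1 (time)    -> shape (4,)
M = np.mean(R, axis=0)  # average over axis 0 (configs) -> shape (3,)
P = R * Rp              # elementwise product; shapes must broadcast
```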
fit = lsqfit.nonlinear_fit(data=(var_val, dat_val),
fcn=model, prior=ini_val, debug=False)
var_val: values of the variables 🢩 $x$-axis
dat_val: values of the data (with errors) 🢩 $y$-axis (with err/cov)
model: fit model function
ini_val: initial values of (also constraints on) the fit parameters in model

lsqfit - an example

#!/usr/bin/env python3
# encoding: utf-8
import gvar as gv
import lsqfit
import numpy as np
import matplotlib.pyplot as plt
c2pt_raw = np.load("./data_2pt.npy")
c2pt = c2pt_raw[:, 0, :, -2]
Ncnfg = c2pt.shape[0]
T = c2pt.shape[1]
T_hlf = T//2
#? jack-knife resample
#! only used in plots, to compare the original data with the fit
c2pt_sum = np.sum(c2pt,axis=0)
c2pt_jcknf = ((c2pt_sum) - c2pt)/(Ncnfg-1) # jack-knife resample
c2pt_cntrl = np.mean(c2pt_jcknf,axis=0) # jack-knife mean
c2pt_err = np.sqrt(Ncnfg-1)*np.std(c2pt_jcknf,axis=0) # jack-knife std
#? drop the first data point, normalized to c2[1] ~ 1
t_ary = np.array(range(1,T))
c2pt_jcknf = c2pt_jcknf/c2pt_cntrl[1]
c2pt_cntrl = np.mean(c2pt_jcknf,axis=0) # jack-knife mean
c2pt_err = np.sqrt(Ncnfg-1)*np.std(c2pt_jcknf,axis=0) # jack-knife std
#? average forward/backward propagation, for better symmetry
c2pt_jcknf_fwrd = c2pt_jcknf[:,1:T_hlf+1] #? forward propagation part
c2pt_jcknf_bwrd = c2pt_jcknf[:,T_hlf:][:,::-1] #? backward propagation part
c2pt_jcknf_avg = (c2pt_jcknf_fwrd+c2pt_jcknf_bwrd)/2 #? average forward/backward propagation
c2pt_avg_cntrl = np.mean(c2pt_jcknf_avg,axis=0) # jack-knife mean
c2pt_avg_cov = (Ncnfg-1)*np.cov(np.transpose(c2pt_jcknf_avg,axes=(1,0))) # jack-knife covariance
T_strt = 10 #? starting point of the fit
T_end = 30 #? ending point of the fit
tsep_dctnry = {'c2pt': t_ary[T_strt:T_end]}
c2pt_dctnry = {'c2pt': gv.gvar(c2pt_avg_cntrl[T_strt:T_end], c2pt_avg_cov[T_strt:T_end, T_strt:T_end])}
def ft_mdls(t_dctnry, p):
    mdls = {}
    ts = t_dctnry['c2pt']
    mdls['c2pt'] = p['c0']*np.exp(-p['E0']*T_hlf/2)*np.cosh(p['E0']*(ts-T_hlf))
    return mdls
ini_prr = gv.gvar({'c0': '0(5)','E0': '0.2(0.5)'}) #? initial value
fit = lsqfit.nonlinear_fit(data=(tsep_dctnry, c2pt_dctnry), fcn=ft_mdls, prior=ini_prr, debug=True) #? execute the fit
print(fit.format(True)) #? print out fit results
#? time slices used in fit
t_ary = fit.data[0]['c2pt']
#? data to be fitted
c2pt_avg_dat_gvar = fit.data[1]['c2pt']
c2pt_avg_dat_cntrl = np.array([c2.mean for c2 in c2pt_avg_dat_gvar])
c2pt_avg_dat_err = np.array([c2.sdev for c2 in c2pt_avg_dat_gvar])
#? fitted function values
t_lst = np.linspace(10, T-10, 50)
c2pt_fit_fcn_gvar = fit.fcn({'c2pt':t_lst}, fit.p)['c2pt']
c2pt_fit_fcn_cntrl = np.array([c2.mean for c2 in c2pt_fit_fcn_gvar])
c2pt_fit_fcn_err = np.array([c2.sdev for c2 in c2pt_fit_fcn_gvar])
#? plots
fig, ax = plt.subplots(1,1, figsize=(10, 7*0.5))
ax.errorbar(np.array(range(0,T)),c2pt_cntrl,yerr=c2pt_err,fmt='bo',alpha=0.4,label="$C_2$") #? plot original data
ax.errorbar(t_ary,c2pt_avg_dat_cntrl,yerr=c2pt_avg_dat_err,fmt='go',alpha=0.5,label="frwrd/bckwrd avg. $C_2$") #? plot forward/backward averaged data
ax.plot(t_lst,c2pt_fit_fcn_cntrl,color="b",label="best fit") #? plot fitted function
ax.fill_between(t_lst,c2pt_fit_fcn_cntrl-c2pt_fit_fcn_err,c2pt_fit_fcn_cntrl+c2pt_fit_fcn_err,alpha=0.3) #? plot fitted function errorband
plt.yscale("log")
plt.xlabel('t/a')
plt.ylabel('$C_2$')
ax.legend(loc='upper center', fontsize=10, frameon=True, fancybox=True, framealpha=0.8, borderpad=0.3, \
ncol=1, markerfirst=True, markerscale=1, numpoints=1, handlelength=1.5)
fig.subplots_adjust(left=0.15, right=0.9, top=0.9, bottom=0.15)
fig.savefig("c2pt_fit.png")
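As a cross-check of the fitted E0, one can compute an effective mass directly from the (resampled) correlator; the log-ratio definition below is one standard choice, and the function name is mine:

```python
import numpy as np

def eff_mass_log(c2pt_smpls):
    """m_eff(t) = log(C(t)/C(t+1)) for each resampled correlator row;
    valid far from the lattice midpoint, where backward propagation
    is negligible."""
    return np.log(c2pt_smpls[:, :-1] / c2pt_smpls[:, 1:])

# synthetic check: a pure exponential gives a flat effective mass
t = np.arange(20)
c2_synth = 0.25 * np.exp(-0.11 * t)
m_eff = eff_mass_log(np.tile(c2_synth, (10, 1)))  # 10 identical fake samples
```

Applied to the jack-knife samples, the spread of m_eff over rows (with the √(N−1) factor) gives the effective-mass error band, which the fitted E0 should cross in the plateau region.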
var_val, dat_val, model, ini_val: dictionaries
dictionary {key_1: value_1, key_2: value_2, ... }
key 🢩 label of a data set
value 🢩 the corresponding data
Why wrap data in dictionaries?
Combined fit: e.g. 2 data sets, 2 models (with shared parameters)
| label | label1 | label2 |
|---|---|---|
| data | $y^{(1)}$ | $y^{(2)}$ |
| model | $f_1(x; p, q)$ | $f_2(x; p, r)$ |
| ini_val | $p, q$ | $p, r$ |
Wrapping data for combined fit
var_val = {'label1': [x1_1,x1_2,...], 'label2': [x2_1,x2_2,...]}
dat_val = {'label1': [y1_1,y1_2,...], 'label2': [y2_1,y2_2,...]}
prior = {'p_1': p_1,'p_2': p_2, ..., 'q_1': q_1, 'q_2': q_2, ... ,'r_1': r_1, 'r_2': r_2}
y_1,2: gvar class 🢩 y_1 = gvar(y_1_cntrl, y_1_cov) ...
p_1,2: gvar class 🢩 p_1 = gvar(p_1_cntrl, p_1_err) ...
Feed the var_val, dat_val and prior to lsqfit.nonlinear_fit
Wrapping model for combined fit
def fit_mdls(t_dctnry, p):
    mdls = {}
    mdls['label1'] = ...
    mdls['label2'] = ...
    return mdls
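A concrete (invented) instance of this pattern, runnable without lsqfit: two data sets whose model functions share the parameter 'E0':

```python
import numpy as np

# two invented data sets whose models share the parameter 'E0'
var_val = {'label1': np.arange(1.0, 11.0), 'label2': np.arange(1.0, 16.0)}

def fit_mdls(t_dctnry, p):
    """One entry per data set; both entries read the shared p['E0'],
    while 'c0' and 'c1' each belong to one data set."""
    mdls = {}
    mdls['label1'] = p['c0'] * np.exp(-p['E0'] * t_dctnry['label1'])
    mdls['label2'] = p['c1'] * np.exp(-p['E0'] * t_dctnry['label2'])
    return mdls

pars = {'c0': 0.25, 'c1': 0.4, 'E0': 0.11}
mdls = fit_mdls(var_val, pars)
```

Because lsqfit matches dictionary keys between dat_val and the model's return value, sharing a parameter across data sets needs no extra machinery: both entries simply read the same key of p.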
Put multiple sources in the time direction to increase the statistics
Average over the sources within each config (iog file)
Example of open boundary conditions, two sources
Least Square Fit:
chi2/dof [dof] = 0.96 [20] Q = 0.5 logGBF = 116.43
Parameters:
c0 0.252 (10) [ 0.0 (5.0) ]
E0 0.1090 (27) [ 0.20 (50) ]
Errors have been magnified by a factor of 5 in the figure
Fitted effective mass:
Least Square Fit:
chi2/dof [dof] = 0.96 [20] Q = 0.5 logGBF = 116.43
Parameters:
c0 0.252 (10) [ 0.0 (5.0) ]
E0 0.1090 (27) [ 0.20 (50) ]
Fit:
key y[key] f(p)[key]
----------------------------------------
c2pt 0 0.277 (12) 0.2712 (79)
1 0.248 (11) 0.2435 (74)
2 0.223 (11) 0.2186 (69)
3 0.199 (10) 0.1964 (66)
4 0.1790 (94) 0.1765 (62)
5 0.1610 (87) 0.1586 (59)
6 0.1446 (79) 0.1427 (56)
7 0.1304 (75) 0.1284 (54)
8 0.1180 (71) 0.1157 (51)
9 0.1073 (68) 0.1044 (49)
10 0.0981 (68) 0.0943 (47)
11 0.0900 (69) 0.0853 (44)
12 0.0825 (68) 0.0773 (42)
13 0.0758 (66) 0.0703 (41)
14 0.0701 (64) 0.0641 (39)
15 0.0649 (62) 0.0586 (37) *
16 0.0603 (61) 0.0539 (36) *
17 0.0562 (58) 0.0498 (35) *
18 0.0523 (55) 0.0462 (33) *
19 0.0490 (52) 0.0433 (32) *
Settings:
svdcut/n = 1e-12/0 tol = (1e-08,1e-10,1e-10*) (itns/time = 9/0.1)
fitter = scipy_least_squares method = trf
Print this detailed log by calling print(fit.format(True))
fit = lsqfit.nonlinear_fit(data=(tsep_dctnry, c2pt_dctnry),
fcn=ft_mdls, prior=ini_prr, debug=True)
c2pt_fit_gvar = fit.fcn({'c2pt':t_lst}, fit.p)['c2pt']
tsep_dctnry = {'c2pt':np.array(range(T_strt,T_end))}
fit.p 🢩 fit parameters, a dictionary, e.g. fit.p['E0']

Case 1
Least Square Fit:
chi2/dof [dof] = 13 [10]
Q = 5.2e-24
logGBF = -74.281
Underfit: the model fails to describe the data
Case 2
Least Square Fit:
chi2/dof [dof] = 0.07 [10]
Q = 1
logGBF = -46.654
Case 3
Least Square Fit:
chi2/dof [dof] = 1 [10]
Q = 0.41
logGBF = -15.05
lsqfit log

Least Square Fit:
chi2/dof [dof] = 1 [10] Q = 0.41 logGBF = -15.05
chi2/dof: better around 1
Q: the probability that the $\chi^2$ from the fit could have been larger, by chance, assuming the best-fit model is correct
logGBF: log of the Gaussian Bayes Factor 🢩 the probability of obtaining the data by randomly sampling the fitted model
The fit with the larger logGBF is the one preferred by the data.

Exercise:
- use meaningful names: c2pt_cntrl, c2pt_cov, t_ary v.s. a, b, t
- copy the data from the home directory /dssg/home/acct-phyww/phyww/qazhang/training_camp/class1_xiong/Exercise_1 to the created directory
- use lsqfit to fit the pion's 2pt data and determine the ground state energy
- useful: iog_reader, pandas.DataFrame.loc, pandas.DataFrame.to_numpy, numpy.mean, numpy.cov, numpy.std, lsqfit.nonlinear_fit
Data contains the 50 two-point function iog data files, from 50 configurations
72 lattice spacings in the time direction
iog_reader is provided in iog_read_demo.py
x = gv.gvar('2(0.5)'): x.mean 🢩 2.00, x.sdev 🢩 0.50
Construct a gvar from a tuple: x = gv.gvar((2, 0.5))
Python indices start from 0; Mathematica and Julia start from 1
numpy.ndarray slices and range do not include the last index, e.g.
a[2:5] 🢩 [a2, a3, a4]; range(2,5) 🢩 [2, 3, 4]
iog_read.py returns a pandas.DataFrame; the columns in this data frame are strings (not integers)
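The indexing conventions above, as a runnable check:

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5])
s = a[2:5]             # last index excluded -> elements 2, 3, 4
r = list(range(2, 5))  # same convention
```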
command --help and man command
man for "manual": man --help, man man
pwd (present working directory) 🢩 e.g. /home/chimaoshu
relative paths: ./demo/c2pt_fit.png, ../LPC_trnsvrsty; ~ 🢩 /home/USR..., also in combination: ../..
cd path, ls path
touch file_name 🢩 create under PWD; touch path/file_name 🢩 create under path
mkdir dir_name 🢩 create under PWD; mkdir path/dir_name 🢩 create under path
rm file_name, rm path/file_name; rm -r dir_name, rm -r path/dir_name, rm -rf
cp to copy, and move (rename) with mv
cat: show the contents of a file
cat ./demo/lsqfit_2pt_iog.py
grep: catch specified contents of a file
cat ./demo/lsqfit_2pt_iog.py | grep gvar
'|': pipe
'>': redirect the output stream
./lsqfit_2pt.py > lsq.log 2>&1
2>&1: also redirect error messages into the same stream
tar: see tar --help
ssh usr@host, e.g. ssh phyzjun-user3@sylogin.hpc.sjtu.edu.cn
usr: username, host: remote machine
use a key instead of a password
ssh-keygen: create an RSA key pair (public and private)
give id_rsa.pub to others; keep id_rsa private
ssh-copy-id usr@host: install your pubkey on the host
~/.ssh/config:
Host sysjtu
HostName sylogin.hpc.sjtu.edu.cn
User phyzjun-user3
IdentityFile /home/chimaoshu/.ssh/id_rsa
AddKeysToAgent yes
ssh sysjtu
powershell has built-in ssh.exe, ssh-keygen.exe, scp.exe
ssh-copy-id equivalent: type path_to_id_rsa.pub | ssh sysjtu "cat >> ~/.ssh/authorized_keys"
scp source target (scp = s(ssh) + cp), e.g.
scp ./demo/lsqfit_2pt.py sysjtu:~/demo
scp -r sysjtu:~/demo /home/chimaoshu
sysjtu could be replaced by usr@host if .ssh/config is not set
vi, vim
Q: How do you generate a random string?
A: Put a Windows user in front of vi, and tell them to exit
Download to local 🢩 edit locally 🢩 upload to remote 🢩 compile/execute 🢩 download to local, ...
Recommended: vscode + remote development extension
https://code.visualstudio.com/docs/remote/ssh-tutorial
better set.ssh/config
Extensions
module avail: list available modules
module load mod: load mod
module list: list currently loaded modules
module purge: unload all modules
h5glance: installed via pip: pip3 install h5glance
h5glance h5_file [path]; path: unix-like path to the data (string)
h5glance --attrs h5_file [path]
h5glance -s slice --attrs h5_file path
slice: numpy-like slice, e.g.: 2,10:15

HDF5: Hierarchical Data Format Version 5
High-performance data management and storage suite
Utilize the HDF5 high performance data software library and file format to manage, process, and store your heterogeneous data. HDF5 is built for fast I/O processing and storage.
Open source library
Supported by C/C++, Julia, Mathematica, Python ...
In[1]:= c = 1 + 0.2*I;
In[2]:= Export["cmplx.h5", "/cmplx" -> c];
In[3]:= Import["cmplx.h5", {"Datasets", "/cmplx"}]
Out[3]= <|"Re" -> 1., "Im" -> 0.2|>
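Assuming the h5py package is installed, the same round trip can be done in python; the file name and dataset path mirror the Mathematica example above:

```python
import numpy as np
import h5py

# write a complex number to the dataset /cmplx, then read it back;
# h5py stores complex values as a compound Re/Im type on disk
c = 1 + 0.2j
with h5py.File("cmplx.h5", "w") as f:
    f.create_dataset("/cmplx", data=np.complex128(c))

with h5py.File("cmplx.h5", "r") as f:
    c_read = f["/cmplx"][()]
```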
gzip compression 🢩 supported in h5py by setting a filter
h5serv
Python and HDF5, Andrew Collette, ISBN: 9781449367831 (2013)
Based on iog.so, iog_reader.py, requiring h5py python package
iog2h5_2pt.py: convert 2pt iog files to h5
iog2h5_3pt.py: convert 3pt iog files to h5
provided on demand