Discussion:
[JSch-users] Enormous traffic when uploading file to non-empty folder
Milano Nicolum
2016-08-23 10:24:48 UTC
Hi, I'm currently experiencing an issue with my company's file-transferring
OSGi bundle, which uses the JSch library. Normally it should only upload some
archived log files once every few minutes. But as more files accumulated in
the target folder, the data consumption on my device grew proportionally.
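
For reference, a plain JSch upload without any extra calls looks roughly like
this (just a sketch, not the actual bundle code; host, credentials and file
names below are placeholders):

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;
import java.io.FileInputStream;

public class UploadSketch {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        // Placeholder host and credentials, not the real environment.
        Session session = jsch.getSession("milano", "sftp.example.com", 22);
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no"); // sketch only
        session.connect();

        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        try (FileInputStream in = new FileInputStream("logs.tar.gz")) {
            // One put(), no directory listing involved.
            sftp.put(in, "/home/milano/upload_test/logs.tar.gz");
        } finally {
            sftp.disconnect();
            session.disconnect();
        }
    }
}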

E.g. if there are only 40 files in the upload folder, everything seems to be
OK, so nobody noticed any problem. But once there are 7000 files or so, the
*incoming* data consumption gets really high. The log files amount to 20-35 MB
per day, yet my device downloads about 2.5 GB of data per day, with no other
networking application active.

Of course there may be an error in my code (which I'm trying to track down
right now), but I wonder if anyone has experienced a similar problem.

Thanks for any tips.
Milano Nicolum
2016-08-24 07:34:32 UTC
After further investigation I tested two scenarios of uploading a file to
the SFTP server. In scenario one I had only a handful of files in the
upload directory; in the second scenario there were 6400 additional tiny files.

*Scenario 1:* lstat takes about 1 second to execute. Tcpdump shows 907020
bytes of traffic for an 800 KB file...
2016-08-23 13:19:28,655 [TRACE] SFTP Operation - Get file attributes
(lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 13:19:30,844 [TRACE] SFTP Operation - Getting file attributes
finished.
2016-08-23 13:19:30,877 [TRACE] SFTP Operation - Get file attributes
(lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 13:19:32,044 [TRACE] SFTP Operation - Getting file attributes
finished.

*Scenario 2:* lstat takes about 110 seconds to execute. Tcpdump shows 4130593
bytes of traffic for an 800 KB file...
2016-08-23 14:05:10,222 [TRACE] SFTP Operation - Get file attributes
(lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 14:06:58,868 [TRACE] SFTP Operation - Getting file attributes
finished.
2016-08-23 14:06:59,336 [TRACE] SFTP Operation - Get file attributes
(lstat): /home/milano/upload_test/Clean_code.pptx
2016-08-23 14:08:45,189 [TRACE] SFTP Operation - Getting file attributes
finished.

So the upload itself took about the same time in the end, but it was the lstat
operation that used a lot of data. Now I have to find out how to avoid such
behaviour...
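
For reference, the call can also be timed in isolation with a few lines (a
sketch only, assuming an already connected ChannelSftp named sftp and the
imports for com.jcraft.jsch.SftpATTRS):

// Sketch: time a single lstat, independent of the rest of the bundle.
String path = "/home/milano/upload_test/Clean_code.pptx";
long start = System.nanoTime();
SftpATTRS attrs = sftp.lstat(path); // one SSH_FXP_LSTAT request/response
long millis = (System.nanoTime() - start) / 1_000_000;
System.out.println("lstat: " + millis + " ms, size = " + attrs.getSize() + " bytes");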
Milano Nicolum
2016-08-24 09:57:02 UTC
Another symptom of the issue is that listing the directory content via an
sftp.ls(sftpAbsolutePath) call takes approx. 100 seconds; tcpdump says it
transfers about 1555 KB server -> client and 58 KB client -> server.

If I connect to the server via SSH (using PuTTY), run the ll command in the
console and redirect its output to a text file, the resulting file is 768 KB
and appears instantly. I understand that there is always a difference in the
computing time, partly because the computing is done on the server or the
client side (even though the difference here is huge). But where is the
additional data traffic coming from? Approximately twice as much data is
transferred. Any ideas?
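
For completeness, the listing can be timed on its own as well (again just a
sketch, assuming a connected ChannelSftp named sftp and the same path as above):

long start = System.currentTimeMillis();
java.util.Vector<?> entries = sftp.ls(sftpAbsolutePath); // the call measured above
long elapsed = System.currentTimeMillis() - start;
System.out.println(entries.size() + " entries in " + elapsed + " ms");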

P.S. I wouldn't expect any issues with encoding, as both machines run some
Unix derivative. But who knows. Is there any possibility of such a problem?
Lothar Kimmeringer
2016-08-24 10:19:54 UTC
> Another symptom of the issue is that listing the directory content via an
> sftp.ls(sftpAbsolutePath) call takes approx. 100 seconds; tcpdump says it
> transfers about 1555 KB server -> client and 58 KB client -> server.
> If I connect to the server via SSH (using PuTTY), run the ll command in the
> console and redirect its output to a text file, the resulting file is 768 KB
> and appears instantly.
SFTP is a different thing than an SSH shell, so the two are not easy to compare.
What was the exact command for listing the files in the shell?
Since you ran tcpdump, where exactly were the 100 seconds spent: while
waiting for data from the server, or in a slower transfer of the bytes?
> I understand that there is always a difference in the computing time, partly
> because the computing is done on the server or the client side (even though
> the difference here is huge). But where is the additional data traffic coming
> from?
A reason might be that you are comparing apples to pears. If you entered
ls subdir
in the shell but you are listing the directory with
/home/user/subdir
in SFTP, the SFTP server will most likely return the filenames including
their paths, while the shell only returns the bare names. That alone would
easily explain the increased amount of data. There is also additional data per
file containing the attributes (see the next paragraph); with short filenames
this easily becomes the majority of the transferred data.
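
You can see what comes back per entry directly in JSch: each element of the
Vector returned by ls() is a ChannelSftp.LsEntry carrying the name, an
ls-style "longname" line and a full attribute block. A rough sketch (assuming
a connected ChannelSftp named sftp):

java.util.Vector<?> entries = sftp.ls("/home/milano/upload_test"); // path from your logs
for (Object o : entries) {
    ChannelSftp.LsEntry e = (ChannelSftp.LsEntry) o;
    // name + "longname" (an ls -l style line) + SftpATTRS: size, uid/gid,
    // permissions, access/modification times, extended attributes
    System.out.println(e.getFilename() + " | " + e.getLongname()
            + " | " + e.getAttrs().getSize() + " bytes");
}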

If the time was lost while waiting for the server, the reason might be that a
file listing in SFTP provides more information than the resulting listing in
a shell. For every file the file attributes are retrieved from the file
system, containing things like last access time, "extended attributes", etc.
Some of them are part of the directory entry itself and can be accessed
quickly. Others are not that easy to retrieve (or at least need an extra
file-system access per file) and might be the reason why your listing takes
so much longer for directories with many files.

This is not an SFTP-specific problem; you can run into the same issue when
using the find command with search criteria for last access time, etc.
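
To illustrate with plain Java (just an analogy, not what an SFTP server
actually runs): reading the names out of a directory is cheap, but fetching
the attributes needs an extra file-system access per entry, which is usually
where the time goes:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

public class ListingCost {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/home/milano/upload_test"); // placeholder directory
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                // The name comes from the directory entry itself (cheap)...
                String name = p.getFileName().toString();
                // ...but size, times etc. need an extra stat() per file.
                BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
                System.out.println(name + "  " + a.size() + " bytes, last access "
                        + a.lastAccessTime());
            }
        }
    }
}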


Cheers, Lothar

Milano Nicolum
2016-08-25 05:52:52 UTC
OK, now I feel embarrassed and stupid for not noticing earlier that there is a
hidden *ChannelSftp.ls* call in my code. No idea why; it's been there since
the beginning of the app. It is called once when a new file abstraction is
requested and once more for any other operation (e.g. requesting an
OutputStream from the abstraction).

*Let's do the math now:*
With about 6400 files in the upload directory, every listing means roughly
1.5 MB of data to download, and downloading it can take up to two minutes
(120 seconds) depending on your connection.

So if you want to *upload* one *4 KB* file to the SFTP server using streams,
you first *download about 3 MB* of useless data, *wait* up to about *four
minutes*, and only then is your file uploaded. Not to mention that the whole
operation is likely to fail on a slow connection, because you simply won't get
that much data through reliably.
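
One way to avoid this (just a sketch, not the bundle's actual fix) is to ask
the server about the single remote path with stat instead of listing the whole
directory; that is one small request/response instead of thousands of entries.
Assumes a connected ChannelSftp named sftp and imports for
com.jcraft.jsch.SftpException:

String remotePath = "/home/milano/upload_test/logs.tar.gz"; // placeholder
boolean exists;
try {
    sftp.stat(remotePath);                     // one request, one response
    exists = true;
} catch (SftpException e) {
    if (e.id == ChannelSftp.SSH_FX_NO_SUCH_FILE) {
        exists = false;                        // fine - we are about to create it
    } else {
        throw e;                               // some other, real error
    }
}
System.out.println(remotePath + (exists ? " exists" : " does not exist yet"));
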
Thanks to Lothar for pointing me in the right direction!
